
AI Has Sped Up Biological Discovery by Cracking the Mystery of Proteins

Scientists at Google’s DeepMind project have trained a computational system that can deduce how these life-giving molecules take shape by studying their DNA blueprints.

Image: Examples of protein structures (credit: Emw)

How proteins function depends on how they form into complex and convoluted three-dimensional structures. AlphaFold is able to predict these shapes with impressive speed and accuracy.

Proteins are the molecular machinery of life. DNA blueprints tell cells how to make a given protein by stringing amino acids together like a necklace of different-sized beads. But how these molecular machines function depends on how these necklaces form into complex and convoluted three-dimensional structures, a process known as protein folding. Experimental scientists have spent decades developing techniques to resolve 3-D structures, but computational researchers have so far struggled to predict how a protein will fold from its genetic sequence alone, even though that sequence is all a cell needs to build the protein.

Now AlphaFold, an AI algorithm from Google’s DeepMind project, has bested the world’s top computational labs in a biennial competition called the Critical Assessment of Structure Prediction, or CASP. The scientific journal Nature called AlphaFold “a gargantuan leap in solving one of biology’s grandest challenges.” A DeepMind blog post said the result created the potential for “biologists to use computational structure prediction as a core tool in scientific research.”

Russ Altman, a professor of bioengineering and associate director of the Stanford Institute for Human-Centered Artificial Intelligence (HAI), reflected on what this means for AI and biomedicine.

Did DeepMind solve the mystery of protein folding?

To a mathematician or computer scientist, “solved” means “works every time, just turn the crank.” This is more like “works most of the time, turn the crank, then check to see if it looks usable.” That said, this is a great win for AI. Many skeptics thought computational prediction with AI could never be this good, and they were wrong. Nature quoted Mohammed AlQuraishi, a computational biologist at Columbia University who studied at Stanford, as calling this “a breakthrough of the first order.” I see this as a lot like what happened in 2012, when the ImageNet effort, led by HAI co-director Fei-Fei Li, dramatically improved the ability of algorithms to identify objects and animals in images, leading to what many called the deep learning revolution. Today, image recognition is used routinely, and the ImageNet methodology has been adapted to natural language processing and related fields. The same is possible in biology. I was talking with Alex Derry, a graduate student in my lab; he wants to study how DeepMind trained AlphaFold to gain insights into how to improve computational models for questions beyond protein folding.

How does this affect biomedical discovery?

Sequencing an organism’s DNA is cheap and easy, and it gives us the amino acid sequences of its proteins. But for years we’ve had to stop and wait for experimentalists to resolve the 3-D structures of those proteins before we could design drugs and vaccines, understand biological mechanisms and pathways, and study evolution. The ability to reliably predict protein structures could cut years to decades from that process and accelerate discovery and understanding. The experimental scientists who resolve protein structures will still have plenty to do, but they’ll focus on problems where the algorithms falter or where confirmation is critical before moving forward, such as identifying potentially dangerous mutations in existing proteins.

How did the competition work?

The true heroes are the experimental scientists who solved unknown protein structures but withheld their results from publication for a few months to give CASP a chance to put computational systems to a true test: Given a genetic sequence that codes for a protein, how accurately can your computational model predict its 3-D structure with no additional information? Over time, CASP has developed two judging criteria: how close the predicted structure is to the experimental result (by far the most important criterion) and how much useful biological information it reveals. Accuracy and utility are strongly correlated, but in some instances the most accurate prediction overall is less useful than another prediction that is less accurate overall. This usually happens when the less accurate prediction turns out to be better at capturing the structure of the protein’s “business end,” the specific part of the molecular machine that is critical to its function. Because local accuracy is often more important, global measures of accuracy can be misleading, which is why experimental confirmation of structures will remain essential.
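To make the point about global versus local accuracy concrete, here is a minimal Python sketch. It scores two made-up predictions with plain root-mean-square deviation (RMSD), assuming the predicted and experimental coordinates are already superimposed. This is only an illustration, not how CASP actually scores entries (CASP relies on its own measures, such as GDT_TS). The coordinates, offsets, the `shifted` helper and the `active_site` residue indices are all hypothetical.

```python
import math
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def rmsd(pred: List[Coord], ref: List[Coord], indices: Optional[List[int]] = None) -> float:
    """RMSD between two structures assumed to be already superimposed.

    If `indices` is given, only those residues are scored (a local measure);
    otherwise every residue counts (a global measure).
    """
    if len(pred) != len(ref):
        raise ValueError("structures must have the same number of residues")
    idx = range(len(ref)) if indices is None else indices
    squared = [sum((p - r) ** 2 for p, r in zip(pred[i], ref[i])) for i in idx]
    return math.sqrt(sum(squared) / len(squared))


def shifted(ref: List[Coord], offsets: List[float]) -> List[Coord]:
    """Hypothetical prediction: each residue displaced along y by its offset."""
    return [(x, y + d, z) for (x, y, z), d in zip(ref, offsets)]


# Hypothetical "experimental" structure: 10 residues on a straight line.
reference = [(float(i), 0.0, 0.0) for i in range(10)]
active_site = [4, 5]  # hypothetical "business end" residues

# Prediction A: close everywhere except at the active site.
pred_a = shifted(reference, [2.0 if i in active_site else 0.5 for i in range(10)])
# Prediction B: looser overall, but accurate at the active site.
pred_b = shifted(reference, [0.2 if i in active_site else 1.5 for i in range(10)])

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(f"{name}: global RMSD {rmsd(pred, reference):.2f}, "
          f"active-site RMSD {rmsd(pred, reference, active_site):.2f}")
# A: global RMSD 1.00, active-site RMSD 2.00
# B: global RMSD 1.34, active-site RMSD 0.20
```

Prediction A wins on the global score, yet B is the one a biologist would rather use to study the protein’s function, because it gets the critical region right.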

What next?

AlphaFold needs to be tested and validated by others to identify where the algorithm works well and where it needs improvement. When a big paper comes out, researchers in the field need to review their projects and decide which have become irrelevant and which have become more urgent. Students should be excited that we can now ask questions and do projects that were impossible previously. All of this depends on DeepMind making the algorithm available, which chief executive Demis Hassabis told Nature it plans to do. This dovetails nicely with a project that HAI policy director Russell Wald is leading to create a national cloud research infrastructure. AlphaFold would be a perfect algorithm to install on such infrastructure and make widely available for use by scientists from all disciplines.

Russ B. Altman is the Kenneth Fong Professor of Bioengineering, Genetics, Medicine, Biomedical Data Science and (by courtesy) Computer Science at Stanford University.

Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.
