Artificial intelligence firm Google DeepMind has adapted its AlphaFold system for predicting protein structure to assess whether a huge number of simple mutations are harmful.
The adapted system, called AlphaMissense, has done this for 71 million possible mutations of a kind called missense mutations in the 20,000 human proteins, and the results have been made freely available.
“We think this is very helpful for clinicians and human geneticists,” says Jun Cheng at Google DeepMind. “Hopefully, this can help them to pinpoint the cause of genetic disease.”
Almost everyone is born with between about 50 and 100 mutations not found in their parents, resulting in a huge amount of genetic variation between individuals. For doctors sequencing a person’s genome in an attempt to find the cause of a disease, this poses an enormous challenge, because there may be thousands of mutations that could be linked to that condition.
AlphaMissense has been developed to try to predict whether these genetic variants are harmless or might produce a protein linked to a disease.
A protein-coding gene tells a cell which amino acids need to be strung together to make a protein, with each set of three DNA letters coding for an amino acid. The AI focuses on missense mutations, in which one of the DNA letters in a triplet is changed to another, which can result in the wrong amino acid being added to a protein. Depending on where in the protein this happens, the effect can range from nothing at all to a crucial protein no longer working.
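To make the triplet idea concrete, here is a minimal Python sketch that classifies a single-letter change within one triplet. It is an illustration only and has nothing to do with how AlphaMissense itself works. The function name `classify_substitution` is hypothetical, and the lookup table is the standard genetic code written in the conventional T, C, A, G order.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks a stop signal rather than an amino acid.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify the effect of changing one DNA letter in a triplet.

    Hypothetical helper for illustration; position is 0, 1 or 2.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"  # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"    # triplet now signals stop, protein cut short
    if before == "*":
        return "stop-lost"   # stop signal replaced by an amino acid
    return "missense"        # a different amino acid is added


# GAG codes for glutamate (E); changing the middle letter to T gives GTG, valine (V).
print(classify_substitution("GAG", 1, "T"))  # missense
print(classify_substitution("GAG", 2, "A"))  # synonymous (GAA is also glutamate)
print(classify_substitution("GAG", 0, "T"))  # nonsense (TAG is a stop signal)
```

As the three calls show, only some single-letter changes swap one amino acid for another. Those are the missense mutations that AlphaMissense scores.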
People tend to have about 9000 missense mutations each. But the effects of only 0.1 per cent of the 71 million possible missense mutations we could get have been identified so far.
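For a sense of scale, the figures quoted above can be turned into rough counts. This is back-of-the-envelope arithmetic using only the article's round numbers (71 million possible mutations, 0.1 per cent characterised, 20,000 proteins). The variable names are illustrative.

```python
POSSIBLE_MISSENSE = 71_000_000  # possible missense mutations, as quoted
HUMAN_PROTEINS = 20_000         # human proteins, as quoted

# 0.1 per cent is one in a thousand.
characterised = POSSIBLE_MISSENSE // 1000
# Average number of possible missense mutations per protein.
per_protein = POSSIBLE_MISSENSE // HUMAN_PROTEINS

print(f"{characterised:,}")  # 71,000
print(f"{per_protein:,}")    # 3,550
```

So only about 71,000 of the possible missense mutations have known effects, against an average of roughly 3550 possible mutations for each protein.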
AlphaMissense doesn’t attempt to work out how a missense mutation alters the structure or stability of a protein, or what effect this has on its interactions with other proteins, although understanding this could help find treatments. Instead, it compares the sequence of each possible mutated protein to those of all the proteins that AlphaFold was trained on to see if it looks “natural”, says Žiga Avsec at Google DeepMind. Each variant is given a score from 0 to 1, with those that look “unnatural” rated as potentially harmful.
Pushmeet Kohli at Google DeepMind uses the term “intuition” to describe how it works. “In some sense, this model is leveraging the intuition that it had gained while solving the task of structure prediction,” he says.
“It’s like if we substitute a word from an English sentence, a person familiar with English can immediately see whether this word substitution will change the meaning of the sentence,” says Avsec.
The team says AlphaMissense outperformed other computational methods when tested on known variants.
In an article commenting on the research, Joseph Marsh at the University of Edinburgh, UK, and Sarah Teichmann at the University of Cambridge write that AlphaMissense produced “remarkable results” in several different tests of its performance and that it will be helpful for prioritising which possible disease-causing mutations should be investigated further.
However, such systems can still only aid in the diagnosis process, they write.
Missense mutations are just one of many different kinds of mutations. Bits of DNA can also be added, deleted, duplicated, flipped around and so on. And many disease-causing mutations don’t alter proteins, but instead occur in nearby sequences involved in regulating the activity of genes.