January 27th 2023
Scientists have built an AI system that generates proteins from scratch.
Scientists have created an AI system, called ProGen, that generates artificial enzymes from scratch. In laboratory tests, some of these enzymes worked as well as those found in nature, even when their artificially generated amino acid sequences diverged significantly from any known natural protein.
The experiment demonstrates that natural language processing, although it was developed to read and write human language, can learn at least some of the underlying principles of biology. Salesforce Research developed ProGen, which uses next-token prediction to assemble amino acid sequences into artificial proteins.
Scientists said the new technology could become more powerful than directed evolution, a Nobel Prize-winning protein design technology. They expect it to energize the 50-year-old field of protein engineering by speeding the development of new proteins that can be used for almost anything from therapeutics to degrading plastic.
“The artificial designs perform much better than designs that were inspired by the evolutionary process,” said James Fraser, Ph.D., professor of bioengineering and therapeutic sciences at the UCSF School of Pharmacy, and an author of the work, which was published on Jan. 26 in Nature Biotechnology. A previous version of the paper has been available on the preprint server bioRxiv since July 2021, where it garnered several dozen citations before being published in a peer-reviewed journal.
ProGen works in a similar way to AIs that generate text. It learned to generate new proteins by picking up the grammar of how amino acids combine in 280 million existing proteins. Instead of choosing a topic for the AI to write about, the researchers could specify a group of similar proteins for it to focus on. In this case, they chose a group of proteins with antimicrobial activity.
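The next-token idea can be illustrated with a toy sketch. This is not ProGen, which is a large neural language model. It is a minimal bigram counter, with made-up training sequences and hypothetical names (`train_bigram`, `generate`), that shows what it means to learn which amino acid tends to follow which and then to write a new sequence one residue at a time.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical markers for sequence start and end


def train_bigram(sequences):
    """Count how often each residue follows each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=30):
    """Sample a sequence one residue at a time from the learned counts."""
    out, prev = [], START
    while len(out) < max_len:
        options = counts[prev]
        residues = list(options)
        weights = [options[r] for r in residues]
        nxt = rng.choices(residues, weights=weights, k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Made-up toy sequences in one-letter amino acid code, not real proteins.
toy_family = ["MKVLA", "MKVIA", "MRVLG", "MKALA"]
model = train_bigram(toy_family)
sample = generate(model, random.Random(0))
print(sample)
```

Restricting the training list to one group of similar sequences is the toy counterpart of telling the model which protein family to focus on: the sampler can only reproduce transitions it has seen in that group.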
The researchers programmed checks into the AI’s process so it wouldn’t produce nonsensical amino acid sequences, but they also tested a sample of the AI-proposed molecules in real cells. Of the 100 molecules they physically created, 66 participated in chemical reactions similar to those of natural proteins that destroy bacteria in egg whites and saliva. This suggested that these new proteins could also kill bacteria.
“The language model is learning aspects of evolution, but it’s different than the normal evolutionary process,” Fraser said. “We now have the ability to tune the generation of these properties for specific effects. For example, an enzyme that’s incredibly thermostable or likes acidic environments or won’t interact with other proteins.”
To create the model, scientists simply fed the amino acid sequences of 280 million different proteins of all kinds into the machine learning model and let it digest the information for a couple of weeks. Then, they fine-tuned the model by priming it with 56,000 sequences from five lysozyme families, along with some contextual information about these proteins.
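The two-stage recipe, broad training followed by fine-tuning on one family, can be sketched with simple transition counts. Everything here is an illustrative assumption rather than the authors' method: the sequences are made up, the function names are hypothetical, and the `weight` argument merely stands in for the extra emphasis that fine-tuning places on the target family. A real model would continue gradient-based training instead of adding counts.

```python
from collections import Counter, defaultdict


def count_transitions(sequences, counts=None, weight=1):
    """Add residue-to-residue transition counts, optionally onto existing ones."""
    counts = counts if counts is not None else defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += weight
    return counts


def next_prob(counts, prev, nxt):
    """Estimated probability that `nxt` follows `prev`."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Made-up stand-ins: a broad "pretraining" corpus and a small target family.
general_corpus = ["MKVLA", "MKTLA", "MKTGA", "AKTLV"]
family = ["MKVIA", "GKVIA"]

model = count_transitions(general_corpus)           # stage 1: broad training
before = next_prob(model, "K", "V")
model = count_transitions(family, model, weight=3)  # stage 2: fine-tuning
after = next_prob(model, "K", "V")
print(f"P(V | K) before: {before:.2f}, after: {after:.2f}")
# prints "P(V | K) before: 0.25, after: 0.70"
```

In this toy, the family data pulls the estimate toward the patterns typical of the target group while the broad corpus still contributes, which is the intuition behind priming a general model with family-specific sequences.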
“It was sort of an ‘it looks like a duck, it quacks like a duck’ situation and X-rays confirmed it also walked like a duck,” Fraser said. He was surprised to find a well-functioning protein within the first, relatively small fraction of the ProGen-generated proteins that they tested.
A similar process could be used to create new test molecules for drug development, though they will still have to be tested in labs, which is time-consuming, says Madani.
“The capability to generate functional proteins from scratch out-of-the-box demonstrates we are entering into a new era of protein design,” said Ali Madani, Ph.D., founder of Profluent Bio, a former research scientist at Salesforce Research, and the paper’s first author. “This is a versatile new tool available to protein engineers, and we’re looking forward to seeing the therapeutic applications.”