
Highly accurate protein structure prediction with AlphaFold
Nature volume 596, pages 583–589 (2021)

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort 1, 2, 3, 4, the structures of around 100,000 unique proteins have been determined 5, but this represents a small fraction of the billions of known protein sequences 6, 7. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics.

Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ 8) has been an important open research problem for more than 50 years 9. Despite recent progress 10, 11, 12, 13, 14, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) 15, demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

The development of computational methods to predict three-dimensional (3D) protein structures from the protein sequence has proceeded along two complementary paths that focus on either the physical interactions or the evolutionary history. The physical interaction programme heavily integrates our understanding of molecular driving forces into either thermodynamic or kinetic simulation of protein physics 16 or statistical approximations thereof 17. Although theoretically very appealing, this approach has proved highly challenging for even moderate-sized proteins due to the computational intractability of molecular simulation, the context dependence of protein stability and the difficulty of producing sufficiently accurate models of protein physics. The evolutionary programme has provided an alternative in recent years, in which the constraints on protein structure are derived from bioinformatics analysis of the evolutionary history of proteins, homology to solved structures 18, 19 and pairwise evolutionary correlations 20, 21, 22, 23, 24. This bioinformatics approach has benefited greatly from the steady growth of experimental protein structures deposited in the Protein Data Bank (PDB) 5, the explosion of genomic sequencing and the rapid development of deep learning techniques to interpret these correlations. Despite these advances, contemporary physical and evolutionary-history-based approaches produce predictions that are far short of experimental accuracy in the majority of cases in which a close homologue has not been solved experimentally and this has limited their utility for many biological applications. In this study, we develop the first, to our knowledge, computational approach capable of predicting protein structures to near experimental accuracy in a majority of cases.
