This page serves as an archive for salient readings distributed to our group. New lab members are encouraged to explore papers here; they likely inform the field of enzyme engineering and protein design, but may extend to broader topics like best lab practices. They are organized only by the time point at which they were distributed to the group (most recent at the top). Happy reading!
2021
ML for Biologists. Looks like an excellent review. https://www.nature.com/articles/s41580-021-00407-0
ML and Protein Design. Very relevant! With forward design beyond single point mutations. https://www.biorxiv.org/content/10.1101/2021.12.08.471728v1
More AlphaFold and RoseTTAFold. Thought everyone would be excited to see AlphaFold and RoseTTAFold selected as Science breakthroughs of the year! Great overview of significance and work below: https://www.science.org/content/article/breakthrough-2021
ML visualization tool. Looks interesting and potentially useful. https://twitter.com/machinelearnflx/status/1469947419441963013?s=21
AlphaFill Ligands. Looks super useful for genome mining! https://www.biorxiv.org/content/10.1101/2021.11.26.470110v1
ML for protein sequence to function! https://www.pnas.org/content/118/48/e2104878118
Nature Article Collection. Computational chemistry and machine learning articles… lots of interesting work here! https://www.nature.com/collections/gcijejjahe/
Data Science Educational Tool. https://datascience-book.gitlab.io/book.html
Design of Functional Proteins. https://www.biorxiv.org/content/10.1101/2021.11.10.468128v1
Super charged AlphaFold. https://www.biorxiv.org/content/10.1101/2021.08.15.456425v2
Deep Learning GitHub Course. Looks like a good course in deep learning with slides and recordings of lectures! https://niessner.github.io/I2DL/
Protein fitness. While focused on disease, it seems like this could be a general predictor of sequence fitness (i.e., stability)… Would be interesting to look at predictions against Bagel and other data sets! https://www.nature.com/articles/s41586-021-04043-8
Missing Loops AlphaFold Rosetta combo. Neat combo for a common problem!!! https://blog.matteoferla.com/2021/10/filling-missing-loops-by-cannibalising.html
AlphaFold-Multimer. We need the A1000s. https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1
High throughput structures. Important concepts on how to start layering various tools on top of AlphaFold to screen millions of structures! https://www.biorxiv.org/content/10.1101/2021.09.30.462231v1
ML for Enzyme Engineering. Super interesting and along the lines of what Simon and Peishan are working on. Let’s definitely try it on Bagel; there may be some things to learn from and develop further. https://www.nature.com/articles/s41467-021-25976-8 There is also preliminary work published recently: https://www.biorxiv.org/content/10.1101/2021.09.01.458592v1.abstract
Rosetta Review Collection. For those of you just getting into Rosetta, this is probably a good place to start.
AlphaFold Review. A good early summary of what’s been evaluated in the last couple months with AlphaFold from the European contingent + Sergey. https://www.biorxiv.org/content/10.1101/2021.09.26.461876v1
Enzyme annotation. Nothing surprising but some good data and thoughtful ways to graphically convey info we often are interested in! https://journals.plos.org/ploscompbiol/article/figures?id=10.1371/journal.pcbi.1009446
Great review and perspective on High Throughput Docking. https://www.nature.com/articles/s41596-021-00597-z
AlphaFold and Ligands and Rosetta. So…. This looks REALLY useful! Both from ideas and actual use. https://colab.research.google.com/github/matteoferla/pyrosetta_help/blob/main/colabs-pyrosetta-migrate_ligands.ipynb
Protein Design Review. Great review from Possu and Sergey on molecular vs ML protein design approaches. https://www.sciencedirect.com/science/article/pii/S1367593121001125
Jupyter Notebooks Docking. Really interesting tool for docking! Would be great for people to try and give feedback. I’d look at it as more of a replacement for OpenEye/quick analysis than for Rosetta (at this point). Definitely starting to get to the point where 130B may need to be systematically updated in the next couple of years as frameworks like this roll out. https://chem-workflows.com/articles/2021/09/18/1-molecular-docking/
ML and Enzymes. https://arxiv.org/abs/2109.03900
Won’t stop can’t stop… more AF2. Discussion on predicting enzyme activity based on AF2 and CNN... https://medium.com/@christianclough/enzyme-structure-to-activity-modeling-27d8a81a3fbd
More AlphaFold Apps. New build on AlphaFold… tidal wave is building… https://www.biorxiv.org/content/10.1101/2021.09.07.459290v1
Rosetta Multistate Enzyme Design with Neural Networks. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0256691
One more Design with AlphaFold. Ok… last one (today… or at least for next 4 hrs…). Design with AlphaFold! Would be great to evaluate and understand if/where applicable. https://www.biorxiv.org/content/10.1101/2021.08.24.457549v1
AlphaFold 2 Blog and Reflection. Good summary of AlphaFold 2, from technical breakdown to implications for many fields. Definitely worth a good read… besides, it had been too long since my last AF2 email ;) https://www.blopig.com/blog/2021/07/alphafold-2-is-here-whats-behind-the-structure-prediction-miracle/
Geometric Deep Learning Resources. Great set of resources for learning geometric deep learning with a focus on proteins and bio from the world leaders in the space!!! All materials free and available below! https://geometricdeeplearning.com/lectures/
Great overview of AlphaFold and free tool. Sergey is doing an awesome job making AF2 (at least a “light version”) accessible to all! Looking forward to getting the full version up and running here soon. https://www.youtube.com/watch?v=Rfw7thgGTwI https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI/mobilepresent?slide=id.p
AlphaFold in Foldit. Huge!!!! https://fold.it/portal/node/2011929
AlphaFold 2 Light. Sergey put together a Google Colab of a “light” version of AlphaFold 2. Super excited to get full RoseTTAFold and AF2 up and running internally and benchmarked on our enzyme set!!! These will likely fundamentally change the majority of workflows in our lab (e.g., every protein we order should be run through before ordering)… lots of integration into pipelines… etc. https://colab.research.google.com/drive/1qWO6ArwDMeba1Nl57kk_cQ8aorJ76N6x
More AlphaFold. Good mid level discussion (deep, but not down to the ultra technical details) of AF2: https://youtu.be/nGVFbPKrRWQ
Great summary of AlphaFold2 and resources. A must-read summary for those whose heads are spinning from all that’s happened in the past couple of weeks. https://towardsdatascience.com/alphafold-based-databases-and-fully-fledged-easy-to-use-alphafold-interfaces-poised-to-baf865c6d75e
Deep Technical Walkthrough of AF2. A must read if you want an in depth understanding of AlphaFold: https://moalquraishi.wordpress.com/2021/07/25/the-alphafold2-method-paper-a-fount-of-good-ideas/
Full proteomes from AlphaFold. https://alphafold.ebi.ac.uk/
Long but really well written summary of AlphaFold 2! Simon’s working hard on getting it up and running… in the meantime it’s looking like RoseTTAFold has some very impressive results. https://www.blopig.com/blog/2021/07/alphafold-2-is-here-whats-behind-the-structure-prediction-miracle/
AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings | Journal of Chemical Information and Modeling. FYI, could be useful for rapid docking evaluations! https://autodock-vina.readthedocs.io/
Accurate prediction of protein structures and interactions using a three-track neural network. And the RoseTTAFold paper was also published: https://science.sciencemag.org/content/early/2021/07/14/science.abj8754
Highly accurate protein structure prediction with AlphaFold. Paper: https://www.nature.com/articles/s41586-021-03819-2. GitHub: https://github.com/deepmind/alphafold
AlphaFold Code. DeepMind released their code for AlphaFold. See https://github.com/deepmind/alphafold for the repo and an early version of the paper: https://www.nature.com/articles/s41586-021-03819-2#Abs1. Might need to use the Docker script in the repo and a GPU to run it.
RoseTTAFold. Check out the new protein structure prediction method RoseTTAFold, which is reported to perform better than trRosetta! GitHub repo. Paper.
Enzyme engineering review. Good primer, despite not much love for our group. Ha! https://www.nature.com/articles/s43586-021-00044-z
Development and Evaluation of GlycanDock: A Protein–Glycoligand Docking Refinement Algorithm in Rosetta. Very relevant work for many of us from a lab alum!!! Definitely worth reading and likely will be very useful. https://pubs.acs.org/doi/10.1021/acs.jpcb.1c00910
Rosetta catching up to AlphaFold. Looks like it will be a server at UW.... and you won't need 1000000 GPUs to use it :D https://www.biorxiv.org/content/10.1101/2021.06.14.448402v1
Great Matplotlib resource. Looks very useful for figure generation! https://github.com/matplotlib/cheatsheets/blob/master/cheatsheets.pdf
The imperative of physics-based modeling and inverse theory in computational science | Nature Computational Science. https://www.nature.com/articles/s43588-021-00040-z
The confidence gap predicts the gender pay gap among STEM graduates. https://www.pnas.org/content/117/48/30303
Machine Learning in a Molecular Modeling Course for Chemistry, Biochemistry, and Biophysics Students. https://meridian.allenpress.com/the-biophysicist/article/1/2/11/442424/Machine-Learning-in-a-Molecular-Modeling-Course
Deep Learning: CS 182 Spring 2021 - YouTube. https://www.youtube.com/playlist?list=PL_iWQOsE6TfVmKkQHucjPAoRtIJYt8a5A
Predict B-factor/sequence-conserved sites. Has several features that might be useful for designs. And if you don't have Geneious, this will give you the sequence conservation of AA positions. https://predictprotein.org/
Important Rosetta Ligand Update. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein–Ligand Docking. https://pubs.acs.org/doi/abs/10.1021/acs.jctc.0c01184
Making Scientific Figures with Illustrator - YouTube. Looks like a great set of videos for making professional quality figures! https://www.youtube.com/playlist?list=PLRCLlYmhDNMyTLhQCtlPryR_6odf6tBBF
Software behind swiss-model FINALLY was released!!! AlphaFold definitely is shaking things up. ProMod3—A versatile homology modelling toolbox. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008667
RetroBioCat as a computer-aided synthesis planning tool for biocatalytic reactions and cascades | Nature Catalysis. Looks like a really interesting and useful tool! Many companies have been working on this, and this is the first tool like this I’ve seen that’s open for academic groups to use. https://www.nature.com/articles/s41929-020-00556-z
Another useful ML resource! Again, the goal is not to become a deep expert here but to be knowledgeable enough to use it to complement our deep expertise in structure and enzyme function. http://themlbook.com/wiki/ http://themlbook.com/wiki/doku.php
Good thread on getting started with ML. https://twitter.com/prasoonpratham/status/1344912051798376448?s=21
2020
Synthetic biology 2020–2030: six commercially-available products that are changing our world | Nature Communications. Good read! https://www.nature.com/articles/s41467-020-20122-2
A worthwhile read. https://moalquraishi.wordpress.com/2020/12/08/alphafold2-casp14-it-feels-like-ones-child-has-left-home/
A nice single site for accessing their suite of tools! https://loschmidt.chemi.muni.cz/portal/
Fantastic blog post on AlphaFold Results. https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/
Another great paper from Jiri! Fantastic list of enzyme design tools. https://www.preprints.org/manuscript/202012.0089/v1
And here is a set of videos that you might be interested in or want to share with others!
- While this is DeepMind's PR video, a fun watch and really nice graph at ~7:30
- Deeper dive into AlphaFold's tech. This one you might want to skip to ~45min, but the middle has a good description of AlphaFold 1; the first part is basic biochem 101. Last part is 1st discussion of potential AlphaFold 2.0
- And this one is shorter but good as well!
- A quick video on the basics of DeepMind's AlphaFold 2 breakthrough. Here are the timestamps: 0:00 - What happened? 1:03 - How big is this accomplishment? 4:39 - Proteins and amino acids 5:17 - Protein folding 8:26 - How AlphaFold 1 works 9:45 - How AlphaFold 2 works 12:09 - Why is this breakthrough important? 13:19 - Long-term future impact
Looks like a super interesting and useful tool, nice find Peishan! Coupled with today’s DeepMind AI protein folding news, this could lead to a new workflow for the lab. https://f1000research.com/articles/9-213
Interesting Twitter discussion on what software people use for making figures. https://twitter.com/kiara_bellido/status/1330938234583519233?s=12
Looks like a really exciting resource of curated stability data (with lots of bagel data in there!!!) FireProtDB: database of manually curated protein stability data. Nucleic Acids Research, gkaa981, https://doi.org/10.1093/nar/gkaa981
Great thread with lots of tools of potential use/interest... especially with iTOL no longer being free. https://twitter.com/stevenjrobbins/status/1316541833799843840?s=12
GitHub - samsinai/FLEXS: Fitness landscape exploration sandbox for biological sequence design. https://github.com/samsinai/FLEXS
Ten simple rules to colorize biological data visualization. PLOS Computational Biology. https://doi.org/10.1371/journal.pcbi.1008259
Looks like a potentially great source for learning machine learning. https://github.com/rasbt/python-machine-learning-book-2nd-edition
A really well written and thought provoking paper. Strongly suggested reading even if just for the intro and good historical perspective. The NK Landscape as a Versatile Benchmark for Machine Learning Driven Protein Engineering. https://doi.org/10.1101/2020.09.30.319780
Illustration: get your research the attention it deserves. Three scientific artists explain how to create impact with attractive visuals. Nature. Andy Tay. September 24 2020. https://www.nature.com/articles/d41586-020-02660-3
Understanding the origins of loss of protein function by analyzing the effects of thousands of variants on activity and abundance. Matteo Cagiada, Kristoffer E. Johansson, Audronė Valančiūtė, Sofie V. Nielsen, Rasmus Hartmann-Petersen, Jun J. Yang, Douglas M. Fowler, Amelie Stein, Kresten Lindorff-Larsen. bioRxiv 2020.09.28.317040; doi: https://doi.org/10.1101/2020.09.28.317040
Definitely one to read and likely implement. Pavlovicz, R. E., Park, H., & DiMaio, F. (2020). Efficient consideration of coordinated water molecules improves computational protein-protein and protein-ligand docking discrimination. PLOS Computational Biology, 16(9), e1008103. https://doi.org/10.1371/journal.pcbi.1008103
Broom, A., Rakotoharisoa, R.V., Thompson, M.C. et al. Ensemble-based enzyme design can recapitulate the effects of laboratory directed evolution in silico. Nat Commun 11, 4808 (2020). https://doi.org/10.1038/s41467-020-18619-x
A MUST READ. Heckmann, C., & Paradisi, F. Looking back: A short history of the discovery of enzymes and how they became powerful chemical tools. ChemCatChem. https://doi.org/10.1002/cctc.202001107
Interesting discussion on picking colors. Likely holds for protein images and most figures as well. A great general resource, particularly the end, where it discusses taking inspiration from color combos in nature. https://blog.datawrapper.de/beautifulcolors/
Protein graph analysis. Looks like a really interesting package that might be accessible and easy to use for obtaining interesting metrics on proteins! https://www.biorxiv.org/content/10.1101/2020.07.15.204701v1
It's a good story that may cheer you up, especially when your experiments fail. They created a new tool to edit mitochondrial DNA; however, the serendipity started from failure: Most deaminases target single strands of DNA, or RNA, which is naturally single-stranded. This deaminase was odd – it didn’t appear to work on either. For months, de Moraes unsuccessfully tested the protein. Then one night, alone in the lab, he decided to try it out on something he didn’t expect to work: double-stranded DNA. https://www.hhmi.org/news/how-to-precisely-edit-mitochondrial-dna
Russ, W. P., Figliuzzi, M., Stocker, C., Barrat-Charlaix, P., Socolich, M., Kast, P., ... & Ranganathan, R. (2020). Evolution-based design of chorismate mutase enzymes. bioRxiv. https://doi.org/10.1101/2020.04.01.020487
Empirical analysis tells Reviewer 2: “Go F’ Yourself”. Ars Technica. John Timmer 6/27/2020. https://arstechnica.com/science/2020/06/empirical-analysis-tells-reviewer-2-go-f-yourself/
St. John, P.C., Guan, Y., Kim, Y. et al. Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat Commun 11, 2328 (2020). https://doi.org/10.1038/s41467-020-16201-z
Dunham, A., & Beltrao, P. (2020). Exploring amino acid functions in a deep mutational landscape. BioRxiv. https://doi.org/10.1101/2020.05.26.116756
Paar, M., Schrabmair, W., Mairold, M., Oettl, K., & Reibnegger, G. (2019). Global Regression Using the Explicit Solution of Michaelis‐Menten Kinetics Employing Lambert's W Function: High Robustness of Parameter Estimates. ChemistrySelect, 4(6), 1903-1908. https://doi.org/10.1002/slct.201803610
Hon, J., Borko, S., Stourac, J., Prokop, Z., Zendulka, J., Bednar, D., ... & Damborsky, J. (2020). EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. Nucleic Acids Research. https://loschmidt.chemi.muni.cz/peg/wp-content/uploads/2020/05/gkaa372.pdf
Perkel, J. M. (2020). The software that powers scientific illustration. Nature. https://www.nature.com/articles/d41586-020-01404-7
Machine learning for protein engineering (and small molecules). Jennifer Listgarten. EECS & Center for computational biology. Berkeley Artificial Intelligence Lab. https://youtu.be/_4OMm_3ZoXw
A protein solubility prediction tool. https://loschmidt.chemi.muni.cz/soluprot/?page=about
Koehler Leman J, Weitzner BD, Renfrew PD, Lewis SM, Moretti R, Watkins AM, et al. (2020) Better together: Elements of successful scientific software development in a distributed collaborative community. PLoS Comput Biol 16(5): e1007507. https://doi.org/10.1371/journal.pcbi.1007507
Taujale, R., Venkat, A., Huang, L. C., Zhou, Z., Yeung, W., Rasheed, K. M., ... & Kannan, N. (2020). Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases. Elife, 9, e54532. DOI: 10.7554/eLife.54532
Callaway, E. (2020). The race for coronavirus vaccines: a graphical guide. Nature, 580(7805), 576. https://www.nature.com/articles/d41586-020-01221-y?utm_source=twitter&utm_medium=social&utm_content=organic&utm_campaign=NGMT_USG_JC01_GL_Nature
Computational protein design using geometric deep learning. Michael Bronstein. https://www.youtube.com/watch?v=pDp-uxR4JDI
Le, K.; Adolf-Bryfogle, J.; Klima, J.; Lyskov, S.; Labonte, J.; Bertolani, S.; Roy Burman, S.; Leaver-Fay, A.; Weitzner, B.; Maguire, J.; Rangan, R.; Adrianowycz, M.; Alford, R.; Adal, A.; Nance, M.; Das, R.; Dunbrack, R.; Schief, W.; Kuhlman, B.; Siegel, J.; Gray, J. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Preprints 2020, 2020020097 (doi: 10.20944/preprints202002.0097.v1)
Lin, L., Kightlinger, W., Prabhu, S. K., Hockenberry, A. J., Li, C., Wang, L. X., ... & Mrksich, M. (2020). Sequential Glycosylation of Proteins with Substrate-Specific N-Glycosyltransferases. ACS Central Science, 6(2), 144-154. https://doi.org/10.1021/acscentsci.9b00021
"Macromolecular modeling and design in Rosetta: new methods and frameworks" https://www.preprints.org/manuscript/201904.0263/v3
"Protein Sequence Design with a learned Potential" https://www.biorxiv.org/content/10.1101/2020.01.06.895466v1.full.pdf+html
Virtanen, P., Gommers, R., Oliphant, T.E. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2
High throughput quantum chemistry for molecular design. https://www.chemalive.com/construqt/
Torrisi, M., Pollastri, G., & Le, Q. (2020). Deep learning methods in protein structure prediction. Computational and Structural Biotechnology Journal. https://doi.org/10.1016/j.csbj.2019.12.011
AlQuraishi, M. (2020). A watershed moment for protein structure prediction. https://www.nature.com/articles/d41586-019-03951-0
Cunningham, J.M., Koytiger, G., Sorger, P.K. et al. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat Methods 17, 175–183 (2020). https://doi.org/10.1038/s41592-019-0687-1