Readings Archive

This page archives notable readings distributed to our group. New lab members are encouraged to explore the papers here; most relate to enzyme engineering and protein design, but some extend to broader topics such as lab best practices. Entries are ordered only by the date they were distributed to the group (most recent at the top). Happy reading!


ML for Biologists. Looks like an excellent review.

ML and Protein Design. Very relevant, with forward design beyond single point mutations!

More AlphaFold and RoseTTAFold. Thought everyone would be excited to see AlphaFold and RoseTTAFold selected as Science’s Breakthrough of the Year! A great overview of their significance and the work behind them.

ML visualization tool. Looks interesting and potentially useful.

AlphaFill Ligands. Looks super useful for genome mining!

ML for protein sequence to function!

Nature Article Collection. Computational chemistry and machine learning articles… lots of interesting work here!

Data Science Educational Tool.

Design of Functional Proteins.

Supercharged AlphaFold.

Deep Learning GitHub Course. Looks like a good deep learning course, with slides and lecture recordings!

Protein fitness. While focused on disease, this seems like it could be a general predictor of sequence fitness (i.e., stability)… It would be interesting to compare its predictions against Bagel and other datasets!

Missing Loops: AlphaFold + Rosetta combo. A neat combination for a common problem!!!

AlphaFold-Multimer. We need the A1000s.

High-throughput structures. Important concepts on how to start layering various tools on top of AlphaFold to screen millions of structures!

ML for Enzyme Engineering. Super interesting and along the lines of what Simon and Peishan are working on. Let’s definitely try it on Bagel; there may also be some things to learn from and develop further. Related preliminary work was also published recently.

Rosetta Review Collection. For those of you just getting into Rosetta, this is probably a good place to start.

AlphaFold Review. A good early summary, from the European contingent plus Sergey, of what has been evaluated with AlphaFold over the last couple of months.

Enzyme annotation. Nothing surprising, but some good data and thoughtful ways to graphically convey information we are often interested in!

Great review and perspective on High Throughput Docking.

AlphaFold and Ligands and Rosetta. So… this looks REALLY useful, both for ideas and for actual use!

Protein Design Review. Great review from Possu and Sergey on molecular vs ML protein design approaches.

Jupyter Notebooks Docking. A really interesting tool for docking! It would be great for people to try it and give feedback. At this point, I’d see it more as a replacement for OpenEye/quick analysis than for Rosetta. We’re definitely getting to the point where 130B may need to be systematically updated over the next couple of years as frameworks like this roll out.

ML and Enzymes.

Won’t stop, can’t stop… more AF2. A discussion on predicting enzyme activity with AF2 and a CNN.

More AlphaFold Apps. A new build on AlphaFold… the tidal wave is building…

Rosetta Multistate Enzyme Design with Neural Networks.

One More: Design with AlphaFold. OK… last one (today… or at least for the next 4 hours…). Design with AlphaFold! It would be great to evaluate it and understand if/where it’s applicable.

AlphaFold 2 Blog and Reflection. A good summary of AlphaFold 2, from the technical breakdown to its implications for many fields. Definitely worth a good read… besides, it had been too long since my last AF2 email ;)

Geometric Deep Learning Resources. A great set of resources for learning geometric deep learning, with a focus on proteins and bio, from the world leaders in the space!!! All materials are freely available!

Great overview of AlphaFold and a free tool. Sergey is doing an awesome job making AF2 (at least a “light version”) accessible to all! Looking forward to getting the full version up and running here soon.

AlphaFold in Foldit. Huge!!!!

AlphaFold 2 Light. Sergey put together a Google Colab of a “light” version of AlphaFold 2. Super excited to get the full RoseTTAFold and AF2 up and running internally and benchmarked on our enzyme set!!! These will likely fundamentally change the majority of workflows in our lab (e.g., every protein we order should be run through them before ordering)… lots of integration into pipelines, etc.

More AlphaFold. A good mid-level discussion of AF2 (deep, but not down to the ultra-technical details).

Great summary of AlphaFold2 and resources. A must-read summary for those whose heads are spinning from everything that’s happened in the past couple of weeks.

Deep Technical Walkthrough of AF2. A must-read if you want an in-depth understanding of AlphaFold.

Full proteomes from AlphaFold.

A long but really well-written summary of AlphaFold 2! Simon’s working hard on getting it up and running… in the meantime, it’s looking like RoseTTAFold has some very impressive results.

AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling. FYI, this could be useful for rapid docking evaluations!

Accurate prediction of protein structures and interactions using a three-track neural network. The RoseTTAFold paper was also published!

Highly accurate protein structure prediction with AlphaFold.

AlphaFold Code. DeepMind released their code for AlphaFold, along with an early version of the paper. Running it will probably require the Docker script in the repo and a GPU.

RoseTTAFold. Check out RoseTTAFold, a new protein structure prediction method reported to perform better than trRosetta!

Enzyme engineering review. A good primer, despite not much love for our group. Ha!

Development and Evaluation of GlycanDock: A Protein–Glycoligand Docking Refinement Algorithm in Rosetta

Very relevant work for many of us, from a lab alum!!! Definitely worth reading and likely to be very useful.

Rosetta catching up to AlphaFold. Looks like it will be available as a server at UW… and you won’t need 1,000,000 GPUs to use it :D

Great Matplotlib resource. Looks very useful for figure generation!

The imperative of physics-based modeling and inverse theory in computational science. Nature Computational Science.

The confidence gap predicts the gender pay gap among STEM graduates.

Machine Learning in a Molecular Modeling Course for Chemistry, Biochemistry, and Biophysics Students.

Deep Learning: CS 182, Spring 2021 (YouTube).

Predict B-factors/sequence-conserved sites. It has several features that might be useful for design, and if you don’t have Geneious, it will give you the sequence conservation of each amino acid position.

Important Rosetta Ligand Update. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein–Ligand Docking.

Making Scientific Figures with Illustrator (YouTube). Looks like a great set of videos for making professional-quality figures!

The software behind SWISS-MODEL was FINALLY released!!! AlphaFold is definitely shaking things up. ProMod3—A versatile homology modelling toolbox.

RetroBioCat as a computer-aided synthesis planning tool for biocatalytic reactions and cascades. Nature Catalysis. Looks like a really interesting and useful tool! Many companies have been working on this, and this is the first tool of its kind I’ve seen that’s open for academic groups to use.

Another useful ML resource! Again, the goal is not to become deep experts here, but to become knowledgeable enough to use these methods to complement our deep expertise in structure and enzyme function.

Good thread on getting started with ML.

Synthetic biology 2020–2030: six commercially-available products that are changing our world. Nature Communications. A good read!

A worthwhile read.

A nice single site for accessing their suite of tools!

Fantastic blog post on AlphaFold Results.

Another great paper from Jiri! A fantastic list of enzyme design tools.

This week’s big development in protein structure prediction is really exciting! Here are a couple of articles and resources that summarize AlphaFold’s leap forward.

And here is a set of videos that you might be interested in or want to share with others! 

  • While this is DeepMind’s PR video, it’s a fun watch, with a really nice graph at ~7:30.
  • A deeper dive into AlphaFold’s tech. You might want to skip ahead to ~45 min, although the middle has a good description of AlphaFold 1; the first part is basic Biochem 101, and the last part is the first discussion of a potential AlphaFold 2.0.
  • And this one is shorter but good as well! 
  • A quick video on the basics of DeepMind's AlphaFold 2 breakthrough. Here are the timestamps: 0:00 - What happened? 1:03 - How big is this accomplishment? 4:39 - Proteins and amino acids 5:17 - Protein folding 8:26 - How AlphaFold 1 works 9:45 - How AlphaFold 2 works 12:09 - Why is this breakthrough important? 13:19 - Long-term future impact


Looks like a super interesting and useful tool; nice find, Peishan! Coupled with today’s DeepMind AI protein folding news, this could lead to a new workflow for the lab.

Interesting Twitter discussion on what software people use for making figures.

Looks like a really exciting resource of curated stability data (with lots of Bagel data in there!!!). FireProtDB: database of manually curated protein stability data. Nucleic Acids Research, gkaa981.

Great thread with lots of tools of potential use/interest, especially now that iTOL is no longer free.

GitHub - samsinai/FLEXS: Fitness landscape exploration sandbox for biological sequence design.

Ten simple rules to colorize biological data visualization. PLOS Computational Biology.

Looks like a potentially great source for learning machine learning.

A really well-written and thought-provoking paper. Strongly recommended reading, even if just for the intro and its good historical perspective. The NK Landscape as a Versatile Benchmark for Machine Learning Driven Protein Engineering.
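For anyone new to the NK model this paper builds its benchmark on: in Kauffman’s NK landscape, each of N sites contributes to fitness through an interaction with K other sites, so K tunes how rugged the landscape is. Below is a minimal Python sketch of the classic formulation, assuming a binary alphabet by default, randomly chosen interacting sites, and uniform random contributions averaged over sites. It is not the paper’s benchmark code, and `make_nk_landscape` and `count_local_optima` are hypothetical helper names.

```python
import random
from itertools import product


def make_nk_landscape(n, k, alphabet="01", seed=0):
    """Build a random NK fitness function over sequences of length n.

    Each site i has a neighbourhood of itself plus k other sites. Its
    contribution is a uniform random number in [0, 1) looked up from the
    states of those k + 1 sites; fitness is the mean contribution.
    """
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    rng = random.Random(seed)
    # Assumption: the k interacting sites are drawn at random
    # ("random neighbourhood" variant) rather than taken as adjacent sites.
    neighbourhoods = [
        [i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)
    ]
    # One lookup table per site, filled eagerly: len(alphabet) ** (k + 1)
    # entries each, so keep k small for a 20-letter amino-acid alphabet.
    tables = [
        {states: rng.random() for states in product(alphabet, repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(seq):
        if len(seq) != n or any(c not in alphabet for c in seq):
            raise ValueError("sequence does not match landscape")
        total = sum(
            tables[i][tuple(seq[j] for j in neighbourhoods[i])] for i in range(n)
        )
        return total / n

    return fitness


def count_local_optima(fitness, n, alphabet="01"):
    """Brute-force count of sequences with no fitter single-mutant neighbour."""
    count = 0
    for seq in product(alphabet, repeat=n):
        f = fitness(seq)
        neighbours = (
            seq[:i] + (a,) + seq[i + 1:]
            for i in range(n)
            for a in alphabet
            if a != seq[i]
        )
        if all(fitness(nb) <= f for nb in neighbours):
            count += 1
    return count
```

For example, calling `count_local_optima(make_nk_landscape(10, k), 10)` for a few values of K on a small binary landscape shows the effect of epistasis: with K = 0 the landscape is additive and has a single optimum, while larger K typically produces many local optima, which is what makes these landscapes hard for greedy or ML-guided search.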

Illustration: get your research the attention it deserves. Three scientific artists explain how to create impact with attractive visuals. Nature. Andy Tay. September 24 2020.

Understanding the origins of loss of protein function by analyzing the effects of thousands of variants on activity and abundance. Matteo Cagiada, Kristoffer E. Johansson, Audronė Valančiūtė, Sofie V. Nielsen, Rasmus Hartmann-Petersen, Jun J. Yang, Douglas M. Fowler, Amelie Stein, Kresten Lindorff-Larsen. bioRxiv 2020.09.28.317040; doi:

Definitely one to read and likely to implement. Pavlovicz, R. E., Park, H., & DiMaio, F. (2020). Efficient consideration of coordinated water molecules improves computational protein-protein and protein-ligand docking discrimination. PLOS Computational Biology, 16(9), e1008103.

Broom, A., Rakotoharisoa, R.V., Thompson, M.C. et al. Ensemble-based enzyme design can recapitulate the effects of laboratory directed evolution in silico. Nat Commun 11, 4808 (2020).

A MUST READ. Heckmann, C., & Paradisi, F. Looking back: A short history of the discovery of enzymes and how they became powerful chemical tools. ChemCatChem

An interesting discussion on picking colors. It likely holds for protein images and most other figures as well. A great general resource, particularly the end, where it discusses drawing inspiration from color combinations found in nature.

Protein graph analysis. Looks like a really interesting package that might be accessible and easy to use for obtaining interesting metrics on proteins!

It’s a good story that may cheer you up, especially when your experiments fail. They created a new tool to edit mitochondrial DNA, but the serendipity started with a failure: Most deaminases target single strands of DNA, or RNA, which is naturally single-stranded. This deaminase was odd – it didn’t appear to work on either. For months, de Moraes unsuccessfully tested the protein. Then one night, alone in the lab, he decided to try it out on something he didn’t expect to work: double-stranded DNA.

Russ, W. P., Figliuzzi, M., Stocker, C., Barrat-Charlaix, P., Socolich, M., Kast, P., ... & Ranganathan, R. (2020). Evolution-based design of chorismate mutase enzymes. bioRxiv

Empirical analysis tells Reviewer 2: “Go F’ Yourself”. Ars Technica. John Timmer, 6/27/2020.

St. John, P.C., Guan, Y., Kim, Y. et al. Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat Commun 11, 2328 (2020).

Dunham, A., & Beltrao, P. (2020). Exploring amino acid functions in a deep mutational landscape. bioRxiv

Paar, M., Schrabmair, W., Mairold, M., Oettl, K., & Reibnegger, G. (2019). Global Regression Using the Explicit Solution of Michaelis‐Menten Kinetics Employing Lambert's W Function: High Robustness of Parameter Estimates. ChemistrySelect, 4(6), 1903-1908.
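The explicit solution named in this title expresses substrate depletion under Michaelis-Menten kinetics through the Lambert W function. Below is a minimal Python sketch of that forward model. It assumes the standard integrated rate law Km·ln(S0/S) + (S0 - S) = Vmax·t, with no product inhibition or enzyme inactivation, and is not the authors’ regression code. The helper names are hypothetical. W is solved in log space with a few Newton steps so that exp() does not overflow when S0/Km is large; `scipy.special.lambertw` is an alternative for moderate arguments.

```python
import math


def lambert_w_of_exp(y, tol=1e-12, max_iter=100):
    """Return W(exp(y)) on the principal branch without forming exp(y).

    Solves e**u + u = y for u = ln W(exp(y)) with Newton's method. The
    function is convex and increasing in u, and both starting guesses lie
    at or to the right of the root, so the iterates decrease monotonically.
    """
    u = math.log(y) if y > 1.0 else y
    for _ in range(max_iter):
        step = (math.exp(u) + u - y) / (math.exp(u) + 1.0)
        u -= step
        if abs(step) < tol:
            break
    return math.exp(u)


def substrate_remaining(t, s0, km, vmax):
    """Substrate concentration at time t from the closed-form MM solution.

    Solves Km*ln(S0/S) + (S0 - S) = Vmax*t for S, i.e.
    S(t) = Km * W((S0/Km) * exp((S0 - Vmax*t)/Km)).
    Units only need to be consistent (e.g. mM, mM, mM/min, min).
    """
    # Logarithm of the Lambert W argument, kept in log space on purpose.
    y = math.log(s0 / km) + (s0 - vmax * t) / km
    return km * lambert_w_of_exp(y)
```

At t = 0 this returns S0, and for any later time the result satisfies the integrated rate law above. To fit progress curves globally, one would vectorize it (e.g. with `numpy.vectorize`) and pass it to a least-squares routine such as `scipy.optimize.curve_fit`, with Km and Vmax as the fitted parameters.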

Hon, J., Borko, S., Stourac, J., Prokop, Z., Zendulka, J., Bednar, D., ... & Damborsky, J. (2020). EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. Nucleic Acids Research

Perkel, J. M. (2020). The software that powers scientific illustration. Nature.

Machine learning for protein engineering (and small molecules). Jennifer Listgarten. EECS & Center for Computational Biology. Berkeley Artificial Intelligence Lab.

A protein solubility prediction tool.

Koehler Leman J, Weitzner BD, Renfrew PD, Lewis SM, Moretti R, Watkins AM, et al. (2020) Better together: Elements of successful scientific software development in a distributed collaborative community. PLoS Comput Biol 16(5): e1007507.

Taujale, R., Venkat, A., Huang, L. C., Zhou, Z., Yeung, W., Rasheed, K. M., ... & Kannan, N. (2020). Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases. eLife, 9, e54532. DOI: 10.7554/eLife.54532

Callaway, E. (2020). The race for coronavirus vaccines: a graphical guide. Nature, 580(7805), 576.

Computational protein design using geometric deep learning. Michael Bronstein.

Le, K.; Adolf-Bryfogle, J.; Klima, J.; Lyskov, S.; Labonte, J.; Bertolani, S.; Roy Burman, S.; Leaver-Fay, A.; Weitzner, B.; Maguire, J.; Rangan, R.; Adrianowycz, M.; Alford, R.; Adal, A.; Nance, M.; Das, R.; Dunbrack, R.; Schief, W.; Kuhlman, B.; Siegel, J.; Gray, J. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Preprints 2020, 2020020097 (doi: 10.20944/preprints202002.0097.v1)

Lin, L., Kightlinger, W., Prabhu, S. K., Hockenberry, A. J., Li, C., Wang, L. X., ... & Mrksich, M. (2020). Sequential Glycosylation of Proteins with Substrate-Specific N-Glycosyltransferases. ACS Central Science, 6(2), 144-154.

Macromolecular modeling and design in Rosetta: new methods and frameworks.

Protein Sequence Design with a Learned Potential.

Virtanen, P., Gommers, R., Oliphant, T.E. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020).

High throughput quantum chemistry for molecular design.

Torrisi, M., Pollastri, G., & Le, Q. (2020). Deep learning methods in protein structure prediction. Computational and Structural Biotechnology Journal

AlQuraishi, M. (2020). A watershed moment for protein structure prediction.

Cunningham, J.M., Koytiger, G., Sorger, P.K. et al. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat Methods 17, 175–183 (2020).