Readings Archive

This page serves as an archive for salient readings distributed to our group. New lab members are encouraged to explore the papers here; most concern enzyme engineering and protein design, but some extend to broader topics such as best lab practices. They are organized only by the date they were distributed to the group (most recent at the top). Happy reading!


The imperative of physics-based modeling and inverse theory in computational science | Nature Computational Science.

The confidence gap predicts the gender pay gap among STEM graduates.

Machine Learning in a Molecular Modeling Course for Chemistry, Biochemistry, and Biophysics Students.

Deep Learning: CS 182 Spring 2021 - YouTube.

Predict b-factor/seq conserved sites. Has several features that might be useful for designs. And if you don't have Geneious, this will give you sequence conservation of AA positions.

Important Rosetta Ligand Update. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein–Ligand Docking.

Making Scientific Figures with Illustrator - YouTube. Looks like a great set of videos for making professional quality figures!

The software behind SWISS-MODEL was FINALLY released!!! AlphaFold definitely is shaking things up. ProMod3—A versatile homology modelling toolbox.

RetroBioCat as a computer-aided synthesis planning tool for biocatalytic reactions and cascades | Nature Catalysis. Looks like a really interesting and useful tool! Many companies have been working on this, and this is the first tool like it I’ve seen that’s open for academic groups to use.

Another useful ML resource! Again, the goal is not to become a deep expert here but to be knowledgeable enough to use ML to complement our deep expertise in structure and enzyme function.

Good thread on getting started with ML.



Synthetic biology 2020–2030: six commercially-available products that are changing our world | Nature Communications. Good read!

A worthwhile read.

A nice single site for accessing their suite of tools!

Fantastic blog post on AlphaFold Results.

Another great paper from Jiri!   Fantastic list of enzyme design tools.

This week the big development in protein structure prediction is really exciting! Here are a couple of articles and resources that summarize AlphaFold's leap forward:

And here is a set of videos that you might be interested in or want to share with others! 

  • While this is DeepMind's PR video, it's a fun watch with a really nice graph at ~7:30
  • Deeper dive into AlphaFold's tech. You might want to skip to ~45 min in this one, but the middle has a good description of AlphaFold 1; the first part is basic biochem 101. The last part is the first discussion of a potential AlphaFold 2.0
  • And this one is shorter but good as well! 
  • A quick video on the basics of DeepMind's AlphaFold 2 breakthrough. Here are the timestamps: 0:00 - What happened? 1:03 - How big is this accomplishment? 4:39 - Proteins and amino acids 5:17 - Protein folding 8:26 - How AlphaFold 1 works 9:45 - How AlphaFold 2 works 12:09 - Why is this breakthrough important? 13:19 - Long-term future impact


Looks like a super interesting and useful tool, nice find Peishan! Coupled with today's DeepMind AI protein folding news, it could lead to a new workflow for the lab.

Interesting Twitter discussion on what software people use for making figures.

Looks like a really exciting resource of curated stability data (with lots of bagel data in there!!!) FireProtDB: database of manually curated protein stability data. Nucleic Acids Research, gkaa981,

Great thread with lots of tools of potential use/interest... especially with iTOL no longer being free.

GitHub - samsinai/FLEXS: Fitness landscape exploration sandbox for biological sequence design.

Ten simple rules to colorize biological data visualization. PLOS Computational Biology.

    Looks like a potentially great source for learning machine learning.

    A really well-written and thought-provoking paper. Strongly suggested reading, even if just for the intro and good historical perspective. The NK Landscape as a Versatile Benchmark for Machine Learning Driven Protein Engineering.

    Illustration: get your research the attention it deserves. Three scientific artists explain how to create impact with attractive visuals. Nature. Andy Tay. September 24 2020.

    Understanding the origins of loss of protein function by analyzing the effects of thousands of variants on activity and abundance. Matteo Cagiada, Kristoffer E. Johansson, Audronė Valančiūtė, Sofie V. Nielsen, Rasmus Hartmann-Petersen, Jun J. Yang, Douglas M. Fowler, Amelie Stein, Kresten Lindorff-Larsen. bioRxiv 2020.09.28.317040; doi:

    Definitely one to read and likely implement. Pavlovicz, R. E., Park, H., & DiMaio, F. (2020). Efficient consideration of coordinated water molecules improves computational protein-protein and protein-ligand docking discrimination. PLOS Computational Biology, 16(9), e1008103.

    Broom, A., Rakotoharisoa, R.V., Thompson, M.C. et al. Ensemble-based enzyme design can recapitulate the effects of laboratory directed evolution in silico. Nat Commun 11, 4808 (2020).

    A MUST READ. Heckmann, C., & Paradisi, F. Looking back: A short history of the discovery of enzymes and how they became powerful chemical tools. ChemCatChem

    Interesting discussion on picking colors. Likely holds for protein images and most other figures as well. A great general resource, particularly the end, where it discusses taking inspiration from color combos found in nature.

    Protein graph analysis. Looks like a really interesting package that might be accessible and easy to use for obtaining interesting metrics on proteins!

    It's a good story that may cheer you up, especially when your experiments fail. They created a new tool to edit mitochondrial DNA, but the serendipity started from failure: Most deaminases target single strands of DNA, or RNA, which is naturally single-stranded. This deaminase was odd – it didn’t appear to work on either. For months, de Moraes unsuccessfully tested the protein. Then one night, alone in the lab, he decided to try it out on something he didn’t expect to work: double-stranded DNA.

    Russ, W. P., Figliuzzi, M., Stocker, C., Barrat-Charlaix, P., Socolich, M., Kast, P., ... & Ranganathan, R. (2020). Evolution-based design of chorismate mutase enzymes. bioRxiv

    Empirical analysis tells Reviewer 2: “Go F’ Yourself”. Ars Technica. John Timmer 6/27/2020.

    St. John, P.C., Guan, Y., Kim, Y. et al. Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat Commun 11, 2328 (2020).

    Dunham, A., & Beltrao, P. (2020). Exploring amino acid functions in a deep mutational landscape. BioRxiv

    Paar, M., Schrabmair, W., Mairold, M., Oettl, K., & Reibnegger, G. (2019). Global Regression Using the Explicit Solution of Michaelis‐Menten Kinetics Employing Lambert's W Function: High Robustness of Parameter Estimates. ChemistrySelect, 4(6), 1903-1908.

    Hon, J., Borko, S., Stourac, J., Prokop, Z., Zendulka, J., Bednar, D., ... & Damborsky, J. (2020). EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. Nucleic Acids Research

    Perkel, J. M. (2020). The software that powers scientific illustration. Nature.

    Machine learning for protein engineering (and small molecules). Jennifer Listgarten. EECS & Center for Computational Biology. Berkeley Artificial Intelligence Lab.

    A protein solubility prediction tool.

    Koehler Leman J, Weitzner BD, Renfrew PD, Lewis SM, Moretti R, Watkins AM, et al. (2020) Better together: Elements of successful scientific software development in a distributed collaborative community. PLoS Comput Biol 16(5): e1007507.

    Taujale, R., Venkat, A., Huang, L. C., Zhou, Z., Yeung, W., Rasheed, K. M., ... & Kannan, N. (2020). Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases. eLife, 9, e54532. DOI: 10.7554/eLife.54532

    Callaway, E. (2020). The race for coronavirus vaccines: a graphical guide. Nature, 580(7805), 576.

    Computational protein design using geometric deep learning. Michael Bronstein.

    Le, K.; Adolf-Bryfogle, J.; Klima, J.; Lyskov, S.; Labonte, J.; Bertolani, S.; Roy Burman, S.; Leaver-Fay, A.; Weitzner, B.; Maguire, J.; Rangan, R.; Adrianowycz, M.; Alford, R.; Adal, A.; Nance, M.; Das, R.; Dunbrack, R.; Schief, W.; Kuhlman, B.; Siegel, J.; Gray, J. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Preprints 2020, 2020020097 (doi: 10.20944/preprints202002.0097.v1)

    Lin, L., Kightlinger, W., Prabhu, S. K., Hockenberry, A. J., Li, C., Wang, L. X., ... & Mrksich, M. (2020). Sequential Glycosylation of Proteins with Substrate-Specific N-Glycosyltransferases. ACS Central Science, 6(2), 144-154.

    "Macromolecular modeling and design in Rosetta: new methods and frameworks"

    "Protein Sequence Design with a learned Potential"

    Virtanen, P., Gommers, R., Oliphant, T.E. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020).

    High throughput quantum chemistry for molecular design.

    Torrisi, M., Pollastri, G., & Le, Q. (2020). Deep learning methods in protein structure prediction. Computational and Structural Biotechnology Journal

    AlQuraishi, M. (2020). A watershed moment for protein structure prediction.

    Cunningham, J.M., Koytiger, G., Sorger, P.K. et al. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat Methods 17, 175–183 (2020).