We ❤️ Open Source

A community education resource

You don’t need a biochemistry degree to analyze proteins

How pre-trained models democratize advanced research.

Protein research once took months or years of intensive lab work. Today, transformer-based models can predict protein structures and provide functional insights far more quickly. In this lightning talk at All Things AI, Tia Pope, a third-year PhD student at North Carolina A&T, shows how tools like ProtGPT2 and ESM are lowering the barrier to advanced protein analysis for anyone curious enough to try.

Tia frames protein research as a hidden war taking place inside our bodies. One side consists of proteins such as antibodies, enzymes, and hormones that build, repair, and protect us. The other side includes viral proteins that hijack cells, as well as misfolded or dysfunctional proteins that contribute to disease. COVID-19 made this dynamic more visible, as the virus relies on its spike protein to bind to human cells and begin infection.

Artificial intelligence has accelerated how we study this microscopic world. Tools like AlphaFold, including its newer versions, have dramatically improved the speed of protein structure prediction. Tasks that once required months or years of experimental work can now often be approximated computationally in hours or days. Experimental techniques such as X-ray crystallography are still necessary for confirmation, but AI has made structural insights far more accessible.

A major driver of this progress is the transformer architecture. Unlike older recurrent neural networks, which process sequences step by step, transformers analyze entire sequences at once using attention mechanisms. This approach has enabled powerful advances in protein modeling, including systems that learn from large datasets of amino acid sequences.
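To make "analyzing the entire sequence at once" concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer, written in plain Python. The function names and the tiny two-dimensional vectors are illustrative assumptions, not taken from the talk or from any particular protein model. Real models use learned projections, many attention heads, and much larger dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each query is compared against every key in one pass, so every
    position can draw on every other position, with no step-by-step
    recurrence. Returns (outputs, weights).
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this position to every position in the sequence.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values))
             for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical 2-d embeddings for a three-residue peptide "MKV".
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(embeddings, embeddings, embeddings)
```

Each row of `weights` shows how strongly one residue attends to every other residue, which is the information a recurrent network would have to carry forward one step at a time.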

Despite these advances, much of biology remains poorly understood. Many proteins in public databases have not been experimentally validated, and large portions of the so-called dark proteome have unknown or unclear functions. At the same time, evolving pathogens and increasing antibiotic resistance highlight the need for faster and more effective discovery methods.

In practice, ProtGPT2, which is available through Hugging Face, can generate novel protein sequences. These generated sequences can then be analyzed with models such as ESM, which provide predictions about structure and potential function. These predictions are probabilistic and require experimental validation before any real-world conclusions can be drawn.
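As a rough illustration of the generation step, the sketch below loads ProtGPT2 through the Hugging Face `transformers` text-generation pipeline. It is not the code from Tia's notebooks. The helper names `clean_sequence` and `generate_sequences` are hypothetical, and the sampling settings are assumptions modeled on the values suggested on the model's Hugging Face page, so treat them as starting points. Running it requires `transformers` and a backend such as PyTorch, and the first call downloads the model weights.

```python
def clean_sequence(generated_text):
    """Remove the special token and line breaks from ProtGPT2 output.

    The model emits FASTA-style text, so line breaks appear inside
    the sequence and need to be stripped before further analysis.
    """
    return (
        generated_text.replace("<|endoftext|>", "")
        .replace("\n", "")
        .strip()
    )


def generate_sequences(n=3, max_length=100):
    """Sample n candidate protein sequences from ProtGPT2.

    The import is deferred so this file can be loaded without
    transformers installed. Sampling values are assumptions based on
    the model card, not settings confirmed by the talk.
    """
    from transformers import pipeline

    generator = pipeline("text-generation", model="nferruz/ProtGPT2")
    outputs = generator(
        "<|endoftext|>",          # start token used as the prompt
        max_length=max_length,    # length in tokens, not amino acids
        do_sample=True,
        top_k=950,
        repetition_penalty=1.2,
        num_return_sequences=n,
        eos_token_id=0,
    )
    return [clean_sequence(o["generated_text"]) for o in outputs]
```

Calling `generate_sequences()` returns a list of plain amino acid strings. Because sampling is random, the output differs on every run, and none of it says anything about whether a sequence would fold or function in a lab.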

Key takeaways

  • Models such as AlphaFold and protein language models have significantly accelerated protein structure and sequence analysis. They complement rather than replace experimental methods.
  • Generative models such as ProtGPT2 can propose new protein sequences, while models such as ESM can help evaluate their plausibility.
  • Significant gaps remain in our understanding of protein function, especially within the dark proteome, which creates opportunities for further research.
  • These tools are increasingly accessible, but meaningful impact still depends on careful interpretation, validation, and domain expertise.
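One common way to gauge plausibility with a masked protein language model is pseudo-perplexity: mask each residue in turn, ask the model how likely the true residue is, and average the results. The sketch below assumes the small `facebook/esm2_t6_8M_UR50D` checkpoint and the `EsmForMaskedLM` class from `transformers`. The talk does not specify which ESM variant or scoring method it used, so this is one possible heuristic, and the function names are hypothetical.

```python
import math


def pseudo_perplexity(token_log_probs):
    """exp(-mean log-probability); lower means the model is less surprised."""
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def score_with_esm(sequence, model_name="facebook/esm2_t6_8M_UR50D"):
    """Score one amino acid sequence with an ESM-2 masked language model.

    Imports are deferred so the helper above works without torch or
    transformers installed. The checkpoint name is an assumption.
    """
    import torch
    from transformers import AutoTokenizer, EsmForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmForMaskedLM.from_pretrained(model_name)
    model.eval()

    input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"]
    log_probs = []
    with torch.no_grad():
        # Skip the first and last positions, which hold special tokens.
        for i in range(1, input_ids.shape[1] - 1):
            target = input_ids[0, i].item()
            masked = input_ids.clone()
            masked[0, i] = tokenizer.mask_token_id
            logits = model(input_ids=masked).logits
            position_log_probs = torch.log_softmax(logits[0, i], dim=-1)
            log_probs.append(position_log_probs[target].item())
    return pseudo_perplexity(log_probs)
```

A lower score only means the sequence looks more like the natural proteins the model was trained on. It is a filter for deciding what to examine further, not evidence of structure or function.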

Tia’s research builds on these models, and she provides notebooks for hands-on exploration. The barrier to entry is low and the potential impact is high. Curious minds are essential to test these models, separate signal from noise, and refine them so they can safely contribute to real-world treatments.

The opinions expressed on this website are those of each author, not of the author's employer or All Things Open/We Love Open Source.
