Scientific Calendar Event

Starts 14 May 2025 11:00
Ends 14 May 2025 11:00
Central European Time
Leonardo Building - Luigi Stasi Seminar Room

Edith Natalia Villegas Garcia
(AREA Science Park)
 
 
Abstract:
Protein Language Models are deep transformer networks that learn about protein structure and function by being trained to fill in missing sections of the amino acid sequences that make up proteins. The representations learned by these models can be used for several downstream tasks, ranging from predicting structure and function to predicting molecular interactions and mutational effects. However, because these models are black boxes, understanding how they achieve these capabilities remains a challenge. In this presentation, I will introduce Sparse Autoencoders, an interpretability technique for understanding the internal representations of Large Language Models, and show how they can be applied to Protein Language Models. Using this technique, we identify several components in the model that appear to be linked to different protein characteristics, including transmembrane regions, specific binding sites, and specialized motifs. We then focus on the components that recognize zinc finger motifs and show how they can be used to guide a protein language model to generate sequences containing these motifs.
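
For readers unfamiliar with the technique, the following is a minimal sketch of what a sparse autoencoder computes. It is not the speaker's implementation. The layer sizes, the L1 coefficient, the random weights, and the random array standing in for per-residue model activations are all assumptions made for illustration. The `steer` helper shows one common way to use a learned feature for guidance (adding its decoder direction to the activations), which may differ from the approach presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the hidden layer is wider than the input (overcomplete).
d_model, d_hidden = 16, 64

# Randomly initialised parameters; a trained sparse autoencoder would learn these.
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map activations (n, d_model) to non-negative features (n, d_hidden)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def decode(f):
    """Reconstruct activations (n, d_model) from features (n, d_hidden)."""
    return f @ W_dec + b_dec


def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    f = encode(x)
    x_hat = decode(f)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity


def steer(x, feature_idx, strength):
    """Add a multiple of one feature's decoder direction to every activation."""
    return x + strength * W_dec[feature_idx]


# Random stand-in for the activations of 8 residues (assumption, not real data).
x = rng.normal(size=(8, d_model))
loss = sae_loss(x)
x_steered = steer(x, feature_idx=3, strength=2.0)
```

In this sketch each hidden unit plays the role of a "component" in the abstract: after training, one would inspect which sequence positions activate a given unit, and steering shifts the activations along that unit's decoder direction before the model continues its computation.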