mention companion spaces
app.py
CHANGED
@@ -30,7 +30,9 @@ with gr.Blocks(title="ESM2 Protein Embeddings") as demo:
 # ESM2 for candidate sequence filtering 🤖
 
 Once one has generated de novo protein sequences using a tool like LigandMPNN, one must rank them to select promising candidates for experimental validation. One powerful approach is to use <a href="https://www.science.org/doi/10.1126/science.ade2574" target="_blank">protein language models like Meta's ESM2.</a>
-These language models rely on a BERT-like architecture and a Masked Language Modeling (MLM) objective to learn rich representations of protein sequences.
+These language models rely on a BERT-like architecture and a Masked Language Modeling (MLM) objective to learn rich representations of protein sequences. Note that this Space pairs well with the companion <a href="https://huggingface.co/spaces/hugging-science/RFdiffusion3" target="_blank">RFdiffusion3</a>, <a href="https://huggingface.co/spaces/hugging-science/LigandMPNN" target="_blank">LigandMPNN</a> and RosettaFold3 Spaces for a full de novo design pipeline!
+
+ESM is used for two main purposes:
 1. **Generating embeddings**: ESM's hidden layers create high-dimensional representations of protein sequences that capture structural and functional information.
 These embeddings can be used as input features for downstream machine learning models to predict function, properties, or even folding.
 Embeddings can also be used with dimensionality reduction techniques like t-SNE to visualize them and identify clusters or compare against known proteins.
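The embedding generation described in the diff can be sketched with the `transformers` library's ESM2 support. This is a minimal illustration, not the Space's actual code; it uses the smallest public checkpoint, `facebook/esm2_t6_8M_UR50D`, and a made-up example sequence, with mean pooling over residue positions as one common way to get a single vector per sequence.

```python
import torch
from transformers import AutoTokenizer, EsmModel

# Smallest ESM2 checkpoint (hidden size 320); larger ones give richer embeddings
model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = EsmModel.from_pretrained(model_name)
model.eval()

# Hypothetical candidate sequence from a design tool like LigandMPNN
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (batch, seq_len + 2, hidden); slice off CLS/EOS tokens,
# then mean-pool over residues to get one fixed-size vector per sequence
embedding = outputs.last_hidden_state[0, 1:-1].mean(dim=0)
print(embedding.shape)  # torch.Size([320])
```

The pooled vector can then feed a downstream ranking or property-prediction model, as the text suggests.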
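The t-SNE visualization step mentioned above can be sketched as follows. The cluster data here is synthetic, standing in for per-sequence ESM2 embeddings, so the example stays self-contained; in practice one would stack the pooled vectors from real candidate sequences.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-ins for 320-dim ESM2 sequence embeddings, forming two clusters
cluster_a = rng.normal(0.0, 1.0, size=(20, 320))
cluster_b = rng.normal(5.0, 1.0, size=(20, 320))
embeddings = np.vstack([cluster_a, cluster_b])

# Reduce to 2D for plotting; perplexity must be smaller than the sample count
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(coords.shape)  # (40, 2)
```

Plotting `coords` (e.g. with matplotlib) then reveals whether de novo candidates cluster with known proteins of the desired function.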