Update README.md
README.md
CHANGED
@@ -43,7 +43,7 @@ model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_small', trust
 ```

 ## Embed entire datasets with no new code
-To embed a list of protein sequences **fast**, just call embed_dataset. Sequences are sorted to reduce padding tokens, so the progress bar is usually much longer than the actual time.
+To embed a list of protein sequences **fast**, just call embed_dataset. Sequences are sorted to reduce padding tokens, so the initial progress bar estimation is usually much longer than the actual time.
 ```python
 embeddings = model.embed_dataset(
     sequences=sequences, # list of protein strings
@@ -92,7 +92,7 @@ The plot below showcases performance normalized between the negative control (ra
 ## Inference speeds
 We look at various ESM models and their throughput on an H100. Adding efficient batching between ESMC and ESM++ significantly improves the throughput. ESM++ small is even faster than ESM2-35M with long sequences!
 The most gains will be seen with PyTorch > 2.5 on linux machines.
-
+

 ### Citation
 If you use any of this implementation or work please cite it (as well as the ESMC preprint). Bibtex for both coming soon.
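
The sentence changed in the first hunk says that sequences are sorted to reduce padding tokens, and that the initial progress bar estimate therefore runs long. The sketch below illustrates that reasoning with plain Python. It is not the `embed_dataset` implementation: the helper names (`make_batches`, `padding_tokens`, `padded_cost`), the toy sequences, and the longest-first sort order are all assumptions made for illustration.

```python
from typing import List


def make_batches(sequences: List[str], batch_size: int, sort_by_length: bool) -> List[List[str]]:
    """Split sequences into fixed-size batches, optionally sorting longest-first.

    Longest-first ordering is an assumption here, not a confirmed detail of the library.
    """
    ordered = sorted(sequences, key=len, reverse=True) if sort_by_length else list(sequences)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def padding_tokens(batches: List[List[str]]) -> int:
    """Count pad positions when every batch is padded to its own longest sequence."""
    total = 0
    for batch in batches:
        longest = max(len(seq) for seq in batch)
        total += sum(longest - len(seq) for seq in batch)
    return total


def padded_cost(batches: List[List[str]]) -> List[int]:
    """Padded token count per batch, a rough proxy for per-batch compute."""
    return [len(batch) * max(len(seq) for seq in batch) for batch in batches]


# Hypothetical toy data: a mix of short and long "protein" strings.
sequences = ["M" * n for n in (10, 300, 12, 280, 11, 310, 9, 290)]

unsorted_batches = make_batches(sequences, batch_size=4, sort_by_length=False)
sorted_batches = make_batches(sequences, batch_size=4, sort_by_length=True)

unsorted_pad = padding_tokens(unsorted_batches)
sorted_pad = padding_tokens(sorted_batches)
sorted_costs = padded_cost(sorted_batches)

print("padding without sorting:", unsorted_pad)
print("padding with sorting:", sorted_pad)
print("padded tokens per sorted batch:", sorted_costs)
```

On this toy data, sorting cuts padding from 1218 positions to 66. The first sorted batch holds 1240 padded tokens and the last holds 48, so a time estimate extrapolated from the first batches overshoots the real total, which matches the behavior the updated sentence describes.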