Upload README.md with huggingface_hub
README.md
@@ -73,55 +73,74 @@ print(output.logits.shape)
print(output.last_hidden_state.shape)
```

Pass `output_hidden_states=True` if you need all intermediate hidden states.

## Experimental Test-Time Training

TTT is disabled by default. Normal ESM++ inference, embeddings, logits, and
`state_dict()` keys are unchanged unless you explicitly call `model.ttt(...)`.
The current implementation is experimental and trains only local LoRA adapters
on the ESMC backbone with masked language modeling on the test protein. It can
help some difficult proteins, but it adds test-time compute and can degrade
already confident predictions. The 6B checkpoint is large, so start with small
`steps`, `ags`, and `batch_size` values.

```python
metrics = model.ttt(
    seq="MSTNPKPQRKTKRNT",
    ttt_config={"steps": 1, "ags": 1, "batch_size": 1},
)
model.ttt_reset()
print(metrics["losses"])
```
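
The TTT objective described in this section is masked language modeling on the test protein itself. As a rough illustration of what a single masked example looks like, the sketch below hides a few residues of a sequence and records the originals as labels. It is not the FastPLMs implementation: `mask_sequence`, the 15% `mask_rate`, and the `<mask>` token string are assumptions chosen for illustration.

```python
import random


def mask_sequence(seq, mask_rate=0.15, mask_token="<mask>", seed=0):
    """Build one masked-LM example from a non-empty protein sequence.

    Returns (tokens, labels):
      tokens: list of residues with some replaced by mask_token
      labels: dict mapping each masked position to its original residue
    """
    rng = random.Random(seed)
    # Always mask at least one residue so the example carries a training signal.
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    tokens = list(seq)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        tokens[pos] = mask_token
    return tokens, labels


tokens, labels = mask_sequence("MSTNPKPQRKTKRNT")
print(tokens)
print(labels)
```

A model trained on such examples is asked to predict the residues in `labels` from the unmasked context, which is why the adaptation needs only the test sequence and no extra annotations.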

## Binder Design Regularizer

The FastPLMs binder design tutorial uses `Synthyra/ESMplusplus_6B` as the
ESMC-style masked-LM regularizer while FastPLMs ESMFold2 experimental models
provide differentiable folding losses and final critics. The script lives at
`cookbook/tutorials/binder_design_fastplms.py` and supports local CUDA Docker
runs plus Modal deployment.

Run the verified EGFR 128-amino-acid de novo minibinder example:

```bash
cd /home/ubuntu/FastPLMs

sudo -n docker run --gpus all --rm \
  -v /home/ubuntu/FastPLMs:/app \
  -v /home/ubuntu/FastPLMs:/workspace \
  -v /home/ubuntu/.cache/huggingface:/workspace/.cache/huggingface \
  -w /workspace fastplms-esmfold2 \
  python /app/cookbook/tutorials/binder_design_fastplms.py \
    --backend local \
    --target-name egfr \
    --binder-sequence '################################################################################################################################' \
    --not-antibody \
    --steps 150 \
    --batch-size 1 \
    --seed 103 \
    --output-dir /workspace/campaign_egfr_len128_b1_s150_seed103_consensus_cli
```
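
Typing 128 `#` characters by hand is error-prone. Assuming each `#` marks one free position to be designed (an inference from the 128-residue example, not something the text states), the placeholder can be generated programmatically. The sketch below only assembles the script arguments that appear in the command above; `build_binder_args` is a hypothetical helper, and nothing here launches Docker.

```python
import shlex


def build_binder_args(length, target_name, steps, batch_size, seed, output_dir):
    """Assemble arguments for binder_design_fastplms.py.

    Only flags that appear in the documented command are used.
    """
    return [
        "python", "/app/cookbook/tutorials/binder_design_fastplms.py",
        "--backend", "local",
        "--target-name", target_name,
        "--binder-sequence", "#" * length,  # assumed: one '#' per designable position
        "--not-antibody",
        "--steps", str(steps),
        "--batch-size", str(batch_size),
        "--seed", str(seed),
        "--output-dir", output_dir,
    ]


args = build_binder_args(
    length=128,
    target_name="egfr",
    steps=150,
    batch_size=1,
    seed=103,
    output_dir="/workspace/campaign_egfr_len128_b1_s150_seed103_consensus_cli",
)
# shlex.join quotes the '#' run so the shell does not treat it as a comment.
print(shlex.join(args))
```

The printed string can be pasted after the `docker run ... fastplms-esmfold2 \` portion of the command.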

The run writes `trajectory.jsonl`, `best_sequences.fasta`, `results.parquet`,
`selection.parquet`, and per-critic PDB/CIF/logit files. The verified candidate
had hero mean iPTM `0.913870`, hero min iPTM `0.904600`, and all four ESMFold2
hero critics above `0.9`.

Binder sequence:

```text
SAVKHLLEIVKYLEEAIEKALEVDPVFLVPPAAEELLIAAKVIKELAKENPELIEVYELLMKAVKGLKKLVRSNDKEILREVIRLLRKAAKVIREILKNNPDLDPELRKALEELAKVLEEIAEVLEQQ
```
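
Before handing a designed sequence to downstream tools, a quick sanity check can confirm its length and that it uses only the 20 standard amino acid letters. This is a minimal sketch; `check_protein` is a hypothetical helper and not part of FastPLMs.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def check_protein(seq):
    """Return (length, sorted list of characters outside the 20 standard amino acids)."""
    unexpected = sorted(set(seq) - STANDARD_AA)
    return len(seq), unexpected


binder = "SAVKHLLEIVKYLEEAIEKALEVDPVFLVPPAAEELLIAAKVIKELAKENPELIEVYELLMKAVKGLKKLVRSNDKEILREVIRLLRKAAKVIREILKNNPDLDPELRKALEELAKVLEEIAEVLEQQ"
length, unexpected = check_protein(binder)
print(length, unexpected)
```

An empty `unexpected` list means no placeholder characters such as `#` or ambiguous codes such as `X` remain in the final design.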

See [`docs/binder_design.md`](https://github.com/Synthyra/FastPLMs/blob/main/docs/binder_design.md)
for the full strategy, Modal backend, official pI and selection scoring,
per-critic metrics, and caveats.

## Embed Datasets

All FastPLMs sequence models include `embed_dataset`, which handles batching, length sorting, pooling, FASTA parsing, optional resume from existing outputs, and `.pth` or SQLite storage.

```python
import torch