lhallee committed on
Commit d4458f5 · verified · 1 Parent(s): ccc8e40

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +68 -49
README.md CHANGED
@@ -73,55 +73,74 @@ print(output.logits.shape)
  print(output.last_hidden_state.shape)
  ```
 
- Pass `output_hidden_states=True` if you need all intermediate hidden states.
-
- ## Binder Design Regularizer
-
- The FastPLMs binder design tutorial uses `Synthyra/ESMplusplus_6B` as the
- ESMC-style masked-LM regularizer while FastPLMs ESMFold2 experimental models
- provide differentiable folding losses and final critics. The script lives at
- `cookbook/tutorials/binder_design_fastplms.py` and supports local CUDA Docker
- runs plus Modal deployment.
-
- Run the verified EGFR 128 amino acid de novo minibinder example:
-
- ```bash
- cd /home/ubuntu/FastPLMs
-
- sudo -n docker run --gpus all --rm \
- -v /home/ubuntu/FastPLMs:/app \
- -v /home/ubuntu/FastPLMs:/workspace \
- -v /home/ubuntu/.cache/huggingface:/workspace/.cache/huggingface \
- -w /workspace fastplms-esmfold2 \
- python /app/cookbook/tutorials/binder_design_fastplms.py \
- --backend local \
- --target-name egfr \
- --binder-sequence '################################################################################################################################' \
- --not-antibody \
- --steps 150 \
- --batch-size 1 \
- --seed 103 \
- --output-dir /workspace/campaign_egfr_len128_b1_s150_seed103_consensus_cli
- ```
-
- The run writes `trajectory.jsonl`, `best_sequences.fasta`, `results.parquet`,
- `selection.parquet`, and per-critic PDB/CIF/logit files. The verified candidate
- had hero mean iPTM `0.913870`, hero min iPTM `0.904600`, and all four ESMFold2
- hero critics above `0.9`.
-
- Binder sequence:
-
- ```text
- SAVKHLLEIVKYLEEAIEKALEVDPVFLVPPAAEELLIAAKVIKELAKENPELIEVYELLMKAVKGLKKLVRSNDKEILREVIRLLRKAAKVIREILKNNPDLDPELRKALEELAKVLEEIAEVLEQQ
- ```
-
- See [`docs/binder_design.md`](https://github.com/Synthyra/FastPLMs/blob/main/docs/binder_design.md)
- for the full strategy, Modal backend, official pI and selection scoring,
- per-critic metrics, and caveats.
-
- ## Embed Datasets
-
- All FastPLMs sequence models include `embed_dataset`, which handles batching, length sorting, pooling, FASTA parsing, optional resume from existing outputs, and `.pth` or SQLite storage.
 
  ```python
  import torch
 
  print(output.last_hidden_state.shape)
  ```
 
+ Pass `output_hidden_states=True` if you need all intermediate hidden states.
+
+ ## Experimental Test-Time Training
+
+ TTT is disabled by default. Normal ESM++ inference, embeddings, logits, and
+ `state_dict()` keys are unchanged unless you explicitly call `model.ttt(...)`.
+ The current implementation is experimental and trains only local LoRA adapters
+ on the ESMC backbone with masked language modeling on the test protein. It can
+ help some difficult proteins, but it adds test-time compute and can degrade
+ already confident predictions. The 6B checkpoint is large, so start with small
+ `steps`, `ags`, and `batch_size` values.
+
+ ```python
+ metrics = model.ttt(
+ seq="MSTNPKPQRKTKRNT",
+ ttt_config={"steps": 1, "ags": 1, "batch_size": 1},
+ )
+ model.ttt_reset()
+ print(metrics["losses"])
+ ```
+
+ ## Binder Design Regularizer
+
+ The FastPLMs binder design tutorial uses `Synthyra/ESMplusplus_6B` as the
+ ESMC-style masked-LM regularizer while FastPLMs ESMFold2 experimental models
+ provide differentiable folding losses and final critics. The script lives at
+ `cookbook/tutorials/binder_design_fastplms.py` and supports local CUDA Docker
+ runs plus Modal deployment.
+
+ Run the verified EGFR 128 amino acid de novo minibinder example:
+
+ ```bash
+ cd /home/ubuntu/FastPLMs
+
+ sudo -n docker run --gpus all --rm \
+ -v /home/ubuntu/FastPLMs:/app \
+ -v /home/ubuntu/FastPLMs:/workspace \
+ -v /home/ubuntu/.cache/huggingface:/workspace/.cache/huggingface \
+ -w /workspace fastplms-esmfold2 \
+ python /app/cookbook/tutorials/binder_design_fastplms.py \
+ --backend local \
+ --target-name egfr \
+ --binder-sequence '################################################################################################################################' \
+ --not-antibody \
+ --steps 150 \
+ --batch-size 1 \
+ --seed 103 \
+ --output-dir /workspace/campaign_egfr_len128_b1_s150_seed103_consensus_cli
+ ```
+
+ The run writes `trajectory.jsonl`, `best_sequences.fasta`, `results.parquet`,
+ `selection.parquet`, and per-critic PDB/CIF/logit files. The verified candidate
+ had hero mean iPTM `0.913870`, hero min iPTM `0.904600`, and all four ESMFold2
+ hero critics above `0.9`.
+
+ Binder sequence:
+
+ ```text
+ SAVKHLLEIVKYLEEAIEKALEVDPVFLVPPAAEELLIAAKVIKELAKENPELIEVYELLMKAVKGLKKLVRSNDKEILREVIRLLRKAAKVIREILKNNPDLDPELRKALEELAKVLEEIAEVLEQQ
+ ```
+
+ See [`docs/binder_design.md`](https://github.com/Synthyra/FastPLMs/blob/main/docs/binder_design.md)
+ for the full strategy, Modal backend, official pI and selection scoring,
+ per-critic metrics, and caveats.
+
+ ## Embed Datasets
+
+ All FastPLMs sequence models include `embed_dataset`, which handles batching, length sorting, pooling, FASTA parsing, optional resume from existing outputs, and `.pth` or SQLite storage.
 
  ```python
  import torch