lhallee committed on
Commit
5ba87b9
·
verified ·
1 Parent(s): bb43645

Upload README.md with huggingface_hub

  1. README.md +56 -38
README.md CHANGED
@@ -31,44 +31,44 @@ config = AutoConfig.from_pretrained('Synthyra/ESMplusplus_large', trust_remote_c
31
  config.attn_backend = "flex" # or "kernels_flash", "sdpa", "auto"
32
  model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_large', config=config, trust_remote_code=True)
33
  ```
34
-
35
- `torch.compile(model)` is heavily recommended for sustained throughput, especially with Flex Attention.
36
-
37
- ## Binder Design Regularizer
38
-
39
- The FastPLMs binder design tutorial uses the ESM++ model family as the
40
- masked-LM pseudoperplexity regularizer while FastPLMs ESMFold2 experimental
41
- models provide differentiable folding losses and final critics. The verified
42
- EGFR example defaults to `Synthyra/ESMplusplus_6B`; this 600M checkpoint exposes
43
- the same `AutoModelForMaskedLM` API and can be used as a lower-memory
44
- regularizer by editing `FastPLMsBinderDesign.lm_name` in
45
- `cookbook/tutorials/binder_design_fastplms.py`.
46
-
47
- Default verified run:
48
-
49
- ```bash
50
- python cookbook/tutorials/binder_design_fastplms.py \
51
- --backend local \
52
- --target-name egfr \
53
- --binder-sequence '################################################################################################################################' \
54
- --not-antibody \
55
- --steps 150 \
56
- --batch-size 1 \
57
- --seed 103 \
58
- --output-dir binder_design_egfr_len128_seed103
59
- ```
60
-
61
- The verified 6B-regularized result had hero mean iPTM `0.913870`, hero min iPTM
62
- `0.904600`, and all four ESMFold2 hero critics above `0.9`.
63
-
64
- See [`docs/binder_design.md`](https://github.com/Synthyra/FastPLMs/blob/main/docs/binder_design.md)
65
- for the complete workflow, output files, metrics, and Modal/local compute
66
- options.
67
-
68
- ## Use with Hugging Face Transformers
69
- ```python
70
- from transformers import AutoModelForMaskedLM
71
- model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_large', trust_remote_code=True)
72
  tokenizer = model.tokenizer
73
 
74
  sequences = ['MPRTEIN', 'MSEQWENCE']
@@ -99,6 +99,24 @@ import torch
99
  model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_large', trust_remote_code=True, dtype=torch.float16) # or torch.bfloat16
100
  ```
101
 
102
  ## Embed entire datasets with no new code
103
  To embed a list of protein sequences **fast**, just call embed_dataset. Sequences are sorted to reduce padding tokens, so the initial progress bar estimation is usually much longer than the actual time it will take.
104
 
 
31
  config.attn_backend = "flex" # or "kernels_flash", "sdpa", "auto"
32
  model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_large', config=config, trust_remote_code=True)
33
  ```
34
+
35
+ `torch.compile(model)` is heavily recommended for sustained throughput, especially with Flex Attention.
36
+
37
+ ## Binder Design Regularizer
38
+
39
+ The FastPLMs binder design tutorial uses the ESM++ model family as the
40
+ masked-LM pseudoperplexity regularizer while FastPLMs ESMFold2 experimental
41
+ models provide differentiable folding losses and final critics. The verified
42
+ EGFR example defaults to `Synthyra/ESMplusplus_6B`; this 600M checkpoint exposes
43
+ the same `AutoModelForMaskedLM` API and can be used as a lower-memory
44
+ regularizer by editing `FastPLMsBinderDesign.lm_name` in
45
+ `cookbook/tutorials/binder_design_fastplms.py`.
46
+
47
+ Default verified run:
48
+
49
+ ```bash
50
+ python cookbook/tutorials/binder_design_fastplms.py \
51
+ --backend local \
52
+ --target-name egfr \
53
+ --binder-sequence '################################################################################################################################' \
54
+ --not-antibody \
55
+ --steps 150 \
56
+ --batch-size 1 \
57
+ --seed 103 \
58
+ --output-dir binder_design_egfr_len128_seed103
59
+ ```
60
+
61
+ The verified 6B-regularized result had hero mean iPTM `0.913870`, hero min iPTM
62
+ `0.904600`, and all four ESMFold2 hero critics above `0.9`.
63
+
64
+ See [`docs/binder_design.md`](https://github.com/Synthyra/FastPLMs/blob/main/docs/binder_design.md)
65
+ for the complete workflow, output files, metrics, and Modal/local compute
66
+ options.
67
+
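As a rough illustration of the masked-LM pseudoperplexity mentioned in the Binder Design Regularizer section, the sketch below masks one residue at a time and averages the negative log-probability of the true residue. It is a minimal sketch under stated assumptions, not the tutorial's loss: `pseudo_perplexity`, `uniform_log_prob`, and the `MASK` placeholder are hypothetical names, and the toy scorer stands in for a real ESM++ forward pass.

```python
import math
from typing import Callable, Sequence

MASK = "<mask>"  # hypothetical placeholder symbol for this toy example


def pseudo_perplexity(
    sequence: str,
    log_prob_fn: Callable[[Sequence[str], int, str], float],
) -> float:
    """Return exp of the mean negative log-probability of each residue
    when that residue alone is masked."""
    tokens = list(sequence)
    total = 0.0
    for i, true_residue in enumerate(tokens):
        # Mask position i only, keep the rest of the sequence visible.
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        total += log_prob_fn(masked, i, true_residue)
    return math.exp(-total / len(tokens))


def uniform_log_prob(masked_tokens: Sequence[str], position: int, residue: str) -> float:
    """Toy scorer (assumption): all 20 canonical amino acids are equally likely."""
    return math.log(1.0 / 20.0)


pppl = pseudo_perplexity("MPRTEIN", uniform_log_prob)
print(pppl)
```

With a real model, `log_prob_fn` would return the log-softmax of the masked-LM logits at the masked position for the true residue; a lower value means the model finds the sequence more protein-like.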
68
+ ## Use with Hugging Face Transformers
69
+ ```python
70
+ from transformers import AutoModelForMaskedLM
71
+ model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_large', trust_remote_code=True)
72
  tokenizer = model.tokenizer
73
 
74
  sequences = ['MPRTEIN', 'MSEQWENCE']
 
99
  model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_large', trust_remote_code=True, dtype=torch.float16) # or torch.bfloat16
100
  ```
101
 
102
+ ## Experimental test-time training
103
+
104
+ TTT is disabled by default. Normal ESM++ inference, embeddings, logits, and
105
+ `state_dict()` keys are unchanged unless you explicitly call `model.ttt(...)`.
106
+ The current implementation is experimental and trains only local LoRA adapters
107
+ on the ESMC backbone with masked language modeling on the test protein. It can
108
+ help some difficult proteins, but it adds test-time compute and can degrade
109
+ already confident predictions.
110
+
111
+ ```python
112
+ metrics = model.ttt(
113
+ seq="MSTNPKPQRKTKRNT",
114
+ ttt_config={"steps": 3, "ags": 1, "batch_size": 1},
115
+ )
116
+ model.ttt_reset()
117
+ print(metrics["losses"])
118
+ ```
119
+
120
  ## Embed entire datasets with no new code
121
  To embed a list of protein sequences **fast**, just call embed_dataset. Sequences are sorted to reduce padding tokens, so the initial progress bar estimation is usually much longer than the actual time it will take.
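To illustrate why sorting by length reduces padding, here is a small sketch that counts padding tokens when each batch is padded to its longest sequence. It is not the `embed_dataset` implementation: `padding_tokens` is a hypothetical helper, the lengths are made up, and the longest-first order is an assumption, chosen because it would explain why early progress-bar estimates run long.

```python
from typing import List


def padding_tokens(lengths: List[int], batch_size: int) -> int:
    """Count padding tokens when every batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


# Made-up sequence lengths, for illustration only.
lengths = [7, 300, 9, 280, 8, 310]

unsorted_pad = padding_tokens(lengths, batch_size=3)
# Longest first (assumed order): the slowest batches run at the start.
sorted_pad = padding_tokens(sorted(lengths, reverse=True), batch_size=3)

print(unsorted_pad, sorted_pad)
```

In this toy case, grouping similar lengths together cuts the padding from 916 tokens to 43.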
122