---
license: mit
---

## Constant-650M
Constant-650M is an antibody language model that uses an [ESM-2](https://www.science.org/doi/10.1126/science.ade2574) architecture.
It was pre-trained on unpaired and paired sequences from the [OAS](https://opig.stats.ox.ac.uk/webapps/oas/), using the constant approach described in our [preprint on bioRxiv](https://doi.org/10.1101/2025.02.27.640641).
Datasets used for pre-training are available on [Zenodo](https://doi.org/10.5281/zenodo.14661302), and code is available on [GitHub](https://github.com/brineylab/curriculum-paper).

### Use

Load the model and tokenizer as follows:
```python
from transformers import EsmTokenizer, EsmForMaskedLM

model = EsmForMaskedLM.from_pretrained("brineylab/Constant-650M")
tokenizer = EsmTokenizer.from_pretrained("brineylab/Constant-650M")
```

The tokenizer expects inputs in the format: ["VQ..SS\<cls>EV..IK"] for paired sequences, ["VQ..SS\<cls>"] for unpaired heavy chains, and ["\<cls>EV..IK"] for unpaired light chains.
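As a plain-Python illustration of those three input formats (the amino-acid sequences below are hypothetical placeholders, not real antibody chains):

```python
# hypothetical placeholder sequences (not real antibody chains)
heavy = "VQLVQSGAEVK"  # heavy-chain residues
light = "EVVLTQSPGTL"  # light-chain residues

paired = heavy + "<cls>" + light  # paired heavy/light input
heavy_only = heavy + "<cls>"      # unpaired heavy chain
light_only = "<cls>" + light      # unpaired light chain

inputs = [paired, heavy_only, light_only]
```

Each string can then be passed to the tokenizer as usual, e.g. `tokenizer(paired, return_tensors="pt")`.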

The model can be fine-tuned for classification tasks (such as the specificity and pair classification tasks in the paper) by loading the model with a sequence classification head:
```python
from transformers import EsmForSequenceClassification

model = EsmForSequenceClassification.from_pretrained("brineylab/Constant-650M")

# freeze the base model weights prior to finetuning
for param in model.base_model.parameters():
    param.requires_grad = False
```
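Fine-tuning inputs use the same paired format described above; here is a minimal sketch of assembling labeled examples for pair classification (the sequences and labels are hypothetical placeholders):

```python
# hypothetical toy examples: (heavy, light, label), label 1 = natively paired
pairs = [
    ("VQLVQSGAEVK", "EVVLTQSPGTL", 1),
    ("VQLVQSGAEVK", "DIQMTQSPSSL", 0),
]

# join each heavy/light pair with the <cls> separator the tokenizer expects
texts = [heavy + "<cls>" + light for heavy, light, _ in pairs]
labels = [label for _, _, label in pairs]
```

The resulting `texts` can be tokenized as a batch (e.g. `tokenizer(texts, return_tensors="pt", padding=True)`) and passed, together with `labels`, to the frozen-base model for standard sequence-classification training.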