---
license: mit
---

## Constant-650M
Constant-650M is an antibody language model built on the [ESM-2](https://www.science.org/doi/10.1126/science.ade2574) architecture.
It was pre-trained on unpaired and paired antibody sequences from the Observed Antibody Space ([OAS](https://opig.stats.ox.ac.uk/webapps/oas/)) database, using the constant approach described in [our paper](https://doi.org/10.1371/journal.pcbi.1013473) published in PLOS Computational Biology.
The pre-training datasets are available on [Zenodo](https://doi.org/10.5281/zenodo.14661302), and the code is available on [GitHub](https://github.com/brineylab/curriculum-paper).

### Use
Load the model and tokenizer as follows:
```python
from transformers import EsmTokenizer, EsmForMaskedLM

model = EsmForMaskedLM.from_pretrained("brineylab/Constant-650M")
tokenizer = EsmTokenizer.from_pretrained("brineylab/Constant-650M")
```

The tokenizer expects inputs in one of three formats: ["VQ..SS\<cls>EV..IK"] for paired sequences, ["VQ..SS\<cls>"] for unpaired heavy chains, and ["\<cls>EV..IK"] for unpaired light chains.
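A small helper can make these three formats explicit before tokenization. This is a sketch, not part of the released package; the function name and the truncated example sequences are illustrative only:

```python
def format_input(heavy: str = "", light: str = "") -> str:
    """Build a Constant-650M input string.

    The <cls> token separates the heavy and light chains:
    paired -> "HEAVY<cls>LIGHT", unpaired heavy -> "HEAVY<cls>",
    unpaired light -> "<cls>LIGHT".
    """
    return f"{heavy}<cls>{light}"

# illustrative (truncated) sequences, not real antibody chains
paired = format_input(heavy="VQSS", light="EVIK")   # "VQSS<cls>EVIK"
heavy_only = format_input(heavy="VQSS")             # "VQSS<cls>"
light_only = format_input(light="EVIK")             # "<cls>EVIK"
```

The resulting strings can then be passed directly to the tokenizer, e.g. `tokenizer([paired], return_tensors="pt")`.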

The model can be fine-tuned for classification tasks (such as the specificity and pair-classification tasks in the paper) by loading it with a sequence classification head:
```python
from transformers import EsmForSequenceClassification

model = EsmForSequenceClassification.from_pretrained("brineylab/Constant-650M")

# freeze the base model weights prior to fine-tuning,
# so only the classification head is updated
for param in model.base_model.parameters():
    param.requires_grad = False
```