sburbach committed · Commit 891f721 · verified · 1 Parent(s): 5d899fc

Update README.md

Files changed (1): README.md (+30 -3)
README.md CHANGED
@@ -1,3 +1,30 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ ## Constant-650M
+ Constant-650M is an antibody language model that uses an [ESM-2](https://www.science.org/doi/10.1126/science.ade2574) architecture.
+ It was pre-trained on unpaired and paired sequences from the [OAS](https://opig.stats.ox.ac.uk/webapps/oas/), using the constant approach described in our [preprint on bioRxiv](https://doi.org/10.1101/2025.02.27.640641).
+ Datasets used for pre-training are available on [Zenodo](https://doi.org/10.5281/zenodo.14661302), and code is available on [GitHub](https://github.com/brineylab/curriculum-paper).
+
+ ### Use
+ Load the model and tokenizer as follows:
+ ```python
+ from transformers import EsmTokenizer, EsmForMaskedLM
+
+ model = EsmForMaskedLM.from_pretrained("brineylab/Constant-650M")
+ tokenizer = EsmTokenizer.from_pretrained("brineylab/Constant-650M")
+ ```
+
+ The tokenizer expects inputs in the format ["VQ..SS\<cls>EV..IK"] for paired sequences, ["VQ..SS\<cls>"] for unpaired heavy chains, and ["\<cls>EV..IK"] for unpaired light chains.
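+
+ For example, a paired sequence can be tokenized and passed through the masked language model loaded above. A minimal sketch (the sequence below is a truncated placeholder, not a real antibody chain):
+ ```python
+ import torch
+
+ # placeholder paired input: heavy chain, <cls> separator, light chain
+ sequence = "VQLQESGGGLVQ<mask>GGS<cls>EVQLTQSPGTLSLS"
+
+ inputs = tokenizer([sequence], return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits  # (batch, seq_len, vocab_size)
+
+ # most likely residue at the masked position
+ mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
+ print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
+ ```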
+
+ The model can be fine-tuned for classification tasks (such as the specificity and pair classification tasks in the paper) by loading it with a sequence classification head:
+ ```python
+ from transformers import EsmForSequenceClassification
+
+ # the classification head is newly initialized and must be trained
+ model = EsmForSequenceClassification.from_pretrained("brineylab/Constant-650M")
+
+ # freeze the base model weights prior to fine-tuning
+ for param in model.base_model.parameters():
+     param.requires_grad = False
+ ```
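+
+ From here, standard Hugging Face fine-tuning applies. A minimal sketch of a single supervised forward pass (the `num_labels=2`, the placeholder sequence, and the label are illustrative assumptions):
+ ```python
+ import torch
+ from transformers import EsmTokenizer, EsmForSequenceClassification
+
+ tokenizer = EsmTokenizer.from_pretrained("brineylab/Constant-650M")
+ model = EsmForSequenceClassification.from_pretrained(
+     "brineylab/Constant-650M", num_labels=2  # assumed binary task
+ )
+
+ # placeholder paired sequence (heavy <cls> light) with an assumed label
+ inputs = tokenizer(["VQLQESGGGLVQ<cls>EVQLTQSPGTLSLS"], return_tensors="pt")
+ labels = torch.tensor([1])
+
+ outputs = model(**inputs, labels=labels)  # returns loss and logits
+ print(outputs.loss, outputs.logits)
+ ```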