---
license: mit
---
## Constant-650M

Constant-650M is an antibody language model that uses an [ESM-2](https://www.science.org/doi/10.1126/science.ade2574) architecture.
It was pre-trained on unpaired and paired sequences from the [OAS](https://opig.stats.ox.ac.uk/webapps/oas/), using the constant approach described in [our paper](https://doi.org/10.1371/journal.pcbi.1013473) published in PLOS Computational Biology.
The datasets used for pre-training are available on [Zenodo](https://doi.org/10.5281/zenodo.14661302), and the training code is available on [GitHub](https://github.com/brineylab/curriculum-paper).
### Use

Load the model and tokenizer as follows:

```python
from transformers import EsmTokenizer, EsmForMaskedLM

model = EsmForMaskedLM.from_pretrained("brineylab/Constant-650M")
tokenizer = EsmTokenizer.from_pretrained("brineylab/Constant-650M")
```
The tokenizer expects inputs in the following formats: `"VQ..SS<cls>EV..IK"` for paired sequences (heavy chain, then light chain, joined by a `<cls>` token), `"VQ..SS<cls>"` for unpaired heavy chains, and `"<cls>EV..IK"` for unpaired light chains.
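As a minimal sketch of masked-language-model inference with these formats (the sequences below are hypothetical placeholders, not real antibody chains):

```python
import torch

# Hypothetical paired input: heavy chain, then <cls>, then light chain.
# These short strings are placeholders, not real antibody sequences.
paired = ["EVQLVESGGGLVQ<cls>DIQMTQSPSSLSA"]

inputs = tokenizer(paired, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Logits over the token vocabulary for every position in the input
print(outputs.logits.shape)
```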
The model can also be fine-tuned for classification tasks (such as the specificity and pair classification tasks in the paper) by loading it with a sequence classification head:
```python
from transformers import EsmForSequenceClassification

# pass num_labels=<n> here for tasks with more than two classes
model = EsmForSequenceClassification.from_pretrained("brineylab/Constant-650M")

# freeze the base model weights prior to finetuning,
# so only the classification head is trained
for param in model.base_model.parameters():
    param.requires_grad = False
```
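A minimal sketch of how the resulting classifier might be called (again with a hypothetical placeholder sequence; `num_labels` defaults to 2, matching binary tasks such as pair classification):

```python
import torch

# Hypothetical paired sequence (placeholder, not a real antibody)
inputs = tokenizer(["EVQLVESGGGLVQ<cls>DIQMTQSPSSLSA"], return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, num_labels)

predicted_class = logits.argmax(dim=-1).item()
```

Because the base weights are frozen, only the newly initialized classification head receives gradient updates during fine-tuning.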