---
library_name: transformers
license: mit
---

## Preferential-250k

Preferential-250k is an antibody language model that uses an [ESM-2](https://www.science.org/doi/10.1126/science.ade2574) architecture.

It was pre-trained on paired sequences from [Jaffe et al.](https://www.nature.com/articles/s41586-022-05371-z) and [Hurtado et al.](https://doi.org/10.1016/j.celrep.2024.114307).

The datasets used for pre-training are available on [Zenodo](https://doi.org/10.5281/zenodo.14019655), and code is available on [GitHub](https://github.com/brineylab/preferential-masking-paper).

More details can be found in [our paper](https://doi.org/10.1016/j.patter.2025.101239), published in Patterns.

### Use

Load the model and tokenizer as follows:

```python
from transformers import EsmTokenizer, EsmForMaskedLM

model = EsmForMaskedLM.from_pretrained("brineylab/preferential-250k")
tokenizer = EsmTokenizer.from_pretrained("brineylab/preferential-250k")
```

The tokenizer expects paired sequences formatted as: `HEAVY_CHAIN<cls><cls>LIGHT_CLAIN`.
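
As a minimal sketch of encoding a paired sequence with the model and tokenizer loaded above (the heavy- and light-chain sequences below are illustrative placeholders, not sequences from the training data):

```python
import torch

# Placeholder heavy- and light-chain sequences (illustrative only)
heavy_chain = "EVQLVESGGGLVQPGGSLRLSCAAS"
light_chain = "DIQMTQSPSSLSASVGDRVTITC"

# Format the pair as HEAVY_CHAIN<cls><cls>LIGHT_CHAIN and tokenize
inputs = tokenizer(f"{heavy_chain}<cls><cls>{light_chain}", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Per-token representations from the final hidden layer
embeddings = outputs.hidden_states[-1]
```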

The model can be finetuned for classification tasks (such as specificity and pair classification in the paper) by loading the model with a sequence classification head:

```python
from transformers import EsmForSequenceClassification

model = EsmForSequenceClassification.from_pretrained("brineylab/preferential-250k")

# freeze the base model weights prior to finetuning
for param in model.base_model.parameters():
    param.requires_grad = False
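
As a hypothetical sketch of how such a finetuning run might be wired up with the Hugging Face `Trainer` (the sequences, labels, output directory, and hyperparameters below are illustrative placeholders, not the settings used in the paper):

```python
import torch
from torch.utils.data import Dataset
from transformers import (
    EsmForSequenceClassification,
    EsmTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = EsmTokenizer.from_pretrained("brineylab/preferential-250k")
model = EsmForSequenceClassification.from_pretrained("brineylab/preferential-250k", num_labels=2)

# freeze the base model weights, as above
for param in model.base_model.parameters():
    param.requires_grad = False

# Placeholder paired sequences and binary labels (illustrative only)
pairs = [
    ("EVQLVESGGGLVQPGGSLRLSCAAS", "DIQMTQSPSSLSASVGDRVTITC"),
    ("QVQLQESGPGLVKPSETLSLTCTVS", "EIVLTQSPGTLSLSPGERATLSC"),
]
labels = [0, 1]


class PairedSequenceDataset(Dataset):
    def __init__(self, pairs, labels):
        # format each pair as HEAVY_CHAIN<cls><cls>LIGHT_CHAIN before tokenizing
        self.encodings = tokenizer(
            [f"{heavy}<cls><cls>{light}" for heavy, light in pairs],
            padding=True,
            return_tensors="pt",
        )
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item


trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./preferential-250k-classifier",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=PairedSequenceDataset(pairs, labels),
)
trainer.train()
```

In practice, the labels would come from annotations such as the specificity or pairing labels described in the paper, and the training hyperparameters would need to be tuned for the task.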