AbLM scaling laws

Antibody language models (AbLMs) help decode immune repertoires for therapeutic discovery, but optimal scaling strategies remain unclear. We trained ESM-2-based AbLMs across five model sizes (8M–650M parameters) and three paired antibody sequence dataset scales (~1.6M total sequences). Results are described in our preprint on bioRxiv. Effective AbLM scaling requires balancing model size with data availability to avoid diminishing returns. Datasets used for pre-training are available on [Zenodo] and code is available on ...

Use

Load the model and tokenizer as follows:

from transformers import EsmTokenizer, EsmForMaskedLM

model = EsmForMaskedLM.from_pretrained("brineylab/8M_half_checkpoint-435000") 
tokenizer = EsmTokenizer.from_pretrained("facebook/esm2_t30_150M_UR50D")
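
As a quick sanity check that the weights load correctly, you can predict a masked residue. A minimal sketch, assuming an arbitrary illustrative heavy-chain fragment as input (the sequence below is not from the paper):

import torch

# predict a masked residue; the sequence is an arbitrary illustrative
# heavy-chain fragment, not from the paper
sequence = "EVQLVESGGGLVQPGG<mask>SLRLSCAASGFTFS"
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# find the masked position and report the highest-scoring token
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.convert_ids_to_tokens(logits[0, mask_pos].argmax().item()))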

The model can be finetuned for classification tasks (such as the specificity and pair classification tasks in the paper) by loading it with a sequence classification head:

from transformers import EsmForSequenceClassification

model = EsmForSequenceClassification.from_pretrained("brineylab/8M_half_checkpoint-435000")

# freeze the base model weights prior to finetuning
for param in model.base_model.parameters():
    param.requires_grad = False
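
With the base model frozen, only the classification head receives gradient updates. A minimal single-step sketch, assuming hypothetical toy sequences and labels (substitute your own labeled antibody data and a full training loop or the Hugging Face Trainer):

import torch
from torch.optim import AdamW

# hypothetical toy batch; replace with real labeled antibody sequences
# (e.g., specificity or pairing labels from your own data)
sequences = ["EVQLVESGGGLVQPGGSLRLSCAAS", "QVQLQQSGAELVKPGASVKLSCKAS"]
labels = torch.tensor([0, 1])

inputs = tokenizer(sequences, return_tensors="pt", padding=True)

# after freezing, only the classification head has requires_grad=True
optimizer = AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)

model.train()
outputs = model(**inputs, labels=labels)  # cross-entropy loss computed internally
outputs.loss.backward()                   # gradients flow only to the head
optimizer.step()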