AbLMs Scaling Laws
Antibody language models (AbLMs) help decode immune repertoires for therapeutic discovery, but optimal scaling strategies remain unclear. We trained ESM-2-based AbLMs across five model sizes (8M–650M parameters) and three dataset scales of paired antibody sequences (~1.6M total sequences). Results are described in our preprint on bioRxiv. Effective AbLM scaling requires balancing model size with data availability to avoid diminishing returns. Datasets used for pre-training are available on [Zenodo] and code is available on ...
Load the model and tokenizer as follows:
```python
from transformers import EsmTokenizer, EsmForMaskedLM

model = EsmForMaskedLM.from_pretrained("brineylab/8M_half_checkpoint-435000")
tokenizer = EsmTokenizer.from_pretrained("facebook/esm2_t30_150M_UR50D")
```
The model can be fine-tuned for classification tasks (such as the specificity and pair classification tasks in the paper) by loading it with a sequence classification head:
```python
from transformers import EsmForSequenceClassification

model = EsmForSequenceClassification.from_pretrained("brineylab/8M_half_checkpoint-435000")

# freeze the base model weights prior to fine-tuning
for param in model.base_model.parameters():
    param.requires_grad = False
```