AbLMs Scaling Laws
Antibody language models (AbLMs) help decode immune repertoires for therapeutic discovery, but optimal scaling strategies remain unclear. We trained ESM-2-based AbLMs across five model sizes (8M–650M parameters) and three dataset scales of paired antibody sequences (~1.6M total sequences). Results are described in our preprint on bioRxiv. Effective AbLM scaling requires balancing model size with data availability to avoid diminishing returns. Datasets used for pre-training are available on [Zenodo] and code is available on ...
Load the model and tokenizer as follows:
```python
from transformers import EsmTokenizer, EsmForMaskedLM

model = EsmForMaskedLM.from_pretrained("brineylab/8M_half_checkpoint-435000")
tokenizer = EsmTokenizer.from_pretrained("facebook/esm2_t30_150M_UR50D")
```
The model can be fine-tuned for classification tasks (such as the specificity and pair classification tasks in the paper) by loading it with a sequence classification head:
```python
from transformers import EsmForSequenceClassification

model = EsmForSequenceClassification.from_pretrained("brineylab/8M_half_checkpoint-435000")

# freeze the base model weights prior to fine-tuning
for param in model.base_model.parameters():
    param.requires_grad = False
```