BALM Paper Collection
Models from the publication: "Improving antibody language models with native pairing", Patterns (2024)
BALM-paired is an antibody language model based on the RoBERTa architecture, pre-trained on ~1.6M natively paired antibody sequences from Jaffe et al. The datasets used for pre-training are available on Zenodo, and the code is available on GitHub. More details can be found in our paper published in Patterns.
Load the model and tokenizer as follows:
```python
from transformers import RobertaTokenizer, RobertaForMaskedLM

model = RobertaForMaskedLM.from_pretrained("brineylab/BALM-paired")
tokenizer = RobertaTokenizer.from_pretrained("brineylab/BALM-paired")
```

The tokenizer expects sequences formatted as: `HEAVY_CHAIN</s>LIGHT_CHAIN`.
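As a minimal sketch of putting this format to use, the helpers below build the expected `HEAVY</s>LIGHT` input string and mean-pool the model's final hidden states into a fixed-length embedding. The function names (`format_paired`, `mean_embedding`) and the mean-pooling choice are illustrative assumptions, not part of the published API; only the separator format and the model/tokenizer classes come from the card above.

```python
from transformers import RobertaTokenizer, RobertaForMaskedLM


def format_paired(heavy: str, light: str) -> str:
    """Join heavy- and light-chain sequences with the separator token
    the BALM-paired tokenizer expects: HEAVY_CHAIN</s>LIGHT_CHAIN."""
    return f"{heavy}</s>{light}"


def mean_embedding(sequence: str, model, tokenizer):
    """Hypothetical helper: mean-pool the final hidden layer over all
    token positions to get one vector per paired sequence."""
    import torch  # imported lazily so format_paired works without torch

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states[-1] has shape (batch, seq_len, hidden_dim)
    return outputs.hidden_states[-1].mean(dim=1).squeeze(0)


# Usage (downloads the model weights on first call):
# model = RobertaForMaskedLM.from_pretrained("brineylab/BALM-paired")
# tokenizer = RobertaTokenizer.from_pretrained("brineylab/BALM-paired")
# emb = mean_embedding(format_paired(heavy_seq, light_seq), model, tokenizer)
```

Mean-pooling is just one common way to summarize per-residue representations; per-token hidden states can be used directly for residue-level tasks.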