AlignAIR Pretrained Models

AlignAIR is a deep learning tool for aligning immunoglobulin (IG) and T-cell receptor (TCR) sequences to germline gene databases. In a single forward pass, it predicts V/D/J gene assignments, segment boundaries, mutation rates, and productivity.

Available Models

| Model | Chain | Germline DB | V Alleles | D Alleles | J Alleles | Size |
| --- | --- | --- | --- | --- | --- | --- |
| HUMAN_IGH_OGRDB_576 | IGH (Heavy) | OGRDB | 198 | 33 | 7 | 17 MB |
| HUMAN_IGH_EXTENDED_576 | IGH (Heavy) | Extended | 342 | 37 | 10 | 28 MB |
| HUMAN_IGK_OGRDB_576 | IGK (Kappa) | OGRDB | 168 | n/a | 8 | 12 MB |
| HUMAN_IGL_OGRDB_576 | IGL (Lambda) | OGRDB | 181 | n/a | 10 | 13 MB |
| HUMAN_TCRB_IMGT_576 | TCRB (Beta) | IMGT | 130 | 3 | 14 | 12 MB |

All models accept sequences of up to 576 nucleotides and were trained for 1000 epochs on synthetic data generated by GenAIRR.

Quick Start

pip install "alignair[hub]"

Python API

from AlignAIR.Models import SingleChainAlignAIR
from AlignAIR.Hub import get_model_path

# Download and load a model (cached automatically)
model_path = get_model_path("igh")  # or "HUMAN_IGH_OGRDB_576"
model = SingleChainAlignAIR.from_pretrained(model_path)

CLI

# Run inference with a pretrained model
alignair --model-dir HUMAN_IGH_OGRDB_576 input_sequences.csv -o results/

Benchmark Results (100K synthetic sequences)

| Model | AlignAIR V | IgBLAST V | AlignAIR D | IgBLAST D | AlignAIR J | IgBLAST J | AlignAIR Speed |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IGH OGRDB | 94.1% | 95.5% | 81.7% | 69.8% | 99.3% | 99.5% | 4,272 seq/s |
| IGH Extended | 92.3% | 93.9% | 88.5% | 82.6% | 98.7% | 98.4% | 4,245 seq/s |
| IGK OGRDB | 94.6% | 95.4% | n/a | n/a | 97.2% | 96.0% | 4,807 seq/s |
| IGL OGRDB | 93.9% | 95.3% | n/a | n/a | 98.4% | 96.7% | 5,384 seq/s |
| TCRB IMGT | 96.5% | 96.2% | 89.6% | 76.3% | 99.6% | 99.1% | 4,317 seq/s |

AlignAIR speed was measured on an NVIDIA RTX 3090 Ti GPU; the IgBLAST comparison used IgBLAST 1.22.0 with 8 CPU threads.
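To put the throughput column in practical terms, the sketch below converts the reported rates into a rough wall-clock estimate for a repertoire of a given size. It assumes throughput stays constant and matches the benchmark hardware; the function name `estimated_seconds` is illustrative and not part of AlignAIR, and real runs may add time for model loading and file I/O.

```python
# Reported AlignAIR throughput (sequences per second) from the benchmark table.
THROUGHPUT = {
    "IGH OGRDB": 4272,
    "IGH Extended": 4245,
    "IGK OGRDB": 4807,
    "IGL OGRDB": 5384,
    "TCRB IMGT": 4317,
}


def estimated_seconds(n_sequences: int, seqs_per_second: float) -> float:
    """Rough inference time, assuming constant throughput (an idealization)."""
    if seqs_per_second <= 0:
        raise ValueError("throughput must be positive")
    return n_sequences / seqs_per_second


# Estimate the time needed for a benchmark-sized set of 100K sequences.
for model, rate in THROUGHPUT.items():
    print(f"{model}: about {estimated_seconds(100_000, rate):.1f} s per 100K sequences")
```

On hardware similar to the benchmark machine, every listed model would process 100K sequences in well under half a minute.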

Model Architecture

Each model is a SingleChainAlignAIR module combining:

  • Nucleotide embedding (5→64 dim) with center-padded tokenization
  • Spatial segmentation via 9-layer dilated convolutions (receptive field = 1023 nt)
  • Conditioned boundary heads with chain decoding (v_start → v_end → d_start → ...)
  • Classification heads for V/D/J allele assignment
  • Analysis heads for mutation rate and productivity prediction
  • In-model orientation correction (4-class: forward, reverse-complement, complement, reverse)
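Three of the components above can be illustrated with a small, self-contained sketch. It is not AlignAIR's implementation, and several details are assumptions: a kernel size of 3 with dilations doubling per layer (one configuration that reproduces the stated 1023 nt receptive field), a hypothetical 5-token vocabulary (padding plus A/C/G/T), and the helper names `receptive_field`, `center_pad_tokens`, and `reorient`.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stacked stride-1 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed configuration: kernel size 3, dilations 1, 2, 4, ..., 256 (9 layers).
DILATIONS = [2 ** i for i in range(9)]
assert receptive_field(3, DILATIONS) == 1023

# Hypothetical vocabulary: the real token mapping may differ.
PAD_ID = 0
VOCAB = {"A": 1, "C": 2, "G": 3, "T": 4}


def center_pad_tokens(seq: str, max_len: int = 576) -> tuple[list[int], int]:
    """Tokenize and pad equally on both sides; returns (tokens, left_offset).

    Raises KeyError for bases outside the assumed vocabulary.
    """
    if len(seq) > max_len:
        raise ValueError(f"sequence length {len(seq)} exceeds {max_len}")
    tokens = [VOCAB[base] for base in seq.upper()]
    left = (max_len - len(tokens)) // 2
    right = max_len - len(tokens) - left
    return [PAD_ID] * left + tokens + [PAD_ID] * right, left


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reorient(seq: str, orientation: str) -> str:
    """Apply one of the four orientation classes to a sequence.

    Each transform is its own inverse, so applying the predicted
    orientation again restores the forward strand.
    """
    if orientation == "forward":
        return seq
    if orientation == "reverse":
        return seq[::-1]
    if orientation == "complement":
        return seq.translate(_COMPLEMENT)
    if orientation == "reverse-complement":
        return seq.translate(_COMPLEMENT)[::-1]
    raise ValueError(f"unknown orientation: {orientation}")
```

In this sketch, positions in the padded input are shifted by the returned left offset, so subtracting it maps a padded coordinate back onto the original sequence.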

Bundle Format

Each model directory contains:

  • model.pt: PyTorch state dict
  • config.json: Architecture hyperparameters
  • dataconfig.pkl: Germline allele database (GenAIRR DataConfig)
  • training_meta.json: Training provenance
  • VERSION: Bundle format version
  • fingerprint.txt: SHA-256 integrity hash
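A downloaded bundle can be sanity-checked against the file list above before loading. The sketch below uses only the Python standard library; the helper names `missing_bundle_files` and `sha256_of_file` are illustrative and not part of AlignAIR. How fingerprint.txt is computed (which files it covers and in what order) is not described here, so the hash helper only shows how to obtain a SHA-256 digest for a single file and does not claim to reproduce the stored fingerprint.

```python
import hashlib
from pathlib import Path

# File names taken from the bundle description above.
EXPECTED_FILES = (
    "model.pt",
    "config.json",
    "dataconfig.pkl",
    "training_meta.json",
    "VERSION",
    "fingerprint.txt",
)


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the expected bundle files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `missing_bundle_files("HUMAN_IGH_OGRDB_576")` returns an empty list when the directory holds all six files.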

Citation

If you use AlignAIR in your research, please cite:

Konstantinovsky, T., Peres, A., Eisenberg, R., Polak, P., Lindenbaum, O., & Yaari, G. (2025). Enhancing sequence alignment of adaptive immune receptors through multi-task deep learning. Nucleic Acids Research, 53(13). https://doi.org/10.1093/nar/gkaf651

@article{Konstantinovsky2025,
  title = {Enhancing sequence alignment of adaptive immune receptors through multi-task deep learning},
  volume = {53},
  ISSN = {1362-4962},
  url = {http://dx.doi.org/10.1093/nar/gkaf651},
  DOI = {10.1093/nar/gkaf651},
  number = {13},
  journal = {Nucleic Acids Research},
  publisher = {Oxford University Press (OUP)},
  author = {Konstantinovsky, Thomas and Peres, Ayelet and Eisenberg, Ran and Polak, Pazit and Lindenbaum, Ofir and Yaari, Gur},
  year = {2025},
  month = jul
}

License

GPL-3.0. See LICENSE.
