---
language: en
license: gpl-3.0
tags:
- immunoinformatics
- antibody
- TCR
- AIRR
- sequence-alignment
- bioinformatics
- pytorch
library_name: alignair
pipeline_tag: token-classification
---

# AlignAIR Pretrained Models

**AlignAIR** is a deep learning tool for aligning immunoglobulin (IG) and T-cell receptor (TCR) sequences to germline gene databases. In a single forward pass, it predicts V/D/J gene assignments, segment boundaries, mutation rates, and productivity.

## Available Models

| Model | Chain | Germline DB | V Alleles | D Alleles | J Alleles | Size |
|-------|-------|-------------|-----------|-----------|-----------|------|
| `HUMAN_IGH_OGRDB_576` | IGH (Heavy) | OGRDB | 198 | 33 | 7 | 17 MB |
| `HUMAN_IGH_EXTENDED_576` | IGH (Heavy) | Extended | 342 | 37 | 10 | 28 MB |
| `HUMAN_IGK_OGRDB_576` | IGK (Kappa) | OGRDB | 168 | — | 8 | 12 MB |
| `HUMAN_IGL_OGRDB_576` | IGL (Lambda) | OGRDB | 181 | — | 10 | 13 MB |
| `HUMAN_TCRB_IMGT_576` | TCRB (Beta) | IMGT | 130 | 3 | 14 | 12 MB |

All models accept input sequences of up to 576 nucleotides and were trained for 1,000 epochs on synthetic data generated by [GenAIRR](https://github.com/MuteJester/GenAIRR).
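Every model takes inputs of at most 576 nucleotides, and the Model Architecture section notes that tokenization is center-padded. The following is a minimal sketch of what center-padding a read to that fixed width could look like, assuming a hypothetical 5-symbol vocabulary (padding plus A/C/G/T). It is not AlignAIR's internal tokenizer: the token indices, the handling of ambiguous bases, and the choice to reject over-length reads are all illustrative assumptions.

```python
# Hypothetical encoding for illustration only: index 0 is padding and
# A/C/G/T map to 1-4 (a 5-symbol vocabulary). AlignAIR's real token
# mapping may differ; ambiguous bases such as N are sent to 0 here.
MAX_LEN = 576
TOKENS = {"A": 1, "C": 2, "G": 3, "T": 4}
PAD = 0


def center_pad_encode(seq: str, max_len: int = MAX_LEN) -> list[int]:
    """Encode a nucleotide sequence and center it in a fixed-width window."""
    seq = seq.upper()
    if len(seq) > max_len:
        # This sketch rejects over-length input rather than truncating it.
        raise ValueError(f"sequence length {len(seq)} exceeds {max_len} nt")
    ids = [TOKENS.get(base, PAD) for base in seq]
    total_pad = max_len - len(ids)
    left = total_pad // 2       # floor half of the padding goes on the left
    right = total_pad - left    # any odd leftover position goes on the right
    return [PAD] * left + ids + [PAD] * right


encoded = center_pad_encode("ACGTN")
print(len(encoded), encoded.index(1))  # → 576 285
```

Centering puts equal padding on both sides (the extra position goes on the right when the total is odd), so a short read sits in the middle of the 576-nt window instead of being left-aligned.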
## Quick Start

```bash
pip install "alignair[hub]"
```

### Python API

```python
from AlignAIR.Models import SingleChainAlignAIR
from AlignAIR.Hub import get_model_path

# Download and load a model (cached automatically)
model_path = get_model_path("igh")  # or "HUMAN_IGH_OGRDB_576"
model = SingleChainAlignAIR.from_pretrained(model_path)
```

### CLI

```bash
# Run inference with a pretrained model
alignair --model-dir HUMAN_IGH_OGRDB_576 input_sequences.csv -o results/
```

## Benchmark Results (100K synthetic sequences)

| Model | AlignAIR V | IgBLAST V | AlignAIR D | IgBLAST D | AlignAIR J | IgBLAST J | AlignAIR Speed |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|----------------|
| IGH OGRDB | 94.1% | 95.5% | 81.7% | 69.8% | 99.3% | 99.5% | 4,272 seq/s |
| IGH Extended | 92.3% | 93.9% | 88.5% | 82.6% | 98.7% | 98.4% | 4,245 seq/s |
| IGK OGRDB | 94.6% | 95.4% | — | — | 97.2% | 96.0% | 4,807 seq/s |
| IGL OGRDB | 93.9% | 95.3% | — | — | 98.4% | 96.7% | 5,384 seq/s |
| TCRB IMGT | 96.5% | 96.2% | 89.6% | 76.3% | 99.6% | 99.1% | 4,317 seq/s |

AlignAIR speed was measured on an NVIDIA RTX 3090 Ti GPU; the IgBLAST baseline (version 1.22.0) was run on 8 CPU threads.

## Model Architecture

Each model is a `SingleChainAlignAIR` module combining:

- **Nucleotide embedding** (5 → 64 dimensions) with center-padded tokenization
- **Spatial segmentation** via 9-layer dilated convolutions (receptive field = 1,023 nt)
- **Conditioned boundary heads** with chained decoding: boundaries are predicted in sequence (`v_start` → `v_end` → `d_start` → ...), each conditioned on the ones before it
- **Classification heads** for V/D/J allele assignment
- **Analysis heads** for mutation rate and productivity prediction
- **In-model orientation correction** (4-class: forward, reverse-complement, complement, reverse)

## Bundle Format

Each model directory contains:

- `model.pt`: PyTorch state dict
- `config.json`: architecture hyperparameters
- `dataconfig.pkl`: germline allele database (GenAIRR DataConfig)
- `training_meta.json`: training provenance
- `VERSION`: bundle format version
- `fingerprint.txt`: SHA-256 integrity hash

## Citation

If you use AlignAIR in your research, please cite:

> Konstantinovsky, T., Peres, A., Eisenberg, R., Polak, P., Lindenbaum, O., & Yaari, G. (2025). Enhancing sequence alignment of adaptive immune receptors through multi-task deep learning. *Nucleic Acids Research*, 53(13). https://doi.org/10.1093/nar/gkaf651

```bibtex
@article{Konstantinovsky2025,
  title = {Enhancing sequence alignment of adaptive immune receptors through multi-task deep learning},
  volume = {53},
  ISSN = {1362-4962},
  url = {http://dx.doi.org/10.1093/nar/gkaf651},
  DOI = {10.1093/nar/gkaf651},
  number = {13},
  journal = {Nucleic Acids Research},
  publisher = {Oxford University Press (OUP)},
  author = {Konstantinovsky, Thomas and Peres, Ayelet and Eisenberg, Ran and Polak, Pazit and Lindenbaum, Ofir and Yaari, Gur},
  year = {2025},
  month = jul
}
```

## License

GPL-3.0. See [LICENSE](https://github.com/MuteJester/AlignAIR/blob/main/LICENSE).

## Links

- [GitHub Repository](https://github.com/MuteJester/AlignAIR)
- [Documentation](https://mutejester.github.io/AlignAIR/)
- [PyPI Package](https://pypi.org/project/AlignAIR/)