---
#language:
#- en
license: mit
tags:
- biology
- protein
- antibody
- ablang
- transformers
- pytorch
- chemistry
- oas
- cdr
- ablang2 hf implementation
- roberta
- ESM
- ablang2
- antibody-design
# datasets:
# - oas
metrics:
- sequence modeling
- protein language model
library_name: transformers
pipeline_tag: fill-mask
---

# 🧬 AbLang2: Transformer-based Antibody Language Model

This repository provides a HuggingFace-compatible 🤗 implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the [Oxford Protein Informatics Group (OPIG)](https://opig.stats.ox.ac.uk/) and is available at:

- **AbLang2**: [https://github.com/TobiasHeOl/AbLang2](https://github.com/TobiasHeOl/AbLang2)

## 🎯 Available Model

- **ablang2**: AbLang2 model for antibody sequences

## 📦 Installation

Install the required dependencies:

```bash
# Install core dependencies
pip install transformers numpy pandas rotary-embedding-torch

# Install ANARCI from bioconda (required for antibody numbering)
conda install -c bioconda anarci
```

**Note**: ANARCI is required for the antibody sequence numbering and alignment features. It must be installed from the bioconda channel.

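
After installing, it can help to confirm that the dependencies are visible to the Python interpreter you plan to use. The snippet below is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the import names in `REQUIRED_MODULES` are assumptions, since a package's install name can differ from its import name.

```python
import importlib.util

# Import names are assumptions: package names used with pip/conda can differ
# from the module names used in `import` statements.
REQUIRED_MODULES = ["transformers", "numpy", "pandas", "rotary_embedding_torch", "anarci"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only looks modules up without importing them, so it stays fast and does not load any model code.
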
## 🚀 Loading Model from Hugging Face Hub

### Method 1: Load Model and Tokenizer, then Import Adapter

```python
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)

# Import and create the adapter
from adapter import AbLang2PairedHuggingFaceAdapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```

### Method 2: Using importlib (Alternative)

```python
import importlib.util
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load model and tokenizer
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Load adapter dynamically
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
spec = importlib.util.spec_from_file_location("adapter", adapter_path)
adapter_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(adapter_module)

# Create the adapter
ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```

**Note**: The model automatically uses a GPU when one is available and otherwise falls back to CPU.

## ⚙️ Available Utilities

This wrapper translates between HuggingFace's model format and AbLang2's expected input/output structure, so AbLang2's antibody analysis tools can be used with a model loaded from HuggingFace.

- **seqcoding**: Sequence-level representations (averaged across residues)
- **rescoding**: Residue-level representations (per-residue embeddings)
- **likelihood**: Raw logits for amino acid prediction at each position
- **probability**: Normalized probabilities for amino acid prediction
- **pseudo_log_likelihood**: Uncertainty scoring with stepwise masking (masks each residue)
- **confidence**: Fast uncertainty scoring (single forward pass, no masking)
- **restore**: Restore masked residues (`*`) with predicted amino acids

All of these utilities work with the HuggingFace-loaded model and keep the same API as the original AbLang2 implementation.

The `AbLang2PairedHuggingFaceAdapter` class is a wrapper that lets you use the AbLang2 model utilities after loading the model from HuggingFace. This class enables you to:

- **Access all AbLang2 utilities** (seqcoding, rescoding, likelihood, probability, etc.) with the same interface as the original implementation
- **Work with antibody sequences** (heavy and light chains)
- **Maintain compatibility** with the original AbLang2 API while using HuggingFace's model loading and caching

## 💡 Examples

### 🔗 AbLang2 (Paired Sequences) - Restore Example

```python
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# 1. Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# 2. Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
from adapter import AbLang2PairedHuggingFaceAdapter

# 3. Create adapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)

# 4. Restore masked sequences (each entry is a [heavy, light] pair; * marks a masked residue)
masked_seqs = [
    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]
restored = ablang(masked_seqs, mode='restore')
print(f"Restored sequences: {restored}")
```

## 📚 Detailed Usage

For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see:

- **[`test_ablang2_HF_implementation.ipynb`](test_ablang2_HF_implementation.ipynb)** - Complete notebook with all utilities and advanced usage patterns

## 📖 Citation

If you use this model in your research, please cite the original AbLang2 paper:

**AbLang2:**

```
@article{Olsen2024,
  title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
  author={Tobias H. Olsen and Iain H. Moal and Charlotte M. Deane},
  journal={bioRxiv},
  doi={10.1101/2024.02.02.578678},
  year={2024}
}
```