| #language: | |
| #- en | |
| license: mit | |
| tags: | |
| - biology | |
| - protein | |
| - antibody | |
| - ablang | |
| - transformers | |
| - pytorch | |
| - chemistry | |
| - oas | |
| - cdr | |
| - ablang2 hf implementation | |
| - roberta | |
| - ESM | |
| - ablang2 | |
| - antibody-design | |
| # datasets: | |
| # - oas | |
| metrics: | |
| - sequence modeling | |
| - protein language model | |
| library_name: transformers | |
| pipeline_tag: fill-mask | |
| # 🧬 AbLang2: Transformer-based Antibody Language Model | |
| This repository provides a Hugging Face-compatible 🤗 implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the [Oxford Protein Informatics Group (OPIG)](https://opig.stats.ox.ac.uk/) and is available at: | |
| - **AbLang2**: [https://github.com/TobiasHeOl/AbLang2](https://github.com/TobiasHeOl/AbLang2) | |
| ## 🎯 Available Model | |
| - **ablang2**: AbLang2 model for antibody sequences | |
| ## 📦 Installation | |
| Install the required dependencies: | |
| ```bash | |
| # Install core dependencies | |
| pip install transformers numpy pandas rotary-embedding-torch | |
| # Install ANARCI from bioconda (required for antibody numbering) | |
| conda install -c bioconda anarci | |
| ``` | |
| **Note**: ANARCI is required for antibody sequence numbering and alignment features. It must be installed from the bioconda channel. | |
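Before loading the model, it can help to confirm that the dependencies above are visible to the current Python environment. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it assumes that the bioconda package places an `ANARCI` executable on `PATH` and that `rotary-embedding-torch` is imported as `rotary_embedding_torch`.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above (assumption: the import
# name of rotary-embedding-torch uses underscores instead of hyphens).
REQUIRED_MODULES = ["transformers", "numpy", "pandas", "rotary_embedding_torch"]


def check_environment(modules=REQUIRED_MODULES, executable="ANARCI"):
    """Return a dict mapping each requirement to True (found) or False."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # Assumption: the bioconda package puts an `ANARCI` executable on PATH.
    status[executable] = shutil.which(executable) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any entry reported as `MISSING` points to the install command that still needs to be run.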
| ## Loading the Model from Hugging Face Hub | |
| ### Method 1: Load Model and Tokenizer, then Import Adapter | |
| ```python | |
| import sys | |
| import os | |
| from transformers import AutoModel, AutoTokenizer | |
| from huggingface_hub import hf_hub_download | |
| # Load model and tokenizer from Hugging Face Hub | |
| model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True) | |
| tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True) | |
| # Download adapter and add to path | |
| adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py") | |
| cached_model_dir = os.path.dirname(adapter_path) | |
| sys.path.insert(0, cached_model_dir) | |
| # Import and create the adapter | |
| from adapter import AbLang2PairedHuggingFaceAdapter | |
| ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer) | |
| ``` | |
| ### Method 2: Using importlib (Alternative) | |
| ```python | |
| import importlib.util | |
| from transformers import AutoModel, AutoTokenizer | |
| from huggingface_hub import hf_hub_download | |
| # Load model and tokenizer | |
| model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True) | |
| tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True) | |
| # Load adapter dynamically | |
| adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py") | |
| spec = importlib.util.spec_from_file_location("adapter", adapter_path) | |
| adapter_module = importlib.util.module_from_spec(spec) | |
| spec.loader.exec_module(adapter_module) | |
| # Create the adapter | |
| ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer) | |
| ``` | |
| **Note**: The model automatically uses a GPU when one is available and otherwise falls back to the CPU. | |
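To see which device to expect on a given machine, a quick check along these lines may be useful. This is a sketch only: `expected_device` is a hypothetical helper, and it reports what PyTorch can see rather than inspecting the adapter itself.

```python
import importlib.util


def expected_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so nothing can run on a GPU
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Expected device: {expected_device()}")
```

Once a model is loaded, `next(model.parameters()).device` is the usual PyTorch way to confirm where its weights actually live.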
| ## ⚙️ Available Utilities | |
| The adapter translates between Hugging Face's model format and the input/output structure AbLang2 expects, so AbLang2's antibody analysis utilities can be used with a model loaded from Hugging Face: | |
| - **seqcoding**: Sequence-level representations (averaged across residues) | |
| - **rescoding**: Residue-level representations (per-residue embeddings) | |
| - **likelihood**: Raw logits for amino acid prediction at each position | |
| - **probability**: Normalized probabilities for amino acid prediction | |
| - **pseudo_log_likelihood**: Uncertainty scoring with stepwise masking (masks each residue) | |
| - **confidence**: Fast uncertainty scoring (single forward pass, no masking) | |
| - **restore**: Restore masked residues (*) with predicted amino acids | |
| All these utilities work seamlessly with the HuggingFace-loaded model, maintaining the same API as the original AbLang2 implementation. | |
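The relationships between several of these outputs can be illustrated with toy numbers. The sketch below does not call the model, and all function names in it are hypothetical. It assumes that a sequence-level vector is a plain mean over residue vectors, that probabilities are a softmax over logits, and that a likelihood-style score is the mean log-probability of the observed residues. The actual implementation may differ, for example in how special tokens or the two chains are handled.

```python
import math


def mean_pool(residue_embeddings):
    """Average per-residue vectors into one sequence-level vector."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[d] for vec in residue_embeddings) / n for d in range(dim)]


def softmax(logits):
    """Turn raw logits for one position into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_log_prob(true_residue_probs):
    """Average log-probability assigned to the observed residues."""
    return sum(math.log(p) for p in true_residue_probs) / len(true_residue_probs)


# Toy data: 3 residues with 2-dimensional embeddings (real embeddings are wider).
rescoding = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
seqcoding = mean_pool(rescoding)  # [3.0, 4.0]

# Toy logits over a 3-letter alphabet for a single position.
probability = softmax([2.0, 1.0, 0.0])

# Toy probabilities the model assigned to the true residue at each position.
score = mean_log_prob([0.9, 0.5, 0.8])

print(seqcoding, probability, score)
```

Under these assumptions, a score closer to zero means the observed residues were considered more likely.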
| The `AbLang2PairedHuggingFaceAdapter` class is a wrapper that lets you use AbLang2 model utilities after loading the model from HuggingFace. This class enables you to: | |
| - **Access all AbLang2 utilities** (seqcoding, rescoding, likelihood, probability, etc.) with the same interface as the original implementation | |
| - **Work with antibody sequences** (heavy and light chains) seamlessly | |
| - **Maintain compatibility** with the original AbLang2 API while leveraging HuggingFace's model loading and caching capabilities | |
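Because the adapter is called as `ablang(seqs, mode=...)`, one way to explore it is to loop over the utilities and collect each result. The sketch below assumes that the mode strings match the utility names listed above (only `'restore'` appears in this card's own example). `run_modes` and `fake_adapter` are hypothetical names, and the stand-in adapter is there so that the snippet runs without downloading the model.

```python
MODES = ["seqcoding", "rescoding", "likelihood", "probability",
         "pseudo_log_likelihood", "confidence", "restore"]


def run_modes(adapter, seqs, modes=MODES):
    """Call `adapter(seqs, mode=...)` for each mode and collect the results."""
    results = {}
    for mode in modes:
        try:
            results[mode] = adapter(seqs, mode=mode)
        except Exception as err:  # keep going so a single failing mode is visible
            results[mode] = err
    return results


# Stand-in for a real adapter so the sketch runs without the model.
def fake_adapter(seqs, mode):
    return f"{mode} output for {len(seqs)} pair(s)"


pairs = [["EVQLVESGG", "DIQLTQSP"]]  # [heavy, light] fragments, illustrative only
for mode, output in run_modes(fake_adapter, pairs).items():
    print(f"{mode}: {output}")
```

With a real adapter, pass `ablang` in place of `fake_adapter`. Inspecting the type and shape of each result is a quick way to learn what every utility returns.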
| ## 💡 Examples | |
| ### AbLang2 (Paired Sequences) - Restore Example | |
| ```python | |
| import sys | |
| import os | |
| from transformers import AutoModel, AutoTokenizer | |
| from huggingface_hub import hf_hub_download | |
| # 1. Load model and tokenizer from Hugging Face Hub | |
| model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True) | |
| tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True) | |
| # 2. Download adapter and add to path | |
| adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py") | |
| cached_model_dir = os.path.dirname(adapter_path) | |
| sys.path.insert(0, cached_model_dir) | |
| from adapter import AbLang2PairedHuggingFaceAdapter | |
| # 3. Create adapter | |
| ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer) | |
| # 4. Restore masked sequences | |
| masked_seqs = [ | |
| ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS', | |
| 'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK'] | |
| ] | |
| restored = ablang(masked_seqs, mode='restore') | |
| print(f"Restored sequences: {restored}") | |
| ``` | |
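When preparing inputs for `restore`, or inspecting its output, small string helpers for the `*` mask can be handy. The sketch below is independent of the model. The helper names are hypothetical, and the filled-in residues (`"LVQ"`) are made up for illustration rather than being model predictions.

```python
def masked_positions(seq, mask="*"):
    """Return the 0-based indices of masked residues in a sequence."""
    return [i for i, ch in enumerate(seq) if ch == mask]


def mask_residues(seq, positions, mask="*"):
    """Return a copy of `seq` with the given 0-based positions masked."""
    chars = list(seq)
    for i in positions:
        chars[i] = mask
    return "".join(chars)


def fill_masks(masked_seq, predictions, mask="*"):
    """Fill each mask, left to right, with the next predicted residue."""
    filler = iter(predictions)
    return "".join(next(filler) if ch == mask else ch for ch in masked_seq)


heavy_start = "EVQ***SGGEVKK"  # first residues of the heavy chain used above
positions = masked_positions(heavy_start)
filled = fill_masks(heavy_start, "LVQ")  # "LVQ" is a made-up prediction

print(positions)  # [3, 4, 5]
print(filled)     # EVQLVQSGGEVKK
```

Comparing `masked_positions` on the input with the restored string is a simple way to check that only masked residues changed.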
| ## Detailed Usage | |
| For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see: | |
| - **[`test_ablang2_HF_implementation.ipynb`](test_ablang2_HF_implementation.ipynb)** - Complete notebook with all utilities and advanced usage patterns | |
| ## Citation | |
| If you use this model in your research, please cite the original AbLang2 paper: | |
| **AbLang2:** | |
| ``` | |
| @article{Olsen2024, | |
| title={Addressing the antibody germline bias and its effect on language models for improved antibody design}, | |
| author={Tobias H. Olsen and Iain H. Moal and Charlotte M. Deane}, | |
| journal={bioRxiv}, | |
| doi={10.1101/2024.02.02.578678}, | |
| year={2024} | |
| } | |
| ``` |