Antibody ESM2 Paired Model
Model Description
This model is a fine-tuned version of ESM2-3B for paired antibody sequences (heavy and light chains).
Key Features:
- Trained on paired antibody sequences
- 15% WC followed by 50% CDR fine-tuning
- Input format: Heavy-Light chains separated by "-"
- Output: 2560-dimensional embeddings
- Optimized for antibody CDR region understanding
Preprocessing
Sequences were:
- Combined as: HEAVY-LIGHT (with "-" separator)
- Tokenized with ESM2 tokenizer
- CDR regions annotated for masking
Usage
Loading the Model
from transformers import EsmModel, AutoTokenizer
import torch
# Load model and tokenizer
model = EsmModel.from_pretrained("MahTala/AbCDR-ESM2")
tokenizer = AutoTokenizer.from_pretrained("MahTala/AbCDR-ESM2")
model.eval()
Extract Embeddings
# Prepare paired sequence
SEP_TOKEN = "-"
heavy_chain = (
"EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVAVISYDGSNKYYADSVKGRF"
"TISADTSKNTAYLQMNSLRAEDTAVYYCAREGYYGSSYWYFDYWGQGTLVTVSS"
)
light_chain = (
"DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGS"
"GTDFTLTISSLQPEDFATYYCQQSYSTPLTFGGGTKVEIK"
)
paired_sequence = f"{heavy_chain}{SEP_TOKEN}{light_chain}"
# Tokenize
inputs = tokenizer(paired_sequence, return_tensors="pt", add_special_tokens=True)
# Extract embeddings
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state
# Mean pooling
mask = inputs["attention_mask"].unsqueeze(-1)
pooled = (embeddings * mask).sum(1) / mask.sum(1)
print(f"Embedding shape: {pooled.shape}") # (1, 2560)
Input Format
Required Format: HEAVY_CHAIN-LIGHT_CHAIN
- Heavy and light chains must be separated by a hyphen (-)
- Use standard single-letter amino acid codes
- No spaces in the sequence
- Uncommon residues should be replaced with X
Example:
sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS...-DIQMTQSPSSLSASVGDRVTITCRASQSISS..."
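The formatting rules above can be applied programmatically before tokenization. The following is a minimal sketch, not part of the released code: the helper names `clean_chain` and `format_paired_sequence` are hypothetical, and it assumes that "uncommon residues" means any character outside the 20 standard amino acids and that lowercase input should be uppercased.

```python
import re

# Assumption: "uncommon" means anything outside the 20 standard residues.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
SEP_TOKEN = "-"


def clean_chain(chain: str) -> str:
    """Remove whitespace, uppercase, and map non-standard residues to X."""
    chain = re.sub(r"\s+", "", chain).upper()
    return "".join(aa if aa in STANDARD_AA else "X" for aa in chain)


def format_paired_sequence(heavy: str, light: str) -> str:
    """Build the HEAVY-LIGHT string expected by the model (hypothetical helper)."""
    heavy, light = clean_chain(heavy), clean_chain(light)
    if not heavy or not light:
        raise ValueError("Both heavy and light chains are required.")
    return f"{heavy}{SEP_TOKEN}{light}"
```

Because each chain is cleaned before joining, a stray hyphen inside a chain becomes X, so the final string contains exactly one separator.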
Output
- Embedding dimension: 2560
- Sequence length: Variable (up to ~1024 tokens including special tokens)
- Format: PyTorch tensor
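To pull per-chain residue embeddings out of `last_hidden_state`, it helps to know where each chain sits in the token sequence. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the tokenizer emits one token per residue, one token for the "-" separator, and a start and end special token around the sequence. The function name `token_layout` and the hard limit `MAX_TOKENS = 1024` are hypothetical, the latter taken from the approximate figure above.

```python
MAX_TOKENS = 1024  # assumption: the approximate limit quoted above


def token_layout(heavy_len: int, light_len: int) -> dict:
    """Return token-index positions for a tokenized HEAVY-LIGHT input.

    Assumed layout: [start] heavy residues [sep] light residues [end].
    """
    heavy = slice(1, 1 + heavy_len)           # skip the start token at index 0
    sep = 1 + heavy_len                       # index of the "-" token
    light = slice(sep + 1, sep + 1 + light_len)
    total = heavy_len + light_len + 3         # residues + start, sep, end
    return {
        "heavy": heavy,
        "sep": sep,
        "light": light,
        "total_tokens": total,
        "fits": total <= MAX_TOKENS,
    }
```

With the variables from the usage example, `layout = token_layout(len(heavy_chain), len(light_chain))` followed by `embeddings[0, layout["heavy"]]` would select the heavy-chain residue embeddings, provided the layout assumption holds for the tokenizer in use.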
Model Card Authors
Mahtab Talaei
Contact
- Maintainer: Network Optimization & Control (NOC) Lab
- Email: mtalaei@bu.edu
- GitHub: https://github.com/Mah-Tala/AbCDR-ESM
- Paper: bioRxiv preprint
License
This model is released under the MIT License.
Acknowledgments
- Base model: ESM2 by Meta AI
- Data: OAS database
Note: For private repositories, you'll need to authenticate:
# Option 1: CLI login
huggingface-cli login
# Option 2: Environment variable
export HF_TOKEN="your_token_here"
Model tree for Paschalidis-NOC-Lab/AbCDR-ESM2
Base model
facebook/esm2_t36_3B_UR50D