
Antibody ESM2 Paired Model

Model Description

This model is a fine-tuned version of ESM2-3B for paired antibody sequences (heavy and light chains).

Key Features:

  • Trained on paired antibody sequences
  • Fine-tuned with 15% WC masking followed by 50% CDR masking
  • Input format: Heavy-Light chains separated by "-"
  • Output: 2560-dimensional embeddings
  • Optimized for antibody CDR region understanding

Preprocessing

Sequences were:

  1. Combined as: HEAVY-LIGHT (with "-" separator)
  2. Tokenized with ESM2 tokenizer
  3. CDR regions annotated for masking
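Step 3 can be sketched as follows, assuming CDR positions are already known (e.g. from a numbering/annotation tool); the spans below are hypothetical and for illustration only:

```python
import torch

def build_cdr_mask(seq_len: int, cdr_spans: list[tuple[int, int]]) -> torch.Tensor:
    # Boolean mask over residue positions; True = position eligible for CDR masking.
    mask = torch.zeros(seq_len, dtype=torch.bool)
    for start, end in cdr_spans:  # inclusive start, exclusive end
        mask[start:end] = True
    return mask

# Hypothetical CDR-H1/H2/H3 spans on a 120-residue heavy chain
cdr_mask = build_cdr_mask(120, [(26, 33), (51, 58), (97, 110)])
print(cdr_mask.sum().item())  # → 27 maskable positions
```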

Usage

Loading the Model

from transformers import EsmModel, AutoTokenizer
import torch

# Load model and tokenizer
model = EsmModel.from_pretrained("MahTala/AbCDR-ESM2")
tokenizer = AutoTokenizer.from_pretrained("MahTala/AbCDR-ESM2")
model.eval()

Extract Embeddings

# Prepare paired sequence
SEP_TOKEN = "-" 
heavy_chain = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVAVISYDGSNKYYADSVKGRF"
    "TISADTSKNTAYLQMNSLRAEDTAVYYCAREGYYGSSYWYFDYWGQGTLVTVSS"
)
light_chain = (
    "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGS"
    "GTDFTLTISSLQPEDFATYYCQQSYSTPLTFGGGTKVEIK"
)
paired_sequence = f"{heavy_chain}{SEP_TOKEN}{light_chain}"

# Tokenize
inputs = tokenizer(paired_sequence, return_tensors="pt", add_special_tokens=True)

# Extract embeddings
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state
    
# Mean pooling
mask = inputs["attention_mask"].unsqueeze(-1)
pooled = (embeddings * mask).sum(1) / mask.sum(1)

print(f"Embedding shape: {pooled.shape}")  # (1, 2560)
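The mean-pooling step above can be factored into a reusable, padding-aware helper. A sketch with dummy tensors standing in for the model output (the helper name is illustrative, not part of the model's API):

```python
import torch

def mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    # Padded positions are zeroed out before averaging.
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

hidden = torch.randn(2, 10, 2560)           # stand-in for last_hidden_state
attn = torch.ones(2, 10, dtype=torch.long)  # stand-in for attention_mask
attn[1, 7:] = 0                             # second sequence is padded
print(mean_pool(hidden, attn).shape)        # torch.Size([2, 2560])
```

The `clamp` guards against division by zero for all-padding rows when batching.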

Input Format

Required Format: HEAVY_CHAIN-LIGHT_CHAIN

  • Heavy and light chains must be separated by a hyphen (-)
  • Use standard single-letter amino acid codes
  • No spaces in sequence
  • Uncommon residues should be replaced with X

Example:

sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS...-DIQMTQSPSSLSASVGDRVTITCRASQSISS..."
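A small helper enforcing the format rules above (whitespace stripping and replacing uncommon residues with X); the function name is illustrative, not part of the model's API:

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWYX")

def prepare_paired(heavy: str, light: str) -> str:
    # Strip whitespace, uppercase, and replace uncommon residues with X.
    def clean(seq: str) -> str:
        seq = "".join(seq.split()).upper()
        return "".join(c if c in VALID_AA else "X" for c in seq)
    return f"{clean(heavy)}-{clean(light)}"

print(prepare_paired("evqlv esg", "diqmtqZ"))  # → EVQLVESG-DIQMTQX
```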

Output

  • Embedding dimension: 2560
  • Sequence length: Variable (up to ~1024 tokens including special tokens)
  • Format: PyTorch tensor
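Pairs longer than the context window must be truncated or rejected before tokenization. A rough length check, assuming the ~1024-token limit and that the ESM2 tokenizer maps each residue (and the "-" separator) to roughly one token plus two special tokens:

```python
MAX_TOKENS = 1024
SPECIAL_TOKENS = 2  # <cls> and <eos> added by the ESM2 tokenizer

def fits_context(paired_sequence: str) -> bool:
    # One token per character is an approximation; verify with the
    # actual tokenizer for borderline cases.
    return len(paired_sequence) + SPECIAL_TOKENS <= MAX_TOKENS

print(fits_context("A" * 500 + "-" + "A" * 400))  # → True  (903 tokens)
print(fits_context("A" * 600 + "-" + "A" * 500))  # → False (1103 tokens)
```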

Model Card Authors

Mahtab Talaei

Contact

License

This model is released under the MIT License.

Acknowledgments

  • Base model: ESM2 by Meta AI
  • Data: OAS database

Note: For private repositories, you'll need to authenticate:

# Option 1: CLI login
huggingface-cli login

# Option 2: Environment variable
export HF_TOKEN="your_token_here"