# ESM-2-Powered MHC-II Ligand Elution and Binding Affinity Predictor

Fine-tuned ESM-2 650M model for peptide-MHC Class II binding prediction.
## Model Overview

This model predicts peptide binding to MHC Class II molecules using:

- Base Model: facebook/esm2_t33_650M_UR50D (1280 hidden dim)
- Fine-tuning: LoRA (r=16, α=32) on the query/key/value/dense projections (see the config sketch after this list)
- Dual Heads:
  - BA Head: binding affinity regression (0-1 scale)
  - EL Head: presentation likelihood classification (binary)
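The LoRA setup above corresponds roughly to the following PEFT configuration (a minimal sketch; the dropout and bias settings are illustrative assumptions, not confirmed training values):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")

# LoRA adapters on the attention and dense projections, as described above.
# lora_dropout and bias are illustrative defaults, not confirmed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query", "key", "value", "dense"],
    lora_dropout=0.1,
    bias="none",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```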
## Training Data

- EL (Eluted Ligands): ~1.5M peptide-MHC pairs
- BA (Binding Affinity): ~30k peptide-MHC pairs with measured affinities
- Alleles: ~200 human HLA class II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)
## Architecture

### Input Format

```
peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)
```

Example:

```
AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```
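A small helper showing how such an input string can be assembled (a sketch: `PSEUDOSEQUENCES` is a hypothetical allele-to-pseudosequence lookup you would supply, e.g. from the NetMHCIIpan pseudosequence files; pairing the example pseudosequence with the allele used in the Usage section below is an assumption):

```python
# Hypothetical lookup from allele name to its 34-residue pseudosequence.
PSEUDOSEQUENCES = {
    "HLA-DRB1*01:01": "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT",
}

def build_input(peptide: str, allele: str, context: str) -> str:
    """Assemble the 'peptide [SEP] pseudosequence [SEP] context' string."""
    pseudo = PSEUDOSEQUENCES[allele]
    return f"{peptide} [SEP] {pseudo} [SEP] {context}"

print(build_input("AAAAAMAEQESARN", "HLA-DRB1*01:01", "MAAAAAARNGGR"))
# -> AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```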
### MHC-II Specific Features

- Peptide Length: 13-21 amino acids (vs. 8-11 for MHC-I)
- Allele Format: single genes (`DRB1*01:01`) or heterodimers (`DPA1*01:03-DPB1*02:01`); see the parsing sketch after this list
- Context Sequences: flanking regions from the source proteins
- Pseudosequences: 34-residue representation of the binding groove
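Heterodimer names concatenate the alpha and beta chains with a hyphen. A small illustrative parser (a hypothetical helper, not part of this repo):

```python
def split_allele(allele: str):
    """Split an HLA class II allele name into its chain components.

    'HLA-DRB1*01:01'        -> ('DRB1*01:01', None)  # DRA is invariant, so beta alone suffices
    'DPA1*01:03-DPB1*02:01' -> ('DPA1*01:03', 'DPB1*02:01')
    """
    name = allele.removeprefix("HLA-")
    if "-" in name:
        alpha, beta = name.split("-", 1)
        return alpha, beta
    return name, None
```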
### Prediction Heads

#### BA Head (Regression)

- Input: mean-pooled sequence embeddings
- Output: binding affinity in [0, 1]
- Loss: RegularizedMSELoss (with σ/μ constraints)

#### EL Head (Classification)

- Input: attention-pooled full sequence embeddings
- Output: presentation likelihood in [0, 1]
- Loss: BalancedFocalBCELoss
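The head weights are not covered by this card's snippets; the sketch below shows one plausible implementation of the two pooling strategies and output layers described above (layer sizes and activations are assumptions, and the custom losses' internals are not reproduced here):

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 1280  # ESM-2 650M hidden size

class BAHead(nn.Module):
    """Regression head: masked mean pooling, then affinity in [0, 1]."""
    def __init__(self, hidden_dim: int = HIDDEN_DIM):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, 256), nn.GELU(), nn.Linear(256, 1))

    def forward(self, embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Mean over real (non-padding) tokens only.
        mask = mask.unsqueeze(-1).float()
        pooled = (embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return torch.sigmoid(self.mlp(pooled)).squeeze(-1)

class ELHead(nn.Module):
    """Classification head: learned attention pooling, then presentation likelihood."""
    def __init__(self, hidden_dim: int = HIDDEN_DIM):
        super().__init__()
        self.attn = nn.Linear(hidden_dim, 1)  # per-token attention scores
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        scores = self.attn(embeddings).squeeze(-1)           # (batch, seq)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (embeddings * weights).sum(dim=1)           # (batch, hidden)
        return torch.sigmoid(self.out(pooled)).squeeze(-1)
```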
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Load the base ESM-2 encoder and apply the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
model.eval()

# Prepare input (the allele name stands in for its 34-residue pseudosequence here;
# see the Input Format section above)
peptide = "AAAAAMAEQESARN"
allele = "HLA-DRB1*01:01"
context = "MAAAAAARNGGR"
sequence = f"{peptide} [SEP] {allele} [SEP] {context}"

# Tokenize
inputs = tokenizer(sequence, return_tensors="pt")

# Get per-token embeddings from the fine-tuned encoder
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state

# Load prediction heads (BA and EL)
# [Load your trained heads here]
```
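Once trained head weights are available, scoring could look roughly like this (a sketch using the hypothetical `BAHead`/`ELHead` modules from the Prediction Heads section; the weight file names are placeholders):

```python
# Hypothetical: instantiate the heads sketched above and load trained weights.
ba_head = BAHead()
el_head = ELHead()
ba_head.load_state_dict(torch.load("ba_head.pt"))  # placeholder file name
el_head.load_state_dict(torch.load("el_head.pt"))  # placeholder file name
ba_head.eval(); el_head.eval()

with torch.no_grad():
    ba_score = ba_head(embeddings, inputs["attention_mask"])  # binding affinity in [0, 1]
    el_score = el_head(embeddings, inputs["attention_mask"])  # presentation likelihood in [0, 1]
print(f"BA: {ba_score.item():.3f}  EL: {el_score.item():.3f}")
```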
## Performance

Metrics will be reported on an independent test set:

- BA Head: Pearson r, Spearman ρ, RMSE, MAE
- EL Head: ROC-AUC, PR-AUC, F1, MCC, sensitivity, specificity

(Performance metrics will be updated upon training completion.)
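For reference, the metrics listed above can be computed with scipy and scikit-learn along these lines (a sketch; the 0.5 decision threshold is an assumption):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import (
    roc_auc_score, average_precision_score, f1_score, matthews_corrcoef,
)

def ba_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Regression metrics for the BA head."""
    return {
        "pearson_r": pearsonr(y_true, y_pred)[0],
        "spearman_rho": spearmanr(y_true, y_pred)[0],
        "rmse": float(np.sqrt(np.mean((y_true - y_pred) ** 2))),
        "mae": float(np.mean(np.abs(y_true - y_pred))),
    }

def el_metrics(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> dict:
    """Classification metrics for the EL head."""
    y_hat = (y_prob >= threshold).astype(int)
    tp = int(((y_hat == 1) & (y_true == 1)).sum())
    tn = int(((y_hat == 0) & (y_true == 0)).sum())
    fp = int(((y_hat == 1) & (y_true == 0)).sum())
    fn = int(((y_hat == 0) & (y_true == 1)).sum())
    return {
        "roc_auc": roc_auc_score(y_true, y_prob),
        "pr_auc": average_precision_score(y_true, y_prob),
        "f1": f1_score(y_true, y_hat),
        "mcc": matthews_corrcoef(y_true, y_hat),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```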
## Related Models

- MHC-I Model: O047/esm2_MHC-I_Reforged_Single

## Citation

If you use this model in your research, please cite:

(placeholder)

## Acknowledgments

- ESM-2 Base Model: Meta AI Research
- Training Framework: HuggingFace Transformers + PEFT
- Data Sources: NetMHCIIpan datasets

Model Status: In Training