# ESM-2-Powered MHC-II Ligand Elution and Binding Affinity Predictor

Fine-tuned ESM-2 650M model for peptide-MHC Class II binding prediction.
## Model Overview

This model predicts peptide binding to MHC Class II molecules using:

- Base Model: facebook/esm2_t33_650M_UR50D (1280 hidden dim)
- Fine-tuning: LoRA (r=16, α=32) on the query/key/value/dense projections (see the config sketch after this list)
- Dual Heads:
  - BA Head: binding affinity regression (0-1 scale)
  - EL Head: presentation likelihood classification (binary)
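The LoRA setup above corresponds roughly to the following PEFT configuration (a minimal sketch; the dropout and bias settings are illustrative assumptions, not confirmed training values):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")

# LoRA adapters on the attention and dense projections, as described above.
# lora_dropout and bias are illustrative defaults, not confirmed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query", "key", "value", "dense"],
    lora_dropout=0.1,
    bias="none",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```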
## Training Data

- EL (Eluted Ligands): ~1.5M peptide-MHC pairs
- BA (Binding Affinity): ~30k peptide-MHC pairs with measured affinities
- Alleles: ~200 human HLA class II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)
## Architecture

### Input Format

```
peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)
```

Example:

```
AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```
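A small helper showing how such an input string can be assembled (a sketch: `PSEUDOSEQUENCES` is a hypothetical allele-to-pseudosequence lookup you would supply, e.g. from the NetMHCIIpan pseudosequence files; pairing the example pseudosequence with the allele used in the Usage section below is an assumption):

```python
# Hypothetical lookup from allele name to its 34-residue pseudosequence.
PSEUDOSEQUENCES = {
    "HLA-DRB1*01:01": "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT",
}

def build_input(peptide: str, allele: str, context: str) -> str:
    """Assemble the 'peptide [SEP] pseudosequence [SEP] context' string."""
    pseudo = PSEUDOSEQUENCES[allele]
    return f"{peptide} [SEP] {pseudo} [SEP] {context}"

print(build_input("AAAAAMAEQESARN", "HLA-DRB1*01:01", "MAAAAAARNGGR"))
# -> AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```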
### MHC-II Specific Features

- Peptide Length: 13-21 amino acids (vs. 8-11 for MHC-I)
- Allele Format: single genes (`DRB1*01:01`) or heterodimers (`DPA1*01:03-DPB1*02:01`); see the parsing sketch after this list
- Context Sequences: flanking regions from the source proteins
- Pseudosequences: 34-residue representation of the binding groove
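Heterodimer names concatenate the alpha and beta chains with a hyphen. A small illustrative parser (a hypothetical helper, not part of this repo):

```python
def split_allele(allele: str):
    """Split an HLA class II allele name into its chain components.

    'HLA-DRB1*01:01'        -> ('DRB1*01:01', None)  # DRA is invariant, so beta alone suffices
    'DPA1*01:03-DPB1*02:01' -> ('DPA1*01:03', 'DPB1*02:01')
    """
    name = allele.removeprefix("HLA-")
    if "-" in name:
        alpha, beta = name.split("-", 1)
        return alpha, beta
    return name, None
```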
### Prediction Heads

#### BA Head (Regression)

- Input: mean-pooled sequence embeddings
- Output: binding affinity in [0, 1]
- Loss: RegularizedMSELoss (with σ/μ constraints)

#### EL Head (Classification)

- Input: attention-pooled full sequence embeddings
- Output: presentation likelihood in [0, 1]
- Loss: BalancedFocalBCELoss
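The head weights are not covered by this card's snippets; the sketch below shows one plausible implementation of the two pooling strategies and output layers described above (layer sizes and activations are assumptions, and the custom losses' internals are not reproduced here):

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 1280  # ESM-2 650M hidden size

class BAHead(nn.Module):
    """Regression head: masked mean pooling, then affinity in [0, 1]."""
    def __init__(self, hidden_dim: int = HIDDEN_DIM):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, 256), nn.GELU(), nn.Linear(256, 1))

    def forward(self, embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Mean over real (non-padding) tokens only.
        mask = mask.unsqueeze(-1).float()
        pooled = (embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return torch.sigmoid(self.mlp(pooled)).squeeze(-1)

class ELHead(nn.Module):
    """Classification head: learned attention pooling, then presentation likelihood."""
    def __init__(self, hidden_dim: int = HIDDEN_DIM):
        super().__init__()
        self.attn = nn.Linear(hidden_dim, 1)  # per-token attention scores
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        scores = self.attn(embeddings).squeeze(-1)           # (batch, seq)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (embeddings * weights).sum(dim=1)           # (batch, hidden)
        return torch.sigmoid(self.out(pooled)).squeeze(-1)
```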
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Load the base ESM-2 encoder and apply the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
model.eval()

# Prepare input (the allele name stands in for its 34-residue pseudosequence here;
# see the Input Format section above)
peptide = "AAAAAMAEQESARN"
allele = "HLA-DRB1*01:01"
context = "MAAAAAARNGGR"
sequence = f"{peptide} [SEP] {allele} [SEP] {context}"

# Tokenize
inputs = tokenizer(sequence, return_tensors="pt")

# Get per-token embeddings from the fine-tuned encoder
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state

# Load prediction heads (BA and EL)
# [Load your trained heads here]
```
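Once trained head weights are available, scoring could look roughly like this (a sketch using the hypothetical `BAHead`/`ELHead` modules from the Prediction Heads section; the weight file names are placeholders):

```python
# Hypothetical: instantiate the heads sketched above and load trained weights.
ba_head = BAHead()
el_head = ELHead()
ba_head.load_state_dict(torch.load("ba_head.pt"))  # placeholder file name
el_head.load_state_dict(torch.load("el_head.pt"))  # placeholder file name
ba_head.eval(); el_head.eval()

with torch.no_grad():
    ba_score = ba_head(embeddings, inputs["attention_mask"])  # binding affinity in [0, 1]
    el_score = el_head(embeddings, inputs["attention_mask"])  # presentation likelihood in [0, 1]
print(f"BA: {ba_score.item():.3f}  EL: {el_score.item():.3f}")
```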
## Performance

Metrics will be reported on an independent test set:

- BA Head: Pearson r, Spearman ρ, RMSE, MAE
- EL Head: ROC-AUC, PR-AUC, F1, MCC, sensitivity, specificity

(Performance metrics will be updated upon training completion.)
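For reference, the metrics listed above can be computed with scipy and scikit-learn along these lines (a sketch; the 0.5 decision threshold is an assumption):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import (
    roc_auc_score, average_precision_score, f1_score, matthews_corrcoef,
)

def ba_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Regression metrics for the BA head."""
    return {
        "pearson_r": pearsonr(y_true, y_pred)[0],
        "spearman_rho": spearmanr(y_true, y_pred)[0],
        "rmse": float(np.sqrt(np.mean((y_true - y_pred) ** 2))),
        "mae": float(np.mean(np.abs(y_true - y_pred))),
    }

def el_metrics(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> dict:
    """Classification metrics for the EL head."""
    y_hat = (y_prob >= threshold).astype(int)
    tp = int(((y_hat == 1) & (y_true == 1)).sum())
    tn = int(((y_hat == 0) & (y_true == 0)).sum())
    fp = int(((y_hat == 1) & (y_true == 0)).sum())
    fn = int(((y_hat == 0) & (y_true == 1)).sum())
    return {
        "roc_auc": roc_auc_score(y_true, y_prob),
        "pr_auc": average_precision_score(y_true, y_prob),
        "f1": f1_score(y_true, y_hat),
        "mcc": matthews_corrcoef(y_true, y_hat),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```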
## Related Models

- MHC-I Model: O047/esm2_MHC-I_Reforged_Single

## Citation

If you use this model in your research, please cite:

(placeholder)

## Acknowledgments

- ESM-2 Base Model: Meta AI Research
- Training Framework: HuggingFace Transformers + PEFT
- Data Sources: NetMHCIIpan datasets

Model Status: In Training