ESM-2-powered MHC-II Ligand Elution and Binding Affinity predictor.

Fine-tuned ESM-2 650M model for peptide-MHC Class II binding prediction

Model Overview

This model predicts peptide binding to MHC Class II molecules using:

  • Base Model: facebook/esm2_t33_650M_UR50D (1280 hidden dim)
  • Fine-tuning: LoRA (r=16, α=32) on query/key/value/dense (see the sketch after this list)
  • Dual Heads:
    • BA Head: Binding affinity regression (0-1 scale)
    • EL Head: Presentation likelihood classification (binary)
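
For reference, the LoRA setup above can be expressed with HuggingFace PEFT roughly as follows. This is a sketch: the dropout value is an assumption, and the target module names follow the HF ESM-2 implementation (query/key/value attention projections plus dense layers).

from transformers import AutoModel
from peft import LoraConfig, get_peft_model

base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
lora_config = LoraConfig(
    r=16,                                               # rank, as listed above
    lora_alpha=32,                                      # α, as listed above
    target_modules=["query", "key", "value", "dense"],  # ESM-2 projection layers
    lora_dropout=0.05,                                  # assumption, not documented here
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()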

Training Data

  • EL (Eluted Ligands): ~1.5M peptide-MHC pairs
  • BA (Binding Affinity): ~30k peptide-MHC pairs with measured affinities
  • Alleles: ~200 human HLA-II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)

Architecture

Input Format

peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)

Example:

AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR

MHC-II Specific Features

  • Peptide Length: 13-21 amino acids (vs 8-11 for MHC-I)
  • Allele Format: Single genes (DRB1*01:01) or heterodimers (DPA1*01:03-DPB1*02:01)
  • Context Sequences: Flanking regions from source proteins
  • Pseudosequences: 34-residue binding groove representation (both length constraints are checked in the sketch below)
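
A small illustrative sanity check for these constraints; the function name and error messages are made up for this example:

def validate_inputs(peptide: str, pseudosequence: str) -> None:
    # Length constraints from the feature list above
    if not 13 <= len(peptide) <= 21:
        raise ValueError(f"MHC-II peptides should be 13-21 aa, got {len(peptide)}")
    if len(pseudosequence) != 34:
        raise ValueError(f"Pseudosequences should be 34 residues, got {len(pseudosequence)}")

validate_inputs("AAAAAMAEQESARN", "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT")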

Prediction Heads

  1. BA Head (Regression)
    • Input: Mean-pooled sequence embeddings
    • Output: Binding affinity [0, 1]
    • Loss: RegularizedMSELoss (with σ/μ constraints)
  2. EL Head (Classification)
    • Input: Full sequence embeddings (attention-pooled)
    • Output: Presentation likelihood [0, 1]
    • Loss: BalancedFocalBCELoss
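
A minimal PyTorch sketch of what these two heads could look like. The layer sizes, the attention-pooling formulation, and the class names (BAHead, ELHead) are assumptions for illustration, not the released implementation; the custom losses are omitted.

import torch
import torch.nn as nn

class BAHead(nn.Module):
    """Hypothetical BA head: mean-pool token embeddings, regress to [0, 1]."""
    def __init__(self, hidden_dim=1280):
        super().__init__()
        self.regressor = nn.Sequential(nn.Linear(hidden_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, embeddings, attention_mask):
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)  # masked mean pooling
        return torch.sigmoid(self.regressor(pooled)).squeeze(-1)

class ELHead(nn.Module):
    """Hypothetical EL head: attention-pool the full sequence, output presentation likelihood."""
    def __init__(self, hidden_dim=1280):
        super().__init__()
        self.attention = nn.Linear(hidden_dim, 1)   # learned pooling weights
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, embeddings, attention_mask):
        scores = self.attention(embeddings).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))  # ignore padding
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (weights * embeddings).sum(1)
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)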

Usage

from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
import torch

# Load model
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
model.eval()  # disable dropout for inference

# Prepare input: peptide [SEP] allele pseudosequence [SEP] context
# Note: the model consumes the allele's 34-residue pseudosequence, not its name
peptide = "AAAAAMAEQESARN"
pseudosequence = "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT"  # example pseudosequence from the Architecture section
context = "MAAAAAARNGGR"
sequence = f"{peptide} [SEP] {pseudosequence} [SEP] {context}"

# Tokenize (assumes [SEP] was registered as a special token during fine-tuning)
inputs = tokenizer(sequence, return_tensors="pt")

# Get embeddings
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state

# Load prediction heads (BA and EL)
# [Load your trained heads here; a hypothetical sketch follows]
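
For illustration only, a continuation that applies the hypothetical BAHead/ELHead classes sketched under Prediction Heads; the weight file names are placeholders, not files shipped with this model:

ba_head = BAHead(hidden_dim=1280)
ba_head.load_state_dict(torch.load("ba_head.pt"))  # placeholder path
el_head = ELHead(hidden_dim=1280)
el_head.load_state_dict(torch.load("el_head.pt"))  # placeholder path
ba_head.eval()
el_head.eval()

with torch.no_grad():
    affinity = ba_head(embeddings, inputs["attention_mask"])      # [0, 1] binding affinity
    presentation = el_head(embeddings, inputs["attention_mask"])  # [0, 1] presentation likelihood

print(f"BA: {affinity.item():.3f}, EL: {presentation.item():.3f}")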

Performance

Metrics to be reported on an independent test set:

  • BA Head: Pearson R, Spearman R, RMSE, MAE
  • EL Head: ROC-AUC, PR-AUC, F1, MCC, Sensitivity, Specificity

(Performance metrics will be updated upon training completion)
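
Once predictions are available, the metrics above can be computed with scipy and scikit-learn along these lines; the arrays here are dummy placeholders and the 0.5 decision threshold is an assumption:

import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             f1_score, matthews_corrcoef, confusion_matrix)

# Dummy placeholder values for illustration only
y_true_ba = np.array([0.20, 0.80, 0.50, 0.90])
y_pred_ba = np.array([0.25, 0.70, 0.55, 0.85])
y_true_el = np.array([0, 1, 0, 1])
y_score_el = np.array([0.10, 0.90, 0.40, 0.80])

# BA head (regression)
pearson_r, _ = pearsonr(y_true_ba, y_pred_ba)
spearman_r, _ = spearmanr(y_true_ba, y_pred_ba)
rmse = float(np.sqrt(np.mean((y_true_ba - y_pred_ba) ** 2)))
mae = float(np.mean(np.abs(y_true_ba - y_pred_ba)))

# EL head (classification)
roc_auc = roc_auc_score(y_true_el, y_score_el)
pr_auc = average_precision_score(y_true_el, y_score_el)
y_hat = (y_score_el >= 0.5).astype(int)  # threshold is an assumption
f1 = f1_score(y_true_el, y_hat)
mcc = matthews_corrcoef(y_true_el, y_hat)
tn, fp, fn, tp = confusion_matrix(y_true_el, y_hat).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)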

Related Models

Citation

If you use this model in your research, please cite:

(placeholder)

Acknowledgments

  • ESM-2 Base Model: Meta AI Research
  • Training Framework: HuggingFace Transformers + PEFT
  • Data Sources: NetMHCIIpan datasets

Model Status: In Training
