---
language:
- en
tags:
- biology
- immunology
- MHC-II
- peptide-binding
- ESM-2
- LoRA
pipeline_tag: feature-extraction
---

# ESM-2-powered MHC-II Ligand Elution and Binding Affinity Predictor

**Fine-tuned ESM-2 650M model for peptide-MHC class II binding prediction**

## Model Overview

This model predicts **peptide binding to MHC class II molecules** using:
- **Base Model:** facebook/esm2_t33_650M_UR50D (1280 hidden dim)
- **Fine-tuning:** LoRA (r=16, α=32) applied to the `query`, `key`, `value`, and `dense` modules
- **Dual Heads:**
  - **BA Head:** Binding affinity regression (0-1 scale)
  - **EL Head:** Presentation likelihood classification (binary)
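
The LoRA settings listed above can be written down as a small sketch. Only `r`, `lora_alpha`, and the target module names come from this card; the dictionary name `lora_kwargs` is hypothetical, and other settings such as dropout are not stated here, so they are left out.

```python
# Hyperparameters stated in this card. The keys mirror argument names of
# peft.LoraConfig, so the dict could be unpacked as LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["query", "key", "value", "dense"],
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(lora_scaling)  # 2.0
```

With these values the adapter update is scaled by a factor of 2 before it is added to the frozen base weights.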

## Training Data

- **EL (Eluted Ligands):** ~1.5M peptide-MHC pairs
- **BA (Binding Affinity):** ~30k peptide-MHC pairs with measured affinities
- **Alleles:** ~200 human HLA-II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)
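
This card does not say how measured affinities are mapped onto the 0-1 scale used by the BA head. A common convention for NetMHCpan-style binding affinity data is `1 - log(IC50) / log(50000)` with IC50 in nM. The sketch below assumes that convention; the function names are hypothetical.

```python
import math

MAX_IC50_NM = 50_000.0  # assumed upper bound of the transform, in nM


def ic50_to_score(ic50_nm: float) -> float:
    """Map an IC50 in nM to [0, 1]; higher means stronger binding."""
    clipped = min(max(ic50_nm, 1.0), MAX_IC50_NM)
    return 1.0 - math.log(clipped) / math.log(MAX_IC50_NM)


def score_to_ic50(score: float) -> float:
    """Invert the transform to recover an IC50 in nM."""
    return MAX_IC50_NM ** (1.0 - score)


print(round(ic50_to_score(500.0), 3))  # 0.426
```

Under this assumption, a score near 0.426 corresponds to an IC50 of about 500 nM.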

## Architecture

### Input Format
```
peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)
```

Example:
```
AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```
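
A minimal sketch of assembling this input string from its three fields. The helper name `build_model_input` is hypothetical, and the literal ` [SEP] ` separator is simply copied from the format shown above.

```python
def build_model_input(peptide: str, pseudosequence: str, context: str) -> str:
    """Join the three fields with the literal [SEP] separator used in this card."""
    fields = [peptide, pseudosequence, context]
    return " [SEP] ".join(field.strip() for field in fields)


example = build_model_input(
    "AAAAAMAEQESARN",
    "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT",
    "MAAAAAARNGGR",
)
print(example)
```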

### MHC-II Specific Features
- **Peptide Length:** 13-21 amino acids (vs 8-11 for MHC-I)
- **Allele Format:** Single genes (DRB1*01:01) or heterodimers (DPA1*01:03-DPB1*02:01)
- **Context Sequences:** Flanking regions from source proteins
- **Pseudosequences:** 34-residue binding groove representation
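
The allele formats and peptide length range above can be checked before building an input. The sketch below is an illustration only: the helper names and the regular expression are assumptions, not part of the released code, and the pattern covers just the genes named in this card.

```python
import re

# One HLA-II gene with a two-field allele name, e.g. DRB1*01:01.
_GENE = r"(?:DRB[1345]|DPA1|DPB1|DQA1|DQB1)\*\d{2,3}:\d{2,3}"
_ALLELE_RE = re.compile(rf"^(?:HLA-)?({_GENE})(?:-({_GENE}))?$")
_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_allele(name: str) -> tuple[str, ...]:
    """Split an allele name into one chain or the two chains of a heterodimer."""
    match = _ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized HLA-II allele name: {name!r}")
    return tuple(chain for chain in match.groups() if chain)


def is_valid_peptide(peptide: str, min_len: int = 13, max_len: int = 21) -> bool:
    """Check the length range given above and the 20 standard amino acids."""
    return min_len <= len(peptide) <= max_len and set(peptide) <= _AMINO_ACIDS


print(parse_allele("DPA1*01:03-DPB1*02:01"))  # ('DPA1*01:03', 'DPB1*02:01')
print(is_valid_peptide("AAAAAMAEQESARN"))     # True
```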

### Prediction Heads
1. **BA Head (Regression)**
   - Input: Mean-pooled sequence embeddings
   - Output: Binding affinity [0, 1]
   - Loss: RegularizedMSELoss (with σ/μ constraints)

2. **EL Head (Classification)**
   - Input: Full sequence embeddings (attention-pooled)
   - Output: Presentation likelihood [0, 1]
   - Loss: BalancedFocalBCELoss
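
The two heads differ in how token embeddings are reduced to a single vector. The framework-free sketch below contrasts masked mean pooling with a simple attention pooling. The card does not specify the form of the attention pooling, so the single `query` vector used here is an assumption standing in for learned parameters, and a real implementation would use batched tensor operations.

```python
import math


def mean_pool(embeddings, mask):
    """Average token vectors where mask == 1 (padding is ignored)."""
    kept = [vec for vec, m in zip(embeddings, mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


def attention_pool(embeddings, mask, query):
    """Softmax-weighted sum of token vectors, scored against `query`."""
    scores = [
        sum(q * x for q, x in zip(query, vec)) if m else -math.inf
        for vec, m in zip(embeddings, mask)
    ]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # padding gets weight 0
    total = sum(weights)
    return [
        sum(w * vec[d] for w, vec in zip(weights, embeddings)) / total
        for d in range(len(query))
    ]


# Three tokens with 2-dimensional embeddings; the last token is padding.
tokens = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))                   # [2.0, 1.0]
print(attention_pool(tokens, mask, [0.0, 0.0]))  # [2.0, 1.0]
```

With an all-zero query every unmasked token receives the same weight, so attention pooling reduces to mean pooling; a trained query lets the EL head weight some positions more heavily.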

## Usage

```python
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
import torch

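# Load the base ESM-2 model and attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
model.eval()  # disable dropout for inference

# Prepare input. Per the Input Format section, the middle field is the
# allele's 34-residue pseudosequence, not the allele name.
peptide = "AAAAAMAEQESARN"
allele = "HLA-DRB1*01:01"
# Pseudosequence from the Input Format example, assumed to correspond to this allele
pseudosequence = "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT"
context = "MAAAAAARNGGR"
sequence = f"{peptide} [SEP] {pseudosequence} [SEP] {context}"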

# Tokenize
inputs = tokenizer(sequence, return_tensors="pt")

# Get embeddings
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state

# Load prediction heads (BA and EL)
# [Load your trained heads here]
```

## Performance

**Metrics to be reported on an independent test set:**
- **BA Head:** Pearson R, Spearman R, RMSE, MAE
- **EL Head:** ROC-AUC, PR-AUC, F1, MCC, Sensitivity, Specificity

*(Performance metrics will be updated upon training completion)*
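
For reference, the threshold-based EL metrics listed above can be computed from a confusion matrix as sketched below. The function name and the 0.5 threshold are assumptions, and the labels and scores are toy values, not model output.

```python
import math


def classification_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, F1 and MCC from scores binarized at `threshold`."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy labels and scores for illustration only.
toy_metrics = classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6])
print(toy_metrics)
```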

## Related Models

- **MHC-I Model:** [O047/esm2_MHC-I_Reforged_Single](https://huggingface.co/O047/esm2_MHC-I_Reforged_Single)

## Citation

If you use this model in your research, please cite:

(placeholder)


## Acknowledgments

- **ESM-2 Base Model:** Meta AI Research
- **Training Framework:** HuggingFace Transformers + PEFT
- **Data Sources:** NetMHCIIpan datasets

---

**Model Status:** In Training