---
language:
- en
tags:
- biology
- immunology
- MHC-II
- peptide-binding
- ESM-2
- LoRA
pipeline_tag: feature-extraction
---
# ESM-2-powered MHC-II Ligand Elution and Binding Affinity Predictor
**Fine-tuned ESM-2 650M model for MHC Class II peptide-MHC binding prediction**
## Model Overview
This model predicts **MHC Class II peptide-MHC binding** using:
- **Base Model:** facebook/esm2_t33_650M_UR50D (1280 hidden dim)
- **Fine-tuning:** LoRA (r=16, α=32) on query/key/value/dense
- **Dual Heads:**
  - **BA Head:** Binding affinity regression (0-1 scale)
  - **EL Head:** Presentation likelihood classification (binary)
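To give a feel for what the LoRA settings above imply, here is a small back-of-the-envelope sketch using standard LoRA arithmetic. It assumes the usual ESM-2 650M shapes (33 layers, hidden size 1280) and counts only the query/key/value projections, because this card does not say which modules the `dense` target matches. The helper name `lora_added_params` is hypothetical.

```python
# Standard LoRA arithmetic: a frozen weight W of shape (out, in) receives the
# update (alpha / r) * B @ A, where A is (r x in) and B is (out x r).
# Assumption: ESM-2 650M shapes (33 layers, hidden size 1280); only the
# query/key/value projections are counted here.

def lora_added_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer."""
    return r * in_features + out_features * r

R, ALPHA = 16, 32
HIDDEN, NUM_LAYERS = 1280, 33

scaling = ALPHA / R  # factor applied to the low-rank update
per_projection = lora_added_params(HIDDEN, HIDDEN, R)
qkv_total = 3 * per_projection * NUM_LAYERS

print(f"scaling factor: {scaling}")
print(f"added parameters per 1280x1280 projection: {per_projection:,}")
print(f"added parameters for query/key/value, all layers: {qkv_total:,}")
```

The adapters on the `dense` modules add further parameters on top of this figure, so the total trainable count of the published adapter will be higher.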
## Training Data
- **EL (Eluted Ligands):** ~1.5M peptide-MHC pairs
- **BA (Binding Affinity):** ~30k peptide-MHC pairs with measured affinities
- **Alleles:** ~200 human HLA-II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)
## Architecture
### Input Format
```
peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)
```
Example:
```
AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```
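The format above can be assembled with a small helper. This is a minimal sketch: `build_input` and `split_input` are hypothetical names, and upper-casing the segments is an assumption, not something this card specifies.

```python
SEP = " [SEP] "

def build_input(peptide: str, pseudosequence: str, context: str) -> str:
    """Join the three segments in the documented order."""
    parts = [part.strip().upper() for part in (peptide, pseudosequence, context)]
    if not all(parts):
        raise ValueError("peptide, pseudosequence and context must all be non-empty")
    return SEP.join(parts)

def split_input(sequence: str) -> tuple:
    """Inverse of build_input: recover (peptide, pseudosequence, context)."""
    parts = [part.strip() for part in sequence.split("[SEP]")]
    if len(parts) != 3:
        raise ValueError("expected exactly three [SEP]-separated segments")
    return tuple(parts)

example = build_input(
    "AAAAAMAEQESARN",
    "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT",
    "MAAAAAARNGGR",
)
print(example)
```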
### MHC-II Specific Features
- **Peptide Length:** 13-21 amino acids (vs 8-11 for MHC-I)
- **Allele Format:** Single genes (DRB1*01:01) or heterodimers (DPA1*01:03-DPB1*02:01)
- **Context Sequences:** Flanking regions from source proteins
- **Pseudosequences:** 34-residue binding groove representation
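The constraints listed above can be checked before building an input. The sketch below is illustrative only: the function names are hypothetical, and the regular expression is an assumption that covers the DR/DP/DQ gene names mentioned in this card with two-field allele names.

```python
import re

# Assumed pattern for one chain, e.g. "DRB1*01:01" or "DPA1*01:03".
CHAIN_RE = re.compile(r"^(D[PQR][AB]\d)\*(\d{2,3}):(\d{2,3})$")

def parse_allele(name: str) -> list:
    """Return one (gene, field1, field2) tuple per chain of the allele name."""
    name = name.strip()
    if name.startswith("HLA-"):
        name = name[len("HLA-"):]
    chains = []
    for chain in name.split("-"):
        match = CHAIN_RE.match(chain)
        if match is None:
            raise ValueError(f"unrecognised allele chain: {chain!r}")
        chains.append(match.groups())
    return chains

def check_lengths(peptide: str, pseudosequence: str) -> bool:
    """Peptide of 13-21 residues and a 34-residue pseudosequence."""
    return 13 <= len(peptide) <= 21 and len(pseudosequence) == 34

print(parse_allele("DRB1*01:01"))
print(parse_allele("DPA1*01:03-DPB1*02:01"))
print(check_lengths("AAAAAMAEQESARN", "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT"))
```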
### Prediction Heads
1. **BA Head (Regression)**
   - Input: Mean-pooled sequence embeddings
   - Output: Binding affinity [0, 1]
   - Loss: RegularizedMSELoss (with σ/μ constraints)
2. **EL Head (Classification)**
   - Input: Full sequence embeddings (attention-pooled)
   - Output: Presentation likelihood [0, 1]
   - Loss: BalancedFocalBCELoss
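The two pooling strategies described above could look roughly like the following PyTorch sketch. It is not the trained heads themselves: the class names, the single linear output layer, and the sigmoid used to keep outputs in [0, 1] are all assumptions, and the custom losses are not reproduced here.

```python
import torch
import torch.nn as nn

class MeanPoolRegressionHead(nn.Module):
    """BA-style head (hypothetical): masked mean pooling, then a sigmoid output."""
    def __init__(self, hidden_dim: int = 1280):
        super().__init__()
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        weights = mask.unsqueeze(-1).to(hidden.dtype)        # (B, L, 1)
        pooled = (hidden * weights).sum(1) / weights.sum(1).clamp(min=1.0)
        return torch.sigmoid(self.out(pooled)).squeeze(-1)   # (B,)

class AttentionPoolClassificationHead(nn.Module):
    """EL-style head (hypothetical): learned attention pooling, then a sigmoid."""
    def __init__(self, hidden_dim: int = 1280):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        scores = self.score(hidden).squeeze(-1)               # (B, L)
        scores = scores.masked_fill(mask == 0, float("-inf")) # ignore padding
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1) # (B, L, 1)
        pooled = (hidden * weights).sum(1)
        return torch.sigmoid(self.out(pooled)).squeeze(-1)    # (B,)

# Dummy batch standing in for the encoder's last_hidden_state.
torch.manual_seed(0)
hidden = torch.randn(2, 10, 1280)
mask = torch.ones(2, 10, dtype=torch.long)
mask[1, 6:] = 0  # second sequence has four padding positions

ba_head = MeanPoolRegressionHead()
el_head = AttentionPoolClassificationHead()
ba = ba_head(hidden, mask)
el = el_head(hidden, mask)
print(ba.shape, el.shape)
```

Both heads take the attention mask so that padding tokens do not influence the pooled representation.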
## Usage
```python
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
import torch

# Load the base ESM-2 model and attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
model.eval()  # make sure dropout is disabled for inference

# Prepare input
peptide = "AAAAAMAEQESARN"
# Note: the Input Format section places the 34-residue allele pseudosequence
# in this slot; substitute it for the allele name if the model expects it.
allele = "HLA-DRB1*01:01"
context = "MAAAAAARNGGR"
sequence = f"{peptide} [SEP] {allele} [SEP] {context}"

# Tokenize
inputs = tokenizer(sequence, return_tensors="pt")

# Get per-token embeddings, shape (batch, sequence_length, 1280)
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state

# Load prediction heads (BA and EL)
# [Load your trained heads here]
```
## Performance
**Metrics to be reported on an independent test set:**
- **BA Head:** Pearson R, Spearman R, RMSE, MAE
- **EL Head:** ROC-AUC, PR-AUC, F1, MCC, Sensitivity, Specificity
*(Performance metrics will be updated upon training completion)*
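For reference, several of the metrics listed above can be computed from their standard definitions in plain Python. This sketch uses made-up toy values (not results from this model), assumes a 0.5 decision threshold for the EL metrics, and leaves out Spearman R, ROC-AUC and PR-AUC, which need ranking logic. It also does not guard against degenerate inputs such as a test set with only one class.

```python
import math

def regression_metrics(y_true, y_pred):
    """Pearson r, RMSE and MAE for BA-style targets."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return {"pearson": cov / math.sqrt(var_t * var_p), "rmse": rmse, "mae": mae}

def classification_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, F1 and MCC at an assumed decision threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / denom,
    }

# Toy values for illustration only.
ba_metrics = regression_metrics([0.1, 0.5, 0.9], [0.2, 0.5, 0.8])
el_metrics = classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(ba_metrics)
print(el_metrics)
```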
## Related Models
- **MHC-I Model:** [O047/esm2_MHC-I_Reforged_Single](https://huggingface.co/O047/esm2_MHC-I_Reforged_Single)
## Citation
If you use this model in your research, please cite:
(placeholder)
## Acknowledgments
- **ESM-2 Base Model:** Meta AI Research
- **Training Framework:** HuggingFace Transformers + PEFT
- **Data Sources:** NetMHCIIpan datasets
---
**Model Status:** In Training