---
language:
- en
tags:
- biology
- immunology
- MHC-II
- peptide-binding
- ESM-2
- LoRA
pipeline_tag: feature-extraction
---

# ESM-2-Powered MHC-II Ligand Elution and Binding Affinity Predictor

**Fine-tuned ESM-2 650M model for predicting peptide binding to MHC Class II molecules**

## Model Overview

This model predicts **peptide binding to MHC Class II molecules** using:
- **Base Model:** facebook/esm2_t33_650M_UR50D (33 layers, 1280-dimensional hidden states)
- **Fine-tuning:** LoRA (r=16, α=32) applied to the query, key, value, and dense layers
- **Dual Heads:**
  - **BA Head:** Binding affinity regression (0-1 scale)
  - **EL Head:** Eluted-ligand presentation likelihood (binary classification)

## Training Data

- **EL (Eluted Ligands):** ~1.5M peptide-MHC pairs
- **BA (Binding Affinity):** ~30k peptide-MHC pairs with measured affinities
- **Alleles:** ~200 human HLA-II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)

## Architecture

### Input Format
```
peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)
```

Example (14-residue peptide, 34-residue pseudosequence, 12-residue context):
```
AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```
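
A minimal sketch of assembling the three-part input string shown above. The function name `build_input` is hypothetical, and the sketch assumes the segments are joined with the literal text ` [SEP] `, exactly as in the example; how the training code tokenized that separator is not documented on this card.

```python
def build_input(peptide: str, pseudosequence: str, context: str, sep: str = "[SEP]") -> str:
    """Hypothetical helper: join peptide, allele pseudosequence and context.

    Mirrors the card's format: "peptide [SEP] pseudosequence [SEP] context".
    Each segment is stripped of surrounding whitespace and upper-cased.
    """
    parts = [s.strip().upper() for s in (peptide, pseudosequence, context)]
    return f" {sep} ".join(parts)


example = build_input(
    "AAAAAMAEQESARN",
    "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT",
    "MAAAAAARNGGR",
)
print(example)
# AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```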

### MHC-II-Specific Features
- **Peptide Length:** 13-21 amino acids (vs 8-11 for MHC-I)
- **Allele Format:** Single genes for DR alleles (e.g., DRB1*01:01) or α/β heterodimers for DP and DQ (e.g., DPA1*01:03-DPB1*02:01)
- **Context Sequences:** Flanking regions from the source proteins
- **Pseudosequences:** 34-residue representation of the binding groove
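
These constraints can be checked before inference. The sketch below is hypothetical (`parse_allele`, `validate_record`, and `AlleleName` are not part of this repository): it strips an optional `HLA-` prefix, splits heterodimer names on the hyphen between the α and β chains, and enforces the 13-21 residue peptide length and 34-residue pseudosequence length listed above. Restricting peptides to the 20 standard amino acids is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class AlleleName:
    alpha: Optional[str]  # e.g. "DPA1*01:03"; None for single-gene DR names
    beta: str             # e.g. "DRB1*01:01" or "DPB1*02:01"


def parse_allele(name: str) -> AlleleName:
    """Hypothetical parser for "DRB1*01:01" or "DPA1*01:03-DPB1*02:01" style names."""
    name = name.strip()
    if name.startswith("HLA-"):
        name = name[len("HLA-"):]  # drop the prefix so its hyphen is not split
    chains = name.split("-")
    if len(chains) == 1:
        return AlleleName(alpha=None, beta=chains[0])
    if len(chains) == 2:
        return AlleleName(alpha=chains[0], beta=chains[1])
    raise ValueError(f"unrecognised allele name: {name!r}")


def validate_record(peptide: str, pseudosequence: str) -> None:
    """Raise ValueError if the inputs fall outside the ranges listed on this card."""
    if not 13 <= len(peptide) <= 21:
        raise ValueError(f"peptide length {len(peptide)} is outside 13-21")
    if not set(peptide) <= STANDARD_AA:
        raise ValueError("peptide contains non-standard residues")
    if len(pseudosequence) != 34:
        raise ValueError(f"pseudosequence has {len(pseudosequence)} residues, expected 34")


print(parse_allele("HLA-DPA1*01:03-DPB1*02:01"))
# AlleleName(alpha='DPA1*01:03', beta='DPB1*02:01')
validate_record("AAAAAMAEQESARN", "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT")  # passes silently
```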

### Prediction Heads
1. **BA Head (Regression)**
   - Input: Mean-pooled sequence embeddings
   - Output: Binding affinity in [0, 1]
   - Loss: RegularizedMSELoss (with σ/μ constraints)

2. **EL Head (Classification)**
   - Input: Full sequence embeddings (attention-pooled)
   - Output: Presentation likelihood in [0, 1]
   - Loss: BalancedFocalBCELoss
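
The two pooling strategies can be sketched in PyTorch as below. This is an illustrative assumption, not the released head code: the MLP width, the single-logit attention pooling, and the sigmoid outputs are hypothetical choices, and `focal_bce` is the standard α-balanced focal loss, shown only as a stand-in for `BalancedFocalBCELoss`, whose exact definition is not given here. Padding positions are excluded from both pools through the attention mask.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BAHead(nn.Module):
    """Hypothetical BA head: masked mean pooling -> MLP -> sigmoid in [0, 1]."""

    def __init__(self, hidden: int = 1280, inner: int = 256):  # inner width is assumed
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(hidden, inner), nn.GELU(), nn.Linear(inner, 1))

    def forward(self, emb: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        m = mask.unsqueeze(-1).to(emb.dtype)                 # (B, L, 1)
        pooled = (emb * m).sum(1) / m.sum(1).clamp(min=1.0)  # mean over real tokens
        return torch.sigmoid(self.mlp(pooled)).squeeze(-1)   # (B,)


class ELHead(nn.Module):
    """Hypothetical EL head: learned attention pooling -> linear -> sigmoid."""

    def __init__(self, hidden: int = 1280):
        super().__init__()
        self.score = nn.Linear(hidden, 1)  # one attention logit per token
        self.out = nn.Linear(hidden, 1)

    def forward(self, emb: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        logits = self.score(emb).squeeze(-1)                   # (B, L)
        logits = logits.masked_fill(mask == 0, float("-inf"))  # ignore padding
        weights = torch.softmax(logits, dim=-1).unsqueeze(-1)  # (B, L, 1)
        pooled = (weights * emb).sum(1)                        # (B, hidden)
        return torch.sigmoid(self.out(pooled)).squeeze(-1)     # (B,)


def focal_bce(p: torch.Tensor, y: torch.Tensor, alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Alpha-balanced focal loss on probabilities p and binary float targets y."""
    p = p.clamp(1e-7, 1 - 1e-7)
    bce = F.binary_cross_entropy(p, y, reduction="none")
    p_t = y * p + (1 - y) * (1 - p)              # probability of the true class
    alpha_t = y * alpha + (1 - y) * (1 - alpha)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()


emb = torch.randn(2, 50, 1280)              # stand-in for outputs.last_hidden_state
mask = torch.ones(2, 50, dtype=torch.long)  # stand-in for inputs["attention_mask"]
mask[1, 40:] = 0                            # second sequence is padded
ba, el = BAHead()(emb, mask), ELHead()(emb, mask)
print(ba.shape, el.shape)  # torch.Size([2]) torch.Size([2])
```

Computing the focal loss from logits with `F.binary_cross_entropy_with_logits` would be more numerically stable; probabilities are used here only to match the heads' [0, 1] outputs.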
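
## Usage

```python
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
import torch

# Load the ESM-2 base model and apply the fine-tuned LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
model.eval()  # disable dropout for inference

# Prepare input: peptide [SEP] allele pseudosequence [SEP] context
peptide = "AAAAAMAEQESARN"
allele = "HLA-DRB1*01:01"  # for reference only; the model input uses the pseudosequence
# Pseudosequence from the Input Format example (assumed here to correspond to the allele above)
pseudosequence = "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT"
context = "MAAAAAARNGGR"
sequence = f"{peptide} [SEP] {pseudosequence} [SEP] {context}"

# Tokenize. Note: "[SEP]" is not a token in the stock ESM-2 vocabulary; inspect
# tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]) to see how it is split.
inputs = tokenizer(sequence, return_tensors="pt")

# Get per-token embeddings with shape (batch, seq_len, 1280)
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state

# Load prediction heads (BA and EL)
# [Load your trained heads here]
```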
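
If the BA targets follow the NetMHCIIpan log-transform convention (an assumption; this card only states a 0-1 scale), a BA score s relates to an IC50 in nM through s = 1 - log(IC50) / log(50000), so IC50 = 50000^(1 - s). The helpers below are hypothetical and only show that arithmetic.

```python
import math

MAX_IC50_NM = 50_000.0  # upper bound assumed by the log-50k transform


def ic50_to_score(ic50_nm: float) -> float:
    """Map an IC50 (nM) to [0, 1] via 1 - log(IC50) / log(50000), clipping to [1, 50000] nM."""
    ic50_nm = min(max(ic50_nm, 1.0), MAX_IC50_NM)
    return 1.0 - math.log(ic50_nm) / math.log(MAX_IC50_NM)


def score_to_ic50(score: float) -> float:
    """Invert the transform: IC50 = 50000 ** (1 - score)."""
    return MAX_IC50_NM ** (1.0 - score)


print(round(ic50_to_score(500.0), 3))                 # 0.426
print(round(score_to_ic50(ic50_to_score(500.0)), 6))  # 500.0
```

Higher scores mean stronger binding: a score of 1 corresponds to 1 nM and a score of 0 to 50,000 nM.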

## Performance

**Metrics to be reported on an independent test set:**
- **BA Head:** Pearson r, Spearman ρ, RMSE, MAE
- **EL Head:** ROC-AUC, PR-AUC, F1, MCC, sensitivity, specificity
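
One way to compute the listed metrics with SciPy and scikit-learn is sketched below. Assumptions not stated on this card: PR-AUC is estimated with average precision, and F1, MCC, sensitivity, and specificity use a 0.5 decision threshold. The function names `ba_metrics` and `el_metrics` are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    f1_score,
    matthews_corrcoef,
    roc_auc_score,
)


def ba_metrics(y_true, y_pred) -> dict:
    """Regression metrics for the BA head (targets and predictions in [0, 1])."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    return {
        "pearson_r": pearsonr(y_true, y_pred)[0],
        "spearman_rho": spearmanr(y_true, y_pred)[0],
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
    }


def el_metrics(y_true, y_score, threshold: float = 0.5) -> dict:
    """Classification metrics for the EL head; the threshold is an assumption."""
    y_true = np.asarray(y_true, int)
    y_score = np.asarray(y_score, float)
    y_hat = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_hat, labels=[0, 1]).ravel()
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "pr_auc": average_precision_score(y_true, y_score),  # average precision
        "f1": f1_score(y_true, y_hat),
        "mcc": matthews_corrcoef(y_true, y_hat),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


m = el_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(round(m["roc_auc"], 2))  # 0.75
```

ROC-AUC and PR-AUC are threshold-free, whereas the remaining EL metrics change with the chosen cut-off, so it is worth reporting that cut-off alongside them.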

*(Performance metrics will be updated once training is complete.)*

## Related Models

- **MHC-I Model:** [O047/esm2_MHC-I_Reforged_Single](https://huggingface.co/O047/esm2_MHC-I_Reforged_Single)

## Citation

If you use this model in your research, please cite:

(placeholder)

## Acknowledgments

- **ESM-2 Base Model:** Meta AI Research
- **Training Framework:** Hugging Face Transformers + PEFT
- **Data Sources:** NetMHCIIpan datasets

---

**Model Status:** In Training