---
language:
- en
tags:
- biology
- immunology
- MHC-II
- peptide-binding
- ESM-2
- LoRA
pipeline_tag: feature-extraction
---

# ESM-2-powered MHC-II Ligand Elution and Binding Affinity Predictor

**Fine-tuned ESM-2 650M model for MHC Class II peptide-MHC binding prediction**

## Model Overview

This model predicts **MHC Class II peptide-MHC binding** using:

- **Base Model:** facebook/esm2_t33_650M_UR50D (1280 hidden dim)
- **Fine-tuning:** LoRA (r=16, α=32) on query/key/value/dense
- **Dual Heads:**
  - **BA Head:** Binding affinity regression (0-1 scale)
  - **EL Head:** Presentation likelihood classification (binary)

## Training Data

- **EL (Eluted Ligands):** ~1.5M peptide-MHC pairs
- **BA (Binding Affinity):** ~30k peptide-MHC pairs with measured affinities
- **Alleles:** ~200 human HLA-II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)

## Architecture

### Input Format

```
peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)
```

Example:

```
AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
```

### MHC-II Specific Features

- **Peptide Length:** 13-21 amino acids (vs 8-11 for MHC-I)
- **Allele Format:** Single genes (DRB1*01:01) or heterodimers (DPA1*01:03-DPB1*02:01)
- **Context Sequences:** Flanking regions from source proteins
- **Pseudosequences:** 34-residue binding groove representation

### Prediction Heads

1. **BA Head (Regression)**
   - Input: Mean-pooled sequence embeddings
   - Output: Binding affinity [0, 1]
   - Loss: RegularizedMSELoss (with σ/μ constraints)
2. **EL Head (Classification)**
   - Input: Full sequence embeddings (attention-pooled)
   - Output: Presentation likelihood [0, 1]
   - Loss: BalancedFocalBCELoss

## Usage

```python
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
import torch

# Load the ESM-2 base model and apply the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
model.eval()  # disable dropout for inference

# Prepare input: peptide [SEP] allele pseudosequence [SEP] context
peptide = "AAAAAMAEQESARN"
# 34-residue pseudosequence taken from the Input Format example above;
# the model expects the pseudosequence here, not the allele name
pseudosequence = "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT"
context = "MAAAAAARNGGR"
sequence = f"{peptide} [SEP] {pseudosequence} [SEP] {context}"

# Tokenize
inputs = tokenizer(sequence, return_tensors="pt")

# Get per-token embeddings, shape (batch, sequence length, 1280)
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state

# Load prediction heads (BA and EL)
# [Load your trained heads here]
```

## Performance

**Metrics reported on an independent test set:**

- **BA Head:** Pearson R, Spearman R, RMSE, MAE
- **EL Head:** ROC-AUC, PR-AUC, F1, MCC, Sensitivity, Specificity

*(Performance metrics will be updated upon training completion)*

## Related Models

- **MHC-I Model:** [O047/esm2_MHC-I_Reforged_Single](https://huggingface.co/O047/esm2_MHC-I_Reforged_Single)

## Citation

If you use this model in your research, please cite: (placeholder)

## Acknowledgments

- **ESM-2 Base Model:** Meta AI Research
- **Training Framework:** HuggingFace Transformers + PEFT
- **Data Sources:** NetMHCIIpan datasets

---

**Model Status:** In Training
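
The input layout described under Input Format (peptide, allele pseudosequence, and flanking context joined by `[SEP]`) can be assembled with a small helper. The sketch below is illustrative only: `build_input` is a hypothetical name that is not shipped with this model, and the checks it performs (standard 20-letter amino acid alphabet, peptide length 13-21, pseudosequence length 34) are assumptions derived from the figures stated in this card, not a description of the actual training preprocessing.

```python
import re

# Assumption: only the 20 standard amino acid letters are accepted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_VALID = re.compile(f"^[{AMINO_ACIDS}]+$")


def build_input(peptide: str, pseudosequence: str, context: str) -> str:
    """Join the three fields as 'peptide [SEP] pseudosequence [SEP] context'.

    Hypothetical helper: the length limits mirror the numbers quoted in this
    model card (13-21 residue peptides, 34-residue pseudosequences).
    """
    fields = {
        "peptide": peptide.strip().upper(),
        "pseudosequence": pseudosequence.strip().upper(),
        "context": context.strip().upper(),
    }

    # Reject empty fields and anything outside the assumed alphabet.
    for name, seq in fields.items():
        if not _VALID.match(seq):
            raise ValueError(f"{name} must be a non-empty amino acid string: {seq!r}")

    if not 13 <= len(fields["peptide"]) <= 21:
        raise ValueError("peptide length must be between 13 and 21 residues")
    if len(fields["pseudosequence"]) != 34:
        raise ValueError("pseudosequence must be exactly 34 residues")

    return " [SEP] ".join(
        [fields["peptide"], fields["pseudosequence"], fields["context"]]
    )


if __name__ == "__main__":
    print(
        build_input(
            "AAAAAMAEQESARN",
            "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT",
            "MAAAAAARNGGR",
        )
    )
```

Called with the three strings from the Input Format example, the helper reproduces that example line exactly, so its return value can be passed to the tokenizer in place of a hand-built f-string. Validating lengths up front is a design choice that surfaces malformed inputs before they reach the model; relax the checks if your data legitimately falls outside these ranges.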