	---
	language:
	- en
	tags:
	- biology
	- immunology
	- MHC-II
	- peptide-binding
	- ESM-2
	- LoRA
	pipeline_tag: feature-extraction
	---

	# ESM-2-Powered MHC-II Eluted Ligand and Binding Affinity Predictor

	A fine-tuned ESM-2 650M model for MHC Class II peptide-MHC binding prediction.

	## Model Overview

	This model predicts MHC Class II peptide-MHC binding using:
	- Base Model: facebook/esm2_t33_650M_UR50D (1280 hidden dim)
	- Fine-tuning: LoRA (r=16, α=32) on query/key/value/dense
	- Dual Heads:
	  - BA Head: Binding affinity regression (0-1 scale)
	  - EL Head: Presentation likelihood classification (binary)
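To give a feel for what the LoRA settings above imply, here is a minimal sketch of the arithmetic for a single adapted projection. It assumes the standard LoRA formulation (a rank-r update scaled by α/r); any other adapter settings the author used, such as dropout or bias handling, are not stated in this card and are not modelled here. The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    Standard LoRA learns A (r x d_in) and B (d_out x r); the original
    weight matrix stays frozen.
    """
    return r * (d_in + d_out)


R, ALPHA = 16, 32   # r and alpha as stated in this card
HIDDEN = 1280       # ESM-2 650M hidden size, as stated in this card

scaling = ALPHA / R                            # factor applied to the low-rank update
adapter = lora_param_count(HIDDEN, HIDDEN, R)  # adapter for one 1280 x 1280 projection
full = HIDDEN * HIDDEN                         # frozen weights in that same projection

print(f"scaling = {scaling}")
print(f"adapter params = {adapter:,} vs frozen weights = {full:,}")
print(f"fraction = {adapter / full:.3%}")
```

For one 1280 x 1280 projection this gives 40,960 adapter parameters against 1,638,400 frozen weights, which is 2.5% of that layer.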

	## Training Data

	- EL (Eluted Ligands): ~1.5M peptide-MHC pairs
	- BA (Binding Affinity): ~30k peptide-MHC pairs with measured affinities
	- Alleles: ~200 human HLA-II alleles (DRB1, DRB3-5, DPA1, DPB1, DQA1, DQB1)
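The card describes BA targets on a 0-1 scale but does not say how measured affinities are mapped onto it. The sketch below assumes the log transform commonly used in the NetMHCpan family, `1 - log(IC50) / log(50000)` with IC50 in nM. That choice, the 50,000 nM cap, and both function names are assumptions for illustration, not something this card confirms.

```python
import math

# Assumed cap in nM; conventional in the NetMHCpan family, not stated in this card.
MAX_AFFINITY_NM = 50_000.0


def affinity_to_score(ic50_nm: float) -> float:
    """Map an IC50 in nM to [0, 1]; stronger binders get higher scores."""
    clipped = min(max(ic50_nm, 1.0), MAX_AFFINITY_NM)  # keep the score inside [0, 1]
    return 1.0 - math.log(clipped) / math.log(MAX_AFFINITY_NM)


def score_to_affinity(score: float) -> float:
    """Inverse of affinity_to_score: recover an IC50 in nM from a [0, 1] score."""
    return MAX_AFFINITY_NM ** (1.0 - score)


for ic50 in (1.0, 500.0, 50_000.0):
    print(f"{ic50:>8.0f} nM -> score {affinity_to_score(ic50):.3f}")
```

If the model was trained with a different scaling, the inverse mapping here would not apply to its outputs.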

	## Architecture

	### Input Format
	```
	peptide [SEP] allele pseudosequence [SEP] context (flanking amino acids)
	```

	Example:
	```
	AAAAAMAEQESARN [SEP] QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT [SEP] MAAAAAARNGGR
	```
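The format above can be assembled with a small helper. This is a sketch only: `build_input` is a hypothetical name, and the uppercasing and whitespace stripping are assumptions about sensible preprocessing rather than documented behaviour of the training pipeline.

```python
SEP = "[SEP]"  # separator literal exactly as written in this card


def build_input(peptide: str, pseudosequence: str, context: str) -> str:
    """Join the three fields as: peptide [SEP] pseudosequence [SEP] context."""
    parts = [peptide, pseudosequence, context]
    cleaned = [p.strip().upper() for p in parts]  # assumed normalisation
    return f" {SEP} ".join(cleaned)


example = build_input(
    "AAAAAMAEQESARN",
    "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT",
    "MAAAAAARNGGR",
)
print(example)
```

Called with the three fields of the example above, the helper reproduces that example string.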

	### MHC-II Specific Features
	- Peptide Length: 13-21 amino acids (vs 8-11 for MHC-I)
	- Allele Format: Single genes (`DRB1*01:01`) or heterodimers (`DPA1*01:03-DPB1*02:01`)
	- Context Sequences: Flanking regions from source proteins
	- Pseudosequences: 34-residue binding groove representation
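The constraints listed above lend themselves to simple input checks before tokenization. The sketch below assumes IMGT-style allele names such as `DRB1*01:01` or `DPA1*01:03-DPB1*02:01`, optionally prefixed with `HLA-`. The regular expression and all function names are hypothetical; the card does not describe how the author validates inputs.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues
PEPTIDE_LENGTHS = range(13, 22)            # 13-21 inclusive, per this card
PSEUDOSEQ_LENGTH = 34                      # per this card

# One chain such as DRB1*01:01, optionally prefixed with "HLA-".
_CHAIN = r"(?:HLA-)?(D[RPQ][AB]\d)\*(\d{2}):(\d{2})"
_ALLELE_RE = re.compile(rf"^{_CHAIN}(?:-{_CHAIN})?$")


def parse_allele(name: str) -> list[tuple[str, str, str]]:
    """Return one (gene, group, protein) tuple per chain in the allele name."""
    match = _ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised allele name: {name!r}")
    chains = [(match.group(1), match.group(2), match.group(3))]
    if match.group(4):  # second chain present, so this is a heterodimer
        chains.append((match.group(4), match.group(5), match.group(6)))
    return chains


def is_valid_peptide(peptide: str) -> bool:
    """Check length (13-21) and that every residue is a standard amino acid."""
    return len(peptide) in PEPTIDE_LENGTHS and set(peptide) <= AMINO_ACIDS


def is_valid_pseudosequence(pseudosequence: str) -> bool:
    """Check that the pseudosequence has exactly 34 positions."""
    return len(pseudosequence) == PSEUDOSEQ_LENGTH


print(parse_allele("DRB1*01:01"))
print(parse_allele("DPA1*01:03-DPB1*02:01"))
print(is_valid_peptide("AAAAAMAEQESARN"))
```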

	### Prediction Heads
	1. BA Head (Regression)
	   - Input: Mean-pooled sequence embeddings
	   - Output: Binding affinity [0, 1]
	   - Loss: RegularizedMSELoss (with σ/μ constraints)

	2. EL Head (Classification)
	   - Input: Full sequence embeddings (attention-pooled)
	   - Output: Presentation likelihood [0, 1]
	   - Loss: BalancedFocalBCELoss
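The two heads differ in how they collapse per-token embeddings into one vector. The following framework-free sketch shows the difference on tiny lists so the arithmetic is visible; in practice this would run on 1280-dimensional tensors. It assumes a single learned query vector with dot-product scoring for the attention pooling. The author's actual pooling layer is not described in this card and may differ, and both function names are hypothetical.

```python
import math


def mean_pool(embeddings, mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    kept = [vec for vec, m in zip(embeddings, mask) if m]
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def attention_pool(embeddings, mask, query):
    """Softmax-weighted average of token vectors.

    Each token is scored by its dot product with `query`; padded positions
    get a score of -inf and therefore a weight of exactly zero.
    Assumes at least one unmasked token.
    """
    scores = [
        sum(v * q for v, q in zip(vec, query)) if m else -math.inf
        for vec, m in zip(embeddings, mask)
    ]
    top = max(scores)                              # subtract max for numerical stability
    weights = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, embeddings)) / total
        for i in range(dim)
    ]


tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row stands in for padding
mask = [1, 1, 0]

print(mean_pool(tokens, mask))
print(attention_pool(tokens, mask, query=[0.0, 0.0]))
```

With an all-zero query every real token gets the same weight, so attention pooling reduces to mean pooling; a trained query lets the head weight some positions more heavily than others.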

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModel
	from peft import PeftModel
	import torch

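	# Load the base ESM-2 model and attach the LoRA adapter
	tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
	base_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
	model = PeftModel.from_pretrained(base_model, "O047/esm2_MHC-II_Reforged_Single")
	model.eval()  # inference mode: disables dropout

	# Prepare input: peptide [SEP] allele pseudosequence [SEP] context
	peptide = "AAAAAMAEQESARN"
	allele = "HLA-DRB1*01:01"  # label only; the model input takes the pseudosequence
	# 34-residue pseudosequence taken from the Input Format example
	pseudosequence = "QEFFIASGAAVDAIMWLFLECYDLQRATYHVGFT"
	context = "MAAAAAARNGGR"
	sequence = f"{peptide} [SEP] {pseudosequence} [SEP] {context}"

	# Tokenize
	inputs = tokenizer(sequence, return_tensors="pt")

	# Get per-token embeddings, shape (batch, tokens, 1280)
	with torch.no_grad():
	    outputs = model(**inputs)
	embeddings = outputs.last_hidden_state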

	# Load prediction heads (BA and EL)
	# [Load your trained heads here]
	```

	## Performance

	Metrics planned for the independent test set:
	- BA Head: Pearson R, Spearman R, RMSE, MAE
	- EL Head: ROC-AUC, PR-AUC, F1, MCC, Sensitivity, Specificity

	(Performance metrics will be updated upon training completion)
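For reference, several of the EL metrics named above follow directly from a binary confusion matrix. The sketch below uses made-up counts purely to illustrate the formulas; they are not results from this model. The function name `binary_metrics` is hypothetical.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, F1 and MCC from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Illustrative counts only; NOT measurements from this model.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

ROC-AUC and PR-AUC need the ranked scores, not thresholded counts, so they are not covered by this helper.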

	## Related Models

	- MHC-I Model: [O047/esm2_MHC-I_Reforged_Single](https://huggingface.co/O047/esm2_MHC-I_Reforged_Single)

	## Citation

	If you use this model in your research, please cite:

	(placeholder)


	## Acknowledgments

	- ESM-2 Base Model: Meta AI Research
	- Training Framework: HuggingFace Transformers + PEFT
	- Data Sources: NetMHCIIpan datasets

	---

	Model Status: In Training