SPEARMINT: Stability Prediction of Epitopes with Assay Recalibration using MINT

Predicts the stability (complex half-life in hours) of peptide-MHC class I (pMHC-I) complexes, conditioned on assay type and temperature via Feature-wise Linear Modulation (FiLM).

Model Description

SPEARMINT is an assay- and temperature-conditioned model that predicts the binding stability (half-life) of peptide-MHC class I (pMHC-I) complexes from the peptide and the full MHC-I sequence. It is a fine-tuned variant of MINT (Ullanat et al. 2026), an ESM2-650M model with cross-chain multimer attention that was pretrained on protein-protein interactions (PPIs). Stability predictions are made by mean-pooling the MINT embeddings, projecting them with a light trunk (Linear → ReLU), and modulating the result with a feature-wise linear modulation (FiLM) head conditioned on the assay type and temperature before a scalar readout. SPEARMINT is the Stage 3 (flagship) model in the MINT stability pipeline: it is initialized from the Stage 2 stability model, keeps the entire backbone frozen, and trains only the conditioning head on multi-assay pMHC-I stability data (~8.5K training samples). It returns the predicted half-life on a log1p(hours) scale for a chosen assay and temperature. This conditioning lets a single model produce predictions consistent with a specified measurement modality, recalibrating the systematic shifts that exist between assays. The model is released alongside the accompanying manuscript (see Citation).

Supported Assay Types

Index	Assay	Description
0	SPA	Scintillation proximity assay
1	Purified_Fluor	Purified MHC fluorescence assay
2	Cellular_Fluor	Cell-surface fluorescence assay
3	Other	Other / unknown assay type

Intended uses & limitations

This is a research model for predicting peptide–MHC class I binding stability (half-life) conditioned on a chosen assay type and temperature, intended for applications such as epitope and neoantigen prioritization and antigen-presentation modeling, where harmonizing stability estimates across heterogeneous assays is useful. The model is calibrated to the four assay types it was trained on (SPA, Purified_Fluor, Cellular_Fluor, and Other) and to temperatures of 25 °C and 37 °C, across roughly 34 HLA class I alleles. Conditioning is most reliable for the well-represented conditions (SPA and Purified_Fluor); Cellular_Fluor and Other are sparsely represented in training, and genuinely novel assays should be mapped to Other. The temperature encoder is continuous, so it can interpolate between the observed temperatures, but values far outside the 25–37 °C range are extrapolations. Likewise, predictions for alleles outside the training distribution, or for complexes far longer-lived than the observed range, should be treated with caution. Outputs are model estimates in log1p(hours) and are best interpreted as a relative ranking signal rather than an absolute half-life. Finally, stability is only one biophysical axis and does not by itself predict antigen presentation or T-cell immunogenicity.
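As a minimal sketch of how the log1p(hours) outputs might be used as a ranking signal, the snippet below converts scores back to hours with expm1 and sorts candidates from most to least stable. The scores are made-up placeholder values rather than model outputs, and the helper names (to_hours, rank_by_stability, is_temperature_extrapolated) are hypothetical; the temperature check only encodes the 25–37 °C training range stated above.

```python
import math

# Temperature range covered by the training data, per the text above.
TRAIN_TEMP_RANGE_C = (25.0, 37.0)


def to_hours(log1p_hours: float) -> float:
    """Invert the log1p label transform: hours = exp(x) - 1."""
    return math.expm1(log1p_hours)


def rank_by_stability(scores: dict) -> list:
    """Sort peptides from most to least stable; returns (peptide, hours) pairs."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(peptide, to_hours(score)) for peptide, score in ordered]


def is_temperature_extrapolated(temperature_c: float) -> bool:
    """True when the requested temperature lies outside the training range."""
    low, high = TRAIN_TEMP_RANGE_C
    return not (low <= temperature_c <= high)


# Placeholder log1p(hours) scores for illustration only (not real predictions).
example_scores = {"PEPTIDE_A": 1.2, "PEPTIDE_B": 0.3, "PEPTIDE_C": 2.0}
ranking = rank_by_stability(example_scores)
for peptide, hours in ranking:
    print(f"{peptide}: {hours:.2f} h")
```

Because expm1 is monotonic, ranking on the raw log1p scores and ranking on the converted hours give the same order; the conversion is only needed for reporting.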

Usage

import math
import torch
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained("dkarthikeyan1/spearmint", trust_remote_code=True)
model.eval()

# Tokenize a peptide-MHC pair with assay + temperature
from transformers.dynamic_module_utils import get_class_from_dynamic_module
SpearmintTokenizer = get_class_from_dynamic_module(
    "modeling_spearmint.SpearmintTokenizer",
    "dkarthikeyan1/spearmint",
    trust_remote_code=True,
)
tokenizer = SpearmintTokenizer()
peptide = "GILGFVFTL"
mhc_sequence = "MAVMAPRTLLLLLSGALALTQTWAG..."  # full MHC-I heavy chain sequence

chains, chain_ids, assay_idxs, temp_floats = tokenizer.prepare_input(
    peptide, mhc_sequence,
    assay="SPA",            # one of: SPA, Purified_Fluor, Cellular_Fluor, Other
    temperature_c=37.0,     # temperature in Celsius
)
chains = chains.unsqueeze(0)        # add batch dim
chain_ids = chain_ids.unsqueeze(0)

# Predict
with torch.no_grad():
    output = model(chains, chain_ids, assay_idxs, temp_floats)
    log_pred = output["logits"].item()              # model outputs log1p(half-life in hours)
    predicted_halflife_hours = math.expm1(log_pred)

print(f"Predicted half-life: {predicted_halflife_hours:.2f} hours")

Batch Inference

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

peptides = ["GILGFVFTL", "NLVPMVATV"]
mhc_sequences = ["MAVMAPRTL...", "MAVMAPRTL..."]  # full sequences
assays = ["SPA", "Cellular_Fluor"]
temperatures_c = [37.0, 25.0]

chains, chain_ids, assay_idxs, temp_floats = tokenizer.prepare_batch(
    peptides, mhc_sequences, assays=assays, temperatures_c=temperatures_c,
)
with torch.no_grad():
    output = model(chains.to(device), chain_ids.to(device), assay_idxs.to(device), temp_floats.to(device))
    predictions_hours = torch.expm1(output["logits"].squeeze(-1))   # half-life in hours, shape (batch,)

Default Behavior (no metadata)

If assay and temperature are not provided, the model defaults to SPA (index 0) at 37 °C:

# These are equivalent:
output = model(chains, chain_ids)
output = model(chains, chain_ids,
               assay_idxs=torch.tensor([0]),
               temp_floats=torch.tensor([37.0]))

Input Format

Peptide: Standard amino acid sequence (8-15 residues)
MHC sequence: Full MHC class I heavy chain sequence (~365 residues), NOT pseudo-sequences
Assay: One of "SPA", "Purified_Fluor", "Cellular_Fluor", "Other"
Temperature: Float in Celsius (typically 25.0 or 37.0)
Tokenization: The tokenizer handles concatenation, special tokens (<cls>, <eos>), and chain ID assignment (peptide=0, MHC=1).
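The constraints above can be checked before tokenization. The following is a hedged sketch, not part of the released tokenizer: check_inputs and assay_index are hypothetical helpers, the 100-residue cutoff for spotting pseudo-sequences is an arbitrary assumption, and the MHC string is a dummy placeholder rather than a real heavy chain.

```python
# Assay name -> index, copied from the Supported Assay Types table.
ASSAY_TO_INDEX = {"SPA": 0, "Purified_Fluor": 1, "Cellular_Fluor": 2, "Other": 3}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def assay_index(assay: str) -> int:
    """Map an assay name to its index; unknown assays fall back to 'Other'."""
    return ASSAY_TO_INDEX.get(assay, ASSAY_TO_INDEX["Other"])


def check_inputs(peptide: str, mhc_sequence: str, min_mhc_len: int = 100) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not 8 <= len(peptide) <= 15:
        problems.append(f"peptide length {len(peptide)} is outside 8-15 residues")
    bad_peptide = sorted(set(peptide) - STANDARD_AA)
    if bad_peptide:
        problems.append(f"peptide has non-standard residues: {bad_peptide}")
    bad_mhc = sorted(set(mhc_sequence) - STANDARD_AA)
    if bad_mhc:
        problems.append(f"MHC sequence has non-standard residues: {bad_mhc}")
    # Assumed heuristic: very short MHC strings are probably pseudo-sequences.
    if len(mhc_sequence) < min_mhc_len:
        problems.append("MHC sequence is short; a full heavy chain is expected")
    return problems


print(assay_index("SPA"))           # 0
print(assay_index("SomeNewAssay"))  # 3, i.e. mapped to "Other"
print(check_inputs("GILGFVFTL", "A" * 365))  # [] (dummy MHC placeholder)
```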

Architecture Details

Parameter	Value
Backbone	ESM2-650M (33 layers, 1280 dim, 20 heads)
Multimer attention	Yes (cross-chain)
Projection hidden dim	512
Assay embedding	4 types, dim=32
Temperature encoder	Linear(1, 8)
FiLM MLP	Linear(552, 512) + ReLU + Linear(512, 1024)
FiLM init	gamma=1, beta=0 (identity)
Post-FiLM dropout	0.0
Label transform	`log1p(half_life_hours)`
Output	Scalar (log1p scale, unbounded)
Total parameters	~814.8M
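To make the table concrete, here is a rough PyTorch sketch of a conditioning head with these dimensions. It is one plausible reading, not the released implementation: the class name FiLMHeadSketch is hypothetical, the FiLM MLP input is assumed to be the concatenation 512 + 32 + 8 = 552, the temperature is fed as raw Celsius, and identity initialization is assumed to be done by zeroing the last layer's weights and setting its bias to gamma=1, beta=0. Random tensors stand in for MINT token embeddings.

```python
import torch
import torch.nn as nn


class FiLMHeadSketch(nn.Module):
    """Hypothetical conditioning head built from the dimensions in the table."""

    def __init__(self, embed_dim=1280, hidden_dim=512, n_assays=4,
                 assay_dim=32, temp_dim=8):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        self.assay_embedding = nn.Embedding(n_assays, assay_dim)
        self.temp_encoder = nn.Linear(1, temp_dim)
        # Assumed input: projected features + assay embedding + temperature code.
        self.film = nn.Sequential(
            nn.Linear(hidden_dim + assay_dim + temp_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2 * hidden_dim),  # gamma and beta, 512 each
        )
        self.readout = nn.Linear(hidden_dim, 1)
        # Identity init (assumed mechanism): zero weights, bias = [1..1, 0..0].
        last = self.film[-1]
        nn.init.zeros_(last.weight)
        with torch.no_grad():
            last.bias[:hidden_dim].fill_(1.0)
            last.bias[hidden_dim:].zero_()

    def forward(self, token_embeddings, mask, assay_idxs, temp_floats):
        # Masked mean-pool over the sequence dimension.
        mask = mask.unsqueeze(-1).to(token_embeddings.dtype)
        pooled = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        h = self.trunk(pooled)
        cond = torch.cat(
            [h, self.assay_embedding(assay_idxs),
             self.temp_encoder(temp_floats.unsqueeze(-1))], dim=-1)
        gamma, beta = self.film(cond).chunk(2, dim=-1)
        return self.readout(gamma * h + beta).squeeze(-1)


torch.manual_seed(0)
head = FiLMHeadSketch()
emb = torch.randn(2, 20, 1280)   # stand-in for backbone token embeddings
mask = torch.ones(2, 20)         # 1 = real token, 0 = padding
out = head(emb, mask, torch.tensor([0, 2]), torch.tensor([37.0, 25.0]))
print(out.shape)  # torch.Size([2])
```

With this initialization the FiLM layer starts as a no-op, so the output at step zero equals readout(trunk(pooled)) regardless of the assay or temperature passed in.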

Training Procedure

Preprocessing

Amino acid sequences were standardized to the vocabulary of the ESM-2 tokenizer. MHC allele names were standardized with mhcgnomes and then mapped to the consensus HLA sequences found in IMGT.

Pre-training

MINT (Ullanat et al. 2026) was pretrained on 96 million physical protein–protein interactions from the STRING database (v12.0), clustered at 50% sequence identity to reduce redundancy, using a masked language modeling objective (15% token masking) with interaction-aware supervision. Operationally, the model receives concatenated protein pair sequences with chain ID labels, enabling the cross-chain attention heads to learn interaction-specific representations. The resulting checkpoint (mint.ckpt) serves as the initialization for all downstream fine-tuning stages.
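The input layout and masking described above can be illustrated with a toy sketch. This is not MINT's data pipeline: the function names are hypothetical, special tokens are omitted, every selected position is simply replaced by a <mask> placeholder, and the second chain is an arbitrary dummy string.

```python
import random

MASK_TOKEN = "<mask>"


def build_pair(seq_a: str, seq_b: str):
    """Concatenate two chains and label each residue with its chain ID."""
    tokens = list(seq_a) + list(seq_b)
    chain_ids = [0] * len(seq_a) + [1] * len(seq_b)
    return tokens, chain_ids


def mask_tokens(tokens, mask_fraction=0.15, seed=0):
    """Mask a random fraction of positions; return masked tokens and targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_fraction * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = masked[i]   # original residue the model must recover
        masked[i] = MASK_TOKEN
    return masked, targets


tokens, chain_ids = build_pair("GILGFVFTL", "ACDEFGHIKLM")  # dummy second chain
masked, targets = mask_tokens(tokens)
print("".join(t if t != MASK_TOKEN else "_" for t in masked))
print(chain_ids)
```

The chain IDs travel alongside the tokens so that an attention layer can tell within-chain from cross-chain residue pairs.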

Finetuning

Stage 3 (SPEARMINT) builds on the frozen Stage 2 stability model: the entire MINT backbone is held fixed, and only a lightweight conditioning head is trained. The pooled, projected representation is modulated by a FiLM (feature-wise linear modulation) layer conditioned on an assay-type embedding and a continuous temperature encoder, then read out to a scalar; the head is identity-initialized so that it recovers the Stage 2 prediction at the start of training. Optimization uses a Huber loss (δ=1.0) against log1p(half-life hours) labels, with AdamW, gradient-norm clipping (1.0), and a ReduceLROnPlateau schedule. Only ~1.46M of ~814.8M parameters (0.18%) are trainable: the assay embedding, temperature encoder, FiLM MLP, projection, and readout. Full hyperparameters are in the manuscript and its accompanying Supplementary Information (SI).
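The trainable-parameter figure can be reproduced with simple arithmetic from the layer shapes in the Architecture Details table. The sketch below assumes every linear layer has a bias, that the projection is Linear(1280, 512), and that the readout is Linear(512, 1); these are inferences from the table, not confirmed implementation details. A plain-Python Huber loss with δ=1.0 is included to show how an error on the log1p scale would be penalized.

```python
import math


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


# Assumed layer shapes, taken from the Architecture Details table.
trainable = {
    "projection Linear(1280, 512)": linear_params(1280, 512),
    "assay embedding (4 x 32)": 4 * 32,
    "temperature encoder Linear(1, 8)": linear_params(1, 8),
    "FiLM MLP Linear(552, 512)": linear_params(552, 512),
    "FiLM MLP Linear(512, 1024)": linear_params(512, 1024),
    "readout Linear(512, 1)": linear_params(512, 1),
}
total_trainable = sum(trainable.values())
fraction = total_trainable / 814.8e6
print(f"{total_trainable:,} trainable ({fraction:.2%})")  # 1,464,977 trainable (0.18%)


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Quadratic for small errors, linear beyond delta."""
    err = abs(pred - target)
    return 0.5 * err**2 if err <= delta else delta * (err - 0.5 * delta)


# Example: predicting 10 h for a complex measured at 2 h, compared in log1p space.
loss = huber(math.log1p(10.0), math.log1p(2.0))
print(f"{loss:.4f}")
```

Under these assumptions the count matches the ~1.46M reported above. Working in log1p space means the loss reflects ratios of half-lives rather than absolute differences in hours.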

Citation

@article{dkarthikeyan2026stability,
    author = {Karthikeyan, Dhuvarakesh and Vincent, Benjamin and Rubinsteyn, Alexander},
    title = {Peptide:MHC Binding Stability Prediction Using Protein Language Models},
    elocation-id = {2026.06.28.735023},
    year = {2026},
    doi = {10.64898/2026.06.28.735023},
    publisher = {Cold Spring Harbor Laboratory},
    abstract = {Peptide:MHC class I (pMHC-I) binding stability governs the persistence of antigenic complexes at the cell surface and plays a key role in facilitating downstream immunological signals such as antigen presentation, T-cell activation, and immunodominance. However, methods for in silico stability prediction remain underexplored relative to binding affinity prediction, in part because available half-life datasets are sparse and expensive to collect. Here, we perform a systematic reassessment of pMHC-I stability prediction using controlled, similarity-aware data splits and apply a recently introduced supervised transfer-learning strategy to MINT, an interaction-aware protein language model, pretrained on binding affinity and fine-tuned for quantitative half-life prediction. We show that MINT improves stability prediction over standard ESM-2 representations and existing predictors, and that assay-conditioned recalibration corrects systematic shifts across experimental measurement modalities. Across eluted ligand, immunogenicity, and personalized neoantigen prioritization benchmarks, predicted stability provides signal beyond binding affinity, enriching for naturally presented and immunogenic peptides within affinity-filtered candidate sets. These results establish pMHC-I half-life as an orthogonal and transferable biophysical signal connecting peptide binding, surface presentation, and T-cell recognition, and provide a leakage-aware, assay-aware framework for future antigen-presentation modeling.},
    URL = {https://www.biorxiv.org/content/early/2026/06/29/2026.06.28.735023},
    eprint = {https://www.biorxiv.org/content/early/2026/06/29/2026.06.28.735023.full.pdf},
    journal = {bioRxiv}
}