Olink Inflammation Aging Clock (92 proteins)

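A lightweight plasma-protein aging clock that predicts chronological age from 92 Olink Inflammation-panel proteins. The model is a TabM† student distilled from a TabPFN v2 teacher, so inference runs without any TabPFN dependency. Unlike TabPFN, which conditions on its training rows at prediction time, the student needs only its own weights, so the released artifacts are small and DUA-friendly (DUA: data use agreement).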

Trained on the MARS cohort (Minority Aging Research Study).

Cohort  Platform                        Proteins  N samples (persons)  Teacher R²  Student R²  Gap (mean ± sd)
MARS    Olink (Target 96 Inflammation)  92        642 (642)            0.348       0.346       0.002 ± 0.014

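Metrics are R² from 10-fold person-grouped cross-validation; Gap is teacher R² minus student R², reported as mean ± sd across folds. The student's fidelity RMSE (its deviation from the teacher's predictions) is ≈ 0.066 in scaled-y units. In other words, the TabM† student reproduces its TabPFN v2 teacher to within ~0.002 R², inside fold-to-fold noise (sd 0.014).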
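One way to compute the summary statistics above, as a minimal sketch: it assumes per-fold teacher and student predictions are already available in scaled-y units, and that the table reports means of per-fold R² (the card does not say whether R² is averaged per fold or pooled). The function names are hypothetical and not part of predict.py.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def summarize_folds(folds):
    """folds: list of (y_true, teacher_pred, student_pred), one tuple per CV fold.

    Assumption (hypothetical): all arrays are in scaled-y units and the
    reported numbers are means over folds.
    """
    teacher_r2 = np.array([r2_score(y, t) for y, t, _ in folds])
    student_r2 = np.array([r2_score(y, s) for y, _, s in folds])
    gap = teacher_r2 - student_r2  # positive = student slightly worse than teacher
    # Fidelity: how closely the student tracks the teacher, not the labels.
    fidelity_rmse = np.array([
        np.sqrt(np.mean((np.asarray(t, float) - np.asarray(s, float)) ** 2))
        for _, t, s in folds
    ])
    return {
        "teacher_r2": teacher_r2.mean(),
        "student_r2": student_r2.mean(),
        "gap_mean": gap.mean(),
        "gap_sd": gap.std(ddof=1),  # sample sd across folds
        "fidelity_rmse": fidelity_rmse.mean(),
    }
```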

Files

predict.py                       standalone inference script
meta.json                        feature order, y-scaler, arch + preproc config
models/
  T0001_model.pkl                slim model (~4 KB)
  T0001_student_tabm.pt          TabM† weights (~6 MB) + quantile bins
  T0001_student_tabm_preproc.npz median-impute / observed-mask state
results/, *_results.csv          aggregate CV metric summaries
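T0001_student_tabm_preproc.npz is described only as median-impute / observed-mask state. The sketch below shows one plausible form of that state, assuming per-protein medians fitted on training rows and a 0/1 mask appended after the imputed values. The function names, the `medians` key, and the column layout are assumptions, not the file's documented contents.

```python
import numpy as np


def fit_impute_state(X_train):
    """Per-feature medians from training data, ignoring NaNs (hypothetical layout)."""
    return {"medians": np.nanmedian(np.asarray(X_train, dtype=float), axis=0)}


def apply_impute_and_mask(X, state):
    """Median-impute NaNs and append a 0/1 observed-mask channel per feature.

    Output shape: (n_samples, 2 * n_features); imputed values first, then the
    mask (an assumed ordering).
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    X_filled = np.where(observed, X, state["medians"])  # medians broadcast per column
    return np.concatenate([X_filled, observed.astype(float)], axis=1)
```

Such a dictionary could be persisted with `np.savez(path, **state)` and reloaded with `np.load(path)`.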

Target T0001 = age_at_visit (chronological age at the visit).

Usage

predict.py runs the model with only public dependencies:

pip install torch tabm rtdl_num_embeddings numpy pandas

python predict.py --input proteins.csv --output ages.csv

--input is a CSV or TSV file with one row per sample and one column per protein, named exactly as in meta.json → feature_name (the 92 Olink OID####_GENE identifiers). Column order does not matter, and an optional sample_id column is carried through to the output. The output has two columns: sample_id and predicted_age. NaN cells are median-imputed. The script stops with an error if any required protein column is missing and warns about input columns it does not use.
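The input checks described above can be sketched with pandas as follows. This is not the code in predict.py; `align_input` is a hypothetical helper, and falling back to the row index when no sample_id column is present is an assumption.

```python
import warnings

import pandas as pd


def align_input(df, feature_names, id_col="sample_id"):
    """Reorder protein columns to the model's feature order (hypothetical helper).

    Raises if any required protein is missing; warns about unused columns.
    Returns (sample_ids, X) with X's columns in feature_names order.
    """
    missing = [f for f in feature_names if f not in df.columns]
    if missing:
        raise ValueError(f"missing {len(missing)} required protein(s): {missing[:5]}")
    extra = [c for c in df.columns if c not in feature_names and c != id_col]
    if extra:
        warnings.warn(f"ignoring {len(extra)} unused column(s): {extra[:5]}")
    if id_col in df.columns:
        ids = df[id_col]
    else:
        # Assumption: fall back to the row position as the sample identifier.
        ids = pd.Series(range(len(df)), index=df.index, name=id_col)
    X = df[list(feature_names)].astype(float)  # empty cells become NaN for imputation
    return ids, X
```

For example, `feature_names = json.load(open("meta.json"))["feature_name"]` (assuming that key holds a list of column names) and `df = pd.read_csv("proteins.csv", sep=None, engine="python")`, which lets pandas sniff whether the file is comma- or tab-separated.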

Method

  • Teacher: TabPFN v2.
  • Student: TabM† distilled on a GMM-augmented transfer set (target 10,000 rows) with a 5-quantile pinball regression loss.
  • CV: 10-fold person-grouped (1 sample = 1 person in MARS).
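The 5-quantile pinball loss named in the Method list can be written in a few lines of NumPy. This is a minimal sketch: the quantile levels used in training are not stated on this card, so the levels in the example are hypothetical, and exactly what serves as the distillation target (for example, teacher predictions on the GMM-augmented rows) is likewise not specified. A PyTorch training loop would compute the same quantity on tensors.

```python
import numpy as np


def pinball_loss(y, q_pred, taus):
    """Mean pinball (quantile) loss over samples and quantile levels.

    y:      (n,) targets (e.g. teacher outputs on the transfer set; an assumption)
    q_pred: (n, k) predicted quantiles, one column per level in taus
    taus:   (k,) quantile levels in (0, 1)
    """
    y = np.asarray(y, dtype=float)[:, None]
    q_pred = np.asarray(q_pred, dtype=float)
    taus = np.asarray(taus, dtype=float)[None, :]
    diff = y - q_pred
    # Under-prediction (diff > 0) costs tau per unit; over-prediction costs 1 - tau.
    return float(np.mean(np.maximum(taus * diff, (taus - 1.0) * diff)))


# Hypothetical 5 quantile levels; the card does not list the real ones.
TAUS = [0.1, 0.3, 0.5, 0.7, 0.9]
```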

Inference contract

  • Order the protein columns by meta.json → feature_name.
  • Median-impute missing values and append the observed-mask channels.
  • Rebuild the TabM model with piecewise-linear numeric embeddings and load the saved weights.
  • Take each ensemble member's point estimate as the trapezoidal mean of its 5 predicted quantiles, then average over the 32 ensemble members.
  • Invert the y-scaler to recover age.
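The output-side steps of the contract (trapezoidal mean over quantiles, ensemble averaging, inverse scaling) can be sketched in NumPy; rebuilding the TabM network itself is omitted. The sketch assumes hypothetical quantile levels and a StandardScaler-style y-scaler (age = y_scaled × scale + mean). The real levels and scaler parameters live in meta.json and the checkpoint, under names not documented here.

```python
import numpy as np


def trapezoidal_mean(quantiles, taus):
    """Approximate the predictive mean from k quantiles by trapezoidal
    integration of Q(tau) over [tau_min, tau_max], normalized by the width."""
    quantiles = np.asarray(quantiles, dtype=float)
    taus = np.asarray(taus, dtype=float)
    widths = np.diff(taus)                                   # (k-1,)
    mids = 0.5 * (quantiles[..., 1:] + quantiles[..., :-1])  # (..., k-1)
    return (mids * widths).sum(axis=-1) / (taus[-1] - taus[0])


def predict_age(member_quantiles, taus, y_mean, y_scale):
    """member_quantiles: (n_samples, n_members, k) quantiles in scaled-y units.

    y_mean / y_scale are hypothetical StandardScaler-style parameters.
    """
    point = trapezoidal_mean(member_quantiles, taus)  # (n_samples, n_members)
    ensemble = point.mean(axis=1)                     # average over ensemble members
    return ensemble * y_scale + y_mean                # invert the y-scaler
```

Because the trapezoidal mean is linear in the quantiles, averaging members before or after taking the mean gives the same result.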

Citation

Inflammatory aging clock for plasma inflammatory proteins (MARS). Manuscript in preparation. See also the SomaScan companion model: inflammatory-aging-clock/somascan-85.
