Olink Inflammation Aging Clock (92 proteins)
A lightweight plasma-protein aging clock that predicts chronological age from the 92 proteins of the Olink Inflammation panel. The model is a TabM† student distilled from a TabPFN v2 teacher, so inference needs no TabPFN dependency and the shipped artifacts stay small and DUA-friendly.
Trained on the MARS cohort (Minority Aging Research Study).
| Cohort | Platform | Proteins | N (persons) | Teacher R² | Student R² | Gap (mean ± sd) |
|---|---|---|---|---|---|---|
| MARS | Olink (Target 96 Inflammation) | 92 | 642 (642) | 0.348 | 0.346 | 0.002 ± 0.014 |
All R² values come from 10-fold person-grouped CV. The student's fidelity RMSE (student vs. teacher predictions) is ≈ 0.066 in scaled-y units. The TabM† student reproduces its TabPFN v2 teacher to within ~0.002 R², well inside the fold-to-fold spread of the gap (sd 0.014).
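The table's metrics can be sketched as follows. This is a minimal illustration, not the evaluation code used for the model: the function names are hypothetical, the gap is taken as teacher minus student (consistent with 0.348 − 0.346 = 0.002), and using the sample standard deviation (`ddof=1`) is an assumption, since the source does not say which one was reported.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fidelity_rmse(teacher_pred, student_pred):
    """RMSE between student and teacher predictions (not against the true age)."""
    t = np.asarray(teacher_pred, dtype=float)
    s = np.asarray(student_pred, dtype=float)
    return float(np.sqrt(np.mean((t - s) ** 2)))

def gap_summary(teacher_r2_by_fold, student_r2_by_fold):
    """Mean and sd of the per-fold teacher-minus-student R² gap.

    ddof=1 (sample sd) is an assumption, not confirmed by the source.
    """
    gaps = np.asarray(teacher_r2_by_fold, dtype=float) - np.asarray(student_r2_by_fold, dtype=float)
    return float(gaps.mean()), float(gaps.std(ddof=1))
```

Computing fidelity on the scaled target keeps it comparable across targets with different units, which is presumably why the figure above is quoted in scaled-y.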
Files
- predict.py: standalone inference script
- meta.json: feature order, y-scaler, architecture and preprocessing config
- models/
  - T0001_model.pkl: slim model (~4 KB)
  - T0001_student_tabm.pt: TabM† weights (~6 MB) plus quantile bins
  - T0001_student_tabm_preproc.npz: median-impute / observed-mask state
- results/, *_results.csv: aggregate CV metric summaries
Target T0001 = age_at_visit (chronological age at the visit).
Usage
predict.py runs the model with only public dependencies:
pip install torch tabm rtdl_num_embeddings numpy pandas
python predict.py --input proteins.csv --output ages.csv
--input is a CSV/TSV with one row per sample and one column per protein, named
exactly as in meta.json → feature_name (the 92 Olink OID####_GENE ids).
Column order does not matter; an optional sample_id column is carried through.
Output is sample_id, predicted_age. NaN cells are median-imputed. The script
errors if any required protein is missing and warns about unused input columns.
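The column checks described above might look roughly like the sketch below. It is not the code in predict.py: `align_features` is a hypothetical helper, and it assumes the feature list has already been read from meta.json (for example as the list stored under `feature_name`, whose exact JSON layout is an assumption here).

```python
import warnings
import pandas as pd

def align_features(df, feature_names, id_col="sample_id"):
    """Validate input columns and return (ids, X) with X in model feature order.

    Raises if a required protein is missing, warns about unused columns,
    and leaves NaN cells in place for later median imputation.
    """
    missing = [f for f in feature_names if f not in df.columns]
    if missing:
        raise ValueError(f"{len(missing)} required proteins missing, e.g. {missing[:5]}")

    extra = [c for c in df.columns if c not in feature_names and c != id_col]
    if extra:
        warnings.warn(f"Ignoring {len(extra)} unused input columns, e.g. {extra[:5]}")

    # Carry sample_id through if present; otherwise fall back to row numbers.
    if id_col in df.columns:
        ids = df[id_col]
    else:
        ids = pd.Series(range(len(df)), name=id_col)

    # Selecting by name makes the input column order irrelevant.
    X = df[list(feature_names)].to_numpy(dtype=float)
    return ids, X
```

Selecting columns by name rather than by position is what makes the input order irrelevant, and failing early on missing proteins avoids silently imputing an entire feature.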
Method
- Teacher: TabPFN v2.
- Student: TabM† distilled on a GMM-augmented transfer set (target 10,000 rows) with a 5-quantile pinball regression loss.
- CV: 10-fold person-grouped (1 sample = 1 person in MARS).
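For readers unfamiliar with the pinball (quantile) loss named above, here is a minimal NumPy sketch of its standard definition. The function name is hypothetical and the quantile levels are passed in by the caller, because the source does not state which five levels the student was trained on; the actual training code presumably uses a torch equivalent.

```python
import numpy as np

def pinball_loss(y_true, q_pred, taus):
    """Mean pinball loss over samples and quantile levels.

    y_true: shape (n,)      true (scaled) targets
    q_pred: shape (n, k)    predicted quantiles, one column per level
    taus:   shape (k,)      quantile levels in (0, 1); the actual 5 levels
                            used by the model are not given in the source
    """
    y = np.asarray(y_true, dtype=float)[:, None]   # (n, 1)
    q = np.asarray(q_pred, dtype=float)            # (n, k)
    t = np.asarray(taus, dtype=float)[None, :]     # (1, k)
    diff = y - q
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    loss = np.maximum(t * diff, (t - 1.0) * diff)
    return float(loss.mean())
```

At level 0.5 the loss reduces to half the absolute error, so the median head behaves like an L1 regressor while the outer heads are pushed toward the tails.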
Inference contract
Order the proteins by meta.json → feature_name; median-impute missing values and
append observed-mask channels; rebuild the TabM model with piecewise-linear numeric
embeddings and load the weights; take each ensemble member's point estimate as the
trapezoidal mean of its 5 predicted quantiles and average it over the 32 ensemble
members; invert the y-scaler to recover age.
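The pre- and post-processing steps of this contract can be sketched in NumPy as below, leaving out the TabM forward pass itself. Everything here is illustrative: the function names are hypothetical, the mask convention (1 = observed), the quantile levels, the normalisation of the trapezoidal mean by the covered level range, and a mean/scale form of the y-scaler are all assumptions that should be checked against meta.json and predict.py.

```python
import numpy as np

def impute_with_mask(X, medians):
    """Fill NaNs with per-feature medians and append a 0/1 observed mask.

    Output has 2 * n_features columns; 1 = observed is an assumed convention.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    X_filled = np.where(observed, X, np.asarray(medians, dtype=float))
    return np.concatenate([X_filled, observed.astype(float)], axis=1)

def trapezoidal_mean(q, taus):
    """Trapezoid-rule average of quantile values over their levels.

    q has quantiles on the last axis; the result is normalised by the
    covered range taus[-1] - taus[0] (an assumption about the definition).
    """
    q = np.asarray(q, dtype=float)
    taus = np.asarray(taus, dtype=float)
    widths = np.diff(taus)
    area = np.sum(0.5 * (q[..., 1:] + q[..., :-1]) * widths, axis=-1)
    return area / (taus[-1] - taus[0])

def point_estimate(q_members, taus, y_mean, y_scale):
    """q_members: (n_samples, n_members, n_quantiles) in scaled-y units."""
    per_member = trapezoidal_mean(q_members, taus)   # (n_samples, n_members)
    y_scaled = per_member.mean(axis=1)               # average over the ensemble
    return y_scaled * y_scale + y_mean               # assumed standard-scaler inverse
```

With the shapes described above, `q_members` would be `(n_samples, 32, 5)`. Reducing quantiles to a point estimate per member before averaging follows the order given in the contract.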
Citation
Inflammatory aging clock for plasma inflammatory proteins (MARS). Manuscript in
preparation. See also the SomaScan companion model:
inflammatory-aging-clock/somascan-85.