---
license: cc-by-nc-4.0
tags:
- computational-pathology
- survival-analysis
- whole-slide-imaging
- gene-expression
- oncology
- histopathology
extra_gated_prompt: |
  These weights are released under CC BY-NC 4.0: strictly non-commercial,
  research and educational use only. By requesting access you agree to:
  1. Use the weights only for non-commercial research.
  2. Cite the SPARC paper in any derived publication.
  3. Not redistribute the weights to third parties.
extra_gated_fields:
  Name: text
  Affiliation: text
  Email: text
  Intended use: text
  I agree to the non-commercial license: checkbox
---

# SPARC

**Gene-program-aware survival modelling from H&E whole-slide images.**

This repository hosts the trained model weights for the SPARC paper (Ayed,
Cohn, et al.). Code, configs, training scripts, and figure-regeneration
notebooks live at **[github.com/aziz-ayed/SPARC](https://github.com/aziz-ayed/SPARC)**.

SPARC is a two-stage pipeline:

1. **SPARC-Map** predicts 40 hallmark gene-expression-program (GEP) scores
   per H&E patch, recovering a *spatial* molecular map of each slide.
2. **SPARC-Risk** fuses those per-patch GEP scores with the same H&E
   features through a signature-query attention head and a cancer-aware
   gate, producing a single per-patient risk score.

These weights cover the SPARC-Risk model and the image-only baseline used
for ablations.

<p align="center">
<img src="https://raw.githubusercontent.com/aziz-ayed/SPARC/main/docs/figs/figure1.png" alt="SPARC pipeline" width="720">
</p>

## What you get

| Folder | Model | Description |
|---|---|---|
| `sparc_risk/` | **SPARC-Risk (canonical)** | Signature-query fusion + H&E. The model reported throughout the paper. |
| `image_only/` | **Image-only baseline** | Same backbone, GEP pathway disabled. Use for direct ablation against SPARC-Risk. |

Each folder contains 5 checkpoints (`fold_0_best.pt` through
`fold_4_best.pt`), corresponding to the 5-fold cross-validation splits
described in the paper and in
[`data/mmp_hybrid_splits_v2_20cancer.csv`](https://github.com/aziz-ayed/SPARC/blob/main/data/mmp_hybrid_splits_v2_20cancer.csv).

Every `.pt` carries both `model_state_dict` and the original training
`config`, so the model can be rebuilt in a few lines:

```python
import torch
from sparc.models.factory import build_model

# weights_only=False lets torch.load unpickle arbitrary Python objects,
# so load only checkpoints obtained from a source you trust.
ckpt = torch.load("sparc_risk/fold_0_best.pt", map_location="cpu", weights_only=False)
model = build_model(ckpt["config"])              # rebuild the architecture from the stored config
model.load_state_dict(ckpt["model_state_dict"])  # restore the trained weights
model.eval()                                     # switch to inference mode
```
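
Before building the model, it can help to confirm that a loaded checkpoint really has the two entries named above. The helper below is a minimal sketch, not part of the SPARC package: `check_checkpoint` is a hypothetical name, and the stand-in dictionary only mimics what `torch.load` would return.

```python
# Hypothetical helper (not part of the SPARC package): verify that a loaded
# checkpoint dictionary contains the two entries described in this README.
REQUIRED_KEYS = ("model_state_dict", "config")


def check_checkpoint(ckpt: dict) -> None:
    """Raise KeyError if the checkpoint lacks an expected entry."""
    missing = [key for key in REQUIRED_KEYS if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")


# Stand-in dictionary; a real one would come from torch.load(...).
fake_ckpt = {"model_state_dict": {}, "config": {"hidden_dim": 256}}
check_checkpoint(fake_ckpt)  # passes silently when both keys are present
```

Calling `check_checkpoint(ckpt)` right after `torch.load` gives a clearer error than a failure deep inside `build_model`.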

## Quick start

```bash
# 1. Install the SPARC package
git clone https://github.com/aziz-ayed/SPARC.git && cd SPARC
conda env create -f environment.yml
conda activate sparc

# 2. Accept the license on https://huggingface.co/azizayed/SPARC, then:
pip install -U "huggingface_hub[cli]"
hf auth login
hf download azizayed/SPARC --local-dir checkpoints

# 3. Inference on an external cohort (e.g. NLST lung)
python -m inference.run \
    --cohort nlst \
    --checkpoint_dir checkpoints/sparc_risk \
    --gpus 0,1,2,3
```

The download produces:

```
checkpoints/
├── sparc_risk/   fold_{0..4}_best.pt
└── image_only/   fold_{0..4}_best.pt
```
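
A quick way to confirm that the download is complete is to check for the ten files in the layout above. The following is a small sketch using only the standard library; `missing_checkpoints` is a hypothetical helper, and the folder and file names are taken from the tree shown here.

```python
from pathlib import Path

MODELS = ("sparc_risk", "image_only")  # folder names from the layout above
N_FOLDS = 5                            # fold_0 ... fold_4


def missing_checkpoints(root):
    """Return the expected checkpoint paths that do not exist under root."""
    root = Path(root)
    expected = [
        root / model / f"fold_{fold}_best.pt"
        for model in MODELS
        for fold in range(N_FOLDS)
    ]
    return [path for path in expected if not path.is_file()]


# Report anything absent from the local download directory.
for path in missing_checkpoints("checkpoints"):
    print("missing:", path)
```

An empty result means all five folds of both models are present.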

## Architecture (SPARC-Risk)

| Component | Setting |
|---|---|
| Image backbone | H-optimus-1 (1536-dim) |
| Patch size / magnification | 224 px @ 20× |
| Max patches per slide | 4096 |
| Fusion | Signature-query cross-attention (64-NN, 4 heads) |
| Aggregator | Gated attention MIL |
| Head | Discrete-time NLL survival, 4 bins |
| Cancer conditioning | Per-cancer learned gate |
| Hidden dim | 256 |
| Trainable params | ≈ 2.6 M |
| Optimiser / schedule | Adam, lr 1 × 10⁻⁴, cosine T_max 20 |
| Random seed | 1337 |

Full config and reproduction recipe: [`configs/sparc_risk.yaml`](https://github.com/aziz-ayed/SPARC/blob/main/configs/sparc_risk.yaml).
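
To make the "discrete-time NLL survival, 4 bins" head more concrete, the sketch below shows one common formulation: each bin's logit is turned into a conditional hazard, the survival curve is the running product of `1 - hazard`, and the scalar risk is the negative sum of that curve. This is an assumption about the convention, not a copy of the SPARC code, and the function names are hypothetical; the repository config and source remain the reference.

```python
import math


def _hazards(logits):
    """Sigmoid of each logit: P(event in bin k | survived to bin k)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def survival_and_risk(logits):
    """Return (survival curve, scalar risk) for one patient's bin logits."""
    survival, alive = [], 1.0
    for hazard in _hazards(logits):
        alive *= 1.0 - hazard      # probability of surviving past this bin
        survival.append(alive)
    risk = -sum(survival)          # shorter expected survival -> higher risk
    return survival, risk


def discrete_nll(logits, bin_index, event):
    """Negative log-likelihood for one patient.

    bin_index: bin containing the event (or the censoring time).
    event:     True if the event was observed, False if censored.
    """
    hazards = _hazards(logits)
    # Log-probability of surviving every bin before bin_index.
    log_surv_before = sum(math.log(1.0 - h) for h in hazards[:bin_index])
    if event:
        return -(log_surv_before + math.log(hazards[bin_index]))
    return -(log_surv_before + math.log(1.0 - hazards[bin_index]))


curve, risk = survival_and_risk([0.0, 0.0, 0.0, 0.0])  # 4 bins, as in the table
print(curve, risk)
```

Under this convention, larger logits mean higher hazards, a faster-decaying survival curve, and a larger (less negative) risk score.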

## Training data

5-fold patient-level cross-validation over **20 TCGA cancer types**
(BLCA, BRCA, CESC, COAD, ESCA, GBM, HNSC, KIRC, KIRP, LGG, LIHC, LUAD,
LUSC, PAAD, READ, SARC, SKCM, STAD, UCEC, plus a held-out evaluation
split). Splits derive from the MMP hybrid scheme of Mahmood et al. and
are released alongside the code at
[`data/mmp_hybrid_splits_v2_20cancer.csv`](https://github.com/aziz-ayed/SPARC/blob/main/data/mmp_hybrid_splits_v2_20cancer.csv).
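
"Patient-level" means that all slides from one patient fall in the same fold, so no patient appears on both sides of a train/test split. The sketch below shows how such a property could be checked. It assumes a CSV with one row per slide and columns named `patient_id` and `fold`; these column names are hypothetical and may not match the released file, so adjust them after inspecting its header.

```python
import csv
import io
from collections import defaultdict


def patients_in_multiple_folds(rows, patient_col="patient_id", fold_col="fold"):
    """Return the sorted patient IDs whose rows span more than one fold."""
    folds_by_patient = defaultdict(set)
    for row in rows:
        folds_by_patient[row[patient_col]].add(row[fold_col])
    return sorted(p for p, folds in folds_by_patient.items() if len(folds) > 1)


# Toy data standing in for the real split file (column names are assumptions).
demo = io.StringIO(
    "patient_id,slide_id,fold\n"
    "P1,S1,0\n"
    "P1,S2,0\n"
    "P2,S3,1\n"
    "P3,S4,2\n"
    "P3,S5,3\n"
)
leaks = patients_in_multiple_folds(csv.DictReader(demo))
print(leaks)  # ['P3']
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`; an empty list indicates a clean patient-level split.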

The external validation cohorts (NLST lung, SurGen CRC, Yale breast, and
ovarian) were not used for training and are described in the paper.

## Intended use

These weights are intended for **non-commercial biomedical research and
education only**. Acceptable uses include:

- Reproducing the SPARC paper's results.
- Benchmarking against SPARC-Risk in computational-pathology research.
- Methodological extensions (new fusion designs, additional cohorts,
  ablation studies).

## Citation

The SPARC paper is currently under review. Once a preprint or accepted
version is available, a BibTeX entry will be added here. In the meantime,
if you use these weights, please link back to
[github.com/aziz-ayed/SPARC](https://github.com/aziz-ayed/SPARC) and
contact the corresponding author at <azizayed@mit.edu>.

## License

These weights are released under the
[Creative Commons Attribution-NonCommercial 4.0 International License
(CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/). For
commercial licensing, please contact the authors via the issues page of
the GitHub repository.