# scFATE NeurIPS 2026: Reproduce the Paper
This directory ships every checkpoint behind the paper's Table 1, plus a `reproduce.sh` script
for each run. All 31 paper-headline runs are listed in Section 3.
## 0. Prerequisites
```bash
git clone https://huggingface.co/Angione-Lab/scFATE
cd scFATE/code # source code is in scfate-code submodule
uv venv && uv pip install -e . # or pip install -r requirements.txt
```
Then download datasets:
```bash
huggingface-cli download Angione-Lab/scFATE-datasets --local-dir datasets/scFATE/processed --repo-type dataset
```
## 1. Dependency graph
```
backbone (rotation autoencoder): hf-assets/checkpoints/<dataset>/
└─→ flow head (s1, s2, s3): runs/<dataset>_flow_*/flow_best.pt
└─→ reflow K=2 (K562): runs/*_reflow_K2_*_s1/flow_best.pt
└─→ teachers ×18 (SciPlex3): runs/*_priorkrr_V2B_s{1..9}, *_priornone_V2B_s{1..9}
└─→ student ×7: runs/*_reflow_ensemble_mixed18_K16_V2B_s{1..7}
```
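The SciPlex3 branch of the graph can be turned into an ordered list of `reproduce.sh` invocations. The sketch below is one way to do that in Python. The directory names follow the patterns used for the SciPlex3 run directories in this repository; the helper name `sciplex3_run_order` is hypothetical, and the assumption that all 18 teachers must finish before any student starts is inferred from the "mixed18" run names rather than from the training code.

```python
from pathlib import Path

# Run-directory patterns copied from the SciPlex3 teacher and student run names.
TEACHER_FMT = "b200_sciplex3_delta_flow_mv_v2_sig0p3_prior{prior}_V2B_s{seed}"
STUDENT_FMT = "b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s{seed}"


def sciplex3_run_order(runs_root: str = "runs") -> list[str]:
    """Return reproduce.sh paths with all teachers listed before any student.

    Hypothetical helper: the ordering assumes every student depends on the
    full set of 18 teachers, as the "mixed18" run names suggest.
    """
    root = Path(runs_root)
    teachers = [
        root / TEACHER_FMT.format(prior=prior, seed=seed) / "reproduce.sh"
        for seed in range(1, 10)      # teacher seeds s1..s9
        for prior in ("none", "krr")  # two prior variants per seed
    ]
    students = [
        root / STUDENT_FMT.format(seed=seed) / "reproduce.sh"
        for seed in range(1, 8)       # student seeds s1..s7
    ]
    return [p.as_posix() for p in teachers + students]


if __name__ == "__main__":
    # Print the commands instead of running them, so the order can be reviewed.
    for script in sciplex3_run_order():
        print(f"bash {script}")
```

Printing rather than executing keeps the sketch side-effect free; the output can be piped into a job scheduler or run by hand.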
## 2. Per-run reproduction
Each `runs/<run_dir>/` contains:
- `flow_best.pt`: checkpoint with embedded hparams (load via `torch.load`, look at top-level keys or `ckpt['hparams']`)
- `config.json`: extracted hparams + result-JSON pointer + dataset path
- `reproduce.sh`: exact training command, ready to run
- `flow_metrics.jsonl`: training trajectory
- `krr_prior.pkl`: KRR-init prior (if `prior=krr`)
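Before launching a run, it can help to confirm that a downloaded run directory is complete. The following is a minimal sketch using only the standard library; the function names (`missing_artifacts`, `load_config`, `extract_hparams`) are hypothetical, the schema of `config.json` is not assumed, and the fallback rule in `extract_hparams` (treat plain scalars at the top level as hparams) is a guess at what "top-level keys" means here.

```python
import json
from pathlib import Path

# Files every run directory is documented to contain.
REQUIRED = ("flow_best.pt", "config.json", "reproduce.sh", "flow_metrics.jsonl")
KRR_PRIOR = "krr_prior.pkl"  # only expected when the run used prior=krr


def missing_artifacts(run_dir, uses_krr_prior=False):
    """List documented files that are absent from run_dir."""
    run_dir = Path(run_dir)
    expected = list(REQUIRED) + ([KRR_PRIOR] if uses_krr_prior else [])
    return [name for name in expected if not (run_dir / name).is_file()]


def load_config(run_dir):
    """Parse config.json; its exact schema is not assumed here."""
    with open(Path(run_dir) / "config.json", encoding="utf-8") as fh:
        return json.load(fh)


def extract_hparams(ckpt):
    """Pull hyperparameters out of an already-loaded checkpoint dict.

    Assumption: hparams sit under ckpt["hparams"]; otherwise fall back to
    top-level entries that are plain scalars (skipping tensors/state dicts).
    """
    if isinstance(ckpt.get("hparams"), dict):
        return dict(ckpt["hparams"])
    scalar = (int, float, str, bool, type(None))
    return {k: v for k, v in ckpt.items() if isinstance(v, scalar)}
```

With PyTorch installed, the checkpoint dict would come from something like `torch.load(run_dir / "flow_best.pt", map_location="cpu")`; depending on the PyTorch version, `weights_only=False` may be needed for checkpoints that store non-tensor objects, and it should only be used on files you trust.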
## 3. Paper-headline runs
| Paper row | Dataset | Run dir | Result JSON | Reproduce |
|---|---|---|---|---|
| Norman seed 1 | CRISPRa Norman | `runs/b200_norman_flow_e115_krrinit_s02_mask_30k` | `experiments/results/fair_comparison/norman_rotation_vs_direct__flow__b200_norman_flow_e115_krrinit_s02_mask_30k_K128.json` | `bash runs/b200_norman_flow_e115_krrinit_s02_mask_30k/reproduce.sh` |
| Norman seed 2 | CRISPRa Norman | `runs/b200_norman_flow_e115_krrinit_s02_mask_30k_seed2` | `experiments/results/fair_comparison/norman_rotation_vs_direct__flow__b200_norman_flow_e115_krrinit_s02_mask_30k_seed2.json` | `bash runs/b200_norman_flow_e115_krrinit_s02_mask_30k_seed2/reproduce.sh` |
| Norman seed 3 | CRISPRa Norman | `runs/b200_norman_flow_e115_krrinit_s02_mask_30k_seed3` | `experiments/results/fair_comparison/norman_rotation_vs_direct__flow__b200_norman_flow_e115_krrinit_s02_mask_30k_seed3.json` | `bash runs/b200_norman_flow_e115_krrinit_s02_mask_30k_seed3/reproduce.sh` |
| RPE1 seed 1 | Replogle RPE1 | `runs/b200_rpe1_flow_block_krrinit_mask_30k_s1` | `experiments/results/fair_comparison/rpe1_rotation_vs_direct__flow__b200_rpe1_flow_block_krrinit_mask_30k_s1_rpe1_block_K128.json` | `bash runs/b200_rpe1_flow_block_krrinit_mask_30k_s1/reproduce.sh` |
| K562 base flow (teacher for reflow) | Replogle K562 | `runs/b200_k562_flow_bs2048_krrinit_mask_30k` | `experiments/results/fair_comparison/replogle_rotation_vs_direct__flow__b200_k562_flow_bs2048_krrinit_mask_30k_K128.json` | `bash runs/b200_k562_flow_bs2048_krrinit_mask_30k/reproduce.sh` |
| K562 reflow K=2 (paper headline 81.2) | Replogle K562 | `runs/b200_k562_flow_bs2048_krrinit_mask_30k_reflow_K2_nomask_bracket1ep0_s1` | `experiments/results/fair_comparison/replogle_rotation_vs_direct__flow__b200_k562_flow_bs2048_krrinit_mask_30k_reflow_K2_nomask_bracket1ep0_s1_reflow_K2_bracket1p0_ens5seed_sigmainf0p15_antithetic_Kper128.json` | `bash runs/b200_k562_flow_bs2048_krrinit_mask_30k_reflow_K2_nomask_bracket1ep0_s1/reproduce.sh` |
| SciPlex3 priornone teacher s1 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s1` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s1/reproduce.sh` |
| SciPlex3 priorkrr teacher s1 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s1` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s1/reproduce.sh` |
| SciPlex3 priornone teacher s2 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s2` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s2/reproduce.sh` |
| SciPlex3 priorkrr teacher s2 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s2` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s2/reproduce.sh` |
| SciPlex3 priornone teacher s3 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s3` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s3/reproduce.sh` |
| SciPlex3 priorkrr teacher s3 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s3` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s3/reproduce.sh` |
| SciPlex3 priornone teacher s4 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s4` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s4/reproduce.sh` |
| SciPlex3 priorkrr teacher s4 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s4` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s4/reproduce.sh` |
| SciPlex3 priornone teacher s5 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s5` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s5/reproduce.sh` |
| SciPlex3 priorkrr teacher s5 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s5` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s5/reproduce.sh` |
| SciPlex3 priornone teacher s6 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s6` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s6/reproduce.sh` |
| SciPlex3 priorkrr teacher s6 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s6` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s6/reproduce.sh` |
| SciPlex3 priornone teacher s7 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s7` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s7/reproduce.sh` |
| SciPlex3 priorkrr teacher s7 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s7` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s7/reproduce.sh` |
| SciPlex3 priornone teacher s8 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s8` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s8/reproduce.sh` |
| SciPlex3 priorkrr teacher s8 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s8` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s8/reproduce.sh` |
| SciPlex3 priornone teacher s9 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s9` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priornone_V2B_s9/reproduce.sh` |
| SciPlex3 priorkrr teacher s9 | SciPlex3 | `runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s9` | n/a | `bash runs/b200_sciplex3_delta_flow_mv_v2_sig0p3_priorkrr_V2B_s9/reproduce.sh` |
| SciPlex3 mixed-18 K=16 student s1 (paper headline 70.0) | SciPlex3 | `runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s1` | `experiments/results/sciplex3_iter196_reflow_ensemble_mixed18_K16_N5.json` | `bash runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s1/reproduce.sh` |
| SciPlex3 mixed-18 K=16 student s2 (paper headline 70.0) | SciPlex3 | `runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s2` | `experiments/results/sciplex3_iter196_reflow_ensemble_mixed18_K16_N5.json` | `bash runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s2/reproduce.sh` |
| SciPlex3 mixed-18 K=16 student s3 (paper headline 70.0) | SciPlex3 | `runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s3` | `experiments/results/sciplex3_iter196_reflow_ensemble_mixed18_K16_N5.json` | `bash runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s3/reproduce.sh` |
| SciPlex3 mixed-18 K=16 student s4 (paper headline 70.0) | SciPlex3 | `runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s4` | `experiments/results/sciplex3_iter196_reflow_ensemble_mixed18_K16_N5.json` | `bash runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s4/reproduce.sh` |
| SciPlex3 mixed-18 K=16 student s5 (paper headline 70.0) | SciPlex3 | `runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s5` | `experiments/results/sciplex3_iter196_reflow_ensemble_mixed18_K16_N5.json` | `bash runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s5/reproduce.sh` |
| SciPlex3 mixed-18 K=16 student s6 (paper headline 70.0) | SciPlex3 | `runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s6` | `experiments/results/sciplex3_iter197_reflow_ensemble_mixed18_K16_N7.json` | `bash runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s6/reproduce.sh` |
| SciPlex3 mixed-18 K=16 student s7 (paper headline 70.0) | SciPlex3 | `runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s7` | `experiments/results/sciplex3_iter197_reflow_ensemble_mixed18_K16_N7.json` | `bash runs/b200_sciplex3_delta_flow_reflow_ensemble_mixed18_K16_V2B_s7/reproduce.sh` |
## 4. Eval pipeline
```bash
.venv/bin/python scripts/eval_fair_comparison.py \
--dataset <dataset_name> --flow_ckpt <run_dir>/flow_best.pt \
--multiview data/gene_embeddings/<dataset>_multiview.pt --K_eval 128
```
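To evaluate several runs in a loop, the same command can be assembled programmatically. The sketch below builds the argument list from the flags shown above; the helper name `eval_command` and its parameters are hypothetical, and the multiview path is passed in explicitly because the mapping from dataset name to embedding file is not spelled out here.

```python
import shlex


def eval_command(dataset_name, run_dir, multiview, k_eval=128,
                 python=".venv/bin/python"):
    """Build the argv list for scripts/eval_fair_comparison.py.

    Only the four flags documented in this README are emitted.
    """
    return [
        python, "scripts/eval_fair_comparison.py",
        "--dataset", dataset_name,
        "--flow_ckpt", f"{run_dir}/flow_best.pt",
        "--multiview", multiview,
        "--K_eval", str(k_eval),
    ]


if __name__ == "__main__":
    # Placeholders are kept verbatim; substitute real values before running.
    cmd = eval_command(
        "<dataset_name>",
        "runs/b200_norman_flow_e115_krrinit_s02_mask_30k",
        "data/gene_embeddings/<dataset>_multiview.pt",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a string means the result can be handed directly to `subprocess.run(cmd, check=True)` without shell quoting concerns.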
## 5. Known caveats
- The SciPlex3 cosine (0.491) and PDE (0.483) values in Table 1 do not appear in any saved
  results JSON; only the mean DA of 0.700 reproduces, from `iter214_multi_metric_router_ensemble.json`
  (`metric_da.tanimoto_morgan2048`). Both metrics need to be re-evaluated before final submission.
- Norman cos/PDE in Table 1 differ by ~0.002 from `*_K128.json` (rounded).
- K562 reflow cos/PDE in Table 1 differ by ~0.007 from the saved ensemble JSON.
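Discrepancies like these can be checked mechanically by reading a metric from a result JSON and comparing it with the Table 1 value under an explicit tolerance. This is a minimal sketch under two assumptions: that a dotted key such as `metric_da.tanimoto_morgan2048` addresses nested JSON objects, and that the value stored there is a single number. The helper names are hypothetical.

```python
import json
from functools import reduce


def get_dotted(record, dotted_key):
    """Follow a dotted key such as "metric_da.tanimoto_morgan2048"."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), record)


def load_metric(json_path, dotted_key):
    """Read one metric from a result JSON file."""
    with open(json_path, encoding="utf-8") as fh:
        return get_dotted(json.load(fh), dotted_key)


def agrees(paper_value, saved_value, tol):
    """True when the saved value is within tol of the paper value."""
    return abs(paper_value - saved_value) <= tol


if __name__ == "__main__":
    # Stand-in record for illustration only, NOT the contents of a real file.
    record = {"metric_da": {"tanimoto_morgan2048": 0.700}}
    saved = get_dotted(record, "metric_da.tanimoto_morgan2048")
    print(agrees(0.700, saved, tol=0.0005))
```

Choosing the tolerance per row (for example, half a unit in the last reported digit for rounding-only differences) makes it explicit which gaps are rounding and which need a re-run.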