# scFATE - NeurIPS 2026
Lie-algebraic conditional flow matching for zero-shot perturbation prediction in single cells. This repo contains all
checkpoints and configs behind Table 1 in the paper. Datasets are split out into a separate repo:
[`Angione-Lab/scFATE-datasets`](https://huggingface.co/datasets/Angione-Lab/scFATE-datasets).
## Layout
```
backbones/<dataset>/ - scFATE rotation autoencoders (load first)
flow_heads/<dataset>/seed{1,2,3}/ - pure-flow heads on the rotation latent
flow_heads/k562/base/ - K562 flow teacher (for reflow)
flow_heads/k562/reflow_K2_s1/ - K562 reflow K=2 student (paper headline 81.2 DA)
sciplex3_path_b/teachers/{priorkrr,priornone}/s{1..9}/
sciplex3_path_b/students/mixed18_K16/s{1..7}/ - mixed-18 K=16 distilled student (paper headline 70.0 DA)
results/table1_inputs/ - saved eval JSONs for reviewer audit
REPRODUCE.md - paper reproduction guide
MANIFEST.json - machine-readable run inventory
```
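The directory tree above maps naturally onto `allow_patterns` globs for `snapshot_download`. The following is a minimal sketch under that assumption: the helper names `backbone_pattern`, `flow_head_pattern`, and `fetch_backbone_and_head` are hypothetical and not part of this repo, and `k562` is used as the dataset name only because it appears in the tree.

```python
def backbone_pattern(dataset: str) -> str:
    """Glob for a dataset's rotation-autoencoder backbone (hypothetical helper)."""
    return f"backbones/{dataset}/*"


def flow_head_pattern(dataset: str, seed: int) -> str:
    """Glob for one per-seed flow head (hypothetical helper)."""
    if seed not in (1, 2, 3):
        raise ValueError("the layout lists seeds 1, 2 and 3 only")
    return f"flow_heads/{dataset}/seed{seed}/*"


def fetch_backbone_and_head(dataset: str, seed: int) -> str:
    """Download a backbone plus one flow head; returns the local snapshot dir.

    Assumes the repo id from the Quickstart and that the requested
    dataset/seed directories exist in the repo.
    """
    from huggingface_hub import snapshot_download  # imported lazily; needs network

    return snapshot_download(
        "Angione-Lab/scFATE",
        allow_patterns=[backbone_pattern(dataset), flow_head_pattern(dataset, seed)],
    )


# Building the patterns needs no network access.
patterns = [backbone_pattern("k562"), flow_head_pattern("k562", 2)]
print(patterns)
```

Restricting the download to a backbone and one head avoids pulling every seed and every SciPlex3 teacher/student checkpoint.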
## Quickstart
```python
from huggingface_hub import snapshot_download
import torch
# 1. Pull only the K562 paper-headline checkpoint (reflow K=2 student)
ckpt_dir = snapshot_download("Angione-Lab/scFATE", allow_patterns="flow_heads/k562/reflow_K2_s1/*")
# weights_only=False unpickles arbitrary Python objects, so use it only on checkpoints you trust
ckpt = torch.load(f"{ckpt_dir}/flow_heads/k562/reflow_K2_s1/flow_best.pt", map_location="cpu", weights_only=False)
# 2. Every hparam sits at the top level of ckpt; the velocity-network state_dict is at ckpt["v_net_state_dict"]
print({k: v for k, v in ckpt.items() if not k.endswith("_state_dict") and not k.startswith("_")})
```
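The key convention used in the Quickstart (hparams at the top level, weights under keys ending in `_state_dict`) can be wrapped in a small helper. This is a sketch only: `split_checkpoint` is a hypothetical name, and the stand-in dictionary uses made-up hparam keys (`lr`, `latent_dim`) rather than the real checkpoint contents.

```python
def split_checkpoint(ckpt: dict):
    """Separate top-level hparams from state dicts (hypothetical helper).

    Follows the Quickstart convention: keys ending in "_state_dict" hold
    weights; keys starting with "_" are treated as private and skipped.
    """
    state_dicts = {k: v for k, v in ckpt.items() if k.endswith("_state_dict")}
    hparams = {
        k: v
        for k, v in ckpt.items()
        if not k.endswith("_state_dict") and not k.startswith("_")
    }
    return hparams, state_dicts


# Stand-in for a loaded checkpoint; every key except "v_net_state_dict" is made up.
fake_ckpt = {
    "lr": 1e-3,
    "latent_dim": 64,
    "_private": None,
    "v_net_state_dict": {"layer.weight": [0.0]},
}
hparams, state_dicts = split_checkpoint(fake_ckpt)
print(sorted(hparams))      # hparam names
print(sorted(state_dicts))  # state-dict names
```

With a real checkpoint, `state_dicts["v_net_state_dict"]` would be passed to `load_state_dict` on a velocity network built from `hparams`; the network class itself lives in the training code and is not shown here.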
For end-to-end reproduction (backbone → flow → reflow → eval), see [`REPRODUCE.md`](./REPRODUCE.md).
## Caveat (important for paper integrity)
Table 1 of the paper reports `cos=0.491 / PDE=0.483` for the SciPlex3 (Path B) row. **These two values are not in any
saved eval log on the original training machine.** Only the mean DA = 0.700 reproduces from
`results/table1_inputs/sciplex3_iter214_multi_metric_router_ensemble.json`
(`metric_da.tanimoto_morgan2048 = 0.7000`). The router pipeline stored only DA, not cos/PDE. We are re-running the
SciPlex3 Path-B eval to compute cos/PDE for the camera-ready version; until then, treat the SciPlex3 row's continuous
metrics as not yet reproduced.
The Norman cos/PDE values in Table 1 differ from the saved JSONs by ~0.002 (rounding). The K562 reflow cos/PDE values
differ by ~0.007 (likely rounding or transcription error).
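The DA check described above can be sketched as a small audit script. It assumes the dotted name `metric_da.tanimoto_morgan2048` denotes a nested JSON key (`metric_da` containing `tanimoto_morgan2048`), which is not confirmed by this README; the helper names and the tolerance of `5e-4` (half a unit in the third decimal) are illustrative choices.

```python
import json
from functools import reduce


def get_dotted(record: dict, dotted_key: str):
    """Walk a nested dict along a dotted path such as "a.b" (hypothetical helper)."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), record)


def matches_reported(saved: float, reported: float, tol: float = 5e-4) -> bool:
    """True if the saved value agrees with the reported one to three decimals."""
    return abs(saved - reported) <= tol


def check_sciplex3_da(json_path: str, reported: float = 0.700) -> bool:
    """Compare the saved SciPlex3 DA with the Table 1 value (assumed JSON nesting)."""
    with open(json_path) as fh:
        record = json.load(fh)
    return matches_reported(get_dotted(record, "metric_da.tanimoto_morgan2048"), reported)


# In-memory stand-in with the single value quoted in this README.
example = json.loads('{"metric_da": {"tanimoto_morgan2048": 0.7000}}')
da = get_dotted(example, "metric_da.tanimoto_morgan2048")
print(da, matches_reported(da, 0.700))
```

A gap of ~0.007, as noted for the K562 reflow cos/PDE values, would fail this three-decimal tolerance, so it would need to be checked against the saved JSONs by hand.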
## Citation
```
@inproceedings{ahmad2026scfate,
title={scFATE: Zero-shot Perturbation Prediction via Lie-Algebraic Conditional Flow Matching},
author={Ahmad, Farhan and Angione, Claudio and others},
booktitle={NeurIPS},
year={2026}
}
```