# scFATE — NeurIPS 2026

Lie-algebraic conditional flow matching for zero-shot perturbation prediction in single cells.

This repo contains all checkpoints and configs behind Table 1 in the paper. Datasets are split out into a separate repo: [`Angione-Lab/scFATE-datasets`](https://huggingface.co/datasets/Angione-Lab/scFATE-datasets).

## Layout

```
backbones//                                    — scFATE rotation autoencoders (load first)
flow_heads//seed{1,2,3}/                       — pure-flow heads on the rotation latent
flow_heads/k562/base/                          — K562 flow teacher (for reflow)
flow_heads/k562/reflow_K2_s1/                  — K562 reflow K=2 student (paper headline 81.2 DA)
sciplex3_path_b/teachers/{priorkrr,priornone}/s{1..9}/
sciplex3_path_b/students/mixed18_K16/s{1..7}/  — mixed-18 K=16 distilled student (paper headline 70.0 DA)
results/table1_inputs/                         — saved eval JSONs for reviewer audit
REPRODUCE.md                                   — paper reproduction guide
MANIFEST.json                                  — machine-readable run inventory
```

## Quickstart

```python
from huggingface_hub import snapshot_download
import torch

# 1. Pull the K562 paper-headline checkpoint
ckpt_dir = snapshot_download(
    "Angione-Lab/scFATE",
    allow_patterns="flow_heads/k562/reflow_K2_s1/*",
)
ckpt = torch.load(
    f"{ckpt_dir}/flow_heads/k562/reflow_K2_s1/flow_best.pt",
    map_location="cpu",
    weights_only=False,
)

# 2. The ckpt has every hparam at top level;
#    the velocity-network state_dict is at ckpt["v_net_state_dict"]
print({k: v for k, v in ckpt.items() if not k.endswith("_state_dict") and not k.startswith("_")})
```

For end-to-end reproduction (backbone → flow → reflow → eval), see [`REPRODUCE.md`](./REPRODUCE.md).

## Caveat (important for paper integrity)

Table 1 of the paper reports `cos=0.491 / PDE=0.483` for the SciPlex3 (Path B) row. **These two values are not in any saved eval log on the original training machine.** Only the mean DA = 0.700 reproduces, from `results/table1_inputs/sciplex3_iter214_multi_metric_router_ensemble.json` (`metric_da.tanimoto_morgan2048 = 0.7000`). The router pipeline stored only DA, not cos/PDE.
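
To check the DA figure against the saved JSON yourself, a minimal sketch like the one below may help. It assumes that `metric_da.tanimoto_morgan2048` denotes a nested key (a `metric_da` object containing a `tanimoto_morgan2048` field); that layout is an assumption, so the hypothetical helper `read_metric` also tries the dotted string as a single flat key. `fetch_sciplex3_json` is likewise a hypothetical wrapper around `huggingface_hub.hf_hub_download`, not part of this repo.

```python
import json
from pathlib import Path
from typing import Any, Union


def read_metric(path: Union[str, Path], dotted_key: str) -> Any:
    """Look up a metric such as "metric_da.tanimoto_morgan2048" in an eval JSON.

    Hypothetical helper: first walks nested objects one dot-separated part at a
    time; if that fails, falls back to the whole dotted string as a flat key.
    """
    with open(path) as f:
        data = json.load(f)

    node = data
    for part in dotted_key.split("."):
        if isinstance(node, dict) and part in node:
            node = node[part]
        else:
            # Nested lookup failed; the file may store the dotted key verbatim.
            if isinstance(data, dict) and dotted_key in data:
                return data[dotted_key]
            raise KeyError(dotted_key)
    return node


def fetch_sciplex3_json() -> str:
    """Hypothetical convenience wrapper: download the saved SciPlex3 eval JSON."""
    from huggingface_hub import hf_hub_download  # imported lazily; network access needed

    return hf_hub_download(
        repo_id="Angione-Lab/scFATE",
        filename="results/table1_inputs/sciplex3_iter214_multi_metric_router_ensemble.json",
    )
```

For example, `read_metric(fetch_sciplex3_json(), "metric_da.tanimoto_morgan2048")` should return the 0.7000 value quoted above if the assumed JSON layout holds.
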

We are re-running the SciPlex3 Path B eval to compute proper cos/PDE for the camera-ready version; until then, treat the SciPlex3 row's continuous metrics as not yet reproduced.

Norman cos/PDE in Table 1 differ from the saved JSONs by ~0.002 (rounding). K562 reflow cos/PDE differ by ~0.007 (likely rounding or transcription).

## Citation

```
@inproceedings{ahmad2026scfate,
  title={scFATE: Zero-shot Perturbation Prediction via Lie-Algebraic Conditional Flow Matching},
  author={Ahmad, Farhan and Angione, Claudio and others},
  booktitle={NeurIPS},
  year={2026}
}
```