# PIVOT

Perturbation-Informed Vector-field Optimization for Transcriptomic state control.

PIVOT learns a perturbation-conditioned flow map over single-cell state embeddings and uses its Jacobian for differentiable inverse design: given a control cell state and a desired target state, it nominates gene-level interventions that move cells toward the target. The same model also does ordinary forward response prediction.

## Layout

```
src/
  data/         loading + preprocessing of perturb-seq data, splits
  models/       perturbation encoder, flow map, the PIVOT module
  training/     training loop and losses
  evaluation/   inference, rewards, metrics, baselines
  experiments/  drivers for the result tables, ablations, figures
  utils/
scripts/        figure generation, extra ablations, GEARS comparison
experiments/    saved result json
```

## Setup

```bash
pip install -r requirements.txt
```

Data is not committed. Download and preprocess from the public sources first (Norman 2019 and Replogle 2022 are pulled from scPerturb):

```bash
python -m src.data.preprocess norman
python -m src.data.preprocess replogle_k562
```

This writes a PCA(50) embedding over 2000 highly variable genes plus the held-out splits to `data/processed//`.
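For orientation, here is a minimal NumPy sketch of what a "PCA(50) over 2000 highly variable genes" embedding could look like. It is not the code in `src.data.preprocess`: the helper name `embed_cells`, the normalization to 10,000 counts per cell, and ranking genes by plain variance are all assumptions made for illustration, and the real preprocessing may select genes differently.

```python
import numpy as np


def embed_cells(counts, n_hvg=2000, n_pcs=50):
    """Hypothetical helper: log-normalize, keep top-variance genes, project with PCA."""
    counts = np.asarray(counts, dtype=np.float64)

    # Library-size normalize each cell to 10,000 counts, then log1p (assumed scheme).
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    logx = np.log1p(counts / totals * 1e4)

    # Rank genes by variance and keep the top n_hvg (a simplified HVG selection).
    n_hvg = min(n_hvg, logx.shape[1])
    hvg_idx = np.sort(np.argsort(logx.var(axis=0))[::-1][:n_hvg])
    x = logx[:, hvg_idx]

    # PCA via SVD of the mean-centered matrix; scores are U * S.
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    n_pcs = min(n_pcs, s.shape[0])
    return u[:, :n_pcs] * s[:n_pcs], hvg_idx


# Toy cells-by-genes count matrix standing in for real perturb-seq data.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(1.0, size=(200, 3000))
embedding, hvg_idx = embed_cells(toy_counts)
print(embedding.shape)  # (200, 50)
```

The returned index array records which genes were kept, which is what you would need to map a new cell into the same embedding.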

## Running things

```bash
# train one model
python -m src.training.train --dataset norman --split perturbation

# forward + nomination tables
python -m src.experiments.run_tables --dataset norman --tables forward_cell forward_perturbation

# ablations
python -m src.experiments.run_ablations --dataset norman

# figures
python scripts/figures.py
```

The GEARS head-to-head runs in its own conda env (older torch + pyg), since the package is finicky about versions:

```bash
bash scripts/setup_gears_env.sh
conda run -n pivot_gears python scripts/gears_ranking.py
```

## Models

Trained PIVOT checkpoints live under `models/`:

- `models/norman/` - trained on Norman 2019 (CRISPRa K562)
- `models/replogle_k562/` - trained on Replogle 2022 (CRISPRi K562)

Each folder has `model.ptw` (a plain torch state dict), `config.json` (the training config), and `train_info.json` (history + run info). Loading needs the matching preprocessed dataset, since the perturbation encoder vocabulary comes from the data:

```python
import json

import torch

from src.data.perturb_data import load_dataset
from src.training.train import TrainConfig, make_model

# Rebuild the training config that was saved next to the checkpoint.
with open("models/norman/config.json") as f:
    cfg = TrainConfig(**json.load(f))

# The dataset supplies the perturbation vocabulary the encoder was built on.
data = load_dataset(cfg.dataset)
model = make_model(data, cfg, device="cpu")
model.load_state_dict(torch.load("models/norman/model.ptw", map_location="cpu"))
model.eval()  # inference mode
```

## License

MIT, Bryan Cheng 2026. See `LICENSE`.