# PIVOT

Perturbation-Informed Vector-field Optimization for Transcriptomic state control.

PIVOT learns a perturbation-conditioned flow map over single-cell state embeddings and
uses its Jacobian for differentiable inverse design: given a control cell state and a
desired target state, it nominates gene-level interventions that move cells toward the
target. The same model also does ordinary forward response prediction.

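
To make the inverse-design idea concrete, here is a minimal sketch of Jacobian-based nomination on a toy problem. It is not PIVOT's model: the linear `flow` function, the `RESPONSE` matrix, the gene names, and the `nominate` helper are all assumptions made up for illustration, and the Jacobian is taken by finite differences rather than autograd.

```python
# Toy illustration only: a fixed linear map stands in for the learned flow map.
GENES = ["geneA", "geneB", "geneC"]  # hypothetical gene names

# RESPONSE[i][j]: shift of embedding dimension i per unit perturbation of gene j.
RESPONSE = [
    [1.0, 0.0, -0.5],
    [0.0, 1.0, 0.5],
]


def flow(z, p):
    """Map a control state z to a predicted perturbed state, given gene doses p."""
    return [zi + sum(r * pj for r, pj in zip(row, p)) for zi, row in zip(z, RESPONSE)]


def jacobian_wrt_perturbation(z, p, eps=1e-6):
    """Finite-difference Jacobian d flow / d p, indexed as jac[dim][gene]."""
    base = flow(z, p)
    columns = []
    for j in range(len(p)):
        bumped = list(p)
        bumped[j] += eps
        out = flow(z, bumped)
        columns.append([(o - b) / eps for o, b in zip(out, base)])
    # columns is indexed [gene][dim]; transpose it to [dim][gene].
    return [list(row) for row in zip(*columns)]


def nominate(z_control, z_target):
    """Rank genes by how well their Jacobian column aligns with the needed shift."""
    p0 = [0.0] * len(GENES)
    needed = [t - c for t, c in zip(z_target, flow(z_control, p0))]
    jac = jacobian_wrt_perturbation(z_control, p0)
    scores = {
        gene: sum(jac[i][j] * needed[i] for i in range(len(needed)))
        for j, gene in enumerate(GENES)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = nominate(z_control=[0.0, 0.0], z_target=[0.0, 2.0])
print([gene for gene, _ in ranking])  # ['geneB', 'geneC', 'geneA']
```

Each score is the negative gradient of half the squared distance to the target with respect to that gene's dose, so this is a first-order ranking. A learned, nonlinear flow map would need the Jacobian re-evaluated at each control state.
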
## Layout

```
src/
  data/         loading + preprocessing of Perturb-seq data, splits
  models/       perturbation encoder, flow map, the PIVOT module
  training/     training loop and losses
  evaluation/   inference, rewards, metrics, baselines
  experiments/  drivers for the result tables, ablations, figures
  utils/
scripts/        figure generation, extra ablations, GEARS comparison
experiments/    saved result JSON
```

## Setup

```bash
pip install -r requirements.txt
```

Data is not committed. Download and preprocess it from the public sources first (Norman 2019
and Replogle 2022 are pulled from scPerturb):

```bash
python -m src.data.preprocess norman
python -m src.data.preprocess replogle_k562
```

This writes a 50-component PCA embedding, computed over the 2,000 most highly variable genes,
plus the held-out splits, to `data/processed/<dataset>/`.

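
As a rough illustration of what a held-out perturbation split means, the sketch below withholds whole perturbations so that test perturbations never appear in training. The `split_by_perturbation` helper, the cell record format, and the gene labels are assumptions for this example; the real splitting logic in `src/data/` may differ.

```python
import random


def split_by_perturbation(cells, test_frac=0.25, seed=0):
    """Hold out whole perturbations; control cells always stay in training."""
    rng = random.Random(seed)
    perts = sorted({c["perturbation"] for c in cells} - {"control"})
    rng.shuffle(perts)
    n_test = max(1, round(len(perts) * test_frac))
    held_out = set(perts[:n_test])
    train = [i for i, c in enumerate(cells) if c["perturbation"] not in held_out]
    test = [i for i, c in enumerate(cells) if c["perturbation"] in held_out]
    return train, test, held_out


# Toy data: 3 cells each for control and 4 hypothetical perturbations.
labels = ["control", "geneA", "geneB", "geneC", "geneD"]
cells = [{"perturbation": label} for label in labels for _ in range(3)]

train_idx, test_idx, held_out = split_by_perturbation(cells)
train_perts = {cells[i]["perturbation"] for i in train_idx}
test_perts = {cells[i]["perturbation"] for i in test_idx}
print("held out:", sorted(held_out))
print("train cells:", len(train_idx), "test cells:", len(test_idx))
```

A cell-level split would instead shuffle individual cells, so every perturbation could show up on both sides. That makes it an easier generalization test than the perturbation-level split.
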
## Running things

```bash
# train one model
python -m src.training.train --dataset norman --split perturbation

# forward + nomination tables (this example selects the two forward tables)
python -m src.experiments.run_tables --dataset norman --tables forward_cell forward_perturbation

# ablations
python -m src.experiments.run_ablations --dataset norman

# figures
python scripts/figures.py
```

The GEARS head-to-head comparison runs in its own conda env (older torch + PyG), since GEARS
is finicky about package versions:

```bash
bash scripts/setup_gears_env.sh
conda run -n pivot_gears python scripts/gears_ranking.py
```

## Models

Trained PIVOT checkpoints live under `models/`:

- `models/norman/` - trained on Norman 2019 (CRISPRa K562)
- `models/replogle_k562/` - trained on Replogle 2022 (CRISPRi K562)

Each folder has `model.ptw` (a plain torch state dict), `config.json` (the training config),
and `train_info.json` (history + run info). Loading a checkpoint needs the matching
preprocessed dataset, since the perturbation encoder vocabulary comes from the data:

```python
import json

import torch

from src.data.perturb_data import load_dataset
from src.training.train import TrainConfig, make_model

# Rebuild the training config saved alongside the checkpoint.
with open("models/norman/config.json") as f:
    cfg = TrainConfig(**json.load(f))

# The preprocessed dataset supplies the perturbation encoder vocabulary.
data = load_dataset(cfg.dataset)
model = make_model(data, cfg, device="cpu")
model.load_state_dict(torch.load("models/norman/model.ptw", map_location="cpu"))
model.eval()
```

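
Here is a toy sketch of why the matching dataset matters, assuming the vocabulary maps each perturbed gene to a row index of an embedding table. The `build_vocab` helper, the `+`-joined label format, and the gene names are hypothetical; how PIVOT actually builds its vocabulary is not shown here.

```python
def build_vocab(labels, control="ctrl"):
    """Assign each distinct non-control gene a stable integer index (sorted order)."""
    genes = sorted({g for label in labels for g in label.split("+") if g != control})
    return {gene: index for index, gene in enumerate(genes)}


# Two hypothetical datasets with overlapping but different perturbation sets.
labels_a = ["geneB+ctrl", "geneC+ctrl", "geneB+geneC"]
labels_b = ["geneA+ctrl", "geneB+ctrl", "geneC+ctrl"]

vocab_a = build_vocab(labels_a)
vocab_b = build_vocab(labels_b)
print(vocab_a)  # {'geneB': 0, 'geneC': 1}
print(vocab_b)  # {'geneA': 0, 'geneB': 1, 'geneC': 2}
```

Under this assumption, weights trained against `vocab_a` would have a different number of embedding rows, and a different gene in each row, than a model built from `vocab_b`. The state dict would then either fail to load or silently map genes to the wrong rows.
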
## License

MIT, Bryan Cheng 2026. See `LICENSE`.