CellFlow trained on Norman 2019 (ESM2 3B variant)

Produced as part of the sc-interp single-cell model comparison repo.

Provenance

Source code commit: fdc2ae0
Runner: scripts/run_cellflow.py
Dataset manifest: data/norman/manifest.yaml

Base model

Trained from scratch. CellFlow is a flow-matching-based perturbation prediction framework and does not ship a foundation checkpoint. Perturbation conditions are encoded via ESM2 embeddings of the perturbed gene(s) using facebook/esm2_t36_3B_UR50D (3B-parameter model, 2560-dim per-gene embeddings), matching the CellFlow reproducibility repo's default.

Training

Architecture and training hyperparameters match the cellflow_reproducibility repo's suppl_fig/norman/downstream_analysis/cellflow/ configs verbatim:
- condition_embedding_dim=1024, hidden_dims=(4096,4096,4096), decoder_dims=(4096,4096,4096), decoder_dropout=0.2
- time_encoder_dims=(2048,2048,2048), time_freqs=1024, cond_output_dropout=0.9
- layers_before_pool.target_gene = mlp[1024,1024] dropout 0.5, layers_after_pool = mlp[1024,1024] dropout 0.2
- match_fn = match_linear(epsilon=0.1, scale_cost='mean', tau_a=1.0, tau_b=1.0)
- optimizer = optax.MultiSteps(optax.adam(5e-5), 20)
- probability_path = {'constant_noise': 1.0}
- pooling = 'attention_token'
Sample representation: 50-dim PCA (sample_rep='X_pca'), fit on the train split cells and projected onto val and test.
Perturbation encoding: ESM2 3B embeddings per gene symbol, stored in adata.uns['esm2'] and referenced via perturbation_covariate_reps={'target_gene': 'esm2'}.
Split: GEARS simulation split with seed 42, rather than the biolord split used in the CellFlow paper. This is a deliberate divergence, made for internal consistency with our scGPT and scLDM runners.
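The "fit on train, project onto val and test" step for the sample representation can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the code in scripts/run_cellflow.py; the helper names fit_pca and project and the matrix sizes are hypothetical. The point it shows is that the mean and principal axes come from the train cells only, so no information from val or test leaks into the representation.

```python
import numpy as np


def fit_pca(train: np.ndarray, n_components: int = 50):
    """Fit PCA on the training cells only; return (mean, principal axes)."""
    mean = train.mean(axis=0)
    # SVD of the centered training matrix; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project cells using the training mean and axes (no refitting)."""
    return (x - mean) @ components.T


# Synthetic stand-ins for expression matrices (cells x genes); sizes are arbitrary.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 120))
val = rng.normal(size=(40, 120))
test = rng.normal(size=(60, 120))

mean, components = fit_pca(train, n_components=50)
train_pca = project(train, mean, components)
val_pca = project(val, mean, components)    # reuses train statistics
test_pca = project(test, mean, components)  # reuses train statistics
```

In the actual run the projected matrix is what ends up under sample_rep='X_pca'.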

Budget and stopping


setting	value
iterations	200,000
batch size	1024
valid_freq	400,000 (exceeds the iteration budget, so no mid-training evaluation ran)
wall clock	0.7 hours (H100 PCIe)
sample_rep	X_pca (50 dims)
esm model	esm2_t36_3B_UR50D
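The budget interacts with the gradient accumulation in the optimizer setting (optax.MultiSteps(optax.adam(5e-5), 20)). The arithmetic below is a rough sketch under two assumptions that this card does not confirm: that the iteration count refers to mini-batch steps rather than parameter updates, and that validation would fire at multiples of valid_freq. The variable names are illustrative only.

```python
iterations = 200_000        # from the budget table
batch_size = 1024           # from the budget table
accumulation_steps = 20     # k in optax.MultiSteps(optax.adam(5e-5), 20)
valid_freq = 400_000        # from the budget table

# Assumption: each iteration is one mini-batch step, so parameters
# change once every `accumulation_steps` iterations.
parameter_updates = iterations // accumulation_steps
samples_per_update = batch_size * accumulation_steps

# Assumption: validation runs whenever the step count hits a multiple of valid_freq.
validation_runs = iterations // valid_freq

print(parameter_updates, samples_per_update, validation_runs)
# prints: 10000 20480 0
```

Under these assumptions, any valid_freq at or below 200,000 would produce at least one validation point.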

Test set metrics (cell-eval)

metric	mean	median	max
pearson_delta	0.6061	0.7359	0.9654
discrimination_score_l1	0.7609	0.8687	1.0000
discrimination_score_l2	0.7736	0.8889	1.0000
discrimination_score_cosine	0.7484	0.9091	1.0000
pearson_edistance	0.6883	0.6883	0.6883
clustering_agreement	0.4352	0.4352	0.4352
overlap_at_N	0.0266	0.0245	0.1076
precision_at_N	0.0939	0.0981	0.2302
mse	0.0028	0.0018	0.0132
mae	0.0146	0.0127	0.0341
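For readers unfamiliar with pearson_delta, the sketch below shows one common way such a score is computed: the Pearson correlation between predicted and observed expression shifts relative to control, evaluated per perturbation and then summarized across test perturbations (the mean, median, and max columns above). This is an assumption about the metric's general shape, not cell-eval's exact implementation; the function names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def pearson_delta(pred_mean, true_mean, control_mean):
    """Correlate predicted and observed shifts from control, gene by gene."""
    pred_delta = [p - c for p, c in zip(pred_mean, control_mean)]
    true_delta = [t - c for t, c in zip(true_mean, control_mean)]
    return pearson(pred_delta, true_delta)


# Toy per-gene mean expression for one perturbation (4 genes, made-up values).
control = [1.0, 2.0, 0.5, 3.0]
true = [1.5, 1.0, 0.5, 3.2]
pred = [1.4, 1.2, 0.6, 3.1]

score = pearson_delta(pred, true, control)
```

Subtracting the control mean first is what separates this from a raw Pearson correlation on expression, which baseline expression levels shared by all conditions would otherwise dominate.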

The CellFlow paper reports Norman results as R² and energy distance. Our numbers use cell-eval's metric set on the GEARS simulation split, so they are not directly comparable to the paper's Figure 4N. They do reproduce the paper's headline claim (CellFlow > scGPT on Norman) across every distributional metric.

A sibling variant using ESM2 8M instead of 3B is available at matthewshu/cellflow-norman. Relative to it, the 3B variant improves pearson_delta (+0.04 mean, +0.055 median) and clustering_agreement (+0.11), while the DE gene metrics (overlap_at_N, precision_at_N) are unchanged. This suggests that larger protein language models help CellFlow's condition encoder learn broader cell-state structure but do not help it identify specific regulatory genes.

Known limitations

Uses GEARS simulation split instead of biolord's 5 random splits. Our test perturbations are a different subset of Norman than the paper's.
Training uses valid_freq > num_iterations, so there is no mid-training validation. Convergence was not verified with a validation curve; future runs should use a smaller valid_freq so that a learning curve can be plotted.
DE gene identification metrics (overlap_at_N, precision_at_N) did not improve from the 8M ESM variant to this 3B variant, suggesting that the DE gene bottleneck lies in the architecture or the data rather than in gene-embedding quality.

Files

CellFlow.pkl — Trained CellFlow model, pickled via cf.save(). Load via cellflow.model.CellFlow.load(path).
training_stats.json — iterations, wall clock, wandb run URL.

Usage

from huggingface_hub import hf_hub_download
from cellflow.model import CellFlow

path = hf_hub_download(
    repo_id="matthewshu/cellflow-norman-esm3b",
    filename="CellFlow.pkl",
)
cf = CellFlow.load(path)
# Then use sc-interp's run_cellflow.py with --esm-model esm2_t36_3B_UR50D

Citation

Dataset: Norman et al. 2019 (Science). Model: Klein, Fleck, Becker et al. 2025 bioRxiv (CellFlow). ESM2: Lin et al. 2023 Science. See the respective repos for proper BibTeX entries.
