cellpose-sam-onnx (CPSAM, FP16, browser-ready)
Single-file FP16 ONNX export of Cellpose-SAM (CPSAM), the ViT-L-based cellular
segmentation model from Stringer et al., 2025.
Intended for in-browser inference via WebGPU through
cellpose-js, but usable from any
ONNX Runtime backend that supports opset 18 and FP16 graph IO.
- Architecture: SAM ViT-L image encoder (modified: patch size 8, no windowed attention) + 3-channel dense regression head (`flow_y`, `flow_x`, `cellprob`).
- Source weights: `mouseland/cellpose-sam` (1.23 GB PyTorch checkpoint, 304.6 M params).
- File: `cpsam_fp16.onnx`, 588 MB, self-contained (weights merged into the graph, no external `.data` sidecar).
- Graph IO dtype: `float16`.
- Input: `image`: `tensor(float16)`, shape `(1, 3, 256, 256)`, RGB, per-channel percentile-normalized to ~[0, 1].
- Output: `flows_cellprob`: `tensor(float16)`, shape `(1, 3, 256, 256)`; channels are `(flow_y, flow_x, cellprob)`.
- Opset: 18.
- License: BSD-3-Clause (inherited from upstream Cellpose).
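The input contract above asks for per-channel percentile normalization to roughly [0, 1]. A minimal Python sketch of that step follows. The helper name `normalize_tile` is hypothetical, and the 1st/99th percentile bounds are an assumption based on common Cellpose practice; this card does not state which percentiles were used.

```python
import numpy as np

def normalize_tile(rgb, lower=1.0, upper=99.0):
    """Scale each channel so its lower/upper percentiles map to 0 and 1.

    rgb: float array of shape (3, H, W).
    The 1st/99th percentile defaults are an assumption, not taken from this card.
    """
    out = np.empty(rgb.shape, dtype=np.float32)
    for c in range(rgb.shape[0]):
        lo, hi = np.percentile(rgb[c], [lower, upper])
        scale = hi - lo
        # Flat channels (hi == lo) become zeros instead of dividing by zero.
        out[c] = (rgb[c] - lo) / scale if scale > 0 else 0.0
    # Add the batch axis and cast to the graph's FP16 input dtype.
    return out[np.newaxis].astype(np.float16)
```

Values outside the chosen percentiles land slightly below 0 or above 1, which is why the range is only approximately [0, 1].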
TL;DR
This is the same model as mouseland/cellpose-sam, just re-packaged as a
588 MB FP16 ONNX file that browsers can load and run on WebGPU. No retraining,
no pruning, no architecture changes. Numerical parity vs the PyTorch source is
1.24e-05 worst max-abs-error across 10 random tiles (gate 1e-3).
If you want a smaller model, this is not it; that is Phase 2 (SlimCPSAM), which has not been built yet.
Intended use
- Browser-side cellular segmentation in microscopy applications: H&E, brightfield, phase contrast, fluorescence (multi-channel).
- Drop-in inference engine for any ORT-compatible runtime (Python, Node, Rust, C++) at FP16. The browser is the primary target but not the only one.
- 2D images only. CPSAM's 3D `gradient_tracking_3D` routine is not part of the exported graph.
How to use
In the browser, via cellpose-js (recommended)
import { Cellpose, configureOrt } from 'cellpose-js';
configureOrt({ wasmPaths: '/ort/' }); // serve ORT WASM sidecars same-origin
const cp = await Cellpose.fromPretrained(
'https://huggingface.co/ballon999/cellpose-sam-onnx/resolve/main/cpsam_fp16.onnx',
{ preload: true },
);
const result = await cp.segment(
{ data: imageData.data, width, height, channels: 4 },
{ diameter: 30, cellprob_threshold: 0, chan: 0, chan2: 0 },
);
// result.masks: Uint32Array, the instance label map at source resolution
The first call fetches 588 MB from the Hub; subsequent calls hit IndexedDB and cold-start in under ~2 s.
See the cellpose-js README for the full API, parameter reference, and Python parity notes.
With ONNX Runtime directly (Python)
import numpy as np
import onnxruntime as ort
sess = ort.InferenceSession("cpsam_fp16.onnx", providers=["CPUExecutionProvider"])
tile = np.random.rand(1, 3, 256, 256).astype(np.float16)
out = sess.run(None, {"image": tile})[0] # (1, 3, 256, 256) float16
flow_y, flow_x, cellprob = out[0, 0], out[0, 1], out[0, 2]
For flow-dynamics postprocessing (Euler integration → convergence clustering → connected components → size/flow filtering), use either:
- the JS port in `cellpose-js/src/dynamics`, or
- the original Python implementation in `cellpose.dynamics`; input/output contracts match.
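To make the first postprocessing stage concrete, here is a simplified Python sketch of Euler integration over the predicted flows. It is not the `cellpose.dynamics` or `cellpose-js` implementation: the function name `follow_flows` is hypothetical, and it samples the flow field at the nearest pixel rather than interpolating.

```python
import numpy as np

def follow_flows(flow_y, flow_x, cellprob, threshold=0.0, n_iter=200):
    """Move every foreground pixel along the flow field with Euler steps.

    flow_y, flow_x, cellprob: 2D arrays of identical shape (H, W).
    Returns the starting coordinates (ys, xs) and the final positions (py, px).
    """
    h, w = cellprob.shape
    # Only pixels above the cell-probability threshold are tracked.
    ys, xs = np.nonzero(cellprob > threshold)
    py = ys.astype(np.float32)
    px = xs.astype(np.float32)
    for _ in range(n_iter):
        # Nearest-pixel lookup of the flow at the current position.
        iy = np.clip(np.rint(py), 0, h - 1).astype(np.intp)
        ix = np.clip(np.rint(px), 0, w - 1).astype(np.intp)
        # One Euler step, clamped so points stay inside the image.
        py = np.clip(py + flow_y[iy, ix], 0, h - 1)
        px = np.clip(px + flow_x[iy, ix], 0, w - 1)
    return ys, xs, py, px
```

Pixels belonging to the same cell should end up at nearly the same position, which is what the later clustering stage relies on.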
Browser requirements
Because the graph IO is FP16, ORT-web needs the native Float16Array
typed-array, not a Uint16Array bit-pattern. That requires:
- Chrome ≥ 135 (April 2025), or
- Safari ≥ 17.4.
- WebGPU available (`'gpu' in navigator`). No WASM fallback in v1 of the consumer (cellpose-js); see "Why FP16 (and why not FP32 or INT8)?" below.
Older browsers fail fast with a clear error in cellpose-js. For direct ORT
use, ORT will throw on session create or input binding.
Performance (browser, M1 Max, Chrome 135+)
Measured end-to-end through cellpose-js:
| Step | Time |
|---|---|
| Cold model fetch (588 MB, CDN) | ~5 s |
| Warm fetch (IndexedDB) | < 100 ms |
| `ort.InferenceSession.create` | ~1.3 s |
| Cold shader compile (first forward) | ~2.3 s |
| Steady-state per-tile inference (256×256) | ~277 ms |
| Per-tile preprocess (normalize + tile copy) | ~14 ms amortized |
| Full-image flow dynamics (400×400) | ~74 ms |
ORT-web 1.26 is ~2.3× faster than 1.20 on the WebGPU kernels; the steady-state 277 ms figure was measured on 1.26. Stage 0 originally measured 628 ms/tile on 1.20.
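As a rough worked example, the steady-state figures above can be combined into a per-image estimate. The sketch below assumes tiles run sequentially and that the 74 ms dynamics cost (measured at 400×400) applies to the image in question. The function name and the tile count of 4 are hypothetical.

```python
def estimate_segment_ms(n_tiles, infer_ms=277.0, preprocess_ms=14.0, dynamics_ms=74.0):
    """Rough steady-state latency for one image, in milliseconds.

    Defaults are the M1 Max figures from the table; they exclude model fetch,
    session creation and the cold shader compile.
    """
    if n_tiles < 1:
        raise ValueError("n_tiles must be at least 1")
    return n_tiles * (infer_ms + preprocess_ms) + dynamics_ms

# Hypothetical case: a 400x400 image that happens to be split into 4 tiles.
print(estimate_segment_ms(4))  # 1238.0
```

Per-tile inference dominates, so the tile count is the main lever on end-to-end latency.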
How the model was generated
The export path is documented in docs/STAGE0-RESULTS.md and docs/PLAN.md §1.5, §2.
The short version:
- Source weights: `mouseland/cellpose-sam` (PyTorch, 1.23 GB, 304.6 M params).
- Wrap `cellpose.vit_sam.Transformer` (this is not a HuggingFace Transformers class, so `optimum-cli` does not apply here).
- Instantiate in FP16 directly: `Transformer(dtype=torch.float16)`, then load the FP32 checkpoint and cast. Post-export FP16 conversion via `onnxconverter-common` or `onnxruntime.transformers.float16` produced broken graphs on the dynamo-exported topology (dangling FP16/FP32 type mismatches and duplicate node names respectively); re-exporting from a natively-FP16 `nn.Module` is the only path that worked.
- Export with `torch.onnx.export(..., dynamo=True, strict=True)` at opset 17 (auto-upgraded to 18 by the dynamo exporter). Requires `onnxscript` as an extra dependency. `strict=False` failed; `strict=True` succeeded.
- `dynamic_axes`: batch only. H/W are hardcoded to 256 by the dynamo exporter, which is acceptable because CPSAM is always tiled at 256×256.
- Merge externalized weights back into the graph file via `onnx.save_model(..., save_as_external_data=False)`. The 588 MB result fits comfortably under the 2 GB protobuf limit, so the browser fetches one file instead of `.onnx` + `.onnx.data`.
- Parity check vs PyTorch on 10 deterministic random tiles (`(1, 3, 256, 256)` FP32, seed 0): worst max abs error 1.24e-05, mean 8.96e-06. Gate was 1e-3.
The PyTorch and exporter versions used: torch 2.12.0, cellpose 4.1.1,
onnx 1.21.0, onnxruntime 1.26.0, onnxscript (latest at export time).
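The parity gate described in this section boils down to a max-abs-error comparison per tile. A minimal Python sketch of such a check follows. The function name `parity_summary` is hypothetical, and reporting the mean over per-tile maxima is an assumption about how the quoted mean was computed.

```python
import numpy as np

def parity_summary(pairs, gate=1e-3):
    """Summarise max-abs-error between reference and candidate outputs.

    pairs: iterable of (reference, candidate) arrays, one pair per tile.
    Both arrays are compared in float32 so FP16 outputs are handled uniformly.
    """
    per_tile = []
    for reference, candidate in pairs:
        ref = np.asarray(reference, dtype=np.float32)
        cand = np.asarray(candidate, dtype=np.float32)
        if ref.shape != cand.shape:
            raise ValueError("reference and candidate shapes differ")
        per_tile.append(float(np.max(np.abs(ref - cand))))
    worst = max(per_tile)
    return {
        "worst": worst,                          # largest per-tile max-abs-error
        "mean": sum(per_tile) / len(per_tile),   # mean of the per-tile maxima
        "passed": worst <= gate,
    }
```

In use, `reference` would be the PyTorch output and `candidate` the ONNX Runtime output for the same tile.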
How this differs from the original mouseland/cellpose-sam
It is the same network with the same weights (cast to FP16); only the serialization format and numeric precision differ. Specifically:
| Aspect | `mouseland/cellpose-sam` (PyTorch) | This repo (ONNX FP16) |
|---|---|---|
| Format | PyTorch `.pt` checkpoint | ONNX, single file |
| Size | 1.23 GB | 588 MB |
| Precision | FP32 | FP16 |
| Runtime targets | PyTorch (Python only) | ORT WebGPU/CUDA/CPU/CoreML/DirectML |
| Input dtype | `float32` | `float16` (native `Float16Array`) |
| Input shape | Variable; CPSAM tiles internally | Fixed `(1, 3, 256, 256)` |
| Postprocessing | Bundled in `cellpose.dynamics` | Not included; caller's job |
| 3D segmentation | Yes (`gradient_tracking_3D`) | No; 2D only |
| Promptable | No (CPSAM is dense regression) | No (unchanged) |
Numerical: worst observed max abs error vs the FP32 PyTorch reference on the same input is 1.24e-05, so the FP16 export is numerically indistinguishable from the original at the granularity that matters for downstream flow dynamics.
What's not here:
- The flow-dynamics postprocessing. This repo ships the encoder + head only. Postprocessing (Euler integration, convergence clustering, connected components, size / flow-consistency filtering) lives in `cellpose-js` (TypeScript port, ~500 LOC) and `cellpose` itself (Python). Output of this ONNX graph is raw `(flow_y, flow_x, cellprob)`; you still need to turn that into instance masks.
- 3D mode. CPSAM's Python implementation handles z-stacks via a separate routine; that's out of scope here.
- Prompt encoder / mask decoder. CPSAM does not have them. Unlike SAM / SlimSAM, CPSAM is dense regression, not promptable mask generation.
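To illustrate what "turning raw flows into instance masks" involves, the Python sketch below groups pixels by the position they converge to after flow integration. It is a deliberately simplified stand-in for the clustering and filtering in `cellpose.dynamics` and `cellpose-js`: the function name is hypothetical, pixels are merged only when they round to the same endpoint, and there is no size or flow-consistency filtering.

```python
import numpy as np

def label_by_endpoint(ys, xs, py, px, shape):
    """Give pixels that converged to the same rounded endpoint the same label.

    ys, xs: integer start coordinates of the tracked pixels.
    py, px: their final (float) positions after flow integration.
    shape:  (H, W) of the output label map.
    """
    h, w = shape
    masks = np.zeros((h, w), dtype=np.uint32)
    if ys.size == 0:
        return masks  # no foreground pixels, everything is background
    iy = np.clip(np.rint(py), 0, h - 1).astype(np.int64)
    ix = np.clip(np.rint(px), 0, w - 1).astype(np.int64)
    # Encode each endpoint as a single integer so np.unique can group them.
    _, inverse = np.unique(iy * w + ix, return_inverse=True)
    masks[ys, xs] = inverse.astype(np.uint32) + 1  # label 0 stays background
    return masks
```

A real pipeline also has to merge nearby endpoints and discard tiny or inconsistent regions, which is the job of the libraries named above.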
Why FP16 (and why not FP32 or INT8)?
- WebGPU runs FP16 well. The whole point of this export is browser inference, and the steady-state tile latency is fine (277 ms on M1 Max).
- FP32 would double the download. ~1.1 GB instead of 588 MB, and the same WebGPU adapter ends up converting much of the graph internally anyway.
- INT8 wasn't worth the validation budget. Flow regression is more numerically sensitive than mask classification, so an INT8 path would need end-to-end IoU validation against the original Cellpose. We may revisit this in a future release if size complaints arrive.
Limitations and caveats
- Fixed input size 256×256. Larger images must be tiled by the caller. `cellpose-js` does this transparently with 32-px overlap.
- Browser version floor (Chrome 135 / Safari 17.4). Native `Float16Array` is non-negotiable for FP16 graph IO under ORT-web 1.20+.
- Not promptable. This is dense per-pixel regression, not SAM-style prompted segmentation.
- No domain-specialized variants. This is the generalist CPSAM. Historical Cellpose variants (cyto / cyto2 / cyto3 / nuclei) and Omnipose bacteria models are not provided here; they are planned as Phase 2 SlimCPSAM finetunes and have not been built yet.
- No INT8 / FP32 fallback. In the browser (cellpose-js v1), inference runs on WebGPU only.
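For callers that tile images themselves, the Python sketch below computes tile origins for 256×256 tiles with at least 32 px of overlap. It is one plausible scheme, not necessarily the one `cellpose-js` uses; the function names are hypothetical, and images smaller than one tile are assumed to be padded by the caller.

```python
def tile_starts(size, tile=256, overlap=32):
    """Start offsets of tiles covering [0, size) along one axis."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if size <= tile:
        return [0]  # assumption: the caller pads the image up to one full tile
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the far edge
    return starts

def tile_grid(height, width, tile=256, overlap=32):
    """All (y, x) tile origins for an image, in row-major order."""
    return [
        (y, x)
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]
```

Placing the last tile flush with the edge avoids padding on large images, at the cost of a larger overlap for that final tile.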
Citation
If you use this model, please cite the original Cellpose-SAM paper:
Stringer, C., Pachitariu, M. et al.
Cellpose-SAM: superhuman generalization for cellular segmentation.
bioRxiv 2025.04.28.651001 (2025).
https://www.biorxiv.org/content/10.1101/2025.04.28.651001v1
Provenance and reproducibility
- Source checkpoint: `mouseland/cellpose-sam` on Hugging Face Hub.
- Export scripts and ONNX artifact ETag (`52fd6881…`) are recorded in `docs/STAGE0-RESULTS.md`.
- Parity test fixtures (numpy-generated FP32 reference tiles + expected flow outputs) live in `tests/fixtures/` in the `cellpose-js` repo.
License
- This ONNX artifact: BSD-3-Clause, inherited from MouseLand/cellpose.
- The `cellpose-js` consumer library: MIT (compatible).
- Use, redistribution, and modification are permitted; attribution to the original Cellpose authors is required by the BSD-3 license.
Maintainers
- ONNX export, `cellpose-js`, and this card: @belkassaby (HF: ballon999); same person, different username on each platform.
- Original Cellpose-SAM model and algorithm: the MouseLand team.
Related
- Code: `belkassaby/Cellpose.js`, the TypeScript inference + dynamics port.
- npm: `cellpose-js`.
- Upstream: `MouseLand/cellpose` and `mouseland/cellpose-sam`.
- Paper: Cellpose-SAM (bioRxiv 2025.04.28).