# 🫧 Microbubble Distillation Pipeline (v3, Fixed)

**Cellpose-SAM-FT → Pseudo-labels → TinyBubbleNet**

A 3-stage pipeline for fast, lightweight microbubble sizing and counting via knowledge distillation.
⚠️ **IMPORTANT BUG FIX:** The original `train_student.py` in this repo uses `BCEWithLogitsLoss` on binary foreground/background masks. This fails catastrophically because microbubble foreground is only ~0.2% of pixels: the model learns to predict ALL background and achieves 99.8% accuracy while detecting zero bubbles. The fixed script `train_mse_distill.py` uses MSE distillation on the teacher's raw cell_prob LOGITS instead, which gives gradients on ALL pixels (background pixels have informative negative logits, ~-6). See `train_mse_distill.py` for the corrected implementation.
## The Problem

Cellpose-SAM is excellent for cell/bubble segmentation, but at ~300M params (1.1 GB) it's expensive at inference. For lab settings where your slides look similar and you're "just detecting circles", this is massive overkill: you're paying for the ability to also segment dogs, neurons, and a thousand other things, capacity you don't need.
## The Solution: Distill Into a Tiny Specialist

| Model | Params | Size | 256×256 GPU | 256×256 CPU | FPS (GPU) |
|---|---|---|---|---|---|
| Cellpose-SAM | ~300M | 1.1 GB | ~100 ms | seconds | ~10 |
| TinyBubbleNet (base_ch=16) | 389K | 1.5 MB | 3 ms | 45 ms | 337 |
| TinyBubbleNet (base_ch=32) | 1.5M | 5.8 MB | ~5 ms | ~80 ms | ~200 |

~33× faster, ~750× smaller. And when your domain is narrow (similar-looking lab slides), the accuracy loss is minimal because the student only needs to learn one visual distribution.
## Architecture

TinyBubbleNet is a depthwise-separable U-Net (inspired by PicoSAM2) with a 4-channel output:
| Channel | Name | What it encodes |
|---|---|---|
| 0 | `dY` | Vertical gradient flow (Cellpose-compatible) |
| 1 | `dX` | Horizontal gradient flow (Cellpose-compatible) |
| 2 | `cell_prob` | Foreground/background probability |
| 3 | `dist_transform` | Distance transform (peak = bubble radius) |
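The depthwise-separable building block can be sketched in a few lines of PyTorch. This is an illustrative `DepthwiseSeparableConv` (the name, normalization, and activation are assumptions for the sketch, not the exact block in `model.py`):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv: several times
    fewer weights than a single standard 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, in_ch and out_ch, 1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

block = DepthwiseSeparableConv(16, 32)
y = block(torch.randn(1, 16, 64, 64))
print(y.shape)  # torch.Size([1, 32, 64, 64])
```

The U-Net's final 1×1 head would then emit the 4 channels (`dY`, `dX`, `cell_prob`, `dist_transform`) from the last feature map.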
Instance masks are reconstructed via Euler integration of the flow field, identical to Cellpose post-processing. This means the student is fully compatible with the Cellpose ecosystem.

The distance transform head is the key addition for sizing: the peak value within each detected instance directly gives you the bubble radius.
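The flow-following step can be illustrated with a minimal Euler integrator (a simplified sketch of what Cellpose's post-processing does, run here on a synthetic single-instance field):

```python
import numpy as np

def follow_flows(dY, dX, n_iter=50, step=1.0):
    """Euler-integrate every pixel along the (dY, dX) field. Pixels of the
    same instance converge to that instance's center; clustering the
    endpoints yields instance labels (as in Cellpose post-processing)."""
    H, W = dY.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    for _ in range(n_iter):
        yi = np.clip(np.round(ys).astype(int), 0, H - 1)
        xi = np.clip(np.round(xs).astype(int), 0, W - 1)
        ys += step * dY[yi, xi]
        xs += step * dX[yi, xi]
    return ys, xs

# Toy field: every pixel of a 32x32 patch flows toward the center (16, 16)
H = W = 32
yy, xx = np.mgrid[0:H, 0:W].astype(float)
dY, dX = H / 2 - yy, W / 2 - xx
norm = np.maximum(np.hypot(dY, dX), 1e-6)
ys, xs = follow_flows(dY / norm, dX / norm)
# All endpoints cluster near (16, 16) -- one instance
```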
## The Bug and The Fix

### The Bug (original `train_student.py` / `losses.py`)

```python
# BAD: BCE on binary {0, 1} masks
prob_loss = nn.BCEWithLogitsLoss()(pred_prob, binary_mask)
```

With foreground at only ~0.2% of pixels, the model's dominant gradient signal is "predict all background". Even after 300 epochs with "best val loss 0.0008", the model predicts zero bubbles everywhere.
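The failure mode is easy to reproduce numerically (synthetic mask below, not the repo's data loader):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 256x256 binary mask with ~0.2% foreground, mimicking bubble sparsity
mask = (rng.random((256, 256)) < 0.002).astype(float)

# A degenerate "student" that always predicts background
pred = np.zeros_like(mask)

accuracy = (pred == mask).mean()
print(f"accuracy = {accuracy:.4f}")  # ~0.998, yet zero bubbles detected
```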
### The Fix (`train_mse_distill.py`)

```python
# GOOD: MSE on the teacher's raw cell_prob logits
prob_loss = F.mse_loss(pred_prob_logits, teacher_cell_prob_logits)
```

The teacher outputs cell_prob as raw logits (range roughly -9 to +5). Every pixel has an informative value: background pixels should reproduce ~-6, foreground pixels ~+5. MSE on logits gives strong gradients everywhere, and the student successfully learns to segment bubbles.
### Why this works

| Loss | Target | Gradient on bg pixels? | Result |
|---|---|---|---|
| BCE + binary mask | {0, 1} | No (bg is already "correct" at 0) | Predicts all background |
| MSE + teacher logits | Real numbers (~-6 to +5) | Yes (bg must match ~-6) | Learns proper segmentation |
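The table can be checked numerically. For a background pixel where the student already leans toward background (logit -4, an illustrative value), BCE against the hard label 0 yields a vanishing gradient, while MSE against an assumed teacher logit of -6 does not:

```python
import torch
import torch.nn.functional as F

# BCE vs hard label 0: d(loss)/d(logit) = sigmoid(logit) - target
logit_bce = torch.tensor([-4.0], requires_grad=True)
F.binary_cross_entropy_with_logits(logit_bce, torch.tensor([0.0])).backward()

# MSE vs teacher logit: d(loss)/d(logit) = 2 * (logit - teacher_logit)
logit_mse = torch.tensor([-4.0], requires_grad=True)
F.mse_loss(logit_mse, torch.tensor([-6.0])).backward()

print(logit_bce.grad.item(), logit_mse.grad.item())  # ~0.018 vs 4.0
```

The MSE gradient is over two orders of magnitude larger on this already-"correct" background pixel, which is exactly why the distilled student keeps learning where BCE stalls.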
## Pipeline Overview

```text
┌─────────────────────┐    ┌──────────────────────────┐    ┌─────────────────────┐
│ Stage 1: Teacher    │    │ Stage 2: Distillation    │    │ Stage 3: Inference  │
│                     │    │                          │    │                     │
│ Cellpose-SAM-FT     │───▶│ Train TinyBubbleNet      │───▶│ Fast bubble sizing  │
│ generates pseudo-   │    │ on raw teacher logits    │    │ (~3ms/image GPU)    │
│ labels on 100s of   │    │ (~400 epochs)            │    │                     │
│ lab images          │    │                          │    │                     │
└─────────────────────┘    └──────────────────────────┘    └─────────────────────┘
```
## Quick Start

### Install

```bash
pip install cellpose torch torchvision scipy scikit-image huggingface_hub numpy
```
### Stage 1: Generate Pseudo-labels

```bash
python generate_pseudolabels.py \
    --image_dir /path/to/lab_images/ \
    --model_path /path/to/your/cellpose_sam_ft_model \
    --output_dir /path/to/pseudolabels/ \
    --diameter 30 \
    --channels 0 0
```

This runs your fine-tuned Cellpose-SAM on all images and saves:
- Instance masks, flow fields, and distance transforms (`.npy`)
- Bubble statistics (count, size distribution)
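The per-pixel targets derived from each teacher instance mask look roughly like this. A minimal sketch, not `generate_pseudolabels.py` itself: the helper name is hypothetical, and the flow here is a simplified normalized gradient of the smoothed distance transform (Cellpose derives its flows from heat diffusion instead):

```python
import numpy as np
from scipy import ndimage

def targets_from_instance_mask(inst):
    """Build per-pixel training targets from an instance label map:
    a distance transform plus a simplified (dY, dX) flow field."""
    dist = np.zeros(inst.shape, dtype=np.float32)
    for lab in np.unique(inst)[1:]:  # skip background label 0
        m = inst == lab
        dist[m] = ndimage.distance_transform_edt(m)[m]
    dY, dX = np.gradient(ndimage.gaussian_filter(dist, 1.0))
    norm = np.maximum(np.hypot(dY, dX), 1e-6)
    return dY / norm, dX / norm, dist

# One synthetic bubble of radius 8 centered at (16, 16)
yy, xx = np.mgrid[0:32, 0:32]
inst = ((yy - 16) ** 2 + (xx - 16) ** 2 <= 8 ** 2).astype(np.int32)
dY, dX, dist = targets_from_instance_mask(inst)
print(round(float(dist.max())))  # 8 -- the DT peak recovers the radius
```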
### Stage 2: Train Student (FIXED, use this!)

```bash
# ✅ FIXED: uses MSE distillation on teacher logits
python train_mse_distill.py \
    --image_dir /path/to/lab_images/ \
    --label_dir /path/to/pseudolabels/ \
    --output_dir ./checkpoints/ \
    --base_ch 16 \
    --epochs 400 \
    --batch_size 4 \
    --lr 1e-3

# ❌ DEPRECATED (has the class imbalance bug)
# python train_student.py ...
```

The fixed script (`train_mse_distill.py`):
- Uses MSE distillation on all 4 output channels (including the teacher's raw cell_prob logits)
- Uses a `RawPseudoLabelDataset` that loads teacher logits directly (no binarization)
- Achieves proper foreground segmentation instead of all-background predictions
### Stage 3: Fast Inference

```bash
python inference.py \
    --model_path ./checkpoints/best_model.pt \
    --image_path /path/to/image.png \
    --output_dir ./results/
```
## Files

| File | Description | Status |
|---|---|---|
| `model.py` | TinyBubbleNet architecture (depthwise-separable U-Net) | ✅ |
| `losses.py` | Original distillation loss (BCE+Dice, has bug) | ⚠️ See `train_mse_distill.py` for fix |
| `dataset.py` | Original dataset (binary masks, has bug) | ⚠️ See `train_mse_distill.py` for fix |
| `train_student.py` | Original training (BCE-based, has bug) | ⚠️ Deprecated |
| `train_mse_distill.py` | Fixed training with MSE on teacher logits | ✅ Use this! |
| `generate_pseudolabels.py` | Stage 1: Teacher → pseudo-labels | ✅ |
| `inference.py` | Stage 3: Fast inference + bubble measurements | ✅ |
## Model Variants

| `base_ch` | Params | Size | GPU Speed | Use Case |
|---|---|---|---|---|
| 16 | 389K | 1.5 MB | 3 ms @ 256² | Default: fast & tiny |
| 32 | 1.5M | 5.8 MB | 5 ms @ 256² | More capacity if needed |

Use `--no_depthwise` for standard convolutions (more params, possibly better accuracy on complex images).
## Key Design Decisions

**Why Cellpose flows instead of direct mask prediction?** Flows handle overlapping/touching bubbles via convergence: each pixel flows toward its instance center. Direct mask prediction can't separate touching instances.
**Why a distance transform head?** For circles, the DT peak equals the radius. This gives you sizing "for free" without post-processing the mask.
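For example, per-instance radii can be read off with a labeled maximum (synthetic maps below; at inference, `labels` would come from flow-following and `dist_pred` from the student's `dist_transform` head):

```python
import numpy as np
from scipy import ndimage

# Two synthetic bubbles of radius 5 and 9 in a 64x64 frame
yy, xx = np.mgrid[0:64, 0:64]
labels = np.zeros((64, 64), dtype=np.int32)
labels[(yy - 16) ** 2 + (xx - 16) ** 2 <= 5 ** 2] = 1
labels[(yy - 40) ** 2 + (xx - 40) ** 2 <= 9 ** 2] = 2
dist_pred = ndimage.distance_transform_edt(labels > 0)

# Radius per bubble = peak of the distance map within each instance
radii = ndimage.maximum(dist_pred, labels=labels, index=[1, 2])
print(radii)  # ~[5, 9] pixels, no mask post-processing needed
```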
**Why depthwise-separable convs?** ~8× fewer params than standard convs. For a narrow domain (your lab slides), this compression is effectively lossless.

**Why MSE on logits instead of BCE on masks?** See "The Bug and The Fix" above. BCE on sparse binary masks fails due to extreme class imbalance; MSE on teacher logits gives gradients everywhere.
## When to Re-train
The student is specialized to your current lab setup. Re-train when:
- Microscope/camera settings change significantly
- Bubble preparation protocol changes
- Image resolution changes
Re-training is fast: ~30 min for 400 epochs on 50 images with a GPU.
## References

- *Cellpose-SAM: Superhuman Generalization for Cellular Segmentation* (Pachitariu et al., 2025)
- *PicoSAM2: Low-Latency Segmentation for Edge Vision* (student architecture + loss design)
- *MobileSAM: Faster Segment Anything* (decoupled distillation strategy)
- *Medical Image Segmentation with SAM-generated Annotations* (pseudo-label → UNet recipe)
- Gorce et al., 2010 (original optical microscopy microbubble sizing algorithm)