# 🫧 Microbubble Distillation Pipeline (v3, Fixed)

**Cellpose-SAM-FT → Pseudo-labels → TinyBubbleNet**

A 3-stage pipeline for fast, lightweight microbubble sizing and counting via knowledge distillation.
⚠️ **IMPORTANT BUG FIX:** The original `train_student.py` in this repo uses `BCEWithLogitsLoss` on binary foreground/background masks. This fails catastrophically because microbubble foreground is only ~0.2% of pixels: the model learns to predict ALL background and achieves 99.8% accuracy while detecting zero bubbles. The fixed script `train_mse_distill.py` uses MSE distillation on the teacher's raw cell_prob LOGITS instead, which gives gradients on ALL pixels (background pixels have informative negative logits, ~-6). See `train_mse_distill.py` for the corrected implementation.
## The Problem

Cellpose-SAM is excellent for cell/bubble segmentation, but at ~300M params (1.1 GB) it's expensive at inference. For lab settings where your slides look similar and you're "just detecting circles", this is massive overkill: you're paying for the ability to also segment dogs, neurons, and a thousand other things, capacity you don't need.
## The Solution: Distill Into a Tiny Specialist

| Model | Params | Size | 256×256 GPU | 256×256 CPU | FPS (GPU) |
|---|---|---|---|---|---|
| Cellpose-SAM | ~300M | 1.1 GB | ~100 ms | seconds | ~10 |
| TinyBubbleNet (base_ch=16) | 389K | 1.5 MB | 3 ms | 45 ms | 337 |
| TinyBubbleNet (base_ch=32) | 1.5M | 5.8 MB | ~5 ms | ~80 ms | ~200 |

~33× faster, ~750× smaller. And when your domain is narrow (similar-looking lab slides), the accuracy loss is minimal because the student only needs to learn one visual distribution.
## Architecture

TinyBubbleNet is a depthwise-separable U-Net (inspired by PicoSAM2) with a 4-channel output:
| Channel | Name | What it encodes |
|---|---|---|
| 0 | `dY` | Vertical gradient flow (Cellpose-compatible) |
| 1 | `dX` | Horizontal gradient flow (Cellpose-compatible) |
| 2 | `cell_prob` | Foreground/background probability |
| 3 | `dist_transform` | Distance transform (peak = bubble radius) |
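The depthwise-separable building block can be sketched in a few lines of PyTorch. This is an illustrative `DepthwiseSeparableConv` (the name, normalization, and activation are assumptions for the sketch, not the exact block in `model.py`):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv: several times
    fewer weights than a single standard 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, in_ch and out_ch, 1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

block = DepthwiseSeparableConv(16, 32)
y = block(torch.randn(1, 16, 64, 64))
print(y.shape)  # torch.Size([1, 32, 64, 64])
```

The U-Net's final 1×1 head would then emit the 4 channels (`dY`, `dX`, `cell_prob`, `dist_transform`) from the last feature map.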
Instance masks are reconstructed via Euler integration of the flow field, identical to Cellpose post-processing. This means the student is fully compatible with the Cellpose ecosystem.

The distance transform head is the key addition for sizing: the peak value within each detected instance directly gives you the bubble radius.
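The flow-following step can be illustrated with a minimal Euler integrator (a simplified sketch of what Cellpose's post-processing does, run here on a synthetic single-instance field):

```python
import numpy as np

def follow_flows(dY, dX, n_iter=50, step=1.0):
    """Euler-integrate every pixel along the (dY, dX) field. Pixels of the
    same instance converge to that instance's center; clustering the
    endpoints yields instance labels (as in Cellpose post-processing)."""
    H, W = dY.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    for _ in range(n_iter):
        yi = np.clip(np.round(ys).astype(int), 0, H - 1)
        xi = np.clip(np.round(xs).astype(int), 0, W - 1)
        ys += step * dY[yi, xi]
        xs += step * dX[yi, xi]
    return ys, xs

# Toy field: every pixel of a 32x32 patch flows toward the center (16, 16)
H = W = 32
yy, xx = np.mgrid[0:H, 0:W].astype(float)
dY, dX = H / 2 - yy, W / 2 - xx
norm = np.maximum(np.hypot(dY, dX), 1e-6)
ys, xs = follow_flows(dY / norm, dX / norm)
# All endpoints cluster near (16, 16) -- one instance
```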
## The Bug and The Fix

### The Bug (original `train_student.py` / `losses.py`)

```python
# BAD: BCE on binary {0, 1} masks
prob_loss = nn.BCEWithLogitsLoss()(pred_prob, binary_mask)
```

With foreground at only ~0.2% of pixels, the model's dominant gradient signal is "predict all background". Even after 300 epochs with "best val loss 0.0008", the model predicts zero bubbles everywhere.
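The failure mode is easy to reproduce numerically (synthetic mask below, not the repo's data loader):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 256x256 binary mask with ~0.2% foreground, mimicking bubble sparsity
mask = (rng.random((256, 256)) < 0.002).astype(float)

# A degenerate "student" that always predicts background
pred = np.zeros_like(mask)

accuracy = (pred == mask).mean()
print(f"accuracy = {accuracy:.4f}")  # ~0.998, yet zero bubbles detected
```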
### The Fix (`train_mse_distill.py`)

```python
# GOOD: MSE on the teacher's raw cell_prob logits
prob_loss = F.mse_loss(pred_prob_logits, teacher_cell_prob_logits)
```

The teacher outputs cell_prob as raw logits (range roughly -9 to +5). Every pixel has an informative value: background pixels should reproduce ~-6, foreground pixels ~+5. MSE on logits gives strong gradients everywhere, and the student successfully learns to segment bubbles.
### Why this works

| Loss | Target | Gradient on bg pixels? | Result |
|---|---|---|---|
| BCE + binary mask | {0, 1} | No (bg is already "correct" at 0) | Predicts all background |
| MSE + teacher logits | Real numbers (~-6 to +5) | Yes (bg must match ~-6) | Learns proper segmentation |
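The table can be checked numerically. For a background pixel where the student already leans toward background (logit -4, an illustrative value), BCE against the hard label 0 yields a vanishing gradient, while MSE against an assumed teacher logit of -6 does not:

```python
import torch
import torch.nn.functional as F

# BCE vs hard label 0: d(loss)/d(logit) = sigmoid(logit) - target
logit_bce = torch.tensor([-4.0], requires_grad=True)
F.binary_cross_entropy_with_logits(logit_bce, torch.tensor([0.0])).backward()

# MSE vs teacher logit: d(loss)/d(logit) = 2 * (logit - teacher_logit)
logit_mse = torch.tensor([-4.0], requires_grad=True)
F.mse_loss(logit_mse, torch.tensor([-6.0])).backward()

print(logit_bce.grad.item(), logit_mse.grad.item())  # ~0.018 vs 4.0
```

The MSE gradient is over two orders of magnitude larger on this already-"correct" background pixel, which is exactly why the distilled student keeps learning where BCE stalls.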
## Pipeline Overview

```text
┌─────────────────────┐    ┌──────────────────────────┐    ┌─────────────────────┐
│ Stage 1: Teacher    │    │ Stage 2: Distillation    │    │ Stage 3: Inference  │
│                     │    │                          │    │                     │
│ Cellpose-SAM-FT     │───▶│ Train TinyBubbleNet      │───▶│ Fast bubble sizing  │
│ generates pseudo-   │    │ on raw teacher logits    │    │ (~3ms/image GPU)    │
│ labels on 100s of   │    │ (~400 epochs)            │    │                     │
│ lab images          │    │                          │    │                     │
└─────────────────────┘    └──────────────────────────┘    └─────────────────────┘
```
## Quick Start

### Install

```bash
pip install cellpose torch torchvision scipy scikit-image huggingface_hub numpy
```
### Stage 1: Generate Pseudo-labels

```bash
python generate_pseudolabels.py \
    --image_dir /path/to/lab_images/ \
    --model_path /path/to/your/cellpose_sam_ft_model \
    --output_dir /path/to/pseudolabels/ \
    --diameter 30 \
    --channels 0 0
```

This runs your fine-tuned Cellpose-SAM on all images and saves:
- Instance masks, flow fields, and distance transforms (`.npy`)
- Bubble statistics (count, size distribution)
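The per-pixel targets derived from each teacher instance mask look roughly like this. A minimal sketch, not `generate_pseudolabels.py` itself: the helper name is hypothetical, and the flow here is a simplified normalized gradient of the smoothed distance transform (Cellpose derives its flows from heat diffusion instead):

```python
import numpy as np
from scipy import ndimage

def targets_from_instance_mask(inst):
    """Build per-pixel training targets from an instance label map:
    a distance transform plus a simplified (dY, dX) flow field."""
    dist = np.zeros(inst.shape, dtype=np.float32)
    for lab in np.unique(inst)[1:]:  # skip background label 0
        m = inst == lab
        dist[m] = ndimage.distance_transform_edt(m)[m]
    dY, dX = np.gradient(ndimage.gaussian_filter(dist, 1.0))
    norm = np.maximum(np.hypot(dY, dX), 1e-6)
    return dY / norm, dX / norm, dist

# One synthetic bubble of radius 8 centered at (16, 16)
yy, xx = np.mgrid[0:32, 0:32]
inst = ((yy - 16) ** 2 + (xx - 16) ** 2 <= 8 ** 2).astype(np.int32)
dY, dX, dist = targets_from_instance_mask(inst)
print(round(float(dist.max())))  # 8 -- the DT peak recovers the radius
```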
### Stage 2: Train Student (FIXED, use this!)

```bash
# ✅ FIXED: uses MSE distillation on teacher logits
python train_mse_distill.py \
    --image_dir /path/to/lab_images/ \
    --label_dir /path/to/pseudolabels/ \
    --output_dir ./checkpoints/ \
    --base_ch 16 \
    --epochs 400 \
    --batch_size 4 \
    --lr 1e-3

# ❌ DEPRECATED (has the class imbalance bug)
# python train_student.py ...
```

The fixed script (`train_mse_distill.py`):
- Uses MSE distillation on all 4 output channels (including the teacher's raw cell_prob logits)
- Uses a `RawPseudoLabelDataset` that loads teacher logits directly (no binarization)
- Achieves proper foreground segmentation instead of all-background predictions
### Stage 3: Fast Inference

```bash
python inference.py \
    --model_path ./checkpoints/best_model.pt \
    --image_path /path/to/image.png \
    --output_dir ./results/
```
## Files

| File | Description | Status |
|---|---|---|
| `model.py` | TinyBubbleNet architecture (depthwise-separable U-Net) | ✅ |
| `losses.py` | Original distillation loss (BCE+Dice, has bug) | ⚠️ See `train_mse_distill.py` for fix |
| `dataset.py` | Original dataset (binary masks, has bug) | ⚠️ See `train_mse_distill.py` for fix |
| `train_student.py` | Original training (BCE-based, has bug) | ⚠️ Deprecated |
| `train_mse_distill.py` | Fixed training with MSE on teacher logits | ✅ Use this! |
| `generate_pseudolabels.py` | Stage 1: Teacher → pseudo-labels | ✅ |
| `inference.py` | Stage 3: Fast inference + bubble measurements | ✅ |
## Model Variants

| `base_ch` | Params | Size | GPU Speed | Use Case |
|---|---|---|---|---|
| 16 | 389K | 1.5 MB | 3 ms @ 256² | Default: fast & tiny |
| 32 | 1.5M | 5.8 MB | 5 ms @ 256² | More capacity if needed |

Use `--no_depthwise` for standard convolutions (more params, possibly better accuracy on complex images).
## Key Design Decisions

**Why Cellpose flows instead of direct mask prediction?** Flows handle overlapping/touching bubbles via convergence: each pixel flows toward its instance center. Direct mask prediction can't separate touching instances.
**Why a distance transform head?** For circles, the DT peak equals the radius. This gives you sizing "for free" without post-processing the mask.
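For example, per-instance radii can be read off with a labeled maximum (synthetic maps below; at inference, `labels` would come from flow-following and `dist_pred` from the student's `dist_transform` head):

```python
import numpy as np
from scipy import ndimage

# Two synthetic bubbles of radius 5 and 9 in a 64x64 frame
yy, xx = np.mgrid[0:64, 0:64]
labels = np.zeros((64, 64), dtype=np.int32)
labels[(yy - 16) ** 2 + (xx - 16) ** 2 <= 5 ** 2] = 1
labels[(yy - 40) ** 2 + (xx - 40) ** 2 <= 9 ** 2] = 2
dist_pred = ndimage.distance_transform_edt(labels > 0)

# Radius per bubble = peak of the distance map within each instance
radii = ndimage.maximum(dist_pred, labels=labels, index=[1, 2])
print(radii)  # ~[5, 9] pixels, no mask post-processing needed
```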
**Why depthwise-separable convs?** ~8× fewer params than standard convs. For a narrow domain (your lab slides), this compression is effectively lossless.

**Why MSE on logits instead of BCE on masks?** See "The Bug and The Fix" above. BCE on sparse binary masks fails due to extreme class imbalance; MSE on teacher logits gives gradients everywhere.
## When to Re-train
The student is specialized to your current lab setup. Re-train when:
- Microscope/camera settings change significantly
- Bubble preparation protocol changes
- Image resolution changes
Re-training is fast: ~30 min for 400 epochs on 50 images with a GPU.
## References

- *Cellpose-SAM: Superhuman Generalization for Cellular Segmentation* (Pachitariu et al., 2025)
- *PicoSAM2: Low-Latency Segmentation for Edge Vision* (student architecture + loss design)
- *MobileSAM: Faster Segment Anything* (decoupled distillation strategy)
- *Medical Image Segmentation with SAM-generated Annotations* (pseudo-label → UNet recipe)
- Gorce et al., 2010 (original optical microscopy microbubble sizing algorithm)