Paper: PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications (arXiv:2506.18807)
Distilled from Cellpose-SAM-FT for fast microbubble sizing and counting.
| Property | Value |
|---|---|
| Architecture | Depthwise-separable U-Net |
| Parameters | 388,580 |
| Model size | 1.48 MB |
| Inference @ 256x256 | 3.06 ms |
| Inference @ 1024x1280 | 42.41 ms |
| FPS @ 256x256 | 327 |
| FPS @ 1024x1280 | 24 |
| Best val loss | 0.9571 |
| Best epoch | 526 / 600 |
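The FPS rows follow directly from the latencies above, since FPS = 1000 / per-frame latency in milliseconds:

```python
# FPS is just 1000 / per-frame latency (ms); the table rounds to 327 and 24
latency_ms = {"256x256": 3.06, "1024x1280": 42.41}
fps = {res: 1000.0 / ms for res, ms in latency_ms.items()}
```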
Counting accuracy on the evaluation set (student vs. teacher):

| Metric | Value |
|---|---|
| Mean teacher bubbles | 610.1 |
| Mean student bubbles | 576.5 |
| Mean abs diff | 40.4 |
| Median abs diff | 41.5 |
Comparison with the teacher (roughly 47x faster at 1024x1280 and ~750x smaller):

| Model | Params | Size | Speed @ 1024x1280 |
|---|---|---|---|
| Cellpose-SAM-FT | ~300M | 1.1 GB | ~2000 ms |
| TinyBubbleNet | 388,580 | 1.5 MB | 42.4 ms |
The model outputs four channels:

| Channel | Name | Description |
|---|---|---|
| 0 | dY | Vertical gradient flow (Cellpose-compatible) |
| 1 | dX | Horizontal gradient flow (Cellpose-compatible) |
| 2 | cell_prob | Foreground logit (apply sigmoid for probability) |
| 3 | dist_transform | Distance transform (peak = bubble radius) |
4-level U-Net with depthwise-separable convolutions (except the first encoder block). Inspired by PicoSAM2. Encoder: 16→32→64→128→256 channels. Each level: two depthwise-separable conv blocks.
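A depthwise-separable conv block of the kind described above can be sketched in PyTorch. This is an illustrative sketch, not the repo's exact code; the class name, normalization, and activation choices are assumptions:

```python
import torch
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Depthwise-separable 3x3 conv: a per-channel spatial conv
    followed by a 1x1 pointwise conv that mixes channels.
    Parameter cost drops from in*out*9 to in*9 + in*out."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# First encoder transition in the 16->32->64->128->256 ladder
x = torch.randn(1, 16, 64, 64)
y = DWSeparableConv(16, 32)(x)
```

Factorizing each 3x3 conv this way is what keeps the parameter count near 389K despite the five-stage channel ladder.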
```python
import torch
import numpy as np
from huggingface_hub import hf_hub_download

# Download the checkpoint
path = hf_hub_download("callumtilbury/bubble-student-v1", "best_model.pt")
ckpt = torch.load(path, map_location="cpu", weights_only=False)

# Model class needed: copy from this repo or see bubble-distill
model = TinyBubbleNet(in_channels=1, base_ch=16)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Inference (normalize grayscale image to [0, 1] first)
img_tensor = ...  # (1, 1, H, W)
with torch.no_grad():
    out = model(img_tensor)  # (1, 4, H, W)

dY, dX = out[0, 0].numpy(), out[0, 1].numpy()
cell_prob = torch.sigmoid(out[0, 2]).numpy()  # foreground probability
dist = torch.relu(out[0, 3]).numpy()  # distance transform (peak = radius)

# For instance segmentation, use Euler integration on (dY, dX, cell_prob)
# For bubble counting, use connected-component analysis on (cell_prob > threshold)
```
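As a concrete example of the counting route mentioned above, here is one possible post-processing recipe. It is a sketch, not the repo's pipeline: the function name, the 0.5 threshold, and the use of `scipy.ndimage` are all assumptions.

```python
import numpy as np
from scipy import ndimage

def count_and_size_bubbles(cell_prob, dist, threshold=0.5):
    """Count bubbles via connected components on the thresholded
    foreground map; estimate each bubble's radius (in pixels) as
    the peak of the distance transform inside its component."""
    mask = cell_prob > threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return 0, np.array([])
    radii = ndimage.maximum(dist, labels=labels, index=np.arange(1, n + 1))
    return n, radii

# Synthetic example: two disjoint "bubbles" in the probability map,
# with distance-transform peaks of 5 and 3 pixels at their centers
prob = np.zeros((64, 64)); prob[10:20, 10:20] = 0.9; prob[40:50, 40:50] = 0.9
dist = np.zeros((64, 64)); dist[15, 15] = 5.0; dist[45, 45] = 3.0
n, radii = count_and_size_bubbles(prob, dist)
```

Because channel 3 peaks at the bubble radius, this gives sizing for free alongside the count, without running the full Euler-integration instance segmentation.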
| File | Description |
|---|---|
| best_model.pt | Best checkpoint (epoch 526, val_loss=0.9571) |
| final_model.pt | Final checkpoint (epoch 600) |
| bubble_model.onnx | ONNX export for fast CPU/edge inference |
| config.json | Training configuration |
| history.json | Full training history (all epochs) |
| eval_results.json | Evaluation results (student vs teacher) |
Distillation pipeline: bubble-distill | Teacher: cellpose-sam-teacher | Data: microbubble-images