Model Card: Nematostella Rosette Detector
Model Description
- Model type: Attention U-Net (Oktay et al. 2018)
- Task: Semantic segmentation — pixel-wise detection of epithelial rosette structures in Nematostella vectensis confocal microscopy images
- Input: 2-channel binary boundary representation (512×512×2): thin inner cell boundary lines + morphologically thickened cell boundaries. No fluorescence intensity used.
- Output: Pixel-wise probability map (0–1) of rosette likelihood
- Framework: PyTorch
- License: MIT
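The two-channel input described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing: it assumes both channels derive from the same binary boundary mask and that "morphologically thickened" means binary dilation. The function name `make_two_channel_input` and the `thicken_iters` parameter are hypothetical.

```python
import numpy as np
from scipy.ndimage import binary_dilation


def make_two_channel_input(boundaries: np.ndarray, thicken_iters: int = 2) -> np.ndarray:
    """Stack thin and thickened binary boundaries into an (H, W, 2) float32 array."""
    thin = boundaries.astype(bool)
    # Assumption: thickening is approximated by repeated binary dilation
    # with SciPy's default cross-shaped structuring element.
    thick = binary_dilation(thin, iterations=thicken_iters)
    return np.stack([thin, thick], axis=-1).astype(np.float32)


# Toy boundary mask: a single horizontal line across a 512x512 patch.
toy = np.zeros((512, 512), dtype=bool)
toy[256, :] = True
x = make_two_channel_input(toy)  # shape (512, 512, 2), values in {0, 1}
```

Because both channels are binary, the input carries cell geometry only, which matches the statement that no fluorescence intensity is used.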
Intended Use
- Primary use: Generating candidate rosette proposals for expert-reviewed human-in-the-loop annotation in napari
- Out-of-scope: Direct automated quantification without expert review; application to other organisms, tissue types, or imaging modalities without retraining
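One way to turn the probability map into reviewable proposals is to threshold it and label connected components. The card does not state how proposals are derived, so this is only a sketch under that assumption; `heatmap_to_proposals` and `min_area` are hypothetical names.

```python
import numpy as np
from scipy.ndimage import find_objects, label


def heatmap_to_proposals(prob, threshold=0.5, min_area=1):
    """Return a label image and (id, y0, x0, y1, x1) boxes for components above threshold."""
    labels, _ = label(prob >= threshold)
    boxes = []
    for idx, sl in enumerate(find_objects(labels), start=1):
        region = labels[sl]  # view into `labels`, so edits below apply in place
        area = int((region == idx).sum())
        if area < min_area:
            region[region == idx] = 0  # drop components smaller than min_area
            continue
        boxes.append((idx, sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return labels, boxes


# Toy heatmap with one 10x10 blob and one isolated pixel.
prob = np.zeros((64, 64), dtype=np.float32)
prob[10:20, 10:20] = 0.9
prob[40, 40] = 0.7
labels, boxes = heatmap_to_proposals(prob, threshold=0.5, min_area=4)
```

The resulting label image could then be loaded into napari as a Labels layer (for example with `viewer.add_labels(labels)`) so that an expert can accept, edit, or reject each candidate.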
Training Data
- 214 confocal microscopy images of Nematostella vectensis juvenile epidermis
- Acquired on Olympus IX83 FV3000, 60× silicone objective, 1024×1024 px, 0.134 µm/px
- Ground truth: manually annotated rosette instance masks (napari), minimum 5 cells sharing a common central axis or coalescing around an extruding cell
- Availability: the dataset will be deposited on Zenodo upon publication
Evaluation
Evaluated on a held-out validation set (54 images, 269 rosette instances, 20% of the total dataset):
| Metric | Value |
|---|---|
| Pixel-level Dice | 0.54 |
| Pixel-level Recall | 0.65 |
| Event-level Recall (≥1 px, threshold 0.5) | 88.8% (239/269) |
| Rosettes with ≥10% pixel coverage (threshold 0.4) | 82.9% (223/269) |
| Rosettes with >80% pixel coverage (threshold 0.5) | 47.2% (127/269) |
| Rosettes with >40% pixel coverage (threshold 0.5) | 72.5% (195/269) |
| Completely missed (no predicted pixels at threshold 0.5) | 11.2% (30/269) |
Note: Pixel-level recall (0.65) mainly reflects boundary imprecision in detected rosettes rather than missed detection events. Event-level recall (88.8%) is the operationally relevant metric for the human-in-the-loop workflow. Pixel-level metrics were computed on full images using sliding-window inference (512×512 px patches, 256 px overlap, threshold 0.5).
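The sliding-window inference and per-rosette coverage metrics can be illustrated with the sketch below. It assumes overlapping patch predictions are merged by averaging, which the card does not specify, and that ground-truth instances are given as an integer label image with 0 as background. The names `sliding_window_predict`, `predict_patch`, and `instance_coverage` are hypothetical.

```python
import numpy as np


def sliding_window_predict(image, predict_patch, patch=512, stride=256):
    """Average overlapping patch predictions over an (H, W, C) image.

    Assumes H, W >= patch and that (H - patch) and (W - patch) are multiples
    of stride, as is the case for 1024x1024 images with patch=512, stride=256.
    """
    h, w = image.shape[:2]
    prob_sum = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = image[y:y + patch, x:x + patch]
            prob_sum[y:y + patch, x:x + patch] += predict_patch(tile)
            count[y:y + patch, x:x + patch] += 1
    return prob_sum / count


def instance_coverage(prob, instances, threshold=0.5):
    """Fraction of each ground-truth instance's pixels with prob >= threshold."""
    pred = prob >= threshold
    ids = np.unique(instances)
    ids = ids[ids != 0]  # 0 is background
    return {int(i): float(pred[instances == i].mean()) for i in ids}


def event_recall(coverage):
    """Share of instances with at least one predicted pixel."""
    return float(np.mean([c > 0 for c in coverage.values()]))
```

With coverage values in hand, the table rows correspond to counting instances whose coverage is above 0, at least 0.10, above 0.40, or above 0.80 at the stated threshold.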
Architecture
- 4-level encoder-decoder (U-Net)
- Additive attention gates at 3 upsampling junctions
- Feature maps: 64 → 128 → 256 → 512 → 1024 (bottleneck)
- Final layer: 1×1 convolution + Sigmoid
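An additive attention gate in the spirit of Oktay et al. (2018) might look like the sketch below. It is a simplified variant, not the released architecture: it upsamples the gating signal to the skip resolution rather than downsampling the skip features, and the class name `AttentionGate` and the channel sizes in the example are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGate(nn.Module):
    """Additive attention gate: alpha = sigmoid(psi(relu(W_g g + W_x x)))."""

    def __init__(self, gate_ch: int, skip_ch: int, inter_ch: int):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # g: gating signal from the coarser decoder level
        # x: encoder skip features at the finer resolution
        g = F.interpolate(g, size=x.shape[2:], mode="bilinear", align_corners=False)
        alpha = torch.sigmoid(self.psi(F.relu(self.w_g(g) + self.w_x(x))))
        return x * alpha  # skip features re-weighted per pixel


# Small channel counts keep the example fast; the card lists 512/1024 at the deepest levels.
gate = AttentionGate(gate_ch=64, skip_ch=32, inter_ch=16)
g = torch.randn(1, 64, 8, 8)
x = torch.randn(1, 32, 16, 16)
out = gate(g, x)  # same shape as x
```

The gate outputs a single-channel attention map in (0, 1), so it can only attenuate skip features, never amplify them.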
Training Configuration
| Parameter | Value |
|---|---|
| Loss | 0.5× BCE + 0.5× Dice |
| Optimizer | AdamW |
| Learning rate | 1×10⁻⁴ |
| Batch size | 4 |
| Early stopping | Patience 15 epochs (val loss) |
| Input patch size | 512×512 |
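The loss, optimizer, and early-stopping settings in the table can be expressed roughly as below. This is a sketch under assumptions: the Dice term is a soft Dice computed over the whole batch, AdamW uses PyTorch defaults apart from the learning rate, and `bce_dice_loss`, `EarlyStopping`, and the stand-in `model` are hypothetical names rather than the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def bce_dice_loss(prob: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """0.5 * BCE + 0.5 * soft Dice loss on sigmoid probabilities in [0, 1]."""
    bce = F.binary_cross_entropy(prob, target)
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return 0.5 * bce + 0.5 * (1 - dice)


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 15):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Stand-in model (2 input channels, sigmoid output), not the Attention U-Net itself.
model = nn.Sequential(nn.Conv2d(2, 1, kernel_size=1), nn.Sigmoid())
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```

Because the final layer already applies a sigmoid, the sketch uses `binary_cross_entropy` on probabilities rather than the logits-based variant.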
Data Augmentation
Random rotation (p=0.5), elastic deformation (α=120, σ=6, p=0.4), affine transforms (p=0.6), coarse dropout (p=0.3) via Albumentations.
Limitations
- Trained exclusively on a single laboratory's images (single instrument, single organism, single staining protocol)
- Generalisation to other imaging setups not evaluated
- 11.2% of rosette events receive no predicted pixels at threshold 0.5, so expert review of the full image is still required
- Validation set was also used for early stopping (standard practice); the model was never trained on validation images
Hardware
Apple MacBook Pro M2 Max (64 GB unified memory), PyTorch MPS backend. Training: a few hours. Inference: <1 min/image.