MidasMap: Automated Immunogold Particle Detection for TEM Synapse Images
The Problem
Neuroscientists use immunogold labeling to visualize receptor proteins at synapses in transmission electron microscopy (TEM) images.
- 6nm gold beads label AMPA receptors (panAMPA)
- 12nm gold beads label NR1 (NMDA) receptors
- 18nm gold beads label vGlut2 (vesicular glutamate transporter)
Manual counting is slow and subjective. Each image takes 30-60 minutes to annotate. With hundreds of synapses per experiment, this becomes a bottleneck.
The Challenge
- Particles are tiny (4-10 pixels radius) on 2048x2115 images
- Contrast delta is only 11-39 intensity units on a 0-255 scale
- Large dark vesicles look similar to gold particles to naive detectors
- Only 453 labeled particles across 10 training images
Previous Approaches (GoldDigger et al.)
| Approach | Result |
|---|---|
| CenterNet (initial attempt) | "Detection quality remained poor" |
| U-Net heatmap | Macro F1 = 0.005-0.017 |
| GoldDigger/cGAN | "No durable breakthrough" |
| Aggressive filtering | "FP dropped but TP dropped harder" |
The root causes of these failures:
- Incorrect coordinate conversion (microns treated as normalized values)
- Broken loss function (heatmap peaks not exactly 1.0)
- Overfitting to fixed training patches
MidasMap Architecture
```
Input: Raw TEM Image (any size)
        |
[Sliding Window → 512x512 patches]
        |
ResNet-50 Encoder (pretrained on CEM500K: 500K EM images)
        |
BiFPN Neck (bidirectional feature pyramid, 2 rounds, 128ch)
        |
Transposed Conv Decoder → stride-2 output
        |
        +----------------------+
        |                      |
  Heatmap Head            Offset Head
  (2ch sigmoid)           (2ch regression)
  6nm channel             sub-pixel x,y
  12nm channel            correction
        |                      |
        +----------------------+
        |
Peak Extraction (max-pool NMS)
        |
Cross-class NMS + Mask Filter
        |
Output: [(x, y, class, confidence), ...]
```
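The peak-extraction stage can be sketched without any deep-learning framework: a pixel is kept only if it equals the maximum of its local neighbourhood (the max-pool NMS trick) and clears a confidence threshold. This is a minimal NumPy sketch, not the project's actual implementation:

```python
import numpy as np

def extract_peaks(heatmap, kernel=3, threshold=0.5):
    """Keep only local maxima of a 2D heatmap (max-pool NMS).

    A pixel survives if it equals the maximum of its kernel x kernel
    neighbourhood AND exceeds the confidence threshold.
    """
    pad = kernel // 2
    padded = np.pad(heatmap, pad, mode="constant", constant_values=-np.inf)
    # Sliding-window maximum: shape (H, W, kernel, kernel) -> (H, W)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kernel, kernel))
    local_max = windows.max(axis=(2, 3))
    keep = (heatmap == local_max) & (heatmap >= threshold)
    ys, xs = np.nonzero(keep)
    return [(int(x), int(y), float(heatmap[y, x])) for x, y in zip(xs, ys)]
```

A neighbouring pixel of a stronger peak fails the `heatmap == local_max` test, so each detection collapses to a single coordinate.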
Key Design Decisions
CEM500K Backbone: ResNet-50 pretrained on 500,000 electron microscopy images via self-supervised learning. The backbone already understands EM structures (membranes, vesicles, organelles) before seeing any gold particles. This is why the model reaches F1=0.93 in just 5 epochs.
Stride-2 Output: Standard CenterNet uses stride 4. At stride 4, a 6nm bead (4-6px radius) collapses to 1 pixel — too small to detect reliably. At stride 2, the same bead occupies 2-3 pixels, enough for Gaussian peak detection.
CornerNet Focal Loss: With positive:negative pixel ratio of 1:23,000, standard BCE would learn to predict all zeros. The focal loss uses (1-p)^alpha weighting to focus on hard examples and (1-gt)^beta penalty reduction near peaks.
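The loss described above can be written out directly. This is a framework-free NumPy sketch for clarity (the real loss would live in `loss.py` as PyTorch code); `alpha=2`, `beta=4` are the standard CornerNet defaults:

```python
import numpy as np

def cornernet_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss (CornerNet style).

    Positives are pixels where gt == 1.0 exactly; every other pixel is a
    negative, down-weighted by (1 - gt)^beta near Gaussian peaks so the
    model is not punished for firing slightly off-centre.
    """
    pos = gt == 1.0
    pos_loss = ((1 - pred) ** alpha) * np.log(pred + eps) * pos
    neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * np.log(1 - pred + eps) * (~pos)
    n_pos = max(pos.sum(), 1)  # avoid division by zero on all-negative patches
    return -(pos_loss.sum() + neg_loss.sum()) / n_pos
```

Note the normalization by the number of positives, not pixels: with a 1:23,000 class ratio, per-pixel averaging would let the negatives drown out the signal.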
Raw Image Input: No preprocessing. The CEM500K backbone was trained on raw EM images. Any heavy preprocessing (top-hat, CLAHE) creates a domain gap and hurts performance. The model learns to distinguish particles from vesicles through training data, not handcrafted filters.
Training Strategy
3-Phase Training with Discriminative Learning Rates
| Phase | Epochs | What's Trainable | Learning Rate |
|---|---|---|---|
| 1. Warm-up | 40 | BiFPN + heads only | 1e-3 |
| 2. Deep unfreeze | 40 | + layer3 + layer4 | 1e-5 to 5e-4 |
| 3. Full fine-tune | 60 | All layers | 1e-6 to 2e-4 |
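The discriminative learning rates in the table map naturally onto optimizer parameter groups. A hypothetical phase-3 sketch (the `backbone.` name prefix and the group split are assumptions, not the project's exact code; the returned list is in the format PyTorch optimizers accept):

```python
def build_param_groups(named_params, base_lr=2e-4, backbone_lr=1e-6):
    """Assign layer-wise learning rates: a tiny LR for the pretrained
    backbone, a larger LR for the BiFPN neck and the heads."""
    groups = {"backbone": [], "neck_heads": []}
    for name, p in named_params:
        key = "backbone" if name.startswith("backbone.") else "neck_heads"
        groups[key].append(p)
    return [
        {"params": groups["backbone"], "lr": backbone_lr},
        {"params": groups["neck_heads"], "lr": base_lr},
    ]
```

The intent is the usual transfer-learning one: the CEM500K features are already good, so they are nudged gently while the task-specific layers move faster.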
Loss curve (final model): training loss falls steeply from ~1.4 to ~0.6 during Phase 1, declines more gradually to ~0.4 through Phase 2, and flattens out near 0.3 over Phase 3 (epochs 80-140).
Data Augmentation
- Random 90-degree rotations (EM is rotation-invariant)
- Horizontal/vertical flips
- Conservative brightness/contrast (±8%, preserving the subtle particle signal)
- Gaussian noise (simulates shot noise)
- Copy-paste augmentation: real bead crops blended onto training patches
- 70% hard mining: patches centered on particles, 30% random
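The 70/30 hard-mining split can be sketched as a patch-centre sampler (a minimal version; the clamping so patches fit inside the image is an assumed detail):

```python
import random

def sample_patch_center(particles, img_h, img_w, patch=512,
                        hard_frac=0.7, rng=None):
    """70% of patches are centred on a labelled particle (hard mining),
    30% are sampled uniformly; centres are clamped so the patch fits."""
    rng = rng or random.Random()
    if particles and rng.random() < hard_frac:
        cx, cy = rng.choice(particles)            # centre on a real particle
    else:
        cx, cy = rng.uniform(0, img_w), rng.uniform(0, img_h)
    half = patch // 2
    cx = min(max(cx, half), img_w - half)         # clamp inside the image
    cy = min(max(cy, half), img_h - half)
    return int(cx), int(cy)
```

Centering most patches on particles keeps the positive pixels the focal loss needs in view, while the 30% random patches expose the model to particle-free background.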
Overfitting Prevention
- Unique patches every epoch: RNG reseeded per sample so the model never sees the same patch twice
- Early stopping: patience=20 epochs, monitoring validation F1
- Weight decay: 1e-4 on all parameters
Critical Bugs Found and Fixed
Bug 1: Coordinate Conversion
Problem: CSV files labeled "XY in microns" were assumed to be normalized [0,1] coordinates. They were actual micron values.
Effect: All particle annotations were offset by 50-80 pixels from the real locations. The model was learning to detect particles where none existed.
Fix: Multiply by 1790 px/micron (verified against researcher's color overlay TIFs across 7 synapses).
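The fix itself is a one-line unit conversion; a sketch using the calibration constant above:

```python
PX_PER_MICRON = 1790.0  # calibration verified against the overlay TIFs

def microns_to_pixels(x_um, y_um, px_per_micron=PX_PER_MICRON):
    """Convert annotation coordinates from microns to pixel space.

    The original bug treated these values as normalized [0, 1] fractions
    of the image size instead of physical micron distances.
    """
    return x_um * px_per_micron, y_um * px_per_micron
```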
Bug 2: Heatmap Peak Values
Problem: Gaussian peaks were centered at float coordinates, producing peak values of 0.78-0.93 instead of exactly 1.0.
Effect: The CornerNet focal loss uses `pos_mask = (gt == 1.0)` to identify positive pixels. With no pixels at exactly 1.0, the model received zero positive training signal and could not learn at all.
Fix: Center Gaussians at the integer grid point (always produces 1.0). Sub-pixel precision is handled by the offset regression head.
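The fix can be sketched as a ground-truth splatting routine: the Gaussian is centred on the integer grid cell (here via truncation; rounding would work equally well), so `exp(0) == 1.0` exactly, and the fractional remainder becomes the offset head's regression target:

```python
import numpy as np

def draw_gaussian(heatmap, x, y, sigma=1.5):
    """Splat a Gaussian centred on the integer grid cell so the peak is
    exactly 1.0; return the sub-pixel residual for the offset head."""
    cx, cy = int(x), int(y)
    h, w = heatmap.shape
    ys, xs = np.ogrid[:h, :w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)  # element-wise max keeps nearby peaks
    return (x - cx, y - cy)              # offset target in [0, 1)
```

Because the peak sits on the grid, `gt == 1.0` holds for exactly one pixel per particle, restoring the positive signal the focal loss depends on.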
Bug 3: Overfitting on Fixed Patches
Problem: The dataset generated 200 random patches once at initialization. Every epoch replayed the same patches.
Effect: On fast CUDA GPUs, the model memorized all patches in ~17 epochs (loss crashed from 1.6 to 0.002). Validation F1 peaked at 0.66 and degraded.
Fix: Reseed the RNG per `__getitem__` call so every patch is unique.
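A minimal sketch of the fix (the seed formula and the 2048-pixel stand-in image are illustrative assumptions): deriving a fresh seed from the (epoch, index) pair inside `__getitem__` keeps sampling deterministic per worker while guaranteeing no patch repeats across epochs.

```python
import numpy as np

class PatchDataset:
    """Sketch of per-sample reseeding: same (epoch, index) -> same patch,
    new epoch -> new seed -> new patch."""

    def __init__(self, n_patches, base_seed=0):
        self.n_patches = n_patches
        self.base_seed = base_seed
        self.epoch = 0  # the training loop bumps this every epoch

    def __len__(self):
        return self.n_patches

    def seed_for(self, idx):
        # distinct seed for every (epoch, index) pair
        return (self.base_seed + 100_003 * self.epoch + idx) % (2 ** 32)

    def __getitem__(self, idx):
        rng = np.random.default_rng(self.seed_for(idx))
        # stand-in for random patch-corner sampling on a 2048x2048 image
        return rng.integers(0, 2048 - 512, size=2)
```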
Results
Leave-One-Image-Out Cross-Validation (10 folds, 5 seeds each)
| Fold | Avg F1 | Best F1 | Notes |
|---|---|---|---|
| S27 | 0.990 | 0.994 | |
| S8 | 0.981 | 0.988 | |
| S25 | 0.972 | 0.977 | |
| S29 | 0.956 | 0.966 | |
| S1 | 0.930 | 0.940 | |
| S4 | 0.919 | 0.972 | |
| S22 | 0.907 | 0.938 | |
| S13 | 0.890 | 0.912 | |
| S7 | 0.799 | 1.000 | Only 3 particles (noisy metric) |
| S15 | 0.633 | 0.667 | Only 1 particle (noisy metric) |
Mean F1 = 0.943 (8 folds with sufficient annotations)
Per-class Performance (S1 fold, best threshold)
| Class | Precision | Recall | F1 |
|---|---|---|---|
| 6nm (AMPA) | 0.895 | 1.000 | 0.944 |
| 12nm (NR1) | 0.833 | 1.000 | 0.909 |
100% recall on both classes: every particle is found, and the only errors are a few false positives.
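The F1 column is just the harmonic mean of precision and recall, which is easy to verify against the table:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```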
Generalization to Unseen Images
Tested on 15 completely unseen images from a different imaging session. Detections land correctly on particles with no manual tuning. The model successfully detects both 6nm and 12nm particles on:
- Wild-type (Wt2) samples
- Heterozygous (Het1) samples
- Different synapse regions (D1, E3, S1, S10, S12, S18)
System Components
```
MidasMap/
  config/config.yaml   # All hyperparameters
  src/
    preprocessing.py   # Data loading (10 synapses, 453 particles)
    model.py           # CenterNet: ResNet-50 + BiFPN + heads (24.4M params)
    loss.py            # CornerNet focal loss + offset regression
    heatmap.py         # GT generation + peak extraction + NMS
    dataset.py         # Patch sampling, augmentation, copy-paste
    postprocess.py     # Mask filter, cross-class NMS
    ensemble.py        # D4 TTA + sliding window inference
    evaluate.py        # Hungarian matching, F1/precision/recall
    visualize.py       # Overlay visualizations
  train.py             # LOOCV training (--fold, --seed)
  train_final.py       # Final deployable model (all data)
  predict.py           # Inference on new images
  evaluate_loocv.py    # Full evaluation runner
  app.py               # Gradio web dashboard
  slurm/               # HPC job scripts
  tests/               # 36 unit tests
```
Dashboard
MidasMap includes a web-based dashboard (Gradio) for interactive use:
- Upload any TEM image (.tif)
- Adjust confidence threshold and NMS parameters
- View detections overlaid on the image
- Inspect per-class heatmaps
- Analyze confidence distributions and spatial patterns
- Export results as CSV (particle_id, x_px, y_px, x_um, y_um, class, confidence)
```
python app.py --checkpoint checkpoints/final/final_model.pth
# Opens at http://localhost:7860
```
Future Directions
- Spatial analytics: distance to synaptic cleft, nearest-neighbor analysis, Ripley's K-function
- Size regression head: predict actual bead diameter instead of binary classification
- 18nm detection: extend to vGlut2 particles (3-class model)
- Active learning: flag low-confidence detections for human review
- Cross-protocol generalization: fine-tune on cryo-EM or different staining protocols
Technical Summary
- Model: CenterNet with CEM500K-pretrained ResNet-50, BiFPN neck, stride-2 output
- Training: 3-phase with discriminative LRs, 140 epochs, 453 particles / 10 images
- Evaluation: Leave-one-image-out CV, Hungarian matching, F1 = 0.943
- Inference: Sliding window (512x512, 128px overlap), ~10s per image on GPU
- Output: Per-particle (x, y, class, confidence) with optional heatmap visualization
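The sliding-window tiling used at inference (512x512 patches, 128 px overlap) can be sketched as a window-corner generator; note how the last window in each axis is clamped to the image edge so a 2048x2115 image is fully covered:

```python
def sliding_windows(h, w, patch=512, overlap=128):
    """Return (y0, x0) top-left corners of patch-sized windows covering
    an h x w image; the final window per axis is clamped to the edge."""
    stride = patch - overlap  # 384 px between window starts

    def starts(size):
        s = list(range(0, max(size - patch, 0) + 1, stride))
        if s[-1] + patch < size:
            s.append(size - patch)  # clamp the last window to the border
        return s

    return [(y, x) for y in starts(h) for x in starts(w)]
```

Overlapping windows mean particles near patch borders are seen whole in at least one window; duplicate detections in the overlap zones are then merged by the NMS stage.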