File size: 8,336 Bytes
987c4be |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 |
# Bug Investigation: Slice Comparison Prediction Overlay Not Visible
**Issue**: Prediction overlay is invisible in slice comparison while ground truth overlay is visible
**Date**: 2025-12-09
**Branch**: `debug/slice-comparison-prediction-overlay`
---
## Observed Behavior
In the Gradio UI "Slice Comparison" tab:
- **DWI Input** (left panel): Shows grayscale brain scan β
- **Prediction** (middle panel): Shows grayscale brain scan **without any visible overlay** β
- **Ground Truth** (right panel): Shows grayscale brain scan **with green overlay** β
## Expected Behavior
The Prediction panel should show a **red overlay** on the predicted lesion area, similar to how Ground Truth shows a green overlay.
---
## Code Analysis
### Visualization Code (`viewer.py:261-268`)
```python
# Prediction panel
axes[1].imshow(d_slice, cmap="gray")
axes[1].imshow(
np.ma.masked_where(p_slice == 0, p_slice),
cmap="Reds",
alpha=0.5,
vmin=0,
vmax=1,
)
```
### Ground Truth Code (`viewer.py:273-280`)
```python
# Ground Truth panel
axes[2].imshow(d_slice, cmap="gray")
axes[2].imshow(
np.ma.masked_where(g_slice == 0, g_slice),
cmap="Greens",
alpha=0.5,
vmin=0,
vmax=1,
)
```
The code is **structurally identical**. The only difference is:
- Prediction: `cmap="Reds"`
- Ground Truth: `cmap="Greens"`
---
## Hypothesis
### Primary Hypothesis: Probability vs Binary Mask Values
| Mask Type | Typical Values | Colormap Rendering | Visibility |
|-----------|----------------|-------------------|------------|
| Ground Truth | Binary (0 or 1) | 1.0 β **Dark Green** | High β |
| Prediction | Probabilities (0.0-0.3) | 0.1 β **Nearly White** | None β |
**Why this matters:**
1. Matplotlib's **"Reds" colormap** goes from white (0) β red (1)
2. With `vmin=0, vmax=1`:
- A value of `0.05` maps to 5% of the colormap = nearly white
- A value of `1.0` maps to 100% of the colormap = red
3. With `alpha=0.5` over a grayscale background, nearly-white overlays are **invisible**
**Evidence:**
- DeepISLES SEALS model may output probability maps, not binary masks
- The `compute_dice` function in `metrics.py` applies a `threshold=0.5` to binarize predictions
- The visualization does **not** apply any thresholding before display
### Alternative Hypotheses
1. **Empty slice**: Prediction mask is all zeros at the selected slice (unlikely given the slice selection logic uses `get_slice_at_max_lesion(prediction_path)`)
2. **Data type issue**: Float comparison `p_slice == 0` may fail for float32 arrays (unlikely - works for ground truth)
3. **File path mismatch**: Wrong file being loaded as prediction (need to verify)
---
## Diagnostic Steps
### 1. Check Prediction Mask Values
```python
import nibabel as nib
import numpy as np
# Load a prediction mask from a recent run
pred = nib.load("/path/to/prediction.nii.gz").get_fdata()
print(f"Shape: {pred.shape}")
print(f"Dtype: {pred.dtype}")
print(f"Min: {pred.min()}, Max: {pred.max()}")
print(f"Unique values: {np.unique(pred)[:20]}") # First 20 unique values
print(f"Non-zero count: {np.count_nonzero(pred)}")
print(f"Values > 0.5: {np.count_nonzero(pred > 0.5)}")
```
### 2. Check Ground Truth Mask Values
```python
gt = nib.load("/path/to/ground_truth.nii.gz").get_fdata()
print(f"Shape: {gt.shape}")
print(f"Dtype: {gt.dtype}")
print(f"Min: {gt.min()}, Max: {gt.max()}")
print(f"Unique values: {np.unique(gt)}")
```
### 3. Visual Comparison
```python
# Plot histogram of values
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2)
axes[0].hist(pred[pred > 0].flatten(), bins=50)
axes[0].set_title("Prediction non-zero values")
axes[1].hist(gt[gt > 0].flatten(), bins=50)
axes[1].set_title("Ground Truth non-zero values")
plt.savefig("mask_histograms.png")
```
---
## Proposed Fix
### Option A: Binarize Prediction Before Display (Recommended)
```python
# In render_slice_comparison, before creating overlay:
p_slice_binary = (p_slice > 0.5).astype(float)
axes[1].imshow(
np.ma.masked_where(p_slice_binary == 0, p_slice_binary),
cmap="Reds",
alpha=0.5,
vmin=0,
vmax=1,
)
```
**Pros:**
- Consistent with how `compute_dice` treats predictions
- Clear visualization of model decision boundary
- Matches clinical interpretation (lesion vs not-lesion)
**Cons:**
- Loses probability information in visualization
### Option B: Dynamic Normalization
```python
# Normalize to actual value range instead of fixed 0-1
p_max = p_slice.max() if p_slice.max() > 0 else 1.0
axes[1].imshow(
np.ma.masked_where(p_slice == 0, p_slice),
cmap="Reds",
alpha=0.5,
vmin=0,
vmax=p_max,
)
```
**Pros:**
- Shows probability information
- Works regardless of value range
**Cons:**
- Inconsistent intensity across cases
- Low-confidence predictions still appear bright (misleading)
### Option C: Threshold-Based Masking
```python
# Only show values above a threshold
threshold = 0.5
axes[1].imshow(
np.ma.masked_where(p_slice < threshold, p_slice),
cmap="Reds",
alpha=0.5,
vmin=threshold,
vmax=1.0,
)
```
**Pros:**
- Only shows confident predictions
- Good dynamic range for visible values
**Cons:**
- May hide uncertain but potentially relevant areas
---
## Recommendation
**Implement Option A (Binarize)** because:
1. It matches the clinical use case (segmentation β binary decision)
2. It's consistent with `compute_dice` threshold behavior
3. It provides clear, interpretable visualization
4. The raw probability map can still be viewed in NiiVue if needed
---
## Dependencies
| Package | Version | Relevant |
|---------|---------|----------|
| gradio | >=6.0.0 | Unlikely cause (renders matplotlib figure correctly) |
| matplotlib | >=3.8.0 | Colormap behavior is standard |
| numpy | >=1.26.0,<2.0.0 | Float comparison works correctly |
| nibabel | >=5.2.0 | Loads data correctly |
---
## Resolution
**Status**: FIXED (2025-12-09)
**Branch**: `debug/slice-comparison-prediction-overlay`
### Changes Made
**Primary Fix (Issue #23):**
1. **`viewer.py:270-275`**: Added binarization of prediction mask in `render_slice_comparison`:
```python
# Binarize prediction at threshold 0.5 for visible overlay (Issue #23)
p_slice_binary = (p_slice > 0.5).astype(float)
```
2. **`viewer.py:156-164`**: Added binarization in `render_3panel_view` for consistency
3. **`tests/conftest.py`**: Added `synthetic_probability_mask` and `synthetic_binary_mask` fixtures
4. **`tests/ui/test_viewer.py`**: Added `TestRenderSliceComparisonProbabilityMask` test class
**Additional Fixes (Found During Audit):**
5. **Race Condition (P2)**: Replaced global `_previous_results_dir` with `gr.State` for per-session thread-safe cleanup tracking
6. **Inconsistent Threshold in compute_volume_ml**: Added `threshold=0.5` parameter for consistent binarization
7. **render_3panel_view Wired Into UI**:
- Added `gr.Tabs` layout with "Interactive 3D" and "Static Report" tabs
- `render_3panel_view` now displayed in "Static Report" alongside slice comparison
- Provides WebGL2 fallback via static matplotlib figures
8. **Thread-Safe Matplotlib**: Refactored from `pyplot` API to Object-Oriented API (`Figure()`) for multi-user safety
### Verification
- All 136 tests pass
- Lint (ruff) passes
- Type check (mypy) passes
## Files Modified
| File | Changes |
|------|---------|
| `src/stroke_deepisles_demo/ui/viewer.py` | OO matplotlib API, binarization in both render functions |
| `src/stroke_deepisles_demo/ui/app.py` | gr.State, render_3panel_view integration, volume_ml |
| `src/stroke_deepisles_demo/ui/components.py` | Tabs layout (Interactive 3D / Static Report) |
| `src/stroke_deepisles_demo/metrics.py` | threshold parameter for compute_volume_ml |
| `tests/conftest.py` | New probability/binary mask fixtures |
| `tests/ui/test_viewer.py` | Probability mask tests |
| `tests/ui/test_app.py` | Updated for new return signature |
## Next Steps
1. [x] Run diagnostic script to confirm hypothesis
2. [x] Implement fix (Option A - binarize)
3. [x] Add test case for probability-valued masks
4. [x] Wire render_3panel_view into UI with tabs
5. [x] Fix race condition with gr.State
6. [x] Make matplotlib thread-safe with OO API
7. [ ] Verify fix in local Gradio app (manual testing recommended)
8. [ ] Create PR and merge to main
|