Bug Investigation: Slice Comparison Prediction Overlay Not Visible
Issue: Prediction overlay is invisible in slice comparison while ground truth overlay is visible
Date: 2025-12-09
Branch: debug/slice-comparison-prediction-overlay
Observed Behavior
In the Gradio UI "Slice Comparison" tab:
- DWI Input (left panel): Shows grayscale brain scan ✓
- Prediction (middle panel): Shows grayscale brain scan without any visible overlay ✗
- Ground Truth (right panel): Shows grayscale brain scan with green overlay ✓
Expected Behavior
The Prediction panel should show a red overlay on the predicted lesion area, similar to how Ground Truth shows a green overlay.
Code Analysis
Visualization Code (viewer.py:261-268)
# Prediction panel
axes[1].imshow(d_slice, cmap="gray")
axes[1].imshow(
    np.ma.masked_where(p_slice == 0, p_slice),
    cmap="Reds",
    alpha=0.5,
    vmin=0,
    vmax=1,
)
Ground Truth Code (viewer.py:273-280)
# Ground Truth panel
axes[2].imshow(d_slice, cmap="gray")
axes[2].imshow(
    np.ma.masked_where(g_slice == 0, g_slice),
    cmap="Greens",
    alpha=0.5,
    vmin=0,
    vmax=1,
)
The code is structurally identical. Apart from the input array (p_slice vs g_slice) and the axis index, the only difference is the colormap:
- Prediction: cmap="Reds"
- Ground Truth: cmap="Greens"
Hypothesis
Primary Hypothesis: Probability vs Binary Mask Values
| Mask Type | Typical Values | Colormap Rendering | Visibility |
|---|---|---|---|
| Ground Truth | Binary (0 or 1) | 1.0 → Dark Green | High ✓ |
| Prediction | Probabilities (0.0-0.3) | 0.1 → Nearly White | None ✗ |
Why this matters:
- Matplotlib's "Reds" colormap goes from white (0) → red (1)
- With vmin=0, vmax=1:
  - A value of 0.05 maps to 5% of the colormap = nearly white
  - A value of 1.0 maps to 100% of the colormap = red
- With alpha=0.5 over a grayscale background, nearly-white overlays are invisible
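The mapping described above can be checked numerically. The following is a minimal sketch, assuming matplotlib >= 3.8 as listed under Dependencies. The helper blend_over_gray and the background intensity 0.6 are illustrative assumptions and do not come from viewer.py. It looks up the colors that "Reds" assigns to 0.05 and 1.0 under vmin=0, vmax=1, then alpha-blends each over a gray pixel.

```python
import numpy as np
from matplotlib import colormaps
from matplotlib.colors import Normalize

# Same scaling as the viewer code: vmin=0, vmax=1.
norm = Normalize(vmin=0.0, vmax=1.0)
reds = colormaps["Reds"]


def blend_over_gray(value, gray=0.6, alpha=0.5):
    """Alpha-blend the "Reds" color for `value` over a gray background pixel.

    Hypothetical helper for illustration; `gray` is an assumed background
    intensity in [0, 1], not a value taken from real DWI data.
    """
    overlay_rgb = np.array(reds(float(norm(value)))[:3])
    background_rgb = np.full(3, gray)
    return alpha * overlay_rgb + (1.0 - alpha) * background_rgb


low_rgb = np.array(reds(float(norm(0.05)))[:3])   # low-probability voxel
high_rgb = np.array(reds(float(norm(1.0)))[:3])   # binary-mask voxel

print("Reds(0.05):", low_rgb.round(2))
print("Reds(1.00):", high_rgb.round(2))
print("blended 0.05:", blend_over_gray(0.05).round(2))
print("blended 1.00:", blend_over_gray(1.0).round(2))
```

For the low value the three blended channels stay close together, so the pixel reads as a lighter gray with almost no red tint. For 1.0 the red channel clearly dominates.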
Evidence:
- DeepISLES SEALS model may output probability maps, not binary masks
- The compute_dice function in metrics.py applies a threshold=0.5 to binarize predictions
- The visualization does not apply any thresholding before display
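To make the mismatch concrete: a metric that binarizes at 0.5 and a renderer that plots raw values treat the same array differently. The sketch below is not the project's compute_dice. dice_at_threshold is a hypothetical stand-in, and the strict > comparison and the empty-mask convention are assumptions.

```python
import numpy as np


def dice_at_threshold(prediction, ground_truth, threshold=0.5):
    """Dice coefficient after binarizing both inputs at `threshold`.

    Hypothetical stand-in for illustration only; the real compute_dice in
    metrics.py may differ in signature and edge-case handling.
    """
    pred_bin = np.asarray(prediction) > threshold
    gt_bin = np.asarray(ground_truth) > threshold
    denom = pred_bin.sum() + gt_bin.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred_bin, gt_bin).sum() / denom)


# Toy 1-D "slice": the model is confident on two voxels, unsure on one.
prob = np.array([0.0, 0.2, 0.9, 0.8, 0.0])
truth = np.array([0.0, 1.0, 1.0, 1.0, 0.0])

dice = dice_at_threshold(prob, truth)
print("Dice at 0.5:", dice)
print("Raw values the viewer would plot:", prob[prob != 0])
```

The metric only sees lesion or not-lesion, while the unthresholded overlay colors each voxel by its raw value.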
Alternative Hypotheses
- Empty slice: Prediction mask is all zeros at the selected slice (unlikely given the slice selection logic uses get_slice_at_max_lesion(prediction_path))
- Data type issue: Float comparison p_slice == 0 may fail for float32 arrays (unlikely - works for ground truth)
- File path mismatch: Wrong file being loaded as prediction (need to verify)
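The first two alternatives can be ruled in or out with a quick check on the slice itself. The sketch below uses a hypothetical helper, describe_slice, that is not part of viewer.py. The file path alternative still needs to be verified against the actual pipeline output.

```python
import numpy as np


def describe_slice(mask_slice):
    """Summarize a 2-D mask slice to help rule out the alternatives above.

    Hypothetical helper for illustration; not part of viewer.py.
    """
    mask_slice = np.asarray(mask_slice)
    nonzero = mask_slice[mask_slice != 0]
    return {
        "is_empty": nonzero.size == 0,
        "dtype": str(mask_slice.dtype),
        "nonzero_count": int(nonzero.size),
        "max": float(mask_slice.max()),
        "is_binary": bool(np.isin(mask_slice, (0, 1)).all()),
    }


# Background voxels stored as exact float32 zeros compare equal to 0,
# so masked_where(p_slice == 0, p_slice) masks them as intended.
probability_slice = np.zeros((4, 4), dtype=np.float32)
probability_slice[1:3, 1:3] = 0.2

binary_slice = (probability_slice > 0).astype(np.float32)

print(describe_slice(probability_slice))
print(describe_slice(binary_slice))
```

A non-empty slice whose is_binary flag is False and whose max is well below 1 would point back to the primary hypothesis.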
Diagnostic Steps
1. Check Prediction Mask Values
import nibabel as nib
import numpy as np
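# Load a prediction mask from a recent run
pred_img = nib.load("/path/to/prediction.nii.gz")
pred = pred_img.get_fdata()
print(f"Shape: {pred.shape}")
# get_fdata() always returns floating point, so report the on-disk dtype too
print(f"Dtype: {pred.dtype} (on disk: {pred_img.get_data_dtype()})")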
print(f"Min: {pred.min()}, Max: {pred.max()}")
print(f"Unique values: {np.unique(pred)[:20]}") # First 20 unique values
print(f"Non-zero count: {np.count_nonzero(pred)}")
print(f"Values > 0.5: {np.count_nonzero(pred > 0.5)}")
2. Check Ground Truth Mask Values
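gt_img = nib.load("/path/to/ground_truth.nii.gz")
gt = gt_img.get_fdata()
print(f"Shape: {gt.shape}")
# get_fdata() always returns floating point, so report the on-disk dtype too
print(f"Dtype: {gt.dtype} (on disk: {gt_img.get_data_dtype()})")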
print(f"Min: {gt.min()}, Max: {gt.max()}")
print(f"Unique values: {np.unique(gt)}")
3. Visual Comparison
import matplotlib.pyplot as plt

# Plot histograms of the non-zero values (reuses pred and gt from steps 1 and 2)
fig, axes = plt.subplots(1, 2)
axes[0].hist(pred[pred > 0], bins=50)  # boolean indexing already returns a 1-D array
axes[0].set_title("Prediction non-zero values")
axes[1].hist(gt[gt > 0], bins=50)
axes[1].set_title("Ground Truth non-zero values")
fig.savefig("mask_histograms.png")
Proposed Fix
Option A: Binarize Prediction Before Display (Recommended)
# In render_slice_comparison, before creating overlay:
p_slice_binary = (p_slice > 0.5).astype(float)
axes[1].imshow(
    np.ma.masked_where(p_slice_binary == 0, p_slice_binary),
    cmap="Reds",
    alpha=0.5,
    vmin=0,
    vmax=1,
)
Pros:
- Consistent with how compute_dice treats predictions
- Clear visualization of model decision boundary
- Matches clinical interpretation (lesion vs not-lesion)
Cons:
- Loses probability information in visualization
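As a self-contained illustration of Option A, the sketch below wraps the binarize-then-mask step in a hypothetical helper. The name binarized_overlay and the default threshold are assumptions chosen to mirror the snippet above.

```python
import numpy as np


def binarized_overlay(p_slice, threshold=0.5):
    """Return a masked array that is 1.0 above `threshold` and masked elsewhere.

    Hypothetical helper mirroring the Option A snippet; not taken from
    viewer.py.
    """
    binary = (np.asarray(p_slice) > threshold).astype(float)
    return np.ma.masked_where(binary == 0, binary)


p_slice = np.array([[0.0, 0.3], [0.6, 0.95]], dtype=np.float32)
overlay = binarized_overlay(p_slice)

print(overlay)
print("visible voxels:", overlay.count())
```

Every surviving voxel sits at 1.0, the saturated end of "Reds". Note that a slice whose values never exceed the threshold produces a fully masked array and therefore no overlay at all, so the diagnostic value ranges still matter when choosing the threshold.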
Option B: Dynamic Normalization
# Normalize to actual value range instead of fixed 0-1
p_max = p_slice.max() if p_slice.max() > 0 else 1.0
axes[1].imshow(
    np.ma.masked_where(p_slice == 0, p_slice),
    cmap="Reds",
    alpha=0.5,
    vmin=0,
    vmax=p_max,
)
Pros:
- Shows probability information
- Works regardless of value range
Cons:
- Inconsistent intensity across cases
- Low-confidence predictions still appear bright (misleading)
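The second drawback is easy to see with two toy slices. This sketch assumes matplotlib's default linear scaling, (value - vmin) / (vmax - vmin), and reproduces it in plain NumPy. dynamic_positions is a hypothetical helper.

```python
import numpy as np


def dynamic_positions(p_slice):
    """Colormap position of each value under vmin=0, vmax=p_slice.max().

    Mirrors the linear scaling (value - vmin) / (vmax - vmin) that matplotlib
    applies by default. Hypothetical helper for illustration.
    """
    p_slice = np.asarray(p_slice, dtype=float)
    p_max = p_slice.max() if p_slice.max() > 0 else 1.0
    return p_slice / p_max


confident = np.array([0.0, 0.45, 0.9])
uncertain = np.array([0.0, 0.15, 0.3])

print("confident:", dynamic_positions(confident))
print("uncertain:", dynamic_positions(uncertain))
```

Both slices land on the same colormap positions, so a prediction that peaks at 0.3 is drawn exactly as strongly as one that peaks at 0.9.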
Option C: Threshold-Based Masking
# Only show values above a threshold
threshold = 0.5
axes[1].imshow(
    np.ma.masked_where(p_slice < threshold, p_slice),
    cmap="Reds",
    alpha=0.5,
    vmin=threshold,
    vmax=1.0,
)
Pros:
- Only shows confident predictions
- Good dynamic range for visible values
Cons:
- May hide uncertain but potentially relevant areas
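A small sketch of how Option C treats values around the threshold. thresholded_overlay is a hypothetical helper, and the colormap positions are computed by hand assuming linear scaling between vmin=threshold and vmax=1.

```python
import numpy as np


def thresholded_overlay(p_slice, threshold=0.5):
    """Mask values below `threshold` and keep the surviving probabilities.

    Hypothetical helper mirroring the Option C snippet.
    """
    p_slice = np.asarray(p_slice, dtype=float)
    return np.ma.masked_where(p_slice < threshold, p_slice)


threshold = 0.5
p_slice = np.array([0.1, 0.49, 0.5, 0.75, 1.0])
overlay = thresholded_overlay(p_slice, threshold)

# Position of each surviving value on the colormap with vmin=threshold, vmax=1.
positions = (overlay.compressed() - threshold) / (1.0 - threshold)

print("kept values:", overlay.compressed())
print("colormap positions:", positions)
```

Two details follow from this. A value exactly at the threshold is kept here, whereas Option A's strict > comparison drops it. Values just above the threshold map to the pale end of "Reds", so borderline voxels remain faint.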
Recommendation
Implement Option A (Binarize) because:
- It matches the clinical use case (segmentation → binary decision)
- It's consistent with compute_dice threshold behavior
- It provides clear, interpretable visualization
- The raw probability map can still be viewed in NiiVue if needed
Dependencies
| Package | Version | Relevance |
|---|---|---|
| gradio | >=6.0.0 | Unlikely cause (renders matplotlib figure correctly) |
| matplotlib | >=3.8.0 | Colormap behavior is standard |
| numpy | >=1.26.0,<2.0.0 | Float comparison works correctly |
| nibabel | >=5.2.0 | Loads data correctly |
Resolution
Status: FIXED (2025-12-09)
Branch: debug/slice-comparison-prediction-overlay
Changes Made
Primary Fix (Issue #23):
- viewer.py:270-275: Added binarization of prediction mask in render_slice_comparison:
  # Binarize prediction at threshold 0.5 for visible overlay (Issue #23)
  p_slice_binary = (p_slice > 0.5).astype(float)
- viewer.py:156-164: Added binarization in render_3panel_view for consistency
- tests/conftest.py: Added synthetic_probability_mask and synthetic_binary_mask fixtures
- tests/ui/test_viewer.py: Added TestRenderSliceComparisonProbabilityMask test class
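A test for probability-valued masks might look roughly like the sketch below. It is an assumption-laden illustration: make_synthetic_probability_mask and the test function are hypothetical names, and the real fixtures in tests/conftest.py and the real test class may be built differently.

```python
import numpy as np


def make_synthetic_probability_mask(shape=(8, 8), low=0.6, high=0.9, seed=0):
    """Build a 2-D mask with a square "lesion" of probabilities in [low, high).

    Hypothetical generator; the real synthetic_probability_mask fixture in
    tests/conftest.py may be constructed differently.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros(shape, dtype=np.float32)
    mask[2:6, 2:6] = rng.uniform(low, high, size=(4, 4))
    return mask


def test_binarized_overlay_is_full_intensity():
    p_slice = make_synthetic_probability_mask()
    p_slice_binary = (p_slice > 0.5).astype(float)
    overlay = np.ma.masked_where(p_slice_binary == 0, p_slice_binary)
    # Every lesion voxel survives and sits at the top of the colormap range.
    assert overlay.count() == 16
    assert overlay.max() == 1.0


test_binarized_overlay_is_full_intensity()
```

Under pytest the generator would normally be exposed as a fixture. It is called directly here so the sketch runs on its own.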
Additional Fixes (Found During Audit):
- Race Condition (P2): Replaced global _previous_results_dir with gr.State for per-session thread-safe cleanup tracking
- Inconsistent Threshold in compute_volume_ml: Added threshold=0.5 parameter for consistent binarization
- render_3panel_view Wired Into UI:
  - Added gr.Tabs layout with "Interactive 3D" and "Static Report" tabs
  - render_3panel_view now displayed in "Static Report" alongside slice comparison
  - Provides WebGL2 fallback via static matplotlib figures
- Thread-Safe Matplotlib: Refactored from pyplot API to Object-Oriented API (Figure()) for multi-user safety
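For readers unfamiliar with the pyplot-free style mentioned in the last item: pyplot keeps a global registry of figures and a notion of the "current" figure, which concurrent requests can trample, while a Figure created directly is owned only by the caller. The following is a minimal sketch of that pattern. render_overlay_figure, its arguments, and the single-panel layout are assumptions for illustration and are not the code in viewer.py.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_overlay_figure(d_slice, p_slice, threshold=0.5):
    """Render a DWI slice with a binarized overlay without touching pyplot.

    Hypothetical function; names and layout are assumptions, not viewer.py.
    """
    fig = Figure(figsize=(4, 4))
    FigureCanvasAgg(fig)  # attach a non-interactive Agg canvas to the figure
    ax = fig.subplots()
    ax.imshow(d_slice, cmap="gray")
    binary = (np.asarray(p_slice) > threshold).astype(float)
    ax.imshow(
        np.ma.masked_where(binary == 0, binary),
        cmap="Reds",
        alpha=0.5,
        vmin=0,
        vmax=1,
    )
    ax.axis("off")
    return fig


d_slice = np.linspace(0.0, 1.0, 64).reshape(8, 8)
p_slice = np.zeros((8, 8))
p_slice[3:5, 3:5] = 0.8

fig = render_overlay_figure(d_slice, p_slice)
fig.canvas.draw()
rgba = np.asarray(fig.canvas.buffer_rgba())
print(rgba.shape)
```

Because no global state is involved, each call returns an independent figure that can be handed to the UI layer and garbage-collected afterwards.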
Verification
- All 136 tests pass
- Lint (ruff) passes
- Type check (mypy) passes
Files Modified
| File | Changes |
|---|---|
| src/stroke_deepisles_demo/ui/viewer.py | OO matplotlib API, binarization in both render functions |
| src/stroke_deepisles_demo/ui/app.py | gr.State, render_3panel_view integration, volume_ml |
| src/stroke_deepisles_demo/ui/components.py | Tabs layout (Interactive 3D / Static Report) |
| src/stroke_deepisles_demo/metrics.py | threshold parameter for compute_volume_ml |
| tests/conftest.py | New probability/binary mask fixtures |
| tests/ui/test_viewer.py | Probability mask tests |
| tests/ui/test_app.py | Updated for new return signature |
Next Steps
- Run diagnostic script to confirm hypothesis
- Implement fix (Option A - binarize)
- Add test case for probability-valued masks
- Wire render_3panel_view into UI with tabs
- Fix race condition with gr.State
- Make matplotlib thread-safe with OO API
- Verify fix in local Gradio app (manual testing recommended)
- Create PR and merge to main