
Bug Investigation: Slice Comparison Prediction Overlay Not Visible

Issue: The prediction overlay is invisible in the Slice Comparison tab, while the ground truth overlay is visible.

Date: 2025-12-09
Branch: debug/slice-comparison-prediction-overlay


Observed Behavior

In the Gradio UI "Slice Comparison" tab:

  • DWI Input (left panel): grayscale brain scan ✓
  • Prediction (middle panel): grayscale brain scan with no visible overlay ✗
  • Ground Truth (right panel): grayscale brain scan with a green overlay ✓

Expected Behavior

The Prediction panel should show a red overlay on the predicted lesion area, similar to how Ground Truth shows a green overlay.


Code Analysis

Prediction Code (viewer.py:261-268)

# Prediction panel
axes[1].imshow(d_slice, cmap="gray")
axes[1].imshow(
    np.ma.masked_where(p_slice == 0, p_slice),
    cmap="Reds",
    alpha=0.5,
    vmin=0,
    vmax=1,
)

Ground Truth Code (viewer.py:273-280)

# Ground Truth panel
axes[2].imshow(d_slice, cmap="gray")
axes[2].imshow(
    np.ma.masked_where(g_slice == 0, g_slice),
    cmap="Greens",
    alpha=0.5,
    vmin=0,
    vmax=1,
)

The two blocks are structurally identical. Apart from the input slice (p_slice vs g_slice), the only difference is the colormap:

  • Prediction: cmap="Reds"
  • Ground Truth: cmap="Greens"

Hypothesis

Primary Hypothesis: Probability vs Binary Mask Values

| Mask | Typical values (suspected) | Example rendering | Visibility |
|---|---|---|---|
| Ground Truth | Binary (0 or 1) | 1.0 → dark green | High ✓ |
| Prediction | Probabilities (0.0-0.3) | 0.1 → nearly white | None ✗ |

Why this matters:

  1. Matplotlib's "Reds" colormap runs from near-white (0) → dark red (1)
  2. With vmin=0, vmax=1:
    • A value of 0.05 maps to 5% of the colormap = nearly white
    • A value of 1.0 maps to 100% of the colormap = dark red
  3. With alpha=0.5 over a grayscale background, a nearly-white overlay barely changes the underlying pixels, so it is effectively invisible
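The mapping in the list above can be checked numerically. The sketch below is illustrative rather than code from viewer.py, and the helper name red_tint is hypothetical. It looks up a value in matplotlib's "Reds" colormap using the same fixed vmin=0, vmax=1. It then alpha-blends that color at 0.5 over a mid-gray pixel and reports how much redder than neutral gray the result is (0.0 means no visible tint).

```python
from matplotlib import colormaps
from matplotlib.colors import Normalize

REDS = colormaps["Reds"]
NORM = Normalize(vmin=0.0, vmax=1.0)  # same fixed range as the viewer code


def red_tint(value: float, alpha: float = 0.5, background: float = 0.5) -> float:
    """Hypothetical helper: R minus G of an overlay pixel blended over gray.

    0.0 means the blended pixel is still neutral gray (no visible red).
    """
    r, g, _b, _a = REDS(NORM(value))
    # Standard "over" compositing of the overlay color onto a gray pixel
    blended_r = alpha * r + (1 - alpha) * background
    blended_g = alpha * g + (1 - alpha) * background
    return blended_r - blended_g


for p in (0.05, 0.1, 0.3, 1.0):
    print(f"value={p:.2f}  red tint={red_tint(p):.3f}")
```

A small fraction of the tint produced at 1.0 is easy to miss against a textured scan, which is consistent with the hypothesis. The exact numbers depend on matplotlib's built-in Reds lookup table.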

Evidence:

  • DeepISLES SEALS model may output probability maps, not binary masks
  • The compute_dice function in metrics.py applies a threshold=0.5 to binarize predictions
  • The visualization does not apply any thresholding before display
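The following sketch contrasts what the metric counts with what an unthresholded overlay draws. It is a minimal Dice computation that binarizes at 0.5, as the evidence above says compute_dice does. It is a hypothetical re-implementation: dice_at_threshold is not the project's function, and treating two empty masks as perfect agreement is an assumption.

```python
import numpy as np


def dice_at_threshold(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.5) -> float:
    """Hypothetical re-implementation: Dice after binarizing the prediction."""
    p = pred > threshold  # the same rule Option A would use for display
    g = gt > 0
    denom = int(p.sum() + g.sum())
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * int(np.logical_and(p, g).sum()) / denom


gt = np.zeros((4, 4))
gt[1:3, 1:3] = 1.0           # 4 lesion voxels
pred = np.full((4, 4), 0.2)  # faint probability everywhere...
pred[1:3, 1:3] = 0.9         # ...and confident inside the lesion

print(dice_at_threshold(pred, gt))  # scored only on voxels above 0.5
print(np.count_nonzero(pred))       # voxels an unthresholded overlay would draw
```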

Alternative Hypotheses

  1. Empty slice: The prediction mask is all zeros at the selected slice (unlikely, because the slice selection logic uses get_slice_at_max_lesion(prediction_path))

  2. Data type issue: The exact comparison p_slice == 0 may behave differently for float32 arrays (unlikely, since the same comparison works for the ground truth)

  3. File path mismatch: The wrong file is loaded as the prediction (not yet verified)
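The three hypotheses can be triaged on a single 2-D slice before touching any files. The helper below is hypothetical (diagnose_prediction_slice is not part of the codebase). It separates an empty slice from one holding only sub-threshold probabilities. It also relies on the fact that an exact 0.0 stored as float32 compares equal to 0, so p_slice == 0 masks a true zero background correctly.

```python
import numpy as np


def diagnose_prediction_slice(p_slice: np.ndarray, threshold: float = 0.5) -> str:
    """Hypothetical triage helper: which hypothesis does this slice support?"""
    if np.count_nonzero(p_slice) == 0:
        return "empty"  # alternative hypothesis 1: nothing to draw
    if np.count_nonzero(p_slice > threshold) == 0:
        return "low-probability"  # primary hypothesis: values exist but render near-white
    return "confident"  # overlay should be visible; check file paths or rendering instead


print(diagnose_prediction_slice(np.zeros((3, 3), dtype=np.float32)))
print(diagnose_prediction_slice(np.full((3, 3), 0.1, dtype=np.float32)))
print(diagnose_prediction_slice(np.eye(3, dtype=np.float32)))
```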


Diagnostic Steps

1. Check Prediction Mask Values

import nibabel as nib
import numpy as np

# Load a prediction mask from a recent run
pred = nib.load("/path/to/prediction.nii.gz").get_fdata()
print(f"Shape: {pred.shape}")
print(f"Dtype: {pred.dtype}")
print(f"Min: {pred.min()}, Max: {pred.max()}")
print(f"Unique values: {np.unique(pred)[:20]}")  # First 20 unique values
print(f"Non-zero count: {np.count_nonzero(pred)}")
print(f"Values > 0.5: {np.count_nonzero(pred > 0.5)}")

2. Check Ground Truth Mask Values

# Same session as step 1 (nib and np are already imported)
gt = nib.load("/path/to/ground_truth.nii.gz").get_fdata()
print(f"Shape: {gt.shape}")
print(f"Dtype: {gt.dtype}")
print(f"Min: {gt.min()}, Max: {gt.max()}")
print(f"Unique values: {np.unique(gt)}")

3. Visual Comparison

import matplotlib.pyplot as plt

# Histograms of non-zero voxel values (reuses pred and gt from steps 1 and 2)
fig, axes = plt.subplots(1, 2)
axes[0].hist(pred[pred > 0], bins=50)  # boolean indexing already returns a 1-D array
axes[0].set_title("Prediction non-zero values")
axes[1].hist(gt[gt > 0], bins=50)
axes[1].set_title("Ground Truth non-zero values")
plt.savefig("mask_histograms.png")

Proposed Fix

Option A: Binarize Prediction Before Display (Recommended)

# In render_slice_comparison, before creating overlay:
p_slice_binary = (p_slice > 0.5).astype(float)

axes[1].imshow(
    np.ma.masked_where(p_slice_binary == 0, p_slice_binary),
    cmap="Reds",
    alpha=0.5,
    vmin=0,
    vmax=1,
)
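Here is a minimal, self-contained sketch of the Option A masking step, assuming p_slice is a 2-D float array of probabilities. binary_overlay is a hypothetical name, not a function in viewer.py. Only voxels strictly above 0.5 survive, and each one is drawn with value 1.0.

```python
import numpy as np


def binary_overlay(p_slice: np.ndarray, threshold: float = 0.5) -> np.ma.MaskedArray:
    """Hypothetical helper: Option A binarize-then-mask for display."""
    p_binary = (p_slice > threshold).astype(float)
    # Mask the background so only lesion voxels are drawn over the DWI
    return np.ma.masked_where(p_binary == 0, p_binary)


p_slice = np.array([[0.0, 0.1, 0.3],
                    [0.6, 0.9, 0.5]])
overlay = binary_overlay(p_slice)
print(overlay)
```

Every drawn voxel has value 1.0, so with vmin=0, vmax=1 it renders at the darkest red. This matches how the binary ground truth renders at the darkest green.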

Pros:

  • Consistent with how compute_dice treats predictions
  • Clear visualization of model decision boundary
  • Matches clinical interpretation (lesion vs not-lesion)

Cons:

  • Loses probability information in visualization

Option B: Dynamic Normalization

# Normalize to actual value range instead of fixed 0-1
p_max = p_slice.max() if p_slice.max() > 0 else 1.0
axes[1].imshow(
    np.ma.masked_where(p_slice == 0, p_slice),
    cmap="Reds",
    alpha=0.5,
    vmin=0,
    vmax=p_max,
)
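The cons of Option B follow directly from the normalization. In the sketch below, dynamic_norm is a hypothetical helper. A weak slice whose maximum is 0.1 and a confident slice whose maximum is 0.95 both send their top voxel to the same end of the colormap.

```python
import numpy as np
from matplotlib.colors import Normalize


def dynamic_norm(p_slice: np.ndarray) -> Normalize:
    """Hypothetical helper: Option B scales the colormap to this slice's maximum."""
    p_max = float(p_slice.max())
    if p_max <= 0:
        p_max = 1.0  # empty slice: fall back to the fixed 0-1 range
    return Normalize(vmin=0.0, vmax=p_max)


weak = np.array([[0.0, 0.05], [0.1, 0.08]])   # nothing above 0.1
strong = np.array([[0.0, 0.5], [0.95, 0.8]])  # confident prediction

# Each slice's maximum lands on the darkest red, regardless of its confidence
print(float(dynamic_norm(weak)(0.1)), float(dynamic_norm(strong)(0.95)))
```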

Pros:

  • Shows probability information
  • Works regardless of value range

Cons:

  • Inconsistent intensity across cases
  • Low-confidence predictions still appear bright: in a slice whose maximum is 0.1, that voxel is drawn at full red (misleading)

Option C: Threshold-Based Masking

# Only show values above a threshold
threshold = 0.5
axes[1].imshow(
    np.ma.masked_where(p_slice < threshold, p_slice),
    cmap="Reds",
    alpha=0.5,
    vmin=threshold,
    vmax=1.0,
)
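Options A and C use slightly different comparisons at the boundary. Option A keeps p_slice > threshold, while Option C masks only p_slice < threshold, so a voxel at exactly 0.5 appears in C but not in A. The sketch below uses made-up values to show that difference. It also shows the graded colormap positions that Option C assigns to the voxels it keeps.

```python
import numpy as np
from matplotlib.colors import Normalize

threshold = 0.5
p_slice = np.array([0.2, 0.5, 0.7, 1.0])

# Option A: strict ">" then binarize; every drawn voxel becomes 1.0
p_binary = (p_slice > threshold).astype(float)
option_a = np.ma.masked_where(p_binary == 0, p_binary)

# Option C: mask values strictly below the threshold, keep the rest graded
option_c = np.ma.masked_where(p_slice < threshold, p_slice)
shades = Normalize(vmin=threshold, vmax=1.0)(option_c.compressed())

print(option_a.count(), option_c.count())  # C also keeps the voxel at exactly 0.5
print(shades)  # colormap positions for 0.5, 0.7 and 1.0
```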

Pros:

  • Only shows confident predictions
  • Good dynamic range for visible values

Cons:

  • May hide uncertain but potentially relevant areas

Recommendation

Implement Option A (Binarize) because:

  1. It matches the clinical use case (segmentation → binary decision)
  2. It's consistent with compute_dice threshold behavior
  3. It provides clear, interpretable visualization
  4. The raw probability map can still be viewed in NiiVue if needed

Dependencies

| Package | Version | Relevance to this bug |
|---|---|---|
| gradio | >=6.0.0 | Unlikely cause (renders the matplotlib figure correctly) |
| matplotlib | >=3.8.0 | Colormap behavior is standard |
| numpy | >=1.26.0,<2.0.0 | Float comparison works correctly |
| nibabel | >=5.2.0 | Loads data correctly |

Resolution

Status: FIXED (2025-12-09)
Branch: debug/slice-comparison-prediction-overlay

Changes Made

Primary Fix (Issue #23):

  1. viewer.py:270-275: Added binarization of prediction mask in render_slice_comparison:

    # Binarize prediction at threshold 0.5 for visible overlay (Issue #23)
    p_slice_binary = (p_slice > 0.5).astype(float)
    
  2. viewer.py:156-164: Added binarization in render_3panel_view for consistency

  3. tests/conftest.py: Added synthetic_probability_mask and synthetic_binary_mask fixtures

  4. tests/ui/test_viewer.py: Added TestRenderSliceComparisonProbabilityMask test class
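The actual contents of the synthetic_probability_mask and synthetic_binary_mask fixtures are not reproduced in this spec. The sketch below is a hypothetical stand-in, and make_probability_mask and make_binary_mask are illustrative names. It models the situation behind this bug: a volume that is non-zero almost everywhere but has only a small confident core above 0.5. In pytest, each function could be wrapped with @pytest.fixture.

```python
import numpy as np


def make_probability_mask(shape=(32, 32, 8), seed=0) -> np.ndarray:
    """Hypothetical stand-in for synthetic_probability_mask."""
    rng = np.random.default_rng(seed)
    # Faint background probabilities, like the values suspected in the bug
    mask = rng.uniform(0.0, 0.3, size=shape).astype(np.float32)
    mask[10:20, 10:20, 3:5] = 0.9  # confident lesion core: 10 x 10 x 2 voxels
    return mask


def make_binary_mask(prob_mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for synthetic_binary_mask: binarize at the threshold."""
    return (prob_mask > threshold).astype(np.float32)


prob = make_probability_mask()
binary = make_binary_mask(prob)
print(np.count_nonzero(prob), int(binary.sum()))
```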

Additional Fixes (Found During Audit):

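  1. Race Condition (P2): Replaced the global _previous_results_dir with gr.State, so each session tracks its own previous results directory for cleanup instead of sharing one variable across concurrent users

  2. Inconsistent Threshold in compute_volume_ml: Added a threshold=0.5 parameter so volume is computed from the same binarization as compute_dice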

  3. render_3panel_view Wired Into UI:

    • Added gr.Tabs layout with "Interactive 3D" and "Static Report" tabs
    • render_3panel_view now displayed in "Static Report" alongside slice comparison
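    • Provides a static matplotlib fallback for browsers without WebGL2 support
  4. Thread-Safe Matplotlib: Refactored from the pyplot API to the object-oriented API (matplotlib.figure.Figure), which avoids pyplot's global current-figure state when multiple users render at once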
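The thread-safety change can be sketched as follows. This is a simplified, hypothetical analogue of render_slice_comparison, not the project's code. It builds a matplotlib.figure.Figure directly rather than going through pyplot, so no global current-figure state is shared between concurrent requests. It also applies the same 0.5 binarization to the prediction. The PNG encoding at the end exists only to make the sketch checkable; the real app may hand the Figure to Gradio directly.

```python
from io import BytesIO

import numpy as np
from matplotlib.figure import Figure


def render_comparison_png(d_slice, p_slice, g_slice, threshold=0.5) -> bytes:
    """Hypothetical, simplified analogue of render_slice_comparison.

    Each call owns its Figure (no pyplot), so concurrent sessions never
    touch shared "current figure" state.
    """
    fig = Figure(figsize=(12, 4))
    axes = fig.subplots(1, 3)

    # Same 0.5 binarization as the fix, so faint probabilities are not drawn
    p_binary = (p_slice > threshold).astype(float)
    panels = [
        ("DWI Input", None, None),
        ("Prediction", p_binary, "Reds"),
        ("Ground Truth", g_slice, "Greens"),
    ]
    for ax, (title, mask, cmap) in zip(axes, panels):
        ax.imshow(d_slice, cmap="gray")
        if mask is not None:
            ax.imshow(np.ma.masked_where(mask == 0, mask),
                      cmap=cmap, alpha=0.5, vmin=0, vmax=1)
        ax.set_title(title)
        ax.axis("off")

    # Encode to PNG bytes only so the output is easy to inspect
    buf = BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()


rng = np.random.default_rng(0)
dwi = rng.random((32, 32))
pred = np.zeros((32, 32))
pred[10:20, 10:20] = 0.9
png_bytes = render_comparison_png(dwi, pred, (pred > 0.5).astype(float))
print(len(png_bytes))
```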

Verification

  • All 136 tests pass
  • Lint (ruff) passes
  • Type check (mypy) passes

Files Modified

| File | Changes |
|---|---|
| src/stroke_deepisles_demo/ui/viewer.py | OO matplotlib API, binarization in both render functions |
| src/stroke_deepisles_demo/ui/app.py | gr.State, render_3panel_view integration, volume_ml |
| src/stroke_deepisles_demo/ui/components.py | Tabs layout (Interactive 3D / Static Report) |
| src/stroke_deepisles_demo/metrics.py | threshold parameter for compute_volume_ml |
| tests/conftest.py | New probability/binary mask fixtures |
| tests/ui/test_viewer.py | Probability mask tests |
| tests/ui/test_app.py | Updated for new return signature |
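The threshold parameter added to compute_volume_ml is a small piece of arithmetic. It counts voxels above the threshold, multiplies by the voxel volume, and converts mm³ to mL (1 mL = 1000 mm³). The function below is a hypothetical analogue with an assumed signature. In practice the voxel dimensions would come from the NIfTI header, for example via nibabel's header.get_zooms().

```python
import numpy as np


def volume_ml(mask: np.ndarray, voxel_dims_mm: tuple, threshold: float = 0.5) -> float:
    """Hypothetical analogue of compute_volume_ml with a threshold parameter."""
    voxel_mm3 = float(np.prod(voxel_dims_mm))           # e.g. header.get_zooms()[:3]
    n_voxels = int(np.count_nonzero(mask > threshold))  # default matches compute_dice's 0.5
    return n_voxels * voxel_mm3 / 1000.0                # 1 mL = 1000 mm^3


prob = np.zeros((20, 20, 20))
prob[:10, :10, :10] = 0.9  # 1000 confident voxels
prob[10:, 10:, :5] = 0.2   # 500 faint voxels

print(volume_ml(prob, (1.0, 1.0, 1.0)))                 # thresholded volume
print(volume_ml(prob, (1.0, 1.0, 1.0), threshold=0.0))  # counts every non-zero voxel
```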

Next Steps

  1. Run diagnostic script to confirm hypothesis
  2. Implement fix (Option A - binarize)
  3. Add test case for probability-valued masks
  4. Wire render_3panel_view into UI with tabs
  5. Fix race condition with gr.State
  6. Make matplotlib thread-safe with OO API
  7. Verify fix in local Gradio app (manual testing recommended)
  8. Create PR and merge to main