context: stroke-deepisles-demo
Disclaimer: This software is for research and demonstration purposes only. Not for clinical use.
overview
This document explains why we're building stroke-deepisles-demo and the architectural context that informs our design decisions.
the problem we're solving
We want to demonstrate an end-to-end neuroimaging inference pipeline:
CURRENT (Phase 1A):
Local NIfTI files (extracted from ISLES24-MR-Lite ZIPs)
↓
File-based loader (parse BIDS filenames)
↓
DeepISLES Docker (stroke segmentation)
↓
NiiVue visualization (Gradio Space)
FUTURE (Phase 1C-D):
HuggingFace Hub (properly uploaded dataset)
↓
Tobias's datasets fork (BIDS loader + Nifti feature)
↓
DeepISLES Docker (stroke segmentation)
↓
NiiVue visualization (Gradio Space)
This showcases that:
- Neuroimaging data can be loaded from local BIDS-named files (NOW)
- Neuroimaging data can be consumed from HF Hub with proper BIDS/NIfTI support (FUTURE)
- Clinical-grade models can run via Docker as black boxes
- Results can be visualized interactively in a browser
critical discovery (2025-12-04)
The original ISLES24-MR-Lite dataset is NOT properly uploaded to HuggingFace.
The repository contains only raw ZIP archives, not a proper Dataset in Parquet/Arrow format, so load_dataset() fails. See data/discovery/isles24_schema_report.txt for full details.
Workaround: We extracted the ZIPs locally to data/isles24/ (git-ignored) and will implement a file-based loader first. Later, we'll re-upload properly and verify full HF consumption.
why we need tobias's datasets fork
As of December 2025, the official huggingface/datasets library has partial NIfTI support but lacks critical features for neuroimaging workflows.
what's merged upstream
| PR | Author | Status | Description |
|---|---|---|---|
| #7874 | CloseChoice (Tobias) | Merged Nov 21 | NIfTI visualization support |
| #7878 | CloseChoice (Tobias) | Merged Nov 27 | Replace papaya with NiiVue |
what's NOT merged (and why we need the fork)
| PR | Author | Status | Description |
|---|---|---|---|
| #7886 | The-Obstacle-Is-The-Way | Open | BIDS dataset loader - load_dataset('bids', ...) |
| #7887 | The-Obstacle-Is-The-Way | Open | NIfTI lazy loading fix - use dataobj not get_fdata() |
| #7892 | CloseChoice (Tobias) | Open | NIfTI encoding for lazy upload - fixes Arrow serialization |
The fork branch bundles all these features:
https://github.com/CloseChoice/datasets/tree/feat/bids-loader-streaming-upload-fix
We pin to this branch until upstream merges the PRs.
key components
1. data source: ISLES24-MR-Lite
- HF Dataset: YongchengYAO/ISLES24-MR-Lite (BROKEN - raw ZIPs, not proper dataset)
- Local extracted: data/isles24/ (git-ignored)
- Content: 149 acute stroke MRI cases with DWI, ADC, and manual infarct masks
- Origin: Subset of ISLES 2024 challenge data
- Why suitable: DeepISLES was trained on ISLES 2022, so ISLES24 is an external test set (no data leakage)
File structure (after extraction):
data/isles24/
├── Images-DWI/sub-stroke{XXXX}_ses-02_dwi.nii.gz     # 149 files
├── Images-ADC/sub-stroke{XXXX}_ses-02_adc.nii.gz     # 149 files
└── Masks/sub-stroke{XXXX}_ses-02_lesion-msk.nii.gz   # 149 files
Schema reference: data/discovery/isles24_schema_report.txt
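A file-based loader for this layout could pair the three files of each case by the subject ID in their names. The following is a minimal sketch, not the project's actual loader: the `StrokeCase` type and `discover_cases` function are hypothetical names, and the pattern assumes the `{XXXX}` placeholder is a run of digits.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

# Matches e.g. "sub-stroke0001_ses-02_dwi.nii.gz" and captures the subject ID.
# Assumption: the {XXXX} part of the filename consists of digits only.
_SUBJECT_RE = re.compile(r"^(sub-stroke\d+)_ses-02_")


@dataclass(frozen=True)
class StrokeCase:  # hypothetical type for illustration
    subject_id: str
    dwi: Path
    adc: Path
    mask: Path | None  # None when no ground truth file is found


def discover_cases(root: Path) -> list[StrokeCase]:
    """Pair DWI, ADC, and mask files by the subject ID in their filenames."""
    cases: list[StrokeCase] = []
    for dwi in sorted((root / "Images-DWI").glob("sub-stroke*_ses-02_dwi.nii.gz")):
        match = _SUBJECT_RE.match(dwi.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        subject_id = match.group(1)
        adc = root / "Images-ADC" / f"{subject_id}_ses-02_adc.nii.gz"
        if not adc.exists():
            continue  # the model needs both DWI and ADC
        mask = root / "Masks" / f"{subject_id}_ses-02_lesion-msk.nii.gz"
        cases.append(
            StrokeCase(subject_id, dwi, adc, mask if mask.exists() else None)
        )
    return cases
```

Calling `discover_cases(Path("data/isles24"))` on the extracted data would be expected to yield one entry per subject that has both a DWI and an ADC file.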
2. model: DeepISLES
- Paper: Nature Communications 2025 - "DeepISLES: A clinically validated ischemic stroke segmentation model"
- GitHub: ezequieldlrosa/DeepIsles
- Docker: isleschallenge/deepisles
- Inputs: DWI + ADC (required), FLAIR (required for ensemble, optional for fast mode)
- Output: 3D binary lesion mask (NIfTI)
- Mode: fast=True runs SEALS only (the ISLES'22 challenge winner)
Why we use fast=True (SEALS-only mode)
DeepISLES is an ensemble of 3 models from the ISLES'22 challenge:
| Model | Based On | Inputs Required | Notes |
|---|---|---|---|
| SEALS | nnUNet | DWI + ADC | ISLES'22 winner; runs in --fast mode |
| NVAUTO | MONAI Auto3dseg | DWI + ADC + FLAIR | Requires FLAIR |
| SWAN | FACTORIZER | DWI + ADC + FLAIR | Requires FLAIR |
Key insight: ISLES24-MR-Lite contains only DWI + ADC (no FLAIR). Therefore:
- --fast True → Runs SEALS only → Perfect match for our dataset
- --fast False → Would try to run all 3 models → NVAUTO/SWAN would fail without FLAIR
This is not a downgrade. SEALS won the ISLES'22 challenge and is state-of-the-art for stroke lesion segmentation using DWI+ADC alone.
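As an illustration of the black-box call, the sketch below only builds the `docker run` argument list for a DWI + ADC case in fast mode; it does not execute anything. The image name and the `--fast` flag come from this document. The `--dwi_file_name` and `--adc_file_name` flags and the `/app/data` mount point are assumptions that should be checked against the DeepIsles README, and `build_deepisles_command` is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path

DEEPISLES_IMAGE = "isleschallenge/deepisles"


def build_deepisles_command(
    input_dir: Path,
    dwi_name: str,
    adc_name: str,
    *,
    fast: bool = True,
    use_gpu: bool = True,
) -> list[str]:
    """Build (but do not run) a docker command for one DWI + ADC case."""
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        cmd += ["--gpus", "all"]
    cmd += [
        # Assumption: the container reads its inputs from /app/data.
        "-v", f"{input_dir.resolve()}:/app/data",
        DEEPISLES_IMAGE,
        # Assumption: these flag names match the DeepISLES CLI.
        "--dwi_file_name", dwi_name,
        "--adc_file_name", adc_name,
        # fast=True selects SEALS only, which needs no FLAIR.
        "--fast", "True" if fast else "False",
    ]
    return cmd
```

Returning a list rather than a shell string keeps the command easy to inspect in tests and safe to pass to `subprocess.run` without shell quoting.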
Scientific validity: External validation with zero data leakage
| Dataset | Year | Used For |
|---|---|---|
| ISLES 2022 | 2022 | SEALS training data (250 cases) |
| ISLES 2024 | 2024 | Our test data (149 cases from MR-Lite) |
- Different patient cohorts (2 years apart, different hospitals)
- SEALS has never seen ISLES24 patients
- We have ground truth masks → can validate predictions
- This constitutes a legitimate external validation study
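One common way to compare a predicted mask against a ground truth mask is the Dice coefficient, 2|P ∩ T| / (|P| + |T|). This document does not specify which metric the project uses, so the following is only a sketch of that one option. It is written in pure Python over flattened voxel sequences to stay dependency-free; `dice_score` is a hypothetical name, and a real pipeline would more likely use a vectorized array library.

```python
from typing import Iterable


def dice_score(pred: Iterable[int], truth: Iterable[int]) -> float:
    """Dice coefficient between two flattened binary masks of equal length."""
    intersection = 0
    pred_count = 0
    truth_count = 0
    for p, t in zip(pred, truth):
        p_on, t_on = bool(p), bool(t)
        pred_count += p_on
        truth_count += t_on
        intersection += p_on and t_on
    if pred_count + truth_count == 0:
        # Convention chosen here: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / (pred_count + truth_count)


# One shared voxel, two predicted, one true: 2 * 1 / (2 + 1)
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

The empty-mask convention matters for cases without a visible lesion; returning 0.0 or skipping such cases are equally defensible choices.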
3. visualization: NiiVue
- Library: niivue/niivue
- Type: WebGL2-based neuroimaging viewer
- Formats: Native NIfTI support, overlays, multiplanar views
- Integration: Via Gradio custom HTML component or iframe
4. UI framework: Gradio 5
- Version: Gradio 5.x (latest as of Dec 2025)
- Features: SSR for fast loading, improved components, WebRTC support
- Deployment: Hugging Face Spaces
architecture diagram
┌─────────────────────────────────────────────────────────────────┐
│                      stroke-deepisles-demo                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │    data/     │    │  inference/  │    │     ui/      │       │
│  │              │    │              │    │              │       │
│  │ - loader     │───▶│ - docker     │───▶│ - gradio     │       │
│  │ - adapter    │    │ - wrapper    │    │ - niivue     │       │
│  │ - staging    │    │ - pipeline   │    │ - viewer     │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│         │                   │                   │               │
│         ▼                   ▼                   ▼               │
│  ┌──────────────────────────────────────────────────────┐       │
│  │                        core/                         │       │
│  │  - config (pydantic-settings)                        │       │
│  │  - types (dataclasses, TypedDicts)                   │       │
│  │  - exceptions                                        │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
          │                   │                   │
          ▼                   ▼                   ▼
     ┌──────────┐        ┌──────────┐        ┌──────────┐
     │  HF Hub  │        │  Docker  │        │ Browser  │
     │ datasets │        │  Engine  │        │  WebGL2  │
     └──────────┘        └──────────┘        └──────────┘
design principles
- Vertical slices: Each phase delivers runnable functionality
- TDD: Tests written before implementation
- Type safety: Full type hints, mypy/pyright strict mode
- Separation of concerns: Data, inference, and UI are independent modules
- Docker as black box: We don't reimplement DeepISLES, we call it
- Graceful degradation: Mock Docker for tests, fallback viewers if NiiVue fails
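The "Docker as black box" and "Mock Docker for tests" principles suggest hiding the container call behind a small interface. The sketch below shows one possible shape under that assumption; `SegmentationRunner`, `MockRunner`, `segment_case`, and the output filename are all hypothetical and not taken from the project's code.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol


class SegmentationRunner(Protocol):  # hypothetical interface
    def run(self, dwi: Path, adc: Path, output_dir: Path) -> Path:
        """Segment one case and return the path of the lesion mask."""
        ...


class MockRunner:
    """Test double that writes a placeholder file instead of calling Docker."""

    def __init__(self) -> None:
        self.calls: list[tuple[Path, Path]] = []

    def run(self, dwi: Path, adc: Path, output_dir: Path) -> Path:
        self.calls.append((dwi, adc))
        output_dir.mkdir(parents=True, exist_ok=True)
        mask_path = output_dir / "lesion_msk.nii.gz"  # hypothetical filename
        mask_path.write_bytes(b"")  # empty placeholder, not a valid NIfTI
        return mask_path


def segment_case(
    runner: SegmentationRunner, dwi: Path, adc: Path, output_dir: Path
) -> Path:
    """Run any runner and verify that it actually produced an output file."""
    mask_path = runner.run(dwi, adc, output_dir)
    if not mask_path.exists():
        raise RuntimeError(f"runner produced no output at {mask_path}")
    return mask_path
```

Because `segment_case` depends only on the protocol, unit tests can pass a `MockRunner` while the deployed app passes a runner that shells out to Docker.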
reference repositories
These are cloned locally (without git linkages) for reference:
| Directory | Source | Purpose |
|---|---|---|
| _reference_repos/datasets-tobias-bids-fork/ | CloseChoice/datasets@feat/bids-loader-streaming-upload-fix | BIDS loader + NIfTI lazy loading |
| _reference_repos/arc-aphasia-bids/ | The-Obstacle-Is-The-Way/arc-aphasia-bids | BIDS upload patterns (reference only) |
| _reference_repos/DeepIsles/ | ezequieldlrosa/DeepIsles | DeepISLES CLI interface reference |
| _reference_repos/bids-neuroimaging-space/ | TobiasPitters/bids-neuroimaging | Working NiiVue + FastAPI implementation |
key reference: tobias's bids-neuroimaging space
This is the most important reference for Phase 4 (UI). It demonstrates:
- NiiVue working in HF Spaces - Proof that WebGL2 viewer works in production
- FastAPI + raw HTML approach - Clean, no Gradio overhead for viewer
- Base64 data URLs for NIfTI - data:application/octet-stream;base64,{b64}
- NiiVue CDN loading - https://unpkg.com/@niivue/niivue@0.57.0/dist/index.js
- Multiplanar + 3D rendering - setSliceType(sliceTypeMultiplanar) + setMultiplanarLayout(2)
Key file: main.py (~485 lines) - complete working implementation.
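The Base64 data URL pattern listed above can be reproduced on the Python side with the standard library alone. This is a minimal sketch of the encoding step only, not a copy of the reference `main.py`; both function names are hypothetical.

```python
import base64
from pathlib import Path


def bytes_to_data_url(data: bytes) -> str:
    """Encode raw bytes as a data URL with the octet-stream MIME type."""
    b64 = base64.b64encode(data).decode("ascii")
    return f"data:application/octet-stream;base64,{b64}"


def nifti_to_data_url(path: Path) -> str:
    """Read a NIfTI file (compressed or not) and return it as a data URL."""
    return bytes_to_data_url(path.read_bytes())
```

Base64 inflates the payload by roughly a third, so this approach suits single volumes embedded in a page rather than large batches.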