| # context: stroke-deepisles-demo | |
| > **Disclaimer**: This software is for research and demonstration purposes only. Not for clinical use. | |
| ## overview | |
| This document explains **why** we're building `stroke-deepisles-demo` and the architectural context that informs our design decisions. | |
| ## the problem we're solving | |
| We want to demonstrate an end-to-end neuroimaging inference pipeline: | |
| ``` | |
| CURRENT (Phase 1A): | |
| Local NIfTI files (extracted from ISLES24-MR-Lite ZIPs) | |
| β | |
| File-based loader (parse BIDS filenames) | |
| β | |
| DeepISLES Docker (stroke segmentation) | |
| β | |
| NiiVue visualization (Gradio Space) | |
| FUTURE (Phase 1C-D): | |
| HuggingFace Hub (properly uploaded dataset) | |
| β | |
| Tobias's datasets fork (BIDS loader + Nifti feature) | |
| β | |
| DeepISLES Docker (stroke segmentation) | |
| β | |
| NiiVue visualization (Gradio Space) | |
| ``` | |
| This showcases that: | |
| 1. Neuroimaging data can be loaded from local BIDS-named files (NOW) | |
| 2. Neuroimaging data can be consumed from HF Hub with proper BIDS/NIfTI support (FUTURE) | |
| 3. Clinical-grade models can run via Docker as black boxes | |
| 4. Results can be visualized interactively in a browser | |
| ## critical discovery (2025-12-04) | |
| **The original ISLES24-MR-Lite dataset is NOT properly uploaded to HuggingFace.** | |
| It's just raw ZIP files dumped on HF, not a proper Dataset with parquet/Arrow format. This means `load_dataset()` fails. See `data/discovery/isles24_schema_report.txt` for full details. | |
| **Workaround**: We extracted the ZIPs locally to `data/isles24/` (git-ignored) and will implement a file-based loader first. Later, we'll re-upload properly and verify full HF consumption. | |
| ## why we need tobias's datasets fork | |
| As of December 2025, the official `huggingface/datasets` library has **partial** NIfTI support but lacks critical features for neuroimaging workflows. | |
| ### what's merged upstream | |
| | PR | Author | Status | Description | | |
| |----|--------|--------|-------------| | |
| | [#7874](https://github.com/huggingface/datasets/pull/7874) | CloseChoice (Tobias) | Merged Nov 21 | NIfTI visualization support | | |
| | [#7878](https://github.com/huggingface/datasets/pull/7878) | CloseChoice (Tobias) | Merged Nov 27 | Replace papaya with NiiVue | | |
| ### what's NOT merged (and why we need the fork) | |
| | PR | Author | Status | Description | | |
| |----|--------|--------|-------------| | |
| | [#7886](https://github.com/huggingface/datasets/pull/7886) | The-Obstacle-Is-The-Way | Open | **BIDS dataset loader** - `load_dataset('bids', ...)` | | |
| | [#7887](https://github.com/huggingface/datasets/pull/7887) | The-Obstacle-Is-The-Way | Open | **NIfTI lazy loading fix** - use `dataobj` not `get_fdata()` | | |
| | [#7892](https://github.com/huggingface/datasets/pull/7892) | CloseChoice (Tobias) | Open | **NIfTI encoding for lazy upload** - fixes Arrow serialization | | |
| The fork branch bundles all these features: | |
| ``` | |
| https://github.com/CloseChoice/datasets/tree/feat/bids-loader-streaming-upload-fix | |
| ``` | |
| We pin to this branch until upstream merges the PRs. | |
| ## key components | |
| ### 1. data source: ISLES24-MR-Lite | |
| - **HF Dataset**: [YongchengYAO/ISLES24-MR-Lite](https://huggingface.co/datasets/YongchengYAO/ISLES24-MR-Lite) (**BROKEN** - raw ZIPs, not proper dataset) | |
| - **Local extracted**: `data/isles24/` (git-ignored) | |
| - **Content**: 149 acute stroke MRI cases with DWI, ADC, and manual infarct masks | |
| - **Origin**: Subset of ISLES 2024 challenge data | |
| - **Why suitable**: DeepISLES was trained on ISLES 2022, so ISLES24 is an **external** test set (no data leakage) | |
| **File structure** (after extraction): | |
| ``` | |
| data/isles24/ | |
| βββ Images-DWI/sub-stroke{XXXX}_ses-02_dwi.nii.gz # 149 files | |
| βββ Images-ADC/sub-stroke{XXXX}_ses-02_adc.nii.gz # 149 files | |
| βββ Masks/sub-stroke{XXXX}_ses-02_lesion-msk.nii.gz # 149 files | |
| ``` | |
| **Schema reference**: `data/discovery/isles24_schema_report.txt` | |
| ### 2. model: DeepISLES | |
| - **Paper**: Nature Communications 2025 - "DeepISLES: A clinically validated ischemic stroke segmentation model" | |
| - **GitHub**: [ezequieldlrosa/DeepIsles](https://github.com/ezequieldlrosa/DeepIsles) | |
| - **Docker**: `isleschallenge/deepisles` | |
| - **Inputs**: DWI + ADC (required), FLAIR (required for ensemble, optional for fast mode) | |
| - **Output**: 3D binary lesion mask (NIfTI) | |
| - **Mode**: `fast=True` runs **SEALS only** (the ISLES'22 challenge winner) | |
| #### Why we use `fast=True` (SEALS-only mode) | |
| DeepISLES is an ensemble of 3 models from the ISLES'22 challenge: | |
| | Model | Based On | Inputs Required | Notes | | |
| |-------|----------|-----------------|-------| | |
| | **SEALS** | nnUNet | DWI + ADC | π **ISLES'22 Winner** - runs in `--fast` mode | | |
| | NVAUTO | MONAI Auto3dseg | DWI + ADC + FLAIR | Requires FLAIR | | |
| | SWAN | FACTORIZER | DWI + ADC + FLAIR | Requires FLAIR | | |
| **Key insight**: ISLES24-MR-Lite contains only DWI + ADC (no FLAIR). Therefore: | |
| - `--fast True` β Runs SEALS only β **Perfect match** for our dataset | |
| - `--fast False` β Would try to run all 3 models β NVAUTO/SWAN would fail without FLAIR | |
| This is **not a downgrade**. SEALS won the ISLES'22 challenge and is state-of-the-art for stroke lesion segmentation using DWI+ADC alone. | |
| #### Scientific validity: External validation with zero data leakage | |
| | Dataset | Year | Used For | | |
| |---------|------|----------| | |
| | **ISLES 2022** | 2022 | SEALS training data (250 cases) | | |
| | **ISLES 2024** | 2024 | Our test data (149 cases from MR-Lite) | | |
| - Different patient cohorts (2 years apart, different hospitals) | |
| - SEALS has **never seen** ISLES24 patients | |
| - We have ground truth masks β can validate predictions | |
| - This constitutes a legitimate **external validation study** | |
| ### 3. visualization: NiiVue | |
| - **Library**: [niivue/niivue](https://github.com/niivue/niivue) | |
| - **Type**: WebGL2-based neuroimaging viewer | |
| - **Formats**: Native NIfTI support, overlays, multiplanar views | |
| - **Integration**: Via Gradio custom HTML component or iframe | |
| ### 4. UI framework: Gradio 5 | |
| - **Version**: Gradio 5.x (latest as of Dec 2025) | |
| - **Features**: SSR for fast loading, improved components, WebRTC support | |
| - **Deployment**: Hugging Face Spaces | |
| ## architecture diagram | |
| ``` | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β stroke-deepisles-demo β | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ | |
| β β | |
| β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β | |
| β β data/ β β inference/ β β ui/ β β | |
| β β β β β β β β | |
| β β - loader βββββΆβ - docker βββββΆβ - gradio β β | |
| β β - adapter β β - wrapper β β - niivue β β | |
| β β - staging β β - pipeline β β - viewer β β | |
| β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β | |
| β β β β β | |
| β βΌ βΌ βΌ β | |
| β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β | |
| β β core/ β β | |
| β β - config (pydantic-settings) β β | |
| β β - types (dataclasses, TypedDicts) β β | |
| β β - exceptions β β | |
| β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β | |
| β β | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β β β | |
| βΌ βΌ βΌ | |
| ββββββββββββ ββββββββββββ ββββββββββββ | |
| β HF Hub β β Docker β β Browser β | |
| β datasets β β Engine β β WebGL2 β | |
| ββββββββββββ ββββββββββββ ββββββββββββ | |
| ``` | |
| ## design principles | |
| 1. **Vertical slices**: Each phase delivers runnable functionality | |
| 2. **TDD**: Tests written before implementation | |
| 3. **Type safety**: Full type hints, mypy/pyright strict mode | |
| 4. **Separation of concerns**: Data, inference, and UI are independent modules | |
| 5. **Docker as black box**: We don't reimplement DeepISLES, we call it | |
| 6. **Graceful degradation**: Mock Docker for tests, fallback viewers if NiiVue fails | |
| ## reference repositories | |
| These are cloned locally (without git linkages) for reference: | |
| | Directory | Source | Purpose | | |
| |-----------|--------|---------| | |
| | `_reference_repos/datasets-tobias-bids-fork/` | CloseChoice/datasets@feat/bids-loader-streaming-upload-fix | BIDS loader + NIfTI lazy loading | | |
| | `_reference_repos/arc-aphasia-bids/` | The-Obstacle-Is-The-Way/arc-aphasia-bids | BIDS upload patterns (reference only) | | |
| | `_reference_repos/DeepIsles/` | ezequieldlrosa/DeepIsles | DeepISLES CLI interface reference | | |
| | `_reference_repos/bids-neuroimaging-space/` | [TobiasPitters/bids-neuroimaging](https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging) | **Working NiiVue + FastAPI implementation** | | |
| ### key reference: tobias's bids-neuroimaging space | |
| This is the most important reference for Phase 4 (UI). It demonstrates: | |
| 1. **NiiVue working in HF Spaces** - Proof that WebGL2 viewer works in production | |
| 2. **FastAPI + raw HTML approach** - Clean, no Gradio overhead for viewer | |
| 3. **Base64 data URLs for NIfTI** - `data:application/octet-stream;base64,{b64}` | |
| 4. **NiiVue CDN loading** - `https://unpkg.com/@niivue/niivue@0.57.0/dist/index.js` | |
| 5. **Multiplanar + 3D rendering** - `setSliceType(sliceTypeMultiplanar)` + `setMultiplanarLayout(2)` | |
| Key file: `main.py` (~485 lines) - complete working implementation. | |
| ## sources | |
| - [uv project configuration](https://docs.astral.sh/uv/concepts/projects/config/) | |
| - [Python packaging guide - pyproject.toml](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/) | |
| - [Real Python - Managing projects with uv](https://realpython.com/python-uv/) | |
| - [Gradio 5 announcement](https://huggingface.co/blog/gradio-5) | |
| - [NiiVue GitHub](https://github.com/niivue/niivue) | |
| - [Gradio custom HTML components](https://www.gradio.app/guides/custom_HTML_components) | |