# context: stroke-deepisles-demo
> **Disclaimer**: This software is for research and demonstration purposes only. Not for clinical use.
## overview
This document explains **why** we're building `stroke-deepisles-demo` and the architectural context that informs our design decisions.
## the problem we're solving
We want to demonstrate an end-to-end neuroimaging inference pipeline:
```
CURRENT (Phase 1A):
  Local NIfTI files (extracted from ISLES24-MR-Lite ZIPs)
      ↓
  File-based loader (parse BIDS filenames)
      ↓
  DeepISLES Docker (stroke segmentation)
      ↓
  NiiVue visualization (Gradio Space)

FUTURE (Phase 1C-D):
  HuggingFace Hub (properly uploaded dataset)
      ↓
  Tobias's datasets fork (BIDS loader + Nifti feature)
      ↓
  DeepISLES Docker (stroke segmentation)
      ↓
  NiiVue visualization (Gradio Space)
```
This showcases that:
1. Neuroimaging data can be loaded from local BIDS-named files (NOW)
2. Neuroimaging data can be consumed from HF Hub with proper BIDS/NIfTI support (FUTURE)
3. Clinical-grade models can run via Docker as black boxes
4. Results can be visualized interactively in a browser
## critical discovery (2025-12-04)
**The original ISLES24-MR-Lite dataset is NOT properly uploaded to HuggingFace.**
The HF repository contains only raw ZIP archives, not a dataset in Parquet/Arrow format, so `load_dataset()` fails on it. See `data/discovery/isles24_schema_report.txt` for full details.
**Workaround**: We extracted the ZIPs locally to `data/isles24/` (git-ignored) and will implement a file-based loader first. Later, we'll re-upload properly and verify full HF consumption.
## why we need tobias's datasets fork
As of December 2025, the official `huggingface/datasets` library has **partial** NIfTI support but lacks critical features for neuroimaging workflows.
### what's merged upstream
| PR | Author | Status | Description |
|----|--------|--------|-------------|
| [#7874](https://github.com/huggingface/datasets/pull/7874) | CloseChoice (Tobias) | Merged Nov 21 | NIfTI visualization support |
| [#7878](https://github.com/huggingface/datasets/pull/7878) | CloseChoice (Tobias) | Merged Nov 27 | Replace papaya with NiiVue |
### what's NOT merged (and why we need the fork)
| PR | Author | Status | Description |
|----|--------|--------|-------------|
| [#7886](https://github.com/huggingface/datasets/pull/7886) | The-Obstacle-Is-The-Way | Open | **BIDS dataset loader** - `load_dataset('bids', ...)` |
| [#7887](https://github.com/huggingface/datasets/pull/7887) | The-Obstacle-Is-The-Way | Open | **NIfTI lazy loading fix** - use `dataobj` not `get_fdata()` |
| [#7892](https://github.com/huggingface/datasets/pull/7892) | CloseChoice (Tobias) | Open | **NIfTI encoding for lazy upload** - fixes Arrow serialization |
The fork branch bundles all these features:
```
https://github.com/CloseChoice/datasets/tree/feat/bids-loader-streaming-upload-fix
```
We pin to this branch until upstream merges the PRs.
## key components
### 1. data source: ISLES24-MR-Lite
- **HF Dataset**: [YongchengYAO/ISLES24-MR-Lite](https://huggingface.co/datasets/YongchengYAO/ISLES24-MR-Lite) (**BROKEN** - raw ZIPs, not proper dataset)
- **Local extracted**: `data/isles24/` (git-ignored)
- **Content**: 149 acute stroke MRI cases with DWI, ADC, and manual infarct masks
- **Origin**: Subset of ISLES 2024 challenge data
- **Why suitable**: DeepISLES was trained on ISLES 2022, so ISLES24 is an **external** test set (no data leakage)
**File structure** (after extraction):
```
data/isles24/
├── Images-DWI/sub-stroke{XXXX}_ses-02_dwi.nii.gz        # 149 files
├── Images-ADC/sub-stroke{XXXX}_ses-02_adc.nii.gz        # 149 files
└── Masks/sub-stroke{XXXX}_ses-02_lesion-msk.nii.gz      # 149 files
```
**Schema reference**: `data/discovery/isles24_schema_report.txt`
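The file-based loader described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the regular expression is inferred from the three filename patterns listed above, and `CaseFiles`, `parse_name`, and `discover_cases` are hypothetical names introduced here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

# Pattern inferred from the listing above. The digit count of the subject ID
# is not enforced because the document only shows the placeholder "{XXXX}".
_NAME_RE = re.compile(
    r"^sub-(?P<subject>stroke\d+)_ses-(?P<session>\d+)"
    r"_(?P<suffix>dwi|adc|lesion-msk)\.nii\.gz$"
)


@dataclass(frozen=True)
class CaseFiles:
    """Hypothetical container for the files belonging to one subject."""

    subject: str
    dwi: Path
    adc: Path
    mask: Path | None  # ground-truth mask, if one was found


def parse_name(filename: str) -> dict[str, str] | None:
    """Split a BIDS-style filename into subject, session, and suffix."""
    match = _NAME_RE.match(filename)
    return match.groupdict() if match else None


def discover_cases(root: Path) -> list[CaseFiles]:
    """Group files under root by subject; keep subjects with DWI and ADC."""
    found: dict[str, dict[str, Path]] = {}
    for folder in ("Images-DWI", "Images-ADC", "Masks"):
        for path in sorted((root / folder).glob("*.nii.gz")):
            parts = parse_name(path.name)
            if parts is None:
                continue  # skip files that do not follow the naming scheme
            found.setdefault(parts["subject"], {})[parts["suffix"]] = path

    cases = []
    for subject in sorted(found):
        files = found[subject]
        if "dwi" in files and "adc" in files:
            cases.append(
                CaseFiles(subject, files["dwi"], files["adc"], files.get("lesion-msk"))
            )
    return cases
```

Calling `discover_cases(Path("data/isles24"))` would then return one `CaseFiles` entry per subject that has both a DWI and an ADC volume.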
### 2. model: DeepISLES
- **Paper**: Nature Communications 2025 - "DeepISLES: A clinically validated ischemic stroke segmentation model"
- **GitHub**: [ezequieldlrosa/DeepIsles](https://github.com/ezequieldlrosa/DeepIsles)
- **Docker**: `isleschallenge/deepisles`
- **Inputs**: DWI + ADC (required), FLAIR (required for ensemble, optional for fast mode)
- **Output**: 3D binary lesion mask (NIfTI)
- **Mode**: `fast=True` runs **SEALS only** (the ISLES'22 challenge winner)
#### Why we use `fast=True` (SEALS-only mode)
DeepISLES is an ensemble of 3 models from the ISLES'22 challenge:
| Model | Based On | Inputs Required | Notes |
|-------|----------|-----------------|-------|
| **SEALS** | nnUNet | DWI + ADC | **ISLES'22 Winner** - runs in `--fast` mode |
| NVAUTO | MONAI Auto3dseg | DWI + ADC + FLAIR | Requires FLAIR |
| SWAN | FACTORIZER | DWI + ADC + FLAIR | Requires FLAIR |
**Key insight**: ISLES24-MR-Lite contains only DWI + ADC (no FLAIR). Therefore:
- `--fast True` → Runs SEALS only → **Perfect match** for our dataset
- `--fast False` → Would try to run all 3 models → NVAUTO/SWAN would fail without FLAIR
This is **not a downgrade**. SEALS won the ISLES'22 challenge and is state-of-the-art for stroke lesion segmentation using DWI+ADC alone.
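As a rough sketch, a wrapper could assemble the Docker invocation for fast mode as shown below. Only the image name `isleschallenge/deepisles` and the `--fast` flag come from this document. The mount point `/app/data` and the `--dwi_file_name` / `--adc_file_name` flags are assumptions that should be checked against the DeepISLES README before use, and `build_deepisles_command` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

DOCKER_IMAGE = "isleschallenge/deepisles"  # image name given in this document


def build_deepisles_command(
    data_dir: Path,
    dwi_name: str,
    adc_name: str,
    *,
    fast: bool = True,
    use_gpu: bool = True,
) -> list[str]:
    """Build (but do not run) a docker command for one case."""
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        cmd += ["--gpus", "all"]
    # ASSUMPTION: the container reads its inputs from /app/data.
    cmd += ["-v", f"{data_dir}:/app/data", DOCKER_IMAGE]
    # ASSUMPTION: these input flag names are not confirmed by this document.
    cmd += ["--dwi_file_name", dwi_name, "--adc_file_name", adc_name]
    # `--fast True` selects SEALS only, as described above.
    cmd += ["--fast", "True" if fast else "False"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Returning a list rather than a shell string avoids quoting problems with paths, and `data_dir` should be an absolute path because Docker bind mounts require one.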
#### Scientific validity: External validation with zero data leakage
| Dataset | Year | Used For |
|---------|------|----------|
| **ISLES 2022** | 2022 | SEALS training data (250 cases) |
| **ISLES 2024** | 2024 | Our test data (149 cases from MR-Lite) |
- Different patient cohorts (2 years apart, different hospitals)
- SEALS has **never seen** ISLES24 patients
- We have ground truth masks → can validate predictions
- This constitutes a legitimate **external validation study**
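This document does not name the metric used to compare predictions with ground truth. One common choice for segmentation overlap is the Dice coefficient, sketched below in plain Python over flattened binary masks. The function name and the empty-mask convention are assumptions made for illustration.

```python
from typing import Iterable


def dice_coefficient(pred: Iterable[int], truth: Iterable[int]) -> float:
    """Dice overlap between two flattened binary masks (nonzero = lesion)."""
    pred_bits = [bool(v) for v in pred]
    truth_bits = [bool(v) for v in truth]
    if len(pred_bits) != len(truth_bits):
        raise ValueError("masks must have the same number of voxels")

    intersection = sum(p and t for p, t in zip(pred_bits, truth_bits))
    total = sum(pred_bits) + sum(truth_bits)
    if total == 0:
        # Convention chosen here: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total
```

For real volumes, the mask arrays would be flattened first. An array library would be much faster than this loop-based version.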
### 3. visualization: NiiVue
- **Library**: [niivue/niivue](https://github.com/niivue/niivue)
- **Type**: WebGL2-based neuroimaging viewer
- **Formats**: Native NIfTI support, overlays, multiplanar views
- **Integration**: Via Gradio custom HTML component or iframe
### 4. UI framework: Gradio 5
- **Version**: Gradio 5.x (latest as of Dec 2025)
- **Features**: SSR for fast loading, improved components, WebRTC support
- **Deployment**: Hugging Face Spaces
## architecture diagram
```
┌─────────────────────────────────────────────────────────────────┐
│                      stroke-deepisles-demo                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │    data/     │    │  inference/  │    │     ui/      │       │
│  │              │    │              │    │              │       │
│  │ - loader     │───▶│ - docker     │───▶│ - gradio     │       │
│  │ - adapter    │    │ - wrapper    │    │ - niivue     │       │
│  │ - staging    │    │ - pipeline   │    │ - viewer     │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│          │                   │                   │              │
│          ▼                   ▼                   ▼              │
│  ┌──────────────────────────────────────────────────────┐       │
│  │                        core/                         │       │
│  │  - config (pydantic-settings)                        │       │
│  │  - types (dataclasses, TypedDicts)                   │       │
│  │  - exceptions                                        │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
           │                   │                   │
           ▼                   ▼                   ▼
      ┌──────────┐        ┌──────────┐        ┌──────────┐
      │  HF Hub  │        │  Docker  │        │ Browser  │
      │ datasets │        │  Engine  │        │  WebGL2  │
      └──────────┘        └──────────┘        └──────────┘
```
## design principles
1. **Vertical slices**: Each phase delivers runnable functionality
2. **TDD**: Tests written before implementation
3. **Type safety**: Full type hints, mypy/pyright strict mode
4. **Separation of concerns**: Data, inference, and UI are independent modules
5. **Docker as black box**: We don't reimplement DeepISLES, we call it
6. **Graceful degradation**: Mock Docker for tests, fallback viewers if NiiVue fails
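Principles 4 to 6 can be illustrated with a small interface sketch. The pipeline depends only on a runner protocol, so a mock can stand in for Docker in tests. All names here (`SegmentationRunner`, `MockRunner`, `segment_case`) are hypothetical and not taken from the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


class SegmentationRunner(Protocol):
    """Anything that turns DWI + ADC inputs into a lesion mask path."""

    def run(self, dwi: Path, adc: Path, out_dir: Path) -> Path: ...


@dataclass
class MockRunner:
    """Test double: records calls and returns a path without running Docker."""

    mask_name: str = "lesion_msk.nii.gz"  # hypothetical output filename
    calls: list[tuple[Path, Path, Path]] = field(default_factory=list)

    def run(self, dwi: Path, adc: Path, out_dir: Path) -> Path:
        self.calls.append((dwi, adc, out_dir))
        return out_dir / self.mask_name


def segment_case(runner: SegmentationRunner, dwi: Path, adc: Path, out_dir: Path) -> Path:
    """Pipeline step that sees only the protocol, never the Docker details."""
    return runner.run(dwi, adc, out_dir)
```

A Docker-backed runner would implement the same `run` method, which lets unit tests exercise the pipeline without a Docker daemon or GPU.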
## reference repositories
These are cloned locally (without git linkages) for reference:
| Directory | Source | Purpose |
|-----------|--------|---------|
| `_reference_repos/datasets-tobias-bids-fork/` | CloseChoice/datasets@feat/bids-loader-streaming-upload-fix | BIDS loader + NIfTI lazy loading |
| `_reference_repos/arc-aphasia-bids/` | The-Obstacle-Is-The-Way/arc-aphasia-bids | BIDS upload patterns (reference only) |
| `_reference_repos/DeepIsles/` | ezequieldlrosa/DeepIsles | DeepISLES CLI interface reference |
| `_reference_repos/bids-neuroimaging-space/` | [TobiasPitters/bids-neuroimaging](https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging) | **Working NiiVue + FastAPI implementation** |
### key reference: tobias's bids-neuroimaging space
This is the most important reference for Phase 4 (UI). It demonstrates:
1. **NiiVue working in HF Spaces** - Proof that WebGL2 viewer works in production
2. **FastAPI + raw HTML approach** - Clean, no Gradio overhead for viewer
3. **Base64 data URLs for NIfTI** - `data:application/octet-stream;base64,{b64}`
4. **NiiVue CDN loading** - `https://unpkg.com/@niivue/niivue@0.57.0/dist/index.js`
5. **Multiplanar + 3D rendering** - `setSliceType(sliceTypeMultiplanar)` + `setMultiplanarLayout(2)`
Key file: `main.py` (~485 lines) - complete working implementation.
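The Base64 data URL pattern from item 3 can be reproduced on the Python side with the standard library. This is a minimal sketch using hypothetical function names, not an excerpt from the referenced `main.py`.

```python
import base64
from pathlib import Path


def nifti_bytes_to_data_url(payload: bytes) -> str:
    """Wrap raw NIfTI bytes in a data URL that can be embedded in HTML."""
    b64 = base64.b64encode(payload).decode("ascii")
    return f"data:application/octet-stream;base64,{b64}"


def nifti_file_to_data_url(path: Path) -> str:
    """Read a .nii or .nii.gz file and return it as a data URL."""
    return nifti_bytes_to_data_url(path.read_bytes())
```

Base64 encoding inflates the payload by about one third, so this approach suits demo-sized volumes better than large datasets.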
## sources
- [uv project configuration](https://docs.astral.sh/uv/concepts/projects/config/)
- [Python packaging guide - pyproject.toml](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/)
- [Real Python - Managing projects with uv](https://realpython.com/python-uv/)
- [Gradio 5 announcement](https://huggingface.co/blog/gradio-5)
- [NiiVue GitHub](https://github.com/niivue/niivue)
- [Gradio custom HTML components](https://www.gradio.app/guides/custom_HTML_components)