# context: stroke-deepisles-demo

> **Disclaimer**: This software is for research and demonstration purposes only. Not for clinical use.

## overview

This document explains **why** we're building `stroke-deepisles-demo` and the architectural context that informs our design decisions.

## the problem we're solving

We want to demonstrate an end-to-end neuroimaging inference pipeline:

```
CURRENT (Phase 1A):
    Local NIfTI files (extracted from ISLES24-MR-Lite ZIPs)
            ↓
        File-based loader (parse BIDS filenames)
            ↓
        DeepISLES Docker (stroke segmentation)
            ↓
        NiiVue visualization (Gradio Space)

FUTURE (Phase 1C-D):
    HuggingFace Hub (properly uploaded dataset)
            ↓
        Tobias's datasets fork (BIDS loader + Nifti feature)
            ↓
        DeepISLES Docker (stroke segmentation)
            ↓
        NiiVue visualization (Gradio Space)
```

This showcases that:
1. Neuroimaging data can be loaded from local BIDS-named files (NOW)
2. Neuroimaging data can be consumed from HF Hub with proper BIDS/NIfTI support (FUTURE)
3. Clinical-grade models can run via Docker as black boxes
4. Results can be visualized interactively in a browser

## critical discovery (2025-12-04)

**The original ISLES24-MR-Lite dataset is NOT properly uploaded to HuggingFace.**

The Hub repository holds only raw ZIP archives rather than a dataset in Parquet/Arrow format, so `load_dataset()` fails. See `data/discovery/isles24_schema_report.txt` for full details.

**Workaround**: We extracted the ZIPs locally to `data/isles24/` (git-ignored) and will implement a file-based loader first. Later, we'll re-upload properly and verify full HF consumption.
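The extraction step of this workaround could look roughly like the sketch below. It assumes the ZIP archives have already been downloaded into a local folder; the function name `extract_archives` and the folder arguments are hypothetical, not names from this repository.

```python
import zipfile
from pathlib import Path


def extract_archives(zip_dir: Path, out_dir: Path) -> list[Path]:
    """Extract every *.zip in zip_dir into out_dir and list the NIfTI files found.

    Assumption: the archives were downloaded manually beforehand; this
    sketch performs no network access.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
    # Return a sorted list so downstream code sees a stable order.
    return sorted(out_dir.rglob("*.nii.gz"))
```

Pointing `out_dir` at `data/isles24/` keeps the extracted files in the git-ignored location described above.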

## why we need tobias's datasets fork

As of December 2025, the official `huggingface/datasets` library has **partial** NIfTI support but lacks critical features for neuroimaging workflows.

### what's merged upstream

| PR | Author | Status | Description |
|----|--------|--------|-------------|
| [#7874](https://github.com/huggingface/datasets/pull/7874) | CloseChoice (Tobias) | Merged Nov 21 | NIfTI visualization support |
| [#7878](https://github.com/huggingface/datasets/pull/7878) | CloseChoice (Tobias) | Merged Nov 27 | Replace papaya with NiiVue |

### what's NOT merged (and why we need the fork)

| PR | Author | Status | Description |
|----|--------|--------|-------------|
| [#7886](https://github.com/huggingface/datasets/pull/7886) | The-Obstacle-Is-The-Way | Open | **BIDS dataset loader** - `load_dataset('bids', ...)` |
| [#7887](https://github.com/huggingface/datasets/pull/7887) | The-Obstacle-Is-The-Way | Open | **NIfTI lazy loading fix** - use `dataobj` not `get_fdata()` |
| [#7892](https://github.com/huggingface/datasets/pull/7892) | CloseChoice (Tobias) | Open | **NIfTI encoding for lazy upload** - fixes Arrow serialization |

The fork branch bundles all these features:
```
https://github.com/CloseChoice/datasets/tree/feat/bids-loader-streaming-upload-fix
```

We pin to this branch until upstream merges the PRs.

## key components

### 1. data source: ISLES24-MR-Lite

- **HF Dataset**: [YongchengYAO/ISLES24-MR-Lite](https://huggingface.co/datasets/YongchengYAO/ISLES24-MR-Lite) (**BROKEN** - raw ZIPs, not proper dataset)
- **Local extracted**: `data/isles24/` (git-ignored)
- **Content**: 149 acute stroke MRI cases with DWI, ADC, and manual infarct masks
- **Origin**: Subset of ISLES 2024 challenge data
- **Why suitable**: DeepISLES was trained on ISLES 2022, so ISLES24 is an **external** test set (no data leakage)

**File structure** (after extraction):
```
data/isles24/
├── Images-DWI/sub-stroke{XXXX}_ses-02_dwi.nii.gz        # 149 files
├── Images-ADC/sub-stroke{XXXX}_ses-02_adc.nii.gz        # 149 files
└── Masks/sub-stroke{XXXX}_ses-02_lesion-msk.nii.gz      # 149 files
```

**Schema reference**: `data/discovery/isles24_schema_report.txt`
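A file-based loader for the layout above might be sketched as follows. The regular expression is derived from the filename pattern shown in the tree; the names `Case` and `discover_cases` are hypothetical and only illustrate one way to group files per subject.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Matches names such as "sub-stroke0001_ses-02_dwi.nii.gz".
# Assumption: every file follows the pattern shown in the tree above.
_BIDS_NAME = re.compile(
    r"^sub-(?P<subject>stroke\d+)_ses-(?P<session>\d+)"
    r"_(?P<suffix>dwi|adc|lesion-msk)\.nii\.gz$"
)

# Maps the filename suffix to the attribute it fills on a Case.
_FIELD_FOR_SUFFIX = {"dwi": "dwi", "adc": "adc", "lesion-msk": "mask"}


@dataclass
class Case:
    """Paths belonging to one subject (hypothetical container type)."""

    subject: str
    dwi: Optional[Path] = None
    adc: Optional[Path] = None
    mask: Optional[Path] = None

    @property
    def complete(self) -> bool:
        return all(p is not None for p in (self.dwi, self.adc, self.mask))


def discover_cases(root: Path) -> dict[str, Case]:
    """Walk root and group DWI, ADC, and mask files by subject ID."""
    cases: dict[str, Case] = {}
    for path in sorted(root.rglob("*.nii.gz")):
        match = _BIDS_NAME.match(path.name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        case = cases.setdefault(match["subject"], Case(subject=match["subject"]))
        setattr(case, _FIELD_FOR_SUFFIX[match["suffix"]], path)
    return cases
```

Calling `discover_cases(Path("data/isles24"))` would return one `Case` per subject; checking `case.complete` before inference guards against subjects with a missing modality or mask.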

### 2. model: DeepISLES

- **Paper**: Nature Communications 2025 - "DeepISLES: A clinically validated ischemic stroke segmentation model"
- **GitHub**: [ezequieldlrosa/DeepIsles](https://github.com/ezequieldlrosa/DeepIsles)
- **Docker**: `isleschallenge/deepisles`
- **Inputs**: DWI + ADC (required), FLAIR (required for ensemble, optional for fast mode)
- **Output**: 3D binary lesion mask (NIfTI)
- **Mode**: `fast=True` runs **SEALS only** (the ISLES'22 challenge winner)

#### Why we use `fast=True` (SEALS-only mode)

DeepISLES is an ensemble of 3 models from the ISLES'22 challenge:

| Model | Based On | Inputs Required | Notes |
|-------|----------|-----------------|-------|
| **SEALS** | nnUNet | DWI + ADC | 🏆 **ISLES'22 Winner** - runs in `--fast` mode |
| NVAUTO | MONAI Auto3dseg | DWI + ADC + FLAIR | Requires FLAIR |
| SWAN | FACTORIZER | DWI + ADC + FLAIR | Requires FLAIR |

**Key insight**: ISLES24-MR-Lite contains only DWI + ADC (no FLAIR). Therefore:
- `--fast True` → Runs SEALS only → **Perfect match** for our dataset
- `--fast False` → Would try to run all 3 models → NVAUTO/SWAN would fail without FLAIR

This is **not a downgrade**. SEALS won the ISLES'22 challenge and is state-of-the-art for stroke lesion segmentation using DWI+ADC alone.
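The reasoning behind choosing fast mode can be expressed as a small lookup over the table above. This is only an illustrative sketch: `REQUIRED_MODALITIES`, `runnable_models`, and `needs_fast_mode` are hypothetical names, and the code does not call the DeepISLES CLI.

```python
# Input requirements per ensemble member, copied from the table above.
REQUIRED_MODALITIES = {
    "SEALS": frozenset({"dwi", "adc"}),
    "NVAUTO": frozenset({"dwi", "adc", "flair"}),
    "SWAN": frozenset({"dwi", "adc", "flair"}),
}


def runnable_models(available: set[str]) -> list[str]:
    """Return the ensemble members whose inputs are all available."""
    return [
        name
        for name, needed in REQUIRED_MODALITIES.items()
        if needed <= available  # subset test: every required modality is present
    ]


def needs_fast_mode(available: set[str]) -> bool:
    """True when only part of the ensemble can run, so fast mode is required."""
    runnable = runnable_models(available)
    if not runnable:
        raise ValueError("DWI and ADC are both required by every model")
    return len(runnable) < len(REQUIRED_MODALITIES)
```

For ISLES24-MR-Lite, `runnable_models({"dwi", "adc"})` yields only `SEALS`, which is why the pipeline passes `--fast True`.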

#### Scientific validity: External validation with zero data leakage

| Dataset | Year | Used For |
|---------|------|----------|
| **ISLES 2022** | 2022 | SEALS training data (250 cases) |
| **ISLES 2024** | 2024 | Our test data (149 cases from MR-Lite) |

- Different patient cohorts (2 years apart, different hospitals)
- SEALS has **never seen** ISLES24 patients
- We have ground truth masks → can validate predictions
- This constitutes a legitimate **external validation study**
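This document does not name a validation metric. As one common choice, the Dice coefficient between a predicted mask and the ground truth mask could be computed as sketched below. The function `dice_score` is hypothetical and works on flattened 0/1 voxel sequences; in practice these would come from the NIfTI volumes.

```python
from typing import Iterable


def dice_score(pred: Iterable[int], truth: Iterable[int]) -> float:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks of equal length."""
    pred_bits = [bool(v) for v in pred]
    truth_bits = [bool(v) for v in truth]
    if len(pred_bits) != len(truth_bits):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p and t for p, t in zip(pred_bits, truth_bits))
    total = sum(pred_bits) + sum(truth_bits)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total
```

How to score the case where both masks are empty is a convention; the sketch returns 1.0, but some evaluations exclude such cases instead.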

### 3. visualization: NiiVue

- **Library**: [niivue/niivue](https://github.com/niivue/niivue)
- **Type**: WebGL2-based neuroimaging viewer
- **Formats**: Native NIfTI support, overlays, multiplanar views
- **Integration**: Via Gradio custom HTML component or iframe

### 4. UI framework: Gradio 5

- **Version**: Gradio 5.x (latest as of Dec 2025)
- **Features**: SSR for fast loading, improved components, WebRTC support
- **Deployment**: Hugging Face Spaces

## architecture diagram

```
┌──────────────────────────────────────────────────────────────────┐
│                      stroke-deepisles-demo                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │  data/       │    │  inference/  │    │  ui/         │        │
│  │              │    │              │    │              │        │
│  │  - loader    │───▶│  - docker    │───▶│  - gradio    │        │
│  │  - adapter   │    │  - wrapper   │    │  - niivue    │        │
│  │  - staging   │    │  - pipeline  │    │  - viewer    │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│         │                   │                   │                │
│         ▼                   ▼                   ▼                │
│  ┌──────────────────────────────────────────────────────┐        │
│  │                    core/                             │        │
│  │  - config (pydantic-settings)                        │        │
│  │  - types (dataclasses, TypedDicts)                   │        │
│  │  - exceptions                                        │        │
│  └──────────────────────────────────────────────────────┘        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
   ┌──────────┐        ┌──────────┐         ┌──────────┐
   │ HF Hub   │        │ Docker   │         │ Browser  │
   │ datasets │        │ Engine   │         │ WebGL2   │
   └──────────┘        └──────────┘         └──────────┘
```

## design principles

1. **Vertical slices**: Each phase delivers runnable functionality
2. **TDD**: Tests written before implementation
3. **Type safety**: Full type hints, mypy/pyright strict mode
4. **Separation of concerns**: Data, inference, and UI are independent modules
5. **Docker as black box**: We don't reimplement DeepISLES; we call it
6. **Graceful degradation**: Mock Docker for tests, fallback viewers if NiiVue fails
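Principles 5 and 6 suggest hiding the container behind a small interface so that tests can substitute a mock. The sketch below shows one possible shape; `SegmentationRunner`, `MockRunner`, and `segment_case` are hypothetical names, and no real Docker call is made.

```python
from pathlib import Path
from typing import Protocol


class SegmentationRunner(Protocol):
    """Anything that turns DWI + ADC inputs into a lesion mask file."""

    def run(self, dwi: Path, adc: Path, out_dir: Path) -> Path: ...


class MockRunner:
    """Test double that writes a placeholder instead of starting a container."""

    def run(self, dwi: Path, adc: Path, out_dir: Path) -> Path:
        out_dir.mkdir(parents=True, exist_ok=True)
        mask = out_dir / "lesion_msk.nii.gz"  # hypothetical output name
        mask.write_bytes(b"")  # empty placeholder, not a valid NIfTI file
        return mask


def segment_case(
    runner: SegmentationRunner, dwi: Path, adc: Path, out_dir: Path
) -> Path:
    """Validate inputs, then delegate to whichever runner was injected."""
    for required in (dwi, adc):
        if not required.exists():
            raise FileNotFoundError(required)
    return runner.run(dwi, adc, out_dir)
```

A Docker-backed runner would implement the same `run` method, so the pipeline and UI code would not need to know which one they received.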

## reference repositories

These are cloned locally (without git linkages) for reference:

| Directory | Source | Purpose |
|-----------|--------|---------|
| `_reference_repos/datasets-tobias-bids-fork/` | CloseChoice/datasets@feat/bids-loader-streaming-upload-fix | BIDS loader + NIfTI lazy loading |
| `_reference_repos/arc-aphasia-bids/` | The-Obstacle-Is-The-Way/arc-aphasia-bids | BIDS upload patterns (reference only) |
| `_reference_repos/DeepIsles/` | ezequieldlrosa/DeepIsles | DeepISLES CLI interface reference |
| `_reference_repos/bids-neuroimaging-space/` | [TobiasPitters/bids-neuroimaging](https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging) | **Working NiiVue + FastAPI implementation** |

### key reference: tobias's bids-neuroimaging space

This is the most important reference for Phase 4 (UI). It demonstrates:

1. **NiiVue working in HF Spaces** - Proof that WebGL2 viewer works in production
2. **FastAPI + raw HTML approach** - Clean, no Gradio overhead for viewer
3. **Base64 data URLs for NIfTI** - `data:application/octet-stream;base64,{b64}`
4. **NiiVue CDN loading** - `https://unpkg.com/@niivue/niivue@0.57.0/dist/index.js`
5. **Multiplanar + 3D rendering** - `setSliceType(sliceTypeMultiplanar)` + `setMultiplanarLayout(2)`

Key file: `main.py` (~485 lines) - complete working implementation.
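The Base64 data URL format from item 3 can be produced with the standard library alone. This is a minimal sketch, not code taken from the Space's `main.py`; the helper name `nifti_data_url` is hypothetical.

```python
import base64
from pathlib import Path


def nifti_data_url(path: Path) -> str:
    """Encode a file as a data URL in the format listed in item 3 above."""
    b64 = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:application/octet-stream;base64,{b64}"
```

Base64 inflates the payload by roughly a third, so this approach suits single volumes embedded in a page better than large batches.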

## sources

- [uv project configuration](https://docs.astral.sh/uv/concepts/projects/config/)
- [Python packaging guide - pyproject.toml](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/)
- [Real Python - Managing projects with uv](https://realpython.com/python-uv/)
- [Gradio 5 announcement](https://huggingface.co/blog/gradio-5)
- [NiiVue GitHub](https://github.com/niivue/niivue)
- [Gradio custom HTML components](https://www.gradio.app/guides/custom_HTML_components)