VibecoderMcSwaggins committed on
Commit 8e0cd11 · 1 Parent(s): d88c04d

Add initial project structure and documentation for stroke-deepisles-demo

docs/specs/00-context.md ADDED
# context: stroke-deepisles-demo

> **Disclaimer**: This software is for research and demonstration purposes only. Not for clinical use.

## overview

This document explains **why** we're building `stroke-deepisles-demo` and the architectural context that informs our design decisions.

## the problem we're solving

We want to demonstrate an end-to-end neuroimaging inference pipeline:

```
HuggingFace Hub (ISLES24-MR-Lite)
        ↓
BIDS/NIfTI loader (datasets fork)
        ↓
DeepISLES Docker (stroke segmentation)
        ↓
NiiVue visualization (Gradio Space)
```

This showcases that:

1. Neuroimaging data can be consumed from HF Hub with proper BIDS/NIfTI support
2. Clinical-grade models can run via Docker as black boxes
3. Results can be visualized interactively in a browser

## why we need tobias's datasets fork

As of December 2025, the official `huggingface/datasets` library has **partial** NIfTI support but lacks critical features for neuroimaging workflows.

### what's merged upstream

| PR | Author | Status | Description |
|----|--------|--------|-------------|
| [#7874](https://github.com/huggingface/datasets/pull/7874) | CloseChoice (Tobias) | Merged Nov 21 | NIfTI visualization support |
| [#7878](https://github.com/huggingface/datasets/pull/7878) | CloseChoice (Tobias) | Merged Nov 27 | Replace papaya with NiiVue |

### what's NOT merged (and why we need the fork)

| PR | Author | Status | Description |
|----|--------|--------|-------------|
| [#7886](https://github.com/huggingface/datasets/pull/7886) | The-Obstacle-Is-The-Way | Open | **BIDS dataset loader** - `load_dataset('bids', ...)` |
| [#7887](https://github.com/huggingface/datasets/pull/7887) | The-Obstacle-Is-The-Way | Open | **NIfTI lazy loading fix** - use `dataobj` not `get_fdata()` |
| [#7892](https://github.com/huggingface/datasets/pull/7892) | CloseChoice (Tobias) | Open | **NIfTI encoding for lazy upload** - fixes Arrow serialization |

The fork branch bundles all these features:

```
https://github.com/CloseChoice/datasets/tree/feat/bids-loader-streaming-upload-fix
```

We pin to this branch until upstream merges the PRs.

## key components

### 1. data source: ISLES24-MR-Lite

- **HF Dataset**: [YongchengYAO/ISLES24-MR-Lite](https://huggingface.co/datasets/YongchengYAO/ISLES24-MR-Lite)
- **Content**: 149 acute stroke MRI cases with DWI, ADC, and manual infarct masks
- **Origin**: Subset of ISLES 2024 challenge data
- **Why suitable**: DeepISLES was trained on ISLES 2022, so ISLES24 is an **external** test set (no data leakage)

### 2. model: DeepISLES

- **Paper**: Nature Communications 2025 - "DeepISLES: A clinically validated ischemic stroke segmentation model"
- **GitHub**: [ezequieldlrosa/DeepIsles](https://github.com/ezequieldlrosa/DeepIsles)
- **Docker**: `isleschallenge/deepisles`
- **Inputs**: DWI + ADC (required), FLAIR (optional)
- **Output**: 3D binary lesion mask (NIfTI)
- **Mode**: We use `fast=True` (single model) rather than the full 3-model ensemble

### 3. visualization: NiiVue

- **Library**: [niivue/niivue](https://github.com/niivue/niivue)
- **Type**: WebGL2-based neuroimaging viewer
- **Formats**: Native NIfTI support, overlays, multiplanar views
- **Integration**: Via Gradio custom HTML component or iframe

### 4. UI framework: Gradio 5

- **Version**: Gradio 5.x (latest as of Dec 2025)
- **Features**: SSR for fast loading, improved components, WebRTC support
- **Deployment**: Hugging Face Spaces

## architecture diagram

```
┌──────────────────────────────────────────────────────────────────┐
│                      stroke-deepisles-demo                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐   │
│   │    data/     │      │  inference/  │      │     ui/      │   │
│   │              │      │              │      │              │   │
│   │  - loader    │─────▶│  - docker    │─────▶│  - gradio    │   │
│   │  - adapter   │      │  - wrapper   │      │  - niivue    │   │
│   │  - staging   │      │  - pipeline  │      │  - viewer    │   │
│   └──────────────┘      └──────────────┘      └──────────────┘   │
│          │                     │                     │           │
│          ▼                     ▼                     ▼           │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │                          core/                           │   │
│   │   - config (pydantic-settings)                           │   │
│   │   - types (dataclasses, TypedDicts)                      │   │
│   │   - exceptions                                           │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
        │                       │                       │
        ▼                       ▼                       ▼
  ┌──────────┐            ┌──────────┐            ┌──────────┐
  │  HF Hub  │            │  Docker  │            │ Browser  │
  │ datasets │            │  Engine  │            │  WebGL2  │
  └──────────┘            └──────────┘            └──────────┘
```

## design principles

1. **Vertical slices**: Each phase delivers runnable functionality
2. **TDD**: Tests written before implementation
3. **Type safety**: Full type hints, mypy/pyright strict mode
4. **Separation of concerns**: Data, inference, and UI are independent modules
5. **Docker as black box**: We don't reimplement DeepISLES, we call it
6. **Graceful degradation**: Mock Docker for tests, fallback viewers if NiiVue fails

## reference repositories

These are cloned locally (without git linkages) for reference:

| Directory | Source | Purpose |
|-----------|--------|---------|
| `_reference_repos/datasets-tobias-bids-fork/` | CloseChoice/datasets@feat/bids-loader-streaming-upload-fix | BIDS loader + NIfTI lazy loading |
| `_reference_repos/arc-aphasia-bids/` | The-Obstacle-Is-The-Way/arc-aphasia-bids | BIDS upload patterns (reference only) |
| `_reference_repos/DeepIsles/` | ezequieldlrosa/DeepIsles | DeepISLES CLI interface reference |
| `_reference_repos/bids-neuroimaging-space/` | [TobiasPitters/bids-neuroimaging](https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging) | **Working NiiVue + FastAPI implementation** |

### key reference: tobias's bids-neuroimaging space

This is the most important reference for Phase 4 (UI). It demonstrates:

1. **NiiVue working in HF Spaces** - Proof that the WebGL2 viewer works in production
2. **FastAPI + raw HTML approach** - Clean, no Gradio overhead for the viewer
3. **Base64 data URLs for NIfTI** - `data:application/octet-stream;base64,{b64}`
4. **NiiVue CDN loading** - `https://unpkg.com/@niivue/niivue@0.57.0/dist/index.js`
5. **Multiplanar + 3D rendering** - `setSliceType(sliceTypeMultiplanar)` + `setMultiplanarLayout(2)`

Key file: `main.py` (~485 lines) - complete working implementation.

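The data-URL pattern from point 3 is small enough to sketch here. A minimal, standalone version (the function name is ours, not taken from Tobias's Space; it works for any file, NIfTI or otherwise):

```python
import base64
from pathlib import Path


def nifti_to_data_url(path: Path) -> str:
    """Encode a .nii.gz file as a base64 data URL that NiiVue can load."""
    b64 = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:application/octet-stream;base64,{b64}"
```

On the browser side, the resulting string is passed as a volume `url`; NiiVue infers the format from the accompanying file name (e.g. `dwi.nii.gz`), so no static file serving is needed.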
## sources

- [uv project configuration](https://docs.astral.sh/uv/concepts/projects/config/)
- [Python packaging guide - pyproject.toml](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/)
- [Real Python - Managing projects with uv](https://realpython.com/python-uv/)
- [Gradio 5 announcement](https://huggingface.co/blog/gradio-5)
- [NiiVue GitHub](https://github.com/niivue/niivue)
- [Gradio custom HTML components](https://www.gradio.app/guides/custom_HTML_components)
docs/specs/01-phase-0-repo-bootstrap.md ADDED
# phase 0: repo bootstrap

## purpose

Set up the foundational project structure with 2025 Python best practices. At the end of this phase, we have a working skeleton that can be installed, linted, type-checked, and tested (even if the tests are empty).

## deliverables

- [ ] `pyproject.toml` with uv + hatchling backend
- [ ] `src/stroke_deepisles_demo/` package structure
- [ ] `tests/` directory with pytest configuration
- [ ] Development tooling: ruff, mypy, pre-commit
- [ ] Basic `README.md` with clinical disclaimer
- [ ] `.gitignore` updates if needed

+ ## repo structure
17
+
18
+ ```
19
+ stroke-deepisles-demo/
20
+ β”œβ”€β”€ pyproject.toml # Project metadata, deps, tool config
21
+ β”œβ”€β”€ uv.lock # Locked dependencies (auto-generated)
22
+ β”œβ”€β”€ .python-version # Python version (3.12)
23
+ β”œβ”€β”€ README.md # Project overview + disclaimer
24
+ β”œβ”€β”€ .gitignore # Standard Python ignores
25
+ β”œβ”€β”€ .pre-commit-config.yaml # Pre-commit hooks
26
+ β”‚
27
+ β”œβ”€β”€ src/
28
+ β”‚ └── stroke_deepisles_demo/
29
+ β”‚ β”œβ”€β”€ __init__.py # Package version, exports
30
+ β”‚ β”œβ”€β”€ py.typed # PEP 561 marker
31
+ β”‚ β”‚
32
+ β”‚ β”œβ”€β”€ core/ # Shared utilities
33
+ β”‚ β”‚ β”œβ”€β”€ __init__.py
34
+ β”‚ β”‚ β”œβ”€β”€ config.py # Pydantic settings (stub)
35
+ β”‚ β”‚ β”œβ”€β”€ types.py # Shared type definitions (stub)
36
+ β”‚ β”‚ └── exceptions.py # Custom exceptions (stub)
37
+ β”‚ β”‚
38
+ β”‚ β”œβ”€β”€ data/ # Data loading (stub)
39
+ β”‚ β”‚ └── __init__.py
40
+ β”‚ β”‚
41
+ β”‚ β”œβ”€β”€ inference/ # DeepISLES integration (stub)
42
+ β”‚ β”‚ └── __init__.py
43
+ β”‚ β”‚
44
+ β”‚ └── ui/ # Gradio app (stub)
45
+ β”‚ └── __init__.py
46
+ β”‚
47
+ β”œβ”€β”€ tests/
48
+ β”‚ β”œβ”€β”€ __init__.py
49
+ β”‚ β”œβ”€β”€ conftest.py # Shared fixtures
50
+ β”‚ └── test_package.py # Smoke test: package imports
51
+ β”‚
52
+ └── docs/
53
+ └── specs/ # These spec documents
54
+ β”œβ”€β”€ 00-context.md
55
+ β”œβ”€β”€ 01-phase-0-repo-bootstrap.md
56
+ └── ...
57
+ ```
58
+
## pyproject.toml specification

```toml
[project]
name = "stroke-deepisles-demo"
version = "0.1.0"
description = "Demo: HF datasets + DeepISLES stroke segmentation + Gradio visualization"
readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.11"
authors = [
    { name = "Your Name", email = "you@example.com" }
]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Scientific/Engineering :: Medical Science Apps.",
]
keywords = ["stroke", "neuroimaging", "segmentation", "BIDS", "NIfTI", "deep-learning"]

dependencies = [
    # Core - pinned to Tobias's fork for BIDS + NIfTI lazy loading
    "datasets @ git+https://github.com/CloseChoice/datasets.git@feat/bids-loader-streaming-upload-fix",
    "huggingface-hub>=0.25.0",

    # NIfTI handling
    "nibabel>=5.2.0",
    "numpy>=1.26.0",

    # Configuration
    "pydantic>=2.5.0",
    "pydantic-settings>=2.1.0",

    # UI (Gradio 5.x)
    "gradio>=5.0.0",
]

[dependency-groups]
dev = [
    "pytest>=8.0.0",
    "pytest-cov>=4.1.0",
    "pytest-mock>=3.12.0",
    "mypy>=1.8.0",
    "ruff>=0.8.0",
    "pre-commit>=3.6.0",
    # Type stubs
    "types-requests",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/stroke_deepisles_demo"]

# ─────────────────────────────────────────────────────────────────
# Tool configurations
# ─────────────────────────────────────────────────────────────────

[tool.ruff]
target-version = "py311"
line-length = 100
src = ["src", "tests"]

[tool.ruff.lint]
select = [
    "E",    # pycodestyle errors
    "W",    # pycodestyle warnings
    "F",    # pyflakes
    "I",    # isort
    "B",    # flake8-bugbear
    "C4",   # flake8-comprehensions
    "UP",   # pyupgrade
    "ARG",  # flake8-unused-arguments
    "SIM",  # flake8-simplify
    "TCH",  # flake8-type-checking
    "PTH",  # flake8-use-pathlib
    "RUF",  # ruff-specific
]
ignore = [
    "E501",  # line too long (handled by formatter)
]

[tool.ruff.lint.isort]
known-first-party = ["stroke_deepisles_demo"]

[tool.mypy]
python_version = "3.11"
strict = true
warn_return_any = true
warn_unused_ignores = true
disallow_untyped_defs = true
plugins = ["pydantic.mypy"]

[[tool.mypy.overrides]]
module = [
    "nibabel.*",
    "gradio.*",
    "datasets.*",
    "niivue.*",
]
ignore_missing_imports = true

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
addopts = [
    "-v",
    "--tb=short",
    "--strict-markers",
]
markers = [
    "integration: marks tests requiring external resources (Docker, network)",
    "slow: marks tests that take >10s to run",
]
filterwarnings = [
    "ignore::DeprecationWarning",
]

[tool.coverage.run]
source = ["src/stroke_deepisles_demo"]
branch = true

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
]
```

## module stubs

### `src/stroke_deepisles_demo/__init__.py`

```python
"""stroke-deepisles-demo: HF datasets + DeepISLES + Gradio visualization."""

__version__ = "0.1.0"

__all__ = ["__version__"]
```

### `src/stroke_deepisles_demo/core/config.py`

```python
"""Application configuration using pydantic-settings."""

from __future__ import annotations

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings loaded from environment variables."""

    model_config = SettingsConfigDict(env_prefix="STROKE_DEMO_", env_file=".env")

    # HuggingFace
    hf_dataset_id: str = "YongchengYAO/ISLES24-MR-Lite"
    hf_cache_dir: str | None = None

    # DeepISLES
    deepisles_docker_image: str = "isleschallenge/deepisles"
    deepisles_fast_mode: bool = True

    # Paths
    temp_dir: str | None = None


settings = Settings()
```

### `src/stroke_deepisles_demo/core/types.py`

```python
"""Shared type definitions."""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import TypedDict


class CaseFiles(TypedDict):
    """Paths to NIfTI files for a single case."""

    dwi: Path
    adc: Path
    flair: Path | None
    ground_truth: Path | None


@dataclass(frozen=True)
class InferenceResult:
    """Result of running DeepISLES on a case."""

    case_id: str
    input_files: CaseFiles
    prediction_mask: Path
    elapsed_seconds: float
```

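A quick illustration of how these two types compose at runtime. This is a standalone sketch that re-declares both types inline so it runs without the package; the paths and values are made up:

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path
from typing import TypedDict


class CaseFiles(TypedDict):
    dwi: Path
    adc: Path
    flair: Path | None
    ground_truth: Path | None


@dataclass(frozen=True)
class InferenceResult:
    case_id: str
    input_files: CaseFiles
    prediction_mask: Path
    elapsed_seconds: float


# A TypedDict is a plain dict at runtime; the keys are only checked statically.
case: CaseFiles = {
    "dwi": Path("sub-001/dwi.nii.gz"),
    "adc": Path("sub-001/adc.nii.gz"),
    "flair": None,
    "ground_truth": None,
}

result = InferenceResult(
    case_id="sub-001",
    input_files=case,
    prediction_mask=Path("out/lesion_msk.nii.gz"),
    elapsed_seconds=42.0,
)

# frozen=True makes results immutable: assignment raises FrozenInstanceError.
try:
    result.case_id = "sub-002"  # type: ignore[misc]
except FrozenInstanceError:
    pass
```

`frozen=True` is what lets results be passed between the data, inference, and UI layers without defensive copying.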
### `src/stroke_deepisles_demo/core/exceptions.py`

```python
"""Custom exceptions for stroke-deepisles-demo."""

from __future__ import annotations


class StrokeDemoError(Exception):
    """Base exception for stroke-deepisles-demo."""


class DataLoadError(StrokeDemoError):
    """Failed to load data from HuggingFace Hub."""


class DockerNotAvailableError(StrokeDemoError):
    """Docker is not installed or not running."""


class DeepISLESError(StrokeDemoError):
    """DeepISLES inference failed."""


class MissingInputError(StrokeDemoError):
    """Required input files are missing."""
```

## pre-commit configuration

### `.pre-commit-config.yaml`

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        additional_dependencies:
          - pydantic>=2.5.0
          - pydantic-settings>=2.1.0
        args: [--config-file=pyproject.toml]

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
        args: [--maxkb=1000]
```

## tdd plan

### tests to write first

1. **`tests/test_package.py`** - Smoke test that package imports work

```python
"""Smoke tests for package structure."""

from __future__ import annotations


def test_package_imports() -> None:
    """Verify the package can be imported."""
    import stroke_deepisles_demo

    assert stroke_deepisles_demo.__version__ == "0.1.0"


def test_core_modules_import() -> None:
    """Verify core modules can be imported without side effects."""
    from stroke_deepisles_demo.core import config, exceptions, types

    assert config.settings is not None
    assert types.CaseFiles is not None
    assert exceptions.StrokeDemoError is not None


def test_subpackages_exist() -> None:
    """Verify subpackage structure exists."""
    from stroke_deepisles_demo import data, inference, ui

    # These are stubs, just verify they exist
    assert data is not None
    assert inference is not None
    assert ui is not None
```

### what to mock

- Nothing needed for Phase 0 - these are pure import tests

### what to test for real

- Package imports
- Module structure
- Type definitions load correctly
- Pydantic settings initialize with defaults

## "done" criteria

Phase 0 is complete when:

1. `uv sync` succeeds and creates the virtual environment
2. `uv run pytest` passes all smoke tests
3. `uv run ruff check .` reports no errors
4. `uv run ruff format --check .` reports no changes needed
5. `uv run mypy src/` passes with no errors
6. `uv run pre-commit run --all-files` passes
7. The package can be imported: `uv run python -c "import stroke_deepisles_demo"`

## commands cheatsheet

```bash
# Initialize (if starting fresh)
uv init --package stroke-deepisles-demo

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov

# Lint
uv run ruff check .

# Format
uv run ruff format .

# Type check
uv run mypy src/

# Install pre-commit hooks
uv run pre-commit install

# Run all pre-commit hooks
uv run pre-commit run --all-files
```

## notes

- We use `hatchling` as the build backend (current uv default, stable)
- `uv_build` is newer, but `hatchling` is battle-tested
- The `datasets` dependency is pinned to Tobias's fork via a git URL
- Gradio 5.x for the latest features (SSR, improved components)
- Python 3.11+ for modern typing features (`X | None` syntax)
docs/specs/02-phase-1-data-access.md ADDED
# phase 1: data access / hf integration

## purpose

Implement the data loading layer that consumes ISLES24-MR-Lite from HuggingFace Hub. At the end of this phase, we can load any case by ID and get local paths to DWI, ADC, and ground truth NIfTI files.

## deliverables

- [ ] `src/stroke_deepisles_demo/data/loader.py` - HF dataset loading
- [ ] `src/stroke_deepisles_demo/data/adapter.py` - Case adapter for file access
- [ ] `src/stroke_deepisles_demo/data/staging.py` - Stage files for DeepISLES
- [ ] Unit tests with fixtures (no network required)
- [ ] Integration test (marked, requires network)

## vertical slice outcome

After this phase, you can run:

```python
from stroke_deepisles_demo.data import get_case, list_case_ids

# List available cases
case_ids = list_case_ids()
print(f"Found {len(case_ids)} cases")

# Load a specific case (CaseFiles is a TypedDict, so access is by key)
case = get_case("sub-001")
print(f"DWI: {case['dwi']}")
print(f"ADC: {case['adc']}")
print(f"Ground truth: {case['ground_truth']}")
```

## module structure

```
src/stroke_deepisles_demo/data/
├── __init__.py     # Public API exports
├── loader.py       # HF Hub dataset loading
├── adapter.py      # Case adapter (index → files)
└── staging.py      # Stage files with DeepISLES naming
```

## interfaces and types

### `data/loader.py`

```python
"""Load ISLES24-MR-Lite dataset from HuggingFace Hub."""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from datasets import Dataset


@dataclass
class DatasetInfo:
    """Metadata about the loaded dataset."""

    dataset_id: str
    num_cases: int
    modalities: list[str]  # e.g., ["dwi", "adc", "mask"]
    has_ground_truth: bool


def load_isles_dataset(
    dataset_id: str = "YongchengYAO/ISLES24-MR-Lite",
    *,
    cache_dir: Path | None = None,
    streaming: bool = False,
) -> Dataset:
    """
    Load the ISLES24-MR-Lite dataset from HuggingFace Hub.

    Args:
        dataset_id: HuggingFace dataset identifier
        cache_dir: Local cache directory (uses HF default if None)
        streaming: If True, use streaming mode (lazy loading)

    Returns:
        HuggingFace Dataset object with BIDS/NIfTI support

    Raises:
        DataLoadError: If the dataset cannot be loaded
    """
    ...


def get_dataset_info(dataset_id: str = "YongchengYAO/ISLES24-MR-Lite") -> DatasetInfo:
    """
    Get metadata about the dataset without downloading.

    Returns:
        DatasetInfo with case count, available modalities, etc.
    """
    ...
```

### `data/adapter.py`

```python
"""Adapt HF dataset rows to typed file references."""

from __future__ import annotations

from typing import TYPE_CHECKING

from stroke_deepisles_demo.core.types import CaseFiles

if TYPE_CHECKING:
    from collections.abc import Iterator

    from datasets import Dataset


class CaseAdapter:
    """
    Adapts a HuggingFace dataset to provide typed access to case files.

    This handles the mapping between the HF dataset structure and our
    internal CaseFiles type.
    """

    def __init__(self, dataset: Dataset) -> None:
        """
        Initialize the adapter with a loaded dataset.

        Args:
            dataset: HuggingFace Dataset with NIfTI files
        """
        ...

    def __len__(self) -> int:
        """Return the number of cases in the dataset."""
        ...

    def __iter__(self) -> Iterator[str]:
        """Iterate over case IDs."""
        ...

    def list_case_ids(self) -> list[str]:
        """
        List all available case identifiers.

        Returns:
            List of case IDs (e.g., ["sub-001", "sub-002", ...])
        """
        ...

    def get_case(self, case_id: str | int) -> CaseFiles:
        """
        Get file paths for a specific case.

        Args:
            case_id: Either a string ID (e.g., "sub-001") or an integer index

        Returns:
            CaseFiles with paths to DWI, ADC, and optionally ground truth

        Raises:
            KeyError: If case_id is not found
            DataLoadError: If files cannot be accessed
        """
        ...

    def get_case_by_index(self, index: int) -> tuple[str, CaseFiles]:
        """
        Get a case by numerical index.

        Returns:
            Tuple of (case_id, CaseFiles)
        """
        ...
```

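The ID ↔ index bookkeeping the adapter needs can be sketched with plain dicts. This standalone toy (the row shape is invented for illustration, not the real ISLES24 schema) shows the lookup logic `get_case` would implement:

```python
from __future__ import annotations


class ToyCaseAdapter:
    """Minimal stand-in for CaseAdapter's ID/index bookkeeping."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self._rows = rows
        # Build case_id -> row index once, so string lookups are O(1).
        self._index = {row["case_id"]: i for i, row in enumerate(rows)}

    def __len__(self) -> int:
        return len(self._rows)

    def list_case_ids(self) -> list[str]:
        return list(self._index)

    def get_case(self, case_id: str | int) -> dict[str, str]:
        # Accept either a string ID or a positional index, as the spec above allows.
        if isinstance(case_id, int):
            return self._rows[case_id]
        return self._rows[self._index[case_id]]  # KeyError if the ID is unknown


adapter = ToyCaseAdapter(
    [
        {"case_id": "sub-001", "dwi": "sub-001/dwi.nii.gz"},
        {"case_id": "sub-002", "dwi": "sub-002/dwi.nii.gz"},
    ]
)
```

The real adapter would additionally resolve each row's NIfTI references into local `Path`s before returning a `CaseFiles`.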
### `data/staging.py`

```python
"""Stage NIfTI files with DeepISLES-expected naming."""

from __future__ import annotations

from pathlib import Path
from typing import NamedTuple

from stroke_deepisles_demo.core.types import CaseFiles


class StagedCase(NamedTuple):
    """Paths to staged files ready for DeepISLES."""

    input_dir: Path          # Directory containing staged files
    dwi_path: Path           # Path to dwi.nii.gz
    adc_path: Path           # Path to adc.nii.gz
    flair_path: Path | None  # Path to flair.nii.gz if available


def stage_case_for_deepisles(
    case_files: CaseFiles,
    output_dir: Path,
    *,
    case_id: str | None = None,
) -> StagedCase:
    """
    Stage case files with the DeepISLES-expected naming convention.

    DeepISLES expects files named exactly:
    - dwi.nii.gz
    - adc.nii.gz
    - flair.nii.gz (optional)

    This function copies/symlinks the source files to a staging directory
    with the correct names.

    Args:
        case_files: Source file paths from CaseAdapter
        output_dir: Directory to stage files into
        case_id: Optional case ID for logging/subdirectory

    Returns:
        StagedCase with paths to staged files

    Raises:
        MissingInputError: If required files (DWI, ADC) are missing
        OSError: If file operations fail
    """
    ...


def create_staging_directory(base_dir: Path | None = None) -> Path:
    """
    Create a temporary staging directory.

    Args:
        base_dir: Parent directory (uses the system temp dir if None)

    Returns:
        Path to the created staging directory
    """
    ...
```

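The copy-and-rename core of `stage_case_for_deepisles` can be sketched with the stdlib alone. A minimal version, assuming plain copies rather than symlinks (and with `FileNotFoundError` standing in for `MissingInputError` so the snippet runs standalone):

```python
from __future__ import annotations

import shutil
from pathlib import Path


def stage_files(sources: dict[str, Path | None], output_dir: Path) -> dict[str, Path]:
    """Copy case files into output_dir under DeepISLES's fixed names."""
    output_dir.mkdir(parents=True, exist_ok=True)
    staged: dict[str, Path] = {}
    for name in ("dwi", "adc", "flair"):
        src = sources.get(name)
        if src is None:
            if name == "flair":  # FLAIR is optional; DWI and ADC are not
                continue
            raise FileNotFoundError(f"required input missing: {name}")
        dst = output_dir / f"{name}.nii.gz"  # the exact name DeepISLES expects
        shutil.copy2(src, dst)
        staged[name] = dst
    return staged
```

Copying (rather than symlinking) keeps the staging directory safe to bind-mount into the DeepISLES container, since symlink targets outside the mount would not resolve inside it.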
### `data/__init__.py` (public API)

```python
"""Data loading and case management for stroke-deepisles-demo."""

from stroke_deepisles_demo.core.types import CaseFiles
from stroke_deepisles_demo.data.adapter import CaseAdapter
from stroke_deepisles_demo.data.loader import DatasetInfo, get_dataset_info, load_isles_dataset
from stroke_deepisles_demo.data.staging import StagedCase, stage_case_for_deepisles

__all__ = [
    # Loader
    "load_isles_dataset",
    "get_dataset_info",
    "DatasetInfo",
    # Adapter
    "CaseAdapter",
    # Staging
    "stage_case_for_deepisles",
    "StagedCase",
    # Convenience
    "get_case",
    "list_case_ids",
]


# Convenience functions (combine loader + adapter)
def get_case(case_id: str | int) -> CaseFiles:
    """Load a single case by ID or index."""
    ...


def list_case_ids() -> list[str]:
    """List all available case IDs."""
    ...
```

## tdd plan

### test file structure

```
tests/
├── conftest.py             # Shared fixtures
├── data/
│   ├── __init__.py
│   ├── test_loader.py      # Tests for HF loading
│   ├── test_adapter.py     # Tests for case adapter
│   └── test_staging.py     # Tests for file staging
└── fixtures/
    └── nifti/              # Minimal synthetic NIfTI files
        ├── dwi.nii.gz
        ├── adc.nii.gz
        └── mask.nii.gz
```

294
+ ### tests to write first (TDD order)
295
+
296
+ #### 1. `tests/conftest.py` - Fixtures
297
+
298
+ ```python
299
+ """Shared test fixtures."""
300
+
301
+ from __future__ import annotations
302
+
303
+ import tempfile
304
+ from pathlib import Path
305
+
306
+ import nibabel as nib
307
+ import numpy as np
308
+ import pytest
309
+
310
+
311
+ @pytest.fixture
312
+ def temp_dir() -> Path:
313
+ """Create a temporary directory for test outputs."""
314
+ with tempfile.TemporaryDirectory() as td:
315
+ yield Path(td)
316
+
317
+
318
+ @pytest.fixture
319
+ def synthetic_nifti_3d(temp_dir: Path) -> Path:
320
+ """Create a minimal synthetic 3D NIfTI file."""
321
+ data = np.random.rand(10, 10, 10).astype(np.float32)
322
+ img = nib.Nifti1Image(data, affine=np.eye(4))
323
+ path = temp_dir / "synthetic.nii.gz"
324
+ nib.save(img, path)
325
+ return path
326
+
327
+
328
+ @pytest.fixture
329
+ def synthetic_case_files(temp_dir: Path) -> CaseFiles:
330
+ """Create a complete set of synthetic case files."""
331
+ # Create DWI
332
+ dwi_data = np.random.rand(64, 64, 30).astype(np.float32)
333
+ dwi_img = nib.Nifti1Image(dwi_data, affine=np.eye(4))
334
+ dwi_path = temp_dir / "dwi.nii.gz"
335
+ nib.save(dwi_img, dwi_path)
336
+
337
+ # Create ADC
338
+ adc_data = np.random.rand(64, 64, 30).astype(np.float32) * 2000
339
+ adc_img = nib.Nifti1Image(adc_data, affine=np.eye(4))
340
+ adc_path = temp_dir / "adc.nii.gz"
341
+ nib.save(adc_img, adc_path)
342
+
343
+ # Create mask
344
+ mask_data = (np.random.rand(64, 64, 30) > 0.9).astype(np.uint8)
345
+ mask_img = nib.Nifti1Image(mask_data, affine=np.eye(4))
346
+ mask_path = temp_dir / "mask.nii.gz"
347
+ nib.save(mask_img, mask_path)
348
+
349
+ return CaseFiles(
350
+ dwi=dwi_path,
351
+ adc=adc_path,
352
+ flair=None,
353
+ ground_truth=mask_path,
354
+ )
355
+
356
+
357
+ @pytest.fixture
358
+ def mock_hf_dataset(synthetic_case_files: CaseFiles):
359
+ """Create a mock HF Dataset-like object."""
360
+ # Returns a simple dict-based mock that mimics Dataset behavior
361
+ ...
362
+ ```
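The `mock_hf_dataset` fixture is left as a stub above; one way to flesh it out is a small dict-backed fake. This is a sketch under assumptions — the real `CaseAdapter` may need more of the `datasets.Dataset` surface, and `FakeDataset` with its `case_id`/`dwi`/`adc` rows is a hypothetical shape, not part of the spec:

```python
"""Minimal dict-backed stand-in for a HF Dataset (sketch, not the real API)."""


class FakeDataset:
    """Mimics the small slice of datasets.Dataset the adapter is expected to touch."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> dict:
        return self._rows[index]

    @property
    def column_names(self) -> list[str]:
        # Union of keys across rows, sorted for determinism
        return sorted({key for row in self._rows for key in row})


rows = [
    {"case_id": "sub-001", "dwi": "dwi.nii.gz", "adc": "adc.nii.gz"},
    {"case_id": "sub-002", "dwi": "dwi.nii.gz", "adc": "adc.nii.gz"},
]
ds = FakeDataset(rows)
```

The fixture could then return `FakeDataset` built from `synthetic_case_files` paths instead of string names.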
+ 
+ #### 2. `tests/data/test_staging.py` - Start with staging (no network)
+ 
+ ```python
+ """Tests for data staging module."""
+ 
+ from __future__ import annotations
+ 
+ from pathlib import Path
+ 
+ import pytest
+ 
+ from stroke_deepisles_demo.core.exceptions import MissingInputError
+ from stroke_deepisles_demo.core.types import CaseFiles
+ from stroke_deepisles_demo.data.staging import (
+     StagedCase,
+     create_staging_directory,
+     stage_case_for_deepisles,
+ )
+ 
+ 
+ class TestCreateStagingDirectory:
+     """Tests for create_staging_directory."""
+ 
+     def test_creates_directory(self, temp_dir: Path) -> None:
+         """Staging directory is created and exists."""
+         staging = create_staging_directory(base_dir=temp_dir)
+         assert staging.exists()
+         assert staging.is_dir()
+ 
+     def test_uses_system_temp_when_no_base(self) -> None:
+         """Uses system temp directory when base_dir is None."""
+         staging = create_staging_directory(base_dir=None)
+         assert staging.exists()
+         # Cleanup
+         staging.rmdir()
+ 
+ 
+ class TestStageCaseForDeepIsles:
+     """Tests for stage_case_for_deepisles."""
+ 
+     def test_stages_required_files(
+         self, synthetic_case_files: CaseFiles, temp_dir: Path
+     ) -> None:
+         """DWI and ADC are staged with correct names."""
+         staged = stage_case_for_deepisles(synthetic_case_files, temp_dir)
+ 
+         assert staged.dwi_path.name == "dwi.nii.gz"
+         assert staged.adc_path.name == "adc.nii.gz"
+         assert staged.dwi_path.exists()
+         assert staged.adc_path.exists()
+ 
+     def test_staged_files_are_readable(
+         self, synthetic_case_files: CaseFiles, temp_dir: Path
+     ) -> None:
+         """Staged files can be read as valid NIfTI."""
+         import nibabel as nib
+ 
+         staged = stage_case_for_deepisles(synthetic_case_files, temp_dir)
+ 
+         dwi = nib.load(staged.dwi_path)
+         assert dwi.shape == (64, 64, 30)
+ 
+     def test_raises_when_dwi_missing(self, temp_dir: Path) -> None:
+         """Raises MissingInputError when DWI is missing."""
+         case_files = CaseFiles(
+             dwi=temp_dir / "nonexistent.nii.gz",
+             adc=temp_dir / "adc.nii.gz",
+             flair=None,
+             ground_truth=None,
+         )
+ 
+         with pytest.raises(MissingInputError, match="DWI"):
+             stage_case_for_deepisles(case_files, temp_dir)
+ 
+     def test_flair_is_optional(
+         self, synthetic_case_files: CaseFiles, temp_dir: Path
+     ) -> None:
+         """Staging succeeds when FLAIR is None."""
+         # synthetic_case_files has flair=None
+         staged = stage_case_for_deepisles(synthetic_case_files, temp_dir)
+ 
+         assert staged.flair_path is None
+ ```
+ 
+ #### 3. `tests/data/test_adapter.py` - Case adapter with mocks
+ 
+ ```python
+ """Tests for case adapter module."""
+ 
+ from __future__ import annotations
+ 
+ import pytest
+ 
+ from stroke_deepisles_demo.core.types import CaseFiles
+ from stroke_deepisles_demo.data.adapter import CaseAdapter
+ 
+ 
+ class TestCaseAdapter:
+     """Tests for CaseAdapter."""
+ 
+     def test_list_case_ids_returns_strings(self, mock_hf_dataset) -> None:
+         """list_case_ids returns list of string identifiers."""
+         adapter = CaseAdapter(mock_hf_dataset)
+         case_ids = adapter.list_case_ids()
+ 
+         assert isinstance(case_ids, list)
+         assert all(isinstance(cid, str) for cid in case_ids)
+ 
+     def test_len_matches_dataset_size(self, mock_hf_dataset) -> None:
+         """len(adapter) equals number of cases in dataset."""
+         adapter = CaseAdapter(mock_hf_dataset)
+ 
+         assert len(adapter) == len(mock_hf_dataset)
+ 
+     def test_get_case_by_string_id(self, mock_hf_dataset) -> None:
+         """Can retrieve case by string identifier."""
+         adapter = CaseAdapter(mock_hf_dataset)
+         case_ids = adapter.list_case_ids()
+ 
+         case = adapter.get_case(case_ids[0])
+ 
+         assert isinstance(case, dict)  # CaseFiles is a TypedDict
+         assert "dwi" in case
+         assert "adc" in case
+ 
+     def test_get_case_by_index(self, mock_hf_dataset) -> None:
+         """Can retrieve case by integer index."""
+         adapter = CaseAdapter(mock_hf_dataset)
+ 
+         case_id, case = adapter.get_case_by_index(0)
+ 
+         assert isinstance(case_id, str)
+         assert case["dwi"] is not None
+ 
+     def test_get_case_invalid_id_raises(self, mock_hf_dataset) -> None:
+         """Raises KeyError for invalid case ID."""
+         adapter = CaseAdapter(mock_hf_dataset)
+ 
+         with pytest.raises(KeyError):
+             adapter.get_case("nonexistent-case-id")
+ 
+     def test_iteration(self, mock_hf_dataset) -> None:
+         """Can iterate over case IDs."""
+         adapter = CaseAdapter(mock_hf_dataset)
+ 
+         case_ids = list(adapter)
+ 
+         assert len(case_ids) == len(adapter)
+ ```
+ 
+ #### 4. `tests/data/test_loader.py` - Loader with network mocks
+ 
+ ```python
+ """Tests for data loader module."""
+ 
+ from __future__ import annotations
+ 
+ from unittest.mock import MagicMock, patch
+ 
+ import pytest
+ 
+ from stroke_deepisles_demo.core.exceptions import DataLoadError
+ from stroke_deepisles_demo.data.loader import (
+     DatasetInfo,
+     get_dataset_info,
+     load_isles_dataset,
+ )
+ 
+ 
+ class TestLoadIslesDataset:
+     """Tests for load_isles_dataset."""
+ 
+     def test_calls_hf_load_dataset(self) -> None:
+         """Calls datasets.load_dataset with correct arguments."""
+         with patch("stroke_deepisles_demo.data.loader.load_dataset") as mock_load:
+             mock_load.return_value = MagicMock()
+ 
+             load_isles_dataset("test/dataset")
+ 
+             mock_load.assert_called_once()
+             call_args = mock_load.call_args
+             assert call_args.args[0] == "test/dataset"
+ 
+     def test_returns_dataset_object(self) -> None:
+         """Returns the loaded Dataset object."""
+         with patch("stroke_deepisles_demo.data.loader.load_dataset") as mock_load:
+             expected = MagicMock()
+             mock_load.return_value = expected
+ 
+             result = load_isles_dataset()
+ 
+             assert result is expected
+ 
+     def test_handles_load_error(self) -> None:
+         """Wraps HF errors in DataLoadError."""
+         with patch("stroke_deepisles_demo.data.loader.load_dataset") as mock_load:
+             mock_load.side_effect = Exception("Network error")
+ 
+             with pytest.raises(DataLoadError, match="Network error"):
+                 load_isles_dataset()
+ 
+ 
+ class TestGetDatasetInfo:
+     """Tests for get_dataset_info."""
+ 
+     def test_returns_datasetinfo(self) -> None:
+         """Returns DatasetInfo with expected fields."""
+         with patch("stroke_deepisles_demo.data.loader.load_dataset") as mock_load:
+             mock_ds = MagicMock()
+             mock_ds.__len__ = MagicMock(return_value=149)
+             mock_ds.features = {"dwi": ..., "adc": ..., "mask": ...}
+             mock_load.return_value = mock_ds
+ 
+             info = get_dataset_info()
+ 
+             assert isinstance(info, DatasetInfo)
+             assert info.num_cases == 149
+ 
+ 
+ @pytest.mark.integration
+ class TestLoadIslesDatasetIntegration:
+     """Integration tests that hit the real HuggingFace Hub."""
+ 
+     @pytest.mark.slow
+     def test_load_real_dataset(self) -> None:
+         """Actually loads ISLES24-MR-Lite from HF Hub."""
+         # This test requires network access
+         # Run with: pytest -m integration
+         dataset = load_isles_dataset(streaming=True)
+ 
+         # Just verify we got something
+         assert dataset is not None
+ ```
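The tests above pin down the `DatasetInfo` shape: `num_cases` derived from `len(ds)`, plus the dataset's features. A minimal frozen dataclass satisfying them might look like this (a sketch — any field beyond `num_cases` and `features` is an assumption):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetInfo:
    """Summary of a loaded dataset (sketch: fields inferred from the tests above)."""

    num_cases: int
    features: list[str] = field(default_factory=list)


# Example instance matching the mocked 149-case dataset in the tests
info = DatasetInfo(num_cases=149, features=["dwi", "adc", "mask"])
```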
+ 
+ ### what to mock
+ 
+ - `datasets.load_dataset` - Mock for unit tests, real for integration tests
+ - `huggingface_hub` calls - Mock for unit tests
+ - File system operations - Use `temp_dir` fixture with real files
+ 
+ ### what to test for real
+ 
+ - NIfTI file creation/reading with nibabel
+ - File staging (copy/symlink operations)
+ - Integration test: actual HF Hub download (marked `@pytest.mark.integration`)
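Since the plan relies on custom `integration` and `slow` marks, they should be registered so pytest does not warn about unknown markers. A minimal sketch for `tests/conftest.py`, using pytest's standard `pytest_configure` hook (the marker descriptions are illustrative):

```python
def pytest_configure(config) -> None:
    """Register project-specific markers (names taken from this spec)."""
    # addinivalue_line is the documented pytest API for dynamic marker registration
    config.addinivalue_line("markers", "integration: hits real services (HF Hub, Docker)")
    config.addinivalue_line("markers", "slow: long-running test")
```

With this in place, `pytest -m "not integration"` cleanly excludes the networked tests.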
+ 
+ ## "done" criteria
+ 
+ Phase 1 is complete when:
+ 
+ 1. All unit tests pass: `uv run pytest tests/data/ -v`
+ 2. Can load synthetic test cases without network
+ 3. Can list case IDs from mock dataset
+ 4. Can stage files with correct DeepISLES naming
+ 5. Integration test passes (with network): `uv run pytest -m integration`
+ 6. Type checking passes: `uv run mypy src/stroke_deepisles_demo/data/`
+ 7. Code coverage for data module > 80%
+ 
+ ## implementation notes
+ 
+ - ISLES24-MR-Lite structure needs investigation - check HF page for exact column names
+ - Consider using `huggingface_hub.snapshot_download` if `datasets.load_dataset` has issues with NIfTI
+ - Staging can use symlinks on Unix, copies on Windows
+ - Cache the HF dataset locally to avoid repeated downloads
+ 
+ ### critical: streaming mode + docker materialization
+ 
+ **Reviewer feedback (valid)**: When using `streaming=True`, the dataset returns URLs or lazy file objects, NOT local POSIX paths. Docker requires physical files on the host disk for volume mounting.
+ 
+ **Solution**: The `stage_case_for_deepisles` function MUST handle materialization:
+ 
+ ```python
+ import shutil
+ from pathlib import Path
+ 
+ 
+ def stage_case_for_deepisles(
+     case_files: CaseFiles,
+     output_dir: Path,
+     *,
+     case_id: str | None = None,
+ ) -> StagedCase:
+     """
+     Stage case files with DeepISLES-expected naming.
+ 
+     IMPORTANT: This function handles both local paths and streaming data.
+     When files come from streaming mode, they must be downloaded/materialized
+     before Docker can mount them.
+     """
+     output_dir.mkdir(parents=True, exist_ok=True)
+ 
+     # Handle DWI - may be Path, URL, or NIfTI object
+     dwi_staged = output_dir / "dwi.nii.gz"
+     _materialize_nifti(case_files["dwi"], dwi_staged)
+ 
+     # Handle ADC
+     adc_staged = output_dir / "adc.nii.gz"
+     _materialize_nifti(case_files["adc"], adc_staged)
+ 
+     # ... etc
+ 
+ 
+ def _materialize_nifti(source: Path | str | bytes | NiftiImage, dest: Path) -> None:
+     """
+     Materialize a NIfTI file to a local path.
+ 
+     Handles:
+     - Local Path: copy
+     - URL string: download
+     - bytes: write directly
+     - NIfTI object: serialize with nibabel
+     """
+     if isinstance(source, Path) and source.exists():
+         # Local file - copy into the staging directory
+         shutil.copy2(source, dest)
+     elif isinstance(source, str) and source.startswith(("http://", "https://")):
+         # URL - download
+         _download_file(source, dest)
+     elif isinstance(source, bytes):
+         # Raw bytes
+         dest.write_bytes(source)
+     elif hasattr(source, "to_bytes"):
+         # NIfTI object (nibabel or wrapper)
+         dest.write_bytes(source.to_bytes())
+     else:
+         raise MissingInputError(f"Cannot materialize source: {type(source)}")
+ ```
+ 
+ This ensures Docker always gets physical files regardless of how the data was loaded.
+ 
+ ## dependencies to add
+ 
+ No new dependencies needed - all specified in Phase 0:
+ - `datasets` (Tobias fork)
+ - `nibabel`
+ - `numpy`
docs/specs/03-phase-2-deepisles-docker.md ADDED
@@ -0,0 +1,884 @@
+ # phase 2: deepisles docker integration
+ 
+ ## purpose
+ 
+ Create a Python wrapper that calls the DeepISLES Docker image as a black box. At the end of this phase, we can run stroke lesion segmentation on a folder of NIfTI files and get back the predicted mask.
+ 
+ ## deliverables
+ 
+ - [ ] `src/stroke_deepisles_demo/inference/docker.py` - Docker execution wrapper
+ - [ ] `src/stroke_deepisles_demo/inference/deepisles.py` - DeepISLES-specific CLI interface
+ - [ ] Unit tests with subprocess mocking
+ - [ ] Integration test (marked, requires Docker)
+ 
+ ## vertical slice outcome
+ 
+ After this phase, you can run:
+ 
+ ```python
+ from pathlib import Path
+ 
+ from stroke_deepisles_demo.inference import run_deepisles_on_folder
+ 
+ # input_dir contains: dwi.nii.gz, adc.nii.gz
+ result = run_deepisles_on_folder(
+     input_dir=Path("/path/to/staged/case"),
+     fast=True,
+ )
+ print(f"Prediction mask: {result.prediction_path}")
+ print(f"Elapsed: {result.elapsed_seconds:.1f}s")
+ ```
+ 
+ ## module structure
+ 
+ ```
+ src/stroke_deepisles_demo/inference/
+ β”œβ”€β”€ __init__.py      # Public API exports
+ β”œβ”€β”€ docker.py        # Generic Docker execution utilities
+ └── deepisles.py     # DeepISLES-specific wrapper
+ ```
+ 
+ ## deepisles cli reference
+ 
+ From the [DeepIsles repository](https://github.com/ezequieldlrosa/DeepIsles), the Docker interface expects:
+ 
+ ```bash
+ docker run --rm \
+   -v /path/to/input:/input \
+   -v /path/to/output:/output \
+   --gpus all \
+   isleschallenge/deepisles \
+   --dwi_file_name dwi.nii.gz \
+   --adc_file_name adc.nii.gz \
+   [--flair_file_name flair.nii.gz] \
+   --fast True  # Single-model mode, faster
+ ```
+ 
+ **Expected input files:**
+ - `dwi.nii.gz` (required) - Diffusion-weighted imaging
+ - `adc.nii.gz` (required) - Apparent diffusion coefficient
+ - `flair.nii.gz` (optional) - FLAIR sequence
+ 
+ **Output:**
+ - `results/` directory containing the lesion mask
+ 
+ ## interfaces and types
+ 
+ ### `inference/docker.py`
+ 
+ ```python
+ """Docker execution utilities."""
+ 
+ from __future__ import annotations
+ 
+ import subprocess
+ from collections.abc import Sequence
+ from dataclasses import dataclass
+ from pathlib import Path
+ 
+ from stroke_deepisles_demo.core.exceptions import DockerNotAvailableError
+ 
+ 
+ @dataclass(frozen=True)
+ class DockerRunResult:
+     """Result of a Docker container run."""
+ 
+     exit_code: int
+     stdout: str
+     stderr: str
+     elapsed_seconds: float
+ 
+ 
+ def check_docker_available() -> bool:
+     """
+     Check if Docker is installed and the daemon is running.
+ 
+     Returns:
+         True if Docker is available, False otherwise
+     """
+     ...
+ 
+ 
+ def ensure_docker_available() -> None:
+     """
+     Ensure Docker is available, raising if not.
+ 
+     Raises:
+         DockerNotAvailableError: If Docker is not installed or not running
+     """
+     ...
+ 
+ 
+ def pull_image_if_missing(image: str, *, timeout: float = 600) -> bool:
+     """
+     Pull a Docker image if not present locally.
+ 
+     Args:
+         image: Docker image name (e.g., "isleschallenge/deepisles")
+         timeout: Maximum seconds to wait for pull
+ 
+     Returns:
+         True if image was pulled, False if already present
+     """
+     ...
+ 
+ 
+ def run_container(
+     image: str,
+     *,
+     command: Sequence[str] | None = None,
+     volumes: dict[Path, str] | None = None,  # host_path -> container_path
+     environment: dict[str, str] | None = None,
+     gpu: bool = False,
+     remove: bool = True,
+     timeout: float | None = None,
+ ) -> DockerRunResult:
+     """
+     Run a Docker container and wait for completion.
+ 
+     Args:
+         image: Docker image name
+         command: Command to run in container
+         volumes: Volume mounts (host path -> container path)
+         environment: Environment variables
+         gpu: If True, pass --gpus all
+         remove: If True, remove container after exit (--rm)
+         timeout: Maximum seconds to wait (None = no timeout)
+ 
+     Returns:
+         DockerRunResult with exit code, stdout, stderr, elapsed time
+ 
+     Raises:
+         DockerNotAvailableError: If Docker is not available
+         subprocess.TimeoutExpired: If timeout exceeded
+     """
+     ...
+ 
+ 
+ def build_docker_command(
+     image: str,
+     *,
+     command: Sequence[str] | None = None,
+     volumes: dict[Path, str] | None = None,
+     environment: dict[str, str] | None = None,
+     gpu: bool = False,
+     remove: bool = True,
+ ) -> list[str]:
+     """
+     Build the docker run command without executing.
+ 
+     Useful for logging/debugging.
+ 
+     Returns:
+         List of command arguments for subprocess
+     """
+     ...
+ ```
+ 
+ ### `inference/deepisles.py`
+ 
+ ```python
+ """DeepISLES stroke segmentation wrapper."""
+ 
+ from __future__ import annotations
+ 
+ import time
+ from dataclasses import dataclass
+ from pathlib import Path
+ 
+ from stroke_deepisles_demo.core.config import settings
+ from stroke_deepisles_demo.core.exceptions import DeepISLESError, MissingInputError
+ from stroke_deepisles_demo.inference.docker import (
+     DockerRunResult,
+     ensure_docker_available,
+     run_container,
+ )
+ 
+ 
+ @dataclass(frozen=True)
+ class DeepISLESResult:
+     """Result of DeepISLES inference."""
+ 
+     prediction_path: Path
+     docker_result: DockerRunResult
+     elapsed_seconds: float
+ 
+ 
+ def validate_input_folder(input_dir: Path) -> tuple[Path, Path, Path | None]:
+     """
+     Validate that input folder contains required files.
+ 
+     Args:
+         input_dir: Directory to validate
+ 
+     Returns:
+         Tuple of (dwi_path, adc_path, flair_path_or_none)
+ 
+     Raises:
+         MissingInputError: If required files are missing
+     """
+     ...
+ 
+ 
+ def run_deepisles_on_folder(
+     input_dir: Path,
+     *,
+     output_dir: Path | None = None,
+     fast: bool = True,
+     gpu: bool = True,
+     timeout: float | None = 1800,  # 30 minutes default
+ ) -> DeepISLESResult:
+     """
+     Run DeepISLES stroke segmentation on a folder of NIfTI files.
+ 
+     Args:
+         input_dir: Directory containing dwi.nii.gz, adc.nii.gz, [flair.nii.gz]
+         output_dir: Where to write results (default: input_dir/results)
+         fast: If True, use single-model mode (faster, slightly less accurate)
+         gpu: If True, use GPU acceleration
+         timeout: Maximum seconds to wait for inference
+ 
+     Returns:
+         DeepISLESResult with path to prediction mask
+ 
+     Raises:
+         DockerNotAvailableError: If Docker is not available
+         MissingInputError: If required input files are missing
+         DeepISLESError: If inference fails (non-zero exit, missing output)
+ 
+     Example:
+         >>> result = run_deepisles_on_folder(Path("/data/case001"), fast=True)
+         >>> print(result.prediction_path)
+         /data/case001/results/prediction.nii.gz
+     """
+     ...
+ 
+ 
+ def find_prediction_mask(output_dir: Path) -> Path:
+     """
+     Find the prediction mask in DeepISLES output directory.
+ 
+     DeepISLES outputs may have varying names depending on version.
+     This function finds the most likely prediction file.
+ 
+     Args:
+         output_dir: DeepISLES output directory
+ 
+     Returns:
+         Path to the prediction mask NIfTI file
+ 
+     Raises:
+         DeepISLESError: If no prediction mask found
+     """
+     ...
+ 
+ 
+ # Constants
+ DEEPISLES_IMAGE = "isleschallenge/deepisles"
+ EXPECTED_INPUT_FILES = ["dwi.nii.gz", "adc.nii.gz"]
+ OPTIONAL_INPUT_FILES = ["flair.nii.gz"]
+ ```
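A possible shape for `find_prediction_mask`, since output naming varies across DeepISLES versions. The local `DeepISLESError` class stands in for the project's exception so the sketch is self-contained, and the name-matching heuristic is an assumption:

```python
from pathlib import Path


class DeepISLESError(RuntimeError):
    """Local stand-in for the project's exception (for this sketch only)."""


def find_prediction_mask(output_dir: Path) -> Path:
    """Search results/ (then output_dir itself) for a likely mask file."""
    candidates: list[Path] = []
    for root in (output_dir / "results", output_dir):
        if root.is_dir():
            candidates += sorted(root.glob("*.nii.gz"))
    # Prefer files whose name mentions "prediction" or "mask"
    for path in candidates:
        if "prediction" in path.name or "mask" in path.name:
            return path
    if candidates:
        return candidates[0]
    raise DeepISLESError(f"No prediction mask found under {output_dir}")
```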
+ 
+ ### `inference/__init__.py` (public API)
+ 
+ ```python
+ """Inference module for stroke-deepisles-demo."""
+ 
+ from stroke_deepisles_demo.inference.deepisles import (
+     DEEPISLES_IMAGE,
+     DeepISLESResult,
+     run_deepisles_on_folder,
+     validate_input_folder,
+ )
+ from stroke_deepisles_demo.inference.docker import (
+     DockerRunResult,
+     build_docker_command,
+     check_docker_available,
+     ensure_docker_available,
+     run_container,
+ )
+ 
+ __all__ = [
+     # DeepISLES
+     "run_deepisles_on_folder",
+     "validate_input_folder",
+     "DeepISLESResult",
+     "DEEPISLES_IMAGE",
+     # Docker utilities
+     "check_docker_available",
+     "ensure_docker_available",
+     "run_container",
+     "build_docker_command",
+     "DockerRunResult",
+ ]
+ ```
+ 
+ ## tdd plan
+ 
+ ### test file structure
+ 
+ ```
+ tests/
+ β”œβ”€β”€ inference/
+ β”‚   β”œβ”€β”€ __init__.py
+ β”‚   β”œβ”€β”€ test_docker.py       # Tests for Docker utilities
+ β”‚   └── test_deepisles.py    # Tests for DeepISLES wrapper
+ ```
+ 
+ ### tests to write first (TDD order)
+ 
+ #### 1. `tests/inference/test_docker.py`
+ 
+ ```python
+ """Tests for Docker utilities."""
+ 
+ from __future__ import annotations
+ 
+ import subprocess
+ from pathlib import Path
+ from unittest.mock import MagicMock, patch
+ 
+ import pytest
+ 
+ from stroke_deepisles_demo.core.exceptions import DockerNotAvailableError
+ from stroke_deepisles_demo.inference.docker import (
+     build_docker_command,
+     check_docker_available,
+     ensure_docker_available,
+     run_container,
+ )
+ 
+ 
+ class TestCheckDockerAvailable:
+     """Tests for check_docker_available."""
+ 
+     def test_returns_true_when_docker_responds(self) -> None:
+         """Returns True when 'docker info' succeeds."""
+         with patch("subprocess.run") as mock_run:
+             mock_run.return_value = MagicMock(returncode=0)
+ 
+             result = check_docker_available()
+ 
+             assert result is True
+ 
+     def test_returns_false_when_docker_not_found(self) -> None:
+         """Returns False when docker command not found."""
+         with patch("subprocess.run") as mock_run:
+             mock_run.side_effect = FileNotFoundError()
+ 
+             result = check_docker_available()
+ 
+             assert result is False
+ 
+     def test_returns_false_when_daemon_not_running(self) -> None:
+         """Returns False when docker daemon not running."""
+         with patch("subprocess.run") as mock_run:
+             mock_run.return_value = MagicMock(returncode=1)
+ 
+             result = check_docker_available()
+ 
+             assert result is False
+ 
+ 
+ class TestEnsureDockerAvailable:
+     """Tests for ensure_docker_available."""
+ 
+     def test_raises_when_docker_not_available(self) -> None:
+         """Raises DockerNotAvailableError when Docker not available."""
+         with patch(
+             "stroke_deepisles_demo.inference.docker.check_docker_available",
+             return_value=False,
+         ):
+             with pytest.raises(DockerNotAvailableError):
+                 ensure_docker_available()
+ 
+     def test_no_error_when_docker_available(self) -> None:
+         """No exception when Docker is available."""
+         with patch(
+             "stroke_deepisles_demo.inference.docker.check_docker_available",
+             return_value=True,
+         ):
+             ensure_docker_available()  # Should not raise
+ 
+ 
+ class TestBuildDockerCommand:
+     """Tests for build_docker_command."""
+ 
+     def test_basic_command(self) -> None:
+         """Builds basic docker run command."""
+         cmd = build_docker_command("myimage:latest")
+ 
+         assert cmd[0] == "docker"
+         assert "run" in cmd
+         assert "myimage:latest" in cmd
+ 
+     def test_includes_rm_flag(self) -> None:
+         """Includes --rm when remove=True."""
+         cmd = build_docker_command("myimage", remove=True)
+ 
+         assert "--rm" in cmd
+ 
+     def test_excludes_rm_flag(self) -> None:
+         """Excludes --rm when remove=False."""
+         cmd = build_docker_command("myimage", remove=False)
+ 
+         assert "--rm" not in cmd
+ 
+     def test_includes_gpu_flag(self) -> None:
+         """Includes --gpus all when gpu=True."""
+         cmd = build_docker_command("myimage", gpu=True)
+ 
+         assert "--gpus" in cmd
+         gpu_index = cmd.index("--gpus")
+         assert cmd[gpu_index + 1] == "all"
+ 
+     def test_volume_mounts(self, temp_dir: Path) -> None:
+         """Includes volume mounts."""
+         volumes = {temp_dir: "/data"}
+         cmd = build_docker_command("myimage", volumes=volumes)
+ 
+         assert "-v" in cmd
+         # Find the volume argument
+         v_index = cmd.index("-v")
+         assert f"{temp_dir}:/data" in cmd[v_index + 1]
+ 
+     def test_custom_command(self) -> None:
+         """Appends custom command arguments."""
+         cmd = build_docker_command(
+             "myimage", command=["--input", "/data", "--fast", "True"]
+         )
+ 
+         assert "--input" in cmd
+         assert "--fast" in cmd
+ 
+ 
+ class TestRunContainer:
+     """Tests for run_container."""
+ 
+     def test_calls_subprocess_with_built_command(self) -> None:
+         """Calls subprocess.run with built command."""
+         with patch("subprocess.run") as mock_run:
+             mock_run.return_value = MagicMock(
+                 returncode=0, stdout="output", stderr=""
+             )
+             with patch(
+                 "stroke_deepisles_demo.inference.docker.ensure_docker_available"
+             ):
+                 run_container("myimage")
+ 
+             mock_run.assert_called_once()
+ 
+     def test_returns_result_with_exit_code(self) -> None:
+         """Returns DockerRunResult with correct exit code."""
+         with patch("subprocess.run") as mock_run:
+             mock_run.return_value = MagicMock(
+                 returncode=42, stdout="out", stderr="err"
+             )
+             with patch(
+                 "stroke_deepisles_demo.inference.docker.ensure_docker_available"
+             ):
+                 result = run_container("myimage")
+ 
+             assert result.exit_code == 42
+ 
+     def test_captures_stdout_stderr(self) -> None:
+         """Captures stdout and stderr from container."""
+         with patch("subprocess.run") as mock_run:
+             mock_run.return_value = MagicMock(
+                 returncode=0, stdout="hello", stderr="warning"
+             )
+             with patch(
+                 "stroke_deepisles_demo.inference.docker.ensure_docker_available"
+             ):
+                 result = run_container("myimage")
+ 
+             assert result.stdout == "hello"
+             assert result.stderr == "warning"
+ 
+     def test_respects_timeout(self) -> None:
+         """Passes timeout to subprocess."""
+         with patch("subprocess.run") as mock_run:
+             mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
+             with patch(
+                 "stroke_deepisles_demo.inference.docker.ensure_docker_available"
+             ):
+                 run_container("myimage", timeout=60.0)
+ 
+             call_kwargs = mock_run.call_args.kwargs
+             assert call_kwargs.get("timeout") == 60.0
+ 
+ 
+ @pytest.mark.integration
+ class TestDockerIntegration:
+     """Integration tests requiring real Docker."""
+ 
+     def test_docker_actually_available(self) -> None:
+         """Docker is actually available on this system."""
+         # This test only runs with -m integration
+         assert check_docker_available() is True
+ 
+     def test_can_run_hello_world(self) -> None:
+         """Can run docker hello-world container."""
+         result = run_container("hello-world", timeout=60.0)
+ 
+         assert result.exit_code == 0
+         assert "Hello from Docker!" in result.stdout
+ ```
+ #### 2. `tests/inference/test_deepisles.py`
527
+
528
```python
"""Tests for DeepISLES wrapper."""

from __future__ import annotations

from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest

from stroke_deepisles_demo.core.exceptions import DeepISLESError, MissingInputError
from stroke_deepisles_demo.inference.deepisles import (
    DeepISLESResult,
    find_prediction_mask,
    run_deepisles_on_folder,
    validate_input_folder,
)


class TestValidateInputFolder:
    """Tests for validate_input_folder."""

    def test_succeeds_with_required_files(self, temp_dir: Path) -> None:
        """Returns paths when required files exist."""
        (temp_dir / "dwi.nii.gz").touch()
        (temp_dir / "adc.nii.gz").touch()

        dwi, adc, flair = validate_input_folder(temp_dir)

        assert dwi == temp_dir / "dwi.nii.gz"
        assert adc == temp_dir / "adc.nii.gz"
        assert flair is None

    def test_includes_flair_when_present(self, temp_dir: Path) -> None:
        """Returns FLAIR path when present."""
        (temp_dir / "dwi.nii.gz").touch()
        (temp_dir / "adc.nii.gz").touch()
        (temp_dir / "flair.nii.gz").touch()

        dwi, adc, flair = validate_input_folder(temp_dir)

        assert flair == temp_dir / "flair.nii.gz"

    def test_raises_when_dwi_missing(self, temp_dir: Path) -> None:
        """Raises MissingInputError when DWI is missing."""
        (temp_dir / "adc.nii.gz").touch()

        with pytest.raises(MissingInputError, match="dwi"):
            validate_input_folder(temp_dir)

    def test_raises_when_adc_missing(self, temp_dir: Path) -> None:
        """Raises MissingInputError when ADC is missing."""
        (temp_dir / "dwi.nii.gz").touch()

        with pytest.raises(MissingInputError, match="adc"):
            validate_input_folder(temp_dir)


class TestFindPredictionMask:
    """Tests for find_prediction_mask."""

    def test_finds_prediction_file(self, temp_dir: Path) -> None:
        """Finds prediction.nii.gz in output directory."""
        results_dir = temp_dir / "results"
        results_dir.mkdir()
        pred_file = results_dir / "prediction.nii.gz"
        pred_file.touch()

        result = find_prediction_mask(temp_dir)

        assert result == pred_file

    def test_raises_when_no_prediction(self, temp_dir: Path) -> None:
        """Raises DeepISLESError when no prediction found."""
        results_dir = temp_dir / "results"
        results_dir.mkdir()

        with pytest.raises(DeepISLESError, match="prediction"):
            find_prediction_mask(temp_dir)


class TestRunDeepIslesOnFolder:
    """Tests for run_deepisles_on_folder."""

    @pytest.fixture
    def valid_input_dir(self, temp_dir: Path) -> Path:
        """Create a valid input directory with required files."""
        (temp_dir / "dwi.nii.gz").touch()
        (temp_dir / "adc.nii.gz").touch()
        return temp_dir

    def test_validates_input_files(self, temp_dir: Path) -> None:
        """Validates input files before running Docker."""
        # Missing required files
        with pytest.raises(MissingInputError):
            run_deepisles_on_folder(temp_dir)

    def test_calls_docker_with_correct_image(self, valid_input_dir: Path) -> None:
        """Calls Docker with DeepISLES image."""
        with patch(
            "stroke_deepisles_demo.inference.deepisles.run_container"
        ) as mock_run:
            mock_run.return_value = MagicMock(exit_code=0, stdout="", stderr="")
            # Also mock finding the prediction
            with patch(
                "stroke_deepisles_demo.inference.deepisles.find_prediction_mask"
            ) as mock_find:
                mock_find.return_value = valid_input_dir / "results" / "pred.nii.gz"

                run_deepisles_on_folder(valid_input_dir)

                # Check image name
                call_args = mock_run.call_args
                assert "isleschallenge/deepisles" in str(call_args)

    def test_passes_fast_flag(self, valid_input_dir: Path) -> None:
        """Passes --fast True when fast=True."""
        with patch(
            "stroke_deepisles_demo.inference.deepisles.run_container"
        ) as mock_run:
            mock_run.return_value = MagicMock(exit_code=0, stdout="", stderr="")
            with patch(
                "stroke_deepisles_demo.inference.deepisles.find_prediction_mask"
            ) as mock_find:
                mock_find.return_value = valid_input_dir / "results" / "pred.nii.gz"

                run_deepisles_on_folder(valid_input_dir, fast=True)

                # Check --fast in command
                call_kwargs = mock_run.call_args.kwargs
                command = call_kwargs.get("command", [])
                assert "--fast" in command

    def test_raises_on_docker_failure(self, valid_input_dir: Path) -> None:
        """Raises DeepISLESError when Docker returns non-zero."""
        with patch(
            "stroke_deepisles_demo.inference.deepisles.run_container"
        ) as mock_run:
            mock_run.return_value = MagicMock(
                exit_code=1, stdout="", stderr="Segmentation fault"
            )

            with pytest.raises(DeepISLESError, match="failed"):
                run_deepisles_on_folder(valid_input_dir)

    def test_returns_result_with_prediction_path(self, valid_input_dir: Path) -> None:
        """Returns DeepISLESResult with prediction path."""
        with patch(
            "stroke_deepisles_demo.inference.deepisles.run_container"
        ) as mock_run:
            mock_run.return_value = MagicMock(exit_code=0, stdout="", stderr="")
            with patch(
                "stroke_deepisles_demo.inference.deepisles.find_prediction_mask"
            ) as mock_find:
                expected_path = valid_input_dir / "results" / "prediction.nii.gz"
                mock_find.return_value = expected_path

                result = run_deepisles_on_folder(valid_input_dir)

                assert isinstance(result, DeepISLESResult)
                assert result.prediction_path == expected_path


@pytest.mark.integration
@pytest.mark.slow
class TestDeepIslesIntegration:
    """Integration tests requiring real Docker and DeepISLES image."""

    def test_real_inference(self, synthetic_case_files) -> None:
        """Run actual DeepISLES inference on synthetic data."""
        # This test requires:
        # 1. Docker available
        # 2. isleschallenge/deepisles image pulled
        # 3. GPU (optional but recommended)
        #
        # Run with: pytest -m "integration and slow"

        from stroke_deepisles_demo.data.staging import stage_case_for_deepisles

        # Stage the synthetic files
        staged = stage_case_for_deepisles(
            synthetic_case_files,
            Path("/tmp/deepisles_test"),
        )

        # Run inference
        result = run_deepisles_on_folder(
            staged.input_dir,
            fast=True,
            gpu=False,  # Might not have GPU in CI
            timeout=600,
        )

        # Verify output exists
        assert result.prediction_path.exists()
```

### what to mock

- `subprocess.run` - Mock for all unit tests
- `check_docker_available` - Mock to control Docker availability
- `run_container` - Mock in DeepISLES tests to avoid Docker
- File system for prediction finding - Use temp directories

### what to test for real

- Command building (no subprocess needed)
- Input validation (real file system with temp dirs)
- Integration test: actual Docker hello-world
- Integration test: actual DeepISLES inference (marked `slow`)

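The tests above assume shared fixtures (`temp_dir`, `synthetic_case_files`) defined in `tests/conftest.py`, which is not shown in this phase. A minimal sketch of what those fixtures could look like — the names, shapes, and seed here are assumptions, not the final conftest:

```python
"""Sketch of tests/conftest.py fixtures assumed by the tests above (hypothetical)."""
from pathlib import Path

import numpy as np
import pytest


def make_synthetic_volume(shape: tuple[int, int, int] = (16, 16, 16)) -> np.ndarray:
    """Deterministic random volume standing in for a real DWI/ADC image."""
    rng = np.random.default_rng(seed=0)
    return rng.random(shape).astype(np.float32)


@pytest.fixture
def temp_dir(tmp_path: Path) -> Path:
    """Per-test scratch directory (thin alias over pytest's built-in tmp_path)."""
    return tmp_path
```

`synthetic_case_files` would build on `make_synthetic_volume` plus `nibabel` to write `dwi.nii.gz`/`adc.nii.gz` into the scratch directory; it is omitted here.
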
## "done" criteria

Phase 2 is complete when:

1. All unit tests pass: `uv run pytest tests/inference/ -v`
2. Can build Docker commands correctly
3. Can validate input folders
4. Unit tests don't require Docker (all mocked)
5. Integration test passes with Docker: `uv run pytest -m integration tests/inference/`
6. Type checking passes: `uv run mypy src/stroke_deepisles_demo/inference/`
7. Code coverage for inference module > 80%

## implementation notes

- Check DeepISLES repo for exact output file names/structure
- Consider `--gpus all` vs `--gpus '"device=0"'` for GPU selection
- Timeout should be generous (30+ minutes) for full ensemble mode
- Log Docker stdout/stderr for debugging
- Consider streaming Docker output for long-running inference

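For the last bullet, one way to surface container output line-by-line instead of waiting for `subprocess.run` to return — a sketch only, and `stream_subprocess` is a hypothetical helper name, not part of the planned runner API:

```python
import subprocess
from typing import Callable


def stream_subprocess(cmd: list[str], on_line: Callable[[str], None] = print) -> int:
    """Run a command, forwarding each output line as it arrives; return exit code."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    ) as proc:
        assert proc.stdout is not None  # guaranteed by stdout=PIPE
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    return proc.returncode
```

The same pattern would apply unchanged to a `["docker", "run", ...]` command built by the runner.
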
### critical: docker file permissions (linux)

**Reviewer feedback (valid)**: Docker containers run as root by default on Linux. Output files written to mounted volumes will be owned by root:root. The Python process running as a normal user will fail to read or delete these files.

**Solution**: Pass `--user` flag to match host user:

```python
import os
import sys


def build_docker_command(
    image: str,
    *,
    volumes: dict[Path, str] | None = None,
    gpu: bool = False,
    remove: bool = True,
    match_user: bool = True,  # NEW: default True on Linux
) -> list[str]:
    """Build docker run command."""
    cmd = ["docker", "run"]

    if remove:
        cmd.append("--rm")

    if gpu:
        cmd.extend(["--gpus", "all"])

    # Match host user to avoid permission issues
    if match_user and sys.platform != "darwin":  # Not needed on macOS
        uid = os.getuid()
        gid = os.getgid()
        cmd.extend(["--user", f"{uid}:{gid}"])

    if volumes:
        for host_path, container_path in volumes.items():
            cmd.extend(["-v", f"{host_path}:{container_path}"])

    cmd.append(image)
    return cmd
```

Alternative: Fix permissions after Docker completes (less clean but works):

```python
import os
import subprocess


def fix_docker_output_permissions(output_dir: Path) -> None:
    """Fix permissions on Docker-created files."""
    # Only needed if running as non-root and files are root-owned
    try:
        subprocess.run(
            ["sudo", "chown", "-R", f"{os.getuid()}:{os.getgid()}", str(output_dir)],
            check=True,
            capture_output=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        pass  # sudo not available or not needed
```

### critical: gpu availability check

**Reviewer feedback (valid)**: We check for the Docker daemon but not the NVIDIA Container Runtime. A user might have Docker but lack GPU passthrough setup.

**Solution**: Add GPU-specific availability check:

```python
def check_nvidia_docker_available() -> bool:
    """
    Check if NVIDIA Container Runtime is available for GPU support.

    Returns:
        True if nvidia-docker/nvidia-container-toolkit is configured
    """
    try:
        result = subprocess.run(
            ["docker", "run", "--rm", "--gpus", "all", "nvidia/cuda:11.0-base", "nvidia-smi"],
            capture_output=True,
            timeout=30,
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False


def ensure_gpu_available_if_requested(gpu: bool) -> None:
    """
    Verify GPU is available if requested, or warn user.

    Raises:
        DockerGPUNotAvailableError: If GPU requested but not available
    """
    if gpu and not check_nvidia_docker_available():
        raise DockerGPUNotAvailableError(
            "GPU requested but NVIDIA Container Runtime not available. "
            "Either install nvidia-container-toolkit or set gpu=False."
        )
```

Add to exceptions:

```python
class DockerGPUNotAvailableError(StrokeDemoError):
    """GPU requested but NVIDIA Container Runtime not available."""
```

### nifti orientation (medium risk)

**Reviewer feedback (noted)**: DeepISLES may expect specific anatomical orientation (e.g., RAS). BIDS data might be in different orientations.

**Mitigation**: DeepISLES is trained on ISLES challenge data which follows standard conventions. If issues arise, add orientation checking in staging:

```python
import nibabel as nib


def check_nifti_orientation(nifti_path: Path) -> str:
    """Check NIfTI orientation code (e.g., 'RAS', 'LPS')."""
    img = nib.load(nifti_path)
    # aff2axcodes returns a tuple like ('R', 'A', 'S'); join it into a code string
    return "".join(nib.aff2axcodes(img.affine))


def conform_to_ras(nifti_path: Path, output_path: Path) -> Path:
    """Reorient NIfTI to RAS if needed."""
    img = nib.load(nifti_path)
    # as_closest_canonical flips/permutes axes to RAS without resampling
    nib.save(nib.as_closest_canonical(img), output_path)
    return output_path
```

## dependencies to add

None - all covered in Phase 0.

docs/specs/04-phase-3-pipeline.md ADDED
# phase 3: end-to-end pipeline (no ui)

## purpose

Tie together Phase 1 (data loading) and Phase 2 (DeepISLES inference) into a cohesive pipeline. At the end of this phase, we can run stroke segmentation on any case from ISLES24-MR-Lite with a single function call.

## deliverables

- [ ] `src/stroke_deepisles_demo/pipeline.py` - Main orchestration
- [ ] `src/stroke_deepisles_demo/metrics.py` - Optional Dice computation
- [ ] CLI entry point for testing
- [ ] Unit tests with full mocking
- [ ] Integration test for complete flow

## vertical slice outcome

After this phase, you can run:

```python
from stroke_deepisles_demo.pipeline import run_pipeline_on_case

# Run segmentation on a specific case
result = run_pipeline_on_case("sub-001")

print(f"Input DWI: {result.input_files.dwi}")
print(f"Input ADC: {result.input_files.adc}")
print(f"Prediction: {result.prediction_mask}")
print(f"Ground truth: {result.ground_truth}")
print(f"Dice score: {result.dice_score:.3f}")  # if computed
```

Or via CLI:

```bash
uv run stroke-demo run --case sub-001 --fast
uv run stroke-demo run --index 0 --output ./results
uv run stroke-demo list  # List all available cases
```

## module structure

```
src/stroke_deepisles_demo/
├── pipeline.py   # Main orchestration
├── metrics.py    # Dice score computation
└── cli.py        # CLI entry point (optional)
```

## interfaces and types

### `pipeline.py`

```python
"""End-to-end pipeline orchestration."""

from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path

from stroke_deepisles_demo.core.config import settings
from stroke_deepisles_demo.core.types import CaseFiles, InferenceResult
from stroke_deepisles_demo.data import CaseAdapter, load_isles_dataset, stage_case_for_deepisles
from stroke_deepisles_demo.inference import run_deepisles_on_folder


@dataclass(frozen=True)
class PipelineResult:
    """Complete result of running the pipeline on a case."""

    case_id: str
    input_files: CaseFiles
    staged_dir: Path
    prediction_mask: Path
    ground_truth: Path | None
    dice_score: float | None  # None if ground truth unavailable or not computed
    elapsed_seconds: float


def run_pipeline_on_case(
    case_id: str | int,
    *,
    dataset_id: str | None = None,
    output_dir: Path | None = None,
    fast: bool = True,
    gpu: bool = True,
    compute_dice: bool = True,
    cleanup_staging: bool = False,
) -> PipelineResult:
    """
    Run the complete segmentation pipeline on a single case.

    This function:
    1. Loads the case from HuggingFace Hub (or cache)
    2. Stages NIfTI files with DeepISLES-expected naming
    3. Runs DeepISLES Docker container
    4. Optionally computes Dice score against ground truth
    5. Returns all paths and metrics

    Args:
        case_id: Case identifier (string) or index (int)
        dataset_id: HF dataset ID (default from settings)
        output_dir: Directory for results (default: temp dir)
        fast: Use single-model mode (faster)
        gpu: Use GPU acceleration
        compute_dice: Compute Dice score if ground truth available
        cleanup_staging: Remove staging directory after inference

    Returns:
        PipelineResult with all paths and optional metrics

    Raises:
        DataLoadError: If case cannot be loaded
        MissingInputError: If required files missing
        DeepISLESError: If inference fails

    Example:
        >>> result = run_pipeline_on_case("sub-001", fast=True)
        >>> print(f"Dice: {result.dice_score:.3f}")
    """
    ...


def run_pipeline_on_batch(
    case_ids: list[str | int],
    *,
    max_workers: int = 1,
    **kwargs,
) -> list[PipelineResult]:
    """
    Run pipeline on multiple cases.

    Note: Parallel execution requires multiple GPUs or sequential mode.

    Args:
        case_ids: List of case identifiers or indices
        max_workers: Number of parallel workers (default 1 for sequential)
        **kwargs: Passed to run_pipeline_on_case

    Returns:
        List of PipelineResult, one per case
    """
    ...


def get_pipeline_summary(results: list[PipelineResult]) -> PipelineSummary:
    """
    Compute summary statistics from multiple pipeline results.

    Returns:
        Summary with mean Dice, success rate, etc.
    """
    ...


@dataclass(frozen=True)
class PipelineSummary:
    """Summary statistics from multiple pipeline runs."""

    num_cases: int
    num_successful: int
    num_failed: int
    mean_dice: float | None
    std_dice: float | None
    min_dice: float | None
    max_dice: float | None
    mean_elapsed_seconds: float


# Internal helper
def _load_or_get_adapter(
    dataset_id: str | None = None,
    cache: dict | None = None,
) -> CaseAdapter:
    """Load dataset and return adapter, using cache if available."""
    ...
```

### `metrics.py`

```python
"""Metrics for evaluating segmentation quality."""

from __future__ import annotations

from pathlib import Path

import nibabel as nib
import numpy as np
from numpy.typing import NDArray


def compute_dice(
    prediction: Path | NDArray[np.float64],
    ground_truth: Path | NDArray[np.float64],
    *,
    threshold: float = 0.5,
) -> float:
    """
    Compute Dice similarity coefficient between prediction and ground truth.

    Dice = 2 * |P ∩ G| / (|P| + |G|)

    Args:
        prediction: Path to NIfTI file or numpy array
        ground_truth: Path to NIfTI file or numpy array
        threshold: Threshold for binarization (if needed)

    Returns:
        Dice coefficient in [0, 1]

    Raises:
        ValueError: If shapes don't match
    """
    ...


def compute_volume_ml(
    mask: Path | NDArray[np.float64],
    voxel_size_mm: tuple[float, float, float] | None = None,
) -> float:
    """
    Compute lesion volume in milliliters.

    Args:
        mask: Path to NIfTI file or numpy array
        voxel_size_mm: Voxel dimensions in mm (read from NIfTI if None)

    Returns:
        Volume in milliliters (mL)
    """
    ...


def load_nifti_as_array(path: Path) -> tuple[NDArray[np.float64], tuple[float, ...]]:
    """
    Load NIfTI file and return data array with voxel dimensions.

    Returns:
        Tuple of (data_array, voxel_sizes_mm)
    """
    ...
```
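
Both metrics reduce to small NumPy computations. A minimal sketch of the two pure functions (array inputs only — the `Path` overloads and NIfTI loading are omitted, and the empty-mask convention follows the tests in this document):

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.5) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|), with 1.0 when both masks are empty."""
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    p = pred > threshold
    g = gt > threshold
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # both empty: perfect agreement by convention
    return float(2.0 * np.logical_and(p, g).sum() / denom)


def volume_ml(mask: np.ndarray, voxel_size_mm: tuple[float, float, float]) -> float:
    """Lesion volume: voxel count * voxel volume (mm^3), converted to mL."""
    voxel_mm3 = voxel_size_mm[0] * voxel_size_mm[1] * voxel_size_mm[2]
    return float((mask > 0.5).sum() * voxel_mm3 / 1000.0)
```

The real `compute_dice`/`compute_volume_ml` would wrap these with the file-path handling sketched above.
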

### `cli.py` (optional)

```python
"""Command-line interface for stroke-deepisles-demo."""

from __future__ import annotations

import argparse
import sys
from pathlib import Path


def main(argv: list[str] | None = None) -> int:
    """Main CLI entry point."""
    parser = argparse.ArgumentParser(
        prog="stroke-demo",
        description="Run DeepISLES stroke segmentation on HF datasets",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # List command
    list_parser = subparsers.add_parser("list", help="List available cases")
    list_parser.add_argument(
        "--dataset", default=None, help="HF dataset ID"
    )

    # Run command
    run_parser = subparsers.add_parser("run", help="Run segmentation")
    run_parser.add_argument(
        "--case", type=str, help="Case ID (e.g., sub-001)"
    )
    run_parser.add_argument(
        "--index", type=int, help="Case index (alternative to --case)"
    )
    run_parser.add_argument(
        "--output", type=Path, default=None, help="Output directory"
    )
    run_parser.add_argument(
        "--fast",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Use fast mode (disable with --no-fast)",
    )
    run_parser.add_argument(
        "--no-gpu", action="store_true", help="Disable GPU"
    )

    args = parser.parse_args(argv)

    if args.command == "list":
        return cmd_list(args)
    elif args.command == "run":
        return cmd_run(args)

    return 0


def cmd_list(args: argparse.Namespace) -> int:
    """Handle 'list' command."""
    ...


def cmd_run(args: argparse.Namespace) -> int:
    """Handle 'run' command."""
    ...


if __name__ == "__main__":
    sys.exit(main())
```

### pyproject.toml addition for CLI

```toml
[project.scripts]
stroke-demo = "stroke_deepisles_demo.cli:main"
```

## tdd plan

### test file structure

```
tests/
├── test_pipeline.py   # Pipeline orchestration tests
├── test_metrics.py    # Metrics computation tests
└── test_cli.py        # CLI tests (optional)
```

### tests to write first (TDD order)

#### 1. `tests/test_metrics.py` - Pure functions, no mocks needed

```python
"""Tests for metrics module."""

from __future__ import annotations

from pathlib import Path

import nibabel as nib
import numpy as np
import pytest

from stroke_deepisles_demo.metrics import (
    compute_dice,
    compute_volume_ml,
    load_nifti_as_array,
)


class TestComputeDice:
    """Tests for compute_dice."""

    def test_identical_masks_return_one(self) -> None:
        """Dice of identical masks is 1.0."""
        mask = np.array([[[1, 1, 0], [0, 1, 0], [0, 0, 1]]])

        dice = compute_dice(mask, mask)

        assert dice == 1.0

    def test_no_overlap_returns_zero(self) -> None:
        """Dice of non-overlapping masks is 0.0."""
        pred = np.array([[[1, 1, 0], [0, 0, 0], [0, 0, 0]]])
        gt = np.array([[[0, 0, 0], [0, 0, 0], [0, 0, 1]]])

        dice = compute_dice(pred, gt)

        assert dice == 0.0

    def test_partial_overlap(self) -> None:
        """Dice with partial overlap is between 0 and 1."""
        pred = np.array([[[1, 1, 0], [0, 0, 0], [0, 0, 0]]])
        gt = np.array([[[1, 0, 0], [0, 0, 0], [0, 0, 0]]])

        dice = compute_dice(pred, gt)

        # Overlap: 1, Pred: 2, GT: 1 -> Dice = 2*1 / (2+1) = 0.667
        assert 0.6 < dice < 0.7

    def test_empty_masks_return_one(self) -> None:
        """Dice of two empty masks is 1.0 (both agree on nothing)."""
        empty = np.zeros((10, 10, 10))

        dice = compute_dice(empty, empty)

        assert dice == 1.0

    def test_accepts_file_paths(self, temp_dir: Path) -> None:
        """Can compute Dice from NIfTI file paths."""
        mask = np.array([[[1, 1, 0], [0, 1, 0], [0, 0, 1]]]).astype(np.float32)
        img = nib.Nifti1Image(mask, np.eye(4))

        pred_path = temp_dir / "pred.nii.gz"
        gt_path = temp_dir / "gt.nii.gz"
        nib.save(img, pred_path)
        nib.save(img, gt_path)

        dice = compute_dice(pred_path, gt_path)

        assert dice == 1.0

    def test_shape_mismatch_raises(self) -> None:
        """Raises ValueError if shapes don't match."""
        pred = np.zeros((10, 10, 10))
        gt = np.zeros((10, 10, 5))

        with pytest.raises(ValueError, match="shape"):
            compute_dice(pred, gt)


class TestComputeVolumeMl:
    """Tests for compute_volume_ml."""

    def test_computes_volume_from_voxel_size(self) -> None:
        """Volume computed correctly from voxel dimensions."""
        # 10x10x10 = 1000 voxels of size 1mm^3 each = 1000mm^3 = 1mL
        mask = np.ones((10, 10, 10))

        volume = compute_volume_ml(mask, voxel_size_mm=(1.0, 1.0, 1.0))

        assert volume == pytest.approx(1.0, rel=0.01)

    def test_reads_voxel_size_from_nifti(self, temp_dir: Path) -> None:
        """Reads voxel size from NIfTI header."""
        mask = np.ones((10, 10, 10)).astype(np.float32)
        # Affine with 2mm voxels
        affine = np.diag([2.0, 2.0, 2.0, 1.0])
        img = nib.Nifti1Image(mask, affine)

        path = temp_dir / "mask.nii.gz"
        nib.save(img, path)

        # 1000 voxels * 8mm^3 = 8000mm^3 = 8mL
        volume = compute_volume_ml(path)

        assert volume == pytest.approx(8.0, rel=0.01)


class TestLoadNiftiAsArray:
    """Tests for load_nifti_as_array."""

    def test_returns_array_and_voxel_sizes(self, temp_dir: Path) -> None:
        """Returns data array and voxel dimensions."""
        data = np.random.rand(10, 10, 10).astype(np.float32)
        affine = np.diag([1.5, 1.5, 2.0, 1.0])
        img = nib.Nifti1Image(data, affine)

        path = temp_dir / "test.nii.gz"
        nib.save(img, path)

        arr, voxels = load_nifti_as_array(path)

        assert arr.shape == (10, 10, 10)
        assert voxels == pytest.approx((1.5, 1.5, 2.0), rel=0.01)
```
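
The voxel arithmetic these volume tests rely on can be checked directly:

```python
import numpy as np

# 10x10x10 mask of 2 mm isotropic voxels:
# 1000 voxels * (2 * 2 * 2) mm^3 = 8000 mm^3 = 8 mL
mask = np.ones((10, 10, 10))
voxel_volume_mm3 = 2.0 * 2.0 * 2.0
volume_ml = float(mask.sum() * voxel_volume_mm3 / 1000.0)  # -> 8.0
```
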

#### 2. `tests/test_pipeline.py` - Full orchestration with mocks

```python
"""Tests for pipeline orchestration."""

from __future__ import annotations

from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest

from stroke_deepisles_demo.core.types import CaseFiles
from stroke_deepisles_demo.pipeline import (
    PipelineResult,
    PipelineSummary,
    get_pipeline_summary,
    run_pipeline_on_case,
)


class TestRunPipelineOnCase:
    """Tests for run_pipeline_on_case."""

    @pytest.fixture
    def mock_dependencies(self, temp_dir: Path):
        """Mock all external dependencies."""
        with patch(
            "stroke_deepisles_demo.pipeline.load_isles_dataset"
        ) as mock_load, patch(
            "stroke_deepisles_demo.pipeline.CaseAdapter"
        ) as mock_adapter_cls, patch(
            "stroke_deepisles_demo.pipeline.stage_case_for_deepisles"
        ) as mock_stage, patch(
            "stroke_deepisles_demo.pipeline.run_deepisles_on_folder"
        ) as mock_inference, patch(
            "stroke_deepisles_demo.pipeline.compute_dice"
        ) as mock_dice:
            # Configure mocks
            mock_adapter = MagicMock()
            mock_adapter.get_case.return_value = CaseFiles(
                dwi=temp_dir / "dwi.nii.gz",
                adc=temp_dir / "adc.nii.gz",
                flair=None,
                ground_truth=temp_dir / "gt.nii.gz",
            )
            mock_adapter_cls.return_value = mock_adapter

            mock_stage.return_value = MagicMock(
                input_dir=temp_dir / "staged",
                dwi_path=temp_dir / "staged" / "dwi.nii.gz",
                adc_path=temp_dir / "staged" / "adc.nii.gz",
                flair_path=None,
            )

            mock_inference.return_value = MagicMock(
                prediction_path=temp_dir / "results" / "pred.nii.gz",
                elapsed_seconds=10.5,
            )

            mock_dice.return_value = 0.85

            yield {
                "load": mock_load,
                "adapter_cls": mock_adapter_cls,
                "adapter": mock_adapter,
                "stage": mock_stage,
                "inference": mock_inference,
                "dice": mock_dice,
            }

    def test_returns_pipeline_result(self, mock_dependencies, temp_dir) -> None:
        """Returns PipelineResult with expected fields."""
        result = run_pipeline_on_case("sub-001")

        assert isinstance(result, PipelineResult)
        assert result.case_id == "sub-001"

    def test_loads_case_from_adapter(self, mock_dependencies, temp_dir) -> None:
        """Loads case using CaseAdapter."""
        run_pipeline_on_case("sub-001")

        mock_dependencies["adapter"].get_case.assert_called_once_with("sub-001")

    def test_stages_files_for_deepisles(self, mock_dependencies, temp_dir) -> None:
        """Stages files with correct naming."""
        run_pipeline_on_case("sub-001")

        mock_dependencies["stage"].assert_called_once()

    def test_runs_deepisles_inference(self, mock_dependencies, temp_dir) -> None:
        """Runs DeepISLES on staged directory."""
        run_pipeline_on_case("sub-001", fast=True, gpu=False)

        mock_dependencies["inference"].assert_called_once()
        call_kwargs = mock_dependencies["inference"].call_args.kwargs
        assert call_kwargs.get("fast") is True
        assert call_kwargs.get("gpu") is False

    def test_computes_dice_when_ground_truth_available(
        self, mock_dependencies, temp_dir
    ) -> None:
        """Computes Dice score when ground truth is available."""
        result = run_pipeline_on_case("sub-001", compute_dice=True)

        mock_dependencies["dice"].assert_called_once()
        assert result.dice_score == 0.85

    def test_skips_dice_when_disabled(self, mock_dependencies, temp_dir) -> None:
        """Skips Dice computation when compute_dice=False."""
        result = run_pipeline_on_case("sub-001", compute_dice=False)

        mock_dependencies["dice"].assert_not_called()
        assert result.dice_score is None

    def test_handles_missing_ground_truth(self, mock_dependencies, temp_dir) -> None:
        """Handles cases without ground truth gracefully."""
        # Modify mock to return no ground truth
        mock_dependencies["adapter"].get_case.return_value = CaseFiles(
            dwi=temp_dir / "dwi.nii.gz",
            adc=temp_dir / "adc.nii.gz",
            flair=None,
            ground_truth=None,
        )

        result = run_pipeline_on_case("sub-001", compute_dice=True)

        assert result.dice_score is None
        assert result.ground_truth is None

    def test_accepts_integer_index(self, mock_dependencies, temp_dir) -> None:
        """Accepts integer index as case identifier."""
        mock_dependencies["adapter"].get_case_by_index.return_value = (
            "sub-001",
            CaseFiles(
                dwi=temp_dir / "dwi.nii.gz",
                adc=temp_dir / "adc.nii.gz",
                flair=None,
                ground_truth=None,
            ),
        )

        result = run_pipeline_on_case(0)

        assert result.case_id == "sub-001"


class TestGetPipelineSummary:
    """Tests for get_pipeline_summary."""

    def test_computes_mean_dice(self) -> None:
        """Computes mean Dice from results."""
        results = [
            MagicMock(dice_score=0.8, elapsed_seconds=10),
            MagicMock(dice_score=0.9, elapsed_seconds=12),
            MagicMock(dice_score=0.7, elapsed_seconds=8),
        ]

        summary = get_pipeline_summary(results)

        assert summary.mean_dice == pytest.approx(0.8, rel=0.01)

    def test_handles_none_dice_scores(self) -> None:
        """Handles results with None Dice scores."""
        results = [
            MagicMock(dice_score=0.8, elapsed_seconds=10),
            MagicMock(dice_score=None, elapsed_seconds=12),
            MagicMock(dice_score=0.7, elapsed_seconds=8),
        ]

        summary = get_pipeline_summary(results)

        # Mean of 0.8 and 0.7 only
        assert summary.mean_dice == pytest.approx(0.75, rel=0.01)

    def test_counts_successful_and_failed(self) -> None:
        """Counts successful and failed runs."""
        results = [
            MagicMock(dice_score=0.8, elapsed_seconds=10),
            MagicMock(dice_score=None, elapsed_seconds=0),  # Failed
        ]

        summary = get_pipeline_summary(results)

        assert summary.num_cases == 2
        assert summary.num_successful == 1
        assert summary.num_failed == 1


@pytest.mark.integration
class TestPipelineIntegration:
    """Integration tests for full pipeline."""

    @pytest.mark.slow
    def test_run_on_real_case(self) -> None:
        """Run pipeline on actual ISLES24-MR-Lite case."""
        # Requires: network, Docker, DeepISLES image
        # Run with: pytest -m "integration and slow"

        result = run_pipeline_on_case(
            0,  # First case
            fast=True,
            gpu=False,
            compute_dice=True,
        )

        assert result.prediction_mask.exists()
        assert 0 <= result.dice_score <= 1
```
+
672
+ ### what to mock
673
+
674
+ - `load_isles_dataset` - Avoid network calls
675
+ - `CaseAdapter` - Return synthetic CaseFiles
676
+ - `stage_case_for_deepisles` - Return mock staged paths
677
+ - `run_deepisles_on_folder` - Avoid Docker
678
+ - `compute_dice` - Return fixed value for deterministic tests
679
+
680
+ ### what to test for real
681
+
682
+ - Dice computation (pure NumPy)
683
+ - Volume computation (pure NumPy + nibabel)
684
+ - NIfTI loading
685
+ - Integration: full pipeline on real data
686
+
687
+ ## "done" criteria
688
+
689
+ Phase 3 is complete when:
690
+
691
+ 1. All unit tests pass: `uv run pytest tests/test_pipeline.py tests/test_metrics.py -v`
692
+ 2. Dice computation is correct for known test cases
693
+ 3. Pipeline orchestrates all components correctly
694
+ 4. CLI works: `uv run stroke-demo list` and `uv run stroke-demo run --index 0`
695
+ 5. Integration test passes: `uv run pytest -m "integration and slow"`
696
+ 6. Type checking passes: `uv run mypy src/stroke_deepisles_demo/pipeline.py src/stroke_deepisles_demo/metrics.py`
697
+ 7. Code coverage for pipeline module > 80%
698
+
699
+ ## implementation notes
700
+
701
+ - Use dataclasses for results (immutable, typed)
702
+ - Consider caching the loaded dataset in module-level variable
703
+ - Dice should handle edge cases (empty masks, shape mismatches)
704
+ - CLI is optional but useful for manual testing
705
+ - Batch processing is sequential by default (GPU constraint)
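To make the Dice edge cases concrete, here is one possible sketch (the real `metrics.compute_dice` signature may differ): raise on shape mismatch, and by convention return 1.0 when both masks are empty.

```python
import numpy as np


def compute_dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient for binary masks, with explicit edge-case handling."""
    pred = np.asarray(pred) > 0
    truth = np.asarray(truth) > 0
    if pred.shape != truth.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {truth.shape}")
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)
```

Whether empty-vs-empty should score 1.0 or be reported separately is a project decision; the convention just needs to be documented and tested.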
docs/specs/05-phase-4-gradio-ui.md ADDED
@@ -0,0 +1,817 @@

1
+ # phase 4: gradio / spaces app
2
+
3
+ ## purpose
4
+
5
+ Build a minimal but clean Gradio 5 app that allows interactive case selection, segmentation, and visualization. At the end of this phase, we have a deployable Hugging Face Space.
6
+
7
+ ## deliverables
8
+
9
+ - [ ] `src/stroke_deepisles_demo/ui/app.py` - Main Gradio application
10
+ - [ ] `src/stroke_deepisles_demo/ui/viewer.py` - NiiVue integration
11
+ - [ ] `src/stroke_deepisles_demo/ui/components.py` - Reusable UI components
12
+ - [ ] `app.py` at repo root - HF Spaces entry point
13
+ - [ ] Unit tests for UI logic (not Gradio itself)
14
+ - [ ] Smoke test for app import
15
+
16
+ ## vertical slice outcome
17
+
18
+ After this phase, you can run locally:
19
+
20
+ ```bash
21
+ uv run gradio src/stroke_deepisles_demo/ui/app.py
22
+ # or
23
+ uv run python -m stroke_deepisles_demo.ui.app
24
+ ```
25
+
26
+ And deploy to Hugging Face Spaces with the standard Gradio SDK.
27
+
28
+ ## module structure
29
+
30
+ ```
31
+ src/stroke_deepisles_demo/ui/
32
+ β”œβ”€β”€ __init__.py # Public API
33
+ β”œβ”€β”€ app.py # Main Gradio application
34
+ β”œβ”€β”€ viewer.py # NiiVue integration
35
+ └── components.py # Reusable UI components
36
+
37
+ # Root level for HF Spaces
38
+ app.py # Entry point: from stroke_deepisles_demo.ui.app import demo
39
+ ```
40
+
41
+ ## gradio 5 considerations
42
+
43
+ Based on [Gradio 5 documentation](https://huggingface.co/blog/gradio-5):
44
+
45
+ - Server-side rendering (SSR) for fast initial load
46
+ - Improved components (Buttons, Tabs, Sliders)
47
+ - WebRTC support for real-time streaming
48
+ - New built-in themes
49
+
50
+ Key patterns:
51
+ ```python
52
+ import gradio as gr
53
+
54
+ # Gradio 5 app pattern
55
+ with gr.Blocks(theme=gr.themes.Soft()) as demo:
56
+ gr.Markdown("# Title")
57
+ with gr.Row():
58
+ with gr.Column():
59
+ # Inputs
60
+ ...
61
+ with gr.Column():
62
+ # Outputs
63
+ ...
64
+
65
+ demo.launch()
66
+ ```
67
+
68
+ ## niivue integration strategy
69
+
70
+ [NiiVue](https://github.com/niivue/niivue) is a WebGL2-based neuroimaging viewer.
71
+
72
+ ### proven implementation: tobias's bids-neuroimaging space
73
+
74
+ **Reference**: [TobiasPitters/bids-neuroimaging](https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging) - A working HF Space with NiiVue multiplanar + 3D rendering.
75
+
76
+ Key patterns from Tobias's implementation:
77
+
78
+ 1. **FastAPI + raw HTML** (not Gradio) - Cleaner for single-page viewer
79
+ 2. **NiiVue via unpkg CDN**: `https://unpkg.com/@niivue/niivue@0.57.0/dist/index.js`
80
+ 3. **Base64 data URLs** for NIfTI data (no file serving needed):
81
+ ```python
82
+ import base64
83
+ nifti_bytes = nifti_image.to_bytes()
84
+ nifti_b64 = base64.b64encode(nifti_bytes).decode("utf-8")
85
+ data_url = f"data:application/octet-stream;base64,{nifti_b64}"
+ ```
86
+ ```
87
+ 4. **NiiVue configuration for multiplanar + 3D**:
88
+ ```javascript
89
+ nv.setSliceType(nv.sliceTypeMultiplanar);
90
+ nv.setMultiplanarLayout(2); // 2x2 grid with 3D render
91
+ nv.opts.show3Dcrosshair = true;
92
+ ```
93
+
94
+ ### recommended approach: hybrid fastapi + gradio
95
+
96
+ For our demo, we use a **hybrid approach**:
97
+ - **Gradio** for case selection dropdown and "Run Segmentation" button
98
+ - **FastAPI endpoints** for serving NIfTI data as base64
99
+ - **NiiVue via `gr.HTML`** for interactive 3D visualization
100
+
101
+ This gives us:
102
+ - Gradio's nice UI components for inputs
103
+ - Proven NiiVue rendering from Tobias's implementation
104
+ - No iframe complexity
105
+
106
+ ### concrete implementation
107
+
108
+ ```python
109
+ import base64
110
+ from pathlib import Path
111
+ import nibabel as nib
112
+
113
+ def nifti_to_data_url(nifti_path: Path) -> str:
114
+ """Convert NIfTI file to base64 data URL for NiiVue."""
115
+ img = nib.load(nifti_path)
116
+ nifti_bytes = img.to_bytes()
117
+ nifti_b64 = base64.b64encode(nifti_bytes).decode("utf-8")
118
+ return f"data:application/octet-stream;base64,{nifti_b64}"
119
+
120
+ def create_niivue_viewer_html(
121
+ volume_data_url: str,
122
+ mask_data_url: str | None = None,
123
+ height: int = 600,
124
+ ) -> str:
125
+ """Create NiiVue HTML viewer with optional mask overlay."""
126
+ mask_loading = ""
127
+ if mask_data_url:
128
+ mask_loading = f"""
129
+ volumes.push({{
130
+ url: '{mask_data_url}',
131
+ colorMap: 'red',
132
+ opacity: 0.5
133
+ }});
134
+ """
135
+
136
+ return f"""
137
+ <div style="width:100%; height:{height}px; background:#000; border-radius:8px;">
138
+ <canvas id="niivue-canvas" style="width:100%; height:100%;"></canvas>
139
+ </div>
140
+ <script type="module">
141
+ const niivueModule = await import('https://unpkg.com/@niivue/niivue@0.57.0/dist/index.js');
142
+ const Niivue = niivueModule.Niivue;
143
+
144
+ const nv = new Niivue({{
145
+ logging: false,
146
+ show3Dcrosshair: true,
147
+ textHeight: 0.04
148
+ }});
149
+
150
+ await nv.attachTo('niivue-canvas');
151
+
152
+ const volumes = [{{
153
+ url: '{volume_data_url}',
154
+ name: 'dwi.nii.gz'
155
+ }}];
156
+ {mask_loading}
157
+
158
+ await nv.loadVolumes(volumes);
159
+
160
+ // Multiplanar + 3D view
161
+ nv.setSliceType(nv.sliceTypeMultiplanar);
162
+ if (nv.setMultiplanarLayout) {{
163
+ nv.setMultiplanarLayout(2);
164
+ }}
165
+ nv.opts.show3Dcrosshair = true;
166
+ nv.setRenderAzimuthElevation(120, 10);
167
+ nv.drawScene();
168
+ </script>
169
+ """
170
+ ```
171
+
172
+ ### fallback: matplotlib 2d slices
173
+
174
+ For environments where WebGL fails, provide matplotlib fallback:
175
+
176
+ ```python
177
+ from pathlib import Path
+
+ import matplotlib.pyplot as plt
+ import nibabel as nib
+ from matplotlib.figure import Figure
179
+
180
+ def render_slices_fallback(nifti_path: Path, mask_path: Path | None = None) -> Figure:
181
+ """Render 3-panel slice view with optional mask overlay."""
182
+ img = nib.load(nifti_path)
183
+ data = img.get_fdata()
184
+
185
+ fig, axes = plt.subplots(1, 3, figsize=(15, 5))
186
+
187
+ # Middle slices
188
+ ax_slice = data.shape[2] // 2
189
+ cor_slice = data.shape[1] // 2
190
+ sag_slice = data.shape[0] // 2
191
+
192
+ axes[0].imshow(data[:, :, ax_slice].T, cmap='gray', origin='lower')
193
+ axes[0].set_title('Axial')
194
+ axes[1].imshow(data[:, cor_slice, :].T, cmap='gray', origin='lower')
195
+ axes[1].set_title('Coronal')
196
+ axes[2].imshow(data[sag_slice, :, :].T, cmap='gray', origin='lower')
197
+ axes[2].set_title('Sagittal')
198
+
199
+ if mask_path:
200
+ mask = nib.load(mask_path).get_fdata()
201
+ # Overlay in red with alpha
202
+ for ax, sl in zip(axes, [mask[:,:,ax_slice].T, mask[:,cor_slice,:].T, mask[sag_slice,:,:].T]):
203
+ ax.imshow(sl, cmap='Reds', alpha=0.5, origin='lower')
204
+
205
+ return fig
206
+ ```
207
+
208
+ **Recommendation**: Use NiiVue as primary (proven working), matplotlib as fallback.
209
+
210
+ ## interfaces and types
211
+
212
+ ### `ui/app.py`
213
+
214
+ ```python
215
+ """Main Gradio application for stroke-deepisles-demo."""
216
+
217
+ from __future__ import annotations
218
+
219
+ import gradio as gr
220
+
221
+ from stroke_deepisles_demo.pipeline import run_pipeline_on_case
222
+ from stroke_deepisles_demo.ui.components import create_case_selector, create_results_display
223
+ from stroke_deepisles_demo.ui.viewer import render_comparison_view
224
+
225
+
226
+ def create_app() -> gr.Blocks:
227
+ """
228
+ Create the Gradio application.
229
+
230
+ Returns:
231
+ Configured gr.Blocks application
232
+ """
233
+ with gr.Blocks(
234
+ title="Stroke Lesion Segmentation Demo",
235
+ theme=gr.themes.Soft(),
236
+ ) as demo:
237
+ # Header
238
+ gr.Markdown("""
239
+ # Stroke Lesion Segmentation Demo
240
+
241
+ This demo runs [DeepISLES](https://github.com/ezequieldlrosa/DeepIsles)
242
+ stroke segmentation on cases from
243
+ [ISLES24-MR-Lite](https://huggingface.co/datasets/YongchengYAO/ISLES24-MR-Lite).
244
+
245
+ > **Disclaimer**: This is for research/demonstration only. Not for clinical use.
246
+ """)
247
+
248
+ with gr.Row():
249
+ # Left column: Controls
250
+ with gr.Column(scale=1):
251
+ case_selector = create_case_selector()
252
+ run_btn = gr.Button("Run Segmentation", variant="primary")
253
+ status = gr.Textbox(label="Status", interactive=False)
254
+
255
+ # Right column: Results
256
+ with gr.Column(scale=2):
257
+ results_display = create_results_display()
258
+
259
+ # Event handlers
260
+ run_btn.click(
261
+ fn=run_segmentation,
262
+ inputs=[case_selector],
263
+ outputs=[results_display, status],
264
+ )
265
+
266
+ return demo
267
+
268
+
269
+ def run_segmentation(case_id: str) -> tuple[dict, str]:
270
+ """
271
+ Run segmentation and return results for display.
272
+
273
+ Args:
274
+ case_id: Selected case identifier
275
+
276
+ Returns:
277
+ Tuple of (results_dict, status_message)
278
+ """
279
+ ...
280
+
281
+
282
+ # Module-level app instance for Gradio CLI
283
+ demo = create_app()
284
+
285
+ if __name__ == "__main__":
286
+ demo.launch()
287
+ ```
288
+
289
+ ### `ui/viewer.py`
290
+
291
+ ```python
292
+ """Neuroimaging visualization for Gradio."""
293
+
294
+ from __future__ import annotations
295
+
296
+ from pathlib import Path
297
+ from typing import TYPE_CHECKING
298
+
299
+ import numpy as np
300
+
301
+ if TYPE_CHECKING:
302
+ from matplotlib.figure import Figure
303
+ from numpy.typing import NDArray
304
+
305
+
306
+ def render_slice_comparison(
307
+ dwi_path: Path,
308
+ prediction_path: Path,
309
+ ground_truth_path: Path | None = None,
310
+ *,
311
+ slice_idx: int | None = None,
312
+ orientation: str = "axial",
313
+ ) -> Figure:
314
+ """
315
+ Render side-by-side comparison of DWI, prediction, and ground truth.
316
+
317
+ Args:
318
+ dwi_path: Path to DWI NIfTI
319
+ prediction_path: Path to predicted mask NIfTI
320
+ ground_truth_path: Optional path to ground truth mask
321
+ slice_idx: Slice index (default: middle slice)
322
+ orientation: One of "axial", "coronal", "sagittal"
323
+
324
+ Returns:
325
+ Matplotlib figure with comparison view
326
+ """
327
+ ...
328
+
329
+
330
+ def render_3panel_view(
331
+ nifti_path: Path,
332
+ mask_path: Path | None = None,
333
+ *,
334
+ mask_alpha: float = 0.5,
335
+ mask_color: str = "red",
336
+ ) -> Figure:
337
+ """
338
+ Render axial/coronal/sagittal slices with optional mask overlay.
339
+
340
+ Args:
341
+ nifti_path: Path to base NIfTI volume
342
+ mask_path: Optional path to mask for overlay
343
+ mask_alpha: Transparency of mask overlay
344
+ mask_color: Color for mask overlay
345
+
346
+ Returns:
347
+ Matplotlib figure with 3-panel view
348
+ """
349
+ ...
350
+
351
+
352
+ def create_niivue_html(
353
+ volume_url: str,
354
+ mask_url: str | None = None,
355
+ *,
356
+ height: int = 400,
357
+ ) -> str:
358
+ """
359
+ Create HTML/JS for NiiVue viewer.
360
+
361
+ Args:
362
+ volume_url: URL to volume NIfTI file
363
+ mask_url: Optional URL to mask NIfTI file
364
+ height: Viewer height in pixels
365
+
366
+ Returns:
367
+ HTML string with embedded NiiVue viewer
368
+ """
369
+ template = f"""
370
+ <div id="gl" style="width:100%; height:{height}px;"></div>
371
+ <script type="module">
372
+ import {{ Niivue }} from 'https://niivue.github.io/niivue/features/niivue.esm.js';
373
+ const nv = new Niivue({{ show3Dcrosshair: true }});
374
+ nv.attachToCanvas(document.getElementById('gl'));
375
+ const volumes = [{{ url: '{volume_url}' }}];
376
+ {'volumes.push({ url: "' + mask_url + '", colorMap: "red", opacity: 0.5 });' if mask_url else ''}
377
+ await nv.loadVolumes(volumes);
378
+ </script>
379
+ """
380
+ return template
381
+
382
+
383
+ def get_slice_at_max_lesion(
384
+ mask_path: Path,
385
+ orientation: str = "axial",
386
+ ) -> int:
387
+ """
388
+ Find slice index with maximum lesion area.
389
+
390
+ Useful for displaying the most informative slice.
391
+
392
+ Args:
393
+ mask_path: Path to lesion mask NIfTI
394
+ orientation: Slice orientation
395
+
396
+ Returns:
397
+ Slice index with maximum lesion area
398
+ """
399
+ ...
400
+ ```
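The `get_slice_at_max_lesion` stub above can be reduced to a small NumPy computation. The sketch below operates on an already-loaded array for clarity (the real function would first load `mask_path` with nibabel); the orientation-to-axis mapping assumes canonical RAS-like ordering:

```python
import numpy as np

_AXES = {"sagittal": 0, "coronal": 1, "axial": 2}


def slice_at_max_lesion(mask: np.ndarray, orientation: str = "axial") -> int:
    """Index of the slice with the largest lesion area; middle slice if empty."""
    axis = _AXES[orientation]
    binary = np.asarray(mask) > 0
    # Sum lesion voxels within each slice along the chosen axis
    other_axes = tuple(i for i in range(binary.ndim) if i != axis)
    per_slice = binary.sum(axis=other_axes)
    if per_slice.max() == 0:
        return binary.shape[axis] // 2  # empty mask: fall back to middle slice
    return int(per_slice.argmax())
```

This matches the TDD cases below: a mask filled at slice 7 returns 7, and an empty mask of depth 20 returns 10.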
401
+
402
+ ### `ui/components.py`
403
+
404
+ ```python
405
+ """Reusable UI components."""
406
+
407
+ from __future__ import annotations
408
+
409
+ import gradio as gr
410
+
411
+ from stroke_deepisles_demo.data import list_case_ids
412
+
413
+
414
+ def create_case_selector() -> gr.Dropdown:
415
+ """
416
+ Create a dropdown for selecting cases.
417
+
418
+ Returns:
419
+ Configured gr.Dropdown component
420
+ """
421
+ try:
422
+ case_ids = list_case_ids()
423
+ except Exception:
424
+ case_ids = ["Error loading cases"]
425
+
426
+ return gr.Dropdown(
427
+ choices=case_ids,
428
+ value=case_ids[0] if case_ids else None,
429
+ label="Select Case",
430
+ info="Choose a case from ISLES24-MR-Lite",
431
+ )
432
+
433
+
434
+ def create_results_display() -> dict[str, gr.components.Component]:
435
+ """
436
+ Create results display components.
437
+
438
+ Returns:
439
+ Dictionary of component name -> gr.Component
440
+ """
441
+ with gr.Group():
442
+ viewer = gr.Image(label="Segmentation Result", type="filepath")
443
+ metrics = gr.JSON(label="Metrics")
444
+ download = gr.File(label="Download Prediction")
445
+
446
+ return {
447
+ "viewer": viewer,
448
+ "metrics": metrics,
449
+ "download": download,
450
+ }
451
+
452
+
453
+ def create_settings_accordion() -> dict[str, gr.components.Component]:
454
+ """
455
+ Create expandable settings section.
456
+
457
+ Returns:
458
+ Dictionary of setting name -> gr.Component
459
+ """
460
+ with gr.Accordion("Advanced Settings", open=False):
461
+ fast_mode = gr.Checkbox(
462
+ value=True,
463
+ label="Fast Mode",
464
+ info="Use single model (faster, slightly less accurate)",
465
+ )
466
+ show_ground_truth = gr.Checkbox(
467
+ value=True,
468
+ label="Show Ground Truth",
469
+ info="Display ground truth mask if available",
470
+ )
471
+
472
+ return {
473
+ "fast_mode": fast_mode,
474
+ "show_ground_truth": show_ground_truth,
475
+ }
476
+ ```
477
+
478
+ ### Root `app.py` for HF Spaces
479
+
480
+ ```python
481
+ """Entry point for Hugging Face Spaces deployment."""
482
+
483
+ from stroke_deepisles_demo.ui.app import demo
484
+
485
+ if __name__ == "__main__":
486
+ demo.launch()
487
+ ```
488
+
489
+ ## hugging face spaces configuration
490
+
491
+ ### `README.md` header for Spaces
492
+
493
+ ```yaml
494
+ ---
495
+ title: Stroke DeepISLES Demo
496
+ emoji: 🧠
497
+ colorFrom: blue
498
+ colorTo: purple
499
+ sdk: gradio
500
+ sdk_version: 5.0.0
501
+ app_file: app.py
502
+ pinned: false
503
+ license: mit
504
+ ---
505
+ ```
506
+
507
+ ### `requirements.txt` for Spaces
508
+
509
+ ```
510
+ # Note: HF Spaces uses requirements.txt, not pyproject.toml
511
+ git+https://github.com/CloseChoice/datasets.git@feat/bids-loader-streaming-upload-fix
512
+ huggingface-hub>=0.25.0
513
+ nibabel>=5.2.0
514
+ numpy>=1.26.0
515
+ pydantic>=2.5.0
516
+ pydantic-settings>=2.1.0
517
+ gradio>=5.0.0
518
+ matplotlib>=3.8.0
519
+ ```
520
+
521
+ ## tdd plan
522
+
523
+ ### test file structure
524
+
525
+ ```
526
+ tests/
527
+ β”œβ”€β”€ ui/
528
+ β”‚ β”œβ”€β”€ __init__.py
529
+ β”‚ β”œβ”€β”€ test_viewer.py # Tests for visualization
530
+ β”‚ β”œβ”€β”€ test_components.py # Tests for UI components
531
+ β”‚ └── test_app.py # Smoke tests for app
532
+ ```
533
+
534
+ ### tests to write first (TDD order)
535
+
536
+ #### 1. `tests/ui/test_viewer.py` - Pure visualization functions
537
+
538
+ ```python
539
+ """Tests for viewer module."""
540
+
541
+ from __future__ import annotations
542
+
543
+ from pathlib import Path
544
+
545
+ import matplotlib
+
+ matplotlib.use("Agg")  # Non-interactive backend; set before importing pyplot
+
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import pytest
551
+
552
+ from stroke_deepisles_demo.ui.viewer import (
553
+ create_niivue_html,
554
+ get_slice_at_max_lesion,
555
+ render_3panel_view,
556
+ render_slice_comparison,
557
+ )
558
+
559
+
560
+ class TestRender3PanelView:
561
+ """Tests for render_3panel_view."""
562
+
563
+ def test_returns_matplotlib_figure(self, synthetic_nifti_3d: Path) -> None:
564
+ """Returns a matplotlib Figure object."""
565
+ fig = render_3panel_view(synthetic_nifti_3d)
566
+
567
+ assert isinstance(fig, plt.Figure)
568
+ plt.close(fig)
569
+
570
+ def test_has_three_axes(self, synthetic_nifti_3d: Path) -> None:
571
+ """Figure has 3 subplots (axial, coronal, sagittal)."""
572
+ fig = render_3panel_view(synthetic_nifti_3d)
573
+
574
+ assert len(fig.axes) == 3
575
+ plt.close(fig)
576
+
577
+ def test_overlay_mask_when_provided(
578
+ self, synthetic_nifti_3d: Path, temp_dir: Path
579
+ ) -> None:
580
+ """Overlays mask when mask_path provided."""
581
+ # Create a simple mask
582
+ import nibabel as nib
583
+
584
+ mask_data = np.zeros((10, 10, 10), dtype=np.uint8)
585
+ mask_data[4:6, 4:6, 4:6] = 1
586
+ mask_img = nib.Nifti1Image(mask_data, np.eye(4))
587
+ mask_path = temp_dir / "mask.nii.gz"
588
+ nib.save(mask_img, mask_path)
589
+
590
+ fig = render_3panel_view(synthetic_nifti_3d, mask_path=mask_path)
591
+
592
+ # Should not raise
593
+ assert fig is not None
594
+ plt.close(fig)
595
+
596
+
597
+ class TestRenderSliceComparison:
598
+ """Tests for render_slice_comparison."""
599
+
600
+ def test_comparison_without_ground_truth(
601
+ self, synthetic_nifti_3d: Path
602
+ ) -> None:
603
+ """Works when ground truth is None."""
604
+ fig = render_slice_comparison(
605
+ synthetic_nifti_3d,
606
+ synthetic_nifti_3d, # Use same as prediction for test
607
+ ground_truth_path=None,
608
+ )
609
+
610
+ assert isinstance(fig, plt.Figure)
611
+ plt.close(fig)
612
+
613
+ def test_comparison_with_ground_truth(
614
+ self, synthetic_nifti_3d: Path
615
+ ) -> None:
616
+ """Works when ground truth is provided."""
617
+ fig = render_slice_comparison(
618
+ synthetic_nifti_3d,
619
+ synthetic_nifti_3d,
620
+ ground_truth_path=synthetic_nifti_3d,
621
+ )
622
+
623
+ assert isinstance(fig, plt.Figure)
624
+ plt.close(fig)
625
+
626
+
627
+ class TestGetSliceAtMaxLesion:
628
+ """Tests for get_slice_at_max_lesion."""
629
+
630
+ def test_finds_slice_with_lesion(self, temp_dir: Path) -> None:
631
+ """Returns slice index where lesion is largest."""
632
+ import nibabel as nib
633
+
634
+ # Create mask with lesion at slice 7
635
+ mask_data = np.zeros((10, 10, 10), dtype=np.uint8)
636
+ mask_data[:, :, 7] = 1 # Full slice 7 is lesion
637
+
638
+ mask_img = nib.Nifti1Image(mask_data, np.eye(4))
639
+ mask_path = temp_dir / "mask.nii.gz"
640
+ nib.save(mask_img, mask_path)
641
+
642
+ slice_idx = get_slice_at_max_lesion(mask_path, orientation="axial")
643
+
644
+ assert slice_idx == 7
645
+
646
+ def test_returns_middle_for_empty_mask(self, temp_dir: Path) -> None:
647
+ """Returns middle slice when mask is empty."""
648
+ import nibabel as nib
649
+
650
+ mask_data = np.zeros((10, 10, 20), dtype=np.uint8)
651
+ mask_img = nib.Nifti1Image(mask_data, np.eye(4))
652
+ mask_path = temp_dir / "mask.nii.gz"
653
+ nib.save(mask_img, mask_path)
654
+
655
+ slice_idx = get_slice_at_max_lesion(mask_path, orientation="axial")
656
+
657
+ assert slice_idx == 10 # Middle of 20
658
+
659
+
660
+ class TestCreateNiivueHtml:
661
+ """Tests for create_niivue_html."""
662
+
663
+ def test_includes_volume_url(self) -> None:
664
+ """Generated HTML includes the volume URL."""
665
+ html = create_niivue_html("http://example.com/brain.nii.gz")
666
+
667
+ assert "http://example.com/brain.nii.gz" in html
668
+
669
+ def test_includes_mask_when_provided(self) -> None:
670
+ """Generated HTML includes mask URL when provided."""
671
+ html = create_niivue_html(
672
+ "http://example.com/brain.nii.gz",
673
+ mask_url="http://example.com/mask.nii.gz",
674
+ )
675
+
676
+ assert "http://example.com/mask.nii.gz" in html
677
+
678
+ def test_sets_height(self) -> None:
679
+ """Generated HTML respects height parameter."""
680
+ html = create_niivue_html(
681
+ "http://example.com/brain.nii.gz",
682
+ height=600,
683
+ )
684
+
685
+ assert "height:600px" in html
686
+ ```
687
+
688
+ #### 2. `tests/ui/test_app.py` - Smoke tests
689
+
690
+ ```python
691
+ """Smoke tests for Gradio app."""
692
+
693
+ from __future__ import annotations
694
+
695
+
696
+ def test_app_module_imports() -> None:
697
+ """App module imports without side effects."""
698
+ # This should not launch the app or make network calls
699
+ from stroke_deepisles_demo.ui import app
700
+
701
+ assert hasattr(app, "create_app")
702
+ assert hasattr(app, "demo")
703
+
704
+
705
+ def test_create_app_returns_blocks() -> None:
706
+ """create_app returns a gr.Blocks instance."""
707
+ import gradio as gr
708
+
709
+ from stroke_deepisles_demo.ui.app import create_app
710
+
711
+ app = create_app()
712
+
713
+ assert isinstance(app, gr.Blocks)
714
+
715
+
716
+ def test_viewer_module_imports() -> None:
717
+ """Viewer module imports without errors."""
718
+ from stroke_deepisles_demo.ui import viewer
719
+
720
+ assert hasattr(viewer, "render_3panel_view")
721
+ assert hasattr(viewer, "create_niivue_html")
722
+
723
+
724
+ def test_components_module_imports() -> None:
725
+ """Components module imports without errors."""
726
+ from stroke_deepisles_demo.ui import components
727
+
728
+ assert hasattr(components, "create_case_selector")
729
+ assert hasattr(components, "create_results_display")
730
+ ```
731
+
732
+ ### what to mock
733
+
734
+ - `list_case_ids()` in components - Avoid network during import
735
+ - Any data loading in app initialization
736
+
737
+ ### what to test for real
738
+
739
+ - Matplotlib figure generation
740
+ - NiiVue HTML string generation
741
+ - Slice finding algorithms
742
+ - Module imports (no network side effects)
743
+
744
+ ## "done" criteria
745
+
746
+ Phase 4 is complete when:
747
+
748
+ 1. All unit tests pass: `uv run pytest tests/ui/ -v`
749
+ 2. App launches locally: `uv run python -m stroke_deepisles_demo.ui.app`
750
+ 3. Can select a case, click "Run", see visualization
751
+ 4. Visualization shows DWI with predicted mask overlay
752
+ 5. Metrics (Dice score) displayed
753
+ 6. Type checking passes: `uv run mypy src/stroke_deepisles_demo/ui/`
754
+ 7. Ready for HF Spaces deployment (README header, requirements.txt)
755
+
756
+ ## implementation notes
757
+
758
+ - **NiiVue is primary** - Proven working in Tobias's Space, not "fragile"
759
+ - **Base64 data URLs** - Avoids file serving complexity, works in all environments
760
+ - **Lazy initialization** - Do NOT call `list_case_ids()` at module import time (causes network calls)
761
+ - **Test on HF Spaces early** - Verify WebGL works in their environment
762
+ - **Keep UI simple** - This is a demo, not a full application
763
+ - **Cache case list** - Avoid repeated HF Hub calls
764
+
765
+ ### avoiding import-time side effects
766
+
767
+ The reviewer correctly noted that `demo = create_app()` at module level triggers network calls. Fix:
768
+
769
+ ```python
770
+ # BAD - triggers network call on import
771
+ demo = create_app()
772
+
773
+ # GOOD - lazy initialization
774
+ _demo: gr.Blocks | None = None
775
+
776
+ def get_demo() -> gr.Blocks:
777
+ global _demo
778
+ if _demo is None:
779
+ _demo = create_app()
780
+ return _demo
781
+
782
+ # For Gradio CLI compatibility
783
+ demo = None # Set lazily
784
+
785
+ if __name__ == "__main__":
786
+ get_demo().launch()
787
+ ```
788
+
789
+ Or use a factory pattern in the root `app.py`:
790
+
791
+ ```python
792
+ # app.py (HF Spaces entry point)
793
+ from stroke_deepisles_demo.ui.app import create_app
794
+
795
+ demo = create_app() # Only called when this file is executed
796
+
797
+ if __name__ == "__main__":
798
+ demo.launch()
799
+ ```
800
+
801
+ ## dependencies to add
802
+
803
+ ```toml
804
+ # Add to pyproject.toml dependencies
805
+ "matplotlib>=3.8.0",
806
+ "fastapi>=0.115.0", # For API endpoints if using hybrid approach
807
+ "uvicorn[standard]>=0.32.0", # For local development
808
+ ```
809
+
810
+ ## reference implementation
811
+
812
+ Clone Tobias's working Space for reference:
813
+ ```
814
+ _reference_repos/bids-neuroimaging-space/
815
+ ```
816
+
817
+ Key file: `main.py` - Complete NiiVue + FastAPI implementation.
docs/specs/06-phase-5-polish.md ADDED
@@ -0,0 +1,667 @@
1
+ # phase 5: polish, observability, and docs
2
+
3
+ ## purpose
4
+
5
+ Add production-quality polish: structured logging, environment-driven configuration, comprehensive documentation, and CI readiness. At the end of this phase, the codebase is maintainable, debuggable, and ready for others to contribute.
6
+
7
+ ## deliverables
8
+
9
+ - [ ] Structured logging throughout all modules
10
+ - [ ] Environment-driven configuration via pydantic-settings
11
+ - [ ] Developer documentation (CONTRIBUTING.md, architecture)
12
+ - [ ] API documentation (docstrings, optional Sphinx/mkdocs)
13
+ - [ ] CI configuration (GitHub Actions)
14
+ - [ ] Final cleanup and code review checklist
15
+
16
+ ## logging strategy
17
+
18
+ ### centralized logging setup
19
+
20
+ ```python
21
+ # src/stroke_deepisles_demo/core/logging.py
22
+
23
+ """Centralized logging configuration."""
24
+
25
+ from __future__ import annotations
26
+
27
+ import logging
28
+ import sys
29
+ from typing import Literal
30
+
31
+ LogLevel = Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
32
+
33
+
34
+ def setup_logging(
35
+ level: LogLevel = "INFO",
36
+ *,
37
+ format_style: Literal["simple", "detailed", "json"] = "simple",
38
+ ) -> None:
39
+ """
40
+ Configure logging for the application.
41
+
42
+ Args:
43
+ level: Minimum log level
44
+ format_style: Output format style
45
+
46
+ Example:
47
+ >>> setup_logging("DEBUG", format_style="detailed")
48
+ """
49
+ formats = {
50
+ "simple": "%(levelname)s: %(message)s",
51
+ "detailed": "%(asctime)s | %(name)s | %(levelname)s | %(message)s",
52
+ "json": '{"time": "%(asctime)s", "name": "%(name)s", "level": "%(levelname)s", "message": "%(message)s"}',
53
+ }
54
+
55
+ logging.basicConfig(
56
+ level=getattr(logging, level),
57
+ format=formats[format_style],
58
+ stream=sys.stderr,
59
+ force=True,
60
+ )
61
+
62
+ # Reduce noise from libraries
63
+ logging.getLogger("urllib3").setLevel(logging.WARNING)
64
+ logging.getLogger("httpx").setLevel(logging.WARNING)
65
+ logging.getLogger("datasets").setLevel(logging.WARNING)
66
+
67
+
68
+ def get_logger(name: str) -> logging.Logger:
69
+ """
70
+ Get a logger for a module.
71
+
72
+ Args:
73
+ name: Logger name (typically __name__)
74
+
75
+ Returns:
76
+ Configured logger instance
77
+ """
78
+ return logging.getLogger(f"stroke_demo.{name}")
79
+ ```
80
+
+ ### logging usage pattern
+
+ ```python
+ # In each module
+ from stroke_deepisles_demo.core.logging import get_logger
+
+ logger = get_logger(__name__)
+
+
+ def run_deepisles_on_folder(input_dir: Path, *, fast: bool = True) -> DeepISLESResult:
+     logger.info("Starting DeepISLES inference", extra={"input_dir": str(input_dir), "fast": fast})
+
+     try:
+         result = _run_docker(...)
+         logger.info("Inference complete", extra={"elapsed": result.elapsed_seconds})
+         return result
+     except Exception as e:
+         logger.error("Inference failed", extra={"error": str(e)}, exc_info=True)
+         raise
+ ```
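One caveat on this pattern: stdlib formatters only render the fields named in the format string, so the `extra={...}` values above are attached to each `LogRecord` but never appear in the `simple` or `detailed` output. A minimal sketch of a formatter that surfaces them (the `ExtraFormatter` name is illustrative, not part of this spec):

```python
import logging

# Attributes present on every LogRecord; anything else arrived via `extra=`.
_STANDARD = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime"}


class ExtraFormatter(logging.Formatter):
    """Append `extra={...}` fields as key=value pairs after the message."""

    def format(self, record: logging.LogRecord) -> str:
        base = super().format(record)
        extras = {k: v for k, v in vars(record).items() if k not in _STANDARD}
        if extras:
            base += " " + " ".join(f"{k}={v}" for k, v in extras.items())
        return base


handler = logging.StreamHandler()
handler.setFormatter(ExtraFormatter("%(levelname)s: %(message)s"))
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Inference complete", extra={"elapsed": 42.0})
# emits something like: INFO: Inference complete elapsed=42.0
```

For the `json` format style, a structured formatter like this (or a library such as `python-json-logger`) is the usual way to get the extras into the payload.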
+
+ ## enhanced configuration
+
+ ### `src/stroke_deepisles_demo/core/config.py`
+
+ ```python
+ """Application configuration using pydantic-settings."""
+
+ from __future__ import annotations
+
+ from pathlib import Path
+ from typing import Literal
+
+ from pydantic import Field, field_validator
+ from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+ class Settings(BaseSettings):
+     """
+     Application settings loaded from environment variables.
+
+     All settings can be overridden via environment variables with
+     the STROKE_DEMO_ prefix.
+
+     Example:
+         export STROKE_DEMO_LOG_LEVEL=DEBUG
+         export STROKE_DEMO_HF_DATASET_ID=my/dataset
+     """
+
+     model_config = SettingsConfigDict(
+         env_prefix="STROKE_DEMO_",
+         env_file=".env",
+         env_file_encoding="utf-8",
+         extra="ignore",
+     )
+
+     # Logging
+     log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
+     log_format: Literal["simple", "detailed", "json"] = "simple"
+
+     # HuggingFace
+     hf_dataset_id: str = "YongchengYAO/ISLES24-MR-Lite"
+     hf_cache_dir: Path | None = None
+     hf_token: str | None = Field(default=None, repr=False)  # Hidden from logs
+
+     # DeepISLES
+     deepisles_docker_image: str = "isleschallenge/deepisles"
+     deepisles_fast_mode: bool = True
+     deepisles_timeout_seconds: int = 1800  # 30 minutes
+     deepisles_use_gpu: bool = True
+
+     # Paths
+     temp_dir: Path | None = None
+     results_dir: Path = Path("./results")
+
+     # UI
+     gradio_server_name: str = "0.0.0.0"
+     gradio_server_port: int = 7860
+     gradio_share: bool = False
+
+     @field_validator("results_dir", mode="before")
+     @classmethod
+     def ensure_results_dir_exists(cls, v: Path | str) -> Path:
+         """Create results directory if it doesn't exist."""
+         path = Path(v)
+         path.mkdir(parents=True, exist_ok=True)
+         return path
+
+
+ # Global settings instance
+ settings = Settings()
+
+
+ def get_settings() -> Settings:
+     """Get the current settings instance."""
+     return settings
+
+
+ def reload_settings() -> Settings:
+     """Reload settings from environment (useful for testing)."""
+     global settings
+     settings = Settings()
+     return settings
+ ```
+
+ ## documentation structure
+
+ ```
+ docs/
+ ├── specs/                 # Design specs (these documents)
+ │   ├── 00-context.md
+ │   ├── 01-phase-0-repo-bootstrap.md
+ │   ├── ...
+ │   └── 06-phase-5-polish.md
+ │
+ ├── guides/                # User guides
+ │   ├── quickstart.md      # Getting started
+ │   ├── configuration.md   # Environment variables
+ │   └── deployment.md      # HF Spaces deployment
+ │
+ └── reference/             # API reference (auto-generated)
+     └── api.md
+
+ # Root level
+ README.md          # Project overview
+ CONTRIBUTING.md    # Contribution guidelines
+ CHANGELOG.md       # Version history
+ ```
+
+ ### `CONTRIBUTING.md`
+
+ ````markdown
+ # Contributing to stroke-deepisles-demo
+
+ Thank you for your interest in contributing!
+
+ ## Development Setup
+
+ 1. **Clone the repository**
+    ```bash
+    git clone https://github.com/The-Obstacle-Is-The-Way/stroke-deepisles-demo.git
+    cd stroke-deepisles-demo
+    ```
+
+ 2. **Install uv** (if not already installed)
+    ```bash
+    curl -LsSf https://astral.sh/uv/install.sh | sh
+    ```
+
+ 3. **Install dependencies**
+    ```bash
+    uv sync
+    ```
+
+ 4. **Install pre-commit hooks**
+    ```bash
+    uv run pre-commit install
+    ```
+
+ ## Running Tests
+
+ ```bash
+ # All tests (excluding integration)
+ uv run pytest
+
+ # With coverage
+ uv run pytest --cov
+
+ # Integration tests (requires Docker)
+ uv run pytest -m integration
+
+ # Slow tests (requires Docker + DeepISLES image)
+ uv run pytest -m "integration and slow"
+ ```
+
+ ## Code Quality
+
+ ```bash
+ # Lint
+ uv run ruff check .
+
+ # Format
+ uv run ruff format .
+
+ # Type check
+ uv run mypy src/
+ ```
+
+ ## Project Structure
+
+ ```
+ src/stroke_deepisles_demo/
+ ├── core/        # Shared utilities (config, types, exceptions)
+ ├── data/        # HF dataset loading and case management
+ ├── inference/   # DeepISLES Docker integration
+ ├── ui/          # Gradio application
+ ├── pipeline.py  # End-to-end orchestration
+ └── metrics.py   # Evaluation metrics
+ ```
+
+ ## Pull Request Process
+
+ 1. Create a feature branch from `main`
+ 2. Write tests for new functionality
+ 3. Ensure all tests pass and code quality checks pass
+ 4. Update documentation if needed
+ 5. Submit PR with clear description
+
+ ## Code Style
+
+ - Type hints on all functions
+ - Docstrings in Google style
+ - Keep functions focused and small
+ - Prefer explicit over implicit
+ ````
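The `-m integration` and `-m "integration and slow"` selections above rely on those markers being registered, and on the default run deselecting integration tests. A sketch of the relevant `pyproject.toml` fragment, assuming pytest is configured there (the exact marker descriptions are illustrative):

```toml
[tool.pytest.ini_options]
addopts = "-m 'not integration'"  # plain `pytest` skips integration tests
markers = [
    "integration: requires Docker",
    "slow: requires the DeepISLES image; long-running",
]
```

Because `-m` is a single-value option, a `-m integration` given on the command line overrides the default from `addopts`.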
+
+ ### `docs/guides/quickstart.md`
+
+ ````markdown
+ # Quickstart
+
+ Get started with stroke-deepisles-demo in 5 minutes.
+
+ ## Prerequisites
+
+ - Python 3.11+
+ - Docker (for DeepISLES inference)
+ - ~10GB disk space (for Docker image and datasets)
+
+ ## Installation
+
+ ```bash
+ # Clone
+ git clone https://github.com/The-Obstacle-Is-The-Way/stroke-deepisles-demo.git
+ cd stroke-deepisles-demo
+
+ # Install
+ uv sync
+ ```
+
+ ## Pull DeepISLES Docker Image
+
+ ```bash
+ docker pull isleschallenge/deepisles
+ ```
+
+ ## Run Locally
+
+ ### Option 1: Gradio UI
+
+ ```bash
+ uv run python -m stroke_deepisles_demo.ui.app
+ # Open http://localhost:7860
+ ```
+
+ ### Option 2: CLI
+
+ ```bash
+ # List available cases
+ uv run stroke-demo list
+
+ # Run on a specific case
+ uv run stroke-demo run --case sub-001 --fast
+ ```
+
+ ### Option 3: Python API
+
+ ```python
+ from stroke_deepisles_demo.pipeline import run_pipeline_on_case
+
+ result = run_pipeline_on_case("sub-001", fast=True)
+ print(f"Dice score: {result.dice_score:.3f}")
+ print(f"Prediction: {result.prediction_mask}")
+ ```
+
+ ## Configuration
+
+ Set environment variables or create a `.env` file:
+
+ ```bash
+ # .env
+ STROKE_DEMO_LOG_LEVEL=DEBUG
+ STROKE_DEMO_DEEPISLES_USE_GPU=false  # If no GPU available
+ ```
+
+ See [Configuration Guide](configuration.md) for all options.
+ ````
+
+ ### `docs/guides/configuration.md`
+
+ ````markdown
+ # Configuration
+
+ All settings can be configured via environment variables.
+
+ ## Environment Variables
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `STROKE_DEMO_LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
+ | `STROKE_DEMO_LOG_FORMAT` | `simple` | Log format (simple, detailed, json) |
+ | `STROKE_DEMO_HF_DATASET_ID` | `YongchengYAO/ISLES24-MR-Lite` | HuggingFace dataset ID |
+ | `STROKE_DEMO_HF_CACHE_DIR` | `None` | Custom HF cache directory |
+ | `STROKE_DEMO_HF_TOKEN` | `None` | HuggingFace API token (for private datasets) |
+ | `STROKE_DEMO_DEEPISLES_DOCKER_IMAGE` | `isleschallenge/deepisles` | DeepISLES Docker image |
+ | `STROKE_DEMO_DEEPISLES_FAST_MODE` | `true` | Use single-model mode |
+ | `STROKE_DEMO_DEEPISLES_TIMEOUT_SECONDS` | `1800` | Inference timeout |
+ | `STROKE_DEMO_DEEPISLES_USE_GPU` | `true` | Use GPU acceleration |
+ | `STROKE_DEMO_TEMP_DIR` | `None` | Custom temporary directory |
+ | `STROKE_DEMO_RESULTS_DIR` | `./results` | Directory for output files |
+ | `STROKE_DEMO_GRADIO_SERVER_NAME` | `0.0.0.0` | Gradio bind address |
+ | `STROKE_DEMO_GRADIO_SERVER_PORT` | `7860` | Gradio server port |
+ | `STROKE_DEMO_GRADIO_SHARE` | `false` | Create a public Gradio share link |
+
+ ## Using a .env File
+
+ Create a `.env` file in the project root:
+
+ ```bash
+ STROKE_DEMO_LOG_LEVEL=DEBUG
+ STROKE_DEMO_DEEPISLES_USE_GPU=false
+ STROKE_DEMO_RESULTS_DIR=/data/results
+ ```
+
+ ## Programmatic Configuration
+
+ ```python
+ import os
+
+ from stroke_deepisles_demo.core.config import get_settings, reload_settings
+
+ # Check current settings
+ print(get_settings().log_level)
+
+ # Override via environment, then re-read through get_settings() so the
+ # rebound module global is picked up (a `from ... import settings` binding
+ # would keep pointing at the old instance after reload_settings())
+ os.environ["STROKE_DEMO_LOG_LEVEL"] = "DEBUG"
+ reload_settings()
+ print(get_settings().log_level)  # DEBUG
+ ```
+ ````
+
+ ## ci configuration
+
+ ### `.github/workflows/ci.yml`
+
+ ```yaml
+ name: CI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+     branches: [main]
+
+ jobs:
+   lint:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v4
+
+       - name: Set up Python
+         run: uv python install 3.12
+
+       - name: Install dependencies
+         run: uv sync
+
+       - name: Lint with ruff
+         run: uv run ruff check .
+
+       - name: Check formatting
+         run: uv run ruff format --check .
+
+   typecheck:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v4
+
+       - name: Set up Python
+         run: uv python install 3.12
+
+       - name: Install dependencies
+         run: uv sync
+
+       - name: Type check with mypy
+         run: uv run mypy src/
+
+   test:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v4
+
+       - name: Set up Python
+         run: uv python install 3.12
+
+       - name: Install dependencies
+         run: uv sync
+
+       - name: Run tests
+         run: uv run pytest --cov --cov-report=xml
+
+       - name: Upload coverage
+         uses: codecov/codecov-action@v4
+         with:
+           files: ./coverage.xml
+
+   integration:
+     runs-on: ubuntu-latest
+     if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v4
+
+       - name: Set up Python
+         run: uv python install 3.12
+
+       - name: Install dependencies
+         run: uv sync
+
+       - name: Run integration tests
+         run: uv run pytest -m integration --timeout=600
+ ```
+
+ ## final code review checklist
+
+ ### code quality
+ - [ ] All functions have type hints
+ - [ ] All public functions have docstrings
+ - [ ] No unused imports or variables
+ - [ ] No hardcoded paths or secrets
+ - [ ] Error messages are helpful
+
+ ### testing
+ - [ ] Unit test coverage > 80%
+ - [ ] Edge cases covered
+ - [ ] Integration tests for critical paths
+ - [ ] Tests are deterministic (no flakiness)
+
+ ### documentation
+ - [ ] README is clear and accurate
+ - [ ] CONTRIBUTING.md is complete
+ - [ ] All configuration options documented
+ - [ ] Example usage in docstrings
+
+ ### security
+ - [ ] No secrets in code
+ - [ ] HF_TOKEN is optional and hidden from logs
+ - [ ] Docker commands are properly escaped
+ - [ ] No arbitrary code execution vulnerabilities
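The "properly escaped" item is easiest to satisfy by never involving a shell at all: build the `docker run` invocation as an argument list and pass it to `subprocess.run` without `shell=True`, so paths containing spaces or quotes cannot be reinterpreted as extra flags. A hedged sketch (the helper name and mount layout are illustrative, not the project's actual implementation):

```python
import shlex
from pathlib import Path


def build_docker_cmd(input_dir: Path, output_dir: Path, *, image: str, gpu: bool) -> list[str]:
    """Assemble a `docker run` invocation as an argv list (no shell interpolation)."""
    cmd = ["docker", "run", "--rm"]
    if gpu:
        cmd += ["--gpus", "all"]
    cmd += [
        "-v", f"{input_dir}:/input:ro",  # each path is a single argv entry,
        "-v", f"{output_dir}:/output",   # so spaces/quotes cannot inject flags
        image,
    ]
    return cmd


cmd = build_docker_cmd(Path("/tmp/in dir"), Path("/tmp/out"), image="isleschallenge/deepisles", gpu=False)
print(shlex.join(cmd))  # quoted rendering, safe to log
# subprocess.run(cmd, check=True, timeout=1800)  # list form means no shell is involved
```

`shlex.join` is only used to produce a safely quoted string for logs; the list itself is what gets executed.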
+
+ ### production readiness
+ - [ ] Logging is consistent and useful
+ - [ ] Errors are handled gracefully
+ - [ ] Configuration is environment-driven
+ - [ ] CI passes on all checks
+
+ ## tdd plan
+
+ ### tests for logging
+
+ ```python
+ """Tests for logging configuration."""
+
+ from __future__ import annotations
+
+ import logging
+
+ from stroke_deepisles_demo.core.logging import get_logger, setup_logging
+
+
+ class TestSetupLogging:
+     """Tests for setup_logging."""
+
+     def test_sets_log_level(self) -> None:
+         """Sets the root logger level."""
+         setup_logging("DEBUG")
+         assert logging.getLogger().level == logging.DEBUG
+
+     def test_format_styles(self) -> None:
+         """Different format styles work."""
+         for style in ["simple", "detailed", "json"]:
+             setup_logging("INFO", format_style=style)
+             # Should not raise
+
+
+ class TestGetLogger:
+     """Tests for get_logger."""
+
+     def test_returns_namespaced_logger(self) -> None:
+         """Returns logger with stroke_demo prefix."""
+         logger = get_logger("my_module")
+         assert logger.name == "stroke_demo.my_module"
+ ```
+
+ ### tests for configuration
+
+ ```python
+ """Tests for configuration."""
+
+ from __future__ import annotations
+
+ from pathlib import Path
+
+ import pytest
+
+ from stroke_deepisles_demo.core.config import Settings
+
+
+ class TestSettings:
+     """Tests for Settings."""
+
+     def test_default_values(self) -> None:
+         """Has sensible defaults."""
+         settings = Settings()
+         assert settings.log_level == "INFO"
+         assert settings.hf_dataset_id == "YongchengYAO/ISLES24-MR-Lite"
+
+     def test_env_override(self, monkeypatch: pytest.MonkeyPatch) -> None:
+         """Environment variables override defaults."""
+         monkeypatch.setenv("STROKE_DEMO_LOG_LEVEL", "DEBUG")
+         settings = Settings()
+         assert settings.log_level == "DEBUG"
+
+     def test_hf_token_hidden_from_repr(self) -> None:
+         """HF token is not visible in repr."""
+         settings = Settings(hf_token="secret123")
+         assert "secret123" not in repr(settings)
+
+     def test_results_dir_created(self, tmp_path: Path) -> None:
+         """Results directory is created if it doesn't exist."""
+         new_dir = tmp_path / "new_results"
+         Settings(results_dir=new_dir)
+         assert new_dir.exists()
+ ```
+
+ ## "done" criteria
+
+ Phase 5 is complete when:
+
+ 1. Structured logging is in place throughout
+ 2. All settings are configurable via environment
+ 3. README.md and CONTRIBUTING.md are complete
+ 4. Developer guides are written
+ 5. CI workflow passes on GitHub Actions
+ 6. Code coverage > 80% overall
+ 7. All code review checklist items pass
+ 8. Repository is ready for others to contribute
+
+ ## final deliverables
+
+ At the end of all phases, the repository contains:
+
+ ```
+ stroke-deepisles-demo/
+ ├── .github/
+ │   └── workflows/
+ │       └── ci.yml
+ ├── docs/
+ │   ├── specs/
+ │   ├── guides/
+ │   └── reference/
+ ├── src/
+ │   └── stroke_deepisles_demo/
+ │       ├── core/
+ │       ├── data/
+ │       ├── inference/
+ │       ├── ui/
+ │       ├── pipeline.py
+ │       ├── metrics.py
+ │       └── cli.py
+ ├── tests/
+ ├── pyproject.toml
+ ├── uv.lock
+ ├── README.md
+ ├── CONTRIBUTING.md
+ ├── CHANGELOG.md
+ ├── .pre-commit-config.yaml
+ ├── .gitignore
+ ├── .env.example
+ └── app.py              # HF Spaces entry point
+ ```