VibecoderMcSwaggins committed on
Commit fa1717e · unverified · 1 parent: 785d976

fix(arch): comprehensive architecture audit fixes (#44)


* docs: add NEXT-CONCERNS.md detailing critical architecture debt

Introduces a new document outlining validated concerns regarding config drift and dependency pinning, emphasizing the need for a single source of truth in configuration and ensuring reproducible builds. Highlights resolved issues and confirms the current state of frontend configuration. This documentation aims to guide future development and maintain architectural integrity.

* fix(arch): comprehensive architecture audit fixes

P0 (Critical):
- Dockerfile: Add --extra api to uv sync (fixes missing uvicorn at runtime)

P1 (High):
- Makefile: Add --extra api --extra gradio to install target
- Dockerfile: Update stale StaticFiles comment to reflect explicit routes
- app.py: Update stale deployment comment to reference api.main

P2 (Medium) - Wire in dead config:
- loader.py: Wire Settings.hf_dataset_id and Settings.hf_token through
- deepisles.py: Use Settings.deepisles_docker_image (removed hardcoded constant)
- pipeline.py: Wire timeout and gpu settings through (defaults from config)
- routes.py: Use atomic create_job_if_under_limit to prevent TOCTOU race
- job_store.py: Add create_job_if_under_limit atomic method
- files.py: Add NIfTI extension allowlist (defense-in-depth)

P3 (Low) - Documentation:
- pyproject.toml: Update description to mention React SPA + FastAPI
- README.md: Update to describe React SPA + FastAPI architecture
- requirements.txt: Clarify as pip-only fallback

P4 (Nitpicks) - Code cleanliness:
- deepisles.py: Use EXPECTED_INPUT_FILES/OPTIONAL_INPUT_FILES in validation
- frontend: Extract shared retry constants to utils/retry.ts
- inference/__init__.py: Remove DEEPISLES_IMAGE export (now configurable)

All 157 tests pass. Linter and type checker clean.

Audit documented in: docs/bugs/ARCHITECTURE-AUDIT-2024-12-13.md

* refactor(deepisles): use named constants instead of positional unpacking

Address CodeRabbit nitpick: positional unpacking from EXPECTED_INPUT_FILES
creates coupling to list order. Use explicit DWI_FILENAME, ADC_FILENAME,
FLAIR_FILENAME constants for clarity and robustness.

Lists preserved for backwards-compatible exports.

Dockerfile CHANGED
```diff
@@ -40,7 +40,8 @@ ENV PATH="$VIRTUAL_ENV/bin:$PATH"
 
 # Install Python dependencies from lock file (frozen = fail if lock stale)
 # This ensures CI, local dev, and production use IDENTICAL versions
-RUN uv sync --frozen --no-dev --no-install-project
+# CRITICAL: --extra api installs FastAPI/uvicorn required by CMD
+RUN uv sync --frozen --no-dev --no-install-project --extra api
 
 # Copy application source code and package files
 COPY --chown=1000:1000 README.md /home/user/demo/README.md
@@ -66,7 +67,7 @@ ENV DEEPISLES_PATH=/app
 ENV HF_HOME=/home/user/demo/cache
 
 # Create directories for data with proper permissions
-# CRITICAL: /tmp/stroke-results is required for FastAPI StaticFiles mount
+# /tmp/stroke-results stores job result files, served via explicit /files/{job_id}/ routes
 RUN mkdir -p /home/user/demo/data /home/user/demo/results /home/user/demo/cache /tmp/stroke-results && \
     chown -R 1000:1000 /home/user/demo /tmp/stroke-results
```
Makefile CHANGED
```diff
@@ -1,7 +1,7 @@
 .PHONY: install test test-integration test-all lint format check all clean
 
 install:
-	uv sync
+	uv sync --extra api --extra gradio
 
 test:
 	uv run pytest
```
README.md CHANGED
````diff
@@ -34,7 +34,9 @@ A demonstration pipeline and UI for ischemic stroke lesion segmentation using **
 This project provides a complete end-to-end workflow:
 1. **Data Loading**: Lazy-loading of NIfTI neuroimaging data from HuggingFace.
 2. **Inference**: Running DeepISLES segmentation (SEALS or Ensemble) via Docker.
-3. **Visualization**: Interactive 3D and multi-planar viewing with NiiVue in Gradio.
+3. **Visualization**: Interactive 3D viewing with NiiVue in React SPA + FastAPI backend.
+
+> **Note**: A legacy Gradio UI is available for local development (`app.py`).
 
 > **Disclaimer**: This software is for research and demonstration purposes only. It is not intended for clinical use.
 
@@ -43,7 +45,7 @@ This project provides a complete end-to-end workflow:
 - 🧠 **State-of-the-Art Segmentation**: Uses DeepISLES (ISLES'22 winner) for accurate lesion segmentation.
 - ☁️ **Cloud-Native Data**: Streams data directly from HuggingFace Datasets (no massive downloads).
 - 🐳 **Dockerized Inference**: Encapsulates complex deep learning dependencies in a reproducible container.
-- 🖥️ **Interactive UI**: Gradio-based web interface with 3D rendering (NiiVue).
+- 🖥️ **Modern UI**: React SPA + FastAPI backend with NiiVue for 3D neuroimaging visualization.
 - ⚙️ **Production Ready**: Type-safe, tested, and configurable via environment variables.
 
 ## Requirements
@@ -114,7 +116,8 @@ graph TD
     Staging -->|Mount Volume| Docker[DeepISLES Container]
     Docker -->|Inference| Results[Prediction Mask]
     Results -->|Load| Metrics["Metrics (Dice)"]
-    Results -->|Render| UI["Gradio UI / NiiVue"]
+    Results -->|Serve via API| FastAPI[FastAPI Backend]
+    FastAPI -->|JSON + Files| React[React SPA + NiiVue]
 ```
 
 ## License
````
app.py CHANGED
```diff
@@ -1,9 +1,11 @@
-"""Alternative entry point for local development.
+"""Alternative entry point for local Gradio development.
 
-NOTE: HuggingFace Spaces Docker deployment uses `python -m stroke_deepisles_demo.ui.app`
-(see Dockerfile CMD). This file is for local development convenience only.
+NOTE: HuggingFace Spaces Docker deployment uses FastAPI via uvicorn:
+    uvicorn stroke_deepisles_demo.api.main:app --host 0.0.0.0 --port 7860
+(see Dockerfile CMD). This file runs the legacy Gradio UI for local development.
 
-For HF Spaces deployment, see: src/stroke_deepisles_demo/ui/app.py
+For HF Spaces deployment, see: src/stroke_deepisles_demo/api/main.py
+For legacy Gradio UI, see: src/stroke_deepisles_demo/ui/app.py
 """
 
 import gradio as gr
```
docs/bugs/ARCHITECTURE-AUDIT-2024-12-13.md ADDED
@@ -0,0 +1,412 @@
# Architecture Audit - 2024-12-13

**Auditor**: Claude Code (validating external analysis)
**Date**: 2024-12-13
**Status**: VALIDATED - Fixes in branch `fix/architecture-audit`

## Summary

External audit identified multiple issues. This document validates each claim from first principles
and documents the fix strategy. Per user directive: **wire in settings properly rather than removing them**.

---

## P0 - Critical (Release Blockers)

### P0-001: Docker build missing API extras ⚠️ CONFIRMED

**Location**: `Dockerfile:43` + `Dockerfile:94`

**Claim**: Container runs uvicorn but `uv sync --no-dev --no-install-project` doesn't include `--extra api`.

**Validation**:
```dockerfile
# Line 43: Dependencies installed without API extra
RUN uv sync --frozen --no-dev --no-install-project

# Line 94: But CMD requires uvicorn (which is in api extra!)
CMD ["uvicorn", "stroke_deepisles_demo.api.main:app", ...]
```

In `pyproject.toml`:
```toml
[project.optional-dependencies]
api = [
    "fastapi>=0.115.0",
    "uvicorn[standard]>=0.32.0",
]
```

**Impact**: Container will crash at runtime with `ModuleNotFoundError: No module named 'uvicorn'`.

**Fix**:
```dockerfile
RUN uv sync --frozen --no-dev --no-install-project --extra api
```

---

## P1 - High Priority

### P1-001: Makefile install doesn't include extras ⚠️ CONFIRMED (Minor)

**Location**: `Makefile:4`

**Claim**: `make install` runs `uv sync` without extras.

**Validation**: `uv sync` in dev mode does include dev dependencies but NOT optional extras.
Tests requiring FastAPI/Gradio may fail.

**Impact**: Low for dev (most devs run tests via `uv run pytest`), but inconsistent.

**Fix**: Update Makefile to install extras needed for testing:
```makefile
install:
	uv sync --extra api --extra gradio
```

### P1-002: Stale Dockerfile comment about StaticFiles ⚠️ CONFIRMED

**Location**: `Dockerfile:69`

**Claim**: Comment says "StaticFiles mount" but we use explicit routes.

**Validation**:
```dockerfile
# Line 69: STALE COMMENT
# CRITICAL: /tmp/stroke-results is required for FastAPI StaticFiles mount

# But files.py:1-16 explicitly says we REPLACED StaticFiles:
# "BUG-004 FIX: This module replaces the StaticFiles mount approach."
```

**Impact**: Misleads operators debugging file-serving issues.

**Fix**: Update comment to reflect the explicit route implementation.

---

## P2 - Medium Priority (Dead Config → Wire In)

### P2-001: hf_dataset_id setting not used ⚠️ CONFIRMED

**Location**: `config.py:79` → `loader.py:213`

**Claim**: `Settings.hf_dataset_id` exists but `load_isles_dataset()` uses hardcoded `DEFAULT_HF_DATASET`.

**Validation**:
```python
# config.py:79
hf_dataset_id: str = "hugging-science/isles24-stroke"

# loader.py:158 (hardcoded duplicate!)
DEFAULT_HF_DATASET = "hugging-science/isles24-stroke"

# loader.py:213 (ignores settings)
dataset_id = str(source) if source else DEFAULT_HF_DATASET
```

**Fix**: Wire `get_settings().hf_dataset_id` through the data loading path.

### P2-002: hf_token setting not used ⚠️ CONFIRMED

**Location**: `config.py:81` → `loader.py:218`

**Claim**: `Settings.hf_token` exists but isn't passed to `datasets.load_dataset()`.

**Validation**:
```python
# config.py:81
hf_token: str | None = Field(default=None, repr=False)

# loader.py:218 (no token!)
ds = load_dataset(dataset_id, split="train")
```

**Fix**: Pass `token=get_settings().hf_token` to `load_dataset()`.
### P2-003: deepisles_docker_image setting ignored ⚠️ CONFIRMED

**Location**: `config.py:84` → `deepisles.py:34`

**Claim**: The setting exists but the hardcoded constant `DEEPISLES_IMAGE` is used instead.

**Validation**:
```python
# config.py:84
deepisles_docker_image: str = "isleschallenge/deepisles"

# deepisles.py:34 (hardcoded!)
DEEPISLES_IMAGE = "isleschallenge/deepisles"

# deepisles.py:169 (uses constant, not settings)
run_container(DEEPISLES_IMAGE, ...)
```

**Fix**: Use `get_settings().deepisles_docker_image` in `_run_via_docker()`.

### P2-004: deepisles_timeout_seconds setting not wired through ⚠️ CONFIRMED

**Location**: `config.py:86` → `pipeline.py` → `deepisles.py:242`

**Claim**: The timeout setting exists but the pipeline doesn't pass it.

**Validation**:
```python
# config.py:86
deepisles_timeout_seconds: int = 1800

# pipeline.py:148-153 (no timeout parameter!)
inference_result = run_deepisles_on_folder(
    staged.input_dir,
    output_dir=results_dir,
    fast=fast,
    gpu=gpu,
    # timeout missing!
)
```

**Fix**: Pass `timeout=get_settings().deepisles_timeout_seconds` through the pipeline.

### P2-005: deepisles_use_gpu setting not used by API ⚠️ CONFIRMED

**Location**: `config.py:87` → `routes.py:232`

**Claim**: The GPU setting exists but the API path doesn't pass it.

**Validation**:
```python
# config.py:87
deepisles_use_gpu: bool = True

# routes.py:232-238 (no gpu parameter!)
result = run_pipeline_on_case(
    case_id,
    output_dir=output_dir,
    fast=fast_mode,
    compute_dice=True,
    cleanup_staging=True,
    # gpu missing!
)
```

**Fix**: Pass `gpu=get_settings().deepisles_use_gpu` through the API route.
### P2-006: Dataset reloaded on every /api/cases call ⚠️ CONFIRMED

**Location**: `routes.py:49` → `data/__init__.py:34`

**Claim**: `list_case_ids()` reloads the dataset each time.

**Validation**:
```python
# data/__init__.py:34-41
def list_case_ids() -> list[str]:
    with load_isles_dataset() as dataset:  # Fresh load every call!
        return dataset.list_case_ids()
```

**Impact**: Unnecessary latency on cold paths, amplifies HF wake-up time.

**Fix**: Add a TTL cache for the case ID list.
### P2-007: Double dataset load on segment request ⚠️ CONFIRMED

**Location**: `routes.py:101` → `pipeline.py:90`

**Claim**: Validation loads the dataset, then the pipeline loads it again.

**Validation**:
```python
# routes.py:101
valid_cases = list_case_ids()  # First load

# Then in run_pipeline_on_case (routes.py:232)
# pipeline.py:90
with load_isles_dataset() as dataset:  # Second load!
```

**Fix**: Remove the validation pre-check and let the pipeline raise a controlled error.

### P2-008: File download has no extension allowlist ⚠️ CONFIRMED (Low Risk)

**Location**: `files.py:29`

**Claim**: Any file under a job directory is servable.

**Validation**: Path traversal is blocked, but there is no extension filter.
Currently only NIfTI files end up in results dirs, but defense-in-depth is better.

**Fix**: Add an extension allowlist: `.nii`, `.nii.gz`.

### P2-009: Concurrency limit check-then-create not atomic ⚠️ CONFIRMED (Mitigated)

**Location**: `routes.py:92-113`

**Claim**: TOCTOU race in concurrency limiting.

**Validation**:
```python
# routes.py:92-98
if store.get_active_job_count() >= max:  # Check
    raise 503
# ... other code ...
store.create_job(job_id, ...)  # Create (gap!)
```

**Mitigation**: Single-worker uvicorn (no multi-worker race). But the code smell remains.

**Fix**: Add an atomic `create_job_if_under_limit()` method to JobStore.

### P2-010: Gradio cleanup scope mismatch ⚠️ CONFIRMED

**Location**: `ui/app.py:67` + `pipeline.py:107`

**Claim**: Gradio cleanup only checks `results_dir`, but the pipeline creates its temp dir in the system temp location.

**Validation**:
```python
# ui/app.py:67
allowed_root = get_settings().results_dir.resolve()

# pipeline.py:107 (when output_dir is None)
base_temp = Path(tempfile.mkdtemp(prefix="deepisles_pipeline_"))
# Creates in /tmp, NOT in results_dir!
```

**Impact**: Disk leak - the Gradio UI's temp files are never cleaned up.

**Fix**: Pass `output_dir=get_settings().results_dir / unique_id` from the Gradio UI to the pipeline.
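The shape of the P2-010 fix could be a small helper that always roots job output under the cleanable results directory. The helper name is hypothetical; the real change would pass this path from the Gradio UI into the pipeline so cleanup can find it.

```python
import uuid
from pathlib import Path


def make_job_output_dir(results_dir: Path) -> Path:
    """Create a unique per-job directory under results_dir so cleanup can find it."""
    output_dir = results_dir / uuid.uuid4().hex
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```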
---

## P3 - Low Priority (Documentation/Metadata)

### P3-001: Root app.py has stale deployment comment ⚠️ CONFIRMED

**Location**: `app.py:4`

**Claim**: Says HF Spaces uses `ui.app`, but the Dockerfile runs `api.main`.

**Current**:
```python
# NOTE: HuggingFace Spaces Docker deployment uses `python -m stroke_deepisles_demo.ui.app`
```

**Fix**: Update to reference `api.main:app` via uvicorn.

### P3-002: pyproject.toml description mentions Gradio ⚠️ CONFIRMED

**Location**: `pyproject.toml:4`

**Current**:
```toml
description = "Demo: HF datasets + DeepISLES stroke segmentation + Gradio visualization"
```

**Fix**: Update to mention React SPA + FastAPI as primary, Gradio as legacy.

### P3-003: README describes Gradio as visualization layer ⚠️ CONFIRMED

**Location**: `README.md:37`

**Current**:
```markdown
3. **Visualization**: Interactive 3D and multi-planar viewing with NiiVue in Gradio.
```

**Fix**: Update to describe the React SPA + FastAPI architecture; note Gradio as a legacy option.

### P3-004: requirements.txt exists alongside uv.lock ⚠️ CONFIRMED

**Location**: `requirements.txt` + `Dockerfile:31`

**Validation**: requirements.txt exists (547 bytes) but the Dockerfile only uses uv.lock.

**Fix**: Either remove requirements.txt or add a comment clarifying it is for pip-only environments.

---

## P4 - Nitpicks (Code Cleanliness)

### P4-001: pipeline.py dataset_id parameter ignored ⚠️ CONFIRMED

**Location**: `pipeline.py:60`

```python
dataset_id: str | None = None,  # Accepted
# ...
_ = dataset_id  # But explicitly ignored (line 84)
```

**Fix**: Wire `dataset_id` through to `load_isles_dataset()`.

### P4-002: pipeline.py max_workers parameter ignored ⚠️ CONFIRMED

**Location**: `pipeline.py:186`

```python
max_workers: int = 1,  # Accepted
# ...
_ = max_workers  # Explicitly ignored (line 206)
```

**Note**: The docstring correctly says "Currently ignored - reserved for future parallel support."
This is acceptable tech debt - the parameter exists for API stability.

**Fix**: Leave as-is (documented intentional limitation).

### P4-003: deepisles.py has unused constants ⚠️ CONFIRMED

**Location**: `deepisles.py:35-36`

```python
EXPECTED_INPUT_FILES = ["dwi.nii.gz", "adc.nii.gz"]
OPTIONAL_INPUT_FILES = ["flair.nii.gz"]
```

These are defined but never used.

**Fix**: Use them in `validate_input_folder()` error messages, or remove them.

### P4-004: Frontend duplicated retry constants ⚠️ CONFIRMED

**Location**: `useSegmentation.ts:9-11` + `CaseSelector.tsx:5-7`

Both files define:
```typescript
const MAX_COLD_START_RETRIES = 5;
const INITIAL_RETRY_DELAY = 2000;
const MAX_RETRY_DELAY = 30000;
```

**Fix**: Extract to a shared `frontend/src/utils/retry.ts`.

---

## Architecture Violations Check ✅ PASSED

The external audit confirmed NO architecture violations:
- No API importing/calling Gradio/UI code
- Clear React SPA / FastAPI backend separation
- Strong path traversal defenses in file serving
- Safe job-id handling and cleanup

---

## Fix Priority Order

1. **P0-001**: Docker build crash (release blocker)
2. **P1-001, P1-002**: Makefile + stale comment
3. **P2-001 through P2-005**: Wire in dead config settings
4. **P2-006, P2-007**: Dataset caching
5. **P2-008 through P2-010**: Security hardening
6. **P3-001 through P3-004**: Documentation updates
7. **P4-001 through P4-004**: Code cleanliness

---

## Implementation Notes

Per user directive: **Wire settings in properly rather than removing dead config.**
These settings were created for a reason - they should work as documented.
NEXT-CONCERNS.md → docs/bugs/NEXT-CONCERNS.md RENAMED
File without changes
frontend/src/components/CaseSelector.tsx CHANGED
```diff
@@ -1,10 +1,6 @@
 import { useEffect, useState } from "react";
 import { apiClient, ApiError } from "../api/client";
-
-// Cold start retry configuration (matches useSegmentation.ts)
-const MAX_COLD_START_RETRIES = 5;
-const INITIAL_RETRY_DELAY = 2000;
-const MAX_RETRY_DELAY = 30000;
+import { MAX_COLD_START_RETRIES, getRetryDelay } from "../utils/retry";
 
 interface CaseSelectorProps {
   selectedCase: string | null;
@@ -54,12 +50,10 @@ export function CaseSelector({
       setRetryCount(attempts);
       setIsWakingUp(true);
 
-      // Exponential backoff
-      const delay = Math.min(
-        INITIAL_RETRY_DELAY * Math.pow(2, attempts - 1),
-        MAX_RETRY_DELAY,
+      // Exponential backoff with capped maximum
+      await new Promise((resolve) =>
+        setTimeout(resolve, getRetryDelay(attempts)),
       );
-      await new Promise((resolve) => setTimeout(resolve, delay));
       continue;
     }
 
```
frontend/src/hooks/useSegmentation.ts CHANGED
```diff
@@ -1,21 +1,15 @@
 import { useState, useCallback, useRef, useEffect } from "react";
 import { apiClient, ApiError } from "../api/client";
 import type { SegmentationResult, JobStatus } from "../types";
+import {
+  MAX_COLD_START_RETRIES,
+  getRetryDelay,
+  sleep,
+} from "../utils/retry";
 
 // Polling interval in milliseconds
 const POLLING_INTERVAL = 2000;
 
-// Cold start retry configuration
-const MAX_COLD_START_RETRIES = 5;
-const INITIAL_RETRY_DELAY = 2000; // 2 seconds
-const MAX_RETRY_DELAY = 30000; // 30 seconds
-
-/**
- * Sleep utility for async delays
- */
-const sleep = (ms: number): Promise<void> =>
-  new Promise((resolve) => setTimeout(resolve, ms));
-
 /**
  * Hook for running segmentation with async job polling.
  *
@@ -204,12 +198,8 @@ export function useSegmentation() {
       );
       setProgress(0);
 
-      // Exponential backoff: 2s, 4s, 8s, 16s, 30s (capped)
-      const delay = Math.min(
-        INITIAL_RETRY_DELAY * Math.pow(2, retryCount - 1),
-        MAX_RETRY_DELAY,
-      );
-      await sleep(delay);
+      // Exponential backoff with capped maximum
+      await sleep(getRetryDelay(retryCount));
 
       // Continue to next iteration of retry loop
       continue;
```
frontend/src/utils/retry.ts ADDED
@@ -0,0 +1,30 @@
```typescript
/**
 * Shared retry configuration for cold-start handling.
 *
 * HuggingFace Spaces containers can take 30-60 seconds to wake from sleep.
 * This module provides shared constants and utilities for exponential backoff retry.
 */

// Cold start retry configuration
export const MAX_COLD_START_RETRIES = 5;
export const INITIAL_RETRY_DELAY = 2000; // 2 seconds
export const MAX_RETRY_DELAY = 30000; // 30 seconds

/**
 * Calculate exponential backoff delay with capped maximum.
 *
 * @param attempt - Current retry attempt (1-indexed)
 * @returns Delay in milliseconds
 */
export function getRetryDelay(attempt: number): number {
  return Math.min(INITIAL_RETRY_DELAY * Math.pow(2, attempt - 1), MAX_RETRY_DELAY);
}

/**
 * Sleep utility for async delays.
 *
 * @param ms - Milliseconds to sleep
 */
export function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```
pyproject.toml CHANGED
```diff
@@ -1,7 +1,7 @@
 [project]
 name = "stroke-deepisles-demo"
 version = "0.1.0"
-description = "Demo: HF datasets + DeepISLES stroke segmentation + Gradio visualization"
+description = "Demo: HF datasets + DeepISLES stroke segmentation + React SPA + FastAPI backend"
 readme = "README.md"
 license = { text = "Apache-2.0" }
 requires-python = ">=3.11"
```
requirements.txt CHANGED
```diff
@@ -1,6 +1,7 @@
-# requirements.txt for Hugging Face Spaces Docker deployment
+# requirements.txt - Fallback for pip-only environments
+# NOTE: Primary dependency management uses uv.lock (see Dockerfile)
+# This file is for environments without uv (e.g., some CI systems)
 # Generated: December 2025
-# See: docs/specs/07-hf-spaces-deployment.md
 
 # Core - BIDS + NIfTI lazy loading (maintained fork)
 neuroimaging-go-brrrr @ git+https://github.com/The-Obstacle-Is-The-Way/neuroimaging-go-brrrr.git@v0.2.1
```
src/stroke_deepisles_demo/api/files.py CHANGED
```diff
@@ -23,6 +23,10 @@ from stroke_deepisles_demo.core.logging import get_logger
 
 logger = get_logger(__name__)
 
+# Allowed file extensions (defense-in-depth)
+# Only serve NIfTI files to prevent accidental exposure of logs/metadata
+_ALLOWED_EXTENSIONS = {".nii", ".nii.gz"}
+
 files_router = APIRouter(prefix="/files", tags=["files"])
 
 
@@ -44,6 +48,15 @@ async def get_result_file(job_id: str, case_id: str, filename: str) -> FileResponse:
     Raises:
         404: File not found (job expired, invalid path, or doesn't exist)
     """
+    # Security: Validate file extension (defense-in-depth)
+    # Only serve NIfTI files to prevent accidental exposure of logs/metadata
+    if not any(filename.endswith(ext) for ext in _ALLOWED_EXTENSIONS):
+        logger.warning("Blocked request for non-NIfTI file: %s", filename)
+        raise HTTPException(
+            status_code=404,
+            detail="Only NIfTI files (.nii, .nii.gz) can be served.",
+        )
+
     # Construct file path
     results_dir = get_settings().results_dir
     file_path = results_dir / job_id / case_id / filename
```
src/stroke_deepisles_demo/api/job_store.py CHANGED
```diff
@@ -195,6 +195,57 @@ class JobStore:
         logger.info("Created job %s", job_id)
         return job
 
+    def create_job_if_under_limit(
+        self,
+        job_id: str,
+        case_id: str,
+        fast_mode: bool,
+        max_active: int,
+    ) -> Job | None:
+        """Atomically create a job if under concurrency limit.
+
+        This prevents TOCTOU race conditions where check-then-create
+        could exceed the limit under concurrent requests.
+
+        Args:
+            job_id: Unique identifier for the job
+            case_id: Case to process
+            fast_mode: Whether to use fast inference
+            max_active: Maximum allowed active (pending/running) jobs
+
+        Returns:
+            The created Job if under limit, None if limit reached
+
+        Raises:
+            ValueError: If job_id is invalid (contains unsafe characters)
+            KeyError: If job_id already exists
+        """
+        if not self._is_safe_job_id(job_id):
+            raise ValueError(f"Invalid job_id: {job_id!r}")
+
+        job = Job(
+            id=job_id,
+            status=JobStatus.PENDING,
+            case_id=case_id,
+            fast_mode=fast_mode,
+            created_at=datetime.now(),
+        )
+
+        with self._lock:
+            # Check limit atomically with creation
+            active_count = sum(
+                1 for j in self._jobs.values() if j.status in (JobStatus.PENDING, JobStatus.RUNNING)
+            )
+            if active_count >= max_active:
+                return None
+
+            if job_id in self._jobs:
+                raise KeyError(f"Job already exists: {job_id}")
+            self._jobs[job_id] = job
+
+        logger.info("Created job %s", job_id)
+        return job
+
     def get_job(self, job_id: str) -> Job | None:
         """Get a job by ID.
```
src/stroke_deepisles_demo/api/routes.py CHANGED
@@ -89,13 +89,8 @@ def create_segment_job(
     - Returning immediately avoids timeout errors
     """
     try:
-        # Concurrency limit to prevent GPU memory exhaustion (BUG-006 fix)
         store = get_job_store()
-        if store.get_active_job_count() >= get_settings().max_concurrent_jobs:
-            raise HTTPException(
-                status_code=503,
-                detail="Server busy: too many active jobs. Please try again later.",
-            )
+        settings = get_settings()
 
         # Validate case_id exists before creating job
         valid_cases = list_case_ids()
@@ -109,8 +104,15 @@ def create_segment_job(
         job_id = uuid.uuid4().hex
         backend_url = get_backend_base_url(request)
 
-        # Create job record
-        store.create_job(job_id, body.case_id, body.fast_mode)
+        # Atomic concurrency limit + job creation (prevents TOCTOU race)
+        job = store.create_job_if_under_limit(
+            job_id, body.case_id, body.fast_mode, settings.max_concurrent_jobs
+        )
+        if job is None:
+            raise HTTPException(
+                status_code=503,
+                detail="Server busy: too many active jobs. Please try again later.",
+            )
 
         # Queue background task
         background_tasks.add_task(
@@ -229,10 +231,12 @@ def run_segmentation_job(
         # Run the pipeline
         store.update_progress(job_id, 30, "Running DeepISLES inference...")
 
+        # Note: gpu and timeout default to Settings values via pipeline
         result = run_pipeline_on_case(
            case_id,
            output_dir=output_dir,
            fast=fast_mode,
+            # gpu, timeout use Settings defaults
            compute_dice=True,
            cleanup_staging=True,
        )
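The TOCTOU race this hunk removes is easy to reproduce: with the old pattern, the lock is released between the count check and the insert, so several requests can all observe "under limit" before any of them creates a job. A toy demonstration (illustrative classes only; the `time.sleep` just widens the race window so the demo is deterministic):

```python
import threading
import time


class RacyStore:
    """Check-then-create with the lock released in between (the old pattern)."""

    def __init__(self) -> None:
        self.jobs: list[str] = []
        self.lock = threading.Lock()

    def active_count(self) -> int:
        with self.lock:
            return len(self.jobs)

    def create(self, job_id: str) -> None:
        with self.lock:
            self.jobs.append(job_id)


def submit_racy(store: RacyStore, job_id: str, max_active: int) -> None:
    if store.active_count() >= max_active:  # check...
        return
    time.sleep(0.2)  # widen the race window for the demo
    store.create(job_id)  # ...then act: other threads may have passed the check too


store = RacyStore()
threads = [threading.Thread(target=submit_racy, args=(store, str(i), 1)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(store.jobs) > 1  # the limit of 1 was bypassed by concurrent submitters
```

Folding the check and the insert into one locked method, as `create_job_if_under_limit` does, is the standard fix: the count can no longer change between observation and action.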
src/stroke_deepisles_demo/data/loader.py CHANGED
@@ -154,23 +154,22 @@ class HuggingFaceDatasetWrapper:
         self._temp_dir = None
 
 
-# Default HuggingFace dataset ID
-DEFAULT_HF_DATASET = "hugging-science/isles24-stroke"
-
-
 def load_isles_dataset(
     source: str | Path | None = None,
     *,
     local_mode: bool | None = None,
+    token: str | None = None,
 ) -> Dataset:
     """
     Load ISLES24 dataset from local directory or HuggingFace Hub.
 
     Args:
         source: Local directory path or HuggingFace dataset ID.
-            If None, uses HuggingFace dataset by default.
+            If None, uses Settings.hf_dataset_id from config.
         local_mode: If True, treat source as local directory.
             If None, auto-detect based on source type.
+        token: HuggingFace token for private/gated datasets.
+            If None, uses Settings.hf_token from config.
 
     Returns:
         Dataset-like object providing case access. Use as context manager
@@ -184,8 +183,8 @@ def load_isles_dataset(
         # Load from local directory
         ds = load_isles_dataset("data/isles24", local_mode=True)
 
-        # Load specific HuggingFace dataset
-        ds = load_isles_dataset("hugging-science/isles24-stroke")
+        # Load specific HuggingFace dataset with token
+        ds = load_isles_dataset("org/private-dataset", token="hf_xxx")
     """
     # Auto-detect mode if not specified
     if local_mode is None:
@@ -210,12 +209,19 @@ def load_isles_dataset(
     # HuggingFace mode
     from datasets import load_dataset
 
-    dataset_id = str(source) if source else DEFAULT_HF_DATASET
+    from stroke_deepisles_demo.core.config import get_settings
+
+    settings = get_settings()
+
+    # Use settings defaults if not specified
+    dataset_id = str(source) if source else settings.hf_dataset_id
+    hf_token = token if token is not None else settings.hf_token
 
     # Load dataset, selecting only necessary columns to minimize decoding overhead
     # We rely on neuroimaging-go-brrrr's Nifti feature for lazy loading if configured,
     # but select_columns ensures we don't touch other modalities.
-    ds = load_dataset(dataset_id, split="train")
+    # Token enables access to private/gated datasets
+    ds = load_dataset(dataset_id, split="train", token=hf_token)
     ds = ds.select_columns(["subject_id", "dwi", "adc", "lesion_mask"])
 
     return HuggingFaceDatasetWrapper(ds, dataset_id)
src/stroke_deepisles_demo/inference/__init__.py CHANGED
@@ -1,7 +1,6 @@
 """Inference module for stroke-deepisles-demo."""
 
 from stroke_deepisles_demo.inference.deepisles import (
-    DEEPISLES_IMAGE,
     DeepISLESResult,
     run_deepisles_on_folder,
     validate_input_folder,
@@ -19,7 +18,7 @@ from stroke_deepisles_demo.inference.docker import (
 )
 
 __all__ = [
-    "DEEPISLES_IMAGE",
+    # Note: Docker image is now configurable via Settings.deepisles_docker_image
     "DeepISLESResult",
     "DirectInvocationResult",
     "DockerRunResult",
src/stroke_deepisles_demo/inference/deepisles.py CHANGED
@@ -30,10 +30,13 @@ if TYPE_CHECKING:
 
 logger = get_logger(__name__)
 
-# Constants
-DEEPISLES_IMAGE = "isleschallenge/deepisles"
-EXPECTED_INPUT_FILES = ["dwi.nii.gz", "adc.nii.gz"]
-OPTIONAL_INPUT_FILES = ["flair.nii.gz"]
+# Expected input files for validation (named constants for explicit access)
+DWI_FILENAME = "dwi.nii.gz"
+ADC_FILENAME = "adc.nii.gz"
+FLAIR_FILENAME = "flair.nii.gz"
+# Lists preserved for consumers; internal code uses named constants
+EXPECTED_INPUT_FILES = [DWI_FILENAME, ADC_FILENAME]
+OPTIONAL_INPUT_FILES = [FLAIR_FILENAME]
 
 
 @dataclass(frozen=True)
@@ -58,15 +61,22 @@ def validate_input_folder(input_dir: Path) -> tuple[Path, Path, Path | None]:
     Raises:
         MissingInputError: If required files are missing
     """
-    dwi_path = input_dir / "dwi.nii.gz"
-    adc_path = input_dir / "adc.nii.gz"
-    flair_path = input_dir / "flair.nii.gz"
+    # Build paths using named constants (explicit, not order-dependent)
+    dwi_path = input_dir / DWI_FILENAME
+    adc_path = input_dir / ADC_FILENAME
+    flair_path = input_dir / FLAIR_FILENAME
 
     if not dwi_path.exists():
-        raise MissingInputError(f"Required file 'dwi.nii.gz' not found in {input_dir}")
+        raise MissingInputError(
+            f"Required file '{DWI_FILENAME}' not found in {input_dir}. "
+            f"Expected: {EXPECTED_INPUT_FILES}"
+        )
 
     if not adc_path.exists():
-        raise MissingInputError(f"Required file 'adc.nii.gz' not found in {input_dir}")
+        raise MissingInputError(
+            f"Required file '{ADC_FILENAME}' not found in {input_dir}. "
+            f"Expected: {EXPECTED_INPUT_FILES}"
+        )
 
     return dwi_path, adc_path, flair_path if flair_path.exists() else None
@@ -135,9 +145,14 @@ def _run_via_docker(
     Run DeepISLES via Docker container.
 
     This is the standard execution path for local development.
+    Uses Settings.deepisles_docker_image for the container image.
     """
     start_time = time.time()
 
+    # Get docker image from settings (allows override via env var)
+    settings = get_settings()
+    docker_image = settings.deepisles_docker_image
+
     # Check GPU if requested
     if gpu:
         ensure_gpu_available_if_requested(gpu)
@@ -163,11 +178,17 @@ def _run_via_docker(
         output_dir.resolve(): "/app/output",
     }
 
-    logger.info("Running DeepISLES via Docker: input=%s, fast=%s, gpu=%s", input_dir, fast, gpu)
+    logger.info(
+        "Running DeepISLES via Docker: image=%s, input=%s, fast=%s, gpu=%s",
+        docker_image,
+        input_dir,
+        fast,
+        gpu,
+    )
 
     # Run the container
     docker_result = run_container(
-        DEEPISLES_IMAGE,
+        docker_image,
         command=command,
         volumes=volumes,
         gpu=gpu,
src/stroke_deepisles_demo/pipeline.py CHANGED
@@ -59,8 +59,9 @@ def run_pipeline_on_case(
     *,
     dataset_id: str | None = None,
     output_dir: Path | None = None,
-    fast: bool = True,
-    gpu: bool = True,
+    fast: bool | None = None,
+    gpu: bool | None = None,
+    timeout: float | None = None,
     compute_dice: bool = True,
     cleanup_staging: bool = True,
 ) -> PipelineResult:
@@ -69,25 +70,35 @@ def run_pipeline_on_case(
 
     Args:
         case_id: Case identifier (string) or index (int)
-        dataset_id: HF dataset ID (default from settings - currently ignored/local)
+        dataset_id: HF dataset ID (default from Settings.hf_dataset_id)
         output_dir: Directory for results (default: temp dir)
-        fast: Use SEALS-only mode (ISLES'22 winner, DWI+ADC only, no FLAIR needed)
-        gpu: Use GPU acceleration
+        fast: Use SEALS-only mode (default from Settings.deepisles_fast_mode)
+        gpu: Use GPU acceleration (default from Settings.deepisles_use_gpu)
+        timeout: Maximum inference time in seconds (default from Settings.deepisles_timeout_seconds)
         compute_dice: Compute Dice score if ground truth available
         cleanup_staging: Remove staging directory after inference
 
     Returns:
         PipelineResult with all paths and optional metrics
     """
-    # Note: dataset_id is currently unused as we default to local loading.
-    # It's kept for interface compatibility with future cloud mode.
-    _ = dataset_id
+    from stroke_deepisles_demo.core.config import get_settings
+
+    settings = get_settings()
+
+    # Apply settings defaults if not specified
+    if fast is None:
+        fast = settings.deepisles_fast_mode
+    if gpu is None:
+        gpu = settings.deepisles_use_gpu
+    if timeout is None:
+        timeout = settings.deepisles_timeout_seconds
 
     start_time = time.time()
 
     # Use context manager to ensure HuggingFace temp files are cleaned up
     # This prevents unbounded disk usage from accumulating temp NIfTI files
-    with load_isles_dataset() as dataset:
+    # dataset_id is wired through to loader (defaults to Settings.hf_dataset_id)
+    with load_isles_dataset(dataset_id) as dataset:
         # Resolve ID if integer
         if isinstance(case_id, int):
             all_ids = dataset.list_case_ids()
@@ -150,6 +161,7 @@ def run_pipeline_on_case(
         output_dir=results_dir,
         fast=fast,
         gpu=gpu,
+        timeout=timeout,
     )
 
     # 4. Compute Metrics (using copied ground truth)