rikhoffbauer2/drum-sample-extractor

ChatGPT committed 8 days ago

Commit 3703c4e (1 parent: b8fa9bf)

feat: add hit review and streaming progress

Files changed (17)

README.md +9 -3
app.py +50 -3
docs/API.md +23 -4
docs/FEATURES.md +10 -6
docs/HIT_REVIEW_AND_STREAMING.md +85 -0
docs/PIPELINE_TIMING_AND_REALTIME.md +10 -11
docs/PROGRESS.md +26 -6
docs/REMAINING_WORK.md +11 -11
docs/TASKS.md +9 -3
docs/UI_REPLACEMENT.md +13 -1
docs/benchmark-online-preview.json +78 -78
docs/benchmark-subprocesses.json +80 -80
pipeline_runner.py +39 -1
scripts/test_sse_and_review_hits.py +70 -0
web/app.js +142 -23
web/index.html +40 -9
web/styles.css +13 -0

README.md CHANGED

@@ -29,13 +29,16 @@ Implemented in the current development pass:
   - `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
 - Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
 - Run history panel indexing `.runs/*/output/manifest.json`.
-- Documentation for features, progress, tasks, API, timing, realtime suitability, UI, and remaining work.
 - Legacy Gradio apps preserved in `legacy/` for reference only.
 Not fully complete yet:
 - No interactive waveform editing of onsets/clusters.
-- No server-sent event stream or websocket progress channel.
 - No frontend TypeScript build/test harness.
 - Demucs remains offline/batch by design.
@@ -44,6 +47,7 @@ See:
 - `docs/FEATURES.md`
 - `docs/TASKS.md`
 - `docs/PROGRESS.md`
 - `docs/REMAINING_WORK.md`
 ## Run locally
@@ -68,6 +72,7 @@ That bypasses Demucs and uses the near-realtime clustering path.
 ```bash
 python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchmark-subprocesses.json
 ```
 The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
@@ -101,7 +106,7 @@ curl http://127.0.0.1:7860/api/jobs
 | `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads |
 | `pipeline_runner.py` | Timed extraction pipeline, disk stem/source cache, batch/online clustering routing |
 | `sample_extractor.py` | Core DSP/sample extraction implementation |
-| `web/` | Custom no-build browser frontend |
 | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
 | `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
 | `legacy/` | Previous Gradio apps retained for reference |
@@ -115,6 +120,7 @@ Each run is stored under `.runs/<job-id>/output/`:
 - `reconstruction.mid`
 - `sample-pack.zip`
 - `samples/*.wav`
 - `manifest.json`
 Generated runtime directories are ignored by git:

   - `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
 - Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
 - Run history panel indexing `.runs/*/output/manifest.json`.
+- Individual review WAVs for every detected hit under `review/hits/`.
+- Click-to-audition workflow for waveform onsets, detected hit rows, and representative sample rows.
+- Server-sent-events progress endpoint with frontend `EventSource` support and polling fallback.
+- Documentation for features, progress, tasks, API, timing, hit review, realtime suitability, UI, and remaining work.
 - Legacy Gradio apps preserved in `legacy/` for reference only.
 Not fully complete yet:
 - No interactive waveform editing of onsets/clusters.
+- No interactive onset/cluster editing yet.
 - No frontend TypeScript build/test harness.
 - Demucs remains offline/batch by design.
 - `docs/FEATURES.md`
 - `docs/TASKS.md`
 - `docs/PROGRESS.md`
+- `docs/HIT_REVIEW_AND_STREAMING.md`
 - `docs/REMAINING_WORK.md`
 ## Run locally
 ```bash
 python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchmark-subprocesses.json
+python3 scripts/test_sse_and_review_hits.py
 ```
 The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
 | `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads |
 | `pipeline_runner.py` | Timed extraction pipeline, disk stem/source cache, batch/online clustering routing |
 | `sample_extractor.py` | Core DSP/sample extraction implementation |
+| `web/` | Custom no-build browser frontend with waveform, hit review, and sample audition |
 | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
 | `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
 | `legacy/` | Previous Gradio apps retained for reference |
 - `reconstruction.mid`
 - `sample-pack.zip`
 - `samples/*.wav`
+- `review/hits/*.wav`
 - `manifest.json`
 Generated runtime directories are ignored by git:

app.py CHANGED

@@ -7,6 +7,7 @@ Run with:
 from __future__ import annotations
 import json
 import shutil
 import time
@@ -20,7 +21,7 @@ from typing import Any
 from fastapi import FastAPI, File, Form, HTTPException, UploadFile
 from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import FileResponse, JSONResponse
 from fastapi.staticfiles import StaticFiles
 from pipeline_runner import PipelineParams, clear_disk_cache, initial_stages, run_extraction_pipeline
@@ -31,7 +32,7 @@ WEB_DIR = ROOT / "web"
 RUNS_DIR = ROOT / ".runs"
 RUNS_DIR.mkdir(exist_ok=True)
-app = FastAPI(title="Drum Sample Extractor", version="11.0.0")
 app.add_middleware(
     CORSMiddleware,
     allow_origins=["*"],
@@ -58,6 +59,10 @@ def _serialise_job(job: dict[str, Any]) -> dict[str, Any]:
             {**sample, "url": _job_url(job["id"], sample["file"])}
             for sample in result.get("samples", [])
         ]
         payload["result"] = result
     return payload
@@ -243,11 +248,53 @@ def get_job(job_id: str) -> dict[str, Any]:
     raise HTTPException(status_code=404, detail="Job not found")
 @app.get("/api/jobs/{job_id}/files/{relative_path:path}")
 def get_job_file(job_id: str, relative_path: str) -> FileResponse:
     root = (RUNS_DIR / job_id / "output").resolve()
     path = (root / relative_path).resolve()
-    if not str(path).startswith(str(root)) or not path.exists() or not path.is_file():
         raise HTTPException(status_code=404, detail="File not found")
     return FileResponse(path)

 from __future__ import annotations
+import asyncio
 import json
 import shutil
 import time
 from fastapi import FastAPI, File, Form, HTTPException, UploadFile
 from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
 from fastapi.staticfiles import StaticFiles
 from pipeline_runner import PipelineParams, clear_disk_cache, initial_stages, run_extraction_pipeline
 RUNS_DIR = ROOT / ".runs"
 RUNS_DIR.mkdir(exist_ok=True)
+app = FastAPI(title="Drum Sample Extractor", version="11.1.0")
 app.add_middleware(
     CORSMiddleware,
     allow_origins=["*"],
             {**sample, "url": _job_url(job["id"], sample["file"])}
             for sample in result.get("samples", [])
         ]
+        result["hits"] = [
+            {**hit, "url": _job_url(job["id"], hit["file"])}
+            for hit in result.get("hits", [])
+        ]
         payload["result"] = result
     return payload
     raise HTTPException(status_code=404, detail="Job not found")
+@app.get("/api/jobs/{job_id}/events")
+def get_job_events(job_id: str) -> StreamingResponse:
+    with jobs_lock:
+        exists_in_memory = job_id in jobs
+    exists_on_disk = _read_manifest_job(job_id) is not None
+    if not exists_in_memory and not exists_on_disk:
+        raise HTTPException(status_code=404, detail="Job not found")
+    async def event_stream():
+        last_payload: str | None = None
+        while True:
+            with jobs_lock:
+                memory_job = jobs.get(job_id)
+                job = dict(memory_job) if memory_job else None
+            if job is None:
+                job = _read_manifest_job(job_id)
+            if job is None:
+                payload = {"id": job_id, "status": "error", "error": "Job disappeared"}
+            else:
+                payload = _serialise_job(job)
+            encoded = json.dumps(payload, sort_keys=True)
+            if encoded != last_payload:
+                yield f"event: job\ndata: {encoded}\n\n"
+                last_payload = encoded
+            if payload.get("status") in {"complete", "error"}:
+                break
+            await asyncio.sleep(0.5)
+    return StreamingResponse(
+        event_stream(),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "X-Accel-Buffering": "no",
+        },
+    )
 @app.get("/api/jobs/{job_id}/files/{relative_path:path}")
 def get_job_file(job_id: str, relative_path: str) -> FileResponse:
     root = (RUNS_DIR / job_id / "output").resolve()
     path = (root / relative_path).resolve()
+    try:
+        path.relative_to(root)
+    except ValueError as exc:
+        raise HTTPException(status_code=404, detail="File not found") from exc
+    if not path.exists() or not path.is_file():
         raise HTTPException(status_code=404, detail="File not found")
     return FileResponse(path)
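
The containment check in `get_job_file` can be tried in isolation. The following is a minimal sketch, not the project's code: `is_contained`, `prefix_check`, and the temporary directory layout are hypothetical names used only for illustration. It also suggests why the earlier `str.startswith` comparison was weaker: a sibling directory such as `output-evil` shares the string prefix of `output`.

```python
import tempfile
from pathlib import Path


def is_contained(root: Path, relative_path: str) -> bool:
    """Return True if root / relative_path resolves to a location inside root."""
    resolved_root = root.resolve()
    candidate = (resolved_root / relative_path).resolve()
    try:
        candidate.relative_to(resolved_root)
    except ValueError:
        return False
    return True


def prefix_check(root: Path, relative_path: str) -> bool:
    """The older string-prefix test, kept here only for comparison."""
    resolved_root = root.resolve()
    candidate = (resolved_root / relative_path).resolve()
    return str(candidate).startswith(str(resolved_root))


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "output"
    sibling = Path(tmp) / "output-evil"  # hypothetical sibling directory
    output.mkdir()
    sibling.mkdir()
    (output / "manifest.json").write_text("{}")
    (sibling / "secret.txt").write_text("x")

    inside = is_contained(output, "manifest.json")
    escaped = is_contained(output, "../output-evil/secret.txt")
    prefix_fooled = prefix_check(output, "../output-evil/secret.txt")
```

Here `inside` is true and `escaped` is false, while the string-prefix variant accepts the sibling path.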

docs/API.md CHANGED

@@ -131,10 +131,28 @@ Completed jobs contain:
 | `hit_count` | Number of accepted onsets/hits. |
 | `cluster_count` | Number of sample clusters. |
 | `stages` | Per-stage timing/status/detail list. |
-| `samples` | Sample rows with score, duration, first onset, and download URL. |
-| `overview` | Decimated envelope and onset markers for waveform display. |
 | `files` | Relative artifact paths. |
-| `file_urls` | Direct API URLs for artifacts. |
 ## `GET /api/jobs/{job_id}/files/{relative_path}`
@@ -146,9 +164,10 @@ Examples:
 curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/sample-pack.zip
 curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/reconstruction.mid
 curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/samples/hihat_open_0.wav
 ```
-The endpoint prevents path traversal by resolving downloads under `.runs/<job-id>/output/`.
 ## `POST /api/cache/clear`

 | `hit_count` | Number of accepted onsets/hits. |
 | `cluster_count` | Number of sample clusters. |
 | `stages` | Per-stage timing/status/detail list. |
+| `samples` | Representative sample rows with score, duration, first onset, and playback/download URL. |
+| `hits` | Per-detected-hit review rows with onset, duration, label, cluster, representative flag, and playback/download URL. |
+| `overview` | Decimated envelope and clickable onset markers for waveform display. |
 | `files` | Relative artifact paths. |
+| `file_urls` | Direct API URLs for top-level artifacts. |
+## `GET /api/jobs/{job_id}/events`
+Streams job snapshots as server-sent events. This is the preferred progress channel for the frontend; polling remains supported via `GET /api/jobs/{job_id}`.
+```bash
+curl -N http://127.0.0.1:7860/api/jobs/58ca0db4ac74/events
+```
+Event shape:
+```text
+event: job
+data: {"id":"58ca0db4ac74","status":"running","stages":[...]}
+```
+The stream closes after `complete` or `error`. Completed historical jobs emit one final `job` event and close.
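
A client for the event shape above could look roughly like the following. This is a sketch under stated assumptions, not part of the commit: `iter_sse_events` and `follow_job` are hypothetical helper names, only the `event:` and `data:` fields are handled, and the input is a list of already-decoded lines rather than a live HTTP response.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def _field_value(line: str, name: str) -> str:
    """Return the value after 'name:', dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs; a blank line terminates each event."""
    event_type = "message"
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_parts:
                yield event_type, "\n".join(data_parts)
            event_type, data_parts = "message", []
        elif line.startswith("event:"):
            event_type = _field_value(line, "event")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data"))
        # Comment lines and other fields are ignored in this sketch.


def follow_job(lines: Iterable[str]) -> list[dict]:
    """Collect `job` snapshots until a terminal status is seen."""
    snapshots = []
    for event_type, data in iter_sse_events(lines):
        if event_type != "job":
            continue
        job = json.loads(data)
        snapshots.append(job)
        if job.get("status") in {"complete", "error"}:
            break
    return snapshots


# Made-up stream contents mirroring the documented event shape.
sample_stream = [
    "event: job\n",
    'data: {"id": "58ca0db4ac74", "status": "running"}\n',
    "\n",
    "event: job\n",
    'data: {"id": "58ca0db4ac74", "status": "complete"}\n',
    "\n",
]
statuses = [job["status"] for job in follow_job(sample_stream)]
```

Against a running server, the same functions would be fed the decoded lines of the streaming response, for example the output that `curl -N` prints.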
 ## `GET /api/jobs/{job_id}/files/{relative_path}`
 curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/sample-pack.zip
 curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/reconstruction.mid
 curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/samples/hihat_open_0.wav
+curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/review/hits/hit_00000_kick.wav
 ```
+The endpoint prevents path traversal by resolving downloads under `.runs/<job-id>/output/` and requiring the final path to remain relative to that output root.
 ## `POST /api/cache/clear`

docs/FEATURES.md CHANGED

@@ -14,12 +14,14 @@ Turn an input audio file into a practical drum sample pack: detected hits, group
 | UI | Drag/drop audio upload | Implemented | Uses multipart upload to `POST /api/jobs`. |
 | UI | Source preview | Implemented | Browser `<audio>` preview before extraction. |
 | UI | Pipeline controls | Implemented | Stem/model/onset/clustering/MIDI/synthesis/cache controls. |
-| UI | Live-ish progress | Implemented | Polls stage state and logs every 800 ms. |
-| UI | Waveform/onset overview | Implemented | Canvas envelope plus onset markers from `manifest.json`. |
-| UI | Result downloads | Implemented | ZIP, MIDI, stem WAV, reconstruction WAV, individual sample WAVs. |
 | UI | Run history browser | Implemented | Lists completed `.runs/*/output/manifest.json` entries and reloads results. |
 | API | Health/config | Implemented | `GET /api/health`, `GET /api/config`. |
-| API | Job creation/polling | Implemented | `POST /api/jobs`, `GET /api/jobs/{id}`. |
 | API | Run listing | Implemented | `GET /api/jobs` returns active and completed runs. |
 | API | Safe artifact serving | Implemented | Path traversal is blocked by resolved output-root checks. |
 | API | Cache clear | Implemented | Clears in-memory DSP cache and disk stem/source cache. |
@@ -34,21 +36,23 @@ Turn an input audio file into a practical drum sample pack: detected hits, group
 | Pipeline | Optional synthesis | Implemented | Weighted aligned average for multi-hit clusters. |
 | Pipeline | MIDI export | Implemented | Quantized or unquantized reconstruction MIDI. |
 | Pipeline | Reconstruction render | Implemented | Renders MIDI-like reconstruction using selected samples. |
 | Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI, rendered reconstruction. |
 | Docs | Project review | Implemented | `docs/PROJECT_REVIEW.md`. |
 | Docs | Timing/realtime analysis | Implemented | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
 | Docs | API docs | Implemented | `docs/API.md`. |
 | Docs | UI replacement docs | Implemented | `docs/UI_REPLACEMENT.md`. |
 | Docs | Feature/task/progress tracking | Implemented | This file, `TASKS.md`, `PROGRESS.md`. |
 ## Partially implemented features
 | Area | Feature | Current state | Needed to call it complete |
 |---|---|---|---|
-| Progress | Stage progress | Shows stage boundaries and logs | Add lower-level progress inside Demucs and clustering. |
 | Realtime | Online clustering | Implemented as batch-invoked prototype assignment | Add streaming/incremental audio analysis API for true realtime preview. |
 | Run history | Manifest browser | Lists and reloads completed runs | Add side-by-side comparison and filtering/search. |
-| Editing | Review workflow | Displays waveform and samples | Add click-to-audition hits, onset editing, cluster merge/split, label reassignment. |
 | Frontend quality | No-build JavaScript UI | Good enough for local app | Convert to TypeScript once interaction model stabilizes. |
 ## Explicit non-goals for this pass

 | UI | Drag/drop audio upload | Implemented | Uses multipart upload to `POST /api/jobs`. |
 | UI | Source preview | Implemented | Browser `<audio>` preview before extraction. |
 | UI | Pipeline controls | Implemented | Stem/model/onset/clustering/MIDI/synthesis/cache controls. |
+| UI | Streaming progress | Implemented | Uses `EventSource` over `GET /api/jobs/{id}/events`, with polling fallback. |
+| UI | Waveform/onset overview | Implemented | Canvas envelope plus clickable onset markers from `manifest.json`. |
+| UI | Result downloads | Implemented | ZIP, MIDI, stem WAV, reconstruction WAV, individual sample WAVs, and per-hit review WAVs. |
 | UI | Run history browser | Implemented | Lists completed `.runs/*/output/manifest.json` entries and reloads results. |
+| UI | Hit and sample audition | Implemented | Dedicated players for selected hit slices and representative sample WAVs. |
 | API | Health/config | Implemented | `GET /api/health`, `GET /api/config`. |
+| API | Job creation/status | Implemented | `POST /api/jobs`, `GET /api/jobs/{id}`. |
+| API | SSE job events | Implemented | `GET /api/jobs/{id}/events` streams job snapshots until complete/error. |
 | API | Run listing | Implemented | `GET /api/jobs` returns active and completed runs. |
 | API | Safe artifact serving | Implemented | Path traversal is blocked by resolved output-root checks. |
 | API | Cache clear | Implemented | Clears in-memory DSP cache and disk stem/source cache. |
 | Pipeline | Optional synthesis | Implemented | Weighted aligned average for multi-hit clusters. |
 | Pipeline | MIDI export | Implemented | Quantized or unquantized reconstruction MIDI. |
 | Pipeline | Reconstruction render | Implemented | Renders MIDI-like reconstruction using selected samples. |
+| Pipeline | Per-hit review export | Implemented | Writes every accepted detected hit to `review/hits/*.wav` and records rows in the manifest. |
 | Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI, rendered reconstruction. |
 | Docs | Project review | Implemented | `docs/PROJECT_REVIEW.md`. |
 | Docs | Timing/realtime analysis | Implemented | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
 | Docs | API docs | Implemented | `docs/API.md`. |
 | Docs | UI replacement docs | Implemented | `docs/UI_REPLACEMENT.md`. |
 | Docs | Feature/task/progress tracking | Implemented | This file, `TASKS.md`, `PROGRESS.md`. |
+| Docs | Hit review and streaming docs | Implemented | `docs/HIT_REVIEW_AND_STREAMING.md`. |
 ## Partially implemented features
 | Area | Feature | Current state | Needed to call it complete |
 |---|---|---|---|
+| Progress | Stage progress | SSE streams stage boundaries and logs | Add lower-level progress inside Demucs and clustering. |
 | Realtime | Online clustering | Implemented as batch-invoked prototype assignment | Add streaming/incremental audio analysis API for true realtime preview. |
 | Run history | Manifest browser | Lists and reloads completed runs | Add side-by-side comparison and filtering/search. |
+| Editing | Review workflow | Click-to-audition for hits and samples is implemented | Add onset editing, cluster merge/split, label reassignment. |
 | Frontend quality | No-build JavaScript UI | Good enough for local app | Convert to TypeScript once interaction model stabilizes. |
 ## Explicit non-goals for this pass

docs/HIT_REVIEW_AND_STREAMING.md ADDED

@@ -0,0 +1,85 @@

+# Hit review and progress streaming
+Last updated: 2026-05-12
+## Purpose
+This pass moves the app closer to a review workstation by making detected hits individually inspectable and by replacing frontend-only polling with a server-sent-events progress channel.
+## Implemented behavior
+| Area | Implementation | Files |
+|---|---|---|
+| Review hit artifacts | Every accepted detected hit is written as an individual WAV under `review/hits/`. | `pipeline_runner.py` |
+| Manifest hit rows | `manifest.json` now includes a top-level `hits` array with onset, duration, label, cluster, representative flag, and relative file path. | `pipeline_runner.py` |
+| Hit URLs | API serialization adds direct download/playback URLs to every hit row. | `app.py` |
+| Waveform selection | Clicking the waveform selects the nearest detected onset marker. | `web/app.js` |
+| Hit audition | Clicking a hit row or waveform marker loads that hit into the selected-hit audio player. | `web/index.html`, `web/app.js` |
+| Sample audition | Representative sample rows now have explicit Audition buttons and a dedicated selected-sample player. | `web/index.html`, `web/app.js` |
+| SSE progress | `GET /api/jobs/{job_id}/events` streams job snapshots whenever state changes. | `app.py`, `web/app.js` |
+| Poll fallback | The frontend falls back to polling if `EventSource` is unavailable or errors. | `web/app.js` |
+| Artifact serving hardening | File downloads now use `Path.relative_to()` against the resolved run output directory. | `app.py` |
+## Manifest shape additions
+Completed results now include:
+```json
+{
+  "hits": [
+    {
+      "index": 0,
+      "label": "kick",
+      "cluster_id": 3,
+      "cluster_label": "kick_0",
+      "is_representative": true,
+      "onset_sec": 0.002993,
+      "duration_ms": 255.0,
+      "rms_energy": 0.141768,
+      "spectral_centroid_hz": 773.4,
+      "file": "review/hits/hit_00000_kick.wav"
+    }
+  ]
+}
+```
+API responses add `url` to each hit row, for example:
+```json
+{
+  "file": "review/hits/hit_00000_kick.wav",
+  "url": "/api/jobs/<job-id>/files/review/hits/hit_00000_kick.wav"
+}
+```
+The `overview.onsets` entries now also carry `index` and `duration_sec`, allowing the waveform to map markers back to review hit rows.
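
The "nearest detected onset marker" selection can be illustrated with the `hits` rows shown above. This is a Python sketch of the idea only; the frontend's actual logic lives in `web/app.js` and is not reproduced here. `nearest_hit` and `click_to_seconds` are hypothetical names, the time axis is assumed to be linear across the canvas, and the second and third hit rows are made-up examples.

```python
from bisect import bisect_left


def nearest_hit(hits, click_sec):
    """Return the hit whose onset_sec is closest to click_sec, or None."""
    if not hits:
        return None
    ordered = sorted(hits, key=lambda hit: hit["onset_sec"])
    onsets = [hit["onset_sec"] for hit in ordered]
    pos = bisect_left(onsets, click_sec)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = ordered[max(pos - 1, 0): pos + 1]
    return min(candidates, key=lambda hit: abs(hit["onset_sec"] - click_sec))


def click_to_seconds(x_px, canvas_width_px, duration_sec):
    """Map a horizontal canvas position to a time, assuming a linear axis."""
    return (x_px / canvas_width_px) * duration_sec


hits = [
    {"index": 0, "label": "kick", "onset_sec": 0.002993,
     "file": "review/hits/hit_00000_kick.wav"},
    # The rows below are invented for the example.
    {"index": 1, "label": "snare", "onset_sec": 0.5,
     "file": "review/hits/hit_00001_snare.wav"},
    {"index": 2, "label": "kick", "onset_sec": 1.0,
     "file": "review/hits/hit_00002_kick.wav"},
]

click_sec = click_to_seconds(x_px=280, canvas_width_px=800, duration_sec=2.0)
selected = nearest_hit(hits, click_sec)
```

A click at 280 px on an 800 px canvas covering 2.0 s maps to 0.7 s, so the hit at 0.5 s is selected.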
+## Streaming endpoint
+`GET /api/jobs/{job_id}/events` returns `text/event-stream`.
+Each emitted event has type `job` and contains the same serialized shape as `GET /api/jobs/{job_id}`:
+```text
+event: job
+data: {"id":"...","status":"running",...}
+```
+The stream ends after `complete` or `error`. Completed historical jobs stream one final event and then close.
+## Current limitations
+- Hit review is read-only. It does not yet support delete/shift/relabel actions.
+- Every accepted hit is exported as a WAV. This is correct for review UX, but large files with thousands of hits may produce many small artifacts.
+- SSE streams job snapshots, not fine-grained internal Demucs progress.
+- The waveform is an overview canvas, not an editable detailed waveform yet.
+## Next editor step
+Add an edit state layer on top of the hit manifest:
+1. Mark hit deleted/restored.
+2. Shift onset and duration bounds.
+3. Reassign cluster label.
+4. Merge/split clusters.
+5. Re-render/repack from edited manifest without rerunning Demucs or onset detection.
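
One possible shape for such an edit state layer is an overlay of per-hit edits applied to the unchanged manifest rows. The sketch below is speculative and not part of the commit: `apply_edits` and the edit keys `deleted`, `onset_shift_sec`, and `cluster_label` are hypothetical, and cluster merge/split is left out.

```python
from copy import deepcopy


def apply_edits(hits, edits):
    """Return a new hit list with edits applied; the input list is not modified.

    `edits` maps a hit index to a dict with optional, hypothetical keys:
    "deleted" (bool), "onset_shift_sec" (float), "cluster_label" (str).
    """
    result = []
    for hit in hits:
        edit = edits.get(hit["index"], {})
        if edit.get("deleted"):
            continue  # deleted hits are dropped from the edited view
        updated = deepcopy(hit)
        shift = edit.get("onset_shift_sec", 0.0)
        updated["onset_sec"] = round(max(0.0, hit["onset_sec"] + shift), 6)
        if "cluster_label" in edit:
            updated["cluster_label"] = edit["cluster_label"]
        result.append(updated)
    return result


# Made-up manifest rows and edits for illustration.
manifest_hits = [
    {"index": 0, "onset_sec": 0.002993, "cluster_label": "kick_0"},
    {"index": 1, "onset_sec": 0.5, "cluster_label": "snare_0"},
    {"index": 2, "onset_sec": 1.0, "cluster_label": "kick_0"},
]
edits = {
    0: {"deleted": True},
    1: {"onset_shift_sec": 0.01},
    2: {"cluster_label": "kick_1"},
}
edited = apply_edits(manifest_hits, edits)
```

Keeping the edits separate from the detected rows means a deleted hit can be restored by removing its edit entry, and a re-render step could start from `edited` without rerunning Demucs or onset detection.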

docs/PIPELINE_TIMING_AND_REALTIME.md CHANGED

@@ -36,16 +36,16 @@ The checked-in benchmark files were refreshed on 2026-05-12 with synthetic 2-bar
 | Stage | Batch quality mean | Online preview mean |
 |---|---:|---:|
-| source load | 0.011 s | 0.012 s |
-| BPM detection | 0.185 s | 0.163 s |
-| onset detection + slicing | 1.943 s | 1.834 s |
-| classification | 0.019 s | 0.017 s |
-| clustering | 0.148 s | 0.045 s |
-| representative selection | 0.204 s | 0.115 s |
 | synthesis | 0.001 s | 0.001 s |
-| export/package | 0.156 s | 0.221 s |
-On these small fixtures, `online_preview` reduced clustering time by about 3× compared with `batch_quality`. The total run is still dominated by onset detection, so the next realtime optimization target is streaming/incremental onset analysis rather than only clustering.
 First cold runs can be much slower because imports and library initialization are paid up front.
@@ -126,6 +126,5 @@ The current `online_preview` mode is invoked by the batch job API after onset de
 1. A streaming/ranged audio analysis API.
 2. Incremental onset detector state.
 3. Incremental hit artifact writing.
-4. SSE progress/results stream.
-5. UI that appends hits/clusters as they arrive.
-6. Optional final `batch_quality` consolidation pass.

 | Stage | Batch quality mean | Online preview mean |
 |---|---:|---:|
+| source load | 0.010 s | 0.010 s |
+| BPM detection | 0.155 s | 0.126 s |
+| onset detection + slicing | 1.964 s | 1.763 s |
+| classification | 0.042 s | 0.041 s |
+| clustering | 0.046 s | 0.037 s |
+| representative selection | 0.177 s | 0.158 s |
 | synthesis | 0.001 s | 0.001 s |
+| export/package | 0.158 s | 0.291 s |
+On these small fixtures, `online_preview` reduced clustering time compared with `batch_quality`, while export time increased because this pass now writes every accepted hit as a review WAV under `review/hits/`. The total run is still dominated by onset detection, so the next realtime optimization target is streaming/incremental onset analysis rather than only clustering.
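
The "dominated by onset detection" claim can be checked with a few lines of arithmetic on the refreshed table. The snippet sums only the stages listed in that table, so the totals are an approximation of the full run. On these numbers, onset detection accounts for roughly 77% of the batch total and 73% of the online-preview total.

```python
# Stage means in seconds, copied from the refreshed table: (batch, online).
stage_means_sec = {
    "source load": (0.010, 0.010),
    "BPM detection": (0.155, 0.126),
    "onset detection + slicing": (1.964, 1.763),
    "classification": (0.042, 0.041),
    "clustering": (0.046, 0.037),
    "representative selection": (0.177, 0.158),
    "synthesis": (0.001, 0.001),
    "export/package": (0.158, 0.291),
}

batch_total = sum(batch for batch, _ in stage_means_sec.values())
online_total = sum(online for _, online in stage_means_sec.values())

onset_batch, onset_online = stage_means_sec["onset detection + slicing"]
batch_onset_share = onset_batch / batch_total
online_onset_share = onset_online / online_total

print(f"batch total:  {batch_total:.3f} s, onset share {batch_onset_share:.1%}")
print(f"online total: {online_total:.3f} s, onset share {online_onset_share:.1%}")
```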
 First cold runs can be much slower because imports and library initialization are paid up front.
 1. A streaming/ranged audio analysis API.
 2. Incremental onset detector state.
 3. Incremental hit artifact writing.
+4. UI that appends hits/clusters as they arrive instead of waiting for the completed manifest.
+5. Optional final `batch_quality` consolidation pass.

docs/PROGRESS.md CHANGED

@@ -40,17 +40,17 @@ The project now has a clearer product surface: final-quality batch extraction, f
 ## Current assessment
-The application is not “fully complete” as an editing workstation, but it is substantially implemented as an extraction workstation. The remaining gaps are concentrated around interactive correction/editing, richer progress streaming, run comparison, and frontend engineering hardening.
 ## Next recommended pass
 Implement the editing loop:
-1. Click waveform onset marker or sample table row to audition.
-2. Show selected hit metadata and audio snippet.
-3. Allow onset shift, label change, cluster reassignment, merge, and split.
-4. Re-export without rerunning Demucs/onset detection when only grouping changes.
-5. Save edit decisions into the manifest.
 ## Validation performed in this pass
@@ -58,6 +58,26 @@ Implement the editing loop:
 - Ran FastAPI smoke job through `scripts/test_api_job.py`.
 - Ran an online-preview API smoke job with synthetic audio.
 - Verified `GET /api/jobs` history output and `POST /api/cache/clear` behavior.
 - Refreshed batch and online benchmark JSON files:
   - `docs/benchmark-subprocesses.json`
   - `docs/benchmark-online-preview.json`

 ## Current assessment
+The application is not “fully complete” as an editing workstation, but it is substantially implemented as an extraction and review workstation. The remaining gaps are concentrated around mutating corrections/editing, run comparison, and frontend engineering hardening.
 ## Next recommended pass
 Implement the editing loop:
+1. Add edit state for deleted/restored hits and shifted onsets.
+2. Add label change, cluster reassignment, merge, and split.
+3. Re-export without rerunning Demucs/onset detection when only grouping changes.
+4. Save edit decisions into the manifest.
+5. Add side-by-side run comparison for parameter tuning.
 ## Validation performed in this pass
 - Ran FastAPI smoke job through `scripts/test_api_job.py`.
 - Ran an online-preview API smoke job with synthetic audio.
 - Verified `GET /api/jobs` history output and `POST /api/cache/clear` behavior.
+- Verified SSE completion and review-hit artifact serving.
 - Refreshed batch and online benchmark JSON files:
   - `docs/benchmark-subprocesses.json`
   - `docs/benchmark-online-preview.json`
+## Pass 3: hit review and streaming progress
+Completed in this pass:
+1. Added `GET /api/jobs/{job_id}/events` as a server-sent-events progress stream.
+2. Updated the frontend to consume SSE via `EventSource`, with the existing polling loop retained as fallback.
+3. Added per-hit review artifact export under `review/hits/`.
+4. Added a top-level `hits` array to each run manifest with onset, duration, classification, cluster label, representative flag, and file path.
+5. Added API serialization for hit playback/download URLs.
+6. Added selected-hit and selected-sample audio players.
+7. Made waveform onset markers clickable by selecting the nearest detected hit.
+8. Added hit table and sample-table audition controls.
+9. Hardened artifact file serving by using resolved path containment via `Path.relative_to()`.
+10. Refreshed batch and online benchmark JSON files after the review-hit export change.
+Outcome:
+The app now supports a real review loop for inspecting what the onset detector and clustering produced. Users can audition individual detected slices, representative samples, stem audio, and reconstruction audio from one screen. Progress updates are lower-latency and less wasteful via SSE while still remaining robust in browsers that need polling fallback.

docs/REMAINING_WORK.md CHANGED

@@ -8,11 +8,11 @@ The project is now a usable extraction workstation, not a complete interactive s
 ## Highest-priority remaining gaps
-1. **Hit audition and selection**: clicking an onset marker or sample row should audition that exact hit/sample.
-2. **Waveform editing**: add onset adjustment, delete/add hit, and rerun-from-edited-onsets without redoing Demucs.
-3. **Cluster editing**: allow merge, split, relabel, and manual reassignment of hits.
 4. **Run comparison**: compare two manifests side-by-side for parameter tuning.
-5. **Progress streaming**: replace polling or supplement it with SSE for lower-latency logs/progress.
 6. **Frontend engineering hardening**: migrate the frontend to TypeScript after the UX stabilizes and add browser-level tests.
 7. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
@@ -26,10 +26,10 @@ The project is now a usable extraction workstation, not a complete interactive s
 ## Suggested implementation order
-1. Add click-to-audition for sample table rows and waveform onsets.
-2. Store detected hit snippets as individual review artifacts or expose ranged audio endpoints.
-3. Add edit state to manifests: deleted hits, shifted onsets, labels, cluster overrides.
-4. Add rerender/repack endpoint that starts from edited hit/cluster state.
-5. Add run comparison view.
-6. Add SSE progress streaming.
-7. Convert frontend to TypeScript and add UI tests.

 ## Highest-priority remaining gaps
+1. **Waveform editing**: add onset adjustment, delete/add hit, and rerun-from-edited-onsets without redoing Demucs.
+2. **Cluster editing**: allow merge, split, relabel, and manual reassignment of hits.
+3. **Edited re-export**: regenerate samples/MIDI/ZIP from edited hit/cluster state without rerunning Demucs or onset detection.
 4. **Run comparison**: compare two manifests side-by-side for parameter tuning.
+5. **Lower-level progress**: expose internal Demucs/clustering progress where libraries make that possible.
 6. **Frontend engineering hardening**: migrate the frontend to TypeScript after the UX stabilizes and add browser-level tests.
 7. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
 ## Suggested implementation order
+1. Add edit state to manifests: deleted hits, shifted onsets, labels, cluster overrides.
+2. Add rerender/repack endpoint that starts from edited hit/cluster state.
+3. Add cluster merge/split/relabel actions in the UI.
+4. Add run comparison view.
+5. Add lower-level progress hooks inside expensive stages where practical.
+6. Convert frontend to TypeScript and add UI tests.
+7. Add an in-app benchmark/parameter profile panel.

docs/TASKS.md CHANGED

@@ -12,7 +12,7 @@ Last updated: 2026-05-12
 | Add documentation to project | Done | `docs/*.md`, updated `README.md`. |
 | Replace Gradio UI | Done | Active app is FastAPI + custom web UI; Gradio moved to `legacy/`. |
 | Document features, tasks, and progress | Done | `docs/FEATURES.md`, this file, `docs/PROGRESS.md`. |
-| Continue development while keeping docs up-to-date | In progress | This pass adds run history, disk cache, online clustering mode, and docs updates. |
 ## Completed implementation tasks
@@ -33,6 +33,13 @@ Last updated: 2026-05-12
 - [x] Add UI controls for clustering mode and disk cache.
 - [x] Fix duplicate sample writes in `build_archive`.
 - [x] Add feature, task, and progress docs.
 ## Validation tasks
@@ -40,15 +47,14 @@ Last updated: 2026-05-12
 - [x] FastAPI smoke test for health/config/job flow.
 - [x] Pipeline smoke test on synthetic audio.
 - [x] API history/cache smoke test.
 - [x] Git status reviewed before packaging.
 - [x] Project archive excludes `.runs/`, `.cache/`, and dependency folders.
 ## Remaining high-value tasks
-- [ ] Add click-to-audition onset markers and table rows.
 - [ ] Add onset adjustment and rerun-from-onsets flow.
 - [ ] Add cluster merge/split/relabel workflow.
 - [ ] Add side-by-side run comparison.
-- [ ] Add SSE progress stream for lower-latency updates.
 - [ ] Convert frontend to TypeScript with a small Vite build once UX stabilizes.
 - [ ] Add automated browser-level UI tests.

 | Add documentation to project | Done | `docs/*.md`, updated `README.md`. |
 | Replace Gradio UI | Done | Active app is FastAPI + custom web UI; Gradio moved to `legacy/`. |
 | Document features, tasks, and progress | Done | `docs/FEATURES.md`, this file, `docs/PROGRESS.md`. |
+| Continue development while keeping docs up-to-date | In progress | Latest pass adds SSE progress, per-hit review artifacts, hit/sample audition, hardened artifact serving, and docs updates. |
 ## Completed implementation tasks
 - [x] Add UI controls for clustering mode and disk cache.
 - [x] Fix duplicate sample writes in `build_archive`.
 - [x] Add feature, task, and progress docs.
+- [x] Add `GET /api/jobs/{id}/events` SSE progress stream.
+- [x] Add per-hit review WAV export under `review/hits/`.
+- [x] Add manifest `hits` rows with onset, duration, cluster, representative flag, and artifact path.
+- [x] Add click-to-audition for waveform onset markers and detected hit rows.
+- [x] Add sample-row audition controls.
+- [x] Harden artifact path containment with `Path.relative_to()`.
+- [x] Add hit review/streaming documentation.
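The `Path.relative_to()` containment check named in the completed tasks can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function name `resolve_artifact` and the idea of a single artifact root directory are assumptions made for the example.

```python
from pathlib import Path


def resolve_artifact(root: Path, requested: str) -> Path:
    """Return the resolved artifact path, or raise ValueError if it escapes root.

    `resolve_artifact` is a hypothetical helper name used for illustration.
    """
    root_resolved = root.resolve()
    # Resolving collapses ".." segments and symlinks before the check.
    candidate = (root_resolved / requested).resolve()
    # relative_to() raises ValueError when candidate is not inside root_resolved.
    candidate.relative_to(root_resolved)
    return candidate
```

A request such as `review/hits/hit_000.wav` resolves inside the root, while `../outside.wav` or an absolute path raises `ValueError`, which a handler could map to a 404 response.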
 ## Validation tasks
 - [x] FastAPI smoke test for health/config/job flow.
 - [x] Pipeline smoke test on synthetic audio.
 - [x] API history/cache smoke test.
+- [x] SSE and review-hit artifact smoke test via `scripts/test_sse_and_review_hits.py`.
 - [x] Git status reviewed before packaging.
 - [x] Project archive excludes `.runs/`, `.cache/`, and dependency folders.
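For readers unfamiliar with the wire format behind the `GET /api/jobs/{id}/events` stream, the following sketch parses server-sent-event text into `(event, data)` pairs. It follows the general SSE framing rules (blank line ends an event, lines starting with `:` are comments) and is not taken from `scripts/test_sse_and_review_hits.py`; the event name and JSON payload in the usage note are assumptions.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Feeding it the lines `event: progress`, `data: {"stage": "onsets"}`, and a blank line yields one `("progress", '{"stage": "onsets"}')` tuple. A smoke test could decode the data with `json.loads` and check the stage key.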
 ## Remaining high-value tasks
 - [ ] Add onset adjustment and rerun-from-onsets flow.
 - [ ] Add cluster merge/split/relabel workflow.
 - [ ] Add side-by-side run comparison.
 - [ ] Convert frontend to TypeScript with a small Vite build once UX stabilizes.
 - [ ] Add automated browser-level UI tests.

docs/UI_REPLACEMENT.md CHANGED

@@ -65,7 +65,7 @@ Two modes are exposed:
 | `batch_quality` | Slower, final-quality clustering using all-pairs similarity plus agglomerative clustering. |
 | `online_preview` | Faster near-realtime-style clustering using prototype assignment. Best for quick iteration after bypassing Demucs. |
-## Why polling instead of websockets/SSE
 Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
@@ -79,3 +79,15 @@ Future improvement: use Server-Sent Events for lower-latency log streaming once
 - Add downloadable timing report per job.
 - Add filters/search to the run history browser.
 - Convert the frontend to TypeScript when the UX stops moving quickly.

 | `batch_quality` | Slower, final-quality clustering using all-pairs similarity plus agglomerative clustering. |
 | `online_preview` | Faster near-realtime-style clustering using prototype assignment. Best for quick iteration after bypassing Demucs. |
+## Why SSE progress with polling fallback instead of websockets
 Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
 - Add downloadable timing report per job.
 - Add filters/search to the run history browser.
 - Convert the frontend to TypeScript when the UX stops moving quickly.
+## Latest review UI additions
+The current UI now includes:
+- Dedicated selected-hit and selected-sample audio players.
+- Clickable waveform onset markers that select the nearest detected hit.
+- A detected-hit review table backed by `review/hits/*.wav` artifacts.
+- Audition buttons for representative sample rows.
+- Server-sent-events job progress via `GET /api/jobs/{job_id}/events`, with polling fallback.
+This still stops short of destructive editing. The next UI layer should store edits as manifest overlays, then call a re-export endpoint that reuses cached hit audio instead of rerunning Demucs/onset detection.
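One way to picture the manifest-overlay idea is a pure function that applies an edit overlay to the manifest `hits` rows without touching the originals. This is a sketch under stated assumptions: the row keys (`id`, `onset_sec`, `cluster`) and overlay keys (`deleted_hits`, `onset_shifts_sec`, `cluster_overrides`) are hypothetical names, not the project's actual schema.

```python
from copy import deepcopy


def apply_overlay(hits: list[dict], overlay: dict) -> list[dict]:
    """Return edited hit rows; the original manifest rows are left unchanged.

    All key names here are illustrative assumptions.
    """
    deleted = set(overlay.get("deleted_hits", []))
    shifts = overlay.get("onset_shifts_sec", {})
    clusters = overlay.get("cluster_overrides", {})
    edited = []
    for hit in hits:
        hit_id = hit["id"]
        if hit_id in deleted:
            continue  # deleted hits are skipped, not erased from the manifest
        row = deepcopy(hit)
        row["onset_sec"] = row["onset_sec"] + shifts.get(hit_id, 0.0)
        row["cluster"] = clusters.get(hit_id, row["cluster"])
        edited.append(row)
    return edited
```

Because the overlay is stored separately and applied on read, edits stay non-destructive, and a re-export step could consume the edited rows while reusing the cached per-hit audio.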

docs/benchmark-online-preview.json CHANGED

@@ -8,66 +8,66 @@
       "run_index": 0,
       "clustering_mode": "online_preview",
       "audio_duration_sec": 4.75,
-      "total_duration_sec": 2.394493,
-      "realtime_factor": 0.504104,
-      "hit_count": 14,
       "cluster_count": 10,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.01333964500008733,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.18073730900005103,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 1.8083914959997855,
           "status": "done",
-          "detail": "14 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.015553790000012668,
           "status": "done",
-          "detail": "bright:5, hihat_open:8, kick:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.01717499700021108,
           "status": "done",
           "detail": "10 clusters \u00b7 online preview"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.06853683399981492,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.0004338460000781197,
           "status": "done",
           "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.2898033520000354,
           "status": "done",
-          "detail": "10 WAVs + MIDI + ZIP"
         }
       ]
     },
@@ -78,66 +78,66 @@
       "run_index": 0,
       "clustering_mode": "online_preview",
       "audio_duration_sec": 4.874989,
-      "total_duration_sec": 2.422223,
-      "realtime_factor": 0.496867,
-      "hit_count": 30,
       "cluster_count": 12,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.012654803000032189,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.10868702200014013,
           "status": "done",
-          "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 1.7981390029999602,
           "status": "done",
-          "detail": "30 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.020911717999979373,
           "status": "done",
-          "detail": "bright:12, cymbal:2, hihat_closed:9, hihat_open:3, kick:1, mid:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.08173960800013447,
           "status": "done",
           "detail": "12 clusters \u00b7 online preview"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.18588780100003532,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.001146163000157685,
           "status": "done",
-          "detail": "6 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.21253995300003226,
           "status": "done",
-          "detail": "12 WAVs + MIDI + ZIP"
         }
       ]
     },
@@ -148,66 +148,66 @@
       "run_index": 0,
       "clustering_mode": "online_preview",
       "audio_duration_sec": 4.874989,
-      "total_duration_sec": 2.406563,
-      "realtime_factor": 0.493655,
-      "hit_count": 28,
       "cluster_count": 12,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.009107656999958635,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.19882379599994238,
           "status": "done",
-          "detail": "118.8 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 1.8942657120001059,
           "status": "done",
-          "detail": "28 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.015083428000025378,
           "status": "done",
-          "detail": "bright:5, cymbal:2, hihat_closed:19, hihat_open:2"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.036892447000127504,
           "status": "done",
           "detail": "12 clusters \u00b7 online preview"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.0908485570000721,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.0007993310000529164,
           "status": "done",
-          "detail": "4 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.1602465889998257,
           "status": "done",
-          "detail": "12 WAVs + MIDI + ZIP"
         }
       ]
     }
@@ -215,59 +215,59 @@
   "summary": [
     {
       "stage": "stem",
-      "mean_sec": 0.011701,
-      "median_sec": 0.012655,
-      "min_sec": 0.009108,
-      "max_sec": 0.01334
     },
     {
       "stage": "bpm",
-      "mean_sec": 0.162749,
-      "median_sec": 0.180737,
-      "min_sec": 0.108687,
-      "max_sec": 0.198824
     },
     {
       "stage": "onsets",
-      "mean_sec": 1.833599,
-      "median_sec": 1.808391,
-      "min_sec": 1.798139,
-      "max_sec": 1.894266
     },
     {
       "stage": "classification",
-      "mean_sec": 0.017183,
-      "median_sec": 0.015554,
-      "min_sec": 0.015083,
-      "max_sec": 0.020912
     },
     {
       "stage": "clustering",
-      "mean_sec": 0.045269,
-      "median_sec": 0.036892,
-      "min_sec": 0.017175,
-      "max_sec": 0.08174
     },
     {
       "stage": "selection",
-      "mean_sec": 0.115091,
-      "median_sec": 0.090849,
-      "min_sec": 0.068537,
-      "max_sec": 0.185888
     },
     {
       "stage": "synthesis",
-      "mean_sec": 0.000793,
-      "median_sec": 0.000799,
-      "min_sec": 0.000434,
-      "max_sec": 0.001146
     },
     {
       "stage": "export",
-      "mean_sec": 0.220863,
-      "median_sec": 0.21254,
-      "min_sec": 0.160247,
-      "max_sec": 0.289803
     }
   ]
 }

       "run_index": 0,
       "clustering_mode": "online_preview",
       "audio_duration_sec": 4.75,
+      "total_duration_sec": 1.88646,
+      "realtime_factor": 0.397149,
+      "hit_count": 13,
       "cluster_count": 10,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.011189419999936945,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.09853705299974536,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 1.3858792310002173,
           "status": "done",
+          "detail": "13 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.014456886000061786,
           "status": "done",
+          "detail": "bright:5, hihat_open:7, kick:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.016802669999833597,
           "status": "done",
           "detail": "10 clusters \u00b7 online preview"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.07535981499995614,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.00036268399981054245,
           "status": "done",
           "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.28339249200007544,
           "status": "done",
+          "detail": "10 samples + 13 review hits + MIDI + ZIP"
         }
       ]
     },
       "run_index": 0,
       "clustering_mode": "online_preview",
       "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.914241,
+      "realtime_factor": 0.597794,
+      "hit_count": 28,
       "cluster_count": 12,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.00999813099997482,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.10688103099982982,
           "status": "done",
+          "detail": "161.5 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 2.1018096600000717,
           "status": "done",
+          "detail": "28 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.09064649800029656,
           "status": "done",
+          "detail": "bright:12, cymbal:1, hihat_closed:9, hihat_open:3, mid:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.049414074000196706,
           "status": "done",
           "detail": "12 clusters \u00b7 online preview"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.23301379500026087,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0012726520003525366,
           "status": "done",
+          "detail": "5 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.32063418000007005,
           "status": "done",
+          "detail": "12 samples + 28 review hits + MIDI + ZIP"
         }
       ]
     },
       "run_index": 0,
       "clustering_mode": "online_preview",
       "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.480844,
+      "realtime_factor": 0.508892,
+      "hit_count": 29,
       "cluster_count": 12,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.010305768999842257,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.1724793140001566,
           "status": "done",
+          "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 1.8014776340000935,
           "status": "done",
+          "detail": "29 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.017559420999987196,
           "status": "done",
+          "detail": "bright:5, cymbal:1, hihat_closed:20, hihat_open:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.043723993000185146,
           "status": "done",
           "detail": "12 clusters \u00b7 online preview"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.16425892699999167,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0012976000002709043,
           "status": "done",
+          "detail": "8 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.2692134119997718,
           "status": "done",
+          "detail": "12 samples + 29 review hits + MIDI + ZIP"
         }
       ]
     }
   "summary": [
     {
       "stage": "stem",
+      "mean_sec": 0.010498,
+      "median_sec": 0.010306,
+      "min_sec": 0.009998,
+      "max_sec": 0.011189
     },
     {
       "stage": "bpm",
+      "mean_sec": 0.125966,
+      "median_sec": 0.106881,
+      "min_sec": 0.098537,
+      "max_sec": 0.172479
     },
     {
       "stage": "onsets",
+      "mean_sec": 1.763056,
+      "median_sec": 1.801478,
+      "min_sec": 1.385879,
+      "max_sec": 2.10181
     },
     {
       "stage": "classification",
+      "mean_sec": 0.040888,
+      "median_sec": 0.017559,
+      "min_sec": 0.014457,
+      "max_sec": 0.090646
     },
     {
       "stage": "clustering",
+      "mean_sec": 0.036647,
+      "median_sec": 0.043724,
+      "min_sec": 0.016803,
+      "max_sec": 0.049414
     },
     {
       "stage": "selection",
+      "mean_sec": 0.157544,
+      "median_sec": 0.164259,
+      "min_sec": 0.07536,
+      "max_sec": 0.233014
     },
     {
       "stage": "synthesis",
+      "mean_sec": 0.000978,
+      "median_sec": 0.001273,
+      "min_sec": 0.000363,
+      "max_sec": 0.001298
     },
     {
       "stage": "export",
+      "mean_sec": 0.29108,
+      "median_sec": 0.283392,
+      "min_sec": 0.269213,
+      "max_sec": 0.320634
     }
   ]
 }
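The derived fields in this benchmark file can be cross-checked by hand. The sketch below assumes that `realtime_factor` is `total_duration_sec / audio_duration_sec` and that the summary values are per-stage statistics rounded to six decimals; both assumptions match the numbers shown above, but the helper names are illustrative and not taken from `scripts/benchmark_subprocesses.py`.

```python
from statistics import mean, median


def realtime_factor(total_duration_sec: float, audio_duration_sec: float) -> float:
    """Processing time divided by audio length; below 1.0 is faster than realtime."""
    return round(total_duration_sec / audio_duration_sec, 6)


def summarize(durations: list[float]) -> dict:
    """Per-stage summary across runs, rounded to six decimals (assumed)."""
    return {
        "mean_sec": round(mean(durations), 6),
        "median_sec": round(median(durations), 6),
        "min_sec": round(min(durations), 6),
        "max_sec": round(max(durations), 6),
    }


# "stem" stage durations from the three online_preview runs above.
stem = [0.011189419999936945, 0.00999813099997482, 0.010305768999842257]
print(realtime_factor(1.88646, 4.75))
print(summarize(stem))
```

With the first run's totals this reproduces the reported `realtime_factor` of 0.397149, and the `stem` durations reproduce the summary row (mean 0.010498, median 0.010306).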

docs/benchmark-subprocesses.json CHANGED

@@ -8,66 +8,66 @@
       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.75,
-      "total_duration_sec": 2.416794,
-      "realtime_factor": 0.508799,
-      "hit_count": 14,
       "cluster_count": 7,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.011517213000161064,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.19438482000009571,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 1.8062190609998652,
           "status": "done",
-          "detail": "14 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.016392102000054365,
           "status": "done",
-          "detail": "bright:5, hihat_closed:1, hihat_open:7, kick:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.07352871200009758,
           "status": "done",
           "detail": "7 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.096273950000068,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.0006992359999458131,
           "status": "done",
           "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.2172303219999776,
           "status": "done",
-          "detail": "7 WAVs + MIDI + ZIP"
         }
       ]
     },
@@ -78,66 +78,66 @@
       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.874989,
-      "total_duration_sec": 2.99188,
-      "realtime_factor": 0.61372,
-      "hit_count": 35,
-      "cluster_count": 2,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.010077079999973648,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.17334403699987888,
           "status": "done",
           "detail": "161.5 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 2.1082552409998243,
           "status": "done",
-          "detail": "35 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.021269321000090713,
           "status": "done",
-          "detail": "bright:14, cymbal:1, hihat_closed:14, hihat_open:3, kick:1, mid:2"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.26927052900009585,
           "status": "done",
-          "detail": "2 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.31629775500005053,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.0011716779999915161,
           "status": "done",
-          "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.09167172899992693,
           "status": "done",
-          "detail": "2 WAVs + MIDI + ZIP"
         }
       ]
     },
@@ -148,66 +148,66 @@
       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.874989,
-      "total_duration_sec": 2.597859,
-      "realtime_factor": 0.532895,
-      "hit_count": 23,
-      "cluster_count": 3,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.012474630000042453,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.18858063699985905,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 1.9154837959999895,
           "status": "done",
-          "detail": "23 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.0188920179998604,
           "status": "done",
-          "detail": "bright:3, hihat_closed:17, hihat_open:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.10195718500017392,
           "status": "done",
-          "detail": "3 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.19837312200002089,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.0011928339999940363,
           "status": "done",
-          "detail": "3 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.1603816869999264,
           "status": "done",
-          "detail": "3 WAVs + MIDI + ZIP"
         }
       ]
     }
@@ -215,59 +215,59 @@
   "summary": [
     {
       "stage": "stem",
-      "mean_sec": 0.011356,
-      "median_sec": 0.011517,
-      "min_sec": 0.010077,
-      "max_sec": 0.012475
     },
     {
       "stage": "bpm",
-      "mean_sec": 0.185436,
-      "median_sec": 0.188581,
-      "min_sec": 0.173344,
-      "max_sec": 0.194385
     },
     {
       "stage": "onsets",
-      "mean_sec": 1.943319,
-      "median_sec": 1.915484,
-      "min_sec": 1.806219,
-      "max_sec": 2.108255
     },
     {
       "stage": "classification",
-      "mean_sec": 0.018851,
-      "median_sec": 0.018892,
-      "min_sec": 0.016392,
-      "max_sec": 0.021269
     },
     {
       "stage": "clustering",
-      "mean_sec": 0.148252,
-      "median_sec": 0.101957,
-      "min_sec": 0.073529,
-      "max_sec": 0.269271
     },
     {
       "stage": "selection",
-      "mean_sec": 0.203648,
-      "median_sec": 0.198373,
-      "min_sec": 0.096274,
-      "max_sec": 0.316298
     },
     {
       "stage": "synthesis",
-      "mean_sec": 0.001021,
-      "median_sec": 0.001172,
-      "min_sec": 0.000699,
-      "max_sec": 0.001193
     },
     {
       "stage": "export",
-      "mean_sec": 0.156428,
-      "median_sec": 0.160382,
-      "min_sec": 0.091672,
-      "max_sec": 0.21723
     }
   ]
 }

       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.75,
+      "total_duration_sec": 2.508936,
+      "realtime_factor": 0.528197,
+      "hit_count": 13,
       "cluster_count": 7,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.010515291000047,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.11277726900016205,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 1.9893157869996685,
           "status": "done",
+          "detail": "13 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.013427571999727661,
           "status": "done",
+          "detail": "bright:5, hihat_closed:1, hihat_open:6, kick:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.013959215999875596,
           "status": "done",
           "detail": "7 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.09699052199994185,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.000661541999761539,
           "status": "done",
           "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.2707521170000291,
           "status": "done",
+          "detail": "7 samples + 13 review hits + MIDI + ZIP"
         }
       ]
     },
       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.562433,
+      "realtime_factor": 0.525628,
+      "hit_count": 30,
+      "cluster_count": 1,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.009733310000228812,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.18278188500016768,
           "status": "done",
           "detail": "161.5 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 1.8905766069997298,
           "status": "done",
+          "detail": "30 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.016936135000378272,
           "status": "done",
+          "detail": "bright:15, cymbal:1, hihat_closed:10, hihat_open:3, mid:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.09508980800001154,
           "status": "done",
+          "detail": "1 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.271814092999648,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0009019099998113234,
           "status": "done",
+          "detail": "1 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.09411494899995887,
           "status": "done",
+          "detail": "1 samples + 30 review hits + MIDI + ZIP"
         }
       ]
     },
       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.587342,
+      "realtime_factor": 0.530738,
+      "hit_count": 20,
+      "cluster_count": 4,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.008843839000292064,
           "status": "done",
           "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.16997624899977382,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 2.0115367889998197,
           "status": "done",
+          "detail": "20 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.0954397410000638,
           "status": "done",
+          "detail": "bright:3, hihat_closed:14, hihat_open:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.02929340799983038,
           "status": "done",
+          "detail": "4 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.1620299520000117,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0010316440002497984,
           "status": "done",
+          "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.108677784000065,
           "status": "done",
+          "detail": "4 samples + 20 review hits + MIDI + ZIP"
         }
       ]
     }
   "summary": [
     {
       "stage": "stem",
+      "mean_sec": 0.009697,
+      "median_sec": 0.009733,
+      "min_sec": 0.008844,
+      "max_sec": 0.010515
     },
     {
       "stage": "bpm",
+      "mean_sec": 0.155178,
+      "median_sec": 0.169976,
+      "min_sec": 0.112777,
+      "max_sec": 0.182782
     },
     {
       "stage": "onsets",
+      "mean_sec": 1.96381,
+      "median_sec": 1.989316,
+      "min_sec": 1.890577,
+      "max_sec": 2.011537
     },
     {
       "stage": "classification",
+      "mean_sec": 0.041934,
+      "median_sec": 0.016936,
+      "min_sec": 0.013428,
+      "max_sec": 0.09544
     },
     {
       "stage": "clustering",
+      "mean_sec": 0.046114,
+      "median_sec": 0.029293,
+      "min_sec": 0.013959,
+      "max_sec": 0.09509
     },
     {
       "stage": "selection",
+      "mean_sec": 0.176945,
+      "median_sec": 0.16203,
+      "min_sec": 0.096991,
+      "max_sec": 0.271814
     },
     {
       "stage": "synthesis",
+      "mean_sec": 0.000865,
+      "median_sec": 0.000902,
+      "min_sec": 0.000662,
+      "max_sec": 0.001032
     },
     {
       "stage": "export",
+      "mean_sec": 0.157848,
+      "median_sec": 0.108678,
+      "min_sec": 0.094115,
+      "max_sec": 0.270752
     }
   ]
 }
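
Note on the benchmark figures: each run record carries `audio_duration_sec`, `total_duration_sec`, and `realtime_factor`. The values shown are consistent with the factor being processing time divided by audio length, so anything below 1.0 is faster than realtime. That relation is inferred from the numbers in this diff, not confirmed against the benchmark script, and `realtime_factor` below is a hypothetical helper name used only for illustration.

```python
def realtime_factor(total_duration_sec: float, audio_duration_sec: float) -> float:
    """Seconds of processing per second of audio (assumed definition)."""
    if audio_duration_sec <= 0:
        raise ValueError("audio_duration_sec must be positive")
    # Round to six decimals, matching the precision used in the JSON above.
    return round(total_duration_sec / audio_duration_sec, 6)


# Figures taken from the batch_quality run record in the benchmark JSON.
rtf = realtime_factor(2.587342, 4.874989)
print(rtf)
```

With the figures from the `batch_quality` record this reproduces the stored `realtime_factor`, which supports the assumed definition for at least that run.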

pipeline_runner.py CHANGED

@@ -136,6 +136,7 @@ class PipelineResult:
     cluster_count: int
     stages: list[dict[str, Any]]
     samples: list[dict[str, Any]]
     overview: dict[str, Any]
     files: dict[str, str]
@@ -267,7 +268,9 @@ def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int
         "envelope": [round(float(x), 6) for x in envelope],
         "onsets": [
             {
                 "time_sec": round(float(h.onset_time), 6),
                 "label": h.label,
                 "energy": round(float(h.rms_energy), 6),
                 "cluster_id": int(getattr(h, "cluster_id", -1)),
@@ -283,6 +286,13 @@ def _copy_temp_file(src: str | os.PathLike[str], dst: Path) -> str:
     return str(dst)
 def run_extraction_pipeline(
     audio_path: str | os.PathLike[str],
     output_dir: str | os.PathLike[str],
@@ -400,6 +410,7 @@ def run_extraction_pipeline(
             stage.detail = detail
     sample_rows: list[dict[str, Any]] = []
     files: dict[str, str] = {"stem": "stem.wav"}
     with _timed_stage(stages, "export", progress_cb) as stage:
@@ -421,6 +432,32 @@ def run_extraction_pipeline(
         files["reconstruction"] = "reconstruction.wav"
         files["midi"] = "reconstruction.mid"
         for cluster in sorted(clusters, key=lambda item: item.count, reverse=True):
             best = cluster.best_hit
             sample_path = samples_dir / f"{cluster.label}.wav"
@@ -451,7 +488,7 @@ def run_extraction_pipeline(
             os.unlink(archive_tmp)
         except OSError:
             pass
-        stage.detail = f"{len(sample_rows)} WAVs + MIDI + ZIP"
     duration_sec = time.perf_counter() - started_total
     result = PipelineResult(
@@ -465,6 +502,7 @@ def run_extraction_pipeline(
         cluster_count=len(clusters),
         stages=[asdict(stage) for stage in stages],
         samples=sample_rows,
         overview=_make_overview(stem_audio, stem_sr, hits),
         files=files,
     )

     cluster_count: int
     stages: list[dict[str, Any]]
     samples: list[dict[str, Any]]
+    hits: list[dict[str, Any]]
     overview: dict[str, Any]
     files: dict[str, str]
         "envelope": [round(float(x), 6) for x in envelope],
         "onsets": [
             {
+                "index": int(getattr(h, "index", -1)),
                 "time_sec": round(float(h.onset_time), 6),
+                "duration_sec": round(float(h.duration), 6),
                 "label": h.label,
                 "energy": round(float(h.rms_energy), 6),
                 "cluster_id": int(getattr(h, "cluster_id", -1)),
     return str(dst)
+def _safe_file_component(value: str) -> str:
+    cleaned = "".join(ch if ch.isalnum() or ch in {"-", "_"} else "_" for ch in value.lower())
+    while "__" in cleaned:
+        cleaned = cleaned.replace("__", "_")
+    return cleaned.strip("_") or "item"
 def run_extraction_pipeline(
     audio_path: str | os.PathLike[str],
     output_dir: str | os.PathLike[str],
             stage.detail = detail
     sample_rows: list[dict[str, Any]] = []
+    hit_rows: list[dict[str, Any]] = []
     files: dict[str, str] = {"stem": "stem.wav"}
     with _timed_stage(stages, "export", progress_cb) as stage:
         files["reconstruction"] = "reconstruction.wav"
         files["midi"] = "reconstruction.mid"
+        cluster_labels = {int(cluster.cluster_id): cluster.label for cluster in clusters}
+        representative_ids = {id(cluster.best_hit) for cluster in clusters}
+        review_hits_dir = out / "review" / "hits"
+        if hits:
+            review_hits_dir.mkdir(parents=True, exist_ok=True)
+        for hit in sorted(hits, key=lambda item: item.index):
+            safe_label = _safe_file_component(hit.label or "hit")
+            file_name = f"hit_{int(hit.index):05d}_{safe_label}.wav"
+            rel_file = f"review/hits/{file_name}"
+            hit.save(str(out / rel_file))
+            cluster_id = int(getattr(hit, "cluster_id", -1))
+            hit_rows.append(
+                {
+                    "index": int(hit.index),
+                    "label": hit.label,
+                    "cluster_id": cluster_id,
+                    "cluster_label": cluster_labels.get(cluster_id, "unclustered"),
+                    "is_representative": id(hit) in representative_ids,
+                    "onset_sec": round(float(hit.onset_time), 6),
+                    "duration_ms": round(float(hit.duration * 1000), 1),
+                    "rms_energy": round(float(hit.rms_energy), 6),
+                    "spectral_centroid_hz": round(float(hit.spectral_centroid), 1),
+                    "file": rel_file,
+                }
+            )
         for cluster in sorted(clusters, key=lambda item: item.count, reverse=True):
             best = cluster.best_hit
             sample_path = samples_dir / f"{cluster.label}.wav"
             os.unlink(archive_tmp)
         except OSError:
             pass
+        stage.detail = f"{len(sample_rows)} samples + {len(hit_rows)} review hits + MIDI + ZIP"
     duration_sec = time.perf_counter() - started_total
     result = PipelineResult(
         cluster_count=len(clusters),
         stages=[asdict(stage) for stage in stages],
         samples=sample_rows,
+        hits=hit_rows,
         overview=_make_overview(stem_audio, stem_sr, hits),
         files=files,
     )
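
The review-hit export in this diff builds each file name from the hit index and a sanitised label. The sketch below repeats the sanitising logic of `_safe_file_component` as it appears in the diff and wraps the naming pattern in `review_hit_path`, which is a hypothetical helper that does not exist in `pipeline_runner.py`; the sample labels are made up for illustration.

```python
from __future__ import annotations


def safe_file_component(value: str) -> str:
    # Same logic as _safe_file_component in the diff: lowercase, replace
    # anything that is not alphanumeric, "-" or "_" with "_", collapse
    # runs of "_", and fall back to "item" when nothing is left.
    cleaned = "".join(ch if ch.isalnum() or ch in {"-", "_"} else "_" for ch in value.lower())
    while "__" in cleaned:
        cleaned = cleaned.replace("__", "_")
    return cleaned.strip("_") or "item"


def review_hit_path(index: int, label: str | None) -> str:
    # Hypothetical helper mirroring the f-strings used in the export stage.
    safe_label = safe_file_component(label or "hit")
    return f"review/hits/hit_{int(index):05d}_{safe_label}.wav"


print(review_hit_path(3, "hihat_closed"))
print(review_hit_path(12, "Snare / Rim!"))
print(review_hit_path(0, None))
```

Zero-padding the index to five digits keeps the files in onset order under a plain lexical sort, which appears to be the intent of sorting `hits` by `index` before export.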

scripts/test_sse_and_review_hits.py ADDED

@@ -0,0 +1,70 @@

+#!/usr/bin/env python3
+"""Smoke-test SSE progress plus per-hit review artifacts."""
+from __future__ import annotations
+import io
+import json
+import sys
+from pathlib import Path
+import soundfile as sf
+from fastapi.testclient import TestClient
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+from app import app  # noqa: E402
+from synth_generator import generate_test_song  # noqa: E402
+def main() -> int:
+    song = generate_test_song(pattern_name="funk", bars=1, bpm=120, add_bass=False)
+    buf = io.BytesIO()
+    sf.write(buf, song.drums_only, song.sr, format="WAV")
+    buf.seek(0)
+    client = TestClient(app)
+    response = client.post(
+        "/api/jobs",
+        files={"file": ("funk.wav", buf, "audio/wav")},
+        data={"params": json.dumps({"stem": "all", "clustering_mode": "online_preview", "target_min": 2, "target_max": 8})},
+    )
+    response.raise_for_status()
+    job_id = response.json()["id"]
+    final = None
+    with client.stream("GET", f"/api/jobs/{job_id}/events") as stream:
+        stream.raise_for_status()
+        for line in stream.iter_lines():
+            if not line or not line.startswith("data: "):
+                continue
+            payload = json.loads(line[6:])
+            if payload["status"] == "error":
+                raise RuntimeError(payload.get("error"))
+            if payload["status"] == "complete":
+                final = payload
+                break
+    assert final is not None, "SSE stream ended without complete event"
+    hits = final["result"]["hits"]
+    samples = final["result"]["samples"]
+    assert hits, "expected review hit rows"
+    assert samples, "expected representative sample rows"
+    first_hit_url = hits[0]["url"]
+    file_response = client.get(first_hit_url)
+    assert file_response.status_code == 200, first_hit_url
+    assert file_response.content[:4] == b"RIFF", "review hit should be a WAV file"
+    print(json.dumps({
+        "status": final["status"],
+        "job_id": job_id,
+        "hit_count": len(hits),
+        "sample_count": len(samples),
+        "first_hit_url": first_hit_url,
+    }, indent=2))
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
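
The smoke test above reads the event stream line by line, skips everything that is not a `data: ` line, and stops at the first `complete` payload. The same loop can be exercised without a server, as in the sketch below. `final_job_from_sse` is a hypothetical name, and the sample lines (including the `running` status and the `event: job` framing) are assumptions for illustration rather than captured output from `app.py`.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


def final_job_from_sse(lines: Iterable[str]) -> dict[str, Any] | None:
    """Return the first 'complete' payload, or None if the stream ends first."""
    prefix = "data: "
    for line in lines:
        # Blank separators and "event:" lines carry no JSON payload.
        if not line or not line.startswith(prefix):
            continue
        payload = json.loads(line[len(prefix):])
        if payload["status"] == "error":
            raise RuntimeError(payload.get("error"))
        if payload["status"] == "complete":
            return payload
    return None


# Assumed sample stream; real payloads carry more fields.
sample_lines = [
    "event: job",
    'data: {"status": "running"}',
    "",
    "event: job",
    'data: {"status": "complete", "result": {"hits": [], "samples": []}}',
    "",
]
final = final_job_from_sse(sample_lines)
print(final["status"] if final else "stream ended early")
```

Returning `None` instead of asserting inside the parser leaves the decision about an incomplete stream to the caller, which matches the explicit `assert final is not None` in the script.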

web/app.js CHANGED

@@ -10,6 +10,9 @@ const fields = [
 let config = null;
 let selectedFile = null;
 let activePoll = null;
 function esc(value) {
   return String(value ?? "").replace(/[&<>'"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" }[c]));
@@ -134,24 +137,104 @@ function drawWaveform(overview) {
   ctx.fill();
   ctx.stroke();
-  ctx.strokeStyle = "rgba(200,165,255,.55)";
-  ctx.lineWidth = 1;
   for (const onset of overview.onsets ?? []) {
     const x = (onset.time_sec / Math.max(overview.duration_sec, 0.001)) * w;
     ctx.beginPath();
-    ctx.moveTo(x, 10);
-    ctx.lineTo(x, h - 10);
     ctx.stroke();
   }
 }
 function renderResult(job) {
   const result = job.result;
   if (!result) return;
   const rtf = Number(result.realtime_factor).toFixed(2);
   const mode = result.params?.clustering_mode ?? "—";
   $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime · ${mode}`;
-  drawWaveform(result.overview);
   const fileUrls = result.file_urls ?? {};
   const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
@@ -159,18 +242,9 @@ function renderResult(job) {
   $("stemAudio").src = fileUrls.stem ?? "";
   $("reconAudio").src = fileUrls.reconstruction ?? "";
-  const tbody = $("samplesTable").querySelector("tbody");
-  tbody.innerHTML = (result.samples ?? []).map((sample) => `
-    <tr>
-      <td>${esc(sample.label)}</td>
-      <td>${esc(sample.classification)}</td>
-      <td>${esc(sample.hits)}</td>
-      <td>${esc(sample.score)}</td>
-      <td>${esc(sample.duration_ms)} ms</td>
-      <td>${esc(sample.first_onset_sec)} s</td>
-      <td><a href="${esc(sample.url)}" download>WAV</a></td>
-    </tr>
-  `).join("");
 }
 function renderJob(job) {
@@ -201,6 +275,7 @@ function renderHistory(payload) {
   for (const button of $("historyList").querySelectorAll(".history-row")) {
     button.addEventListener("click", async () => {
       const job = await api(`/api/jobs/${button.dataset.jobId}`);
       renderJob(job);
       window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" });
     });
@@ -216,21 +291,26 @@ async function refreshHistory() {
   }
 }
-async function pollJob(id) {
   if (activePoll) clearInterval(activePoll);
   const tick = async () => {
     try {
       const job = await api(`/api/jobs/${id}`);
       renderJob(job);
       if (["complete", "error"].includes(job.status)) {
-        clearInterval(activePoll);
-        activePoll = null;
         $("runButton").disabled = !selectedFile;
         await refreshHistory();
       }
     } catch (error) {
-      clearInterval(activePoll);
-      activePoll = null;
       $("runButton").disabled = !selectedFile;
       $("resultSummary").textContent = error.message;
     }
@@ -239,8 +319,32 @@ async function pollJob(id) {
   activePoll = setInterval(tick, 800);
 }
 async function runExtraction() {
   if (!selectedFile) return;
   $("runButton").disabled = true;
   $("jobPill").textContent = "uploading";
   $("logs").textContent = "Uploading source and starting extraction…";
@@ -250,7 +354,7 @@ async function runExtraction() {
   try {
     const job = await api("/api/jobs", { method: "POST", body: form });
     renderJob(job);
-    await pollJob(job.id);
     await refreshHistory();
   } catch (error) {
     $("runButton").disabled = false;
@@ -269,6 +373,20 @@ function setFile(file) {
   }
 }
 async function boot() {
   try {
     await api("/api/health");
@@ -308,6 +426,7 @@ $("clearCacheButton").addEventListener("click", async () => {
     $("logs").textContent = error.message;
   }
 });
 const dropzone = $("dropzone");
 for (const eventName of ["dragenter", "dragover"]) {

 let config = null;
 let selectedFile = null;
 let activePoll = null;
+let activeEvents = null;
+let lastResult = null;
+let selectedHitIndex = null;
 function esc(value) {
   return String(value ?? "").replace(/[&<>'"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" }[c]));
   ctx.fill();
   ctx.stroke();
   for (const onset of overview.onsets ?? []) {
     const x = (onset.time_sec / Math.max(overview.duration_sec, 0.001)) * w;
+    const selected = Number(onset.index) === Number(selectedHitIndex);
+    ctx.strokeStyle = selected ? "rgba(255,255,255,.95)" : "rgba(200,165,255,.55)";
+    ctx.lineWidth = selected ? 2.4 : 1;
     ctx.beginPath();
+    ctx.moveTo(x, selected ? 3 : 10);
+    ctx.lineTo(x, selected ? h - 3 : h - 10);
     ctx.stroke();
   }
 }
+function playAudio(el, url) {
+  if (!url) return;
+  el.src = url;
+  el.currentTime = 0;
+  const promise = el.play();
+  if (promise && typeof promise.catch === "function") promise.catch(() => {});
+}
+function selectHit(index, shouldPlay = true) {
+  if (!lastResult) return;
+  const hit = (lastResult.hits ?? []).find((item) => Number(item.index) === Number(index));
+  if (!hit) return;
+  selectedHitIndex = hit.index;
+  $("selectedHitMeta").textContent = `#${hit.index} · ${hit.label} · ${hit.cluster_label} · ${hit.onset_sec}s · ${hit.duration_ms} ms${hit.is_representative ? " · representative" : ""}`;
+  if (shouldPlay) playAudio($("hitAudio"), hit.url);
+  for (const row of document.querySelectorAll("[data-hit-index]")) {
+    row.classList.toggle("selected", Number(row.dataset.hitIndex) === Number(hit.index));
+  }
+  drawWaveform(lastResult.overview);
+}
+function auditionSample(sample) {
+  $("selectedSampleMeta").textContent = `${sample.label} · ${sample.classification} · ${sample.hits} hits · score ${sample.score}`;
+  playAudio($("sampleAudio"), sample.url);
+}
+function renderSamples(result) {
+  const tbody = $("samplesTable").querySelector("tbody");
+  tbody.innerHTML = (result.samples ?? []).map((sample, i) => `
+    <tr data-sample-index="${i}">
+      <td><button class="mini-button" type="button" data-sample-audition="${i}">Audition</button></td>
+      <td>${esc(sample.label)}</td>
+      <td>${esc(sample.classification)}</td>
+      <td>${esc(sample.hits)}</td>
+      <td>${esc(sample.score)}</td>
+      <td>${esc(sample.duration_ms)} ms</td>
+      <td>${esc(sample.first_onset_sec)} s</td>
+      <td><a href="${esc(sample.url)}" download>WAV</a></td>
+    </tr>
+  `).join("");
+  for (const button of tbody.querySelectorAll("[data-sample-audition]")) {
+    button.addEventListener("click", (event) => {
+      event.stopPropagation();
+      const sample = result.samples[Number(button.dataset.sampleAudition)];
+      auditionSample(sample);
+    });
+  }
+}
+function renderHits(result) {
+  const tbody = $("hitsTable").querySelector("tbody");
+  const hits = result.hits ?? [];
+  tbody.innerHTML = hits.map((hit) => `
+    <tr data-hit-index="${esc(hit.index)}" class="${Number(hit.index) === Number(selectedHitIndex) ? "selected" : ""}">
+      <td><button class="mini-button" type="button" data-hit-audition="${esc(hit.index)}">Audition</button></td>
+      <td>${esc(hit.index)}</td>
+      <td>${esc(hit.label)}${hit.is_representative ? " ★" : ""}</td>
+      <td>${esc(hit.cluster_label)}</td>
+      <td>${esc(hit.onset_sec)} s</td>
+      <td>${esc(hit.duration_ms)} ms</td>
+      <td>${esc(hit.rms_energy)}</td>
+      <td><a href="${esc(hit.url)}" download>WAV</a></td>
+    </tr>
+  `).join("");
+  for (const row of tbody.querySelectorAll("[data-hit-index]")) {
+    row.addEventListener("click", () => selectHit(row.dataset.hitIndex));
+  }
+  for (const button of tbody.querySelectorAll("[data-hit-audition]")) {
+    button.addEventListener("click", (event) => {
+      event.stopPropagation();
+      selectHit(button.dataset.hitAudition);
+    });
+  }
+  if (hits.length && selectedHitIndex === null) selectHit(hits[0].index, false);
+}
 function renderResult(job) {
   const result = job.result;
   if (!result) return;
+  lastResult = result;
+  if (!(result.hits ?? []).some((hit) => Number(hit.index) === Number(selectedHitIndex))) {
+    selectedHitIndex = (result.hits ?? [])[0]?.index ?? null;
+  }
   const rtf = Number(result.realtime_factor).toFixed(2);
   const mode = result.params?.clustering_mode ?? "—";
   $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime · ${mode}`;
   const fileUrls = result.file_urls ?? {};
   const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
   $("stemAudio").src = fileUrls.stem ?? "";
   $("reconAudio").src = fileUrls.reconstruction ?? "";
+  renderSamples(result);
+  renderHits(result);
+  drawWaveform(result.overview);
 }
 function renderJob(job) {
   for (const button of $("historyList").querySelectorAll(".history-row")) {
     button.addEventListener("click", async () => {
       const job = await api(`/api/jobs/${button.dataset.jobId}`);
+      selectedHitIndex = null;
       renderJob(job);
       window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" });
     });
   }
 }
+function stopWatchers() {
   if (activePoll) clearInterval(activePoll);
+  activePoll = null;
+  if (activeEvents) activeEvents.close();
+  activeEvents = null;
+}
+async function pollJob(id) {
+  stopWatchers();
   const tick = async () => {
     try {
       const job = await api(`/api/jobs/${id}`);
       renderJob(job);
       if (["complete", "error"].includes(job.status)) {
+        stopWatchers();
         $("runButton").disabled = !selectedFile;
         await refreshHistory();
       }
     } catch (error) {
+      stopWatchers();
       $("runButton").disabled = !selectedFile;
       $("resultSummary").textContent = error.message;
     }
   activePoll = setInterval(tick, 800);
 }
+async function watchJob(id) {
+  if (!("EventSource" in window)) return pollJob(id);
+  stopWatchers();
+  return new Promise((resolve) => {
+    activeEvents = new EventSource(`/api/jobs/${id}/events`);
+    activeEvents.addEventListener("job", async (event) => {
+      const job = JSON.parse(event.data);
+      renderJob(job);
+      if (["complete", "error"].includes(job.status)) {
+        stopWatchers();
+        $("runButton").disabled = !selectedFile;
+        await refreshHistory();
+        resolve();
+      }
+    });
+    activeEvents.onerror = () => {
+      stopWatchers();
+      pollJob(id).then(resolve);
+    };
+  });
+}
 async function runExtraction() {
   if (!selectedFile) return;
+  selectedHitIndex = null;
+  lastResult = null;
   $("runButton").disabled = true;
   $("jobPill").textContent = "uploading";
   $("logs").textContent = "Uploading source and starting extraction…";
   try {
     const job = await api("/api/jobs", { method: "POST", body: form });
     renderJob(job);
+    await watchJob(job.id);
     await refreshHistory();
   } catch (error) {
     $("runButton").disabled = false;
   }
 }
+function selectNearestWaveformHit(event) {
+  if (!lastResult?.overview?.onsets?.length) return;
+  const rect = $("waveform").getBoundingClientRect();
+  const ratio = Math.min(1, Math.max(0, (event.clientX - rect.left) / Math.max(1, rect.width)));
+  const time = ratio * Math.max(lastResult.overview.duration_sec, 0.001);
+  let best = null;
+  let bestDelta = Infinity;
+  for (const onset of lastResult.overview.onsets) {
+    const delta = Math.abs(Number(onset.time_sec) - time);
+    if (delta < bestDelta) { best = onset; bestDelta = delta; }
+  }
+  if (best) selectHit(best.index);
+}
 async function boot() {
   try {
     await api("/api/health");
     $("logs").textContent = error.message;
   }
 });
+$("waveform").addEventListener("click", selectNearestWaveformHit);
 const dropzone = $("dropzone");
 for (const eventName of ["dragenter", "dragover"]) {
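
The waveform click handler added in this file maps the click position to a time and picks the onset closest to it. The Python sketch below restates that calculation outside the browser so it can be checked in isolation. `nearest_onset_index` and its parameters are hypothetical names, and the onset list is invented sample data shaped like `overview.onsets`.

```python
from __future__ import annotations

from typing import Any


def nearest_onset_index(
    click_x: float,
    left: float,
    width: float,
    duration_sec: float,
    onsets: list[dict[str, Any]],
) -> int | None:
    """Mirror selectNearestWaveformHit: clamp the click, convert to time, pick the closest onset."""
    if not onsets:
        return None
    # Clamp to [0, 1] so clicks outside the canvas select the first or last region.
    ratio = min(1.0, max(0.0, (click_x - left) / max(1.0, width)))
    time_sec = ratio * max(duration_sec, 0.001)
    # min() keeps the first onset on ties, like the strict "<" comparison in the JS loop.
    best = min(onsets, key=lambda onset: abs(float(onset["time_sec"]) - time_sec))
    return int(best["index"])


sample_onsets = [
    {"index": 0, "time_sec": 0.0},
    {"index": 1, "time_sec": 0.5},
    {"index": 2, "time_sec": 1.0},
]
# A click 110 px into a 400 px wide canvas over 2 s of audio lands at 0.55 s.
print(nearest_onset_index(210, 100, 400, 2.0, sample_onsets))
```

The scan is linear in the number of onsets, which is adequate for the hit counts in the benchmark files; a sorted list with bisection would only matter for much longer recordings.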

web/index.html CHANGED

@@ -175,15 +175,46 @@
             <label>Stem audio<audio id="stemAudio" controls></audio></label>
             <label>Reconstruction<audio id="reconAudio" controls></audio></label>
           </div>
-          <div class="table-wrap">
-            <table id="samplesTable">
-              <thead>
-                <tr>
-                  <th>Sample</th><th>Class</th><th>Hits</th><th>Score</th><th>Duration</th><th>First hit</th><th>File</th>
-                </tr>
-              </thead>
-              <tbody></tbody>
-            </table>
           </div>
         </section>
       </main>

             <label>Stem audio<audio id="stemAudio" controls></audio></label>
             <label>Reconstruction<audio id="reconAudio" controls></audio></label>
           </div>
+          <div class="review-grid">
+            <article class="review-card">
+              <strong>Selected hit</strong>
+              <span id="selectedHitMeta">Click an onset marker or hit row to audition the detected slice.</span>
+              <audio id="hitAudio" controls></audio>
+            </article>
+            <article class="review-card">
+              <strong>Selected sample</strong>
+              <span id="selectedSampleMeta">Click Audition in the sample table to hear the representative sample.</span>
+              <audio id="sampleAudio" controls></audio>
+            </article>
+          </div>
+          <div class="result-columns">
+            <section>
+              <h3>Representative samples</h3>
+              <div class="table-wrap">
+                <table id="samplesTable">
+                  <thead>
+                    <tr>
+                      <th>Audition</th><th>Sample</th><th>Class</th><th>Hits</th><th>Score</th><th>Duration</th><th>First hit</th><th>File</th>
+                    </tr>
+                  </thead>
+                  <tbody></tbody>
+                </table>
+              </div>
+            </section>
+            <section>
+              <h3>Detected hit review</h3>
+              <p class="subtle">Every detected slice is exported under <code>review/hits/</code>. Click rows or waveform markers to audition.</p>
+              <div class="table-wrap hit-table-wrap">
+                <table id="hitsTable">
+                  <thead>
+                    <tr>
+                      <th>Audition</th><th>#</th><th>Label</th><th>Cluster</th><th>Onset</th><th>Duration</th><th>Energy</th><th>File</th>
+                    </tr>
+                  </thead>
+                  <tbody></tbody>
+                </table>
+              </div>
+            </section>
           </div>
         </section>
       </main>

web/styles.css CHANGED

@@ -86,3 +86,16 @@ tr:last-child td { border-bottom: 0; }
 .history-row span:not(:first-child) { color: #dbe5f7; font-size: 12px; font-variant-numeric: tabular-nums; }
 .empty { color: var(--muted); margin: 0; }
 @media (max-width: 680px) { .history-row { grid-template-columns: 1fr 1fr; } }

 .history-row span:not(:first-child) { color: #dbe5f7; font-size: 12px; font-variant-numeric: tabular-nums; }
 .empty { color: var(--muted); margin: 0; }
 @media (max-width: 680px) { .history-row { grid-template-columns: 1fr 1fr; } }
+h3 { margin: 0 0 10px; font-size: 16px; letter-spacing: -.015em; }
+.subtle { margin: -4px 0 12px; color: var(--muted); font-size: 13px; }
+.review-grid { display: grid; grid-template-columns: repeat(2, minmax(0, 1fr)); gap: 16px; margin: 0 0 18px; }
+.review-card { border: 1px solid var(--line); border-radius: 20px; background: rgba(0,0,0,.16); padding: 14px; }
+.review-card strong, .review-card span { display: block; }
+.review-card span { color: var(--muted); font-size: 13px; margin-top: 5px; line-height: 1.4; }
+.result-columns { display: grid; grid-template-columns: minmax(0, 1fr); gap: 20px; }
+.hit-table-wrap { max-height: 420px; }
+.mini-button { padding: 7px 10px; border-radius: 999px; background: rgba(255,255,255,.08); border: 1px solid var(--line); color: var(--text); font-size: 12px; font-weight: 800; }
+tr.selected td { background: rgba(139,211,255,.12); }
+tr[data-hit-index] { cursor: pointer; }
+tr[data-hit-index]:hover td { background: rgba(255,255,255,.045); }
+@media (max-width: 760px) { .review-grid { grid-template-columns: 1fr; } }