Spaces: rikhoffbauer2/drum-sample-extractor

ChatGPT committed 6 days ago

Commit b8fa9bf (1 parent: eb1a122)

feat: add run history and online clustering

Files changed (19)

.gitignore +2 -0
README.md +52 -14
app.py +84 -24
docs/API.md +98 -43
docs/FEATURES.md +58 -0
docs/PIPELINE_TIMING_AND_REALTIME.md +94 -177
docs/PROGRESS.md +63 -0
docs/PROJECT_REVIEW.md +41 -78
docs/REMAINING_WORK.md +24 -16
docs/TASKS.md +54 -0
docs/UI_REPLACEMENT.md +30 -15
docs/benchmark-online-preview.json +273 -0
docs/benchmark-subprocesses.json +90 -293
pipeline_runner.py +88 -21
sample_extractor.py +114 -0
scripts/benchmark_subprocesses.py +7 -5
web/app.js +71 -15
web/index.html +20 -1
web/styles.css +8 -0

.gitignore CHANGED

@@ -20,3 +20,5 @@ build/
 *.mid
 *.zip
 !drum-sample-extractor-updated.zip

 *.mid
 *.zip
 !drum-sample-extractor-updated.zip
+.cache/

README.md CHANGED

@@ -10,17 +10,41 @@ pinned: false
 # Drum Sample Extractor
-A custom FastAPI + browser UI for extracting reusable drum samples from an audio file.
-The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, and a complete ZIP sample pack.
 ## Current status
-- Gradio has been replaced by a custom web frontend in `web/` served by `app.py`.
-- The extraction pipeline is exposed through a JSON/multipart API and factored into `pipeline_runner.py`.
-- Per-stage timing is captured for every extraction run and written into `manifest.json`.
-- Benchmarking support is available in `scripts/benchmark_subprocesses.py`.
-- Legacy Gradio apps are preserved in `legacy/` for reference only.
 ## Run locally
@@ -33,7 +57,12 @@ uvicorn app:app --host 0.0.0.0 --port 7860
 Open `http://127.0.0.1:7860`.
-For fast iteration, set `Stem` to `all`. That bypasses Demucs and runs onset detection, classification, clustering, representative selection, synthesis, MIDI rendering, and packaging directly on the uploaded audio.
 ## Run benchmarks
@@ -43,13 +72,13 @@ python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchm
 The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
-## API
 ```bash
 curl http://127.0.0.1:7860/api/config
 curl -F 'file=@song.wav' \
-  -F 'params={"stem":"all","target_min":4,"target_max":12}' \
   http://127.0.0.1:7860/api/jobs
 ```
@@ -59,16 +88,22 @@ Then poll the returned job id:
 curl http://127.0.0.1:7860/api/jobs/<job-id>
 ```
 ## Important files
 | Path | Purpose |
 |---|---|
-| `app.py` | FastAPI app, static UI serving, job API, artifact downloads |
-| `pipeline_runner.py` | Timed extraction pipeline used by API and benchmarks |
 | `sample_extractor.py` | Core DSP/sample extraction implementation |
 | `web/` | Custom no-build browser frontend |
 | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
-| `docs/` | Review, timing, API, and UI documentation |
 | `legacy/` | Previous Gradio apps retained for reference |
 ## Output per run
@@ -82,4 +117,7 @@ Each run is stored under `.runs/<job-id>/output/`:
 - `samples/*.wav`
 - `manifest.json`
-`.runs/` is ignored by git.

 # Drum Sample Extractor
+A custom FastAPI + browser workstation for extracting reusable drum samples from an audio file.
+The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, manifests, and a complete ZIP sample pack.
 ## Current status
+The project is usable as a local/Hugging Face Space application. Gradio is no longer the active UI; the active app is a custom FastAPI backend plus a no-build browser frontend.
+Implemented in the current development pass:
+- Custom web frontend in `web/`, served by `app.py`.
+- FastAPI job API with upload, polling, safe artifact downloads, config, health, cache clearing, and run-history listing.
+- Timed pipeline runner in `pipeline_runner.py`.
+- Per-stage timing in every `manifest.json`.
+- Two clustering modes:
+  - `batch_quality`: all-pairs mel/NCC similarity plus agglomerative clustering.
+  - `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
+- Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
+- Run history panel indexing `.runs/*/output/manifest.json`.
+- Documentation for features, progress, tasks, API, timing, realtime suitability, UI, and remaining work.
+- Legacy Gradio apps preserved in `legacy/` for reference only.
+Not fully complete yet:
+- No interactive waveform editing of onsets/clusters.
+- No server-sent event stream or websocket progress channel.
+- No frontend TypeScript build/test harness.
+- Demucs remains offline/batch by design.
+See:
+- `docs/FEATURES.md`
+- `docs/TASKS.md`
+- `docs/PROGRESS.md`
+- `docs/REMAINING_WORK.md`
 ## Run locally
 Open `http://127.0.0.1:7860`.
+For fast iteration, set:
+- `Stem = all`
+- `Clustering mode = online_preview`
+That bypasses Demucs and uses the near-realtime clustering path.
 ## Run benchmarks
 The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
+## API example
 ```bash
 curl http://127.0.0.1:7860/api/config
 curl -F 'file=@song.wav' \
+  -F 'params={"stem":"all","clustering_mode":"online_preview","target_min":4,"target_max":12}' \
   http://127.0.0.1:7860/api/jobs
 ```
 curl http://127.0.0.1:7860/api/jobs/<job-id>
 ```
+List active/completed runs:
+```bash
+curl http://127.0.0.1:7860/api/jobs
+```
 ## Important files
 | Path | Purpose |
 |---|---|
+| `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads |
+| `pipeline_runner.py` | Timed extraction pipeline, disk stem/source cache, batch/online clustering routing |
 | `sample_extractor.py` | Core DSP/sample extraction implementation |
 | `web/` | Custom no-build browser frontend |
 | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
+| `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
 | `legacy/` | Previous Gradio apps retained for reference |
 ## Output per run
 - `samples/*.wav`
 - `manifest.json`
+Generated runtime directories are ignored by git:
+- `.runs/`
+- `.cache/`
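
Reviewer note: the README's API example creates a job with `curl` and then polls `GET /api/jobs/<job-id>`. The polling half can be sketched in Python as below. This is a minimal sketch, not code from the commit: `wait_for_job`, `fetch_json`, the 0.8 s interval, and the timeout are illustrative assumptions. Only the URL shape and the `complete`/`error` statuses come from the diff.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # local address used in the README


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_job(job_id, fetch=fetch_json, interval_sec=0.8, timeout_sec=600.0):
    """Poll a job until it reaches a terminal status (hypothetical helper).

    `fetch` is injectable so the loop can be exercised without a server.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        job = fetch(f"{BASE_URL}/api/jobs/{job_id}")
        # `complete` and `error` are the two terminal statuses in docs/API.md.
        if job.get("status") in ("complete", "error"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout_sec}s")
        time.sleep(interval_sec)
```

Passing a fake `fetch` keeps the loop testable; against a running server, `wait_for_job("<job-id>")` would return the final job payload.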

app.py CHANGED

@@ -9,6 +9,7 @@ from __future__ import annotations
 import json
 import shutil
 import traceback
 import uuid
 from concurrent.futures import ThreadPoolExecutor
@@ -22,7 +23,7 @@ from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import FileResponse, JSONResponse
 from fastapi.staticfiles import StaticFiles
-from pipeline_runner import PipelineParams, initial_stages, run_extraction_pipeline
 from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
 ROOT = Path(__file__).resolve().parent
@@ -30,7 +31,7 @@ WEB_DIR = ROOT / "web"
 RUNS_DIR = ROOT / ".runs"
 RUNS_DIR.mkdir(exist_ok=True)
-app = FastAPI(title="Drum Sample Extractor", version="10.0.0")
 app.add_middleware(
     CORSMiddleware,
     allow_origins=["*"],
@@ -61,6 +62,63 @@ def _serialise_job(job: dict[str, Any]) -> dict[str, Any]:
     return payload
 def _update_job(job_id: str, **patch: Any) -> None:
     with jobs_lock:
         jobs[job_id].update(patch)
@@ -87,7 +145,8 @@ def _run_job(job_id: str) -> None:
             if stage.get("status") == "running":
                 _append_log(job_id, f"Started: {stage['label']}")
             elif stage.get("status") == "done":
-                _append_log(job_id, f"Finished: {stage['label']} in {stage['duration_sec']:.3f}s")
     try:
         result = run_extraction_pipeline(input_path, output_dir, PipelineParams.from_mapping(params), progress_cb=progress)
@@ -109,13 +168,27 @@ def config() -> dict[str, Any]:
         "demucs_stems": {key: value + ["all"] for key, value in DEMUCS_STEMS.items()},
         "defaults": asdict(PipelineParams()),
         "stages": initial_stages(),
     }
 @app.post("/api/cache/clear")
 def clear_cache() -> dict[str, str]:
     cache_clear()
-    return {"status": "cleared"}
 @app.post("/api/jobs")
@@ -142,6 +215,7 @@ async def create_job(file: UploadFile = File(...), params: str = Form("{}")) ->
         "id": job_id,
         "status": "pending",
         "filename": file.filename,
         "params": asdict(validated),
         "stages": initial_stages(),
         "logs": [],
@@ -161,26 +235,12 @@ async def create_job(file: UploadFile = File(...), params: str = Form("{}")) ->
 def get_job(job_id: str) -> dict[str, Any]:
     with jobs_lock:
         job = jobs.get(job_id)
-        if not job:
-            manifest = RUNS_DIR / job_id / "output" / "manifest.json"
-            if manifest.exists():
-                result = json.loads(manifest.read_text(encoding="utf-8"))
-                return _serialise_job(
-                    {
-                        "id": job_id,
-                        "status": "complete",
-                        "filename": None,
-                        "params": result.get("params", {}),
-                        "stages": result.get("stages", []),
-                        "logs": [],
-                        "result": result,
-                        "error": None,
-                        "traceback": None,
-                        "output_dir": str(manifest.parent),
-                    }
-                )
-            raise HTTPException(status_code=404, detail="Job not found")
-        return _serialise_job(dict(job))
 @app.get("/api/jobs/{job_id}/files/{relative_path:path}")

 import json
 import shutil
+import time
 import traceback
 import uuid
 from concurrent.futures import ThreadPoolExecutor
 from fastapi.responses import FileResponse, JSONResponse
 from fastapi.staticfiles import StaticFiles
+from pipeline_runner import PipelineParams, clear_disk_cache, initial_stages, run_extraction_pipeline
 from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
 ROOT = Path(__file__).resolve().parent
 RUNS_DIR = ROOT / ".runs"
 RUNS_DIR.mkdir(exist_ok=True)
+app = FastAPI(title="Drum Sample Extractor", version="11.0.0")
 app.add_middleware(
     CORSMiddleware,
     allow_origins=["*"],
     return payload
+def _manifest_path(job_id: str) -> Path:
+    return RUNS_DIR / job_id / "output" / "manifest.json"
+def _read_manifest_job(job_id: str) -> dict[str, Any] | None:
+    manifest = _manifest_path(job_id)
+    if not manifest.exists():
+        return None
+    result = json.loads(manifest.read_text(encoding="utf-8"))
+    return {
+        "id": job_id,
+        "status": "complete",
+        "filename": result.get("source", {}).get("filename"),
+        "params": result.get("params", {}),
+        "stages": result.get("stages", []),
+        "logs": [],
+        "result": result,
+        "error": None,
+        "traceback": None,
+        "output_dir": str(manifest.parent),
+    }
+def _summarise_job(job: dict[str, Any]) -> dict[str, Any]:
+    result = job.get("result") or {}
+    return {
+        "id": job["id"],
+        "status": job.get("status"),
+        "filename": job.get("filename"),
+        "created_at": job.get("created_at"),
+        "duration_sec": result.get("duration_sec"),
+        "audio_duration_sec": result.get("audio_duration_sec"),
+        "realtime_factor": result.get("realtime_factor"),
+        "bpm": result.get("bpm"),
+        "hit_count": result.get("hit_count"),
+        "cluster_count": result.get("cluster_count"),
+        "clustering_mode": (result.get("params") or job.get("params") or {}).get("clustering_mode"),
+        "stem": (result.get("params") or job.get("params") or {}).get("stem"),
+        "error": job.get("error"),
+    }
+def _list_manifest_jobs(limit: int = 50) -> list[dict[str, Any]]:
+    rows: list[dict[str, Any]] = []
+    for manifest in sorted(RUNS_DIR.glob("*/output/manifest.json"), key=lambda p: p.stat().st_mtime, reverse=True):
+        job_id = manifest.parents[1].name
+        manifest_job = _read_manifest_job(job_id)
+        if not manifest_job:
+            continue
+        summary = _summarise_job(manifest_job)
+        summary["created_at"] = manifest.stat().st_mtime
+        rows.append(summary)
+        if len(rows) >= limit:
+            break
+    return rows
 def _update_job(job_id: str, **patch: Any) -> None:
     with jobs_lock:
         jobs[job_id].update(patch)
             if stage.get("status") == "running":
                 _append_log(job_id, f"Started: {stage['label']}")
             elif stage.get("status") == "done":
+                detail = f" · {stage['detail']}" if stage.get("detail") else ""
+                _append_log(job_id, f"Finished: {stage['label']} in {stage['duration_sec']:.3f}s{detail}")
     try:
         result = run_extraction_pipeline(input_path, output_dir, PipelineParams.from_mapping(params), progress_cb=progress)
         "demucs_stems": {key: value + ["all"] for key, value in DEMUCS_STEMS.items()},
         "defaults": asdict(PipelineParams()),
         "stages": initial_stages(),
+        "clustering_modes": {
+            "batch_quality": "Batch quality: all-pairs mel/NCC + agglomerative clustering",
+            "online_preview": "Online preview: prototype assignment for near-realtime feedback",
+        },
     }
 @app.post("/api/cache/clear")
 def clear_cache() -> dict[str, str]:
     cache_clear()
+    clear_disk_cache()
+    return {"status": "cleared", "scope": "memory+disk"}
+@app.get("/api/jobs")
+def list_jobs(limit: int = 50) -> dict[str, Any]:
+    limit = max(1, min(int(limit), 200))
+    with jobs_lock:
+        active = [_summarise_job(dict(job)) for job in jobs.values() if job.get("status") != "complete"]
+    history = _list_manifest_jobs(limit=limit)
+    return {"active": active, "history": history}
 @app.post("/api/jobs")
         "id": job_id,
         "status": "pending",
         "filename": file.filename,
+        "created_at": time.time(),
         "params": asdict(validated),
         "stages": initial_stages(),
         "logs": [],
 def get_job(job_id: str) -> dict[str, Any]:
     with jobs_lock:
         job = jobs.get(job_id)
+        if job:
+            return _serialise_job(dict(job))
+    manifest_job = _read_manifest_job(job_id)
+    if manifest_job:
+        return _serialise_job(manifest_job)
+    raise HTTPException(status_code=404, detail="Job not found")
 @app.get("/api/jobs/{job_id}/files/{relative_path:path}")
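
Reviewer note: the body of the artifact endpoint is outside this hunk, but the docs describe it as blocking path traversal by resolving downloads under the run's output directory. A minimal sketch of that kind of check follows. `resolve_artifact` and the choice of `ValueError` are hypothetical, not the commit's implementation.

```python
from pathlib import Path


def resolve_artifact(output_root, relative_path):
    """Resolve a requested artifact path and refuse anything outside the root.

    Hypothetical helper: resolves symlinks and `..` segments first, then
    checks that the result is still located below `output_root`.
    """
    root = Path(output_root).resolve()
    candidate = (root / relative_path).resolve()
    # An absolute `relative_path` replaces `root` entirely in the join above,
    # so it is rejected by the same containment check.
    if root not in candidate.parents:
        raise ValueError(f"path escapes output root: {relative_path!r}")
    return candidate
```

In a FastAPI handler the `ValueError` would typically be translated into a 4xx `HTTPException`; which status code the real endpoint uses is not shown in this diff.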

docs/API.md CHANGED

@@ -1,5 +1,7 @@
 # API documentation
 The active app is `app.py`, a FastAPI application.
 ## Start server
@@ -18,12 +20,57 @@ Returns backend health.
 ## `GET /api/config`
-Returns supported models, stems, default pipeline params, and stage definitions.
 ```bash
 curl http://127.0.0.1:7860/api/config
 ```
 ## `POST /api/jobs`
 Creates an extraction job.
@@ -34,14 +81,14 @@ Fields:
 | Field | Type | Required | Description |
 |---|---|---:|---|
-| `file` | file | yes | Audio source |
-| `params` | JSON string | no | Partial or full pipeline params |
 Example:
 ```bash
 curl -F 'file=@song.wav' \
-  -F 'params={"stem":"all","target_min":4,"target_max":12,"synthesize":true}' \
   http://127.0.0.1:7860/api/jobs
 ```
@@ -52,7 +99,7 @@ Response status: `202 Accepted`
   "id": "58ca0db4ac74",
   "status": "pending",
   "filename": "song.wav",
-  "params": {"stem": "all"},
   "stages": [],
   "logs": [],
   "result": null,
@@ -62,32 +109,32 @@ Response status: `202 Accepted`
 ## `GET /api/jobs/{job_id}`
-Poll job status and retrieve results.
 Statuses:
 | Status | Meaning |
 |---|---|
-| `pending` | Job is queued |
-| `running` | Job is executing |
-| `complete` | Result and artifacts are ready |
-| `error` | Pipeline failed; `error` and `traceback` are populated |
 Completed jobs contain:
 | Key | Meaning |
 |---|---|
-| `duration_sec` | Total wall time |
-| `audio_duration_sec` | Duration of processed stem/source |
-| `realtime_factor` | `duration_sec / audio_duration_sec` |
-| `bpm` | Detected tempo |
-| `hit_count` | Number of accepted onsets/hits |
-| `cluster_count` | Number of sample clusters |
-| `stages` | Per-stage timing/status/detail list |
-| `samples` | Sample rows with score, duration, first onset, and download URL |
-| `overview` | Decimated envelope and onset markers for waveform display |
-| `files` | Relative artifact paths |
-| `file_urls` | Direct API URLs for artifacts |
 ## `GET /api/jobs/{job_id}/files/{relative_path}`
@@ -105,36 +152,44 @@ The endpoint prevents path traversal by resolving downloads under `.runs/<job-id
 ## `POST /api/cache/clear`
-Clears the in-memory extraction cache.
 ```bash
 curl -X POST http://127.0.0.1:7860/api/cache/clear
 ```
 ## Pipeline parameters
 Defined in `pipeline_runner.PipelineParams`.
 | Parameter | Default | Meaning |
 |---|---:|---|
-| `stem` | `drums` | Demucs source to extract, or `all` to bypass Demucs |
-| `demucs_model` | `htdemucs_ft` | Demucs model |
-| `demucs_shifts` | `1` | Test-time shifts for Demucs quality/speed tradeoff |
-| `demucs_overlap` | `0.25` | Demucs chunk overlap |
-| `onset_mode` | `auto` | `auto`, `percussive`, `harmonic`, or `broadband` |
-| `onset_delta` | `0.12` | Peak-pick threshold |
-| `energy_threshold_db` | `-35` | RMS gate for accepting hits |
-| `pre_pad` | `0.003` | Seconds of audio before onset |
-| `min_dur` | `0.02` | Minimum hit duration |
-| `max_dur` | `1.5` | Maximum hit duration |
-| `min_gap` | `0.03` | Minimum time between onsets |
-| `ncc_threshold` | `0.80` | Similarity threshold when not targeting cluster count |
-| `attack_ms` | `25` | Transient window used for NCC |
-| `mel_threshold` | `0.75` | Candidate prefilter threshold |
-| `linkage` | `average` | Agglomerative linkage |
-| `target_min` | `5` | Lower cluster target; `0` disables target mode |
-| `target_max` | `20` | Upper cluster target; `0` disables target mode |
-| `synthesize` | `true` | Write synthesized alternates for clusters with multiple hits |
-| `quantize_midi` | `true` | Snap MIDI notes to grid |
-| `subdivision` | `16` | MIDI grid subdivision |
-| `device` | `cpu` | Torch device for Demucs |

 # API documentation
+Last updated: 2026-05-12
 The active app is `app.py`, a FastAPI application.
 ## Start server
 ## `GET /api/config`
+Returns supported models, stems, default pipeline params, stage definitions, and clustering mode labels.
 ```bash
 curl http://127.0.0.1:7860/api/config
 ```
+Important response keys:
+| Key | Meaning |
+|---|---|
+| `demucs_models` | Supported Demucs model names. |
+| `demucs_stems` | Valid stems per model, plus `all` for bypassing Demucs. |
+| `defaults` | Default `PipelineParams`. |
+| `stages` | Pipeline stage definitions. |
+| `clustering_modes` | Human-readable labels for batch and online clustering modes. |
+## `GET /api/jobs`
+Lists active in-memory jobs and completed run manifests found under `.runs/`.
+```bash
+curl http://127.0.0.1:7860/api/jobs?limit=50
+```
+Response:
+```json
+{
+  "active": [],
+  "history": [
+    {
+      "id": "58ca0db4ac74",
+      "status": "complete",
+      "filename": "song.wav",
+      "created_at": 1778540000.0,
+      "duration_sec": 2.4,
+      "audio_duration_sec": 8.0,
+      "realtime_factor": 0.3,
+      "bpm": 120.0,
+      "hit_count": 32,
+      "cluster_count": 8,
+      "clustering_mode": "online_preview",
+      "stem": "all",
+      "error": null
+    }
+  ]
+}
+```
+`created_at` is the manifest file modification time as a Unix timestamp.
 ## `POST /api/jobs`
 Creates an extraction job.
 | Field | Type | Required | Description |
 |---|---|---:|---|
+| `file` | file | yes | Audio source. |
+| `params` | JSON string | no | Partial or full pipeline params. |
 Example:
 ```bash
 curl -F 'file=@song.wav' \
+  -F 'params={"stem":"all","clustering_mode":"online_preview","target_min":4,"target_max":12,"synthesize":true}' \
   http://127.0.0.1:7860/api/jobs
 ```
   "id": "58ca0db4ac74",
   "status": "pending",
   "filename": "song.wav",
+  "params": {"stem": "all", "clustering_mode": "online_preview"},
   "stages": [],
   "logs": [],
   "result": null,
 ## `GET /api/jobs/{job_id}`
+Poll job status and retrieve results. This works for active in-memory jobs and completed historical jobs whose manifest is still present in `.runs/`.
 Statuses:
 | Status | Meaning |
 |---|---|
+| `pending` | Job is queued. |
+| `running` | Job is executing. |
+| `complete` | Result and artifacts are ready. |
+| `error` | Pipeline failed; `error` and `traceback` are populated. |
 Completed jobs contain:
 | Key | Meaning |
 |---|---|
+| `duration_sec` | Total wall time. |
+| `audio_duration_sec` | Duration of processed stem/source. |
+| `realtime_factor` | `duration_sec / audio_duration_sec`. |
+| `bpm` | Detected tempo. |
+| `hit_count` | Number of accepted onsets/hits. |
+| `cluster_count` | Number of sample clusters. |
+| `stages` | Per-stage timing/status/detail list. |
+| `samples` | Sample rows with score, duration, first onset, and download URL. |
+| `overview` | Decimated envelope and onset markers for waveform display. |
+| `files` | Relative artifact paths. |
+| `file_urls` | Direct API URLs for artifacts. |
 ## `GET /api/jobs/{job_id}/files/{relative_path}`
 ## `POST /api/cache/clear`
+Clears the in-memory DSP cache and disk stem/source cache.
 ```bash
 curl -X POST http://127.0.0.1:7860/api/cache/clear
 ```
+Response:
+```json
+{"status":"cleared","scope":"memory+disk"}
+```
 ## Pipeline parameters
 Defined in `pipeline_runner.PipelineParams`.
 | Parameter | Default | Meaning |
 |---|---:|---|
+| `stem` | `drums` | Demucs source to extract, or `all` to bypass Demucs. |
+| `demucs_model` | `htdemucs_ft` | Demucs model. |
+| `demucs_shifts` | `1` | Test-time shifts for Demucs quality/speed tradeoff. |
+| `demucs_overlap` | `0.25` | Demucs chunk overlap. |
+| `onset_mode` | `auto` | `auto`, `percussive`, `harmonic`, or `broadband`. |
+| `onset_delta` | `0.12` | Peak-pick threshold. |
+| `energy_threshold_db` | `-35` | RMS gate for accepting hits. |
+| `pre_pad` | `0.003` | Seconds of audio before onset. |
+| `min_dur` | `0.02` | Minimum hit duration. |
+| `max_dur` | `1.5` | Maximum hit duration. |
+| `min_gap` | `0.03` | Minimum time between onsets. |
+| `ncc_threshold` | `0.80` | Similarity threshold. Also used by online clustering assignment. |
+| `attack_ms` | `25` | Transient window used for NCC/prototypes. |
+| `mel_threshold` | `0.75` | Candidate prefilter threshold. For online mode, lower values such as `0.62` are useful. |
+| `linkage` | `average` | Agglomerative linkage for `batch_quality`. |
+| `clustering_mode` | `batch_quality` | `batch_quality` or `online_preview`. |
+| `target_min` | `5` | Lower cluster target; `0` disables target mode in batch mode. |
+| `target_max` | `20` | Upper cluster target; `0` disables target/cap mode. |
+| `synthesize` | `true` | Write synthesized alternates for clusters with multiple hits. |
+| `quantize_midi` | `true` | Snap MIDI notes to grid. |
+| `subdivision` | `16` | MIDI grid subdivision. |
+| `device` | `cpu` | Torch device for Demucs. |
+| `use_disk_cache` | `true` | Cache decoded full mix/stems by source digest and extraction settings. |
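
Reviewer note: `use_disk_cache` is described as keyed by source digest and extraction settings, and `docs/FEATURES.md` names the inputs as source SHA-256 plus stem/model/shifts/overlap/device. One way such a key could be derived is sketched below. `file_sha256` and `cache_key` are hypothetical names, and the JSON-then-hash layout is an assumption, not the scheme in `pipeline_runner.py`.

```python
import hashlib
import json


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large audio files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_key(source_digest, stem, demucs_model, demucs_shifts, demucs_overlap, device):
    """Combine the source digest with the settings that affect the stem output."""
    settings = {
        "source": source_digest,
        "stem": stem,
        "demucs_model": demucs_model,
        "demucs_shifts": demucs_shifts,
        "demucs_overlap": demucs_overlap,
        "device": device,
    }
    # sort_keys makes the serialisation, and therefore the key, deterministic.
    payload = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Changing any one setting yields a different key, so a cached `drums` stem would not be returned for a later `stem=all` request on the same file.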

docs/FEATURES.md ADDED

@@ -0,0 +1,58 @@

+# Feature inventory
+Last updated: 2026-05-12
+## Product goal
+Turn an input audio file into a practical drum sample pack: detected hits, grouped sample classes, representative WAVs, optional synthesized alternates, MIDI reconstruction, rendered reconstruction audio, and an inspectable manifest.
+## Implemented features
+| Area | Feature | Status | Notes |
+|---|---|---:|---|
+| UI | Custom browser frontend | Implemented | `web/index.html`, `web/styles.css`, `web/app.js`; no Gradio dependency in active app. |
+| UI | Drag/drop audio upload | Implemented | Uses multipart upload to `POST /api/jobs`. |
+| UI | Source preview | Implemented | Browser `<audio>` preview before extraction. |
+| UI | Pipeline controls | Implemented | Stem/model/onset/clustering/MIDI/synthesis/cache controls. |
+| UI | Live-ish progress | Implemented | Polls stage state and logs every 800 ms. |
+| UI | Waveform/onset overview | Implemented | Canvas envelope plus onset markers from `manifest.json`. |
+| UI | Result downloads | Implemented | ZIP, MIDI, stem WAV, reconstruction WAV, individual sample WAVs. |
+| UI | Run history browser | Implemented | Lists completed `.runs/*/output/manifest.json` entries and reloads results. |
+| API | Health/config | Implemented | `GET /api/health`, `GET /api/config`. |
+| API | Job creation/polling | Implemented | `POST /api/jobs`, `GET /api/jobs/{id}`. |
+| API | Run listing | Implemented | `GET /api/jobs` returns active and completed runs. |
+| API | Safe artifact serving | Implemented | Path traversal is blocked by resolved output-root checks. |
+| API | Cache clear | Implemented | Clears in-memory DSP cache and disk stem/source cache. |
+| Pipeline | Demucs stem extraction | Implemented | Offline/batch stage; not advertised as realtime. |
+| Pipeline | Stem/full-mix disk cache | Implemented | Keyed by source SHA-256 plus stem/model/shifts/overlap/device. |
+| Pipeline | BPM detection | Implemented | `librosa` onset/beat based estimate. |
+| Pipeline | SuperFlux-style onset detection | Implemented | Multi-band auto mode plus percussive/harmonic/broadband modes. |
+| Pipeline | Hit classification | Implemented | Rule-based spectral class labels. |
+| Pipeline | Batch quality clustering | Implemented | Mel prefilter + transient NCC + agglomerative clustering. |
+| Pipeline | Online preview clustering | Implemented | Prototype-based incremental assignment for near-realtime feedback. |
+| Pipeline | Representative selection | Implemented | Quality score picks best hit per cluster. |
+| Pipeline | Optional synthesis | Implemented | Weighted aligned average for multi-hit clusters. |
+| Pipeline | MIDI export | Implemented | Quantized or unquantized reconstruction MIDI. |
+| Pipeline | Reconstruction render | Implemented | Renders MIDI-like reconstruction using selected samples. |
+| Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI, rendered reconstruction. |
+| Docs | Project review | Implemented | `docs/PROJECT_REVIEW.md`. |
+| Docs | Timing/realtime analysis | Implemented | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
+| Docs | API docs | Implemented | `docs/API.md`. |
+| Docs | UI replacement docs | Implemented | `docs/UI_REPLACEMENT.md`. |
+| Docs | Feature/task/progress tracking | Implemented | This file, `TASKS.md`, `PROGRESS.md`. |
+## Partially implemented features
+| Area | Feature | Current state | Needed to call it complete |
+|---|---|---|---|
+| Progress | Stage progress | Shows stage boundaries and logs | Add lower-level progress inside Demucs and clustering. |
+| Realtime | Online clustering | Implemented as batch-invoked prototype assignment | Add streaming/incremental audio analysis API for true realtime preview. |
+| Run history | Manifest browser | Lists and reloads completed runs | Add side-by-side comparison and filtering/search. |
+| Editing | Review workflow | Displays waveform and samples | Add click-to-audition hits, onset editing, cluster merge/split, label reassignment. |
+| Frontend quality | No-build JavaScript UI | Good enough for local app | Convert to TypeScript once interaction model stabilizes. |
+## Explicit non-goals for this pass
+- Realtime Demucs. It is not realistic for this use-case and should remain offline/cached.
+- Perfect source separation. Stem quality depends on model choice and input material.
+- Full DAW/sample-editor UX. This pass creates the workstation foundation; detailed editing is next.
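
Reviewer note: the feature table describes `online_preview` as prototype-based incremental assignment. The general technique can be sketched as follows. This is an illustration under stated assumptions, not the code added to `sample_extractor.py`: it uses plain cosine similarity on a generic feature vector as a stand-in for the mel/NCC comparison, and `OnlineClusterer` and its `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Assign each incoming hit to the closest prototype or open a new cluster."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.centroids = []  # one running-mean feature vector per cluster
        self.counts = []     # number of hits absorbed by each cluster

    def assign(self, feature):
        best_index, best_score = -1, -1.0
        for index, centroid in enumerate(self.centroids):
            score = cosine(feature, centroid)
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0 and best_score >= self.threshold:
            # Update the matched prototype as an incremental mean.
            n = self.counts[best_index]
            old = self.centroids[best_index]
            self.centroids[best_index] = [(c * n + x) / (n + 1) for c, x in zip(old, feature)]
            self.counts[best_index] = n + 1
            return best_index
        # No prototype is similar enough: start a new cluster from this hit.
        self.centroids.append(list(feature))
        self.counts.append(1)
        return len(self.centroids) - 1
```

Each hit is compared only against the current prototypes, so the per-hit cost grows with the number of clusters rather than the number of hits. The trade-off is that early assignments are never revisited, which is consistent with the docs positioning this mode as a preview rather than a replacement for `batch_quality`.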

docs/PIPELINE_TIMING_AND_REALTIME.md CHANGED

@@ -1,214 +1,131 @@
-# Pipeline timing and near-real-time analysis
-## Measurement setup
-Benchmarks were run with `scripts/benchmark_subprocesses.py` using synthetic drum fixtures from `synth_generator.py`.
-Important constraints:
-- `stem=all` was used to bypass Demucs and measure the DSP/sample-extraction subprocesses directly.
-- The script performs one warm-up run first, so import/JIT overhead is not included in the summary.
-- Runs used 4 bars at 120 BPM across `rock`, `funk`, and `halftime` synthetic patterns.
-- The benchmark output is stored in `docs/benchmark-subprocesses.json`.
-## Measured subprocess lengths
-| Stage | Mean seconds | Median seconds | Min seconds | Max seconds |
-|---|---:|---:|---:|---:|
-| `stem` | 0.017 | 0.013 | 0.009 | 0.039 |
-| `bpm` | 0.224 | 0.223 | 0.206 | 0.241 |
-| `onsets` | 2.140 | 2.034 | 1.762 | 2.871 |
-| `classification` | 0.034 | 0.035 | 0.024 | 0.045 |
-| `clustering` | 0.496 | 0.597 | 0.059 | 0.913 |
-| `selection` | 0.499 | 0.551 | 0.311 | 0.651 |
-| `synthesis` | 0.002 | 0.002 | 0.002 | 0.003 |
-| `export` | 0.105 | 0.103 | 0.046 | 0.178 |
-Observed total runtime for warm synthetic 4-bar fixtures was roughly `0.30×–0.43×` realtime when Demucs was bypassed. In plain terms: the pure extraction stages ran faster than the audio duration on these fixtures. The first cold run can be much slower because librosa/scipy/numba-style initialization costs are paid up front.
 ## Significant subprocesses
-### 1. Stem extraction / source load
-Current implementation:
-- `stem=all`: load and normalize the source audio with librosa.
-- any other stem: run Demucs via `demucs.pretrained.get_model` and `demucs.apply.apply_model`.
-Timing profile:
-- `stem=all` is near-instant after warm-up on short fixtures.
-- Demucs is the offline bottleneck and should be treated as non-realtime in this project.
-Real-time suitability: **No for Demucs, yes for direct source load.**
-Recommended strategy:
-- Keep Demucs as an explicit offline preprocessing stage.
-- Cache stem output by content hash and model parameters.
-- Let users bypass Demucs for drum loops, already-separated stems, and iterative parameter tuning.
-### 2. BPM / tempo detection
-Current implementation:
-- `librosa.onset.onset_strength`
-- `librosa.feature.tempo`
-- beat-track sanity adjustment
-Timing profile:
-- Measured around 0.22 s for ~9 s synthetic clips after warm-up.
-Real-time suitability: **Near-realtime with buffering.**
-A live version should estimate tempo over rolling windows and refine continuously. It does not need the entire file, but short windows can be unstable.
-### 3. Onset detection + slicing
-Current implementation:
-- Multiband SuperFlux-style onset envelope in `auto` mode.
-- Optional percussive/harmonic/broadband modes.
-- Peak picking and hit slicing by onset-to-next-onset boundaries.
-- Energy threshold and duration filtering.
-Timing profile:
-- This is the largest non-Demucs DSP stage in the measured benchmark: about 2.14 s mean for ~9 s fixtures.
-- It is still faster than realtime in warm synthetic tests.
-Real-time suitability: **Yes, with a rolling window and bounded lookahead.**
-Why:
-- Onset strength and peak picking are local-window operations.
-- Backtracking and next-onset slicing require a small amount of future context.
-- A live system can emit provisional hits and finalize durations once the next onset or max-duration cutoff arrives.
-### 4. Spectral rule classification
-Current implementation:
-- STFT per hit.
-- Low/mid/high energy ratios.
-- Spectral centroid, zero-crossing rate, duration rules.
-Timing profile:
-- Measured around 34 ms mean for the benchmark fixtures.
-Real-time suitability: **Yes.**
-This is cheap per hit and can run immediately after a hit segment is finalized.
-### 5. Mel fingerprinting + transient NCC clustering
-Current implementation:
-- Build mel fingerprints for hits.
-- Use cosine similarity as a prefilter.
-- Compute transient normalized cross-correlation only for candidate pairs.
-- Run agglomerative clustering on the resulting precomputed distance matrix.
-- Optionally merge singleton clusters into nearby multi-hit clusters.
-Timing profile:
-- Measured around 0.50 s mean, but depends strongly on number of hits and pair count.
-- Complexity is roughly quadratic in hit count for pairwise similarity, with mel prefiltering reducing NCC work.
-Real-time suitability: **Partially.**
-What can be realtime:
-- Mel fingerprint extraction per hit.
-- Transient NCC against a bounded set of existing cluster representatives.
-- Online assignment to existing clusters.
-What is not truly realtime in the current implementation:
-- Full agglomerative clustering over the complete distance matrix.
-- Target cluster count search through repeated clustering.
-Recommended live design:
-1. Maintain cluster prototypes: representative transient, mel centroid, count, label histogram.
-2. For each finalized hit, compute fingerprint and compare to prototypes first.
-3. Only run transient NCC against likely candidates.
-4. Assign immediately when above threshold; create a new cluster otherwise.
-5. Periodically run batch reclustering in the background to clean up early mistakes.
-### 6. Best representative selection
-Current implementation:
-- Compute sample quality score per candidate hit.
-- Choose highest-scoring hit per cluster.
-Timing profile:
-- Measured around 0.50 s mean in the benchmark.
-- Cost scales with number of hits and quality scoring work.
-Real-time suitability: **Yes as an incremental update.**
-A live version can maintain the current best hit per cluster and only rescore new arrivals or candidates whose cluster changed.
-### 7. Optional synthesis
-Current implementation:
-- Align cluster members by peak position.
-- Normalize and weighted-average hits to create an alternate synthesized sample.
-Timing profile:
-- Measured around 2 ms mean on benchmark fixtures.
-Real-time suitability: **Yes for small clusters, but better as deferred polish.**
-It is fast, but users usually do not need synthesized alternates before cluster membership stabilizes.
-### 8. Export: MIDI, reconstruction, WAVs, ZIP
-Current implementation:
-- Build MIDI notes from hits and cluster sample notes.
-- Render reconstruction with representative samples.
-- Write samples, reconstruction audio, MIDI, archive, and manifest.
-Timing profile:
-- Measured around 0.10 s mean on benchmark fixtures.
-Real-time suitability: **No for ZIP packaging; yes for preview rendering chunks.**
-The final ZIP is a completion artifact. Reconstruction can be rendered progressively for UI preview.
-## Real-time feasibility summary
-| Subprocess | Current batch status | Near-real-time feasibility | Notes |
-|---|---|---|---|
-| Source load | Fast | Yes | Direct file/stream decode is not the bottleneck |
-| Demucs stem separation | Slow/offline | No | Keep offline and cached |
-| BPM detection | Buffered batch | Partial | Rolling estimate works, exact tempo should refine over time |
-| Onset detection | Batch but local-window | Yes | Needs bounded lookahead/backtracking |
-| Hit slicing | Depends on next onset | Yes | Emit provisional segment, finalize on next onset/max duration |
-| Rule classification | Per-hit | Yes | Cheap and stateless |
-| Mel fingerprinting | Per-hit | Yes | Compute once per finalized hit |
-| Transient NCC | Pairwise batch | Partial | Realtime against prototypes; batch all-pairs is not realtime |
-| Agglomerative clustering | Batch | No | Replace or complement with online prototype assignment |
-| Representative selection | Batch per cluster | Yes | Keep best-so-far per cluster |
-| Synthesis | Batch per cluster | Partial | Can update lazily after cluster changes |
-| MIDI/reconstruction preview | Batch export | Partial | Preview can stream; final MIDI is a completion artifact |
-| ZIP packaging | Final artifact | No | Keep as final step |
-## Recommended next technical move
-Implement a second clustering mode named `online`:
-```text
-onset event → segment finalized → classify → mel fingerprint → candidate prototypes → transient NCC → assign/create cluster → update best representative → UI update
-```
-Keep the existing agglomerative mode as `batch-quality`. Use online mode for immediate feedback and batch mode for final high-quality export.

+# Pipeline timing and realtime suitability
+Last updated: 2026-05-12
+## Measurement scope
+The timing benchmark in `docs/benchmark-subprocesses.json` measures synthetic drum fixtures with:
+- `stem=all`, so Demucs is bypassed.
+- Warm process after import/model-library initialization.
+- Synthetic rock/funk/halftime fixtures generated by `synth_generator.py`.
+- `scripts/benchmark_subprocesses.py` as the benchmark driver.
+This isolates the sample-extraction subprocesses from source-separation noise. Demucs timing depends heavily on model, hardware, track length, first-run downloads, and CPU/GPU availability, so it is analyzed separately.
 ## Significant subprocesses
+| Subprocess | Current implementation | Timing behavior | Realtime suitability |
+|---|---|---|---|
+| Source load / stem extraction | `extract_stem`; full mix via `librosa`, stems via Demucs | Full-mix load time is usually negligible; Demucs dominates full jobs | Full mix: near-realtime. Demucs: no. |
+| BPM detection | `detect_bpm` using onset envelope and beat tracking | Usually sub-second for short fixtures | Near-realtime with buffering; not critical path. |
+| Onset detection + slicing | `detect_onsets` multi-band SuperFlux-style envelope | Often the largest pure-DSP stage | Near-realtime with bounded lookahead. |
+| Classification | Rule-based spectral analysis per hit | Fast relative to onset/clustering | Near-realtime. |
+| Batch clustering | Mel fingerprints + transient NCC + agglomerative clustering | Pairwise/batch; scales poorly with many hits | Not realtime. Final-quality batch mode. |
+| Online clustering | Prototype assignment per hit | Scales with hit count × cluster count | Near-realtime preview path. |
+| Representative selection | Scores each candidate hit | Moderate for many clusters/hits | Near-realtime for moderate hit counts. |
+| Synthesis | Weighted aligned average per multi-hit cluster | Usually small | Near-realtime for moderate clusters. |
+| Export/package | WAV/MIDI/render/ZIP writes | Disk-bound; ZIP is batch finalization | Not meaningful as realtime; finalization step. |
+## Current benchmark summary
+The checked-in benchmark files were refreshed on 2026-05-12 with synthetic 2-bar fixtures and Demucs bypassed:
+- `docs/benchmark-subprocesses.json`: `batch_quality` clustering.
+- `docs/benchmark-online-preview.json`: `online_preview` clustering.
+| Stage | Batch quality mean | Online preview mean |
+|---|---:|---:|
+| source load | 0.011 s | 0.012 s |
+| BPM detection | 0.185 s | 0.163 s |
+| onset detection + slicing | 1.943 s | 1.834 s |
+| classification | 0.019 s | 0.017 s |
+| clustering | 0.148 s | 0.045 s |
+| representative selection | 0.204 s | 0.115 s |
+| synthesis | 0.001 s | 0.001 s |
+| export/package | 0.156 s | 0.221 s |
+On these small fixtures, `online_preview` reduced mean clustering time by about 3.3× compared with `batch_quality` (0.148 s to 0.045 s). Onset detection still accounts for roughly three quarters of the summed stage time in both modes, so the next realtime optimization target is streaming/incremental onset analysis rather than clustering alone.
+First cold runs can be much slower because imports and library initialization are paid up front.
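
As a quick arithmetic check on the table above, the following sketch sums the per-stage means and derives the clustering speedup and the share of time spent in onset detection. The dictionary and function names are illustrative only; the numbers are copied from the table, not re-measured.

```python
# Per-stage mean durations in seconds, copied from the benchmark table above.
batch_quality = {
    "source load": 0.011,
    "BPM detection": 0.185,
    "onset detection + slicing": 1.943,
    "classification": 0.019,
    "clustering": 0.148,
    "representative selection": 0.204,
    "synthesis": 0.001,
    "export/package": 0.156,
}
online_preview = {
    "source load": 0.012,
    "BPM detection": 0.163,
    "onset detection + slicing": 1.834,
    "classification": 0.017,
    "clustering": 0.045,
    "representative selection": 0.115,
    "synthesis": 0.001,
    "export/package": 0.221,
}


def summarize(stages):
    """Return (summed stage time, fraction spent in onset detection)."""
    total = sum(stages.values())
    onset_share = stages["onset detection + slicing"] / total
    return total, onset_share


batch_total, batch_onset_share = summarize(batch_quality)
online_total, online_onset_share = summarize(online_preview)
clustering_speedup = batch_quality["clustering"] / online_preview["clustering"]

print(f"batch_quality:  {batch_total:.3f} s total, onset share {batch_onset_share:.0%}")
print(f"online_preview: {online_total:.3f} s total, onset share {online_onset_share:.0%}")
print(f"clustering speedup: {clustering_speedup:.2f}x")
```

The summed means are not the same thing as the per-run `total_duration_sec` in the JSON files, which also includes overhead between stages.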
+## Batch quality versus online preview clustering
+### `batch_quality`
+Current final-quality clustering path:
+1. Compute mel fingerprint for each hit.
+2. Compute pairwise mel cosine prefilter.
+3. Compute transient NCC only for candidate pairs.
+4. Build distance matrix.
+5. Run agglomerative clustering.
+6. Optionally merge singleton clusters.
+This gives better global grouping, but it is fundamentally batch-oriented because it needs the full similarity matrix before it can cluster.
+### `online_preview`
+Current near-realtime-oriented clustering path:
+1. Process hits in onset order.
+2. Compute one mel fingerprint and one transient per hit.
+3. Compare the new hit against existing cluster prototypes.
+4. Assign it to the best-matching prototype, or create a new cluster while the target cluster cap has not been reached.
+5. Update prototype fingerprints/transients using energy-weighted rolling averages.
+Complexity is roughly `O(number_of_hits × number_of_clusters)`, not `O(number_of_hits²)`, and does not require future hits before producing a current assignment. It is suitable for progressive preview and fast iteration, but it is not guaranteed to match the global batch clustering result.
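
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in `sample_extractor.py`: it compares a single fingerprint vector per hit with cosine similarity and leaves out the transient comparison, and the names `online_assign`, `max_clusters`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def online_assign(hits, max_clusters=8, threshold=0.9):
    """Assign hits to clusters one at a time, in onset order.

    hits: iterable of (fingerprint, energy) pairs.
    threshold and max_clusters are assumed tuning knobs for this sketch.
    Returns (labels, prototypes).
    """
    prototypes = []  # each entry: {"vec": [...], "weight": accumulated energy}
    labels = []
    for vec, energy in hits:
        energy = max(energy, 1e-12)  # keep the weighted average well defined
        best, best_sim = -1, -1.0
        for i, proto in enumerate(prototypes):
            sim = cosine(vec, proto["vec"])
            if sim > best_sim:
                best, best_sim = i, sim
        below = best_sim < threshold
        if best == -1 or (below and len(prototypes) < max_clusters):
            # No prototype is close enough and the cap allows a new cluster.
            prototypes.append({"vec": list(vec), "weight": energy})
            labels.append(len(prototypes) - 1)
            continue
        # Energy-weighted rolling average of the matched prototype.
        proto = prototypes[best]
        total = proto["weight"] + energy
        proto["vec"] = [
            (p * proto["weight"] + v * energy) / total
            for p, v in zip(proto["vec"], vec)
        ]
        proto["weight"] = total
        labels.append(best)
    return labels, prototypes


demo_hits = [([1.0, 0.0], 1.0), ([0.99, 0.05], 1.0), ([0.0, 1.0], 2.0), ([0.02, 1.0], 1.0)]
demo_labels, demo_prototypes = online_assign(demo_hits)
print(demo_labels)
```

Each hit is compared only against the current prototypes, which is where the `O(number_of_hits × number_of_clusters)` cost comes from. Once the cap is reached, a poorly matching hit is still forced into its nearest cluster, which is one reason the result can differ from the batch clustering.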
+## What can run in or near realtime
+These can be performed progressively with small buffers:
+- Source decode for already-separated/full-mix audio.
+- Onset envelope computation.
+- Peak picking with bounded lookahead.
+- Hit slicing once enough tail audio is buffered.
+- Rule-based classification.
+- Mel fingerprint extraction.
+- Online prototype clustering.
+- Representative preview selection.
+- Basic reconstruction preview.
+## What should stay offline/batch
+- Demucs source separation.
+- All-pairs transient NCC for large hit sets.
+- Agglomerative clustering.
+- Final ZIP packaging.
+- Full high-quality rerender/export.
+## Recommended runtime strategy
+| Phase | Mode | Purpose |
+|---|---|---|
+| Upload / first pass | `stem=all`, `clustering_mode=online_preview` | Fast inspection and parameter tuning. |
+| Final extraction from full mix/stem | `stem=all`, `clustering_mode=batch_quality` | Better grouping without source separation. |
+| Final extraction from full song | `stem=drums`, `clustering_mode=batch_quality`, disk cache on | Best quality with offline Demucs cost paid once. |
+## Disk cache impact
+The disk cache stores the decoded full mix or Demucs stem output under `.cache/stems/`, keyed by:
+- Source SHA-256.
+- Stem name.
+- Demucs model.
+- Demucs shifts.
+- Demucs overlap.
+- Device/decode mode.
+This does not make Demucs realtime, but it prevents repeated source separation work when retuning onset/clustering parameters for the same source and stem settings.
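
One way to derive such a key is sketched below. It is an assumption-laden illustration rather than the project's actual scheme: the function names, the truncated digests, and the `.wav` file layout are all hypothetical; only the list of key components comes from the text above.

```python
import hashlib
import json
from pathlib import Path


def stem_cache_key(source_bytes, stem, model, shifts, overlap, device):
    """Build a deterministic key from the source content and separation settings."""
    source_digest = hashlib.sha256(source_bytes).hexdigest()
    params = {
        "stem": stem,
        "model": model,
        "shifts": shifts,
        "overlap": overlap,
        "device": device,
    }
    # sort_keys makes the JSON text, and therefore the digest, order-independent.
    params_text = json.dumps(params, sort_keys=True)
    params_digest = hashlib.sha256(params_text.encode("utf-8")).hexdigest()
    return f"{source_digest[:16]}-{params_digest[:16]}"


def stem_cache_path(cache_root, key):
    """Hypothetical on-disk location for a cached stem."""
    return Path(cache_root) / "stems" / f"{key}.wav"


key = stem_cache_key(b"fake audio bytes", "drums", "htdemucs", 1, 0.25, "cpu")
print(stem_cache_path(".cache", key))
```

Changing any single component, for example `shifts`, yields a different key, so retuning onset or clustering parameters alone keeps hitting the same cached stem.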
+## Remaining realtime work
+The current `online_preview` mode still runs inside the batch job API, after onset detection has completed. To make the application genuinely realtime/progressive, add:
+1. A streaming/ranged audio analysis API.
+2. Incremental onset detector state.
+3. Incremental hit artifact writing.
+4. SSE progress/results stream.
+5. UI that appends hits/clusters as they arrive.
+6. Optional final `batch_quality` consolidation pass.
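
For item 4, a Server-Sent Events stream is just text framed as `event:`/`data:` lines followed by a blank line. The sketch below shows that framing with the standard library only; the event names (`hit`, `done`) and payload fields are hypothetical, and no such endpoint exists in the project yet.

```python
import json


def sse_event(event, data, event_id=None):
    """Format one Server-Sent Events message as text."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(data, separators=(",", ":"))
    lines.append(f"data: {payload}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def stream_hits(hits):
    """Yield one SSE message per finalized hit, then a completion message."""
    count = 0
    for count, hit in enumerate(hits, start=1):
        yield sse_event("hit", hit, event_id=count - 1)
    yield sse_event("done", {"hit_count": count})


for message in stream_hits([{"onset_sec": 0.5, "cluster": 0}]):
    print(message, end="")
```

In FastAPI, a generator like `stream_hits` could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the browser side would consume it with `EventSource`.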

docs/PROGRESS.md ADDED

	@@ -0,0 +1,63 @@

+# Progress log
+Last updated: 2026-05-12
+## Pass 1: project review, timing, and Gradio replacement
+Completed:
+1. Inspected the original project structure and active Gradio entrypoints.
+2. Moved previous Gradio interfaces into `legacy/`.
+3. Created `pipeline_runner.py` as the timed orchestration layer.
+4. Created `app.py` as a FastAPI backend.
+5. Created a custom no-build browser frontend under `web/`.
+6. Added stage timing to each extraction run.
+7. Added synthetic benchmarking via `scripts/benchmark_subprocesses.py`.
+8. Added initial docs for project review, timing/realtime, API, UI, and remaining work.
+Outcome:
+The application became usable without Gradio and produced per-run manifests/artifacts.
+## Pass 2: feature ledger and continued development
+Completed in this pass:
+1. Added first-class docs for features, tasks, and progress.
+2. Added `GET /api/jobs` for active/completed run listing.
+3. Added run-history UI panel that indexes `.runs/*/output/manifest.json`.
+4. Added disk caching for decoded full mix and Demucs stem outputs.
+5. Extended cache clearing to remove both memory and disk cache.
+6. Added `clustering_mode` pipeline parameter.
+7. Added `online_preview` clustering using prototype assignment.
+8. Added frontend controls for clustering mode and disk cache.
+9. Fixed duplicate sample writes in `sample_extractor.build_archive`.
+10. Updated README and docs to reflect the new state.
+Outcome:
+The project now has a clearer product surface: final-quality batch extraction, faster online-style preview clustering, persistent run history, and explicit docs tracking what is done versus still missing.
+## Current assessment
+The application is not “fully complete” as an editing workstation, but it is substantially implemented as an extraction workstation. The remaining gaps are concentrated around interactive correction/editing, richer progress streaming, run comparison, and frontend engineering hardening.
+## Next recommended pass
+Implement the editing loop:
+1. Click waveform onset marker or sample table row to audition.
+2. Show selected hit metadata and audio snippet.
+3. Allow onset shift, label change, cluster reassignment, merge, and split.
+4. Re-export without rerunning Demucs/onset detection when only grouping changes.
+5. Save edit decisions into the manifest.
+## Validation performed in this pass
+- Compiled active Python files with `python3 -m py_compile app.py pipeline_runner.py sample_extractor.py scripts/*.py`.
+- Ran FastAPI smoke job through `scripts/test_api_job.py`.
+- Ran an online-preview API smoke job with synthetic audio.
+- Verified `GET /api/jobs` history output and `POST /api/cache/clear` behavior.
+- Refreshed batch and online benchmark JSON files:
+  - `docs/benchmark-subprocesses.json`
+  - `docs/benchmark-online-preview.json`

docs/PROJECT_REVIEW.md CHANGED

@@ -1,89 +1,52 @@
 # Project review
-## Goal
-Review the uploaded drum sample extractor, identify architectural and UX gaps, replace the Gradio UI with a custom frontend, and document the extraction pipeline with timing and real-time feasibility notes.
-## Success checklist
-- The active app is no longer Gradio-based.
-- The core extraction process is callable independently of the UI.
-- Every significant extraction subprocess is timed.
-- Runtime artifacts are stable and downloadable.
-- Documentation explains current behavior, tradeoffs, and remaining work.
-- Legacy files are preserved but not part of the active path.
-## Existing project structure before changes
-The archive contained a compact Python project:
-| File | Role |
-|---|---|
-| `app.py` | Active Gradio UI, parameter controls, extraction, eval, optimization tabs |
-| `app_v2.py` | Older Gradio UI variant |
-| `sample_extractor.py` | Current extraction pipeline: Demucs/load, SuperFlux onsets, rule labels, mel+NCC clustering, MIDI/export |
-| `drum_extractor.py` | Older CLI-oriented pipeline with CLAP-era comments and broader experimental code |
-| `synth_generator.py` | Synthetic drum fixture generator |
-| `evaluation.py` | Ground-truth matching and scoring |
-| `optimizer.py`, `optimizer_v2.py` | Parameter search experiments |
-| `quality_metrics.py` | Completeness, cleanness, onset, reference metrics |
-| `config_store.py` | Config persistence and leaderboard helpers |
-## Key findings
-1. `sample_extractor.py` is the right core to keep. It is compact, stage-oriented, and already exposes most of the operations needed by a proper app/API.
-2. `app.py` mixed UI code, runtime hotfixing, file conversion, extraction orchestration, and artifact packaging. That made it hard to test or replace the UI.
-3. The previous Gradio UI was fast to build but not ideal for this use-case: extraction is a staged process with logs, timing, waveform review, downloadable artifacts, and a dense parameter surface that benefits from a purpose-built layout.
-4. The previous `app.py` patched `sample_extractor.py` at runtime to fix `_sf(..., lag=2)` vs `_sf(..., l=2)`. The underlying bug is now fixed directly in `sample_extractor.py`.
-5. There was no meaningful project documentation, no API documentation, and no benchmark/timing documentation.
-6. `requirements.txt` still treated Gradio as first-class. The active app now uses FastAPI; Gradio dependencies have been moved to `requirements-legacy-gradio.txt`.
-7. `.runs/`, generated audio, MIDI, ZIP files, and local caches needed explicit ignore rules.
-## Changes made
-| Area | Change |
 |---|---|
-| Active UI | Replaced Gradio with `app.py` FastAPI + custom static frontend in `web/` |
-| Pipeline | Added `pipeline_runner.py` with validated params, stage timing, progress callbacks, manifests, and artifact writing |
-| Legacy | Moved old Gradio apps into `legacy/` |
-| Bugfix | Fixed the `_sf(yh, lag=2, ms=5)` keyword mismatch in `sample_extractor.py` |
-| API | Added job creation, polling, config, health, cache clear, and safe artifact download endpoints |
-| UX | Added drag/drop upload, dense controls, stage timeline, logs, waveform/onset overview, audio previews, sample table, downloads |
-| Benchmarking | Added `scripts/benchmark_subprocesses.py` and committed benchmark output JSON |
-| Packaging | Added Dockerfile, updated requirements, added `.gitignore` |
-| Docs | Added project review, timing/real-time analysis, API docs, UI notes, and remaining work |
-## Current architecture
-```text
-browser UI in web/
-        │
-        ▼
-FastAPI app.py
-        │
-        ▼
-pipeline_runner.py
-        │
-        ▼
-sample_extractor.py + quality_metrics.py
-        │
-        ▼
-.runs/<job-id>/output/{samples, MIDI, WAV, ZIP, manifest.json}
-```
-The UI only talks to the API. The API only calls the timed runner. The runner is now independently testable and usable from scripts.
-## Risks and limitations
-- Demucs can dominate runtime and may require a model download on first use.
-- The current job store is in-memory. Completed jobs can be reloaded from `manifest.json`, but queued/running job state is lost on process restart.
-- The clustering implementation is still batch-oriented. It can be optimized or adapted incrementally, but current agglomerative clustering is not a streaming algorithm.
-- There is no authentication or quota control; this is intended as a local/Hugging Face style app, not a public multi-tenant service.
-- The browser UI is currently no-build static JavaScript/CSS. That is intentional for deployability, but a larger UI should eventually move to TypeScript with a real component/test setup.
-## Verification performed
-- Python syntax compilation for `app.py`, `pipeline_runner.py`, `sample_extractor.py`, and benchmark scripts.
-- FastAPI `TestClient` checks for `/`, `/api/health`, and `/api/config`.
-- End-to-end API job test using a synthetic drum fixture with `stem=all`.
-- Synthetic subprocess benchmark across rock, funk, and halftime patterns.

 # Project review
+Last updated: 2026-05-12
+## Summary
+The project has evolved from a Gradio-driven prototype into a usable FastAPI + custom frontend extraction workstation. The core DSP pipeline is still compact and script-oriented, but it now has a clearer boundary between API/UI orchestration (`app.py`), timed pipeline execution (`pipeline_runner.py`), and lower-level sample extraction (`sample_extractor.py`).
+## What is strong
+1. **Useful core pipeline**: stem extraction, onset detection, classification, clustering, selection, synthesis, MIDI rendering, and packaging are all present.
+2. **Small deployable surface**: active runtime is FastAPI plus static files; no frontend build is required.
+3. **Good local iteration path**: `stem=all` bypasses Demucs for fast tuning.
+4. **Per-stage timing**: every job manifest records stage durations and details.
+5. **Artifacts are explicit**: stem WAV, reconstruction WAV, MIDI, sample WAVs, ZIP, and manifest are written per run.
+6. **Legacy preservation**: old Gradio apps remain available under `legacy/` without being active.
+7. **New near-realtime path**: `online_preview` clustering gives a practical alternative to all-pairs batch clustering.
+## Main risks
+1. **Interactive editing is missing**: users can inspect outputs but cannot correct onsets or cluster decisions in the UI yet.
+2. **Job state is process-local**: active jobs disappear from memory on restart; completed history is recovered from manifests only.
+3. **Progress is stage-level**: Demucs and clustering do not expose fine-grained progress.
+4. **Frontend is plain JavaScript**: good for speed, weaker for long-term maintainability than TypeScript modules/tests.
+5. **Demucs cost remains dominant**: source separation is necessarily offline; disk cache mitigates repeated runs but not first-run latency.
+6. **DSP code is dense**: `sample_extractor.py` is effective but would benefit from smaller modules and stronger tests.
+## Development decisions made
+| Decision | Rationale |
 |---|---|
+| Replace Gradio with FastAPI/static UI | More control over workflow, layout, artifacts, and progress display. |
+| Keep no-build frontend for now | Fastest robust replacement; avoids adding Node/Vite just to ship the first custom UI. |
+| Preserve Gradio in `legacy/` | Avoids data loss and gives reference behavior. |
+| Add `pipeline_runner.py` | Keeps API orchestration separate from DSP primitives. |
+| Add disk cache in pipeline layer | Avoids invasive Demucs changes and caches both full mix and stems. |
+| Add `online_preview` rather than replacing batch clustering | Preserves final-quality path while adding a near-realtime option. |
+## Current implementation quality
+| Area | Rating | Notes |
+|---|---:|---|
+| Extraction functionality | Good | Core path works on synthetic tests. |
+| UI/UX foundation | Good | Custom flow is much better than generic Gradio controls. |
+| Realtime architecture | Partial | Online clustering exists; streaming onset/audio pipeline does not. |
+| Documentation | Good | Feature/task/progress/API/timing docs are now checked in under `docs/`. |
+| Test coverage | Basic | Smoke tests exist; no formal unit/browser tests yet. |
+| Maintainability | Medium | Better boundaries now, but DSP module remains dense. |
+## Recommendation
+Next development should not add more global parameters. It should add an editing loop: audition detected hits, manually fix bad onsets, merge/split clusters, relabel samples, then repack from edited state without rerunning expensive stages.

docs/REMAINING_WORK.md CHANGED

@@ -1,27 +1,35 @@
 # Remaining work
-## Highest value next steps
-1. **Online clustering mode**: add prototype-based incremental clustering for immediate feedback, while keeping agglomerative clustering as the final-quality batch mode.
-2. **Run history**: index `.runs/*/output/manifest.json` so prior runs are browsable and comparable in the UI.
-3. **Waveform editing**: add hit audition, onset adjustment, cluster merge/split, and label reassignment.
-4. **Demucs caching**: persist stem cache on disk by input digest + model + stem + shifts + overlap.
-5. **True progress reporting**: expose lower-level progress inside Demucs and pairwise clustering, not only stage transitions.
-6. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
-7. **Frontend test harness**: move the no-build UI to TypeScript once the interaction model stabilizes.
 ## Known constraints
-- Demucs is not a realtime stage and should stay explicitly offline.
-- Agglomerative clustering is a batch algorithm; it should not be sold as realtime.
 - First run on a fresh environment can be slower due to imports, model download, and library initialization.
 - The current job queue is process-local and single-worker. That is fine for local use, but not enough for a shared public deployment.
 ## Suggested implementation order
-1. Add disk cache for source decode/stem separation.
-2. Add run history index and UI browser.
-3. Add hit audition from `overview.onsets` and sample rows.
-4. Implement online prototype clustering.
-5. Add comparison mode between two job manifests.
-6. Add SSE log/progress streaming.

 # Remaining work
+Last updated: 2026-05-12
+## Current gap assessment
+The project is now a usable extraction workstation, not a complete interactive sample editor. The largest remaining gaps are UX/editor capabilities rather than core batch extraction.
+## Highest-priority remaining gaps
+1. **Hit audition and selection**: clicking an onset marker or sample row should audition that exact hit/sample.
+2. **Waveform editing**: add onset adjustment, delete/add hit, and rerun-from-edited-onsets without redoing Demucs.
+3. **Cluster editing**: allow merge, split, relabel, and manual reassignment of hits.
+4. **Run comparison**: compare two manifests side-by-side for parameter tuning.
+5. **Progress streaming**: replace polling or supplement it with SSE for lower-latency logs/progress.
+6. **Frontend engineering hardening**: migrate the frontend to TypeScript after the UX stabilizes and add browser-level tests.
+7. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
 ## Known constraints
+- Demucs is not a realtime stage and should stay explicitly offline/cached.
+- Batch agglomerative clustering is not realtime; `online_preview` is the progressive clustering path.
 - First run on a fresh environment can be slower due to imports, model download, and library initialization.
 - The current job queue is process-local and single-worker. That is fine for local use, but not enough for a shared public deployment.
+- Run history is filesystem-backed via `.runs/`; deleting `.runs/` deletes history.
 ## Suggested implementation order
+1. Add click-to-audition for sample table rows and waveform onsets.
+2. Store detected hit snippets as individual review artifacts or expose ranged audio endpoints.
+3. Add edit state to manifests: deleted hits, shifted onsets, labels, cluster overrides.
+4. Add rerender/repack endpoint that starts from edited hit/cluster state.
+5. Add run comparison view.
+6. Add SSE progress streaming.
+7. Convert frontend to TypeScript and add UI tests.

docs/TASKS.md ADDED

	@@ -0,0 +1,54 @@

+# Task ledger
+Last updated: 2026-05-12
+## User-requested tasks
+| Task | Status | Evidence |
+|---|---:|---|
+| Review the project | Done | `docs/PROJECT_REVIEW.md`. |
+| Determine length of significant subprocesses | Done | `pipeline_runner.py`, `scripts/benchmark_subprocesses.py`, `docs/benchmark-subprocesses.json`, `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
+| Identify near-realtime subprocesses | Done | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
+| Add documentation to project | Done | `docs/*.md`, updated `README.md`. |
+| Replace Gradio UI | Done | Active app is FastAPI + custom web UI; Gradio moved to `legacy/`. |
+| Document features, tasks, and progress | Done | `docs/FEATURES.md`, this file, `docs/PROGRESS.md`. |
+| Continue development while keeping docs up-to-date | In progress | This pass adds run history, disk cache, online clustering mode, and docs updates. |
+## Completed implementation tasks
+- [x] Preserve old Gradio apps in `legacy/`.
+- [x] Expose extraction as a FastAPI job API.
+- [x] Serve a custom browser UI from `web/`.
+- [x] Add per-stage timing to the pipeline.
+- [x] Write per-run `manifest.json`.
+- [x] Add synthetic benchmark script.
+- [x] Add API documentation.
+- [x] Add UI replacement documentation.
+- [x] Add project review and realtime analysis documentation.
+- [x] Add run-history listing endpoint: `GET /api/jobs`.
+- [x] Add run-history UI panel.
+- [x] Add disk cache for stem/full-mix loads.
+- [x] Extend cache clearing to disk cache.
+- [x] Add prototype-based `online_preview` clustering mode.
+- [x] Add UI controls for clustering mode and disk cache.
+- [x] Fix duplicate sample writes in `build_archive`.
+- [x] Add feature, task, and progress docs.
+## Validation tasks
+- [x] Python compile check for active Python files.
+- [x] FastAPI smoke test for health/config/job flow.
+- [x] Pipeline smoke test on synthetic audio.
+- [x] API history/cache smoke test.
+- [x] Git status reviewed before packaging.
+- [x] Project archive excludes `.runs/`, `.cache/`, and dependency folders.
+## Remaining high-value tasks
+- [ ] Add click-to-audition onset markers and table rows.
+- [ ] Add onset adjustment and rerun-from-onsets flow.
+- [ ] Add cluster merge/split/relabel workflow.
+- [ ] Add side-by-side run comparison.
+- [ ] Add SSE progress stream for lower-latency updates.
+- [ ] Convert frontend to TypeScript with a small Vite build once UX stabilizes.
+- [ ] Add automated browser-level UI tests.

docs/UI_REPLACEMENT.md CHANGED

@@ -1,26 +1,30 @@
 # Custom UI replacement
 ## What changed
-The active interface is now a custom browser UI served from `web/` by the FastAPI app in `app.py`. The old Gradio files were moved to `legacy/`.
 ## UX goals
 1. Make the process feel like a sample-extraction workstation, not a generic notebook form.
-2. Keep upload, controls, pipeline status, logs, waveform review, audio previews, downloads, and sample rows visible without tab hunting.
 3. Show stage timing as a first-class result, because extraction quality and speed tradeoffs matter.
 4. Make `stem=all` obvious for fast iteration when Demucs is unnecessary.
-5. Keep the frontend deployable without a JavaScript build step.
 ## UI structure
 | Area | Purpose |
 |---|---|
-| Hero/status | Backend readiness and product framing |
-| Source panel | Drag/drop upload and source audio preview |
-| Controls panel | Stem, onset, clustering, MIDI, and synthesis parameters |
-| Pipeline panel | Stage statuses, durations, and live logs |
-| Result panel | Summary, waveform/onsets, downloads, stem/reconstruction audio, sample table |
 ## Frontend implementation
@@ -32,11 +36,11 @@ Files:
 The frontend uses modern browser APIs directly:
-- `fetch` for API calls
-- `FormData` for upload
-- `<audio>` for previews
-- `<canvas>` for waveform/onset visualization
-- CSS grid, responsive layout, custom properties, and backdrop filters for layout/polish
 No Gradio runtime, iframe, or generated UI framework is involved.
@@ -50,6 +54,17 @@ The frontend creates a job with `POST /api/jobs`, then polls `GET /api/jobs/{id}
 - reconstruction WAV
 - individual sample WAVs
 ## Why polling instead of websockets/SSE
 Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
@@ -62,5 +77,5 @@ Future improvement: use Server-Sent Events for lower-latency log streaming once
 - Add inline controls for reassigning sample labels and merging/splitting clusters.
 - Add A/B comparison between parameter runs.
 - Add downloadable timing report per job.
-- Add persistent run history browser for `.runs/`.
-- Add online clustering mode for near-realtime progressive preview.

 # Custom UI replacement
+Last updated: 2026-05-12
 ## What changed
+The active interface is a custom browser UI served from `web/` by the FastAPI app in `app.py`. The old Gradio files live in `legacy/` and are no longer used by the active application.
 ## UX goals
 1. Make the process feel like a sample-extraction workstation, not a generic notebook form.
+2. Keep upload, controls, pipeline status, logs, waveform review, audio previews, downloads, run history, and sample rows visible without tab hunting.
 3. Show stage timing as a first-class result, because extraction quality and speed tradeoffs matter.
 4. Make `stem=all` obvious for fast iteration when Demucs is unnecessary.
+5. Make `online_preview` obvious as the near-realtime clustering path.
+6. Keep the frontend deployable without a JavaScript build step until the interaction model stabilizes.
 ## UI structure
 | Area | Purpose |
 |---|---|
+| Hero/status | Backend readiness and product framing. |
+| Source panel | Drag/drop upload and source audio preview. |
+| Controls panel | Stem, onset, clustering, MIDI, synthesis, and disk-cache parameters. |
+| Pipeline panel | Stage statuses, durations, and logs. |
+| Run history panel | Loads completed manifests from `.runs/`. |
+| Result panel | Summary, waveform/onsets, downloads, stem/reconstruction audio, sample table. |
 ## Frontend implementation
 The frontend uses modern browser APIs directly:
+- `fetch` for API calls.
+- `FormData` for upload.
+- `<audio>` for previews.
+- `<canvas>` for waveform/onset visualization.
+- CSS grid, responsive layout, custom properties, and backdrop filters for layout/polish.
 No Gradio runtime, iframe, or generated UI framework is involved.
 - reconstruction WAV
 - individual sample WAVs
+The run history panel calls `GET /api/jobs` and can reload any completed manifest still present under `.runs/`.
+## Clustering UX
+Two modes are exposed:
+| Mode | UX intent |
+|---|---|
+| `batch_quality` | Slower, final-quality clustering using all-pairs similarity plus agglomerative clustering. |
+| `online_preview` | Faster near-realtime-style clustering using prototype assignment. Best for quick iteration after bypassing Demucs. |
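The difference between the two modes can be illustrated with a minimal sketch of prototype assignment. This is not the code in `sample_extractor.py`: the function name `assign_online`, the cosine-similarity measure, and the running-mean prototype update are assumptions chosen for illustration, and the `threshold` default only borrows the 0.75 value of `mel_threshold` as a placeholder.

```python
import numpy as np


def assign_online(features: np.ndarray, threshold: float = 0.75) -> list[int]:
    """Assign each feature vector to the nearest prototype, or start a new one.

    Hypothetical sketch: each hit is compared only against the current
    prototypes (one per cluster), never against every other hit.
    """
    prototypes: list[np.ndarray] = []  # running mean vector per cluster
    counts: list[int] = []             # number of hits folded into each cluster
    labels: list[int] = []

    for vec in features:
        unit = vec / (np.linalg.norm(vec) + 1e-12)
        best, best_sim = -1, -1.0
        for idx, proto in enumerate(prototypes):
            sim = float(unit @ (proto / (np.linalg.norm(proto) + 1e-12)))
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            # Fold the hit into the matched prototype (running mean).
            counts[best] += 1
            prototypes[best] += (unit - prototypes[best]) / counts[best]
            labels.append(best)
        else:
            # Nothing is similar enough: this hit seeds a new cluster.
            prototypes.append(unit.copy())
            counts.append(1)
            labels.append(len(prototypes) - 1)
    return labels
```

In this sketch each hit costs one comparison per existing prototype, so labels are available as soon as a hit is sliced. The trade-off is that the result depends on hit order, which is one reason a preview mode like this would sit alongside a slower all-pairs mode rather than replace it.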
 ## Why polling instead of websockets/SSE
 Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
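A polling client along these lines can be sketched as follows. The sketch is written in Python for consistency with the backend rather than the browser JavaScript in `web/app.js`. The `fetch_status` callable, the job-level `status` key, and the terminal values `done` and `error` are assumptions, not the documented API; only the 800 ms interval comes from the text above.

```python
import time
from typing import Any, Callable


def poll_job(
    fetch_status: Callable[[], dict[str, Any]],
    on_update: Callable[[dict[str, Any]], None],
    interval_sec: float = 0.8,
    timeout_sec: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reports a terminal status, then return the last payload."""
    deadline = time.monotonic() + timeout_sec
    while True:
        payload = fetch_status()   # e.g. one GET request per tick
        on_update(payload)         # refresh the stage list and logs
        if payload.get("status") in {"done", "error"}:  # assumed terminal states
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_sec)
```

Because every tick is an independent request, there is no connection state to restore after a reload or a dropped network link, which is the lifecycle simplicity the paragraph above refers to.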
 - Add inline controls for reassigning sample labels and merging/splitting clusters.
 - Add A/B comparison between parameter runs.
 - Add downloadable timing report per job.
+- Add filters/search to the run history browser.
+- Convert the frontend to TypeScript when the UX stops moving quickly.

docs/benchmark-online-preview.json ADDED

	@@ -0,0 +1,273 @@

+{
+  "clustering_mode": "online_preview",
+  "runs": [
+    {
+      "pattern": "rock",
+      "bars": 2,
+      "bpm": 120.0,
+      "run_index": 0,
+      "clustering_mode": "online_preview",
+      "audio_duration_sec": 4.75,
+      "total_duration_sec": 2.394493,
+      "realtime_factor": 0.504104,
+      "hit_count": 14,
+      "cluster_count": 10,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.01333964500008733,
+          "status": "done",
+          "detail": "loaded full mix \u00b7 cached"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.18073730900005103,
+          "status": "done",
+          "detail": "120.2 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 1.8083914959997855,
+          "status": "done",
+          "detail": "14 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.015553790000012668,
+          "status": "done",
+          "detail": "bright:5, hihat_open:8, kick:1"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.01717499700021108,
+          "status": "done",
+          "detail": "10 clusters \u00b7 online preview"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.06853683399981492,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.0004338460000781197,
+          "status": "done",
+          "detail": "2 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.2898033520000354,
+          "status": "done",
+          "detail": "10 WAVs + MIDI + ZIP"
+        }
+      ]
+    },
+    {
+      "pattern": "funk",
+      "bars": 2,
+      "bpm": 120.0,
+      "run_index": 0,
+      "clustering_mode": "online_preview",
+      "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.422223,
+      "realtime_factor": 0.496867,
+      "hit_count": 30,
+      "cluster_count": 12,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.012654803000032189,
+          "status": "done",
+          "detail": "loaded full mix \u00b7 cached"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.10868702200014013,
+          "status": "done",
+          "detail": "120.2 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 1.7981390029999602,
+          "status": "done",
+          "detail": "30 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.020911717999979373,
+          "status": "done",
+          "detail": "bright:12, cymbal:2, hihat_closed:9, hihat_open:3, kick:1, mid:3"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.08173960800013447,
+          "status": "done",
+          "detail": "12 clusters \u00b7 online preview"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.18588780100003532,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.001146163000157685,
+          "status": "done",
+          "detail": "6 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.21253995300003226,
+          "status": "done",
+          "detail": "12 WAVs + MIDI + ZIP"
+        }
+      ]
+    },
+    {
+      "pattern": "halftime",
+      "bars": 2,
+      "bpm": 120.0,
+      "run_index": 0,
+      "clustering_mode": "online_preview",
+      "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.406563,
+      "realtime_factor": 0.493655,
+      "hit_count": 28,
+      "cluster_count": 12,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.009107656999958635,
+          "status": "done",
+          "detail": "loaded full mix \u00b7 cached"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.19882379599994238,
+          "status": "done",
+          "detail": "118.8 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 1.8942657120001059,
+          "status": "done",
+          "detail": "28 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.015083428000025378,
+          "status": "done",
+          "detail": "bright:5, cymbal:2, hihat_closed:19, hihat_open:2"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.036892447000127504,
+          "status": "done",
+          "detail": "12 clusters \u00b7 online preview"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.0908485570000721,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.0007993310000529164,
+          "status": "done",
+          "detail": "4 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.1602465889998257,
+          "status": "done",
+          "detail": "12 WAVs + MIDI + ZIP"
+        }
+      ]
+    }
+  ],
+  "summary": [
+    {
+      "stage": "stem",
+      "mean_sec": 0.011701,
+      "median_sec": 0.012655,
+      "min_sec": 0.009108,
+      "max_sec": 0.01334
+    },
+    {
+      "stage": "bpm",
+      "mean_sec": 0.162749,
+      "median_sec": 0.180737,
+      "min_sec": 0.108687,
+      "max_sec": 0.198824
+    },
+    {
+      "stage": "onsets",
+      "mean_sec": 1.833599,
+      "median_sec": 1.808391,
+      "min_sec": 1.798139,
+      "max_sec": 1.894266
+    },
+    {
+      "stage": "classification",
+      "mean_sec": 0.017183,
+      "median_sec": 0.015554,
+      "min_sec": 0.015083,
+      "max_sec": 0.020912
+    },
+    {
+      "stage": "clustering",
+      "mean_sec": 0.045269,
+      "median_sec": 0.036892,
+      "min_sec": 0.017175,
+      "max_sec": 0.08174
+    },
+    {
+      "stage": "selection",
+      "mean_sec": 0.115091,
+      "median_sec": 0.090849,
+      "min_sec": 0.068537,
+      "max_sec": 0.185888
+    },
+    {
+      "stage": "synthesis",
+      "mean_sec": 0.000793,
+      "median_sec": 0.000799,
+      "min_sec": 0.000434,
+      "max_sec": 0.001146
+    },
+    {
+      "stage": "export",
+      "mean_sec": 0.220863,
+      "median_sec": 0.21254,
+      "min_sec": 0.160247,
+      "max_sec": 0.289803
+    }
+  ]
+}
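The derived fields in this report can be reproduced from the raw values. In the runs above, `realtime_factor` matches `total_duration_sec / audio_duration_sec`, and each `summary` entry matches the mean, median, minimum, and maximum of that stage's `duration_sec` across the three runs, rounded to six decimals. The sketch below recomputes both; the helper names are illustrative and are not taken from `scripts/benchmark_subprocesses.py`.

```python
from statistics import mean, median


def realtime_factor(total_duration_sec: float, audio_duration_sec: float) -> float:
    """Processing time divided by audio length; below 1.0 is faster than realtime."""
    if audio_duration_sec <= 0:
        return 0.0
    return round(total_duration_sec / audio_duration_sec, 6)


def summarize_stage(stage: str, durations: list[float]) -> dict[str, object]:
    """Aggregate one stage's durations across benchmark runs."""
    return {
        "stage": stage,
        "mean_sec": round(mean(durations), 6),
        "median_sec": round(median(durations), 6),
        "min_sec": round(min(durations), 6),
        "max_sec": round(max(durations), 6),
    }


# Values copied from the online_preview report above.
rock_rtf = realtime_factor(2.394493, 4.75)
stem_summary = summarize_stage(
    "stem",
    [0.01333964500008733, 0.012654803000032189, 0.009107656999958635],
)
```

For the `rock` run this gives 0.504104, the value recorded in the report. Note that `onsets` accounts for roughly 1.8 s of each run's total of about 2.4 s, so the clustering mode has little effect on the overall realtime factor for clips this short.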

docs/benchmark-subprocesses.json CHANGED

@@ -1,138 +1,141 @@
 {
   "runs": [
     {
       "pattern": "rock",
-      "bars": 4,
       "bpm": 120.0,
       "run_index": 0,
-      "audio_duration_sec": 8.75,
-      "total_duration_sec": 2.594698,
-      "realtime_factor": 0.296537,
-      "hit_count": 28,
-      "cluster_count": 1,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.014633260999971753,
           "status": "done",
-          "detail": "loaded full mix"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.23692302500001006,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 1.762329765000004,
           "status": "done",
-          "detail": "28 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.02908633100003044,
           "status": "done",
-          "detail": "bright:9, cymbal:1, hihat_closed:1, hihat_open:15, mid:2"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.05944011799999771,
           "status": "done",
-          "detail": "1 clusters"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.31093429700001707,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.0028187070000171843,
           "status": "done",
-          "detail": "1 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.1779485609999938,
           "status": "done",
-          "detail": "1 WAVs + MIDI + ZIP"
         }
       ]
     },
     {
       "pattern": "funk",
-      "bars": 4,
       "bpm": 120.0,
       "run_index": 0,
-      "audio_duration_sec": 8.874989,
-      "total_duration_sec": 3.790648,
-      "realtime_factor": 0.427116,
-      "hit_count": 53,
       "cluster_count": 2,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.009321340000042255,
           "status": "done",
-          "detail": "loaded full mix"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.23110938799999303,
           "status": "done",
           "detail": "161.5 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 2.1605432889999747,
           "status": "done",
-          "detail": "53 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.04475730899997643,
           "status": "done",
-          "detail": "bright:25, hihat_closed:18, hihat_open:7, mid:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.6768225310000275,
           "status": "done",
-          "detail": "2 clusters"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.559724416999984,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.0024601989999837315,
           "status": "done",
           "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.10532420399999864,
           "status": "done",
           "detail": "2 WAVs + MIDI + ZIP"
         }
@@ -140,337 +143,131 @@
     },
     {
       "pattern": "halftime",
-      "bars": 4,
       "bpm": 120.0,
       "run_index": 0,
-      "audio_duration_sec": 8.874989,
-      "total_duration_sec": 3.701891,
-      "realtime_factor": 0.417115,
-      "hit_count": 66,
-      "cluster_count": 2,
-      "stages": [
-        {
-          "key": "stem",
-          "label": "Stem extraction / source load",
-          "duration_sec": 0.009298575000002529,
-          "status": "done",
-          "detail": "loaded full mix"
-        },
-        {
-          "key": "bpm",
-          "label": "Tempo detection",
-          "duration_sec": 0.21581650399997443,
-          "status": "done",
-          "detail": "120.2 BPM"
-        },
-        {
-          "key": "onsets",
-          "label": "Onset detection + slicing",
-          "duration_sec": 1.9768937550000487,
-          "status": "done",
-          "detail": "66 hits"
-        },
-        {
-          "key": "classification",
-          "label": "Spectral rule classification",
-          "duration_sec": 0.03783250899999757,
-          "status": "done",
-          "detail": "bright:11, cymbal:2, hihat_closed:48, hihat_open:5"
-        },
-        {
-          "key": "clustering",
-          "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.7498706449999872,
-          "status": "done",
-          "detail": "2 clusters"
-        },
-        {
-          "key": "selection",
-          "label": "Best representative scoring",
-          "duration_sec": 0.6169061510000233,
-          "status": "done",
-          "detail": "quality-scored representatives"
-        },
-        {
-          "key": "synthesis",
-          "label": "Optional sample synthesis",
-          "duration_sec": 0.0028750459999855593,
-          "status": "done",
-          "detail": "2 synthesized alternates"
-        },
-        {
-          "key": "export",
-          "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.09185817900004167,
-          "status": "done",
-          "detail": "2 WAVs + MIDI + ZIP"
-        }
-      ]
-    },
-    {
-      "pattern": "rock",
-      "bars": 4,
-      "bpm": 120.0,
-      "run_index": 1,
-      "audio_duration_sec": 8.75,
-      "total_duration_sec": 2.848686,
-      "realtime_factor": 0.325564,
-      "hit_count": 24,
-      "cluster_count": 1,
-      "stages": [
-        {
-          "key": "stem",
-          "label": "Stem extraction / source load",
-          "duration_sec": 0.03869248300003392,
-          "status": "done",
-          "detail": "loaded full mix"
-        },
-        {
-          "key": "bpm",
-          "label": "Tempo detection",
-          "duration_sec": 0.24107510999999704,
-          "status": "done",
-          "detail": "120.2 BPM"
-        },
-        {
-          "key": "onsets",
-          "label": "Onset detection + slicing",
-          "duration_sec": 2.0721967459999746,
-          "status": "done",
-          "detail": "24 hits"
-        },
-        {
-          "key": "classification",
-          "label": "Spectral rule classification",
-          "duration_sec": 0.024016725000024053,
-          "status": "done",
-          "detail": "bright:7, hihat_closed:2, hihat_open:15"
-        },
-        {
-          "key": "clustering",
-          "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.05910233800000242,
-          "status": "done",
-          "detail": "1 clusters"
-        },
-        {
-          "key": "selection",
-          "label": "Best representative scoring",
-          "duration_sec": 0.3106304350000073,
-          "status": "done",
-          "detail": "quality-scored representatives"
-        },
-        {
-          "key": "synthesis",
-          "label": "Optional sample synthesis",
-          "duration_sec": 0.0015013799999792354,
-          "status": "done",
-          "detail": "1 synthesized alternates"
-        },
-        {
-          "key": "export",
-          "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.10095534999999245,
-          "status": "done",
-          "detail": "1 WAVs + MIDI + ZIP"
-        }
-      ]
-    },
-    {
-      "pattern": "funk",
-      "bars": 4,
-      "bpm": 120.0,
-      "run_index": 1,
-      "audio_duration_sec": 8.874989,
-      "total_duration_sec": 3.416797,
-      "realtime_factor": 0.384992,
-      "hit_count": 52,
       "cluster_count": 3,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
-          "duration_sec": 0.011181277999980921,
           "status": "done",
-          "detail": "loaded full mix"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
-          "duration_sec": 0.20633040499996014,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
-          "duration_sec": 1.9962494719999881,
           "status": "done",
-          "detail": "52 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
-          "duration_sec": 0.03461634600000707,
           "status": "done",
-          "detail": "bright:23, cymbal:3, hihat_closed:15, hihat_open:8, mid:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.51767344000001,
           "status": "done",
-          "detail": "3 clusters"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
-          "duration_sec": 0.5431782379999959,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
-          "duration_sec": 0.001988787999948727,
           "status": "done",
           "detail": "3 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.10504587100001572,
           "status": "done",
           "detail": "3 WAVs + MIDI + ZIP"
         }
       ]
-    },
-    {
-      "pattern": "halftime",
-      "bars": 4,
-      "bpm": 120.0,
-      "run_index": 1,
-      "audio_duration_sec": 8.874989,
-      "total_duration_sec": 4.750472,
-      "realtime_factor": 0.535265,
-      "hit_count": 64,
-      "cluster_count": 1,
-      "stages": [
-        {
-          "key": "stem",
-          "label": "Stem extraction / source load",
-          "duration_sec": 0.016472632999978032,
-          "status": "done",
-          "detail": "loaded full mix"
-        },
-        {
-          "key": "bpm",
-          "label": "Tempo detection",
-          "duration_sec": 0.2141354419999857,
-          "status": "done",
-          "detail": "120.2 BPM"
-        },
-        {
-          "key": "onsets",
-          "label": "Onset detection + slicing",
-          "duration_sec": 2.8706004370000073,
-          "status": "done",
-          "detail": "64 hits"
-        },
-        {
-          "key": "classification",
-          "label": "Spectral rule classification",
-          "duration_sec": 0.036172296999950504,
-          "status": "done",
-          "detail": "bright:11, cymbal:2, hihat_closed:45, hihat_open:4, mid:2"
-        },
-        {
-          "key": "clustering",
-          "label": "Mel fingerprint + transient NCC clustering",
-          "duration_sec": 0.9130003360000387,
-          "status": "done",
-          "detail": "1 clusters"
-        },
-        {
-          "key": "selection",
-          "label": "Best representative scoring",
-          "duration_sec": 0.6508792970000172,
-          "status": "done",
-          "detail": "quality-scored representatives"
-        },
-        {
-          "key": "synthesis",
-          "label": "Optional sample synthesis",
-          "duration_sec": 0.0025003810000043813,
-          "status": "done",
-          "detail": "1 synthesized alternates"
-        },
-        {
-          "key": "export",
-          "label": "MIDI, reconstruction, WAV, ZIP export",
-          "duration_sec": 0.04621197200003735,
-          "status": "done",
-          "detail": "1 WAVs + MIDI + ZIP"
-        }
-      ]
     }
   ],
   "summary": [
     {
       "stage": "stem",
-      "mean_sec": 0.0166,
-      "median_sec": 0.012907,
-      "min_sec": 0.009299,
-      "max_sec": 0.038692
     },
     {
       "stage": "bpm",
-      "mean_sec": 0.224232,
-      "median_sec": 0.223463,
-      "min_sec": 0.20633,
-      "max_sec": 0.241075
     },
     {
       "stage": "onsets",
-      "mean_sec": 2.139802,
-      "median_sec": 2.034223,
-      "min_sec": 1.76233,
-      "max_sec": 2.8706
     },
     {
       "stage": "classification",
-      "mean_sec": 0.034414,
-      "median_sec": 0.035394,
-      "min_sec": 0.024017,
-      "max_sec": 0.044757
     },
     {
       "stage": "clustering",
-      "mean_sec": 0.495985,
-      "median_sec": 0.597248,
-      "min_sec": 0.059102,
-      "max_sec": 0.913
     },
     {
       "stage": "selection",
-      "mean_sec": 0.498709,
-      "median_sec": 0.551451,
-      "min_sec": 0.31063,
-      "max_sec": 0.650879
     },
     {
       "stage": "synthesis",
-      "mean_sec": 0.002357,
-      "median_sec": 0.00248,
-      "min_sec": 0.001501,
-      "max_sec": 0.002875
     },
     {
       "stage": "export",
-      "mean_sec": 0.104557,
-      "median_sec": 0.103001,
-      "min_sec": 0.046212,
-      "max_sec": 0.177949
     }
   ]
 }

 {
+  "clustering_mode": "batch_quality",
   "runs": [
     {
       "pattern": "rock",
+      "bars": 2,
       "bpm": 120.0,
       "run_index": 0,
+      "clustering_mode": "batch_quality",
+      "audio_duration_sec": 4.75,
+      "total_duration_sec": 2.416794,
+      "realtime_factor": 0.508799,
+      "hit_count": 14,
+      "cluster_count": 7,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.011517213000161064,
           "status": "done",
+          "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.19438482000009571,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 1.8062190609998652,
           "status": "done",
+          "detail": "14 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.016392102000054365,
           "status": "done",
+          "detail": "bright:5, hihat_closed:1, hihat_open:7, kick:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.07352871200009758,
           "status": "done",
+          "detail": "7 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.096273950000068,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0006992359999458131,
           "status": "done",
+          "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.2172303219999776,
           "status": "done",
+          "detail": "7 WAVs + MIDI + ZIP"
         }
       ]
     },
     {
       "pattern": "funk",
+      "bars": 2,
       "bpm": 120.0,
       "run_index": 0,
+      "clustering_mode": "batch_quality",
+      "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.99188,
+      "realtime_factor": 0.61372,
+      "hit_count": 35,
       "cluster_count": 2,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.010077079999973648,
           "status": "done",
+          "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.17334403699987888,
           "status": "done",
           "detail": "161.5 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 2.1082552409998243,
           "status": "done",
+          "detail": "35 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.021269321000090713,
           "status": "done",
+          "detail": "bright:14, cymbal:1, hihat_closed:14, hihat_open:3, kick:1, mid:2"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.26927052900009585,
           "status": "done",
+          "detail": "2 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.31629775500005053,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0011716779999915161,
           "status": "done",
           "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.09167172899992693,
           "status": "done",
           "detail": "2 WAVs + MIDI + ZIP"
         }
     },
     {
       "pattern": "halftime",
+      "bars": 2,
       "bpm": 120.0,
       "run_index": 0,
+      "clustering_mode": "batch_quality",
+      "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.597859,
+      "realtime_factor": 0.532895,
+      "hit_count": 23,
       "cluster_count": 3,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.012474630000042453,
           "status": "done",
+          "detail": "loaded full mix \u00b7 cached"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.18858063699985905,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 1.9154837959999895,
           "status": "done",
+          "detail": "23 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.0188920179998604,
           "status": "done",
+          "detail": "bright:3, hihat_closed:17, hihat_open:3"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.10195718500017392,
           "status": "done",
+          "detail": "3 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.19837312200002089,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0011928339999940363,
           "status": "done",
           "detail": "3 synthesized alternates"
         },
         {
           "key": "export",
           "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.1603816869999264,
           "status": "done",
           "detail": "3 WAVs + MIDI + ZIP"
         }
       ]
     }
   ],
   "summary": [
     {
       "stage": "stem",
+      "mean_sec": 0.011356,
+      "median_sec": 0.011517,
+      "min_sec": 0.010077,
+      "max_sec": 0.012475
     },
     {
       "stage": "bpm",
+      "mean_sec": 0.185436,
+      "median_sec": 0.188581,
+      "min_sec": 0.173344,
+      "max_sec": 0.194385
     },
     {
       "stage": "onsets",
+      "mean_sec": 1.943319,
+      "median_sec": 1.915484,
+      "min_sec": 1.806219,
+      "max_sec": 2.108255
     },
     {
       "stage": "classification",
+      "mean_sec": 0.018851,
+      "median_sec": 0.018892,
+      "min_sec": 0.016392,
+      "max_sec": 0.021269
     },
     {
       "stage": "clustering",
+      "mean_sec": 0.148252,
+      "median_sec": 0.101957,
+      "min_sec": 0.073529,
+      "max_sec": 0.269271
     },
     {
       "stage": "selection",
+      "mean_sec": 0.203648,
+      "median_sec": 0.198373,
+      "min_sec": 0.096274,
+      "max_sec": 0.316298
     },
     {
       "stage": "synthesis",
+      "mean_sec": 0.001021,
+      "median_sec": 0.001172,
+      "min_sec": 0.000699,
+      "max_sec": 0.001193
     },
     {
       "stage": "export",
+      "mean_sec": 0.156428,
+      "median_sec": 0.160382,
+      "min_sec": 0.091672,
+      "max_sec": 0.21723
     }
   ]
 }
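The `clustering` rows of the two reports can be compared directly. The sketch below divides the `batch_quality` summary values by the `online_preview` ones; the `speedup` helper is illustrative only. The comparison is approximate, because the two benchmarks did not slice identical hits (for example, 35 versus 30 hits for `funk`), and each mode was measured on only three runs.

```python
def speedup(batch_sec: float, online_sec: float) -> float:
    """How many times faster the online stage ran than the batch stage."""
    if online_sec <= 0:
        raise ValueError("online_sec must be positive")
    return batch_sec / online_sec


# Clustering summary values copied from the two benchmark reports.
batch_clustering = {"mean_sec": 0.148252, "median_sec": 0.101957}
online_clustering = {"mean_sec": 0.045269, "median_sec": 0.036892}

mean_speedup = speedup(batch_clustering["mean_sec"], online_clustering["mean_sec"])
median_speedup = speedup(batch_clustering["median_sec"], online_clustering["median_sec"])
```

On these numbers the online stage is roughly three times faster by both mean and median, while the cluster counts differ noticeably between modes (10 to 12 clusters online versus 2 to 7 in batch), so the speed gain comes with a different grouping rather than the same result delivered sooner.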

pipeline_runner.py CHANGED

@@ -3,6 +3,7 @@
 from __future__ import annotations
 import json
 import os
 import shutil
@@ -23,6 +24,7 @@ from sample_extractor import (
     build_archive,
     classify_hits,
     cluster_hits,
     detect_bpm,
     detect_onsets,
     export_midi,
@@ -53,12 +55,14 @@ class PipelineParams:
     attack_ms: float = 25.0
     mel_threshold: float = 0.75
     linkage: str = "average"
     target_min: int = 5
     target_max: int = 20
     synthesize: bool = True
     quantize_midi: bool = True
     subdivision: int = 16
     device: str = "cpu"
     @classmethod
     def from_mapping(cls, data: dict[str, Any] | None) -> "PipelineParams":
@@ -81,6 +85,8 @@ class PipelineParams:
             raise ValueError(f"Unsupported onset mode: {self.onset_mode}")
         if self.linkage not in {"average", "complete", "single"}:
             raise ValueError(f"Unsupported clustering linkage: {self.linkage}")
         if not 0 <= self.demucs_shifts <= 8:
             raise ValueError("demucs_shifts must be between 0 and 8")
         if not 0.0 <= self.demucs_overlap <= 0.9:
@@ -185,11 +191,66 @@ def _normalise_audio(audio: np.ndarray) -> np.ndarray:
     return audio.astype(np.float32)
 def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_24") -> None:
     path.parent.mkdir(parents=True, exist_ok=True)
     sf.write(path, audio.astype(np.float32), sr, subtype=subtype)
 def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int = 1600) -> dict[str, Any]:
     if len(audio) == 0:
         return {"sample_rate": sr, "duration_sec": 0, "envelope": [], "onsets": []}
@@ -250,16 +311,9 @@ def run_extraction_pipeline(
     _notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
     with _timed_stage(stages, "stem", progress_cb) as stage:
-        stem_audio, stem_sr = extract_stem(
-            str(audio_path),
-            stem=params.stem,
-            device=params.device,
-            model_name=params.demucs_model,
-            shifts=int(params.demucs_shifts),
-            overlap=float(params.demucs_overlap),
-        )
         stem_audio = _normalise_audio(stem_audio)
-        stage.detail = f"{params.stem} via {params.demucs_model}" if params.stem != "all" else "loaded full mix"
         _write_audio(out / "stem.wav", stem_audio, stem_sr, subtype="PCM_16")
     audio_duration_sec = len(stem_audio) / stem_sr if stem_sr else 0.0
@@ -291,21 +345,34 @@ def run_extraction_pipeline(
             stage.detail = ", ".join(f"{key}:{value}" for key, value in sorted(counts.items()))
         with _timed_stage(stages, "clustering", progress_cb) as stage:
-            clusters = cluster_hits(
-                hits,
-                audio=stem_audio,
-                sr=stem_sr,
-                ncc_threshold=float(params.ncc_threshold),
-                attack_ms=float(params.attack_ms),
-                mel_threshold=float(params.mel_threshold),
-                target_min=int(params.target_min),
-                target_max=int(params.target_max),
-                linkage=params.linkage,
-            )
             for cluster in clusters:
                 for hit in cluster.hits:
                     hit.cluster_id = cluster.cluster_id
-            stage.detail = f"{len(clusters)} clusters"
         with _timed_stage(stages, "selection", progress_cb) as stage:
             select_best(clusters)

 from __future__ import annotations
+import hashlib
 import json
 import os
 import shutil
     build_archive,
     classify_hits,
     cluster_hits,
+    cluster_hits_online,
     detect_bpm,
     detect_onsets,
     export_midi,
     attack_ms: float = 25.0
     mel_threshold: float = 0.75
     linkage: str = "average"
+    clustering_mode: str = "batch_quality"
     target_min: int = 5
     target_max: int = 20
     synthesize: bool = True
     quantize_midi: bool = True
     subdivision: int = 16
     device: str = "cpu"
+    use_disk_cache: bool = True
     @classmethod
     def from_mapping(cls, data: dict[str, Any] | None) -> "PipelineParams":
             raise ValueError(f"Unsupported onset mode: {self.onset_mode}")
         if self.linkage not in {"average", "complete", "single"}:
             raise ValueError(f"Unsupported clustering linkage: {self.linkage}")
+        if self.clustering_mode not in {"batch_quality", "online_preview"}:
+            raise ValueError(f"Unsupported clustering mode: {self.clustering_mode}")
         if not 0 <= self.demucs_shifts <= 8:
             raise ValueError("demucs_shifts must be between 0 and 8")
         if not 0.0 <= self.demucs_overlap <= 0.9:
     return audio.astype(np.float32)
+MODULE_ROOT = Path(__file__).resolve().parent
+CACHE_DIR = Path(os.environ["DSE_CACHE_DIR"]) if os.environ.get("DSE_CACHE_DIR") else MODULE_ROOT / ".cache"
+STEM_CACHE_DIR = CACHE_DIR / "stems"
+CACHE_VERSION = "dse-cache-v2"
 def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_24") -> None:
     path.parent.mkdir(parents=True, exist_ok=True)
     sf.write(path, audio.astype(np.float32), sr, subtype=subtype)
+def _sha256_file(path: str | os.PathLike[str]) -> str:
+    h = hashlib.sha256()
+    with Path(path).open("rb") as handle:
+        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+            h.update(chunk)
+    return h.hexdigest()
+def _stem_cache_path(audio_path: str | os.PathLike[str], params: PipelineParams) -> Path:
+    key_payload = {
+        "version": CACHE_VERSION,
+        "source_sha256": _sha256_file(audio_path),
+        "stem": params.stem,
+        "demucs_model": params.demucs_model,
+        "demucs_shifts": params.demucs_shifts,
+        "demucs_overlap": params.demucs_overlap,
+        "device": params.device if params.stem != "all" else "decode",
+    }
+    key = hashlib.sha256(json.dumps(key_payload, sort_keys=True).encode("utf-8")).hexdigest()
+    return STEM_CACHE_DIR / f"{key}.wav"
+def clear_disk_cache() -> None:
+    if CACHE_DIR.exists():
+        shutil.rmtree(CACHE_DIR)
+def _load_or_extract_stem(audio_path: str | os.PathLike[str], params: PipelineParams) -> tuple[np.ndarray, int, str]:
+    if params.use_disk_cache:
+        cache_path = _stem_cache_path(audio_path, params)
+        if cache_path.exists():
+            audio, sr = sf.read(cache_path, dtype="float32", always_2d=False)
+            return np.asarray(audio, dtype=np.float32), int(sr), f"{params.stem} disk-cache hit"
+    audio, sr = extract_stem(
+        str(audio_path),
+        stem=params.stem,
+        device=params.device,
+        model_name=params.demucs_model,
+        shifts=int(params.demucs_shifts),
+        overlap=float(params.demucs_overlap),
+    )
+    detail = f"{params.stem} via {params.demucs_model}" if params.stem != "all" else "loaded full mix"
+    if params.use_disk_cache:
+        cache_path = _stem_cache_path(audio_path, params)
+        _write_audio(cache_path, audio, sr, subtype="PCM_16")
+        detail += " · cached"
+    return audio, sr, detail
 def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int = 1600) -> dict[str, Any]:
     if len(audio) == 0:
         return {"sample_rate": sr, "duration_sec": 0, "envelope": [], "onsets": []}
     _notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
     with _timed_stage(stages, "stem", progress_cb) as stage:
+        stem_audio, stem_sr, stem_detail = _load_or_extract_stem(audio_path, params)
         stem_audio = _normalise_audio(stem_audio)
+        stage.detail = stem_detail
         _write_audio(out / "stem.wav", stem_audio, stem_sr, subtype="PCM_16")
     audio_duration_sec = len(stem_audio) / stem_sr if stem_sr else 0.0
             stage.detail = ", ".join(f"{key}:{value}" for key, value in sorted(counts.items()))
         with _timed_stage(stages, "clustering", progress_cb) as stage:
+            if params.clustering_mode == "online_preview":
+                clusters = cluster_hits_online(
+                    hits,
+                    audio=stem_audio,
+                    sr=stem_sr,
+                    ncc_threshold=float(params.ncc_threshold),
+                    attack_ms=float(params.attack_ms),
+                    mel_threshold=float(params.mel_threshold),
+                    target_min=int(params.target_min),
+                    target_max=int(params.target_max),
+                )
+                stage.detail = f"{len(clusters)} clusters · online preview"
+            else:
+                clusters = cluster_hits(
+                    hits,
+                    audio=stem_audio,
+                    sr=stem_sr,
+                    ncc_threshold=float(params.ncc_threshold),
+                    attack_ms=float(params.attack_ms),
+                    mel_threshold=float(params.mel_threshold),
+                    target_min=int(params.target_min),
+                    target_max=int(params.target_max),
+                    linkage=params.linkage,
+                )
+                stage.detail = f"{len(clusters)} clusters · batch quality"
             for cluster in clusters:
                 for hit in cluster.hits:
                     hit.cluster_id = cluster.cluster_id
         with _timed_stage(stages, "selection", progress_cb) as stage:
             select_best(clusters)
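
The stem cache key introduced in this file can be exercised in isolation. The sketch below mirrors the payload built by `_stem_cache_path` but takes plain arguments instead of a `PipelineParams` instance; the function names, the `htdemucs` model name, and the fake file contents are placeholders chosen for the example, not values confirmed by the commit.

```python
import hashlib
import json
import tempfile
from pathlib import Path

CACHE_VERSION = "dse-cache-v2"


def sha256_file(path):
    """Hash a file in 1 MiB chunks so large audio files are not read at once."""
    h = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def stem_cache_key(audio_path, stem, demucs_model, demucs_shifts, demucs_overlap, device):
    """Standalone stand-in for the key derivation in _stem_cache_path."""
    payload = {
        "version": CACHE_VERSION,
        "source_sha256": sha256_file(audio_path),
        "stem": stem,
        "demucs_model": demucs_model,
        "demucs_shifts": demucs_shifts,
        "demucs_overlap": demucs_overlap,
        # For the full mix the device is replaced by a constant.
        "device": device if stem != "all" else "decode",
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "loop.wav"
    src.write_bytes(b"placeholder bytes, not real audio")
    base = dict(demucs_model="htdemucs", demucs_shifts=1, demucs_overlap=0.25)
    key_all_cpu = stem_cache_key(src, stem="all", device="cpu", **base)
    key_all_gpu = stem_cache_key(src, stem="all", device="cuda", **base)
    key_drums_cpu = stem_cache_key(src, stem="drums", device="cpu", **base)
    key_drums_gpu = stem_cache_key(src, stem="drums", device="cuda", **base)
    key_all_more_shifts = stem_cache_key(
        src, stem="all", device="cpu",
        demucs_model="htdemucs", demucs_shifts=2, demucs_overlap=0.25,
    )
```

Two properties follow from the payload as written. Full-mix loads share one cache entry across devices, while separated stems are cached per device. The Demucs settings still take part in the key when `stem` is `all`, so changing them produces a new entry even though the full-mix path does not run Demucs. Clustering parameters are not part of the key, so threshold changes reuse the cached stem.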

sample_extractor.py CHANGED

@@ -267,6 +267,120 @@ def _merge_singletons(clusters, sim_matrix, hits, merge_ratio=2.0):
     for i,c in enumerate(multi): c.cluster_id = i
     return multi
 def cluster_hits(hits, audio=None, sr=44100, ncc_threshold=0.80, attack_ms=25.0,
                  mel_threshold=0.75, target_min=0, target_max=0,
                  linkage='average', merge_singletons=True):

     for i,c in enumerate(multi): c.cluster_id = i
     return multi
+def _cosine(a, b):
+    """Fast cosine similarity for normalized or unnormalized one-dimensional vectors."""
+    n = min(len(a), len(b))
+    if n <= 0:
+        return 0.0
+    av = a[:n]
+    bv = b[:n]
+    denom = float(np.linalg.norm(av) * np.linalg.norm(bv))
+    if denom < 1e-8:
+        return 0.0
+    return float(np.dot(av, bv) / denom)
+def _retitle_clusters(clusters):
+    """Sort, re-index, and make labels stable after incremental assignment."""
+    clusters.sort(key=lambda c: c.count, reverse=True)
+    seen = defaultdict(int)
+    for i, c in enumerate(clusters):
+        c.cluster_id = i
+        majority = defaultdict(int)
+        for hit in c.hits:
+            majority[hit.label] += 1
+        base = max(majority, key=majority.get) if majority else c.label.rsplit('_', 1)[0]
+        suffix = seen[base]
+        seen[base] += 1
+        c.label = f"{base}_{suffix}"
+    return clusters
+def cluster_hits_online(hits, audio=None, sr=44100, ncc_threshold=0.72, attack_ms=25.0,
+                        mel_threshold=0.62, target_min=0, target_max=0,
+                        max_clusters=0):
+    """Prototype-based online clustering for near-realtime previews.
+    The batch algorithm builds an all-pairs matrix and then runs agglomerative
+    clustering. This mode instead processes hits in onset order and compares
+    each new hit only against current cluster prototypes. Complexity is roughly
+    O(number_of_hits × number_of_clusters), so it can update progressively while
+    audio is being analyzed. It is intentionally a preview/final-fast algorithm,
+    not a replacement for the highest-quality batch pass.
+    """
+    if not hits:
+        return []
+    if len(hits) == 1:
+        return [Cluster(cluster_id=0, label=f"{hits[0].label}_0", hits=[hits[0]])]
+    if audio is None:
+        audio = np.concatenate([h.audio for h in hits])
+    cap = int(max_clusters or target_max or 0)
+    if cap <= 0:
+        cap = max(1, min(len(hits), int(target_min or 16)))
+    cap = max(1, min(cap, len(hits)))
+    print(f"[Cluster:online] {len(hits)} hits, cap={cap}, attack={attack_ms}ms")
+    ordered = sorted(hits, key=lambda h: h.onset_time)
+    clusters = []
+    proto_fp = []
+    proto_tr = []
+    proto_energy = []
+    for hit in ordered:
+        fp = _mel_fingerprint(audio, sr, hit.onset_time)
+        tr = _extract_transient(audio, sr, hit.onset_time, attack_ms)
+        best_idx = -1
+        best_score = -1.0
+        best_mel = 0.0
+        best_ncc = 0.0
+        for i, cluster in enumerate(clusters):
+            # Prefer same broad class when possible, but do not make it mandatory.
+            label_bonus = 0.05 if cluster.label.startswith(hit.label + "_") else 0.0
+            mel = _cosine(fp, proto_fp[i])
+            if mel < mel_threshold:
+                continue
+            ncc = _transient_ncc(tr, proto_tr[i])
+            score = (0.45 * mel) + (0.55 * ncc) + label_bonus
+            if score > best_score:
+                best_idx, best_score, best_mel, best_ncc = i, score, mel, ncc
+        should_create = best_idx < 0 or (best_ncc < ncc_threshold and best_score < ncc_threshold)
+        if should_create and len(clusters) < cap:
+            cluster = Cluster(cluster_id=len(clusters), label=f"{hit.label}_{len(clusters)}", hits=[hit])
+            clusters.append(cluster)
+            proto_fp.append(fp)
+            proto_tr.append(tr)
+            proto_energy.append(max(hit.rms_energy, 1e-8))
+            continue
+        if best_idx < 0:
+            # Cap reached and no good match: assign to the nearest prototype by mel.
+            similarities = [_cosine(fp, existing) for existing in proto_fp]
+            best_idx = int(np.argmax(similarities))
+        cluster = clusters[best_idx]
+        cluster.hits.append(hit)
+        # Energy-weighted rolling prototype update; keeps loud clean hits dominant.
+        w_old = proto_energy[best_idx]
+        w_new = max(hit.rms_energy, 1e-8)
+        total = w_old + w_new
+        max_len = max(len(proto_fp[best_idx]), len(fp))
+        old_fp = np.pad(proto_fp[best_idx], (0, max_len - len(proto_fp[best_idx])))
+        new_fp = np.pad(fp, (0, max_len - len(fp)))
+        proto_fp[best_idx] = ((old_fp * w_old) + (new_fp * w_new)) / total
+        max_tr = max(len(proto_tr[best_idx]), len(tr))
+        old_tr = np.pad(proto_tr[best_idx], (0, max_tr - len(proto_tr[best_idx])))
+        new_tr = np.pad(tr, (0, max_tr - len(tr)))
+        proto_tr[best_idx] = ((old_tr * w_old) + (new_tr * w_new)) / total
+        proto_energy[best_idx] = total
+    clusters = _retitle_clusters(clusters)
+    for c in clusters:
+        print(f"    {c.label}: {c.count}")
+    return clusters
 def cluster_hits(hits, audio=None, sr=44100, ncc_threshold=0.80, attack_ms=25.0,
                  mel_threshold=0.75, target_min=0, target_max=0,
                  linkage='average', merge_singletons=True):
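
The control flow of `cluster_hits_online` is easier to follow on toy data. The sketch below is a deliberately reduced version and not the commit's implementation: it uses a single cosine similarity in place of the combined mel/NCC score, drops the label bonus and the `mel_threshold` gate, and works on plain Python lists. The function name `cluster_online`, the feature vectors, and the energies are all invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity for equal-length vectors, 0.0 for near-zero norms."""
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denom < 1e-8:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def cluster_online(items, threshold=0.9, cap=8):
    """Assign (vector, energy) items to clusters in arrival order.

    Each item is compared only against the current prototypes, so the cost
    grows with items * clusters rather than items * items.
    """
    prototypes = []   # energy-weighted running mean per cluster
    weights = []      # accumulated energy per cluster
    assignments = []
    for vec, energy in items:
        sims = [cosine(vec, proto) for proto in prototypes]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        if (best < 0 or sims[best] < threshold) and len(prototypes) < cap:
            # No good match and room left: start a new cluster.
            prototypes.append(list(vec))
            weights.append(max(energy, 1e-8))
            assignments.append(len(prototypes) - 1)
            continue
        # Good match, or cap reached: join the nearest cluster and update
        # its prototype so that louder hits pull it harder.
        w_old, w_new = weights[best], max(energy, 1e-8)
        total = w_old + w_new
        prototypes[best] = [
            (p * w_old + v * w_new) / total for p, v in zip(prototypes[best], vec)
        ]
        weights[best] = total
        assignments.append(best)
    return assignments, prototypes


items = [
    ([1.0, 0.0, 0.0], 1.0),  # kick-like
    ([0.0, 1.0, 0.0], 1.0),  # snare-like
    ([0.9, 0.1, 0.0], 3.0),  # louder kick variant
    ([0.0, 0.9, 0.1], 0.5),  # quieter snare variant
    ([0.0, 0.0, 1.0], 1.0),  # hat-like, arrives after the cap is reached
]
assignments, prototypes = cluster_online(items, threshold=0.9, cap=2)
```

With `cap=2` the hat-like item cannot open a third cluster and is folded into its nearest prototype. This is the same trade-off the online mode makes when `target_max` is small: the result depends on arrival order and on the cap, which is why the docstring describes the mode as a preview path.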

scripts/benchmark_subprocesses.py CHANGED

@@ -24,19 +24,20 @@ from sample_extractor import cache_clear
 from synth_generator import generate_test_song
-def run_case(pattern: str, bars: int, bpm: float, run_index: int) -> dict:
     tmp = Path(tempfile.mkdtemp(prefix="dse-bench-"))
     song = generate_test_song(pattern_name=pattern, bars=bars, bpm=bpm, add_bass=False, seed=42 + run_index)
     src = tmp / f"{pattern}-{bars}bars.wav"
     sf.write(src, song.drums_only, song.sr)
     cache_clear()
-    params = PipelineParams(stem="all", target_min=4, target_max=12, synthesize=True)
     result = run_extraction_pipeline(src, tmp / "out", params)
     return {
         "pattern": pattern,
         "bars": bars,
         "bpm": bpm,
         "run_index": run_index,
         "audio_duration_sec": result.audio_duration_sec,
         "total_duration_sec": result.duration_sec,
         "realtime_factor": result.realtime_factor,
@@ -52,15 +53,16 @@ def main() -> int:
     parser.add_argument("--bars", type=int, default=4)
     parser.add_argument("--bpm", type=float, default=120.0)
     parser.add_argument("--output", default="docs/benchmark-subprocesses.json")
     args = parser.parse_args()
     # Warm imports/JIT and discard the result.
-    run_case("rock", 1, args.bpm, -1)
     rows = []
     for run_index in range(args.runs):
         for pattern in ["rock", "funk", "halftime"]:
-            rows.append(run_case(pattern, args.bars, args.bpm, run_index))
     stage_keys = [stage["key"] for stage in rows[0]["stages"]]
     summary = []
@@ -74,7 +76,7 @@ def main() -> int:
             "max_sec": round(max(values), 6),
         })
-    payload = {"runs": rows, "summary": summary}
     out = Path(args.output)
     out.parent.mkdir(parents=True, exist_ok=True)
     out.write_text(json.dumps(payload, indent=2), encoding="utf-8")

 from synth_generator import generate_test_song
+def run_case(pattern: str, bars: int, bpm: float, run_index: int, clustering_mode: str) -> dict:
     tmp = Path(tempfile.mkdtemp(prefix="dse-bench-"))
     song = generate_test_song(pattern_name=pattern, bars=bars, bpm=bpm, add_bass=False, seed=42 + run_index)
     src = tmp / f"{pattern}-{bars}bars.wav"
     sf.write(src, song.drums_only, song.sr)
     cache_clear()
+    params = PipelineParams(stem="all", clustering_mode=clustering_mode, target_min=4, target_max=12, synthesize=True)
     result = run_extraction_pipeline(src, tmp / "out", params)
     return {
         "pattern": pattern,
         "bars": bars,
         "bpm": bpm,
         "run_index": run_index,
+        "clustering_mode": clustering_mode,
         "audio_duration_sec": result.audio_duration_sec,
         "total_duration_sec": result.duration_sec,
         "realtime_factor": result.realtime_factor,
     parser.add_argument("--bars", type=int, default=4)
     parser.add_argument("--bpm", type=float, default=120.0)
     parser.add_argument("--output", default="docs/benchmark-subprocesses.json")
+    parser.add_argument("--clustering-mode", choices=["batch_quality", "online_preview"], default="batch_quality")
     args = parser.parse_args()
     # Warm imports/JIT and discard the result.
+    run_case("rock", 1, args.bpm, -1, args.clustering_mode)
     rows = []
     for run_index in range(args.runs):
         for pattern in ["rock", "funk", "halftime"]:
+            rows.append(run_case(pattern, args.bars, args.bpm, run_index, args.clustering_mode))
     stage_keys = [stage["key"] for stage in rows[0]["stages"]]
     summary = []
             "max_sec": round(max(values), 6),
         })
+    payload = {"clustering_mode": args.clustering_mode, "runs": rows, "summary": summary}
     out = Path(args.output)
     out.parent.mkdir(parents=True, exist_ok=True)
     out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
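
Since the payload now records `clustering_mode`, two benchmark outputs can be compared stage by stage. The helper below is a hypothetical sketch, not part of the commit: it assumes both payloads have the `summary` layout written by this script, and the numbers in the example payloads are invented for illustration.

```python
def compare_summaries(batch_payload, online_payload, metric="median_sec"):
    """Return (stage, batch, online, batch/online ratio) for stages in both payloads."""
    batch = {row["stage"]: row[metric] for row in batch_payload["summary"]}
    online = {row["stage"]: row[metric] for row in online_payload["summary"]}
    rows = []
    for stage, batch_value in batch.items():
        online_value = online.get(stage)
        if online_value:  # skip stages that are missing or have zero duration
            rows.append((stage, batch_value, online_value, round(batch_value / online_value, 2)))
    return rows


# Invented values; real payloads would be loaded with json.loads(path.read_text()).
batch_payload = {
    "clustering_mode": "batch_quality",
    "summary": [
        {"stage": "onsets", "median_sec": 2.0},
        {"stage": "clustering", "median_sec": 0.10},
    ],
}
online_payload = {
    "clustering_mode": "online_preview",
    "summary": [
        {"stage": "onsets", "median_sec": 2.0},
        {"stage": "clustering", "median_sec": 0.02},
    ],
}
comparison = compare_summaries(batch_payload, online_payload)
```

Only the `clustering` stage is expected to differ between modes, because the mode switch affects nothing else in `run_extraction_pipeline`. Differences in other stages would mostly reflect run-to-run noise.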

web/app.js CHANGED

@@ -1,16 +1,20 @@
 const $ = (id) => document.getElementById(id);
 const fields = [
-  "stem", "demucs_model", "demucs_shifts", "demucs_overlap", "onset_mode", "onset_delta",
   "energy_threshold_db", "pre_pad", "min_dur", "max_dur", "min_gap", "ncc_threshold",
   "attack_ms", "mel_threshold", "linkage", "target_min", "target_max", "subdivision",
-  "synthesize", "quantize_midi"
 ];
 let config = null;
 let selectedFile = null;
 let activePoll = null;
 function fmtSec(value) {
   if (value === null || value === undefined || Number.isNaN(Number(value))) return "—";
   const n = Number(value);
@@ -19,6 +23,11 @@ function fmtSec(value) {
   return `${n.toFixed(2)} s`;
 }
 function setHealth(ok, text, subtext) {
   $("healthDot").className = `status-dot ${ok ? "ok" : "bad"}`;
   $("healthText").textContent = text;
@@ -47,6 +56,7 @@ function setSelectOptions(select, values, labels = null) {
 function populateConfig() {
   setSelectOptions($("demucs_model"), config.demucs_models);
   const defaults = config.defaults;
   for (const field of fields) {
     const el = $(field);
@@ -80,9 +90,9 @@ function collectParams() {
 function renderStages(stages = []) {
   $("stageList").innerHTML = stages.map((stage) => `
-    <div class="stage ${stage.status}" title="${stage.detail || ""}">
       <span class="badge"></span>
-      <div><strong>${stage.label}</strong><small>${stage.detail || stage.status}</small></div>
       <time>${fmtSec(stage.duration_sec)}</time>
     </div>
   `).join("");
@@ -138,26 +148,27 @@ function drawWaveform(overview) {
 function renderResult(job) {
   const result = job.result;
   if (!result) return;
-  const rtf = result.realtime_factor.toFixed(2);
-  $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime`;
   drawWaveform(result.overview);
   const fileUrls = result.file_urls ?? {};
   const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
-  $("downloads").innerHTML = Object.entries(fileUrls).map(([key, url]) => `<a href="${url}" download>${labels[key] ?? key}</a>`).join("");
   $("stemAudio").src = fileUrls.stem ?? "";
   $("reconAudio").src = fileUrls.reconstruction ?? "";
   const tbody = $("samplesTable").querySelector("tbody");
   tbody.innerHTML = (result.samples ?? []).map((sample) => `
     <tr>
-      <td>${sample.label}</td>
-      <td>${sample.classification}</td>
-      <td>${sample.hits}</td>
-      <td>${sample.score}</td>
-      <td>${sample.duration_ms} ms</td>
-      <td>${sample.first_onset_sec} s</td>
-      <td><a href="${sample.url}" download>WAV</a></td>
     </tr>
   `).join("");
 }
@@ -173,6 +184,38 @@ function renderJob(job) {
   }
 }
 async function pollJob(id) {
   if (activePoll) clearInterval(activePoll);
   const tick = async () => {
@@ -183,6 +226,7 @@ async function pollJob(id) {
         clearInterval(activePoll);
         activePoll = null;
         $("runButton").disabled = !selectedFile;
       }
     } catch (error) {
       clearInterval(activePoll);
@@ -207,6 +251,7 @@ async function runExtraction() {
     const job = await api("/api/jobs", { method: "POST", body: form });
     renderJob(job);
     await pollJob(job.id);
   } catch (error) {
     $("runButton").disabled = false;
     $("resultSummary").textContent = error.message;
@@ -229,6 +274,7 @@ async function boot() {
     await api("/api/health");
     config = await api("/api/config");
     populateConfig();
     setHealth(true, "Ready", "Backend online");
   } catch (error) {
     setHealth(false, "Offline", error.message);
@@ -244,10 +290,20 @@ $("useFastButton").addEventListener("click", () => {
   $("target_min").value = 4;
   $("target_max").value = 16;
 });
 $("clearCacheButton").addEventListener("click", async () => {
   try {
     await api("/api/cache/clear", { method: "POST" });
-    $("logs").textContent = "Pipeline cache cleared.";
   } catch (error) {
     $("logs").textContent = error.message;
   }

 const $ = (id) => document.getElementById(id);
 const fields = [
+  "stem", "demucs_model", "clustering_mode", "demucs_shifts", "demucs_overlap", "onset_mode", "onset_delta",
   "energy_threshold_db", "pre_pad", "min_dur", "max_dur", "min_gap", "ncc_threshold",
   "attack_ms", "mel_threshold", "linkage", "target_min", "target_max", "subdivision",
+  "synthesize", "quantize_midi", "use_disk_cache"
 ];
 let config = null;
 let selectedFile = null;
 let activePoll = null;
+function esc(value) {
+  return String(value ?? "").replace(/[&<>'"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" }[c]));
+}
 function fmtSec(value) {
   if (value === null || value === undefined || Number.isNaN(Number(value))) return "—";
   const n = Number(value);
   return `${n.toFixed(2)} s`;
 }
+function fmtDate(epochSeconds) {
+  if (!epochSeconds) return "—";
+  return new Date(epochSeconds * 1000).toLocaleString(undefined, { dateStyle: "medium", timeStyle: "short" });
+}
 function setHealth(ok, text, subtext) {
   $("healthDot").className = `status-dot ${ok ? "ok" : "bad"}`;
   $("healthText").textContent = text;
 function populateConfig() {
   setSelectOptions($("demucs_model"), config.demucs_models);
+  setSelectOptions($("clustering_mode"), Object.keys(config.clustering_modes ?? { batch_quality: "", online_preview: "" }), config.clustering_modes);
   const defaults = config.defaults;
   for (const field of fields) {
     const el = $(field);
 function renderStages(stages = []) {
   $("stageList").innerHTML = stages.map((stage) => `
+    <div class="stage ${esc(stage.status)}" title="${esc(stage.detail || "")}">
       <span class="badge"></span>
+      <div><strong>${esc(stage.label)}</strong><small>${esc(stage.detail || stage.status)}</small></div>
       <time>${fmtSec(stage.duration_sec)}</time>
     </div>
   `).join("");
 function renderResult(job) {
   const result = job.result;
   if (!result) return;
+  const rtf = Number(result.realtime_factor).toFixed(2);
+  const mode = result.params?.clustering_mode ?? "—";
+  $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime · ${mode}`;
   drawWaveform(result.overview);
   const fileUrls = result.file_urls ?? {};
   const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
+  $("downloads").innerHTML = Object.entries(fileUrls).map(([key, url]) => `<a href="${esc(url)}" download>${esc(labels[key] ?? key)}</a>`).join("");
   $("stemAudio").src = fileUrls.stem ?? "";
   $("reconAudio").src = fileUrls.reconstruction ?? "";
   const tbody = $("samplesTable").querySelector("tbody");
   tbody.innerHTML = (result.samples ?? []).map((sample) => `
     <tr>
+      <td>${esc(sample.label)}</td>
+      <td>${esc(sample.classification)}</td>
+      <td>${esc(sample.hits)}</td>
+      <td>${esc(sample.score)}</td>
+      <td>${esc(sample.duration_ms)} ms</td>
+      <td>${esc(sample.first_onset_sec)} s</td>
+      <td><a href="${esc(sample.url)}" download>WAV</a></td>
     </tr>
   `).join("");
 }
   }
 }
+function renderHistory(payload) {
+  const rows = [...(payload.active ?? []), ...(payload.history ?? [])];
+  if (!rows.length) {
+    $("historyList").innerHTML = `<p class="empty">No completed runs yet.</p>`;
+    return;
+  }
+  $("historyList").innerHTML = rows.map((row) => `
+    <button class="history-row" type="button" data-job-id="${esc(row.id)}">
+      <span><strong>${esc(row.filename || row.id)}</strong><small>${esc(row.stem || "—")} · ${esc(row.clustering_mode || "—")} · ${fmtDate(row.created_at)}</small></span>
+      <span>${esc(row.hit_count ?? "…")} hits</span>
+      <span>${esc(row.cluster_count ?? "…")} samples</span>
+      <span>${row.realtime_factor == null ? "—" : `${Number(row.realtime_factor).toFixed(2)}×`}</span>
+    </button>
+  `).join("");
+  for (const button of $("historyList").querySelectorAll(".history-row")) {
+    button.addEventListener("click", async () => {
+      const job = await api(`/api/jobs/${button.dataset.jobId}`);
+      renderJob(job);
+      window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" });
+    });
+  }
+}
+async function refreshHistory() {
+  try {
+    const payload = await api("/api/jobs?limit=50");
+    renderHistory(payload);
+  } catch (error) {
+    $("historyList").innerHTML = `<p class="empty">${esc(error.message)}</p>`;
+  }
+}
 async function pollJob(id) {
   if (activePoll) clearInterval(activePoll);
   const tick = async () => {
         clearInterval(activePoll);
         activePoll = null;
         $("runButton").disabled = !selectedFile;
+        await refreshHistory();
       }
     } catch (error) {
       clearInterval(activePoll);
     const job = await api("/api/jobs", { method: "POST", body: form });
     renderJob(job);
     await pollJob(job.id);
+    await refreshHistory();
   } catch (error) {
     $("runButton").disabled = false;
     $("resultSummary").textContent = error.message;
     await api("/api/health");
     config = await api("/api/config");
     populateConfig();
+    await refreshHistory();
     setHealth(true, "Ready", "Backend online");
   } catch (error) {
     setHealth(false, "Offline", error.message);
   $("target_min").value = 4;
   $("target_max").value = 16;
 });
+$("usePreviewButton").addEventListener("click", () => {
+  $("stem").value = "all";
+  $("clustering_mode").value = "online_preview";
+  $("demucs_shifts").value = 0;
+  $("target_min").value = 4;
+  $("target_max").value = 16;
+  $("mel_threshold").value = 0.62;
+  $("ncc_threshold").value = 0.72;
+});
+$("refreshHistoryButton").addEventListener("click", refreshHistory);
 $("clearCacheButton").addEventListener("click", async () => {
   try {
     await api("/api/cache/clear", { method: "POST" });
+    $("logs").textContent = "Pipeline memory and disk cache cleared.";
   } catch (error) {
     $("logs").textContent = error.message;
   }

web/index.html CHANGED

@@ -44,7 +44,7 @@
           <div class="panel-heading">
             <div>
               <h2>2. Extraction controls</h2>
-              <p>Defaults favor quick full-song extraction. Tighten thresholds after reviewing the timeline.</p>
             </div>
             <button id="clearCacheButton" class="ghost-button" type="button">Clear cache</button>
           </div>
@@ -56,6 +56,12 @@
             <label>Demucs model
               <select id="demucs_model"></select>
             </label>
             <label>Shifts
               <input id="demucs_shifts" type="number" min="0" max="8" step="1" />
             </label>
@@ -123,11 +129,13 @@
           <div class="toggles">
             <label><input id="synthesize" type="checkbox" /> synthesize alternates</label>
             <label><input id="quantize_midi" type="checkbox" /> quantize MIDI</label>
           </div>
           <div class="actions">
             <button id="runButton" class="primary-button" type="button" disabled>Extract samples</button>
             <button id="useFastButton" class="secondary-button" type="button">Use fast full-mix mode</button>
           </div>
         </section>
@@ -143,6 +151,17 @@
           <pre id="logs" class="logs" aria-live="polite"></pre>
         </section>
         <section class="panel result-panel">
           <div class="panel-heading">
             <div>

           <div class="panel-heading">
             <div>
               <h2>2. Extraction controls</h2>
+              <p>Batch quality gives the best final grouping. Online preview is the near-realtime clustering path.</p>
             </div>
             <button id="clearCacheButton" class="ghost-button" type="button">Clear cache</button>
           </div>
             <label>Demucs model
               <select id="demucs_model"></select>
             </label>
+            <label>Clustering mode
+              <select id="clustering_mode">
+                <option value="batch_quality">batch quality</option>
+                <option value="online_preview">online preview</option>
+              </select>
+            </label>
             <label>Shifts
               <input id="demucs_shifts" type="number" min="0" max="8" step="1" />
             </label>
           <div class="toggles">
             <label><input id="synthesize" type="checkbox" /> synthesize alternates</label>
             <label><input id="quantize_midi" type="checkbox" /> quantize MIDI</label>
+            <label><input id="use_disk_cache" type="checkbox" /> disk cache stems/source loads</label>
           </div>
           <div class="actions">
             <button id="runButton" class="primary-button" type="button" disabled>Extract samples</button>
             <button id="useFastButton" class="secondary-button" type="button">Use fast full-mix mode</button>
+            <button id="usePreviewButton" class="secondary-button" type="button">Use online preview mode</button>
           </div>
         </section>
           <pre id="logs" class="logs" aria-live="polite"></pre>
         </section>
+        <section class="panel history-panel">
+          <div class="panel-heading">
+            <div>
+              <h2>Run history</h2>
+              <p>Completed manifests under <code>.runs/</code> are indexed automatically. Load a run to compare timings and artifacts.</p>
+            </div>
+            <button id="refreshHistoryButton" class="ghost-button" type="button">Refresh</button>
+          </div>
+          <div id="historyList" class="history-list"></div>
+        </section>
         <section class="panel result-panel">
           <div class="panel-heading">
             <div>

web/styles.css CHANGED

@@ -78,3 +78,11 @@ td { color: #e5eaf7; }
 tr:last-child td { border-bottom: 0; }
 @media (max-width: 1100px) { .workspace, .hero { grid-template-columns: 1fr; } .control-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); } }
 @media (max-width: 680px) { .shell { width: min(100% - 20px, 1520px); padding-top: 16px; } .panel { padding: 16px; border-radius: 22px; } .control-grid, .audio-grid { grid-template-columns: 1fr; } h1 { letter-spacing: -.045em; } }

 tr:last-child td { border-bottom: 0; }
 @media (max-width: 1100px) { .workspace, .hero { grid-template-columns: 1fr; } .control-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); } }
 @media (max-width: 680px) { .shell { width: min(100% - 20px, 1520px); padding-top: 16px; } .panel { padding: 16px; border-radius: 22px; } .control-grid, .audio-grid { grid-template-columns: 1fr; } h1 { letter-spacing: -.045em; } }
+.history-panel { align-self: stretch; }
+.history-list { display: grid; gap: 8px; max-height: 360px; overflow: auto; }
+.history-row { width: 100%; display: grid; grid-template-columns: minmax(0, 1fr) auto auto auto; gap: 12px; align-items: center; text-align: left; border: 1px solid var(--line); background: rgba(0,0,0,.16); border-radius: 16px; padding: 12px; }
+.history-row strong { display: block; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; color: var(--text); }
+.history-row small { display: block; color: var(--muted); margin-top: 3px; }
+.history-row span:not(:first-child) { color: #dbe5f7; font-size: 12px; font-variant-numeric: tabular-nums; }
+.empty { color: var(--muted); margin: 0; }
+@media (max-width: 680px) { .history-row { grid-template-columns: 1fr 1fr; } }