ChatGPT committed on
Commit
b8fa9bf
·
1 Parent(s): eb1a122

feat: add run history and online clustering

.gitignore CHANGED
@@ -20,3 +20,5 @@ build/
20
  *.mid
21
  *.zip
22
  !drum-sample-extractor-updated.zip
23
+
24
+ .cache/
README.md CHANGED
@@ -10,17 +10,41 @@ pinned: false
10
 
11
  # Drum Sample Extractor
12
 
13
- A custom FastAPI + browser UI for extracting reusable drum samples from an audio file.
14
 
15
- The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, and a complete ZIP sample pack.
16
 
17
  ## Current status
18
 
19
- - Gradio has been replaced by a custom web frontend in `web/` served by `app.py`.
20
- - The extraction pipeline is exposed through a JSON/multipart API and factored into `pipeline_runner.py`.
21
- - Per-stage timing is captured for every extraction run and written into `manifest.json`.
22
- - Benchmarking support is available in `scripts/benchmark_subprocesses.py`.
23
- - Legacy Gradio apps are preserved in `legacy/` for reference only.
24
 
25
  ## Run locally
26
 
@@ -33,7 +57,12 @@ uvicorn app:app --host 0.0.0.0 --port 7860
33
 
34
  Open `http://127.0.0.1:7860`.
35
 
36
- For fast iteration, set `Stem` to `all`. That bypasses Demucs and runs onset detection, classification, clustering, representative selection, synthesis, MIDI rendering, and packaging directly on the uploaded audio.
37
 
38
  ## Run benchmarks
39
 
@@ -43,13 +72,13 @@ python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchm
43
 
44
  The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
45
 
46
- ## API
47
 
48
  ```bash
49
  curl http://127.0.0.1:7860/api/config
50
 
51
  curl -F 'file=@song.wav' \
52
- -F 'params={"stem":"all","target_min":4,"target_max":12}' \
53
  http://127.0.0.1:7860/api/jobs
54
  ```
55
 
@@ -59,16 +88,22 @@ Then poll the returned job id:
59
  curl http://127.0.0.1:7860/api/jobs/<job-id>
60
  ```
61
 
 
 
 
 
 
 
62
  ## Important files
63
 
64
  | Path | Purpose |
65
  |---|---|
66
- | `app.py` | FastAPI app, static UI serving, job API, artifact downloads |
67
- | `pipeline_runner.py` | Timed extraction pipeline used by API and benchmarks |
68
  | `sample_extractor.py` | Core DSP/sample extraction implementation |
69
  | `web/` | Custom no-build browser frontend |
70
  | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
71
- | `docs/` | Review, timing, API, and UI documentation |
72
  | `legacy/` | Previous Gradio apps retained for reference |
73
 
74
  ## Output per run
@@ -82,4 +117,7 @@ Each run is stored under `.runs/<job-id>/output/`:
82
  - `samples/*.wav`
83
  - `manifest.json`
84
 
85
- `.runs/` is ignored by git.
10
 
11
  # Drum Sample Extractor
12
 
13
+ A custom FastAPI + browser workstation for extracting reusable drum samples from an audio file.
14
 
15
+ The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, manifests, and a complete ZIP sample pack.
16
 
17
  ## Current status
18
 
19
+ The project is usable as a local/Hugging Face Space application. Gradio is no longer the active UI; the active app is a custom FastAPI backend plus a no-build browser frontend.
20
+
21
+ Implemented in the current development pass:
22
+
23
+ - Custom web frontend in `web/`, served by `app.py`.
24
+ - FastAPI job API with upload, polling, safe artifact downloads, config, health, cache clearing, and run-history listing.
25
+ - Timed pipeline runner in `pipeline_runner.py`.
26
+ - Per-stage timing in every `manifest.json`.
27
+ - Two clustering modes:
28
+ - `batch_quality`: all-pairs mel/NCC similarity plus agglomerative clustering.
29
+ - `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
30
+ - Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
31
+ - Run history panel indexing `.runs/*/output/manifest.json`.
32
+ - Documentation for features, progress, tasks, API, timing, realtime suitability, UI, and remaining work.
33
+ - Legacy Gradio apps preserved in `legacy/` for reference only.
34
+
35
+ Not fully complete yet:
36
+
37
+ - No interactive waveform editing of onsets/clusters.
38
+ - No server-sent event stream or websocket progress channel.
39
+ - No frontend TypeScript build/test harness.
40
+ - Demucs remains offline/batch by design.
41
+
42
+ See:
43
+
44
+ - `docs/FEATURES.md`
45
+ - `docs/TASKS.md`
46
+ - `docs/PROGRESS.md`
47
+ - `docs/REMAINING_WORK.md`
48
 
49
  ## Run locally
50
 
 
57
 
58
  Open `http://127.0.0.1:7860`.
59
 
60
+ For fast iteration, set:
61
+
62
+ - `Stem = all`
63
+ - `Clustering mode = online_preview`
64
+
65
+ That bypasses Demucs and uses the near-realtime clustering path.
66
 
67
  ## Run benchmarks
68
 
 
72
 
73
  The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
74
 
75
+ ## API example
76
 
77
  ```bash
78
  curl http://127.0.0.1:7860/api/config
79
 
80
  curl -F 'file=@song.wav' \
81
+ -F 'params={"stem":"all","clustering_mode":"online_preview","target_min":4,"target_max":12}' \
82
  http://127.0.0.1:7860/api/jobs
83
  ```
84
 
 
88
  curl http://127.0.0.1:7860/api/jobs/<job-id>
89
  ```
90
 
91
+ List active/completed runs:
92
+
93
+ ```bash
94
+ curl http://127.0.0.1:7860/api/jobs
95
+ ```
96
+
97
  ## Important files
98
 
99
  | Path | Purpose |
100
  |---|---|
101
+ | `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads |
102
+ | `pipeline_runner.py` | Timed extraction pipeline, disk stem/source cache, batch/online clustering routing |
103
  | `sample_extractor.py` | Core DSP/sample extraction implementation |
104
  | `web/` | Custom no-build browser frontend |
105
  | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
106
+ | `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
107
  | `legacy/` | Previous Gradio apps retained for reference |
108
 
109
  ## Output per run
 
117
  - `samples/*.wav`
118
  - `manifest.json`
119
 
120
+ Generated runtime directories are ignored by git:
121
+
122
+ - `.runs/`
123
+ - `.cache/`
app.py CHANGED
@@ -9,6 +9,7 @@ from __future__ import annotations
9
 
10
  import json
11
  import shutil
 
12
  import traceback
13
  import uuid
14
  from concurrent.futures import ThreadPoolExecutor
@@ -22,7 +23,7 @@ from fastapi.middleware.cors import CORSMiddleware
22
  from fastapi.responses import FileResponse, JSONResponse
23
  from fastapi.staticfiles import StaticFiles
24
 
25
- from pipeline_runner import PipelineParams, initial_stages, run_extraction_pipeline
26
  from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
27
 
28
  ROOT = Path(__file__).resolve().parent
@@ -30,7 +31,7 @@ WEB_DIR = ROOT / "web"
30
  RUNS_DIR = ROOT / ".runs"
31
  RUNS_DIR.mkdir(exist_ok=True)
32
 
33
- app = FastAPI(title="Drum Sample Extractor", version="10.0.0")
34
  app.add_middleware(
35
  CORSMiddleware,
36
  allow_origins=["*"],
@@ -61,6 +62,63 @@ def _serialise_job(job: dict[str, Any]) -> dict[str, Any]:
61
  return payload
62
 
63
 
64
  def _update_job(job_id: str, **patch: Any) -> None:
65
  with jobs_lock:
66
  jobs[job_id].update(patch)
@@ -87,7 +145,8 @@ def _run_job(job_id: str) -> None:
87
  if stage.get("status") == "running":
88
  _append_log(job_id, f"Started: {stage['label']}")
89
  elif stage.get("status") == "done":
90
- _append_log(job_id, f"Finished: {stage['label']} in {stage['duration_sec']:.3f}s")
 
91
 
92
  try:
93
  result = run_extraction_pipeline(input_path, output_dir, PipelineParams.from_mapping(params), progress_cb=progress)
@@ -109,13 +168,27 @@ def config() -> dict[str, Any]:
109
  "demucs_stems": {key: value + ["all"] for key, value in DEMUCS_STEMS.items()},
110
  "defaults": asdict(PipelineParams()),
111
  "stages": initial_stages(),
112
  }
113
 
114
 
115
  @app.post("/api/cache/clear")
116
  def clear_cache() -> dict[str, str]:
117
  cache_clear()
118
- return {"status": "cleared"}
119
 
120
 
121
  @app.post("/api/jobs")
@@ -142,6 +215,7 @@ async def create_job(file: UploadFile = File(...), params: str = Form("{}")) ->
142
  "id": job_id,
143
  "status": "pending",
144
  "filename": file.filename,
 
145
  "params": asdict(validated),
146
  "stages": initial_stages(),
147
  "logs": [],
@@ -161,26 +235,12 @@ async def create_job(file: UploadFile = File(...), params: str = Form("{}")) ->
161
  def get_job(job_id: str) -> dict[str, Any]:
162
  with jobs_lock:
163
  job = jobs.get(job_id)
164
- if not job:
165
- manifest = RUNS_DIR / job_id / "output" / "manifest.json"
166
- if manifest.exists():
167
- result = json.loads(manifest.read_text(encoding="utf-8"))
168
- return _serialise_job(
169
- {
170
- "id": job_id,
171
- "status": "complete",
172
- "filename": None,
173
- "params": result.get("params", {}),
174
- "stages": result.get("stages", []),
175
- "logs": [],
176
- "result": result,
177
- "error": None,
178
- "traceback": None,
179
- "output_dir": str(manifest.parent),
180
- }
181
- )
182
- raise HTTPException(status_code=404, detail="Job not found")
183
- return _serialise_job(dict(job))
184
 
185
 
186
  @app.get("/api/jobs/{job_id}/files/{relative_path:path}")
 
9
 
10
  import json
11
  import shutil
12
+ import time
13
  import traceback
14
  import uuid
15
  from concurrent.futures import ThreadPoolExecutor
 
23
  from fastapi.responses import FileResponse, JSONResponse
24
  from fastapi.staticfiles import StaticFiles
25
 
26
+ from pipeline_runner import PipelineParams, clear_disk_cache, initial_stages, run_extraction_pipeline
27
  from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
28
 
29
  ROOT = Path(__file__).resolve().parent
 
31
  RUNS_DIR = ROOT / ".runs"
32
  RUNS_DIR.mkdir(exist_ok=True)
33
 
34
+ app = FastAPI(title="Drum Sample Extractor", version="11.0.0")
35
  app.add_middleware(
36
  CORSMiddleware,
37
  allow_origins=["*"],
 
62
  return payload
63
 
64
 
65
+ def _manifest_path(job_id: str) -> Path:
66
+ return RUNS_DIR / job_id / "output" / "manifest.json"
67
+
68
+
69
+ def _read_manifest_job(job_id: str) -> dict[str, Any] | None:
70
+ manifest = _manifest_path(job_id)
71
+ if not manifest.exists():
72
+ return None
73
+ result = json.loads(manifest.read_text(encoding="utf-8"))
74
+ return {
75
+ "id": job_id,
76
+ "status": "complete",
77
+ "filename": result.get("source", {}).get("filename"),
78
+ "params": result.get("params", {}),
79
+ "stages": result.get("stages", []),
80
+ "logs": [],
81
+ "result": result,
82
+ "error": None,
83
+ "traceback": None,
84
+ "output_dir": str(manifest.parent),
85
+ }
86
+
87
+
88
+ def _summarise_job(job: dict[str, Any]) -> dict[str, Any]:
89
+ result = job.get("result") or {}
90
+ return {
91
+ "id": job["id"],
92
+ "status": job.get("status"),
93
+ "filename": job.get("filename"),
94
+ "created_at": job.get("created_at"),
95
+ "duration_sec": result.get("duration_sec"),
96
+ "audio_duration_sec": result.get("audio_duration_sec"),
97
+ "realtime_factor": result.get("realtime_factor"),
98
+ "bpm": result.get("bpm"),
99
+ "hit_count": result.get("hit_count"),
100
+ "cluster_count": result.get("cluster_count"),
101
+ "clustering_mode": (result.get("params") or job.get("params") or {}).get("clustering_mode"),
102
+ "stem": (result.get("params") or job.get("params") or {}).get("stem"),
103
+ "error": job.get("error"),
104
+ }
105
+
106
+
107
+ def _list_manifest_jobs(limit: int = 50) -> list[dict[str, Any]]:
108
+ rows: list[dict[str, Any]] = []
109
+ for manifest in sorted(RUNS_DIR.glob("*/output/manifest.json"), key=lambda p: p.stat().st_mtime, reverse=True):
110
+ job_id = manifest.parents[1].name
111
+ manifest_job = _read_manifest_job(job_id)
112
+ if not manifest_job:
113
+ continue
114
+ summary = _summarise_job(manifest_job)
115
+ summary["created_at"] = manifest.stat().st_mtime
116
+ rows.append(summary)
117
+ if len(rows) >= limit:
118
+ break
119
+ return rows
120
+
121
+
122
  def _update_job(job_id: str, **patch: Any) -> None:
123
  with jobs_lock:
124
  jobs[job_id].update(patch)
 
145
  if stage.get("status") == "running":
146
  _append_log(job_id, f"Started: {stage['label']}")
147
  elif stage.get("status") == "done":
148
+ detail = f" · {stage['detail']}" if stage.get("detail") else ""
149
+ _append_log(job_id, f"Finished: {stage['label']} in {stage['duration_sec']:.3f}s{detail}")
150
 
151
  try:
152
  result = run_extraction_pipeline(input_path, output_dir, PipelineParams.from_mapping(params), progress_cb=progress)
 
168
  "demucs_stems": {key: value + ["all"] for key, value in DEMUCS_STEMS.items()},
169
  "defaults": asdict(PipelineParams()),
170
  "stages": initial_stages(),
171
+ "clustering_modes": {
172
+ "batch_quality": "Batch quality: all-pairs mel/NCC + agglomerative clustering",
173
+ "online_preview": "Online preview: prototype assignment for near-realtime feedback",
174
+ },
175
  }
176
 
177
 
178
  @app.post("/api/cache/clear")
179
  def clear_cache() -> dict[str, str]:
180
  cache_clear()
181
+ clear_disk_cache()
182
+ return {"status": "cleared", "scope": "memory+disk"}
183
+
184
+
185
+ @app.get("/api/jobs")
186
+ def list_jobs(limit: int = 50) -> dict[str, Any]:
187
+ limit = max(1, min(int(limit), 200))
188
+ with jobs_lock:
189
+ active = [_summarise_job(dict(job)) for job in jobs.values() if job.get("status") != "complete"]
190
+ history = _list_manifest_jobs(limit=limit)
191
+ return {"active": active, "history": history}
192
 
193
 
194
  @app.post("/api/jobs")
 
215
  "id": job_id,
216
  "status": "pending",
217
  "filename": file.filename,
218
+ "created_at": time.time(),
219
  "params": asdict(validated),
220
  "stages": initial_stages(),
221
  "logs": [],
 
235
  def get_job(job_id: str) -> dict[str, Any]:
236
  with jobs_lock:
237
  job = jobs.get(job_id)
238
+ if job:
239
+ return _serialise_job(dict(job))
240
+ manifest_job = _read_manifest_job(job_id)
241
+ if manifest_job:
242
+ return _serialise_job(manifest_job)
243
+ raise HTTPException(status_code=404, detail="Job not found")
244
 
245
 
246
  @app.get("/api/jobs/{job_id}/files/{relative_path:path}")
docs/API.md CHANGED
@@ -1,5 +1,7 @@
1
  # API documentation
2
 
 
 
3
  The active app is `app.py`, a FastAPI application.
4
 
5
  ## Start server
@@ -18,12 +20,57 @@ Returns backend health.
18
 
19
  ## `GET /api/config`
20
 
21
- Returns supported models, stems, default pipeline params, and stage definitions.
22
 
23
  ```bash
24
  curl http://127.0.0.1:7860/api/config
25
  ```
26
 
27
  ## `POST /api/jobs`
28
 
29
  Creates an extraction job.
@@ -34,14 +81,14 @@ Fields:
34
 
35
  | Field | Type | Required | Description |
36
  |---|---|---:|---|
37
- | `file` | file | yes | Audio source |
38
- | `params` | JSON string | no | Partial or full pipeline params |
39
 
40
  Example:
41
 
42
  ```bash
43
  curl -F 'file=@song.wav' \
44
- -F 'params={"stem":"all","target_min":4,"target_max":12,"synthesize":true}' \
45
  http://127.0.0.1:7860/api/jobs
46
  ```
47
 
@@ -52,7 +99,7 @@ Response status: `202 Accepted`
52
  "id": "58ca0db4ac74",
53
  "status": "pending",
54
  "filename": "song.wav",
55
- "params": {"stem": "all"},
56
  "stages": [],
57
  "logs": [],
58
  "result": null,
@@ -62,32 +109,32 @@ Response status: `202 Accepted`
62
 
63
  ## `GET /api/jobs/{job_id}`
64
 
65
- Poll job status and retrieve results.
66
 
67
  Statuses:
68
 
69
  | Status | Meaning |
70
  |---|---|
71
- | `pending` | Job is queued |
72
- | `running` | Job is executing |
73
- | `complete` | Result and artifacts are ready |
74
- | `error` | Pipeline failed; `error` and `traceback` are populated |
75
 
76
  Completed jobs contain:
77
 
78
  | Key | Meaning |
79
  |---|---|
80
- | `duration_sec` | Total wall time |
81
- | `audio_duration_sec` | Duration of processed stem/source |
82
- | `realtime_factor` | `duration_sec / audio_duration_sec` |
83
- | `bpm` | Detected tempo |
84
- | `hit_count` | Number of accepted onsets/hits |
85
- | `cluster_count` | Number of sample clusters |
86
- | `stages` | Per-stage timing/status/detail list |
87
- | `samples` | Sample rows with score, duration, first onset, and download URL |
88
- | `overview` | Decimated envelope and onset markers for waveform display |
89
- | `files` | Relative artifact paths |
90
- | `file_urls` | Direct API URLs for artifacts |
91
 
92
  ## `GET /api/jobs/{job_id}/files/{relative_path}`
93
 
@@ -105,36 +152,44 @@ The endpoint prevents path traversal by resolving downloads under `.runs/<job-id
105
 
106
  ## `POST /api/cache/clear`
107
 
108
- Clears the in-memory extraction cache.
109
 
110
  ```bash
111
  curl -X POST http://127.0.0.1:7860/api/cache/clear
112
  ```
113
 
114
  ## Pipeline parameters
115
 
116
  Defined in `pipeline_runner.PipelineParams`.
117
 
118
  | Parameter | Default | Meaning |
119
  |---|---:|---|
120
- | `stem` | `drums` | Demucs source to extract, or `all` to bypass Demucs |
121
- | `demucs_model` | `htdemucs_ft` | Demucs model |
122
- | `demucs_shifts` | `1` | Test-time shifts for Demucs quality/speed tradeoff |
123
- | `demucs_overlap` | `0.25` | Demucs chunk overlap |
124
- | `onset_mode` | `auto` | `auto`, `percussive`, `harmonic`, or `broadband` |
125
- | `onset_delta` | `0.12` | Peak-pick threshold |
126
- | `energy_threshold_db` | `-35` | RMS gate for accepting hits |
127
- | `pre_pad` | `0.003` | Seconds of audio before onset |
128
- | `min_dur` | `0.02` | Minimum hit duration |
129
- | `max_dur` | `1.5` | Maximum hit duration |
130
- | `min_gap` | `0.03` | Minimum time between onsets |
131
- | `ncc_threshold` | `0.80` | Similarity threshold when not targeting cluster count |
132
- | `attack_ms` | `25` | Transient window used for NCC |
133
- | `mel_threshold` | `0.75` | Candidate prefilter threshold |
134
- | `linkage` | `average` | Agglomerative linkage |
135
- | `target_min` | `5` | Lower cluster target; `0` disables target mode |
136
- | `target_max` | `20` | Upper cluster target; `0` disables target mode |
137
- | `synthesize` | `true` | Write synthesized alternates for clusters with multiple hits |
138
- | `quantize_midi` | `true` | Snap MIDI notes to grid |
139
- | `subdivision` | `16` | MIDI grid subdivision |
140
- | `device` | `cpu` | Torch device for Demucs |
1
  # API documentation
2
 
3
+ Last updated: 2026-05-12
4
+
5
  The active app is `app.py`, a FastAPI application.
6
 
7
  ## Start server
 
20
 
21
  ## `GET /api/config`
22
 
23
+ Returns supported models, stems, default pipeline params, stage definitions, and clustering mode labels.
24
 
25
  ```bash
26
  curl http://127.0.0.1:7860/api/config
27
  ```
28
 
29
+ Important response keys:
30
+
31
+ | Key | Meaning |
32
+ |---|---|
33
+ | `demucs_models` | Supported Demucs model names. |
34
+ | `demucs_stems` | Valid stems per model, plus `all` for bypassing Demucs. |
35
+ | `defaults` | Default `PipelineParams`. |
36
+ | `stages` | Pipeline stage definitions. |
37
+ | `clustering_modes` | Human-readable labels for batch and online clustering modes. |
38
+
39
+ ## `GET /api/jobs`
40
+
41
+ Lists active in-memory jobs and completed run manifests found under `.runs/`.
42
+
43
+ ```bash
44
+ curl 'http://127.0.0.1:7860/api/jobs?limit=50'
45
+ ```
46
+
47
+ Response:
48
+
49
+ ```json
50
+ {
51
+ "active": [],
52
+ "history": [
53
+ {
54
+ "id": "58ca0db4ac74",
55
+ "status": "complete",
56
+ "filename": "song.wav",
57
+ "created_at": 1778540000.0,
58
+ "duration_sec": 2.4,
59
+ "audio_duration_sec": 8.0,
60
+ "realtime_factor": 0.3,
61
+ "bpm": 120.0,
62
+ "hit_count": 32,
63
+ "cluster_count": 8,
64
+ "clustering_mode": "online_preview",
65
+ "stem": "all",
66
+ "error": null
67
+ }
68
+ ]
69
+ }
70
+ ```
71
+
72
+ For `history` entries, `created_at` is the manifest file modification time as a Unix timestamp; for `active` entries it is the job creation time.
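
As an illustration of consuming the `GET /api/jobs` response, the sketch below turns one `history` row into a readable summary line. The payload is a trimmed copy of the example response above; `describe_run` is a hypothetical client-side helper, not part of `app.py`.

```python
from datetime import datetime, timezone

# Trimmed copy of the example response documented above.
response = {
    "active": [],
    "history": [
        {
            "id": "58ca0db4ac74",
            "status": "complete",
            "filename": "song.wav",
            "created_at": 1778540000.0,
            "realtime_factor": 0.3,
            "hit_count": 32,
            "cluster_count": 8,
            "clustering_mode": "online_preview",
        }
    ],
}


def describe_run(row: dict) -> str:
    """Format one history row (hypothetical helper, for illustration only)."""
    # created_at is a Unix timestamp, so convert it explicitly to UTC.
    created = datetime.fromtimestamp(row["created_at"], tz=timezone.utc)
    return (
        f"{row['id']} {row['filename']} "
        f"{created:%Y-%m-%d %H:%M} UTC "
        f"hits={row['hit_count']} clusters={row['cluster_count']} "
        f"mode={row['clustering_mode']}"
    )


for row in response["history"]:
    print(describe_run(row))
```

Converting with an explicit `timezone.utc` avoids the summary depending on the local timezone of whichever machine lists the runs.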
73
+
74
  ## `POST /api/jobs`
75
 
76
  Creates an extraction job.
 
81
 
82
  | Field | Type | Required | Description |
83
  |---|---|---:|---|
84
+ | `file` | file | yes | Audio source. |
85
+ | `params` | JSON string | no | Partial or full pipeline params. |
86
 
87
  Example:
88
 
89
  ```bash
90
  curl -F 'file=@song.wav' \
91
+ -F 'params={"stem":"all","clustering_mode":"online_preview","target_min":4,"target_max":12,"synthesize":true}' \
92
  http://127.0.0.1:7860/api/jobs
93
  ```
94
 
 
99
  "id": "58ca0db4ac74",
100
  "status": "pending",
101
  "filename": "song.wav",
102
+ "params": {"stem": "all", "clustering_mode": "online_preview"},
103
  "stages": [],
104
  "logs": [],
105
  "result": null,
 
109
 
110
  ## `GET /api/jobs/{job_id}`
111
 
112
+ Poll job status and retrieve results. This works for active in-memory jobs and completed historical jobs whose manifest is still present in `.runs/`.
113
 
114
  Statuses:
115
 
116
  | Status | Meaning |
117
  |---|---|
118
+ | `pending` | Job is queued. |
119
+ | `running` | Job is executing. |
120
+ | `complete` | Result and artifacts are ready. |
121
+ | `error` | Pipeline failed; `error` and `traceback` are populated. |
122
 
123
  Completed jobs contain:
124
 
125
  | Key | Meaning |
126
  |---|---|
127
+ | `duration_sec` | Total wall time. |
128
+ | `audio_duration_sec` | Duration of processed stem/source. |
129
+ | `realtime_factor` | `duration_sec / audio_duration_sec`. |
130
+ | `bpm` | Detected tempo. |
131
+ | `hit_count` | Number of accepted onsets/hits. |
132
+ | `cluster_count` | Number of sample clusters. |
133
+ | `stages` | Per-stage timing/status/detail list. |
134
+ | `samples` | Sample rows with score, duration, first onset, and download URL. |
135
+ | `overview` | Decimated envelope and onset markers for waveform display. |
136
+ | `files` | Relative artifact paths. |
137
+ | `file_urls` | Direct API URLs for artifacts. |
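
The `realtime_factor` definition in the table above can be checked against the documented example values (2.4 s of processing for 8.0 s of audio). This is a minimal sketch; the zero-duration guard is an assumption added for illustration and is not confirmed by the source.

```python
from typing import Optional


def realtime_factor(duration_sec: float, audio_duration_sec: float) -> Optional[float]:
    """Return duration_sec / audio_duration_sec, or None for empty audio."""
    # Guard against division by zero (assumed behaviour, not from the source).
    if audio_duration_sec <= 0:
        return None
    return duration_sec / audio_duration_sec


factor = realtime_factor(2.4, 8.0)
# Values below 1.0 mean the run finished faster than the audio plays back.
print(f"{factor:.2f}x realtime")
```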
138
 
139
  ## `GET /api/jobs/{job_id}/files/{relative_path}`
140
 
 
152
 
153
  ## `POST /api/cache/clear`
154
 
155
+ Clears the in-memory DSP cache and disk stem/source cache.
156
 
157
  ```bash
158
  curl -X POST http://127.0.0.1:7860/api/cache/clear
159
  ```
160
 
161
+ Response:
162
+
163
+ ```json
164
+ {"status":"cleared","scope":"memory+disk"}
165
+ ```
166
+
167
  ## Pipeline parameters
168
 
169
  Defined in `pipeline_runner.PipelineParams`.
170
 
171
  | Parameter | Default | Meaning |
172
  |---|---:|---|
173
+ | `stem` | `drums` | Demucs source to extract, or `all` to bypass Demucs. |
174
+ | `demucs_model` | `htdemucs_ft` | Demucs model. |
175
+ | `demucs_shifts` | `1` | Test-time shifts for Demucs quality/speed tradeoff. |
176
+ | `demucs_overlap` | `0.25` | Demucs chunk overlap. |
177
+ | `onset_mode` | `auto` | `auto`, `percussive`, `harmonic`, or `broadband`. |
178
+ | `onset_delta` | `0.12` | Peak-pick threshold. |
179
+ | `energy_threshold_db` | `-35` | RMS gate for accepting hits. |
180
+ | `pre_pad` | `0.003` | Seconds of audio before onset. |
181
+ | `min_dur` | `0.02` | Minimum hit duration. |
182
+ | `max_dur` | `1.5` | Maximum hit duration. |
183
+ | `min_gap` | `0.03` | Minimum time between onsets. |
184
+ | `ncc_threshold` | `0.80` | Similarity threshold. Also used by online clustering assignment. |
185
+ | `attack_ms` | `25` | Transient window used for NCC/prototypes. |
186
+ | `mel_threshold` | `0.75` | Candidate prefilter threshold. For online mode, lower values such as `0.62` are useful. |
187
+ | `linkage` | `average` | Agglomerative linkage for `batch_quality`. |
188
+ | `clustering_mode` | `batch_quality` | `batch_quality` or `online_preview`. |
189
+ | `target_min` | `5` | Lower cluster target; `0` disables target mode in batch mode. |
190
+ | `target_max` | `20` | Upper cluster target; `0` disables target/cap mode. |
191
+ | `synthesize` | `true` | Write synthesized alternates for clusters with multiple hits. |
192
+ | `quantize_midi` | `true` | Snap MIDI notes to grid. |
193
+ | `subdivision` | `16` | MIDI grid subdivision. |
194
+ | `device` | `cpu` | Torch device for Demucs. |
195
+ | `use_disk_cache` | `true` | Cache decoded full mix/stems by source digest and extraction settings. |
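
Because `params` may be partial, missing fields fall back to the defaults in the table above. The sketch below shows one way such merging could work. `ExampleParams` is a hypothetical stand-in with only a few of the documented fields; the real `PipelineParams.from_mapping` is not shown in this diff and may validate differently (for example, it may ignore unknown keys rather than reject them).

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping

CLUSTERING_MODES = {"batch_quality", "online_preview"}


@dataclass(frozen=True)
class ExampleParams:
    """Hypothetical subset of the documented parameters and their defaults."""

    stem: str = "drums"
    clustering_mode: str = "batch_quality"
    ncc_threshold: float = 0.80
    mel_threshold: float = 0.75
    target_min: int = 5
    target_max: int = 20

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ExampleParams":
        # Reject keys that are not declared fields (an assumed policy).
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            raise ValueError(f"unknown params: {unknown}")
        # Fields absent from raw keep their dataclass defaults.
        params = cls(**dict(raw))
        if params.clustering_mode not in CLUSTERING_MODES:
            raise ValueError(f"bad clustering_mode: {params.clustering_mode}")
        return params


fast = ExampleParams.from_mapping({"stem": "all", "clustering_mode": "online_preview"})
print(fast)
```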
docs/FEATURES.md ADDED
@@ -0,0 +1,58 @@
1
+ # Feature inventory
2
+
3
+ Last updated: 2026-05-12
4
+
5
+ ## Product goal
6
+
7
+ Turn an input audio file into a practical drum sample pack: detected hits, grouped sample classes, representative WAVs, optional synthesized alternates, MIDI reconstruction, rendered reconstruction audio, and an inspectable manifest.
8
+
9
+ ## Implemented features
10
+
11
+ | Area | Feature | Status | Notes |
12
+ |---|---|---:|---|
13
+ | UI | Custom browser frontend | Implemented | `web/index.html`, `web/styles.css`, `web/app.js`; no Gradio dependency in active app. |
14
+ | UI | Drag/drop audio upload | Implemented | Uses multipart upload to `POST /api/jobs`. |
15
+ | UI | Source preview | Implemented | Browser `<audio>` preview before extraction. |
16
+ | UI | Pipeline controls | Implemented | Stem/model/onset/clustering/MIDI/synthesis/cache controls. |
17
+ | UI | Live-ish progress | Implemented | Polls stage state and logs every 800 ms. |
18
+ | UI | Waveform/onset overview | Implemented | Canvas envelope plus onset markers from `manifest.json`. |
19
+ | UI | Result downloads | Implemented | ZIP, MIDI, stem WAV, reconstruction WAV, individual sample WAVs. |
20
+ | UI | Run history browser | Implemented | Lists completed `.runs/*/output/manifest.json` entries and reloads results. |
21
+ | API | Health/config | Implemented | `GET /api/health`, `GET /api/config`. |
22
+ | API | Job creation/polling | Implemented | `POST /api/jobs`, `GET /api/jobs/{id}`. |
23
+ | API | Run listing | Implemented | `GET /api/jobs` returns active and completed runs. |
24
+ | API | Safe artifact serving | Implemented | Path traversal is blocked by resolved output-root checks. |
25
+ | API | Cache clear | Implemented | Clears in-memory DSP cache and disk stem/source cache. |
26
+ | Pipeline | Demucs stem extraction | Implemented | Offline/batch stage; not advertised as realtime. |
27
+ | Pipeline | Stem/full-mix disk cache | Implemented | Keyed by source SHA-256 plus stem/model/shifts/overlap/device. |
28
+ | Pipeline | BPM detection | Implemented | `librosa` onset/beat based estimate. |
29
+ | Pipeline | SuperFlux-style onset detection | Implemented | Multi-band auto mode plus percussive/harmonic/broadband modes. |
30
+ | Pipeline | Hit classification | Implemented | Rule-based spectral class labels. |
31
+ | Pipeline | Batch quality clustering | Implemented | Mel prefilter + transient NCC + agglomerative clustering. |
32
+ | Pipeline | Online preview clustering | Implemented | Prototype-based incremental assignment for near-realtime feedback. |
33
+ | Pipeline | Representative selection | Implemented | Quality score picks best hit per cluster. |
34
+ | Pipeline | Optional synthesis | Implemented | Weighted aligned average for multi-hit clusters. |
35
+ | Pipeline | MIDI export | Implemented | Quantized or unquantized reconstruction MIDI. |
36
+ | Pipeline | Reconstruction render | Implemented | Renders MIDI-like reconstruction using selected samples. |
37
+ | Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI, rendered reconstruction. |
38
+ | Docs | Project review | Implemented | `docs/PROJECT_REVIEW.md`. |
39
+ | Docs | Timing/realtime analysis | Implemented | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
40
+ | Docs | API docs | Implemented | `docs/API.md`. |
41
+ | Docs | UI replacement docs | Implemented | `docs/UI_REPLACEMENT.md`. |
42
+ | Docs | Feature/task/progress tracking | Implemented | This file, `TASKS.md`, `PROGRESS.md`. |
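
The "Safe artifact serving" row above mentions resolved output-root checks. A minimal sketch of that technique follows, assuming a hypothetical helper named `resolve_artifact`; the actual check in `app.py` is not shown in this diff and may differ.

```python
from pathlib import Path


def resolve_artifact(output_root: Path, relative_path: str) -> Path:
    """Resolve a requested path and refuse anything outside output_root."""
    root = output_root.resolve()
    # Resolving collapses ".." segments and follows symlinks before the check.
    candidate = (root / relative_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"path escapes output root: {relative_path}")
    return candidate


root = Path(".runs") / "58ca0db4ac74" / "output"
print(resolve_artifact(root, "samples/kick.wav"))
```

Checking the resolved path rather than the raw string matters: a request such as `samples/../../secret.txt` contains no leading `..` but still escapes the root once resolved.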
43
+
44
+ ## Partially implemented features
45
+
46
+ | Area | Feature | Current state | Needed to call it complete |
47
+ |---|---|---|---|
48
+ | Progress | Stage progress | Shows stage boundaries and logs | Add lower-level progress inside Demucs and clustering. |
49
+ | Realtime | Online clustering | Implemented as batch-invoked prototype assignment | Add streaming/incremental audio analysis API for true realtime preview. |
50
+ | Run history | Manifest browser | Lists and reloads completed runs | Add side-by-side comparison and filtering/search. |
51
+ | Editing | Review workflow | Displays waveform and samples | Add click-to-audition hits, onset editing, cluster merge/split, label reassignment. |
52
+ | Frontend quality | No-build JavaScript UI | Good enough for local app | Convert to TypeScript once interaction model stabilizes. |
53
+
54
+ ## Explicit non-goals for this pass
55
+
56
+ - Realtime Demucs. It is not realistic for this use case and should remain offline/cached.
57
+ - Perfect source separation. Stem quality depends on model choice and input material.
58
+ - Full DAW/sample-editor UX. This pass creates the workstation foundation; detailed editing is next.
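
Since Demucs is meant to stay offline and cached, the cache key described in the feature inventory (source SHA-256 plus stem/model/shifts/overlap/device) could be derived roughly as follows. This is a sketch under assumptions: the function name `stem_cache_key` and the exact key layout are hypothetical, and the real implementation in `pipeline_runner.py` is not shown in this diff.

```python
import hashlib
import json


def stem_cache_key(
    source_bytes: bytes,
    *,
    stem: str,
    model: str,
    shifts: int,
    overlap: float,
    device: str,
) -> str:
    """Build a deterministic key from the source digest and extraction settings."""
    source_digest = hashlib.sha256(source_bytes).hexdigest()
    # sort_keys makes the serialised settings independent of argument order.
    settings = json.dumps(
        {
            "stem": stem,
            "model": model,
            "shifts": shifts,
            "overlap": overlap,
            "device": device,
        },
        sort_keys=True,
    )
    return hashlib.sha256(f"{source_digest}:{settings}".encode("utf-8")).hexdigest()


key = stem_cache_key(
    b"fake audio bytes",
    stem="drums",
    model="htdemucs_ft",
    shifts=1,
    overlap=0.25,
    device="cpu",
)
print(key[:16])
```

Hashing the file contents rather than the filename means a re-uploaded copy of the same audio reuses the cached stem, while any change to the extraction settings produces a different key.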
docs/PIPELINE_TIMING_AND_REALTIME.md CHANGED
@@ -1,214 +1,131 @@
1
- # Pipeline timing and near-real-time analysis
2
 
3
- ## Measurement setup
4
 
5
- Benchmarks were run with `scripts/benchmark_subprocesses.py` using synthetic drum fixtures from `synth_generator.py`.
6
 
7
- Important constraints:
8
 
9
- - `stem=all` was used to bypass Demucs and measure the DSP/sample-extraction subprocesses directly.
10
- - The script performs one warm-up run first, so import/JIT overhead is not included in the summary.
11
- - Runs used 4 bars at 120 BPM across `rock`, `funk`, and `halftime` synthetic patterns.
12
- - The benchmark output is stored in `docs/benchmark-subprocesses.json`.
13
 
14
- ## Measured subprocess lengths
15
-
16
- | Stage | Mean seconds | Median seconds | Min seconds | Max seconds |
17
- |---|---:|---:|---:|---:|
18
- | `stem` | 0.017 | 0.013 | 0.009 | 0.039 |
19
- | `bpm` | 0.224 | 0.223 | 0.206 | 0.241 |
20
- | `onsets` | 2.140 | 2.034 | 1.762 | 2.871 |
21
- | `classification` | 0.034 | 0.035 | 0.024 | 0.045 |
22
- | `clustering` | 0.496 | 0.597 | 0.059 | 0.913 |
23
- | `selection` | 0.499 | 0.551 | 0.311 | 0.651 |
24
- | `synthesis` | 0.002 | 0.002 | 0.002 | 0.003 |
25
- | `export` | 0.105 | 0.103 | 0.046 | 0.178 |
26
-
27
- Observed total runtime for warm synthetic 4-bar fixtures was roughly `0.30×–0.43×` realtime when Demucs was bypassed. In plain terms: the pure extraction stages ran faster than the audio duration on these fixtures. The first cold run can be much slower because librosa/scipy/numba-style initialization costs are paid up front.
28
 
29
  ## Significant subprocesses
30
 
31
- ### 1. Stem extraction / source load
32
-
33
- Current implementation:
34
-
35
- - `stem=all`: load and normalize the source audio with librosa.
36
- - any other stem: run Demucs via `demucs.pretrained.get_model` and `demucs.apply.apply_model`.
37
-
38
- Timing profile:
39
-
40
- - `stem=all` is near-instant after warm-up on short fixtures.
41
- - Demucs is the offline bottleneck and should be treated as non-realtime in this project.
42
-
43
- Real-time suitability: **No for Demucs, yes for direct source load.**
44
-
45
- Recommended strategy:
46
-
47
- - Keep Demucs as an explicit offline preprocessing stage.
48
- - Cache stem output by content hash and model parameters.
49
- - Let users bypass Demucs for drum loops, already-separated stems, and iterative parameter tuning.
50
-
51
- ### 2. BPM / tempo detection
52
-
53
- Current implementation:
54
-
55
- - `librosa.onset.onset_strength`
56
- - `librosa.feature.tempo`
57
- - beat-track sanity adjustment
58
-
59
- Timing profile:
60
-
61
- - Measured around 0.22 s for ~9 s synthetic clips after warm-up.
62
-
63
- Real-time suitability: **Near-realtime with buffering.**
64
-
65
- A live version should estimate tempo over rolling windows and refine continuously. It does not need the entire file, but short windows can be unstable.
66
-
67
- ### 3. Onset detection + slicing
68
-
69
- Current implementation:
70
-
71
- - Multiband SuperFlux-style onset envelope in `auto` mode.
72
- - Optional percussive/harmonic/broadband modes.
73
- - Peak picking and hit slicing by onset-to-next-onset boundaries.
74
- - Energy threshold and duration filtering.
75
-
76
- Timing profile:
77
-
78
- - This is the largest non-Demucs DSP stage in the measured benchmark: about 2.14 s mean for ~9 s fixtures.
79
- - It is still faster than realtime in warm synthetic tests.
80
-
81
- Real-time suitability: **Yes, with a rolling window and bounded lookahead.**
82
-
83
- Why:
84
-
85
- - Onset strength and peak picking are local-window operations.
86
- - Backtracking and next-onset slicing require a small amount of future context.
87
- - A live system can emit provisional hits and finalize durations once the next onset or max-duration cutoff arrives.
88
-
89
- ### 4. Spectral rule classification
90
-
91
- Current implementation:
92
-
93
- - STFT per hit.
94
- - Low/mid/high energy ratios.
95
- - Spectral centroid, zero-crossing rate, duration rules.
96
-
97
- Timing profile:
98
-
99
- - Measured around 34 ms mean for the benchmark fixtures.
100
-
101
- Real-time suitability: **Yes.**
102
-
103
- This is cheap per hit and can run immediately after a hit segment is finalized.
104
-
105
- ### 5. Mel fingerprinting + transient NCC clustering
106
-
107
- Current implementation:
108
-
109
- - Build mel fingerprints for hits.
110
- - Use cosine similarity as a prefilter.
111
- - Compute transient normalized cross-correlation only for candidate pairs.
112
- - Run agglomerative clustering on the resulting precomputed distance matrix.
113
- - Optionally merge singleton clusters into nearby multi-hit clusters.
114
-
115
- Timing profile:
116
-
117
- - Measured around 0.50 s mean, but depends strongly on number of hits and pair count.
118
- - Complexity is roughly quadratic in hit count for pairwise similarity, with mel prefiltering reducing NCC work.
119
-
120
- Real-time suitability: **Partially.**
121
 
122
- What can be realtime:
123
 
124
- - Mel fingerprint extraction per hit.
125
- - Transient NCC against a bounded set of existing cluster representatives.
126
- - Online assignment to existing clusters.
127
 
128
- What is not truly realtime in the current implementation:
 
129
 
130
- - Full agglomerative clustering over the complete distance matrix.
131
- - Target cluster count search through repeated clustering.
132
 
133
- Recommended live design:
134
 
135
- 1. Maintain cluster prototypes: representative transient, mel centroid, count, label histogram.
136
- 2. For each finalized hit, compute fingerprint and compare to prototypes first.
137
- 3. Only run transient NCC against likely candidates.
138
- 4. Assign immediately when above threshold; create a new cluster otherwise.
139
- 5. Periodically run batch reclustering in the background to clean up early mistakes.
140
 
141
- ### 6. Best representative selection
142
 
143
- Current implementation:
144
 
145
- - Compute sample quality score per candidate hit.
146
- - Choose highest-scoring hit per cluster.
147
 
148
- Timing profile:
149
 
150
- - Measured around 0.50 s mean in the benchmark.
151
- - Cost scales with number of hits and quality scoring work.
152
 
153
- Real-time suitability: **Yes as an incremental update.**
154
 
155
- A live version can maintain the current best hit per cluster and only rescore new arrivals or candidates whose cluster changed.
156
 
157
- ### 7. Optional synthesis
158
 
159
- Current implementation:
160
 
161
- - Align cluster members by peak position.
162
- - Normalize and weighted-average hits to create an alternate synthesized sample.
163
 
164
- Timing profile:
165
 
166
- - Measured around 2 ms mean on benchmark fixtures.
167
 
168
- Real-time suitability: **Yes for small clusters, but better as deferred polish.**
169
 
170
- It is fast, but users usually do not need synthesized alternates before cluster membership stabilizes.
171
 
172
- ### 8. Export: MIDI, reconstruction, WAVs, ZIP
173
 
174
- Current implementation:
175
 
176
- - Build MIDI notes from hits and cluster sample notes.
177
- - Render reconstruction with representative samples.
178
- - Write samples, reconstruction audio, MIDI, archive, and manifest.
179
 
180
- Timing profile:
181
 
182
- - Measured around 0.10 s mean on benchmark fixtures.
183
 
184
- Real-time suitability: **No for ZIP packaging; yes for preview rendering chunks.**
185
 
186
- The final ZIP is a completion artifact. Reconstruction can be rendered progressively for UI preview.
187
 
188
- ## Real-time feasibility summary
189
 
190
- | Subprocess | Current batch status | Near-real-time feasibility | Notes |
- |---|---|---|---|
- | Source load | Fast | Yes | Direct file/stream decode is not the bottleneck |
- | Demucs stem separation | Slow/offline | No | Keep offline and cached |
- | BPM detection | Buffered batch | Partial | Rolling estimate works, exact tempo should refine over time |
- | Onset detection | Batch but local-window | Yes | Needs bounded lookahead/backtracking |
- | Hit slicing | Depends on next onset | Yes | Emit provisional segment, finalize on next onset/max duration |
- | Rule classification | Per-hit | Yes | Cheap and stateless |
- | Mel fingerprinting | Per-hit | Yes | Compute once per finalized hit |
- | Transient NCC | Pairwise batch | Partial | Realtime against prototypes; batch all-pairs is not realtime |
- | Agglomerative clustering | Batch | No | Replace or complement with online prototype assignment |
- | Representative selection | Batch per cluster | Yes | Keep best-so-far per cluster |
- | Synthesis | Batch per cluster | Partial | Can update lazily after cluster changes |
- | MIDI/reconstruction preview | Batch export | Partial | Preview can stream; final MIDI is a completion artifact |
- | ZIP packaging | Final artifact | No | Keep as final step |
205
-
206
- ## Recommended next technical move
207
-
208
- Implement a second clustering mode named `online`:
209
-
210
- ```text
211
- onset event → segment finalized → classify → mel fingerprint → candidate prototypes → transient NCC → assign/create cluster → update best representative → UI update
212
- ```
213
-
214
- Keep the existing agglomerative mode as `batch-quality`. Use online mode for immediate feedback and batch mode for final high-quality export.
 
1
+ # Pipeline timing and realtime suitability
2
 
3
+ Last updated: 2026-05-12
4
 
5
+ ## Measurement scope
6
 
7
+ The timing benchmark in `docs/benchmark-subprocesses.json` measures synthetic drum fixtures with:
8
 
9
+ - `stem=all`, so Demucs is bypassed.
10
+ - A warm process, so import and model-library initialization costs are excluded.
11
+ - Synthetic rock/funk/halftime fixtures generated by `synth_generator.py`.
12
+ - `scripts/benchmark_subprocesses.py` as the benchmark driver.
13
 
14
+ This isolates the sample-extraction subprocesses from source-separation noise. Demucs timing depends heavily on model, hardware, track length, first-run downloads, and CPU/GPU availability, so it is analyzed separately.
15
 
16
  ## Significant subprocesses
17
 
18
+ | Subprocess | Current implementation | Timing behavior | Realtime suitability |
+ |---|---|---|---|
+ | Source load / stem extraction | `extract_stem`; full mix via `librosa`, stems via Demucs | Full mix is usually small; Demucs dominates full jobs | Full mix: near-realtime. Demucs: no. |
+ | BPM detection | `detect_bpm` using onset envelope and beat tracking | Usually sub-second for short fixtures | Near-realtime with buffering; not critical path. |
+ | Onset detection + slicing | `detect_onsets` multi-band SuperFlux-style envelope | Often the largest pure-DSP stage | Near-realtime with bounded lookahead. |
+ | Classification | Rule-based spectral analysis per hit | Fast relative to onset/clustering | Near-realtime. |
+ | Batch clustering | Mel fingerprints + transient NCC + agglomerative clustering | Pairwise/batch; scales poorly with many hits | Not realtime. Final-quality batch mode. |
+ | Online clustering | Prototype assignment per hit | Scales with hit count × cluster count | Near-realtime preview path. |
+ | Representative selection | Scores each candidate hit | Moderate for many clusters/hits | Near-realtime for moderate hit counts. |
+ | Synthesis | Weighted aligned average per multi-hit cluster | Usually small | Near-realtime for moderate clusters. |
+ | Export/package | WAV/MIDI/render/ZIP writes | Disk-bound; ZIP is batch finalization | Not meaningful as realtime; finalization step. |
29
 
30
+ ## Current benchmark summary
31
 
32
+ The checked-in benchmark files were refreshed on 2026-05-12 with synthetic 2-bar fixtures and Demucs bypassed:
33
 
34
+ - `docs/benchmark-subprocesses.json`: `batch_quality` clustering.
35
+ - `docs/benchmark-online-preview.json`: `online_preview` clustering.
36
 
37
+ | Stage | Batch quality mean | Online preview mean |
+ |---|---:|---:|
+ | source load | 0.011 s | 0.012 s |
+ | BPM detection | 0.185 s | 0.163 s |
+ | onset detection + slicing | 1.943 s | 1.834 s |
+ | classification | 0.019 s | 0.017 s |
+ | clustering | 0.148 s | 0.045 s |
+ | representative selection | 0.204 s | 0.115 s |
+ | synthesis | 0.001 s | 0.001 s |
+ | export/package | 0.156 s | 0.221 s |
47
 
48
+ On these small fixtures, `online_preview` reduced mean clustering time by about 3.3× compared with `batch_quality` (0.148 s to 0.045 s). The total run is still dominated by onset detection, so the next realtime optimization target is streaming/incremental onset analysis rather than clustering alone.
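
To make the comparison above easy to re-check, here is a minimal sketch that recomputes the clustering speedup and the share of time spent in onset detection from the stage means in the summary table. The dictionary names and helper functions are illustrative assumptions and are not part of the repository; the totals are sums of stage means, not measured wall-clock totals.

```python
# Stage means in seconds, copied from the benchmark summary table.
# The variable and function names are hypothetical, chosen for this sketch.
BATCH_QUALITY = {
    "source load": 0.011,
    "BPM detection": 0.185,
    "onset detection + slicing": 1.943,
    "classification": 0.019,
    "clustering": 0.148,
    "representative selection": 0.204,
    "synthesis": 0.001,
    "export/package": 0.156,
}
ONLINE_PREVIEW = {
    "source load": 0.012,
    "BPM detection": 0.163,
    "onset detection + slicing": 1.834,
    "classification": 0.017,
    "clustering": 0.045,
    "representative selection": 0.115,
    "synthesis": 0.001,
    "export/package": 0.221,
}


def total_seconds(stage_means):
    """Sum of the per-stage means for one clustering mode."""
    return sum(stage_means.values())


def stage_share(stage_means, stage):
    """Fraction of the summed stage means spent in one stage."""
    return stage_means[stage] / total_seconds(stage_means)


def speedup(baseline, candidate, stage):
    """How many times faster `candidate` is than `baseline` for one stage."""
    return baseline[stage] / candidate[stage]


if __name__ == "__main__":
    for name, means in (("batch_quality", BATCH_QUALITY), ("online_preview", ONLINE_PREVIEW)):
        share = stage_share(means, "onset detection + slicing")
        print(f"{name}: total {total_seconds(means):.3f} s, onset share {share:.1%}")
    print(f"clustering speedup: {speedup(BATCH_QUALITY, ONLINE_PREVIEW, 'clustering'):.2f}x")
```

In both modes onset detection accounts for well over half of the summed stage time, which is why the text treats it as the next optimization target.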
49
 
50
+ First cold runs can be much slower because imports and library initialization are paid up front.
51
 
52
+ ## Batch quality versus online preview clustering
53
 
54
+ ### `batch_quality`
55
 
56
+ Current final-quality clustering path:
 
57
 
58
+ 1. Compute mel fingerprint for each hit.
59
+ 2. Compute pairwise mel cosine prefilter.
60
+ 3. Compute transient NCC only for candidate pairs.
61
+ 4. Build distance matrix.
62
+ 5. Run agglomerative clustering.
63
+ 6. Optionally merge singleton clusters.
64
 
65
+ This gives better global grouping, but it is fundamentally batch-oriented because it needs the full pairwise distance matrix before any cluster can be assigned.
 
66
 
67
+ ### `online_preview`
68
 
69
+ Current near-realtime-oriented clustering path:
70
 
71
+ 1. Process hits in onset order.
72
+ 2. Compute one mel fingerprint and one transient per hit.
73
+ 3. Compare the new hit against existing cluster prototypes.
74
+ 4. Assign it to the best prototype or create a new cluster until the target cap is reached.
75
+ 5. Update prototype fingerprints/transients using energy-weighted rolling averages.
76
 
77
+ Complexity is roughly `O(number_of_hits × number_of_clusters)`, not `O(number_of_hits²)`, and does not require future hits before producing a current assignment. It is suitable for progressive preview and fast iteration, but it is not guaranteed to match the global batch clustering result.
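
The `online_preview` steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the code in `sample_extractor.py`: it compares fingerprints by cosine similarity only (the transient NCC step is omitted), and the names `Prototype`, `assign_online`, `threshold`, and `max_clusters` are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Prototype:
    fingerprint: list  # energy-weighted running average of member fingerprints
    weight: float      # accumulated energy of all members
    count: int         # number of hits assigned so far


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_online(hits, threshold=0.9, max_clusters=8):
    """Assign hits to clusters one at a time, in onset order.

    `hits` is an iterable of (fingerprint, energy) pairs and `max_clusters`
    must be at least 1. Each hit costs one comparison per existing prototype,
    so the total work is O(number_of_hits * number_of_clusters).
    """
    prototypes = []
    labels = []
    for fingerprint, energy in hits:
        best_index, best_score = -1, -1.0
        for index, proto in enumerate(prototypes):
            score = cosine_similarity(fingerprint, proto.fingerprint)
            if score > best_score:
                best_index, best_score = index, score

        can_create = len(prototypes) < max_clusters
        if best_index < 0 or (best_score < threshold and can_create):
            # No good match and the cap is not reached: start a new cluster.
            prototypes.append(Prototype(list(fingerprint), energy, 1))
            labels.append(len(prototypes) - 1)
            continue

        # Otherwise join the best prototype and update its weighted average.
        proto = prototypes[best_index]
        total = proto.weight + energy
        if total > 0.0:
            proto.fingerprint = [
                (p * proto.weight + f * energy) / total
                for p, f in zip(proto.fingerprint, fingerprint)
            ]
        proto.weight = total
        proto.count += 1
        labels.append(best_index)
    return labels, prototypes
```

Because each assignment depends only on hits seen so far, early mistakes are never revisited, which is one reason the result can differ from the global batch clustering.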
78
 
79
+ ## What can run in or near realtime
 
80
 
81
+ These can be performed progressively with small buffers:
82
 
83
+ - Source decode for already-separated/full-mix audio.
84
+ - Onset envelope computation.
85
+ - Peak picking with bounded lookahead.
86
+ - Hit slicing once enough tail audio is buffered.
87
+ - Rule-based classification.
88
+ - Mel fingerprint extraction.
89
+ - Online prototype clustering.
90
+ - Representative preview selection.
91
+ - Basic reconstruction preview.
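
As an illustration of "peak picking with bounded lookahead" from the list above, the following sketch picks peaks from an onset envelope one frame at a time and reports each peak a fixed `post` frames late. It is a hypothetical, simplified picker written for this document (the class name, `pre`, `post`, and `delta` are assumptions) and is not the project's `detect_onsets` implementation.

```python
from collections import deque


class LookaheadPeakPicker:
    """Streaming peak picker with a fixed latency of `post` frames.

    A frame is reported as a peak when it is the maximum of the window
    [frame - pre, frame + post] and exceeds the window mean by `delta`.
    Limitations of this sketch: the first `pre` frames can never be
    reported, and a plateau of equal maxima may be reported more than once.
    """

    def __init__(self, pre=3, post=3, delta=0.1):
        self.pre = pre
        self.post = post
        self.delta = delta
        self.window = deque(maxlen=pre + post + 1)
        self.frames_seen = 0

    def push(self, value):
        """Feed one envelope value; return a peak frame index or None."""
        self.window.append(value)
        self.frames_seen += 1
        if len(self.window) < self.window.maxlen:
            return None  # not enough context buffered yet
        candidate = self.window[self.pre]
        mean = sum(self.window) / len(self.window)
        if candidate == max(self.window) and candidate >= mean + self.delta:
            # The candidate sits `post` frames behind the newest frame.
            return self.frames_seen - 1 - self.post
        return None


def pick_peaks(envelope, pre=3, post=3, delta=0.1):
    """Run the streaming picker over a complete envelope (for testing)."""
    picker = LookaheadPeakPicker(pre=pre, post=post, delta=delta)
    peaks = []
    for value in envelope:
        frame = picker.push(value)
        if frame is not None:
            peaks.append(frame)
    return peaks
```

The design choice to show here is the bounded delay: a decision about frame `n` is made as soon as frame `n + post` arrives, so latency is fixed by `post` rather than by the length of the file.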
92
 
93
+ ## What should stay offline/batch
94
 
95
+ - Demucs source separation.
96
+ - All-pairs transient NCC for large hit sets.
97
+ - Agglomerative clustering.
98
+ - Final ZIP packaging.
99
+ - Full high-quality rerender/export.
100
 
101
+ ## Recommended runtime strategy
102
 
103
+ | Phase | Mode | Purpose |
+ |---|---|---|
+ | Upload / first pass | `stem=all`, `clustering_mode=online_preview` | Fast inspection and parameter tuning. |
+ | Final extraction from full mix/stem | `stem=all`, `clustering_mode=batch_quality` | Better grouping without source separation. |
+ | Final extraction from full song | `stem=drums`, `clustering_mode=batch_quality`, disk cache on | Best quality with offline Demucs cost paid once. |
108
 
109
+ ## Disk cache impact
110
 
111
+ Disk cache now stores decoded full mix or Demucs stem output under `.cache/stems/`, keyed by:
112
 
113
+ - Source SHA-256.
114
+ - Stem name.
115
+ - Demucs model.
116
+ - Demucs shifts.
117
+ - Demucs overlap.
118
+ - Device/decode mode.
119
 
120
+ This does not make Demucs realtime, but it prevents repeated source separation work when retuning onset/clustering parameters for the same source and stem settings.
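
One way to derive a cache key from the fields listed above is to hash the source file and then hash a canonical JSON encoding of all key fields. This is a sketch under stated assumptions: the function names, the JSON layout, and the example values (such as the model name `htdemucs`) are illustrative, and the actual key format used under `.cache/stems/` may differ.

```python
import hashlib
import json


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large audio files stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stem_cache_key(source_sha256, stem, model, shifts, overlap, device):
    """Deterministic key covering every setting that changes the stem output.

    Hypothetical helper: the field names mirror the list in this section.
    """
    payload = {
        "source_sha256": source_sha256,
        "stem": stem,
        "model": model,
        "shifts": shifts,
        "overlap": overlap,
        "device": device,
    }
    # sort_keys and fixed separators give one stable byte sequence per payload.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    source_hash = hashlib.sha256(b"example audio bytes").hexdigest()
    print(stem_cache_key(source_hash, "drums", "htdemucs", 1, 0.25, "cpu"))
```

Changing only onset or clustering parameters leaves the key unchanged, which is what allows retuning without repeating source separation.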
121
 
122
+ ## Remaining realtime work
123
 
124
+ The current `online_preview` mode is invoked by the batch job API after onset detection. To make the application genuinely realtime/progressive, add:
125
 
126
+ 1. A streaming/ranged audio analysis API.
127
+ 2. Incremental onset detector state.
128
+ 3. Incremental hit artifact writing.
129
+ 4. SSE progress/results stream.
130
+ 5. UI that appends hits/clusters as they arrive.
131
+ 6. Optional final `batch_quality` consolidation pass.
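
For item 4 in the list above, the wire format of a Server-Sent Events stream is simple enough to sketch with the standard library alone. The event names (`hit`, `done`) and payload fields are hypothetical; the project does not define them yet, and a real endpoint would hand the generator to the web framework's streaming response.

```python
import json


def format_sse(event, data, event_id=None):
    """Encode one Server-Sent Events message with a JSON payload.

    Each message is a block of `field: value` lines ended by a blank line.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON is a single line, so one `data:` field is enough.
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"


def stream_hits(hits):
    """Yield one `hit` event per detected hit, then a final `done` event."""
    count = 0
    for count, hit in enumerate(hits, start=1):
        yield format_sse("hit", hit, event_id=count)
    yield format_sse("done", {"hits": count})


if __name__ == "__main__":
    for message in stream_hits([{"onset": 0.5, "label": "kick"}, {"onset": 1.0, "label": "snare"}]):
        print(message, end="")
```

On the browser side an `EventSource` could append rows as `hit` events arrive, which matches item 5 of the list.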
docs/PROGRESS.md ADDED
@@ -0,0 +1,63 @@
1
+ # Progress log
2
+
3
+ Last updated: 2026-05-12
4
+
5
+ ## Pass 1: project review, timing, and Gradio replacement
6
+
7
+ Completed:
8
+
9
+ 1. Inspected the original project structure and active Gradio entrypoints.
10
+ 2. Moved previous Gradio interfaces into `legacy/`.
11
+ 3. Created `pipeline_runner.py` as the timed orchestration layer.
12
+ 4. Created `app.py` as a FastAPI backend.
13
+ 5. Created a custom no-build browser frontend under `web/`.
14
+ 6. Added stage timing to each extraction run.
15
+ 7. Added synthetic benchmarking via `scripts/benchmark_subprocesses.py`.
16
+ 8. Added initial docs for project review, timing/realtime, API, UI, and remaining work.
17
+
18
+ Outcome:
19
+
20
+ The application became usable without Gradio and produced per-run manifests/artifacts.
21
+
22
+ ## Pass 2: feature ledger and continued development
23
+
24
+ Completed in this pass:
25
+
26
+ 1. Added first-class docs for features, tasks, and progress.
27
+ 2. Added `GET /api/jobs` for active/completed run listing.
28
+ 3. Added run-history UI panel that indexes `.runs/*/output/manifest.json`.
29
+ 4. Added disk caching for decoded full mix and Demucs stem outputs.
30
+ 5. Extended cache clearing to remove both memory and disk cache.
31
+ 6. Added `clustering_mode` pipeline parameter.
32
+ 7. Added `online_preview` clustering using prototype assignment.
33
+ 8. Added frontend controls for clustering mode and disk cache.
34
+ 9. Fixed duplicate sample writes in `sample_extractor.build_archive`.
35
+ 10. Updated README and docs to reflect the new state.
36
+
37
+ Outcome:
38
+
39
+ The project now has a clearer product surface: final-quality batch extraction, faster online-style preview clustering, persistent run history, and explicit docs tracking what is done versus still missing.
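
The run-history indexing mentioned in Pass 2 (scanning `.runs/*/output/manifest.json`) could look roughly like the sketch below. The function name `index_runs` and the shape of the returned entries are assumptions for illustration; the real `GET /api/jobs` handler and manifest schema are not shown in this commit.

```python
import json
from pathlib import Path


def index_runs(runs_dir=".runs"):
    """List completed runs by scanning `<runs_dir>/<job-id>/output/manifest.json`.

    Unreadable or malformed manifests are skipped so that one broken run
    does not hide the rest of the history. Newest runs come first.
    """
    entries = []
    for manifest_path in Path(runs_dir).glob("*/output/manifest.json"):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
            modified = manifest_path.stat().st_mtime
        except (OSError, json.JSONDecodeError):
            continue
        entries.append(
            {
                # The job id is the directory two levels above the manifest.
                "job_id": manifest_path.parent.parent.name,
                "manifest_path": str(manifest_path),
                "modified": modified,
                "manifest": manifest,
            }
        )
    entries.sort(key=lambda entry: entry["modified"], reverse=True)
    return entries
```

Because the index is rebuilt from the filesystem, it survives process restarts, but it only ever contains runs that finished writing a manifest.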
40
+
41
+ ## Current assessment
42
+
43
+ The application is not “fully complete” as an editing workstation, but it is substantially implemented as an extraction workstation. The remaining gaps are concentrated around interactive correction/editing, richer progress streaming, run comparison, and frontend engineering hardening.
44
+
45
+ ## Next recommended pass
46
+
47
+ Implement the editing loop:
48
+
49
+ 1. Click waveform onset marker or sample table row to audition.
50
+ 2. Show selected hit metadata and audio snippet.
51
+ 3. Allow onset shift, label change, cluster reassignment, merge, and split.
52
+ 4. Re-export without rerunning Demucs/onset detection when only grouping changes.
53
+ 5. Save edit decisions into the manifest.
54
+
55
+ ## Validation performed in this pass
56
+
57
+ - Compiled active Python files with `python3 -m py_compile app.py pipeline_runner.py sample_extractor.py scripts/*.py`.
58
+ - Ran FastAPI smoke job through `scripts/test_api_job.py`.
59
+ - Ran an online-preview API smoke job with synthetic audio.
60
+ - Verified `GET /api/jobs` history output and `POST /api/cache/clear` behavior.
61
+ - Refreshed batch and online benchmark JSON files:
62
+ - `docs/benchmark-subprocesses.json`
63
+ - `docs/benchmark-online-preview.json`
docs/PROJECT_REVIEW.md CHANGED
@@ -1,89 +1,52 @@
1
  # Project review
2
 
3
- ## Goal
4
 
5
- Review the uploaded drum sample extractor, identify architectural and UX gaps, replace the Gradio UI with a custom frontend, and document the extraction pipeline with timing and real-time feasibility notes.
6
 
7
- ## Success checklist
8
 
9
- - The active app is no longer Gradio-based.
10
- - The core extraction process is callable independently of the UI.
11
- - Every significant extraction subprocess is timed.
12
- - Runtime artifacts are stable and downloadable.
13
- - Documentation explains current behavior, tradeoffs, and remaining work.
14
- - Legacy files are preserved but not part of the active path.
15
 
16
- ## Existing project structure before changes
17
 
18
- The archive contained a compact Python project:
19
 
20
- | File | Role |
- |---|---|
- | `app.py` | Active Gradio UI, parameter controls, extraction, eval, optimization tabs |
- | `app_v2.py` | Older Gradio UI variant |
- | `sample_extractor.py` | Current extraction pipeline: Demucs/load, SuperFlux onsets, rule labels, mel+NCC clustering, MIDI/export |
- | `drum_extractor.py` | Older CLI-oriented pipeline with CLAP-era comments and broader experimental code |
- | `synth_generator.py` | Synthetic drum fixture generator |
- | `evaluation.py` | Ground-truth matching and scoring |
- | `optimizer.py`, `optimizer_v2.py` | Parameter search experiments |
- | `quality_metrics.py` | Completeness, cleanness, onset, reference metrics |
- | `config_store.py` | Config persistence and leaderboard helpers |
31
-
32
- ## Key findings
33
-
34
- 1. `sample_extractor.py` is the right core to keep. It is compact, stage-oriented, and already exposes most of the operations needed by a proper app/API.
35
- 2. `app.py` mixed UI code, runtime hotfixing, file conversion, extraction orchestration, and artifact packaging. That made it hard to test or replace the UI.
36
- 3. The previous Gradio UI was fast to build but not ideal for this use-case: extraction is a staged process with logs, timing, waveform review, downloadable artifacts, and a dense parameter surface that benefits from a purpose-built layout.
37
- 4. The previous `app.py` patched `sample_extractor.py` at runtime to fix `_sf(..., lag=2)` vs `_sf(..., l=2)`. The underlying bug is now fixed directly in `sample_extractor.py`.
38
- 5. There was no meaningful project documentation, no API documentation, and no benchmark/timing documentation.
39
- 6. `requirements.txt` still treated Gradio as first-class. The active app now uses FastAPI; Gradio dependencies have been moved to `requirements-legacy-gradio.txt`.
40
- 7. `.runs/`, generated audio, MIDI, ZIP files, and local caches needed explicit ignore rules.
41
 
42
- ## Changes made
43
 
44
- | Area | Change |
  |---|---|
- | Active UI | Replaced Gradio with `app.py` FastAPI + custom static frontend in `web/` |
- | Pipeline | Added `pipeline_runner.py` with validated params, stage timing, progress callbacks, manifests, and artifact writing |
- | Legacy | Moved old Gradio apps into `legacy/` |
- | Bugfix | Fixed the `_sf(yh, lag=2, ms=5)` keyword mismatch in `sample_extractor.py` |
- | API | Added job creation, polling, config, health, cache clear, and safe artifact download endpoints |
- | UX | Added drag/drop upload, dense controls, stage timeline, logs, waveform/onset overview, audio previews, sample table, downloads |
- | Benchmarking | Added `scripts/benchmark_subprocesses.py` and committed benchmark output JSON |
- | Packaging | Added Dockerfile, updated requirements, added `.gitignore` |
- | Docs | Added project review, timing/real-time analysis, API docs, UI notes, and remaining work |
55
-
56
- ## Current architecture
57
-
58
- ```text
59
- browser UI in web/
60
-
61
-
62
- FastAPI app.py
63
-
64
-
65
- pipeline_runner.py
66
-
67
-
68
- sample_extractor.py + quality_metrics.py
69
-
70
-
71
- .runs/<job-id>/output/{samples, MIDI, WAV, ZIP, manifest.json}
72
- ```
73
-
74
- The UI only talks to the API. The API only calls the timed runner. The runner is now independently testable and usable from scripts.
75
-
76
- ## Risks and limitations
77
-
78
- - Demucs can dominate runtime and may require a model download on first use.
79
- - The current job store is in-memory. Completed jobs can be reloaded from `manifest.json`, but queued/running job state is lost on process restart.
80
- - The clustering implementation is still batch-oriented. It can be optimized or adapted incrementally, but current agglomerative clustering is not a streaming algorithm.
81
- - There is no authentication or quota control; this is intended as a local/Hugging Face style app, not a public multi-tenant service.
82
- - The browser UI is currently no-build static JavaScript/CSS. That is intentional for deployability, but a larger UI should eventually move to TypeScript with a real component/test setup.
83
-
84
- ## Verification performed
85
-
86
- - Python syntax compilation for `app.py`, `pipeline_runner.py`, `sample_extractor.py`, and benchmark scripts.
87
- - FastAPI `TestClient` checks for `/`, `/api/health`, and `/api/config`.
88
- - End-to-end API job test using a synthetic drum fixture with `stem=all`.
89
- - Synthetic subprocess benchmark across rock, funk, and halftime patterns.
 
1
  # Project review
2
 
3
+ Last updated: 2026-05-12
4
 
5
+ ## Summary
6
 
7
+ The project has evolved from a Gradio-driven prototype into a usable FastAPI + custom frontend extraction workstation. The core DSP pipeline is still compact and script-oriented, but it now has a clearer boundary between API/UI orchestration (`app.py`), timed pipeline execution (`pipeline_runner.py`), and lower-level sample extraction (`sample_extractor.py`).
8
 
9
+ ## What is strong
10
 
11
+ 1. **Useful core pipeline**: stem extraction, onset detection, classification, clustering, selection, synthesis, MIDI rendering, and packaging are all present.
12
+ 2. **Small deployable surface**: active runtime is FastAPI plus static files; no frontend build is required.
13
+ 3. **Good local iteration path**: `stem=all` bypasses Demucs for fast tuning.
14
+ 4. **Per-stage timing**: every job manifest records stage durations and details.
15
+ 5. **Artifacts are explicit**: stem WAV, reconstruction WAV, MIDI, sample WAVs, ZIP, and manifest are written per run.
16
+ 6. **Legacy preservation**: old Gradio apps remain available under `legacy/` without being active.
17
+ 7. **New near-realtime path**: `online_preview` clustering gives a practical alternative to all-pairs batch clustering.
18
 
19
+ ## Main risks
20
 
21
+ 1. **Interactive editing is missing**: users can inspect outputs but cannot correct onsets or cluster decisions in the UI yet.
22
+ 2. **Job state is process-local**: active jobs disappear from memory on restart; completed history is recovered from manifests only.
23
+ 3. **Progress is stage-level**: Demucs and clustering do not expose fine-grained progress.
24
+ 4. **Frontend is plain JavaScript**: good for speed, weaker for long-term maintainability than TypeScript modules/tests.
25
+ 5. **Demucs cost remains dominant**: source separation is necessarily offline; disk cache mitigates repeated runs but not first-run latency.
26
+ 6. **DSP code is dense**: `sample_extractor.py` is effective but would benefit from smaller modules and stronger tests.
27
 
28
+ ## Development decisions made
29
 
30
+ | Decision | Rationale |
  |---|---|
+ | Replace Gradio with FastAPI/static UI | More control over workflow, layout, artifacts, and progress display. |
+ | Keep no-build frontend for now | Fastest robust replacement; avoids adding Node/Vite just to ship the first custom UI. |
+ | Preserve Gradio in `legacy/` | Avoids data loss and gives reference behavior. |
+ | Add `pipeline_runner.py` | Keeps API orchestration separate from DSP primitives. |
+ | Add disk cache in pipeline layer | Avoids invasive Demucs changes and caches both full mix and stems. |
+ | Add `online_preview` rather than replacing batch clustering | Preserves final-quality path while adding a near-realtime option. |
38
+
39
+ ## Current implementation quality
40
+
41
+ | Area | Rating | Notes |
+ |---|---:|---|
+ | Extraction functionality | Good | Core path works on synthetic tests. |
+ | UI/UX foundation | Good | Custom flow is much better than generic Gradio controls. |
+ | Realtime architecture | Partial | Online clustering exists; streaming onset/audio pipeline does not. |
+ | Documentation | Good | Feature/task/progress/API/timing docs are now embedded. |
+ | Test coverage | Basic | Smoke tests exist; no formal unit/browser tests yet. |
+ | Maintainability | Medium | Better boundaries now, but DSP module remains dense. |
49
+
50
+ ## Recommendation
51
+
52
+ Next development should not add more global parameters. It should add an editing loop: audition detected hits, manually fix bad onsets, merge/split clusters, relabel samples, then repack from edited state without rerunning expensive stages.
docs/REMAINING_WORK.md CHANGED
@@ -1,27 +1,35 @@
1
  # Remaining work
2
 
3
- ## Highest value next steps
4
 
5
- 1. **Online clustering mode**: add prototype-based incremental clustering for immediate feedback, while keeping agglomerative clustering as the final-quality batch mode.
6
- 2. **Run history**: index `.runs/*/output/manifest.json` so prior runs are browsable and comparable in the UI.
7
- 3. **Waveform editing**: add hit audition, onset adjustment, cluster merge/split, and label reassignment.
8
- 4. **Demucs caching**: persist stem cache on disk by input digest + model + stem + shifts + overlap.
9
- 5. **True progress reporting**: expose lower-level progress inside Demucs and pairwise clustering, not only stage transitions.
10
- 6. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
11
- 7. **Frontend test harness**: move the no-build UI to TypeScript once the interaction model stabilizes.
12
 
13
  ## Known constraints
14
 
15
- - Demucs is not a realtime stage and should stay explicitly offline.
16
- - Agglomerative clustering is a batch algorithm; it should not be sold as realtime.
17
  - First run on a fresh environment can be slower due to imports, model download, and library initialization.
18
  - The current job queue is process-local and single-worker. That is fine for local use, but not enough for a shared public deployment.
 
19
 
20
  ## Suggested implementation order
21
 
22
- 1. Add disk cache for source decode/stem separation.
23
- 2. Add run history index and UI browser.
24
- 3. Add hit audition from `overview.onsets` and sample rows.
25
- 4. Implement online prototype clustering.
26
- 5. Add comparison mode between two job manifests.
27
- 6. Add SSE log/progress streaming.
 
 
1
  # Remaining work
2
 
3
+ Last updated: 2026-05-12
4
 
5
+ ## Current gap assessment
6
+
7
+ The project is now a usable extraction workstation, not a complete interactive sample editor. The largest remaining gaps are UX/editor capabilities rather than core batch extraction.
8
+
9
+ ## Highest-priority remaining gaps
10
+
11
+ 1. **Hit audition and selection**: clicking an onset marker or sample row should audition that exact hit/sample.
12
+ 2. **Waveform editing**: add onset adjustment, delete/add hit, and rerun-from-edited-onsets without redoing Demucs.
13
+ 3. **Cluster editing**: allow merge, split, relabel, and manual reassignment of hits.
14
+ 4. **Run comparison**: compare two manifests side-by-side for parameter tuning.
15
+ 5. **Progress streaming**: replace polling or supplement it with SSE for lower-latency logs/progress.
16
+ 6. **Frontend engineering hardening**: migrate the frontend to TypeScript after the UX stabilizes and add browser-level tests.
17
+ 7. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
18
 
19
  ## Known constraints
20
 
21
+ - Demucs is not a realtime stage and should stay explicitly offline/cached.
22
+ - Batch agglomerative clustering is not realtime; `online_preview` is the progressive clustering path.
23
  - First run on a fresh environment can be slower due to imports, model download, and library initialization.
24
  - The current job queue is process-local and single-worker. That is fine for local use, but not enough for a shared public deployment.
25
+ - Run history is filesystem-backed via `.runs/`; deleting `.runs/` deletes history.
26
 
27
  ## Suggested implementation order
28
 
29
+ 1. Add click-to-audition for sample table rows and waveform onsets.
30
+ 2. Store detected hit snippets as individual review artifacts or expose ranged audio endpoints.
31
+ 3. Add edit state to manifests: deleted hits, shifted onsets, labels, cluster overrides.
32
+ 4. Add rerender/repack endpoint that starts from edited hit/cluster state.
33
+ 5. Add run comparison view.
34
+ 6. Add SSE progress streaming.
35
+ 7. Convert frontend to TypeScript and add UI tests.
docs/TASKS.md ADDED
@@ -0,0 +1,54 @@
1
+ # Task ledger
2
+
3
+ Last updated: 2026-05-12
4
+
5
+ ## User-requested tasks
6
+
7
+ | Task | Status | Evidence |
+ |---|---:|---|
+ | Review the project | Done | `docs/PROJECT_REVIEW.md`. |
+ | Determine length of significant subprocesses | Done | `pipeline_runner.py`, `scripts/benchmark_subprocesses.py`, `docs/benchmark-subprocesses.json`, `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
+ | Identify near-realtime subprocesses | Done | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
+ | Add documentation to project | Done | `docs/*.md`, updated `README.md`. |
+ | Replace Gradio UI | Done | Active app is FastAPI + custom web UI; Gradio moved to `legacy/`. |
+ | Document features, tasks, and progress | Done | `docs/FEATURES.md`, this file, `docs/PROGRESS.md`. |
+ | Continue development while keeping docs up-to-date | In progress | This pass adds run history, disk cache, online clustering mode, and docs updates. |
16
+
17
+ ## Completed implementation tasks
18
+
19
+ - [x] Preserve old Gradio apps in `legacy/`.
20
+ - [x] Expose extraction as a FastAPI job API.
21
+ - [x] Serve a custom browser UI from `web/`.
22
+ - [x] Add per-stage timing to the pipeline.
23
+ - [x] Write per-run `manifest.json`.
24
+ - [x] Add synthetic benchmark script.
25
+ - [x] Add API documentation.
26
+ - [x] Add UI replacement documentation.
27
+ - [x] Add project review and realtime analysis documentation.
28
+ - [x] Add run-history listing endpoint: `GET /api/jobs`.
29
+ - [x] Add run-history UI panel.
30
+ - [x] Add disk cache for stem/full-mix loads.
31
+ - [x] Extend cache clearing to disk cache.
32
+ - [x] Add prototype-based `online_preview` clustering mode.
33
+ - [x] Add UI controls for clustering mode and disk cache.
34
+ - [x] Fix duplicate sample writes in `build_archive`.
35
+ - [x] Add feature, task, and progress docs.
36
+
37
+ ## Validation tasks
38
+
39
+ - [x] Python compile check for active Python files.
40
+ - [x] FastAPI smoke test for health/config/job flow.
41
+ - [x] Pipeline smoke test on synthetic audio.
42
+ - [x] API history/cache smoke test.
43
+ - [x] Git status reviewed before packaging.
44
+ - [x] Project archive excludes `.runs/`, `.cache/`, and dependency folders.
45
+
46
+ ## Remaining high-value tasks
47
+
48
+ - [ ] Add click-to-audition for onset markers and table rows.
49
+ - [ ] Add onset adjustment and rerun-from-onsets flow.
50
+ - [ ] Add cluster merge/split/relabel workflow.
51
+ - [ ] Add side-by-side run comparison.
52
+ - [ ] Add SSE progress stream for lower-latency updates.
53
+ - [ ] Convert frontend to TypeScript with a small Vite build once UX stabilizes.
54
+ - [ ] Add automated browser-level UI tests.
docs/UI_REPLACEMENT.md CHANGED
@@ -1,26 +1,30 @@
1
  # Custom UI replacement
2
 
 
 
3
  ## What changed
4
 
5
- The active interface is now a custom browser UI served from `web/` by the FastAPI app in `app.py`. The old Gradio files were moved to `legacy/`.
6
 
7
  ## UX goals
8
 
9
  1. Make the process feel like a sample-extraction workstation, not a generic notebook form.
10
- 2. Keep upload, controls, pipeline status, logs, waveform review, audio previews, downloads, and sample rows visible without tab hunting.
11
  3. Show stage timing as a first-class result, because extraction quality and speed tradeoffs matter.
12
  4. Make `stem=all` obvious for fast iteration when Demucs is unnecessary.
13
- 5. Keep the frontend deployable without a JavaScript build step.
 
14
 
15
  ## UI structure
16
 
17
  | Area | Purpose |
18
  |---|---|
19
- | Hero/status | Backend readiness and product framing |
20
- | Source panel | Drag/drop upload and source audio preview |
21
- | Controls panel | Stem, onset, clustering, MIDI, and synthesis parameters |
22
- | Pipeline panel | Stage statuses, durations, and live logs |
23
- | Result panel | Summary, waveform/onsets, downloads, stem/reconstruction audio, sample table |
 
24
 
25
  ## Frontend implementation
26
 
@@ -32,11 +36,11 @@ Files:
32
 
33
  The frontend uses modern browser APIs directly:
34
 
35
- - `fetch` for API calls
36
- - `FormData` for upload
37
- - `<audio>` for previews
38
- - `<canvas>` for waveform/onset visualization
39
- - CSS grid, responsive layout, custom properties, and backdrop filters for layout/polish
40
 
41
  No Gradio runtime, iframe, or generated UI framework is involved.
42
 
@@ -50,6 +54,17 @@ The frontend creates a job with `POST /api/jobs`, then polls `GET /api/jobs/{id}
50
  - reconstruction WAV
51
  - individual sample WAVs
52
 
 
 
 
 
 
 
 
 
 
 
 
53
  ## Why polling instead of websockets/SSE
54
 
55
  Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
@@ -62,5 +77,5 @@ Future improvement: use Server-Sent Events for lower-latency log streaming once
62
  - Add inline controls for reassigning sample labels and merging/splitting clusters.
63
  - Add A/B comparison between parameter runs.
64
  - Add downloadable timing report per job.
65
- - Add persistent run history browser for `.runs/`.
66
- - Add online clustering mode for near-realtime progressive preview.
 
1
  # Custom UI replacement
2
 
3
+ Last updated: 2026-05-12
4
+
5
  ## What changed
6
 
7
+ The active interface is a custom browser UI served from `web/` by the FastAPI app in `app.py`. The old Gradio files live in `legacy/` and are no longer used by the active application.
8
 
9
  ## UX goals
10
 
11
  1. Make the process feel like a sample-extraction workstation, not a generic notebook form.
12
+ 2. Keep upload, controls, pipeline status, logs, waveform review, audio previews, downloads, run history, and sample rows visible without tab hunting.
13
  3. Show stage timing as a first-class result, because extraction quality and speed tradeoffs matter.
14
  4. Make `stem=all` obvious for fast iteration when Demucs is unnecessary.
15
+ 5. Make `online_preview` obvious as the near-realtime clustering path.
16
+ 6. Keep the frontend deployable without a JavaScript build step until the interaction model stabilizes.
17
 
18
  ## UI structure
19
 
20
  | Area | Purpose |
21
  |---|---|
22
+ | Hero/status | Backend readiness and product framing. |
23
+ | Source panel | Drag/drop upload and source audio preview. |
24
+ | Controls panel | Stem, onset, clustering, MIDI, synthesis, and disk-cache parameters. |
25
+ | Pipeline panel | Stage statuses, durations, and logs. |
26
+ | Run history panel | Loads completed manifests from `.runs/`. |
27
+ | Result panel | Summary, waveform/onsets, downloads, stem/reconstruction audio, sample table. |
28
 
29
  ## Frontend implementation
30
 
 
36
 
37
  The frontend uses modern browser APIs directly:
38
 
39
+ - `fetch` for API calls.
40
+ - `FormData` for upload.
41
+ - `<audio>` for previews.
42
+ - `<canvas>` for waveform/onset visualization.
43
+ - CSS grid, responsive layout, custom properties, and backdrop filters for layout/polish.
44
 
45
  No Gradio runtime, iframe, or generated UI framework is involved.
46
 
 
54
  - reconstruction WAV
55
  - individual sample WAVs
56
 
57
+ The run history panel calls `GET /api/jobs` and can reload any completed manifest still present under `.runs/`.
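A minimal sketch of how a filesystem-backed history listing could work, assuming each run lives in its own directory under `.runs/` with a `manifest.json` inside. The function name `list_completed_runs`, the directory layout, and the returned field names are hypothetical; the source only states that `GET /api/jobs` lists completed manifests still present under `.runs/`.

```python
import json
from pathlib import Path


def list_completed_runs(runs_dir):
    """Return one entry per readable manifest under runs_dir, newest first.

    Assumed layout (not confirmed by the source): <runs_dir>/<job_id>/manifest.json
    """
    runs = []
    for manifest_path in Path(runs_dir).glob("*/manifest.json"):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Skip manifests that are unreadable or only partially written.
            continue
        runs.append(
            {
                "job_id": manifest_path.parent.name,
                "modified": manifest_path.stat().st_mtime,
                "manifest": manifest,
            }
        )
    # Newest runs first, so the history panel shows recent work at the top.
    runs.sort(key=lambda run: run["modified"], reverse=True)
    return runs
```

Because the listing is derived from the directory contents on every call, deleting `.runs/` empties the history without any extra bookkeeping.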
58
+
59
+ ## Clustering UX
60
+
61
+ Two modes are exposed:
62
+
63
+ | Mode | UX intent |
64
+ |---|---|
65
+ | `batch_quality` | Slower, final-quality clustering using all-pairs similarity plus agglomerative clustering. |
66
+ | `online_preview` | Faster near-realtime-style clustering using prototype assignment. Best for quick iteration after bypassing Demucs. |
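To illustrate why prototype assignment is cheaper than all-pairs clustering, here is a rough sketch of the general technique. It is not the project's implementation: the pipeline's stage label mentions mel fingerprints and transient NCC, whereas this sketch substitutes cosine similarity on plain feature vectors, and the `threshold` value and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_online(features, threshold=0.9):
    """Assign each vector to its most similar prototype, or start a new cluster.

    Each hit is compared against the current prototypes only, so the cost is
    O(hits * clusters) rather than the O(hits^2) of an all-pairs similarity matrix.
    """
    prototypes = []  # running mean vector per cluster
    counts = []      # number of members per cluster
    labels = []
    for vec in features:
        best_idx, best_sim = -1, -1.0
        for idx, proto in enumerate(prototypes):
            sim = cosine_similarity(vec, proto)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Fold the new member into the prototype as an incremental mean.
            counts[best_idx] += 1
            n = counts[best_idx]
            prototypes[best_idx] = [
                p + (v - p) / n for p, v in zip(prototypes[best_idx], vec)
            ]
            labels.append(best_idx)
        else:
            prototypes.append(list(vec))
            counts.append(1)
            labels.append(len(prototypes) - 1)
    return labels
```

A single pass like this depends on input order and never revisits earlier assignments, which is one plausible reason an online mode is positioned as a preview rather than a final-quality result.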
67
+
68
  ## Why polling instead of websockets/SSE
69
 
70
  Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
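The polling loop described above can be sketched as follows. This is a Python stand-in for the browser logic rather than the project's actual frontend code: `fetch_status` is an injected callable representing the `GET /api/jobs/{id}` request, and the job-level `status` field with terminal values `done` and `error` is an assumption.

```python
import time


def poll_job(fetch_status, job_id, interval_sec=0.8, timeout_sec=600.0, sleep=time.sleep):
    """Poll a job until it reaches a terminal state, then return the last payload.

    fetch_status(job_id) must return a dict; the "status" key and its values
    are assumed here, not taken from the documented API.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        payload = fetch_status(job_id)
        if payload.get("status") in ("done", "error"):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_sec} s")
        # 0.8 s mirrors the 800 ms interval the UI uses between requests.
        sleep(interval_sec)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a running server, and a fixed interval is adequate when progress only changes at stage boundaries.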
 
77
  - Add inline controls for reassigning sample labels and merging/splitting clusters.
78
  - Add A/B comparison between parameter runs.
79
  - Add downloadable timing report per job.
80
+ - Add filters/search to the run history browser.
81
+ - Convert the frontend to TypeScript when the UX stops moving quickly.
docs/benchmark-online-preview.json ADDED
@@ -0,0 +1,273 @@
1
+ {
2
+ "clustering_mode": "online_preview",
3
+ "runs": [
4
+ {
5
+ "pattern": "rock",
6
+ "bars": 2,
7
+ "bpm": 120.0,
8
+ "run_index": 0,
9
+ "clustering_mode": "online_preview",
10
+ "audio_duration_sec": 4.75,
11
+ "total_duration_sec": 2.394493,
12
+ "realtime_factor": 0.504104,
13
+ "hit_count": 14,
14
+ "cluster_count": 10,
15
+ "stages": [
16
+ {
17
+ "key": "stem",
18
+ "label": "Stem extraction / source load",
19
+ "duration_sec": 0.01333964500008733,
20
+ "status": "done",
21
+ "detail": "loaded full mix \u00b7 cached"
22
+ },
23
+ {
24
+ "key": "bpm",
25
+ "label": "Tempo detection",
26
+ "duration_sec": 0.18073730900005103,
27
+ "status": "done",
28
+ "detail": "120.2 BPM"
29
+ },
30
+ {
31
+ "key": "onsets",
32
+ "label": "Onset detection + slicing",
33
+ "duration_sec": 1.8083914959997855,
34
+ "status": "done",
35
+ "detail": "14 hits"
36
+ },
37
+ {
38
+ "key": "classification",
39
+ "label": "Spectral rule classification",
40
+ "duration_sec": 0.015553790000012668,
41
+ "status": "done",
42
+ "detail": "bright:5, hihat_open:8, kick:1"
43
+ },
44
+ {
45
+ "key": "clustering",
46
+ "label": "Mel fingerprint + transient NCC clustering",
47
+ "duration_sec": 0.01717499700021108,
48
+ "status": "done",
49
+ "detail": "10 clusters \u00b7 online preview"
50
+ },
51
+ {
52
+ "key": "selection",
53
+ "label": "Best representative scoring",
54
+ "duration_sec": 0.06853683399981492,
55
+ "status": "done",
56
+ "detail": "quality-scored representatives"
57
+ },
58
+ {
59
+ "key": "synthesis",
60
+ "label": "Optional sample synthesis",
61
+ "duration_sec": 0.0004338460000781197,
62
+ "status": "done",
63
+ "detail": "2 synthesized alternates"
64
+ },
65
+ {
66
+ "key": "export",
67
+ "label": "MIDI, reconstruction, WAV, ZIP export",
68
+ "duration_sec": 0.2898033520000354,
69
+ "status": "done",
70
+ "detail": "10 WAVs + MIDI + ZIP"
71
+ }
72
+ ]
73
+ },
74
+ {
75
+ "pattern": "funk",
76
+ "bars": 2,
77
+ "bpm": 120.0,
78
+ "run_index": 0,
79
+ "clustering_mode": "online_preview",
80
+ "audio_duration_sec": 4.874989,
81
+ "total_duration_sec": 2.422223,
82
+ "realtime_factor": 0.496867,
83
+ "hit_count": 30,
84
+ "cluster_count": 12,
85
+ "stages": [
86
+ {
87
+ "key": "stem",
88
+ "label": "Stem extraction / source load",
89
+ "duration_sec": 0.012654803000032189,
90
+ "status": "done",
91
+ "detail": "loaded full mix \u00b7 cached"
92
+ },
93
+ {
94
+ "key": "bpm",
95
+ "label": "Tempo detection",
96
+ "duration_sec": 0.10868702200014013,
97
+ "status": "done",
98
+ "detail": "120.2 BPM"
99
+ },
100
+ {
101
+ "key": "onsets",
102
+ "label": "Onset detection + slicing",
103
+ "duration_sec": 1.7981390029999602,
104
+ "status": "done",
105
+ "detail": "30 hits"
106
+ },
107
+ {
108
+ "key": "classification",
109
+ "label": "Spectral rule classification",
110
+ "duration_sec": 0.020911717999979373,
111
+ "status": "done",
112
+ "detail": "bright:12, cymbal:2, hihat_closed:9, hihat_open:3, kick:1, mid:3"
113
+ },
114
+ {
115
+ "key": "clustering",
116
+ "label": "Mel fingerprint + transient NCC clustering",
117
+ "duration_sec": 0.08173960800013447,
118
+ "status": "done",
119
+ "detail": "12 clusters \u00b7 online preview"
120
+ },
121
+ {
122
+ "key": "selection",
123
+ "label": "Best representative scoring",
124
+ "duration_sec": 0.18588780100003532,
125
+ "status": "done",
126
+ "detail": "quality-scored representatives"
127
+ },
128
+ {
129
+ "key": "synthesis",
130
+ "label": "Optional sample synthesis",
131
+ "duration_sec": 0.001146163000157685,
132
+ "status": "done",
133
+ "detail": "6 synthesized alternates"
134
+ },
135
+ {
136
+ "key": "export",
137
+ "label": "MIDI, reconstruction, WAV, ZIP export",
138
+ "duration_sec": 0.21253995300003226,
139
+ "status": "done",
140
+ "detail": "12 WAVs + MIDI + ZIP"
141
+ }
142
+ ]
143
+ },
144
+ {
145
+ "pattern": "halftime",
146
+ "bars": 2,
147
+ "bpm": 120.0,
148
+ "run_index": 0,
149
+ "clustering_mode": "online_preview",
150
+ "audio_duration_sec": 4.874989,
151
+ "total_duration_sec": 2.406563,
152
+ "realtime_factor": 0.493655,
153
+ "hit_count": 28,
154
+ "cluster_count": 12,
155
+ "stages": [
156
+ {
157
+ "key": "stem",
158
+ "label": "Stem extraction / source load",
159
+ "duration_sec": 0.009107656999958635,
160
+ "status": "done",
161
+ "detail": "loaded full mix \u00b7 cached"
162
+ },
163
+ {
164
+ "key": "bpm",
165
+ "label": "Tempo detection",
166
+ "duration_sec": 0.19882379599994238,
167
+ "status": "done",
168
+ "detail": "118.8 BPM"
169
+ },
170
+ {
171
+ "key": "onsets",
172
+ "label": "Onset detection + slicing",
173
+ "duration_sec": 1.8942657120001059,
174
+ "status": "done",
175
+ "detail": "28 hits"
176
+ },
177
+ {
178
+ "key": "classification",
179
+ "label": "Spectral rule classification",
180
+ "duration_sec": 0.015083428000025378,
181
+ "status": "done",
182
+ "detail": "bright:5, cymbal:2, hihat_closed:19, hihat_open:2"
183
+ },
184
+ {
185
+ "key": "clustering",
186
+ "label": "Mel fingerprint + transient NCC clustering",
187
+ "duration_sec": 0.036892447000127504,
188
+ "status": "done",
189
+ "detail": "12 clusters \u00b7 online preview"
190
+ },
191
+ {
192
+ "key": "selection",
193
+ "label": "Best representative scoring",
194
+ "duration_sec": 0.0908485570000721,
195
+ "status": "done",
196
+ "detail": "quality-scored representatives"
197
+ },
198
+ {
199
+ "key": "synthesis",
200
+ "label": "Optional sample synthesis",
201
+ "duration_sec": 0.0007993310000529164,
202
+ "status": "done",
203
+ "detail": "4 synthesized alternates"
204
+ },
205
+ {
206
+ "key": "export",
207
+ "label": "MIDI, reconstruction, WAV, ZIP export",
208
+ "duration_sec": 0.1602465889998257,
209
+ "status": "done",
210
+ "detail": "12 WAVs + MIDI + ZIP"
211
+ }
212
+ ]
213
+ }
214
+ ],
215
+ "summary": [
216
+ {
217
+ "stage": "stem",
218
+ "mean_sec": 0.011701,
219
+ "median_sec": 0.012655,
220
+ "min_sec": 0.009108,
221
+ "max_sec": 0.01334
222
+ },
223
+ {
224
+ "stage": "bpm",
225
+ "mean_sec": 0.162749,
226
+ "median_sec": 0.180737,
227
+ "min_sec": 0.108687,
228
+ "max_sec": 0.198824
229
+ },
230
+ {
231
+ "stage": "onsets",
232
+ "mean_sec": 1.833599,
233
+ "median_sec": 1.808391,
234
+ "min_sec": 1.798139,
235
+ "max_sec": 1.894266
236
+ },
237
+ {
238
+ "stage": "classification",
239
+ "mean_sec": 0.017183,
240
+ "median_sec": 0.015554,
241
+ "min_sec": 0.015083,
242
+ "max_sec": 0.020912
243
+ },
244
+ {
245
+ "stage": "clustering",
246
+ "mean_sec": 0.045269,
247
+ "median_sec": 0.036892,
248
+ "min_sec": 0.017175,
249
+ "max_sec": 0.08174
250
+ },
251
+ {
252
+ "stage": "selection",
253
+ "mean_sec": 0.115091,
254
+ "median_sec": 0.090849,
255
+ "min_sec": 0.068537,
256
+ "max_sec": 0.185888
257
+ },
258
+ {
259
+ "stage": "synthesis",
260
+ "mean_sec": 0.000793,
261
+ "median_sec": 0.000799,
262
+ "min_sec": 0.000434,
263
+ "max_sec": 0.001146
264
+ },
265
+ {
266
+ "stage": "export",
267
+ "mean_sec": 0.220863,
268
+ "median_sec": 0.21254,
269
+ "min_sec": 0.160247,
270
+ "max_sec": 0.289803
271
+ }
272
+ ]
273
+ }
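The figures in this file are consistent with `realtime_factor` being `total_duration_sec / audio_duration_sec` and with each `summary` entry being the mean, median, minimum, and maximum of that stage's `duration_sec` across runs, rounded to six decimals. The sketch below reproduces those numbers under that assumption. The helper names are hypothetical and this is not the code in `scripts/benchmark_subprocesses.py`.

```python
import statistics


def realtime_factor(total_duration_sec, audio_duration_sec):
    """Processing time divided by audio length; below 1.0 means faster than realtime."""
    return round(total_duration_sec / audio_duration_sec, 6)


def summarize_stage(durations):
    """Aggregate one stage's per-run durations the way the summary block appears to."""
    return {
        "mean_sec": round(statistics.mean(durations), 6),
        "median_sec": round(statistics.median(durations), 6),
        "min_sec": round(min(durations), 6),
        "max_sec": round(max(durations), 6),
    }


# "stem" stage durations from the rock, funk, and halftime runs above.
stem_durations = [0.01333964500008733, 0.012654803000032189, 0.009107656999958635]
print(summarize_stage(stem_durations))
print(realtime_factor(2.394493, 4.75))
```

With three runs per file, the median is simply the middle run, so a single slow run shifts the mean but not the median.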
docs/benchmark-subprocesses.json CHANGED
@@ -1,138 +1,141 @@
1
  {
 
2
  "runs": [
3
  {
4
  "pattern": "rock",
5
- "bars": 4,
6
  "bpm": 120.0,
7
  "run_index": 0,
8
- "audio_duration_sec": 8.75,
9
- "total_duration_sec": 2.594698,
10
- "realtime_factor": 0.296537,
11
- "hit_count": 28,
12
- "cluster_count": 1,
 
13
  "stages": [
14
  {
15
  "key": "stem",
16
  "label": "Stem extraction / source load",
17
- "duration_sec": 0.014633260999971753,
18
  "status": "done",
19
- "detail": "loaded full mix"
20
  },
21
  {
22
  "key": "bpm",
23
  "label": "Tempo detection",
24
- "duration_sec": 0.23692302500001006,
25
  "status": "done",
26
  "detail": "120.2 BPM"
27
  },
28
  {
29
  "key": "onsets",
30
  "label": "Onset detection + slicing",
31
- "duration_sec": 1.762329765000004,
32
  "status": "done",
33
- "detail": "28 hits"
34
  },
35
  {
36
  "key": "classification",
37
  "label": "Spectral rule classification",
38
- "duration_sec": 0.02908633100003044,
39
  "status": "done",
40
- "detail": "bright:9, cymbal:1, hihat_closed:1, hihat_open:15, mid:2"
41
  },
42
  {
43
  "key": "clustering",
44
  "label": "Mel fingerprint + transient NCC clustering",
45
- "duration_sec": 0.05944011799999771,
46
  "status": "done",
47
- "detail": "1 clusters"
48
  },
49
  {
50
  "key": "selection",
51
  "label": "Best representative scoring",
52
- "duration_sec": 0.31093429700001707,
53
  "status": "done",
54
  "detail": "quality-scored representatives"
55
  },
56
  {
57
  "key": "synthesis",
58
  "label": "Optional sample synthesis",
59
- "duration_sec": 0.0028187070000171843,
60
  "status": "done",
61
- "detail": "1 synthesized alternates"
62
  },
63
  {
64
  "key": "export",
65
  "label": "MIDI, reconstruction, WAV, ZIP export",
66
- "duration_sec": 0.1779485609999938,
67
  "status": "done",
68
- "detail": "1 WAVs + MIDI + ZIP"
69
  }
70
  ]
71
  },
72
  {
73
  "pattern": "funk",
74
- "bars": 4,
75
  "bpm": 120.0,
76
  "run_index": 0,
77
- "audio_duration_sec": 8.874989,
78
- "total_duration_sec": 3.790648,
79
- "realtime_factor": 0.427116,
80
- "hit_count": 53,
 
81
  "cluster_count": 2,
82
  "stages": [
83
  {
84
  "key": "stem",
85
  "label": "Stem extraction / source load",
86
- "duration_sec": 0.009321340000042255,
87
  "status": "done",
88
- "detail": "loaded full mix"
89
  },
90
  {
91
  "key": "bpm",
92
  "label": "Tempo detection",
93
- "duration_sec": 0.23110938799999303,
94
  "status": "done",
95
  "detail": "161.5 BPM"
96
  },
97
  {
98
  "key": "onsets",
99
  "label": "Onset detection + slicing",
100
- "duration_sec": 2.1605432889999747,
101
  "status": "done",
102
- "detail": "53 hits"
103
  },
104
  {
105
  "key": "classification",
106
  "label": "Spectral rule classification",
107
- "duration_sec": 0.04475730899997643,
108
  "status": "done",
109
- "detail": "bright:25, hihat_closed:18, hihat_open:7, mid:3"
110
  },
111
  {
112
  "key": "clustering",
113
  "label": "Mel fingerprint + transient NCC clustering",
114
- "duration_sec": 0.6768225310000275,
115
  "status": "done",
116
- "detail": "2 clusters"
117
  },
118
  {
119
  "key": "selection",
120
  "label": "Best representative scoring",
121
- "duration_sec": 0.559724416999984,
122
  "status": "done",
123
  "detail": "quality-scored representatives"
124
  },
125
  {
126
  "key": "synthesis",
127
  "label": "Optional sample synthesis",
128
- "duration_sec": 0.0024601989999837315,
129
  "status": "done",
130
  "detail": "2 synthesized alternates"
131
  },
132
  {
133
  "key": "export",
134
  "label": "MIDI, reconstruction, WAV, ZIP export",
135
- "duration_sec": 0.10532420399999864,
136
  "status": "done",
137
  "detail": "2 WAVs + MIDI + ZIP"
138
  }
@@ -140,337 +143,131 @@
140
  },
141
  {
142
  "pattern": "halftime",
143
- "bars": 4,
144
  "bpm": 120.0,
145
  "run_index": 0,
146
- "audio_duration_sec": 8.874989,
147
- "total_duration_sec": 3.701891,
148
- "realtime_factor": 0.417115,
149
- "hit_count": 66,
150
- "cluster_count": 2,
151
- "stages": [
152
- {
153
- "key": "stem",
154
- "label": "Stem extraction / source load",
155
- "duration_sec": 0.009298575000002529,
156
- "status": "done",
157
- "detail": "loaded full mix"
158
- },
159
- {
160
- "key": "bpm",
161
- "label": "Tempo detection",
162
- "duration_sec": 0.21581650399997443,
163
- "status": "done",
164
- "detail": "120.2 BPM"
165
- },
166
- {
167
- "key": "onsets",
168
- "label": "Onset detection + slicing",
169
- "duration_sec": 1.9768937550000487,
170
- "status": "done",
171
- "detail": "66 hits"
172
- },
173
- {
174
- "key": "classification",
175
- "label": "Spectral rule classification",
176
- "duration_sec": 0.03783250899999757,
177
- "status": "done",
178
- "detail": "bright:11, cymbal:2, hihat_closed:48, hihat_open:5"
179
- },
180
- {
181
- "key": "clustering",
182
- "label": "Mel fingerprint + transient NCC clustering",
183
- "duration_sec": 0.7498706449999872,
184
- "status": "done",
185
- "detail": "2 clusters"
186
- },
187
- {
188
- "key": "selection",
189
- "label": "Best representative scoring",
190
- "duration_sec": 0.6169061510000233,
191
- "status": "done",
192
- "detail": "quality-scored representatives"
193
- },
194
- {
195
- "key": "synthesis",
196
- "label": "Optional sample synthesis",
197
- "duration_sec": 0.0028750459999855593,
198
- "status": "done",
199
- "detail": "2 synthesized alternates"
200
- },
201
- {
202
- "key": "export",
203
- "label": "MIDI, reconstruction, WAV, ZIP export",
204
- "duration_sec": 0.09185817900004167,
205
- "status": "done",
206
- "detail": "2 WAVs + MIDI + ZIP"
207
- }
208
- ]
209
- },
210
- {
211
- "pattern": "rock",
212
- "bars": 4,
213
- "bpm": 120.0,
214
- "run_index": 1,
215
- "audio_duration_sec": 8.75,
216
- "total_duration_sec": 2.848686,
217
- "realtime_factor": 0.325564,
218
- "hit_count": 24,
219
- "cluster_count": 1,
220
- "stages": [
221
- {
222
- "key": "stem",
223
- "label": "Stem extraction / source load",
224
- "duration_sec": 0.03869248300003392,
225
- "status": "done",
226
- "detail": "loaded full mix"
227
- },
228
- {
229
- "key": "bpm",
230
- "label": "Tempo detection",
231
- "duration_sec": 0.24107510999999704,
232
- "status": "done",
233
- "detail": "120.2 BPM"
234
- },
235
- {
236
- "key": "onsets",
237
- "label": "Onset detection + slicing",
238
- "duration_sec": 2.0721967459999746,
239
- "status": "done",
240
- "detail": "24 hits"
241
- },
242
- {
243
- "key": "classification",
244
- "label": "Spectral rule classification",
245
- "duration_sec": 0.024016725000024053,
246
- "status": "done",
247
- "detail": "bright:7, hihat_closed:2, hihat_open:15"
248
- },
249
- {
250
- "key": "clustering",
251
- "label": "Mel fingerprint + transient NCC clustering",
252
- "duration_sec": 0.05910233800000242,
253
- "status": "done",
254
- "detail": "1 clusters"
255
- },
256
- {
257
- "key": "selection",
258
- "label": "Best representative scoring",
259
- "duration_sec": 0.3106304350000073,
260
- "status": "done",
261
- "detail": "quality-scored representatives"
262
- },
263
- {
264
- "key": "synthesis",
265
- "label": "Optional sample synthesis",
266
- "duration_sec": 0.0015013799999792354,
267
- "status": "done",
268
- "detail": "1 synthesized alternates"
269
- },
270
- {
271
- "key": "export",
272
- "label": "MIDI, reconstruction, WAV, ZIP export",
273
- "duration_sec": 0.10095534999999245,
274
- "status": "done",
275
- "detail": "1 WAVs + MIDI + ZIP"
276
- }
277
- ]
278
- },
279
- {
280
- "pattern": "funk",
281
- "bars": 4,
282
- "bpm": 120.0,
283
- "run_index": 1,
284
- "audio_duration_sec": 8.874989,
285
- "total_duration_sec": 3.416797,
286
- "realtime_factor": 0.384992,
287
- "hit_count": 52,
288
  "cluster_count": 3,
289
  "stages": [
290
  {
291
  "key": "stem",
292
  "label": "Stem extraction / source load",
293
- "duration_sec": 0.011181277999980921,
294
  "status": "done",
295
- "detail": "loaded full mix"
296
  },
297
  {
298
  "key": "bpm",
299
  "label": "Tempo detection",
300
- "duration_sec": 0.20633040499996014,
301
  "status": "done",
302
  "detail": "120.2 BPM"
303
  },
304
  {
305
  "key": "onsets",
306
  "label": "Onset detection + slicing",
307
- "duration_sec": 1.9962494719999881,
308
  "status": "done",
309
- "detail": "52 hits"
310
  },
311
  {
312
  "key": "classification",
313
  "label": "Spectral rule classification",
314
- "duration_sec": 0.03461634600000707,
315
  "status": "done",
316
- "detail": "bright:23, cymbal:3, hihat_closed:15, hihat_open:8, mid:3"
317
  },
318
  {
319
  "key": "clustering",
320
  "label": "Mel fingerprint + transient NCC clustering",
321
- "duration_sec": 0.51767344000001,
322
  "status": "done",
323
- "detail": "3 clusters"
324
  },
325
  {
326
  "key": "selection",
327
  "label": "Best representative scoring",
328
- "duration_sec": 0.5431782379999959,
329
  "status": "done",
330
  "detail": "quality-scored representatives"
331
  },
332
  {
333
  "key": "synthesis",
334
  "label": "Optional sample synthesis",
335
- "duration_sec": 0.001988787999948727,
336
  "status": "done",
337
  "detail": "3 synthesized alternates"
338
  },
339
  {
340
  "key": "export",
341
  "label": "MIDI, reconstruction, WAV, ZIP export",
342
- "duration_sec": 0.10504587100001572,
343
  "status": "done",
344
  "detail": "3 WAVs + MIDI + ZIP"
345
  }
346
  ]
347
- },
348
- {
349
- "pattern": "halftime",
350
- "bars": 4,
351
- "bpm": 120.0,
352
- "run_index": 1,
353
- "audio_duration_sec": 8.874989,
354
- "total_duration_sec": 4.750472,
355
- "realtime_factor": 0.535265,
356
- "hit_count": 64,
357
- "cluster_count": 1,
358
- "stages": [
359
- {
360
- "key": "stem",
361
- "label": "Stem extraction / source load",
362
- "duration_sec": 0.016472632999978032,
363
- "status": "done",
364
- "detail": "loaded full mix"
365
- },
366
- {
367
- "key": "bpm",
368
- "label": "Tempo detection",
369
- "duration_sec": 0.2141354419999857,
370
- "status": "done",
371
- "detail": "120.2 BPM"
372
- },
373
- {
374
- "key": "onsets",
375
- "label": "Onset detection + slicing",
376
- "duration_sec": 2.8706004370000073,
377
- "status": "done",
378
- "detail": "64 hits"
379
- },
380
- {
381
- "key": "classification",
382
- "label": "Spectral rule classification",
383
- "duration_sec": 0.036172296999950504,
384
- "status": "done",
385
- "detail": "bright:11, cymbal:2, hihat_closed:45, hihat_open:4, mid:2"
386
- },
387
- {
388
- "key": "clustering",
389
- "label": "Mel fingerprint + transient NCC clustering",
390
- "duration_sec": 0.9130003360000387,
391
- "status": "done",
392
- "detail": "1 clusters"
393
- },
394
- {
395
- "key": "selection",
396
- "label": "Best representative scoring",
397
- "duration_sec": 0.6508792970000172,
398
- "status": "done",
399
- "detail": "quality-scored representatives"
400
- },
401
- {
402
- "key": "synthesis",
403
- "label": "Optional sample synthesis",
404
- "duration_sec": 0.0025003810000043813,
405
- "status": "done",
406
- "detail": "1 synthesized alternates"
407
- },
408
- {
409
- "key": "export",
410
- "label": "MIDI, reconstruction, WAV, ZIP export",
411
- "duration_sec": 0.04621197200003735,
412
- "status": "done",
413
- "detail": "1 WAVs + MIDI + ZIP"
414
- }
415
- ]
416
  }
417
  ],
418
  "summary": [
419
  {
420
  "stage": "stem",
421
- "mean_sec": 0.0166,
422
- "median_sec": 0.012907,
423
- "min_sec": 0.009299,
424
- "max_sec": 0.038692
425
  },
426
  {
427
  "stage": "bpm",
428
- "mean_sec": 0.224232,
429
- "median_sec": 0.223463,
430
- "min_sec": 0.20633,
431
- "max_sec": 0.241075
432
  },
433
  {
434
  "stage": "onsets",
435
- "mean_sec": 2.139802,
436
- "median_sec": 2.034223,
437
- "min_sec": 1.76233,
438
- "max_sec": 2.8706
439
  },
440
  {
441
  "stage": "classification",
442
- "mean_sec": 0.034414,
443
- "median_sec": 0.035394,
444
- "min_sec": 0.024017,
445
- "max_sec": 0.044757
446
  },
447
  {
448
  "stage": "clustering",
449
- "mean_sec": 0.495985,
450
- "median_sec": 0.597248,
451
- "min_sec": 0.059102,
452
- "max_sec": 0.913
453
  },
454
  {
455
  "stage": "selection",
456
- "mean_sec": 0.498709,
457
- "median_sec": 0.551451,
458
- "min_sec": 0.31063,
459
- "max_sec": 0.650879
460
  },
461
  {
462
  "stage": "synthesis",
463
- "mean_sec": 0.002357,
464
- "median_sec": 0.00248,
465
- "min_sec": 0.001501,
466
- "max_sec": 0.002875
467
  },
468
  {
469
  "stage": "export",
470
- "mean_sec": 0.104557,
471
- "median_sec": 0.103001,
472
- "min_sec": 0.046212,
473
- "max_sec": 0.177949
474
  }
475
  ]
476
  }
 
1
  {
2
+ "clustering_mode": "batch_quality",
3
  "runs": [
4
  {
5
  "pattern": "rock",
6
+ "bars": 2,
7
  "bpm": 120.0,
8
  "run_index": 0,
9
+ "clustering_mode": "batch_quality",
10
+ "audio_duration_sec": 4.75,
11
+ "total_duration_sec": 2.416794,
12
+ "realtime_factor": 0.508799,
13
+ "hit_count": 14,
14
+ "cluster_count": 7,
15
  "stages": [
16
  {
17
  "key": "stem",
18
  "label": "Stem extraction / source load",
19
+ "duration_sec": 0.011517213000161064,
20
  "status": "done",
21
+ "detail": "loaded full mix \u00b7 cached"
22
  },
23
  {
24
  "key": "bpm",
25
  "label": "Tempo detection",
26
+ "duration_sec": 0.19438482000009571,
27
  "status": "done",
28
  "detail": "120.2 BPM"
29
  },
30
  {
31
  "key": "onsets",
32
  "label": "Onset detection + slicing",
33
+ "duration_sec": 1.8062190609998652,
34
  "status": "done",
35
+ "detail": "14 hits"
36
  },
37
  {
38
  "key": "classification",
39
  "label": "Spectral rule classification",
40
+ "duration_sec": 0.016392102000054365,
41
  "status": "done",
42
+ "detail": "bright:5, hihat_closed:1, hihat_open:7, kick:1"
43
  },
44
  {
45
  "key": "clustering",
46
  "label": "Mel fingerprint + transient NCC clustering",
47
+ "duration_sec": 0.07352871200009758,
48
  "status": "done",
49
+ "detail": "7 clusters \u00b7 batch quality"
50
  },
51
  {
52
  "key": "selection",
53
  "label": "Best representative scoring",
54
+ "duration_sec": 0.096273950000068,
55
  "status": "done",
56
  "detail": "quality-scored representatives"
57
  },
58
  {
59
  "key": "synthesis",
60
  "label": "Optional sample synthesis",
61
+ "duration_sec": 0.0006992359999458131,
62
  "status": "done",
63
+ "detail": "2 synthesized alternates"
64
  },
65
  {
66
  "key": "export",
67
  "label": "MIDI, reconstruction, WAV, ZIP export",
68
+ "duration_sec": 0.2172303219999776,
69
  "status": "done",
70
+ "detail": "7 WAVs + MIDI + ZIP"
71
  }
72
  ]
73
  },
74
  {
75
  "pattern": "funk",
76
+ "bars": 2,
77
  "bpm": 120.0,
78
  "run_index": 0,
79
+ "clustering_mode": "batch_quality",
80
+ "audio_duration_sec": 4.874989,
81
+ "total_duration_sec": 2.99188,
82
+ "realtime_factor": 0.61372,
83
+ "hit_count": 35,
84
  "cluster_count": 2,
85
  "stages": [
86
  {
87
  "key": "stem",
88
  "label": "Stem extraction / source load",
89
+ "duration_sec": 0.010077079999973648,
90
  "status": "done",
91
+ "detail": "loaded full mix \u00b7 cached"
92
  },
93
  {
94
  "key": "bpm",
95
  "label": "Tempo detection",
96
+ "duration_sec": 0.17334403699987888,
97
  "status": "done",
98
  "detail": "161.5 BPM"
99
  },
100
  {
101
  "key": "onsets",
102
  "label": "Onset detection + slicing",
103
+ "duration_sec": 2.1082552409998243,
104
  "status": "done",
105
+ "detail": "35 hits"
106
  },
107
  {
108
  "key": "classification",
109
  "label": "Spectral rule classification",
110
+ "duration_sec": 0.021269321000090713,
111
  "status": "done",
112
+ "detail": "bright:14, cymbal:1, hihat_closed:14, hihat_open:3, kick:1, mid:2"
113
  },
114
  {
115
  "key": "clustering",
116
  "label": "Mel fingerprint + transient NCC clustering",
117
+ "duration_sec": 0.26927052900009585,
118
  "status": "done",
119
+ "detail": "2 clusters \u00b7 batch quality"
120
  },
121
  {
122
  "key": "selection",
123
  "label": "Best representative scoring",
124
+ "duration_sec": 0.31629775500005053,
125
  "status": "done",
126
  "detail": "quality-scored representatives"
127
  },
128
  {
129
  "key": "synthesis",
130
  "label": "Optional sample synthesis",
131
+ "duration_sec": 0.0011716779999915161,
132
  "status": "done",
133
  "detail": "2 synthesized alternates"
134
  },
135
  {
136
  "key": "export",
137
  "label": "MIDI, reconstruction, WAV, ZIP export",
138
+ "duration_sec": 0.09167172899992693,
139
  "status": "done",
140
  "detail": "2 WAVs + MIDI + ZIP"
141
  }
 
143
  },
144
  {
145
  "pattern": "halftime",
146
+ "bars": 2,
147
  "bpm": 120.0,
148
  "run_index": 0,
149
+ "clustering_mode": "batch_quality",
150
+ "audio_duration_sec": 4.874989,
151
+ "total_duration_sec": 2.597859,
152
+ "realtime_factor": 0.532895,
153
+ "hit_count": 23,
154
  "cluster_count": 3,
155
  "stages": [
156
  {
157
  "key": "stem",
158
  "label": "Stem extraction / source load",
159
+ "duration_sec": 0.012474630000042453,
160
  "status": "done",
161
+ "detail": "loaded full mix \u00b7 cached"
162
  },
163
  {
164
  "key": "bpm",
165
  "label": "Tempo detection",
166
+ "duration_sec": 0.18858063699985905,
167
  "status": "done",
168
  "detail": "120.2 BPM"
169
  },
170
  {
171
  "key": "onsets",
172
  "label": "Onset detection + slicing",
173
+ "duration_sec": 1.9154837959999895,
174
  "status": "done",
175
+ "detail": "23 hits"
176
  },
177
  {
178
  "key": "classification",
179
  "label": "Spectral rule classification",
180
+ "duration_sec": 0.0188920179998604,
181
  "status": "done",
182
+ "detail": "bright:3, hihat_closed:17, hihat_open:3"
183
  },
184
  {
185
  "key": "clustering",
186
  "label": "Mel fingerprint + transient NCC clustering",
187
+ "duration_sec": 0.10195718500017392,
188
  "status": "done",
189
+ "detail": "3 clusters \u00b7 batch quality"
190
  },
191
  {
192
  "key": "selection",
193
  "label": "Best representative scoring",
194
+ "duration_sec": 0.19837312200002089,
195
  "status": "done",
196
  "detail": "quality-scored representatives"
197
  },
198
  {
199
  "key": "synthesis",
200
  "label": "Optional sample synthesis",
201
+ "duration_sec": 0.0011928339999940363,
202
  "status": "done",
203
  "detail": "3 synthesized alternates"
204
  },
205
  {
206
  "key": "export",
207
  "label": "MIDI, reconstruction, WAV, ZIP export",
208
+ "duration_sec": 0.1603816869999264,
209
  "status": "done",
210
  "detail": "3 WAVs + MIDI + ZIP"
211
  }
212
  ]
213
  }
214
  ],
215
  "summary": [
216
  {
217
  "stage": "stem",
218
+ "mean_sec": 0.011356,
219
+ "median_sec": 0.011517,
220
+ "min_sec": 0.010077,
221
+ "max_sec": 0.012475
222
  },
223
  {
224
  "stage": "bpm",
225
+ "mean_sec": 0.185436,
226
+ "median_sec": 0.188581,
227
+ "min_sec": 0.173344,
228
+ "max_sec": 0.194385
229
  },
230
  {
231
  "stage": "onsets",
232
+ "mean_sec": 1.943319,
233
+ "median_sec": 1.915484,
234
+ "min_sec": 1.806219,
235
+ "max_sec": 2.108255
236
  },
237
  {
238
  "stage": "classification",
239
+ "mean_sec": 0.018851,
240
+ "median_sec": 0.018892,
241
+ "min_sec": 0.016392,
242
+ "max_sec": 0.021269
243
  },
244
  {
245
  "stage": "clustering",
246
+ "mean_sec": 0.148252,
247
+ "median_sec": 0.101957,
248
+ "min_sec": 0.073529,
249
+ "max_sec": 0.269271
250
  },
251
  {
252
  "stage": "selection",
253
+ "mean_sec": 0.203648,
254
+ "median_sec": 0.198373,
255
+ "min_sec": 0.096274,
256
+ "max_sec": 0.316298
257
  },
258
  {
259
  "stage": "synthesis",
260
+ "mean_sec": 0.001021,
261
+ "median_sec": 0.001172,
262
+ "min_sec": 0.000699,
263
+ "max_sec": 0.001193
264
  },
265
  {
266
  "stage": "export",
267
+ "mean_sec": 0.156428,
268
+ "median_sec": 0.160382,
269
+ "min_sec": 0.091672,
270
+ "max_sec": 0.21723
271
  }
272
  ]
273
  }
pipeline_runner.py CHANGED
@@ -3,6 +3,7 @@
3
 
4
  from __future__ import annotations
5
 
 
6
  import json
7
  import os
8
  import shutil
@@ -23,6 +24,7 @@ from sample_extractor import (
23
  build_archive,
24
  classify_hits,
25
  cluster_hits,
 
26
  detect_bpm,
27
  detect_onsets,
28
  export_midi,
@@ -53,12 +55,14 @@ class PipelineParams:
53
  attack_ms: float = 25.0
54
  mel_threshold: float = 0.75
55
  linkage: str = "average"
 
56
  target_min: int = 5
57
  target_max: int = 20
58
  synthesize: bool = True
59
  quantize_midi: bool = True
60
  subdivision: int = 16
61
  device: str = "cpu"
 
62
 
63
  @classmethod
64
  def from_mapping(cls, data: dict[str, Any] | None) -> "PipelineParams":
@@ -81,6 +85,8 @@ class PipelineParams:
81
  raise ValueError(f"Unsupported onset mode: {self.onset_mode}")
82
  if self.linkage not in {"average", "complete", "single"}:
83
  raise ValueError(f"Unsupported clustering linkage: {self.linkage}")
 
 
84
  if not 0 <= self.demucs_shifts <= 8:
85
  raise ValueError("demucs_shifts must be between 0 and 8")
86
  if not 0.0 <= self.demucs_overlap <= 0.9:
@@ -185,11 +191,66 @@ def _normalise_audio(audio: np.ndarray) -> np.ndarray:
185
  return audio.astype(np.float32)
186
 
187
 
 
 
 
 
 
 
188
  def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_24") -> None:
189
  path.parent.mkdir(parents=True, exist_ok=True)
190
  sf.write(path, audio.astype(np.float32), sr, subtype=subtype)
191
 
192
 
193
  def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int = 1600) -> dict[str, Any]:
194
  if len(audio) == 0:
195
  return {"sample_rate": sr, "duration_sec": 0, "envelope": [], "onsets": []}
@@ -250,16 +311,9 @@ def run_extraction_pipeline(
250
  _notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
251
 
252
  with _timed_stage(stages, "stem", progress_cb) as stage:
253
- stem_audio, stem_sr = extract_stem(
254
- str(audio_path),
255
- stem=params.stem,
256
- device=params.device,
257
- model_name=params.demucs_model,
258
- shifts=int(params.demucs_shifts),
259
- overlap=float(params.demucs_overlap),
260
- )
261
  stem_audio = _normalise_audio(stem_audio)
262
- stage.detail = f"{params.stem} via {params.demucs_model}" if params.stem != "all" else "loaded full mix"
263
  _write_audio(out / "stem.wav", stem_audio, stem_sr, subtype="PCM_16")
264
 
265
  audio_duration_sec = len(stem_audio) / stem_sr if stem_sr else 0.0
@@ -291,21 +345,34 @@ def run_extraction_pipeline(
291
  stage.detail = ", ".join(f"{key}:{value}" for key, value in sorted(counts.items()))
292
 
293
  with _timed_stage(stages, "clustering", progress_cb) as stage:
294
- clusters = cluster_hits(
295
- hits,
296
- audio=stem_audio,
297
- sr=stem_sr,
298
- ncc_threshold=float(params.ncc_threshold),
299
- attack_ms=float(params.attack_ms),
300
- mel_threshold=float(params.mel_threshold),
301
- target_min=int(params.target_min),
302
- target_max=int(params.target_max),
303
- linkage=params.linkage,
304
- )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
305
  for cluster in clusters:
306
  for hit in cluster.hits:
307
  hit.cluster_id = cluster.cluster_id
308
- stage.detail = f"{len(clusters)} clusters"
309
 
310
  with _timed_stage(stages, "selection", progress_cb) as stage:
311
  select_best(clusters)
 
3
 
4
  from __future__ import annotations
5
 
6
+ import hashlib
7
  import json
8
  import os
9
  import shutil
 
24
  build_archive,
25
  classify_hits,
26
  cluster_hits,
27
+ cluster_hits_online,
28
  detect_bpm,
29
  detect_onsets,
30
  export_midi,
 
55
  attack_ms: float = 25.0
56
  mel_threshold: float = 0.75
57
  linkage: str = "average"
58
+ clustering_mode: str = "batch_quality"
59
  target_min: int = 5
60
  target_max: int = 20
61
  synthesize: bool = True
62
  quantize_midi: bool = True
63
  subdivision: int = 16
64
  device: str = "cpu"
65
+ use_disk_cache: bool = True
66
 
67
  @classmethod
68
  def from_mapping(cls, data: dict[str, Any] | None) -> "PipelineParams":
 
85
  raise ValueError(f"Unsupported onset mode: {self.onset_mode}")
86
  if self.linkage not in {"average", "complete", "single"}:
87
  raise ValueError(f"Unsupported clustering linkage: {self.linkage}")
88
+ if self.clustering_mode not in {"batch_quality", "online_preview"}:
89
+ raise ValueError(f"Unsupported clustering mode: {self.clustering_mode}")
90
  if not 0 <= self.demucs_shifts <= 8:
91
  raise ValueError("demucs_shifts must be between 0 and 8")
92
  if not 0.0 <= self.demucs_overlap <= 0.9:
 
      return audio.astype(np.float32)


+ MODULE_ROOT = Path(__file__).resolve().parent
+ CACHE_DIR = Path(os.environ["DSE_CACHE_DIR"]) if os.environ.get("DSE_CACHE_DIR") else MODULE_ROOT / ".cache"
+ STEM_CACHE_DIR = CACHE_DIR / "stems"
+ CACHE_VERSION = "dse-cache-v2"
+
+
  def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_24") -> None:
      path.parent.mkdir(parents=True, exist_ok=True)
      sf.write(path, audio.astype(np.float32), sr, subtype=subtype)


+ def _sha256_file(path: str | os.PathLike[str]) -> str:
+     h = hashlib.sha256()
+     with Path(path).open("rb") as handle:
+         for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+             h.update(chunk)
+     return h.hexdigest()
+
+
+ def _stem_cache_path(audio_path: str | os.PathLike[str], params: PipelineParams) -> Path:
+     key_payload = {
+         "version": CACHE_VERSION,
+         "source_sha256": _sha256_file(audio_path),
+         "stem": params.stem,
+         "demucs_model": params.demucs_model,
+         "demucs_shifts": params.demucs_shifts,
+         "demucs_overlap": params.demucs_overlap,
+         "device": params.device if params.stem != "all" else "decode",
+     }
+     key = hashlib.sha256(json.dumps(key_payload, sort_keys=True).encode("utf-8")).hexdigest()
+     return STEM_CACHE_DIR / f"{key}.wav"
+
+
+ def clear_disk_cache() -> None:
+     if CACHE_DIR.exists():
+         shutil.rmtree(CACHE_DIR)
+
+
+ def _load_or_extract_stem(audio_path: str | os.PathLike[str], params: PipelineParams) -> tuple[np.ndarray, int, str]:
+     if params.use_disk_cache:
+         cache_path = _stem_cache_path(audio_path, params)
+         if cache_path.exists():
+             audio, sr = sf.read(cache_path, dtype="float32", always_2d=False)
+             return np.asarray(audio, dtype=np.float32), int(sr), f"{params.stem} disk-cache hit"
+     audio, sr = extract_stem(
+         str(audio_path),
+         stem=params.stem,
+         device=params.device,
+         model_name=params.demucs_model,
+         shifts=int(params.demucs_shifts),
+         overlap=float(params.demucs_overlap),
+     )
+     detail = f"{params.stem} via {params.demucs_model}" if params.stem != "all" else "loaded full mix"
+     if params.use_disk_cache:
+         cache_path = _stem_cache_path(audio_path, params)
+         _write_audio(cache_path, audio, sr, subtype="PCM_16")
+         detail += " · cached"
+     return audio, sr, detail
+
+
  def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int = 1600) -> dict[str, Any]:
      if len(audio) == 0:
          return {"sample_rate": sr, "duration_sec": 0, "envelope": [], "onsets": []}
 
311
  _notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
312
 
313
  with _timed_stage(stages, "stem", progress_cb) as stage:
314
+ stem_audio, stem_sr, stem_detail = _load_or_extract_stem(audio_path, params)
 
 
 
 
 
 
 
315
  stem_audio = _normalise_audio(stem_audio)
316
+ stage.detail = stem_detail
317
  _write_audio(out / "stem.wav", stem_audio, stem_sr, subtype="PCM_16")
318
 
319
  audio_duration_sec = len(stem_audio) / stem_sr if stem_sr else 0.0
 
345
  stage.detail = ", ".join(f"{key}:{value}" for key, value in sorted(counts.items()))
346
 
347
  with _timed_stage(stages, "clustering", progress_cb) as stage:
348
+ if params.clustering_mode == "online_preview":
349
+ clusters = cluster_hits_online(
350
+ hits,
351
+ audio=stem_audio,
352
+ sr=stem_sr,
353
+ ncc_threshold=float(params.ncc_threshold),
354
+ attack_ms=float(params.attack_ms),
355
+ mel_threshold=float(params.mel_threshold),
356
+ target_min=int(params.target_min),
357
+ target_max=int(params.target_max),
358
+ )
359
+ stage.detail = f"{len(clusters)} clusters · online preview"
360
+ else:
361
+ clusters = cluster_hits(
362
+ hits,
363
+ audio=stem_audio,
364
+ sr=stem_sr,
365
+ ncc_threshold=float(params.ncc_threshold),
366
+ attack_ms=float(params.attack_ms),
367
+ mel_threshold=float(params.mel_threshold),
368
+ target_min=int(params.target_min),
369
+ target_max=int(params.target_max),
370
+ linkage=params.linkage,
371
+ )
372
+ stage.detail = f"{len(clusters)} clusters · batch quality"
373
  for cluster in clusters:
374
  for hit in cluster.hits:
375
  hit.cluster_id = cluster.cluster_id
 
376
 
377
  with _timed_stage(stages, "selection", progress_cb) as stage:
378
  select_best(clusters)
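
Review note on the stem cache in this file: `_stem_cache_path` keys each cached stem on the SHA-256 of the source file plus the parameters that influence extraction, serialized with `sort_keys=True`. The idea can be sketched in isolation as below. This is only an illustration under assumptions: `cache_key`, its arguments, and the `"v1"` default are hypothetical names for the sketch and are not part of the commit.

```python
from __future__ import annotations

import hashlib
import json


def cache_key(source_bytes: bytes, params: dict, version: str = "v1") -> str:
    """Return a deterministic hex key for (source content, parameters).

    Hypothetical helper for illustration; the commit hashes the file on disk
    in 1 MiB chunks instead of taking the bytes directly.
    """
    payload = {
        "version": version,
        "source_sha256": hashlib.sha256(source_bytes).hexdigest(),
        "params": params,
    }
    # sort_keys=True makes the JSON text independent of dict insertion order,
    # so logically equal parameter sets always map to the same key.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    print(cache_key(b"fake audio bytes", {"stem": "drums", "shifts": 1}))
```

Changing the version string changes every key, which is presumably why the commit carries a `CACHE_VERSION` constant: bumping it orphans old cache entries without having to delete them first.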
sample_extractor.py CHANGED
@@ -267,6 +267,120 @@ def _merge_singletons(clusters, sim_matrix, hits, merge_ratio=2.0):
267
  for i,c in enumerate(multi): c.cluster_id = i
268
  return multi
269
 
270
  def cluster_hits(hits, audio=None, sr=44100, ncc_threshold=0.80, attack_ms=25.0,
271
  mel_threshold=0.75, target_min=0, target_max=0,
272
  linkage='average', merge_singletons=True):
 
      for i,c in enumerate(multi): c.cluster_id = i
      return multi

+
+ def _cosine(a, b):
+     """Fast cosine similarity for normalized or unnormalized one-dimensional vectors."""
+     n = min(len(a), len(b))
+     if n <= 0:
+         return 0.0
+     av = a[:n]
+     bv = b[:n]
+     denom = float(np.linalg.norm(av) * np.linalg.norm(bv))
+     if denom < 1e-8:
+         return 0.0
+     return float(np.dot(av, bv) / denom)
+
+
+ def _retitle_clusters(clusters):
+     """Sort, re-index, and make labels stable after incremental assignment."""
+     clusters.sort(key=lambda c: c.count, reverse=True)
+     seen = defaultdict(int)
+     for i, c in enumerate(clusters):
+         c.cluster_id = i
+         majority = defaultdict(int)
+         for hit in c.hits:
+             majority[hit.label] += 1
+         base = max(majority, key=majority.get) if majority else c.label.rsplit('_', 1)[0]
+         suffix = seen[base]
+         seen[base] += 1
+         c.label = f"{base}_{suffix}"
+     return clusters
+
+
+ def cluster_hits_online(hits, audio=None, sr=44100, ncc_threshold=0.72, attack_ms=25.0,
+                         mel_threshold=0.62, target_min=0, target_max=0,
+                         max_clusters=0):
+     """Prototype-based online clustering for near-realtime previews.
+
+     The batch algorithm builds an all-pairs matrix and then runs agglomerative
+     clustering. This mode instead processes hits in onset order and compares
+     each new hit only against current cluster prototypes. Complexity is roughly
+     O(number_of_hits × number_of_clusters), so it can update progressively while
+     audio is being analyzed. It is intentionally a preview/final-fast algorithm,
+     not a replacement for the highest-quality batch pass.
+     """
+     if not hits:
+         return []
+     if len(hits) == 1:
+         return [Cluster(cluster_id=0, label=f"{hits[0].label}_0", hits=[hits[0]])]
+     if audio is None:
+         audio = np.concatenate([h.audio for h in hits])
+
+     cap = int(max_clusters or target_max or 0)
+     if cap <= 0:
+         cap = max(1, min(len(hits), int(target_min or 16)))
+     cap = max(1, min(cap, len(hits)))
+
+     print(f"[Cluster:online] {len(hits)} hits, cap={cap}, attack={attack_ms}ms")
+     ordered = sorted(hits, key=lambda h: h.onset_time)
+     clusters = []
+     proto_fp = []
+     proto_tr = []
+     proto_energy = []
+
+     for hit in ordered:
+         fp = _mel_fingerprint(audio, sr, hit.onset_time)
+         tr = _extract_transient(audio, sr, hit.onset_time, attack_ms)
+         best_idx = -1
+         best_score = -1.0
+         best_mel = 0.0
+         best_ncc = 0.0
+         for i, cluster in enumerate(clusters):
+             # Prefer same broad class when possible, but do not make it mandatory.
+             label_bonus = 0.05 if cluster.label.startswith(hit.label + "_") else 0.0
+             mel = _cosine(fp, proto_fp[i])
+             if mel < mel_threshold:
+                 continue
+             ncc = _transient_ncc(tr, proto_tr[i])
+             score = (0.45 * mel) + (0.55 * ncc) + label_bonus
+             if score > best_score:
+                 best_idx, best_score, best_mel, best_ncc = i, score, mel, ncc
+
+         should_create = best_idx < 0 or (best_ncc < ncc_threshold and best_score < ncc_threshold)
+         if should_create and len(clusters) < cap:
+             cluster = Cluster(cluster_id=len(clusters), label=f"{hit.label}_{len(clusters)}", hits=[hit])
+             clusters.append(cluster)
+             proto_fp.append(fp)
+             proto_tr.append(tr)
+             proto_energy.append(max(hit.rms_energy, 1e-8))
+             continue
+
+         if best_idx < 0:
+             # Cap reached and no good match: assign to the nearest prototype by mel.
+             similarities = [_cosine(fp, existing) for existing in proto_fp]
+             best_idx = int(np.argmax(similarities))
+         cluster = clusters[best_idx]
+         cluster.hits.append(hit)
+
+         # Energy-weighted rolling prototype update; keeps loud clean hits dominant.
+         w_old = proto_energy[best_idx]
+         w_new = max(hit.rms_energy, 1e-8)
+         total = w_old + w_new
+         max_len = max(len(proto_fp[best_idx]), len(fp))
+         old_fp = np.pad(proto_fp[best_idx], (0, max_len - len(proto_fp[best_idx])))
+         new_fp = np.pad(fp, (0, max_len - len(fp)))
+         proto_fp[best_idx] = ((old_fp * w_old) + (new_fp * w_new)) / total
+         max_tr = max(len(proto_tr[best_idx]), len(tr))
+         old_tr = np.pad(proto_tr[best_idx], (0, max_tr - len(proto_tr[best_idx])))
+         new_tr = np.pad(tr, (0, max_tr - len(tr)))
+         proto_tr[best_idx] = ((old_tr * w_old) + (new_tr * w_new)) / total
+         proto_energy[best_idx] = total
+
+     clusters = _retitle_clusters(clusters)
+     for c in clusters:
+         print(f" {c.label}: {c.count}")
+     return clusters
+
  def cluster_hits(hits, audio=None, sr=44100, ncc_threshold=0.80, attack_ms=25.0,
                   mel_threshold=0.75, target_min=0, target_max=0,
                   linkage='average', merge_singletons=True):
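
Review note on `cluster_hits_online`: the core idea is a single pass in which each item is compared only against the current cluster prototypes, then either opens a new cluster (if nothing is similar enough and the cap allows it) or is merged into the best prototype with a weighted average. The sketch below is a deliberately reduced version of that idea, assuming plain Python lists, equal-length vectors, and one cosine score instead of the commit's combined mel/NCC score and label bonus. `online_cluster` and its parameters are hypothetical names, not functions from the commit.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 for empty or near-zero vectors."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    denom = math.sqrt(sum(x * x for x in a[:n])) * math.sqrt(sum(y * y for y in b[:n]))
    return dot / denom if denom >= 1e-8 else 0.0


def online_cluster(vectors, weights, threshold=0.9, cap=4):
    """Assign each vector to a prototype in one pass; return member indices per cluster."""
    cap = max(1, cap)
    protos: list[list[float]] = []   # running weighted-mean prototype per cluster
    totals: list[float] = []         # accumulated weight per cluster
    members: list[list[int]] = []    # input indices assigned to each cluster
    for idx, (vec, w) in enumerate(zip(vectors, weights)):
        sims = [cosine(vec, p) for p in protos]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        if (best < 0 or sims[best] < threshold) and len(protos) < cap:
            protos.append(list(vec))
            totals.append(w)
            members.append([idx])
            continue
        # Either a good match exists or the cap is reached: merge into the
        # nearest prototype and update it as a weighted mean.
        total = totals[best] + w
        protos[best] = [(p * totals[best] + v * w) / total for p, v in zip(protos[best], vec)]
        totals[best] = total
        members[best].append(idx)
    return members


if __name__ == "__main__":
    demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(online_cluster(demo, [1.0] * 4))
```

Because each item is compared against at most `cap` prototypes, the work grows with hits × clusters rather than hits², which matches the complexity claim in the docstring. The trade-off is order dependence: the result can differ if the same items arrive in a different order.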
scripts/benchmark_subprocesses.py CHANGED
@@ -24,19 +24,20 @@ from sample_extractor import cache_clear
24
  from synth_generator import generate_test_song
25
 
26
 
27
- def run_case(pattern: str, bars: int, bpm: float, run_index: int) -> dict:
28
  tmp = Path(tempfile.mkdtemp(prefix="dse-bench-"))
29
  song = generate_test_song(pattern_name=pattern, bars=bars, bpm=bpm, add_bass=False, seed=42 + run_index)
30
  src = tmp / f"{pattern}-{bars}bars.wav"
31
  sf.write(src, song.drums_only, song.sr)
32
  cache_clear()
33
- params = PipelineParams(stem="all", target_min=4, target_max=12, synthesize=True)
34
  result = run_extraction_pipeline(src, tmp / "out", params)
35
  return {
36
  "pattern": pattern,
37
  "bars": bars,
38
  "bpm": bpm,
39
  "run_index": run_index,
 
40
  "audio_duration_sec": result.audio_duration_sec,
41
  "total_duration_sec": result.duration_sec,
42
  "realtime_factor": result.realtime_factor,
@@ -52,15 +53,16 @@ def main() -> int:
52
  parser.add_argument("--bars", type=int, default=4)
53
  parser.add_argument("--bpm", type=float, default=120.0)
54
  parser.add_argument("--output", default="docs/benchmark-subprocesses.json")
 
55
  args = parser.parse_args()
56
 
57
  # Warm imports/JIT and discard the result.
58
- run_case("rock", 1, args.bpm, -1)
59
 
60
  rows = []
61
  for run_index in range(args.runs):
62
  for pattern in ["rock", "funk", "halftime"]:
63
- rows.append(run_case(pattern, args.bars, args.bpm, run_index))
64
 
65
  stage_keys = [stage["key"] for stage in rows[0]["stages"]]
66
  summary = []
@@ -74,7 +76,7 @@ def main() -> int:
74
  "max_sec": round(max(values), 6),
75
  })
76
 
77
- payload = {"runs": rows, "summary": summary}
78
  out = Path(args.output)
79
  out.parent.mkdir(parents=True, exist_ok=True)
80
  out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
 
24
  from synth_generator import generate_test_song
25
 
26
 
27
+ def run_case(pattern: str, bars: int, bpm: float, run_index: int, clustering_mode: str) -> dict:
28
  tmp = Path(tempfile.mkdtemp(prefix="dse-bench-"))
29
  song = generate_test_song(pattern_name=pattern, bars=bars, bpm=bpm, add_bass=False, seed=42 + run_index)
30
  src = tmp / f"{pattern}-{bars}bars.wav"
31
  sf.write(src, song.drums_only, song.sr)
32
  cache_clear()
33
+ params = PipelineParams(stem="all", clustering_mode=clustering_mode, target_min=4, target_max=12, synthesize=True)
34
  result = run_extraction_pipeline(src, tmp / "out", params)
35
  return {
36
  "pattern": pattern,
37
  "bars": bars,
38
  "bpm": bpm,
39
  "run_index": run_index,
40
+ "clustering_mode": clustering_mode,
41
  "audio_duration_sec": result.audio_duration_sec,
42
  "total_duration_sec": result.duration_sec,
43
  "realtime_factor": result.realtime_factor,
 
53
  parser.add_argument("--bars", type=int, default=4)
54
  parser.add_argument("--bpm", type=float, default=120.0)
55
  parser.add_argument("--output", default="docs/benchmark-subprocesses.json")
56
+ parser.add_argument("--clustering-mode", choices=["batch_quality", "online_preview"], default="batch_quality")
57
  args = parser.parse_args()
58
 
59
  # Warm imports/JIT and discard the result.
60
+ run_case("rock", 1, args.bpm, -1, args.clustering_mode)
61
 
62
  rows = []
63
  for run_index in range(args.runs):
64
  for pattern in ["rock", "funk", "halftime"]:
65
+ rows.append(run_case(pattern, args.bars, args.bpm, run_index, args.clustering_mode))
66
 
67
  stage_keys = [stage["key"] for stage in rows[0]["stages"]]
68
  summary = []
 
76
  "max_sec": round(max(values), 6),
77
  })
78
 
79
+ payload = {"clustering_mode": args.clustering_mode, "runs": rows, "summary": summary}
80
  out = Path(args.output)
81
  out.parent.mkdir(parents=True, exist_ok=True)
82
  out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
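
Review note on the benchmark summary: the `summary` array in the JSON output carries `mean_sec`, `median_sec`, `min_sec`, and `max_sec` per stage, rounded to six decimals. Only the `max_sec` line of the aggregation is visible in this diff, so the following is a sketch of how such a summary could be produced, assuming the standard `statistics` module is used for the mean and median. `summarize_stages` is a hypothetical name for this illustration.

```python
from __future__ import annotations

import statistics


def summarize_stages(runs: list[dict]) -> list[dict]:
    """Aggregate per-stage durations across benchmark runs.

    Each run is expected to look like {"stages": [{"key": ..., "duration_sec": ...}, ...]}.
    """
    if not runs:
        return []
    # Stage order is taken from the first run, as in the script's stage_keys line.
    stage_keys = [stage["key"] for stage in runs[0]["stages"]]
    summary = []
    for key in stage_keys:
        values = [
            stage["duration_sec"]
            for run in runs
            for stage in run["stages"]
            if stage["key"] == key
        ]
        summary.append({
            "stage": key,
            "mean_sec": round(statistics.mean(values), 6),
            "median_sec": round(statistics.median(values), 6),
            "min_sec": round(min(values), 6),
            "max_sec": round(max(values), 6),
        })
    return summary


if __name__ == "__main__":
    sample = [
        {"stages": [{"key": "stem", "duration_sec": 0.5}]},
        {"stages": [{"key": "stem", "duration_sec": 1.5}]},
    ]
    print(summarize_stages(sample))
```

With only a handful of runs per pattern, the median is usually the more trustworthy figure, since a single slow first run can pull the mean noticeably upward.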
web/app.js CHANGED
@@ -1,16 +1,20 @@
1
  const $ = (id) => document.getElementById(id);
2
 
3
  const fields = [
4
- "stem", "demucs_model", "demucs_shifts", "demucs_overlap", "onset_mode", "onset_delta",
5
  "energy_threshold_db", "pre_pad", "min_dur", "max_dur", "min_gap", "ncc_threshold",
6
  "attack_ms", "mel_threshold", "linkage", "target_min", "target_max", "subdivision",
7
- "synthesize", "quantize_midi"
8
  ];
9
 
10
  let config = null;
11
  let selectedFile = null;
12
  let activePoll = null;
13
 
 
 
 
 
14
  function fmtSec(value) {
15
  if (value === null || value === undefined || Number.isNaN(Number(value))) return "—";
16
  const n = Number(value);
@@ -19,6 +23,11 @@ function fmtSec(value) {
19
  return `${n.toFixed(2)} s`;
20
  }
21
 
 
 
 
 
 
22
  function setHealth(ok, text, subtext) {
23
  $("healthDot").className = `status-dot ${ok ? "ok" : "bad"}`;
24
  $("healthText").textContent = text;
@@ -47,6 +56,7 @@ function setSelectOptions(select, values, labels = null) {
47
 
48
  function populateConfig() {
49
  setSelectOptions($("demucs_model"), config.demucs_models);
 
50
  const defaults = config.defaults;
51
  for (const field of fields) {
52
  const el = $(field);
@@ -80,9 +90,9 @@ function collectParams() {
80
 
81
  function renderStages(stages = []) {
82
  $("stageList").innerHTML = stages.map((stage) => `
83
- <div class="stage ${stage.status}" title="${stage.detail || ""}">
84
  <span class="badge"></span>
85
- <div><strong>${stage.label}</strong><small>${stage.detail || stage.status}</small></div>
86
  <time>${fmtSec(stage.duration_sec)}</time>
87
  </div>
88
  `).join("");
@@ -138,26 +148,27 @@ function drawWaveform(overview) {
138
  function renderResult(job) {
139
  const result = job.result;
140
  if (!result) return;
141
- const rtf = result.realtime_factor.toFixed(2);
142
- $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime`;
 
143
  drawWaveform(result.overview);
144
 
145
  const fileUrls = result.file_urls ?? {};
146
  const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
147
- $("downloads").innerHTML = Object.entries(fileUrls).map(([key, url]) => `<a href="${url}" download>${labels[key] ?? key}</a>`).join("");
148
  $("stemAudio").src = fileUrls.stem ?? "";
149
  $("reconAudio").src = fileUrls.reconstruction ?? "";
150
 
151
  const tbody = $("samplesTable").querySelector("tbody");
152
  tbody.innerHTML = (result.samples ?? []).map((sample) => `
153
  <tr>
154
- <td>${sample.label}</td>
155
- <td>${sample.classification}</td>
156
- <td>${sample.hits}</td>
157
- <td>${sample.score}</td>
158
- <td>${sample.duration_ms} ms</td>
159
- <td>${sample.first_onset_sec} s</td>
160
- <td><a href="${sample.url}" download>WAV</a></td>
161
  </tr>
162
  `).join("");
163
  }
@@ -173,6 +184,38 @@ function renderJob(job) {
173
  }
174
  }
175
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
176
  async function pollJob(id) {
177
  if (activePoll) clearInterval(activePoll);
178
  const tick = async () => {
@@ -183,6 +226,7 @@ async function pollJob(id) {
183
  clearInterval(activePoll);
184
  activePoll = null;
185
  $("runButton").disabled = !selectedFile;
 
186
  }
187
  } catch (error) {
188
  clearInterval(activePoll);
@@ -207,6 +251,7 @@ async function runExtraction() {
207
  const job = await api("/api/jobs", { method: "POST", body: form });
208
  renderJob(job);
209
  await pollJob(job.id);
 
210
  } catch (error) {
211
  $("runButton").disabled = false;
212
  $("resultSummary").textContent = error.message;
@@ -229,6 +274,7 @@ async function boot() {
229
  await api("/api/health");
230
  config = await api("/api/config");
231
  populateConfig();
 
232
  setHealth(true, "Ready", "Backend online");
233
  } catch (error) {
234
  setHealth(false, "Offline", error.message);
@@ -244,10 +290,20 @@ $("useFastButton").addEventListener("click", () => {
244
  $("target_min").value = 4;
245
  $("target_max").value = 16;
246
  });
 
 
 
 
 
 
 
 
 
 
247
  $("clearCacheButton").addEventListener("click", async () => {
248
  try {
249
  await api("/api/cache/clear", { method: "POST" });
250
- $("logs").textContent = "Pipeline cache cleared.";
251
  } catch (error) {
252
  $("logs").textContent = error.message;
253
  }
 
1
  const $ = (id) => document.getElementById(id);
2
 
3
  const fields = [
4
+ "stem", "demucs_model", "clustering_mode", "demucs_shifts", "demucs_overlap", "onset_mode", "onset_delta",
5
  "energy_threshold_db", "pre_pad", "min_dur", "max_dur", "min_gap", "ncc_threshold",
6
  "attack_ms", "mel_threshold", "linkage", "target_min", "target_max", "subdivision",
7
+ "synthesize", "quantize_midi", "use_disk_cache"
8
  ];
9
 
10
  let config = null;
11
  let selectedFile = null;
12
  let activePoll = null;
13
 
14
+ function esc(value) {
15
+ return String(value ?? "").replace(/[&<>'"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" }[c]));
16
+ }
17
+
18
  function fmtSec(value) {
19
  if (value === null || value === undefined || Number.isNaN(Number(value))) return "—";
20
  const n = Number(value);
 
23
  return `${n.toFixed(2)} s`;
24
  }
25
 
26
+ function fmtDate(epochSeconds) {
27
+ if (!epochSeconds) return "—";
28
+ return new Date(epochSeconds * 1000).toLocaleString(undefined, { dateStyle: "medium", timeStyle: "short" });
29
+ }
30
+
31
  function setHealth(ok, text, subtext) {
32
  $("healthDot").className = `status-dot ${ok ? "ok" : "bad"}`;
33
  $("healthText").textContent = text;
 
56
 
57
  function populateConfig() {
58
  setSelectOptions($("demucs_model"), config.demucs_models);
59
+ setSelectOptions($("clustering_mode"), Object.keys(config.clustering_modes ?? { batch_quality: "", online_preview: "" }), config.clustering_modes);
60
  const defaults = config.defaults;
61
  for (const field of fields) {
62
  const el = $(field);
 
90
 
91
  function renderStages(stages = []) {
92
  $("stageList").innerHTML = stages.map((stage) => `
93
+ <div class="stage ${esc(stage.status)}" title="${esc(stage.detail || "")}">
94
  <span class="badge"></span>
95
+ <div><strong>${esc(stage.label)}</strong><small>${esc(stage.detail || stage.status)}</small></div>
96
  <time>${fmtSec(stage.duration_sec)}</time>
97
  </div>
98
  `).join("");
 
148
  function renderResult(job) {
149
  const result = job.result;
150
  if (!result) return;
151
+ const rtf = Number(result.realtime_factor).toFixed(2);
152
+ const mode = result.params?.clustering_mode ?? "—";
153
+ $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime · ${mode}`;
154
  drawWaveform(result.overview);
155
 
156
  const fileUrls = result.file_urls ?? {};
157
  const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
158
+ $("downloads").innerHTML = Object.entries(fileUrls).map(([key, url]) => `<a href="${esc(url)}" download>${esc(labels[key] ?? key)}</a>`).join("");
159
  $("stemAudio").src = fileUrls.stem ?? "";
160
  $("reconAudio").src = fileUrls.reconstruction ?? "";
161
 
162
  const tbody = $("samplesTable").querySelector("tbody");
163
  tbody.innerHTML = (result.samples ?? []).map((sample) => `
164
  <tr>
165
+ <td>${esc(sample.label)}</td>
166
+ <td>${esc(sample.classification)}</td>
167
+ <td>${esc(sample.hits)}</td>
168
+ <td>${esc(sample.score)}</td>
169
+ <td>${esc(sample.duration_ms)} ms</td>
170
+ <td>${esc(sample.first_onset_sec)} s</td>
171
+ <td><a href="${esc(sample.url)}" download>WAV</a></td>
172
  </tr>
173
  `).join("");
174
  }
 
184
  }
185
  }
186
 
187
+ function renderHistory(payload) {
188
+ const rows = [...(payload.active ?? []), ...(payload.history ?? [])];
189
+ if (!rows.length) {
190
+ $("historyList").innerHTML = `<p class="empty">No completed runs yet.</p>`;
191
+ return;
192
+ }
193
+ $("historyList").innerHTML = rows.map((row) => `
194
+ <button class="history-row" type="button" data-job-id="${esc(row.id)}">
195
+ <span><strong>${esc(row.filename || row.id)}</strong><small>${esc(row.stem || "—")} · ${esc(row.clustering_mode || "—")} · ${fmtDate(row.created_at)}</small></span>
196
+ <span>${esc(row.hit_count ?? "…")} hits</span>
197
+ <span>${esc(row.cluster_count ?? "…")} samples</span>
198
+ <span>${row.realtime_factor == null ? "—" : `${Number(row.realtime_factor).toFixed(2)}×`}</span>
199
+ </button>
200
+ `).join("");
201
+ for (const button of $("historyList").querySelectorAll(".history-row")) {
202
+ button.addEventListener("click", async () => {
203
+ const job = await api(`/api/jobs/${button.dataset.jobId}`);
204
+ renderJob(job);
205
+ window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" });
206
+ });
207
+ }
208
+ }
209
+
210
+ async function refreshHistory() {
211
+ try {
212
+ const payload = await api("/api/jobs?limit=50");
213
+ renderHistory(payload);
214
+ } catch (error) {
215
+ $("historyList").innerHTML = `<p class="empty">${esc(error.message)}</p>`;
216
+ }
217
+ }
218
+
219
  async function pollJob(id) {
220
  if (activePoll) clearInterval(activePoll);
221
  const tick = async () => {
 
226
  clearInterval(activePoll);
227
  activePoll = null;
228
  $("runButton").disabled = !selectedFile;
229
+ await refreshHistory();
230
  }
231
  } catch (error) {
232
  clearInterval(activePoll);
 
251
  const job = await api("/api/jobs", { method: "POST", body: form });
252
  renderJob(job);
253
  await pollJob(job.id);
254
+ await refreshHistory();
255
  } catch (error) {
256
  $("runButton").disabled = false;
257
  $("resultSummary").textContent = error.message;
 
274
  await api("/api/health");
275
  config = await api("/api/config");
276
  populateConfig();
277
+ await refreshHistory();
278
  setHealth(true, "Ready", "Backend online");
279
  } catch (error) {
280
  setHealth(false, "Offline", error.message);
 
290
  $("target_min").value = 4;
291
  $("target_max").value = 16;
292
  });
293
+ $("usePreviewButton").addEventListener("click", () => {
294
+ $("stem").value = "all";
295
+ $("clustering_mode").value = "online_preview";
296
+ $("demucs_shifts").value = 0;
297
+ $("target_min").value = 4;
298
+ $("target_max").value = 16;
299
+ $("mel_threshold").value = 0.62;
300
+ $("ncc_threshold").value = 0.72;
301
+ });
302
+ $("refreshHistoryButton").addEventListener("click", refreshHistory);
303
  $("clearCacheButton").addEventListener("click", async () => {
304
  try {
305
  await api("/api/cache/clear", { method: "POST" });
306
+ $("logs").textContent = "Pipeline memory and disk cache cleared.";
307
  } catch (error) {
308
  $("logs").textContent = error.message;
309
  }
web/index.html CHANGED
@@ -44,7 +44,7 @@
  <div class="panel-heading">
  <div>
  <h2>2. Extraction controls</h2>
- <p>Defaults favor quick full-song extraction. Tighten thresholds after reviewing the timeline.</p>
+ <p>Batch quality gives the best final grouping. Online preview is the near-realtime clustering path.</p>
  </div>
  <button id="clearCacheButton" class="ghost-button" type="button">Clear cache</button>
  </div>
@@ -56,6 +56,12 @@
  <label>Demucs model
  <select id="demucs_model"></select>
  </label>
+ <label>Clustering mode
+ <select id="clustering_mode">
+ <option value="batch_quality">batch quality</option>
+ <option value="online_preview">online preview</option>
+ </select>
+ </label>
  <label>Shifts
  <input id="demucs_shifts" type="number" min="0" max="8" step="1" />
  </label>
@@ -123,11 +129,13 @@
  <div class="toggles">
  <label><input id="synthesize" type="checkbox" /> synthesize alternates</label>
  <label><input id="quantize_midi" type="checkbox" /> quantize MIDI</label>
+ <label><input id="use_disk_cache" type="checkbox" /> disk cache stems/source loads</label>
  </div>
 
  <div class="actions">
  <button id="runButton" class="primary-button" type="button" disabled>Extract samples</button>
  <button id="useFastButton" class="secondary-button" type="button">Use fast full-mix mode</button>
+ <button id="usePreviewButton" class="secondary-button" type="button">Use online preview mode</button>
  </div>
  </section>
 
@@ -143,6 +151,17 @@
  <pre id="logs" class="logs" aria-live="polite"></pre>
  </section>
 
+ <section class="panel history-panel">
+ <div class="panel-heading">
+ <div>
+ <h2>Run history</h2>
+ <p>Completed manifests under <code>.runs/</code> are indexed automatically. Load a run to compare timings and artifacts.</p>
+ </div>
+ <button id="refreshHistoryButton" class="ghost-button" type="button">Refresh</button>
+ </div>
+ <div id="historyList" class="history-list"></div>
+ </section>
+
  <section class="panel result-panel">
  <div class="panel-heading">
  <div>
web/styles.css CHANGED
@@ -78,3 +78,11 @@ td { color: #e5eaf7; }
  tr:last-child td { border-bottom: 0; }
  @media (max-width: 1100px) { .workspace, .hero { grid-template-columns: 1fr; } .control-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); } }
  @media (max-width: 680px) { .shell { width: min(100% - 20px, 1520px); padding-top: 16px; } .panel { padding: 16px; border-radius: 22px; } .control-grid, .audio-grid { grid-template-columns: 1fr; } h1 { letter-spacing: -.045em; } }
+ .history-panel { align-self: stretch; }
+ .history-list { display: grid; gap: 8px; max-height: 360px; overflow: auto; }
+ .history-row { width: 100%; display: grid; grid-template-columns: minmax(0, 1fr) auto auto auto; gap: 12px; align-items: center; text-align: left; border: 1px solid var(--line); background: rgba(0,0,0,.16); border-radius: 16px; padding: 12px; }
+ .history-row strong { display: block; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; color: var(--text); }
+ .history-row small { display: block; color: var(--muted); margin-top: 3px; }
+ .history-row span:not(:first-child) { color: #dbe5f7; font-size: 12px; font-variant-numeric: tabular-nums; }
+ .empty { color: var(--muted); margin: 0; }
+ @media (max-width: 680px) { .history-row { grid-template-columns: 1fr 1fr; } }
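
The `.history-row` rules above declare four grid columns and style `strong`, `small`, and every `span` except the first. A minimal sketch of markup that would fit those selectors is shown below, built as a string so it can run outside a browser. The function names and the run fields (`id`, `name`, `created`, `hits`, `clusters`, `seconds`) are assumptions for illustration; the manifest keys and the rendering code actually used by the commit are not shown in this part of the diff.

```javascript
// Hypothetical sketch: the run fields below are assumed for illustration and
// may not match the manifest keys used by the commit.
function escapeHtml(value) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// One row = one title cell (strong + small) followed by three metric spans,
// matching the four grid columns declared for .history-row.
function historyRowHtml(run) {
  return [
    `<button class="history-row" type="button" data-run-id="${escapeHtml(run.id)}">`,
    `<span><strong>${escapeHtml(run.name)}</strong><small>${escapeHtml(run.created)}</small></span>`,
    `<span>${escapeHtml(run.hits)} hits</span>`,
    `<span>${escapeHtml(run.clusters)} clusters</span>`,
    `<span>${Number(run.seconds).toFixed(2)} s</span>`,
    `</button>`,
  ].join("");
}

// An empty history falls back to a paragraph carrying the .empty class.
function historyListHtml(runs) {
  if (runs.length === 0) return '<p class="empty">No runs yet.</p>';
  return runs.map(historyRowHtml).join("");
}
```

In the page, the returned string could be assigned to the `innerHTML` of the `historyList` element; escaping the text fields first keeps file names containing `<` or `&` from breaking the markup.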