ChatGPT committed on
Commit b8fa9bf
Parent(s): eb1a122
feat: add run history and online clustering
- .gitignore +2 -0
- README.md +52 -14
- app.py +84 -24
- docs/API.md +98 -43
- docs/FEATURES.md +58 -0
- docs/PIPELINE_TIMING_AND_REALTIME.md +94 -177
- docs/PROGRESS.md +63 -0
- docs/PROJECT_REVIEW.md +41 -78
- docs/REMAINING_WORK.md +24 -16
- docs/TASKS.md +54 -0
- docs/UI_REPLACEMENT.md +30 -15
- docs/benchmark-online-preview.json +273 -0
- docs/benchmark-subprocesses.json +90 -293
- pipeline_runner.py +88 -21
- sample_extractor.py +114 -0
- scripts/benchmark_subprocesses.py +7 -5
- web/app.js +71 -15
- web/index.html +20 -1
- web/styles.css +8 -0
.gitignore
CHANGED
|
@@ -20,3 +20,5 @@ build/
|
|
| 20 |
*.mid
|
| 21 |
*.zip
|
| 22 |
!drum-sample-extractor-updated.zip
|
| 20 |
*.mid
|
| 21 |
*.zip
|
| 22 |
!drum-sample-extractor-updated.zip
|
| 23 |
+
|
| 24 |
+
.cache/
|
README.md
CHANGED
|
@@ -10,17 +10,41 @@ pinned: false
|
|
| 10 |
|
| 11 |
# Drum Sample Extractor
|
| 12 |
|
| 13 |
-
A custom FastAPI + browser
|
| 14 |
|
| 15 |
-
The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, and a complete ZIP sample pack.
|
| 16 |
|
| 17 |
## Current status
|
| 18 |
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
-
|
| 24 |
|
| 25 |
## Run locally
|
| 26 |
|
|
@@ -33,7 +57,12 @@ uvicorn app:app --host 0.0.0.0 --port 7860
|
|
| 33 |
|
| 34 |
Open `http://127.0.0.1:7860`.
|
| 35 |
|
| 36 |
-
For fast iteration, set
|
| 37 |
|
| 38 |
## Run benchmarks
|
| 39 |
|
|
@@ -43,13 +72,13 @@ python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchm
|
|
| 43 |
|
| 44 |
The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
|
| 45 |
|
| 46 |
-
## API
|
| 47 |
|
| 48 |
```bash
|
| 49 |
curl http://127.0.0.1:7860/api/config
|
| 50 |
|
| 51 |
curl -F 'file=@song.wav' \
|
| 52 |
-
-F 'params={"stem":"all","target_min":4,"target_max":12}' \
|
| 53 |
http://127.0.0.1:7860/api/jobs
|
| 54 |
```
|
| 55 |
|
|
@@ -59,16 +88,22 @@ Then poll the returned job id:
|
|
| 59 |
curl http://127.0.0.1:7860/api/jobs/<job-id>
|
| 60 |
```
|
| 61 |
|
| 62 |
## Important files
|
| 63 |
|
| 64 |
| Path | Purpose |
|
| 65 |
|---|---|
|
| 66 |
-
| `app.py` | FastAPI app, static UI serving, job API, artifact downloads |
|
| 67 |
-
| `pipeline_runner.py` | Timed extraction pipeline
|
| 68 |
| `sample_extractor.py` | Core DSP/sample extraction implementation |
|
| 69 |
| `web/` | Custom no-build browser frontend |
|
| 70 |
| `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
|
| 71 |
-
| `docs/` | Review, timing, API, and
|
| 72 |
| `legacy/` | Previous Gradio apps retained for reference |
|
| 73 |
|
| 74 |
## Output per run
|
|
@@ -82,4 +117,7 @@ Each run is stored under `.runs/<job-id>/output/`:
|
|
| 82 |
- `samples/*.wav`
|
| 83 |
- `manifest.json`
|
| 84 |
|
| 85 |
-
|
| 10 |
|
| 11 |
# Drum Sample Extractor
|
| 12 |
|
| 13 |
+
A custom FastAPI + browser workstation for extracting reusable drum samples from an audio file.
|
| 14 |
|
| 15 |
+
The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, manifests, and a complete ZIP sample pack.
|
| 16 |
|
| 17 |
## Current status
|
| 18 |
|
| 19 |
+
The project is usable as a local/Hugging Face Space application. Gradio is no longer the active UI; the active app is a custom FastAPI backend plus a no-build browser frontend.
|
| 20 |
+
|
| 21 |
+
Implemented in the current development pass:
|
| 22 |
+
|
| 23 |
+
- Custom web frontend in `web/`, served by `app.py`.
|
| 24 |
+
- FastAPI job API with upload, polling, safe artifact downloads, config, health, cache clearing, and run-history listing.
|
| 25 |
+
- Timed pipeline runner in `pipeline_runner.py`.
|
| 26 |
+
- Per-stage timing in every `manifest.json`.
|
| 27 |
+
- Two clustering modes:
|
| 28 |
+
- `batch_quality`: all-pairs mel/NCC similarity plus agglomerative clustering.
|
| 29 |
+
- `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
|
| 30 |
+
- Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
|
| 31 |
+
- Run history panel indexing `.runs/*/output/manifest.json`.
|
| 32 |
+
- Documentation for features, progress, tasks, API, timing, realtime suitability, UI, and remaining work.
|
| 33 |
+
- Legacy Gradio apps preserved in `legacy/` for reference only.
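The `online_preview` mode listed above is described only as prototype-based incremental assignment. A minimal sketch of that idea follows, assuming equal-length transient windows and zero-lag normalised cross-correlation only; the function names `ncc` and `assign_online` are hypothetical, and the real code in `pipeline_runner.py` may also apply the mel prefilter and differ in other details.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-lag normalised cross-correlation of two equal-length windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def assign_online(hits: list[np.ndarray], threshold: float = 0.80) -> list[int]:
    """Assign each hit to the best-matching prototype or start a new cluster.

    Hypothetical helper: prototypes are running means of their members,
    so each hit is compared against k prototypes rather than all other hits.
    """
    prototypes: list[np.ndarray] = []
    counts: list[int] = []
    labels: list[int] = []
    for raw in hits:
        hit = np.asarray(raw, dtype=float)
        scores = [ncc(hit, proto) for proto in prototypes]
        best = int(np.argmax(scores)) if scores else -1
        if best >= 0 and scores[best] >= threshold:
            # Update the running mean of the matched prototype in place.
            counts[best] += 1
            prototypes[best] += (hit - prototypes[best]) / counts[best]
            labels.append(best)
        else:
            # No prototype is similar enough: this hit seeds a new cluster.
            prototypes.append(hit.copy())
            counts.append(1)
            labels.append(len(prototypes) - 1)
    return labels
```

Because each hit is scored only against the current prototypes, the cost grows with the number of clusters rather than with the square of the hit count, which is the usual reason to prefer this style for previews.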
|
| 34 |
+
|
| 35 |
+
Not fully complete yet:
|
| 36 |
+
|
| 37 |
+
- No interactive waveform editing of onsets/clusters.
|
| 38 |
+
- No server-sent event stream or websocket progress channel.
|
| 39 |
+
- No frontend TypeScript build/test harness.
|
| 40 |
+
- Demucs remains offline/batch by design.
|
| 41 |
+
|
| 42 |
+
See:
|
| 43 |
+
|
| 44 |
+
- `docs/FEATURES.md`
|
| 45 |
+
- `docs/TASKS.md`
|
| 46 |
+
- `docs/PROGRESS.md`
|
| 47 |
+
- `docs/REMAINING_WORK.md`
|
| 48 |
|
| 49 |
## Run locally
|
| 50 |
|
|
|
|
| 57 |
|
| 58 |
Open `http://127.0.0.1:7860`.
|
| 59 |
|
| 60 |
+
For fast iteration, set:
|
| 61 |
+
|
| 62 |
+
- `Stem = all`
|
| 63 |
+
- `Clustering mode = online_preview`
|
| 64 |
+
|
| 65 |
+
That bypasses Demucs and uses the near-realtime clustering path.
|
| 66 |
|
| 67 |
## Run benchmarks
|
| 68 |
|
|
|
|
| 72 |
|
| 73 |
The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
|
| 74 |
|
| 75 |
+
## API example
|
| 76 |
|
| 77 |
```bash
|
| 78 |
curl http://127.0.0.1:7860/api/config
|
| 79 |
|
| 80 |
curl -F 'file=@song.wav' \
|
| 81 |
+
-F 'params={"stem":"all","clustering_mode":"online_preview","target_min":4,"target_max":12}' \
|
| 82 |
http://127.0.0.1:7860/api/jobs
|
| 83 |
```
|
| 84 |
|
|
|
|
| 88 |
curl http://127.0.0.1:7860/api/jobs/<job-id>
|
| 89 |
```
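The polling step above could also be scripted. The following is a minimal sketch in Python, assuming the job payload carries a `status` field with the values documented in `docs/API.md`; `poll_job` and `http_fetch` are hypothetical helper names, not part of the project.

```python
import json
import time
import urllib.request
from typing import Any, Callable

TERMINAL_STATUSES = {"complete", "error"}


def http_fetch(job_id: str, base_url: str = "http://127.0.0.1:7860") -> dict[str, Any]:
    """Fetch one job payload from GET /api/jobs/<job-id>."""
    with urllib.request.urlopen(f"{base_url}/api/jobs/{job_id}") as resp:
        return json.load(resp)


def poll_job(
    job_id: str,
    fetch: Callable[[str], dict[str, Any]] = http_fetch,
    interval_sec: float = 0.8,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reaches a terminal status, then return its payload."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval_sec)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `fetch` and `sleep` as arguments keeps the loop testable without a running server; with the defaults it talks to the local address used in the examples above.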
|
| 90 |
|
| 91 |
+
List active/completed runs:
|
| 92 |
+
|
| 93 |
+
```bash
|
| 94 |
+
curl http://127.0.0.1:7860/api/jobs
|
| 95 |
+
```
|
| 96 |
+
|
| 97 |
## Important files
|
| 98 |
|
| 99 |
| Path | Purpose |
|
| 100 |
|---|---|
|
| 101 |
+
| `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads |
|
| 102 |
+
| `pipeline_runner.py` | Timed extraction pipeline, disk stem/source cache, batch/online clustering routing |
|
| 103 |
| `sample_extractor.py` | Core DSP/sample extraction implementation |
|
| 104 |
| `web/` | Custom no-build browser frontend |
|
| 105 |
| `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
|
| 106 |
+
| `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
|
| 107 |
| `legacy/` | Previous Gradio apps retained for reference |
|
| 108 |
|
| 109 |
## Output per run
|
|
|
|
| 117 |
- `samples/*.wav`
|
| 118 |
- `manifest.json`
|
| 119 |
|
| 120 |
+
Generated runtime directories are ignored by git:
|
| 121 |
+
|
| 122 |
+
- `.runs/`
|
| 123 |
+
- `.cache/`
|
app.py
CHANGED
|
@@ -9,6 +9,7 @@ from __future__ import annotations
|
|
| 9 |
|
| 10 |
import json
|
| 11 |
import shutil
|
|
|
|
| 12 |
import traceback
|
| 13 |
import uuid
|
| 14 |
from concurrent.futures import ThreadPoolExecutor
|
|
@@ -22,7 +23,7 @@ from fastapi.middleware.cors import CORSMiddleware
|
|
| 22 |
from fastapi.responses import FileResponse, JSONResponse
|
| 23 |
from fastapi.staticfiles import StaticFiles
|
| 24 |
|
| 25 |
-
from pipeline_runner import PipelineParams, initial_stages, run_extraction_pipeline
|
| 26 |
from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
|
| 27 |
|
| 28 |
ROOT = Path(__file__).resolve().parent
|
|
@@ -30,7 +31,7 @@ WEB_DIR = ROOT / "web"
|
|
| 30 |
RUNS_DIR = ROOT / ".runs"
|
| 31 |
RUNS_DIR.mkdir(exist_ok=True)
|
| 32 |
|
| 33 |
-
app = FastAPI(title="Drum Sample Extractor", version="
|
| 34 |
app.add_middleware(
|
| 35 |
CORSMiddleware,
|
| 36 |
allow_origins=["*"],
|
|
@@ -61,6 +62,63 @@ def _serialise_job(job: dict[str, Any]) -> dict[str, Any]:
|
|
| 61 |
return payload
|
| 62 |
|
| 63 |
|
| 64 |
def _update_job(job_id: str, **patch: Any) -> None:
|
| 65 |
with jobs_lock:
|
| 66 |
jobs[job_id].update(patch)
|
|
@@ -87,7 +145,8 @@ def _run_job(job_id: str) -> None:
|
|
| 87 |
if stage.get("status") == "running":
|
| 88 |
_append_log(job_id, f"Started: {stage['label']}")
|
| 89 |
elif stage.get("status") == "done":
|
| 90 |
-
|
|
|
|
| 91 |
|
| 92 |
try:
|
| 93 |
result = run_extraction_pipeline(input_path, output_dir, PipelineParams.from_mapping(params), progress_cb=progress)
|
|
@@ -109,13 +168,27 @@ def config() -> dict[str, Any]:
|
|
| 109 |
"demucs_stems": {key: value + ["all"] for key, value in DEMUCS_STEMS.items()},
|
| 110 |
"defaults": asdict(PipelineParams()),
|
| 111 |
"stages": initial_stages(),
|
| 112 |
}
|
| 113 |
|
| 114 |
|
| 115 |
@app.post("/api/cache/clear")
|
| 116 |
def clear_cache() -> dict[str, str]:
|
| 117 |
cache_clear()
|
| 118 |
-
|
| 119 |
|
| 120 |
|
| 121 |
@app.post("/api/jobs")
|
|
@@ -142,6 +215,7 @@ async def create_job(file: UploadFile = File(...), params: str = Form("{}")) ->
|
|
| 142 |
"id": job_id,
|
| 143 |
"status": "pending",
|
| 144 |
"filename": file.filename,
|
|
|
|
| 145 |
"params": asdict(validated),
|
| 146 |
"stages": initial_stages(),
|
| 147 |
"logs": [],
|
|
@@ -161,26 +235,12 @@ async def create_job(file: UploadFile = File(...), params: str = Form("{}")) ->
|
|
| 161 |
def get_job(job_id: str) -> dict[str, Any]:
|
| 162 |
with jobs_lock:
|
| 163 |
job = jobs.get(job_id)
|
| 164 |
-
if
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
"id": job_id,
|
| 171 |
-
"status": "complete",
|
| 172 |
-
"filename": None,
|
| 173 |
-
"params": result.get("params", {}),
|
| 174 |
-
"stages": result.get("stages", []),
|
| 175 |
-
"logs": [],
|
| 176 |
-
"result": result,
|
| 177 |
-
"error": None,
|
| 178 |
-
"traceback": None,
|
| 179 |
-
"output_dir": str(manifest.parent),
|
| 180 |
-
}
|
| 181 |
-
)
|
| 182 |
-
raise HTTPException(status_code=404, detail="Job not found")
|
| 183 |
-
return _serialise_job(dict(job))
|
| 184 |
|
| 185 |
|
| 186 |
@app.get("/api/jobs/{job_id}/files/{relative_path:path}")
|
|
|
|
| 9 |
|
| 10 |
import json
|
| 11 |
import shutil
|
| 12 |
+
import time
|
| 13 |
import traceback
|
| 14 |
import uuid
|
| 15 |
from concurrent.futures import ThreadPoolExecutor
|
|
|
|
| 23 |
from fastapi.responses import FileResponse, JSONResponse
|
| 24 |
from fastapi.staticfiles import StaticFiles
|
| 25 |
|
| 26 |
+
from pipeline_runner import PipelineParams, clear_disk_cache, initial_stages, run_extraction_pipeline
|
| 27 |
from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
|
| 28 |
|
| 29 |
ROOT = Path(__file__).resolve().parent
|
|
|
|
| 31 |
RUNS_DIR = ROOT / ".runs"
|
| 32 |
RUNS_DIR.mkdir(exist_ok=True)
|
| 33 |
|
| 34 |
+
app = FastAPI(title="Drum Sample Extractor", version="11.0.0")
|
| 35 |
app.add_middleware(
|
| 36 |
CORSMiddleware,
|
| 37 |
allow_origins=["*"],
|
|
|
|
| 62 |
return payload
|
| 63 |
|
| 64 |
|
| 65 |
+
def _manifest_path(job_id: str) -> Path:
|
| 66 |
+
return RUNS_DIR / job_id / "output" / "manifest.json"
|
| 67 |
+
|
| 68 |
+
|
| 69 |
+
def _read_manifest_job(job_id: str) -> dict[str, Any] | None:
|
| 70 |
+
manifest = _manifest_path(job_id)
|
| 71 |
+
if not manifest.exists():
|
| 72 |
+
return None
|
| 73 |
+
result = json.loads(manifest.read_text(encoding="utf-8"))
|
| 74 |
+
return {
|
| 75 |
+
"id": job_id,
|
| 76 |
+
"status": "complete",
|
| 77 |
+
"filename": result.get("source", {}).get("filename"),
|
| 78 |
+
"params": result.get("params", {}),
|
| 79 |
+
"stages": result.get("stages", []),
|
| 80 |
+
"logs": [],
|
| 81 |
+
"result": result,
|
| 82 |
+
"error": None,
|
| 83 |
+
"traceback": None,
|
| 84 |
+
"output_dir": str(manifest.parent),
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
|
| 88 |
+
def _summarise_job(job: dict[str, Any]) -> dict[str, Any]:
|
| 89 |
+
result = job.get("result") or {}
|
| 90 |
+
return {
|
| 91 |
+
"id": job["id"],
|
| 92 |
+
"status": job.get("status"),
|
| 93 |
+
"filename": job.get("filename"),
|
| 94 |
+
"created_at": job.get("created_at"),
|
| 95 |
+
"duration_sec": result.get("duration_sec"),
|
| 96 |
+
"audio_duration_sec": result.get("audio_duration_sec"),
|
| 97 |
+
"realtime_factor": result.get("realtime_factor"),
|
| 98 |
+
"bpm": result.get("bpm"),
|
| 99 |
+
"hit_count": result.get("hit_count"),
|
| 100 |
+
"cluster_count": result.get("cluster_count"),
|
| 101 |
+
"clustering_mode": (result.get("params") or job.get("params") or {}).get("clustering_mode"),
|
| 102 |
+
"stem": (result.get("params") or job.get("params") or {}).get("stem"),
|
| 103 |
+
"error": job.get("error"),
|
| 104 |
+
}
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
def _list_manifest_jobs(limit: int = 50) -> list[dict[str, Any]]:
|
| 108 |
+
rows: list[dict[str, Any]] = []
|
| 109 |
+
for manifest in sorted(RUNS_DIR.glob("*/output/manifest.json"), key=lambda p: p.stat().st_mtime, reverse=True):
|
| 110 |
+
job_id = manifest.parents[1].name
|
| 111 |
+
manifest_job = _read_manifest_job(job_id)
|
| 112 |
+
if not manifest_job:
|
| 113 |
+
continue
|
| 114 |
+
summary = _summarise_job(manifest_job)
|
| 115 |
+
summary["created_at"] = manifest.stat().st_mtime
|
| 116 |
+
rows.append(summary)
|
| 117 |
+
if len(rows) >= limit:
|
| 118 |
+
break
|
| 119 |
+
return rows
|
| 120 |
+
|
| 121 |
+
|
| 122 |
def _update_job(job_id: str, **patch: Any) -> None:
|
| 123 |
with jobs_lock:
|
| 124 |
jobs[job_id].update(patch)
|
|
|
|
| 145 |
if stage.get("status") == "running":
|
| 146 |
_append_log(job_id, f"Started: {stage['label']}")
|
| 147 |
elif stage.get("status") == "done":
|
| 148 |
+
detail = f" · {stage['detail']}" if stage.get("detail") else ""
|
| 149 |
+
_append_log(job_id, f"Finished: {stage['label']} in {stage['duration_sec']:.3f}s{detail}")
|
| 150 |
|
| 151 |
try:
|
| 152 |
result = run_extraction_pipeline(input_path, output_dir, PipelineParams.from_mapping(params), progress_cb=progress)
|
|
|
|
| 168 |
"demucs_stems": {key: value + ["all"] for key, value in DEMUCS_STEMS.items()},
|
| 169 |
"defaults": asdict(PipelineParams()),
|
| 170 |
"stages": initial_stages(),
|
| 171 |
+
"clustering_modes": {
|
| 172 |
+
"batch_quality": "Batch quality: all-pairs mel/NCC + agglomerative clustering",
|
| 173 |
+
"online_preview": "Online preview: prototype assignment for near-realtime feedback",
|
| 174 |
+
},
|
| 175 |
}
|
| 176 |
|
| 177 |
|
| 178 |
@app.post("/api/cache/clear")
|
| 179 |
def clear_cache() -> dict[str, str]:
|
| 180 |
cache_clear()
|
| 181 |
+
clear_disk_cache()
|
| 182 |
+
return {"status": "cleared", "scope": "memory+disk"}
|
| 183 |
+
|
| 184 |
+
|
| 185 |
+
@app.get("/api/jobs")
|
| 186 |
+
def list_jobs(limit: int = 50) -> dict[str, Any]:
|
| 187 |
+
limit = max(1, min(int(limit), 200))
|
| 188 |
+
with jobs_lock:
|
| 189 |
+
active = [_summarise_job(dict(job)) for job in jobs.values() if job.get("status") != "complete"]
|
| 190 |
+
history = _list_manifest_jobs(limit=limit)
|
| 191 |
+
return {"active": active, "history": history}
|
| 192 |
|
| 193 |
|
| 194 |
@app.post("/api/jobs")
|
|
|
|
| 215 |
"id": job_id,
|
| 216 |
"status": "pending",
|
| 217 |
"filename": file.filename,
|
| 218 |
+
"created_at": time.time(),
|
| 219 |
"params": asdict(validated),
|
| 220 |
"stages": initial_stages(),
|
| 221 |
"logs": [],
|
|
|
|
| 235 |
def get_job(job_id: str) -> dict[str, Any]:
|
| 236 |
with jobs_lock:
|
| 237 |
job = jobs.get(job_id)
|
| 238 |
+
if job:
|
| 239 |
+
return _serialise_job(dict(job))
|
| 240 |
+
manifest_job = _read_manifest_job(job_id)
|
| 241 |
+
if manifest_job:
|
| 242 |
+
return _serialise_job(manifest_job)
|
| 243 |
+
raise HTTPException(status_code=404, detail="Job not found")
|
| 244 |
|
| 245 |
|
| 246 |
@app.get("/api/jobs/{job_id}/files/{relative_path:path}")
|
docs/API.md
CHANGED
|
@@ -1,5 +1,7 @@
|
|
| 1 |
# API documentation
|
| 2 |
|
| 3 |
The active app is `app.py`, a FastAPI application.
|
| 4 |
|
| 5 |
## Start server
|
|
@@ -18,12 +20,57 @@ Returns backend health.
|
|
| 18 |
|
| 19 |
## `GET /api/config`
|
| 20 |
|
| 21 |
-
Returns supported models, stems, default pipeline params,
|
| 22 |
|
| 23 |
```bash
|
| 24 |
curl http://127.0.0.1:7860/api/config
|
| 25 |
```
|
| 26 |
|
| 27 |
## `POST /api/jobs`
|
| 28 |
|
| 29 |
Creates an extraction job.
|
|
@@ -34,14 +81,14 @@ Fields:
|
|
| 34 |
|
| 35 |
| Field | Type | Required | Description |
|
| 36 |
|---|---|---:|---|
|
| 37 |
-
| `file` | file | yes | Audio source |
|
| 38 |
-
| `params` | JSON string | no | Partial or full pipeline params |
|
| 39 |
|
| 40 |
Example:
|
| 41 |
|
| 42 |
```bash
|
| 43 |
curl -F 'file=@song.wav' \
|
| 44 |
-
-F 'params={"stem":"all","target_min":4,"target_max":12,"synthesize":true}' \
|
| 45 |
http://127.0.0.1:7860/api/jobs
|
| 46 |
```
|
| 47 |
|
|
@@ -52,7 +99,7 @@ Response status: `202 Accepted`
|
|
| 52 |
"id": "58ca0db4ac74",
|
| 53 |
"status": "pending",
|
| 54 |
"filename": "song.wav",
|
| 55 |
-
"params": {"stem": "all"},
|
| 56 |
"stages": [],
|
| 57 |
"logs": [],
|
| 58 |
"result": null,
|
|
@@ -62,32 +109,32 @@ Response status: `202 Accepted`
|
|
| 62 |
|
| 63 |
## `GET /api/jobs/{job_id}`
|
| 64 |
|
| 65 |
-
Poll job status and retrieve results.
|
| 66 |
|
| 67 |
Statuses:
|
| 68 |
|
| 69 |
| Status | Meaning |
|
| 70 |
|---|---|
|
| 71 |
-
| `pending` | Job is queued |
|
| 72 |
-
| `running` | Job is executing |
|
| 73 |
-
| `complete` | Result and artifacts are ready |
|
| 74 |
-
| `error` | Pipeline failed; `error` and `traceback` are populated |
|
| 75 |
|
| 76 |
Completed jobs contain:
|
| 77 |
|
| 78 |
| Key | Meaning |
|
| 79 |
|---|---|
|
| 80 |
-
| `duration_sec` | Total wall time |
|
| 81 |
-
| `audio_duration_sec` | Duration of processed stem/source |
|
| 82 |
-
| `realtime_factor` | `duration_sec / audio_duration_sec` |
|
| 83 |
-
| `bpm` | Detected tempo |
|
| 84 |
-
| `hit_count` | Number of accepted onsets/hits |
|
| 85 |
-
| `cluster_count` | Number of sample clusters |
|
| 86 |
-
| `stages` | Per-stage timing/status/detail list |
|
| 87 |
-
| `samples` | Sample rows with score, duration, first onset, and download URL |
|
| 88 |
-
| `overview` | Decimated envelope and onset markers for waveform display |
|
| 89 |
-
| `files` | Relative artifact paths |
|
| 90 |
-
| `file_urls` | Direct API URLs for artifacts |
|
| 91 |
|
| 92 |
## `GET /api/jobs/{job_id}/files/{relative_path}`
|
| 93 |
|
|
@@ -105,36 +152,44 @@ The endpoint prevents path traversal by resolving downloads under `.runs/<job-id
|
|
| 105 |
|
| 106 |
## `POST /api/cache/clear`
|
| 107 |
|
| 108 |
-
Clears the in-memory
|
| 109 |
|
| 110 |
```bash
|
| 111 |
curl -X POST http://127.0.0.1:7860/api/cache/clear
|
| 112 |
```
|
| 113 |
|
| 114 |
## Pipeline parameters
|
| 115 |
|
| 116 |
Defined in `pipeline_runner.PipelineParams`.
|
| 117 |
|
| 118 |
| Parameter | Default | Meaning |
|
| 119 |
|---|---:|---|
|
| 120 |
-
| `stem` | `drums` | Demucs source to extract, or `all` to bypass Demucs |
|
| 121 |
-
| `demucs_model` | `htdemucs_ft` | Demucs model |
|
| 122 |
-
| `demucs_shifts` | `1` | Test-time shifts for Demucs quality/speed tradeoff |
|
| 123 |
-
| `demucs_overlap` | `0.25` | Demucs chunk overlap |
|
| 124 |
-
| `onset_mode` | `auto` | `auto`, `percussive`, `harmonic`, or `broadband` |
|
| 125 |
-
| `onset_delta` | `0.12` | Peak-pick threshold |
|
| 126 |
-
| `energy_threshold_db` | `-35` | RMS gate for accepting hits |
|
| 127 |
-
| `pre_pad` | `0.003` | Seconds of audio before onset |
|
| 128 |
-
| `min_dur` | `0.02` | Minimum hit duration |
|
| 129 |
-
| `max_dur` | `1.5` | Maximum hit duration |
|
| 130 |
-
| `min_gap` | `0.03` | Minimum time between onsets |
|
| 131 |
-
| `ncc_threshold` | `0.80` | Similarity threshold
|
| 132 |
-
| `attack_ms` | `25` | Transient window used for NCC |
|
| 133 |
-
| `mel_threshold` | `0.75` | Candidate prefilter threshold |
|
| 134 |
-
| `linkage` | `average` | Agglomerative linkage |
|
| 135 |
-
| `
|
| 136 |
-
| `
|
| 137 |
-
| `
|
| 138 |
-
| `
|
| 139 |
-
| `
|
| 140 |
-
| `
|
| 1 |
# API documentation
|
| 2 |
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
+
|
| 5 |
The active app is `app.py`, a FastAPI application.
|
| 6 |
|
| 7 |
## Start server
|
|
|
|
| 20 |
|
| 21 |
## `GET /api/config`
|
| 22 |
|
| 23 |
+
Returns supported models, stems, default pipeline params, stage definitions, and clustering mode labels.
|
| 24 |
|
| 25 |
```bash
|
| 26 |
curl http://127.0.0.1:7860/api/config
|
| 27 |
```
|
| 28 |
|
| 29 |
+
Important response keys:
|
| 30 |
+
|
| 31 |
+
| Key | Meaning |
|
| 32 |
+
|---|---|
|
| 33 |
+
| `demucs_models` | Supported Demucs model names. |
|
| 34 |
+
| `demucs_stems` | Valid stems per model, plus `all` for bypassing Demucs. |
|
| 35 |
+
| `defaults` | Default `PipelineParams`. |
|
| 36 |
+
| `stages` | Pipeline stage definitions. |
|
| 37 |
+
| `clustering_modes` | Human-readable labels for batch and online clustering modes. |
|
| 38 |
+
|
| 39 |
+
## `GET /api/jobs`
|
| 40 |
+
|
| 41 |
+
Lists active in-memory jobs and completed run manifests found under `.runs/`.
|
| 42 |
+
|
| 43 |
+
```bash
|
| 44 |
+
curl 'http://127.0.0.1:7860/api/jobs?limit=50'
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
Response:
|
| 48 |
+
|
| 49 |
+
```json
|
| 50 |
+
{
|
| 51 |
+
"active": [],
|
| 52 |
+
"history": [
|
| 53 |
+
{
|
| 54 |
+
"id": "58ca0db4ac74",
|
| 55 |
+
"status": "complete",
|
| 56 |
+
"filename": "song.wav",
|
| 57 |
+
"created_at": 1778540000.0,
|
| 58 |
+
"duration_sec": 2.4,
|
| 59 |
+
"audio_duration_sec": 8.0,
|
| 60 |
+
"realtime_factor": 0.3,
|
| 61 |
+
"bpm": 120.0,
|
| 62 |
+
"hit_count": 32,
|
| 63 |
+
"cluster_count": 8,
|
| 64 |
+
"clustering_mode": "online_preview",
|
| 65 |
+
"stem": "all",
|
| 66 |
+
"error": null
|
| 67 |
+
}
|
| 68 |
+
]
|
| 69 |
+
}
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
`created_at` is the manifest file modification time as a Unix timestamp.
|
| 73 |
+
|
| 74 |
## `POST /api/jobs`
|
| 75 |
|
| 76 |
Creates an extraction job.
|
|
|
|
| 81 |
|
| 82 |
| Field | Type | Required | Description |
|
| 83 |
|---|---|---:|---|
|
| 84 |
+
| `file` | file | yes | Audio source. |
|
| 85 |
+
| `params` | JSON string | no | Partial or full pipeline params. |
|
| 86 |
|
| 87 |
Example:
|
| 88 |
|
| 89 |
```bash
|
| 90 |
curl -F 'file=@song.wav' \
|
| 91 |
+
-F 'params={"stem":"all","clustering_mode":"online_preview","target_min":4,"target_max":12,"synthesize":true}' \
|
| 92 |
http://127.0.0.1:7860/api/jobs
|
| 93 |
```
|
| 94 |
|
|
|
|
| 99 |
"id": "58ca0db4ac74",
|
| 100 |
"status": "pending",
|
| 101 |
"filename": "song.wav",
|
| 102 |
+
"params": {"stem": "all", "clustering_mode": "online_preview"},
|
| 103 |
"stages": [],
|
| 104 |
"logs": [],
|
| 105 |
"result": null,
|
|
|
|
| 109 |
|
| 110 |
## `GET /api/jobs/{job_id}`
|
| 111 |
|
| 112 |
+
Poll job status and retrieve results. This works for active in-memory jobs and completed historical jobs whose manifest is still present in `.runs/`.
|
| 113 |
|
| 114 |
Statuses:
|
| 115 |
|
| 116 |
| Status | Meaning |
|
| 117 |
|---|---|
|
| 118 |
+
| `pending` | Job is queued. |
|
| 119 |
+
| `running` | Job is executing. |
|
| 120 |
+
| `complete` | Result and artifacts are ready. |
|
| 121 |
+
| `error` | Pipeline failed; `error` and `traceback` are populated. |
|
| 122 |
|
| 123 |
Completed jobs contain:
|
| 124 |
|
| 125 |
| Key | Meaning |
|
| 126 |
|---|---|
|
| 127 |
+
| `duration_sec` | Total wall time. |
|
| 128 |
+
| `audio_duration_sec` | Duration of processed stem/source. |
|
| 129 |
+
| `realtime_factor` | `duration_sec / audio_duration_sec`. |
|
| 130 |
+
| `bpm` | Detected tempo. |
|
| 131 |
+
| `hit_count` | Number of accepted onsets/hits. |
|
| 132 |
+
| `cluster_count` | Number of sample clusters. |
|
| 133 |
+
| `stages` | Per-stage timing/status/detail list. |
|
| 134 |
+
| `samples` | Sample rows with score, duration, first onset, and download URL. |
|
| 135 |
+
| `overview` | Decimated envelope and onset markers for waveform display. |
|
| 136 |
+
| `files` | Relative artifact paths. |
|
| 137 |
+
| `file_urls` | Direct API URLs for artifacts. |
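As a worked instance of the `realtime_factor` row above, here is a minimal sketch; the function name and the guard against a zero-length source are assumptions, not taken from the project code.

```python
def realtime_factor(duration_sec: float, audio_duration_sec: float) -> float:
    """Return processing wall time divided by audio duration.

    Values below 1.0 mean the run finished faster than the audio would play.
    """
    if audio_duration_sec <= 0:
        # Assumed guard: the source does not say how empty audio is handled.
        raise ValueError("audio_duration_sec must be positive")
    return duration_sec / audio_duration_sec


# Figures from the example response: 2.4 s of processing for 8.0 s of audio.
factor = realtime_factor(2.4, 8.0)
print(f"{factor:.2f}")
```

With the figures from the `GET /api/jobs` example response (2.4 s of wall time, 8.0 s of audio) the factor is 0.3, matching the value shown there.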
|
| 138 |
|
| 139 |
## `GET /api/jobs/{job_id}/files/{relative_path}`
|
| 140 |
|
|
|
|
| 152 |
|
| 153 |
## `POST /api/cache/clear`
|
| 154 |
|
| 155 |
+
Clears the in-memory DSP cache and disk stem/source cache.
|
| 156 |
|
| 157 |
```bash
|
| 158 |
curl -X POST http://127.0.0.1:7860/api/cache/clear
|
| 159 |
```
|
| 160 |
|
| 161 |
+
Response:
|
| 162 |
+
|
| 163 |
+
```json
|
| 164 |
+
{"status":"cleared","scope":"memory+disk"}
|
| 165 |
+
```
|
| 166 |
+
|
| 167 |
## Pipeline parameters
|
| 168 |
|
| 169 |
Defined in `pipeline_runner.PipelineParams`.
|
| 170 |
|
| 171 |
| Parameter | Default | Meaning |
|
| 172 |
|---|---:|---|
|
| 173 |
+
| `stem` | `drums` | Demucs source to extract, or `all` to bypass Demucs. |
|
| 174 |
+
| `demucs_model` | `htdemucs_ft` | Demucs model. |
|
| 175 |
+
| `demucs_shifts` | `1` | Test-time shifts for Demucs quality/speed tradeoff. |
|
| 176 |
+
| `demucs_overlap` | `0.25` | Demucs chunk overlap. |
|
| 177 |
+
| `onset_mode` | `auto` | `auto`, `percussive`, `harmonic`, or `broadband`. |
|
| 178 |
+
| `onset_delta` | `0.12` | Peak-pick threshold. |
|
| 179 |
+
| `energy_threshold_db` | `-35` | RMS gate for accepting hits. |
|
| 180 |
+
| `pre_pad` | `0.003` | Seconds of audio before onset. |
|
| 181 |
+
| `min_dur` | `0.02` | Minimum hit duration. |
|
| 182 |
+
| `max_dur` | `1.5` | Maximum hit duration. |
|
| 183 |
+
| `min_gap` | `0.03` | Minimum time between onsets. |
|
| 184 |
+
| `ncc_threshold` | `0.80` | Similarity threshold. Also used by online clustering assignment. |
|
| 185 |
+
| `attack_ms` | `25` | Transient window used for NCC/prototypes. |
|
| 186 |
+
| `mel_threshold` | `0.75` | Candidate prefilter threshold. For online mode, lower values such as `0.62` are useful. |
|
| 187 |
+
| `linkage` | `average` | Agglomerative linkage for `batch_quality`. |
|
| 188 |
+
| `clustering_mode` | `batch_quality` | `batch_quality` or `online_preview`. |
|
| 189 |
+
| `target_min` | `5` | Lower cluster target; `0` disables target mode in batch mode. |
|
| 190 |
+
| `target_max` | `20` | Upper cluster target; `0` disables target/cap mode. |
|
| 191 |
+
| `synthesize` | `true` | Write synthesized alternates for clusters with multiple hits. |
|
| 192 |
+
| `quantize_midi` | `true` | Snap MIDI notes to grid. |
|
| 193 |
+
| `subdivision` | `16` | MIDI grid subdivision. |
|
| 194 |
+
| `device` | `cpu` | Torch device for Demucs. |
|
| 195 |
+
| `use_disk_cache` | `true` | Cache decoded full mix/stems by source digest and extraction settings. |
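The `use_disk_cache` row says entries are keyed by source digest and extraction settings. One way such a key could be derived is sketched below, assuming SHA-256 over the source bytes combined with a canonical JSON dump of the settings; `cache_key` is a hypothetical name and the actual key layout in `pipeline_runner.py` may differ.

```python
import hashlib
import json


def cache_key(
    source_bytes: bytes,
    *,
    stem: str,
    demucs_model: str,
    demucs_shifts: int,
    demucs_overlap: float,
    device: str,
) -> str:
    """Build a deterministic cache key from the source digest and settings."""
    source_digest = hashlib.sha256(source_bytes).hexdigest()
    # sort_keys makes the JSON text independent of dict insertion order.
    settings = json.dumps(
        {
            "stem": stem,
            "demucs_model": demucs_model,
            "demucs_shifts": demucs_shifts,
            "demucs_overlap": demucs_overlap,
            "device": device,
        },
        sort_keys=True,
    )
    combined = f"{source_digest}:{settings}".encode("utf-8")
    return hashlib.sha256(combined).hexdigest()
```

Only settings that affect the decoded stem belong in the key; downstream parameters such as `ncc_threshold` are left out in this sketch so that changing them can still reuse a cached stem.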
|
docs/FEATURES.md
ADDED
|
@@ -0,0 +1,58 @@
|
|
| 1 |
+
# Feature inventory
|
| 2 |
+
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
+
|
| 5 |
+
## Product goal
|
| 6 |
+
|
| 7 |
+
Turn an input audio file into a practical drum sample pack: detected hits, grouped sample classes, representative WAVs, optional synthesized alternates, MIDI reconstruction, rendered reconstruction audio, and an inspectable manifest.
|
| 8 |
+
|
| 9 |
+
## Implemented features
|
| 10 |
+
|
| 11 |
+
| Area | Feature | Status | Notes |
|
| 12 |
+
|---|---|---:|---|
|
| 13 |
+
| UI | Custom browser frontend | Implemented | `web/index.html`, `web/styles.css`, `web/app.js`; no Gradio dependency in active app. |
|
| 14 |
+
| UI | Drag/drop audio upload | Implemented | Uses multipart upload to `POST /api/jobs`. |
|
| 15 |
+
| UI | Source preview | Implemented | Browser `<audio>` preview before extraction. |
|
| 16 |
+
| UI | Pipeline controls | Implemented | Stem/model/onset/clustering/MIDI/synthesis/cache controls. |
|
| 17 |
+
| UI | Live-ish progress | Implemented | Polls stage state and logs every 800 ms. |
|
| 18 |
+
| UI | Waveform/onset overview | Implemented | Canvas envelope plus onset markers from `manifest.json`. |
|
| 19 |
+
| UI | Result downloads | Implemented | ZIP, MIDI, stem WAV, reconstruction WAV, individual sample WAVs. |
|
| 20 |
+
| UI | Run history browser | Implemented | Lists completed `.runs/*/output/manifest.json` entries and reloads results. |
|
| 21 |
+
| API | Health/config | Implemented | `GET /api/health`, `GET /api/config`. |
|
| 22 |
+
| API | Job creation/polling | Implemented | `POST /api/jobs`, `GET /api/jobs/{id}`. |
|
| 23 |
+
| API | Run listing | Implemented | `GET /api/jobs` returns active and completed runs. |
|
| 24 |
+
| API | Safe artifact serving | Implemented | Path traversal is blocked by resolved output-root checks. |
|
| 25 |
+
| API | Cache clear | Implemented | Clears in-memory DSP cache and disk stem/source cache. |
|
| 26 |
+
| Pipeline | Demucs stem extraction | Implemented | Offline/batch stage; not advertised as realtime. |
|
| 27 |
+
| Pipeline | Stem/full-mix disk cache | Implemented | Keyed by source SHA-256 plus stem/model/shifts/overlap/device. |
|
| 28 |
+
| Pipeline | BPM detection | Implemented | `librosa` onset/beat based estimate. |
|
| 29 |
+
| Pipeline | SuperFlux-style onset detection | Implemented | Multi-band auto mode plus percussive/harmonic/broadband modes. |
|
| 30 |
+
| Pipeline | Hit classification | Implemented | Rule-based spectral class labels. |
|
| 31 |
+
| Pipeline | Batch quality clustering | Implemented | Mel prefilter + transient NCC + agglomerative clustering. |
|
| 32 |
+
| Pipeline | Online preview clustering | Implemented | Prototype-based incremental assignment for near-realtime feedback. |
|
| 33 |
+
| Pipeline | Representative selection | Implemented | Quality score picks best hit per cluster. |
|
| 34 |
+
| Pipeline | Optional synthesis | Implemented | Weighted aligned average for multi-hit clusters. |
|
| 35 |
+
| Pipeline | MIDI export | Implemented | Quantized or unquantized reconstruction MIDI. |
|
| 36 |
+
| Pipeline | Reconstruction render | Implemented | Renders MIDI-like reconstruction using selected samples. |
|
| 37 |
+
| Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI, rendered reconstruction. |
|
| 38 |
+
| Docs | Project review | Implemented | `docs/PROJECT_REVIEW.md`. |
|
| 39 |
+
| Docs | Timing/realtime analysis | Implemented | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
|
| 40 |
+
| Docs | API docs | Implemented | `docs/API.md`. |
|
| 41 |
+
| Docs | UI replacement docs | Implemented | `docs/UI_REPLACEMENT.md`. |
|
| 42 |
+
| Docs | Feature/task/progress tracking | Implemented | This file, `TASKS.md`, `PROGRESS.md`. |
|
| 43 |
+
|
| 44 |
+
## Partially implemented features
|
| 45 |
+
|
| 46 |
+
| Area | Feature | Current state | Needed to call it complete |
|
| 47 |
+
|---|---|---|---|
|
| 48 |
+
| Progress | Stage progress | Shows stage boundaries and logs | Add lower-level progress inside Demucs and clustering. |
|
| 49 |
+
| Realtime | Online clustering | Implemented as batch-invoked prototype assignment | Add streaming/incremental audio analysis API for true realtime preview. |
|
| 50 |
+
| Run history | Manifest browser | Lists and reloads completed runs | Add side-by-side comparison and filtering/search. |
|
| 51 |
+
| Editing | Review workflow | Displays waveform and samples | Add click-to-audition hits, onset editing, cluster merge/split, label reassignment. |
|
| 52 |
+
| Frontend quality | No-build JavaScript UI | Good enough for local app | Convert to TypeScript once interaction model stabilizes. |
|
| 53 |
+
|
| 54 |
+
## Explicit non-goals for this pass
|
| 55 |
+
|
| 56 |
+
- Realtime Demucs. It is not realistic for this use-case and should remain offline/cached.
|
| 57 |
+
- Perfect source separation. Stem quality depends on model choice and input material.
|
| 58 |
+
- Full DAW/sample-editor UX. This pass creates the workstation foundation; detailed editing is next.
|
docs/PIPELINE_TIMING_AND_REALTIME.md
CHANGED
|
@@ -1,214 +1,131 @@
|
|
| 1 |
-
# Pipeline timing and
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
- `stem=all`
|
| 10 |
-
-
|
| 11 |
-
-
|
| 12 |
-
-
|
| 13 |
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
| Stage | Mean seconds | Median seconds | Min seconds | Max seconds |
|
| 17 |
-
|---|---:|---:|---:|---:|
|
| 18 |
-
| `stem` | 0.017 | 0.013 | 0.009 | 0.039 |
|
| 19 |
-
| `bpm` | 0.224 | 0.223 | 0.206 | 0.241 |
|
| 20 |
-
| `onsets` | 2.140 | 2.034 | 1.762 | 2.871 |
|
| 21 |
-
| `classification` | 0.034 | 0.035 | 0.024 | 0.045 |
|
| 22 |
-
| `clustering` | 0.496 | 0.597 | 0.059 | 0.913 |
|
| 23 |
-
| `selection` | 0.499 | 0.551 | 0.311 | 0.651 |
|
| 24 |
-
| `synthesis` | 0.002 | 0.002 | 0.002 | 0.003 |
|
| 25 |
-
| `export` | 0.105 | 0.103 | 0.046 | 0.178 |
|
| 26 |
-
|
| 27 |
-
Observed total runtime for warm synthetic 4-bar fixtures was roughly `0.30×–0.43×` realtime when Demucs was bypassed. In plain terms: the pure extraction stages ran faster than the audio duration on these fixtures. The first cold run can be much slower because librosa/scipy/numba-style initialization costs are paid up front.
|
| 28 |
|
| 29 |
## Significant subprocesses
|
| 30 |
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
Real-time suitability: **No for Demucs, yes for direct source load.**
|
| 44 |
-
|
| 45 |
-
Recommended strategy:
|
| 46 |
-
|
| 47 |
-
- Keep Demucs as an explicit offline preprocessing stage.
|
| 48 |
-
- Cache stem output by content hash and model parameters.
|
| 49 |
-
- Let users bypass Demucs for drum loops, already-separated stems, and iterative parameter tuning.
|
| 50 |
-
|
| 51 |
-
### 2. BPM / tempo detection
|
| 52 |
-
|
| 53 |
-
Current implementation:
|
| 54 |
-
|
| 55 |
-
- `librosa.onset.onset_strength`
|
| 56 |
-
- `librosa.feature.tempo`
|
| 57 |
-
- beat-track sanity adjustment
|
| 58 |
-
|
| 59 |
-
Timing profile:
|
| 60 |
-
|
| 61 |
-
- Measured around 0.22 s for ~9 s synthetic clips after warm-up.
|
| 62 |
-
|
| 63 |
-
Real-time suitability: **Near-realtime with buffering.**
|
| 64 |
-
|
| 65 |
-
A live version should estimate tempo over rolling windows and refine continuously. It does not need the entire file, but short windows can be unstable.
|
| 66 |
-
|
| 67 |
-
### 3. Onset detection + slicing
|
| 68 |
-
|
| 69 |
-
Current implementation:
|
| 70 |
-
|
| 71 |
-
- Multiband SuperFlux-style onset envelope in `auto` mode.
|
| 72 |
-
- Optional percussive/harmonic/broadband modes.
|
| 73 |
-
- Peak picking and hit slicing by onset-to-next-onset boundaries.
|
| 74 |
-
- Energy threshold and duration filtering.
|
| 75 |
-
|
| 76 |
-
Timing profile:
|
| 77 |
-
|
| 78 |
-
- This is the largest non-Demucs DSP stage in the measured benchmark: about 2.14 s mean for ~9 s fixtures.
|
| 79 |
-
- It is still faster than realtime in warm synthetic tests.
|
| 80 |
-
|
| 81 |
-
Real-time suitability: **Yes, with a rolling window and bounded lookahead.**
|
| 82 |
-
|
| 83 |
-
Why:
|
| 84 |
-
|
| 85 |
-
- Onset strength and peak picking are local-window operations.
|
| 86 |
-
- Backtracking and next-onset slicing require a small amount of future context.
|
| 87 |
-
- A live system can emit provisional hits and finalize durations once the next onset or max-duration cutoff arrives.
|
| 88 |
-
|
| 89 |
-
### 4. Spectral rule classification
|
| 90 |
-
|
| 91 |
-
Current implementation:
|
| 92 |
-
|
| 93 |
-
- STFT per hit.
|
| 94 |
-
- Low/mid/high energy ratios.
|
| 95 |
-
- Spectral centroid, zero-crossing rate, duration rules.
|
| 96 |
-
|
| 97 |
-
Timing profile:
|
| 98 |
-
|
| 99 |
-
- Measured around 34 ms mean for the benchmark fixtures.
|
| 100 |
-
|
| 101 |
-
Real-time suitability: **Yes.**
|
| 102 |
-
|
| 103 |
-
This is cheap per hit and can run immediately after a hit segment is finalized.
|
| 104 |
-
|
| 105 |
-
### 5. Mel fingerprinting + transient NCC clustering
|
| 106 |
-
|
| 107 |
-
Current implementation:
|
| 108 |
-
|
| 109 |
-
- Build mel fingerprints for hits.
|
| 110 |
-
- Use cosine similarity as a prefilter.
|
| 111 |
-
- Compute transient normalized cross-correlation only for candidate pairs.
|
| 112 |
-
- Run agglomerative clustering on the resulting precomputed distance matrix.
|
| 113 |
-
- Optionally merge singleton clusters into nearby multi-hit clusters.
|
| 114 |
-
|
| 115 |
-
Timing profile:
|
| 116 |
-
|
| 117 |
-
- Measured around 0.50 s mean, but depends strongly on number of hits and pair count.
|
| 118 |
-
- Complexity is roughly quadratic in hit count for pairwise similarity, with mel prefiltering reducing NCC work.
|
| 119 |
-
|
| 120 |
-
Real-time suitability: **Partially.**
|
| 121 |
|
| 122 |
-
|
| 123 |
|
| 124 |
-
-
|
| 125 |
-
- Transient NCC against a bounded set of existing cluster representatives.
|
| 126 |
-
- Online assignment to existing clusters.
|
| 127 |
|
| 128 |
-
|
|
|
|
| 129 |
|
| 130 |
-
|
| 131 |
-
-
|
| 132 |
|
| 133 |
-
|
| 134 |
|
| 135 |
-
|
| 136 |
-
2. For each finalized hit, compute fingerprint and compare to prototypes first.
|
| 137 |
-
3. Only run transient NCC against likely candidates.
|
| 138 |
-
4. Assign immediately when above threshold; create a new cluster otherwise.
|
| 139 |
-
5. Periodically run batch reclustering in the background to clean up early mistakes.
|
| 140 |
|
| 141 |
-
##
|
| 142 |
|
| 143 |
-
|
| 144 |
|
| 145 |
-
|
| 146 |
-
- Choose highest-scoring hit per cluster.
|
| 147 |
|
| 148 |
-
|
| 149 |
|
| 150 |
-
|
| 151 |
-
- Cost scales with number of hits and quality scoring work.
|
| 152 |
|
| 153 |
-
|
| 154 |
|
| 155 |
-
|
| 156 |
|
| 157 |
-
|
| 158 |
|
| 159 |
-
|
| 160 |
|
| 161 |
-
|
| 162 |
-
- Normalize and weighted-average hits to create an alternate synthesized sample.
|
| 163 |
|
| 164 |
-
|
| 165 |
|
| 166 |
-
-
|
| 167 |
|
| 168 |
-
|
| 169 |
|
| 170 |
-
|
| 171 |
|
| 172 |
-
##
|
| 173 |
|
| 174 |
-
|
| 175 |
|
| 176 |
-
|
| 177 |
-
- Render reconstruction with representative samples.
|
| 178 |
-
- Write samples, reconstruction audio, MIDI, archive, and manifest.
|
| 179 |
|
| 180 |
-
|
| 181 |
|
| 182 |
-
-
|
| 183 |
|
| 184 |
-
|
| 185 |
|
| 186 |
-
|
| 187 |
|
| 188 |
-
|
| 189 |
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
| Hit slicing | Depends on next onset | Yes | Emit provisional segment, finalize on next onset/max duration |
|
| 197 |
-
| Rule classification | Per-hit | Yes | Cheap and stateless |
|
| 198 |
-
| Mel fingerprinting | Per-hit | Yes | Compute once per finalized hit |
|
| 199 |
-
| Transient NCC | Pairwise batch | Partial | Realtime against prototypes; batch all-pairs is not realtime |
|
| 200 |
-
| Agglomerative clustering | Batch | No | Replace or complement with online prototype assignment |
|
| 201 |
-
| Representative selection | Batch per cluster | Yes | Keep best-so-far per cluster |
|
| 202 |
-
| Synthesis | Batch per cluster | Partial | Can update lazily after cluster changes |
|
| 203 |
-
| MIDI/reconstruction preview | Batch export | Partial | Preview can stream; final MIDI is a completion artifact |
|
| 204 |
-
| ZIP packaging | Final artifact | No | Keep as final step |
|
| 205 |
-
|
| 206 |
-
## Recommended next technical move
|
| 207 |
-
|
| 208 |
-
Implement a second clustering mode named `online`:
|
| 209 |
-
|
| 210 |
-
```text
|
| 211 |
-
onset event → segment finalized → classify → mel fingerprint → candidate prototypes → transient NCC → assign/create cluster → update best representative → UI update
|
| 212 |
-
```
|
| 213 |
-
|
| 214 |
-
Keep the existing agglomerative mode as `batch-quality`. Use online mode for immediate feedback and batch mode for final high-quality export.
|
|
|
|
| 1 |
+
# Pipeline timing and realtime suitability
|
| 2 |
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
|
| 5 |
+
## Measurement scope
|
| 6 |
|
| 7 |
+
The timing benchmark in `docs/benchmark-subprocesses.json` measures synthetic drum fixtures with:
|
| 8 |
|
| 9 |
+
- `stem=all`, so Demucs is bypassed.
|
| 10 |
+
- Warm process after import/model-library initialization.
|
| 11 |
+
- Synthetic rock/funk/halftime fixtures generated by `synth_generator.py`.
|
| 12 |
+
- `scripts/benchmark_subprocesses.py` as the benchmark driver.
|
| 13 |
|
| 14 |
+
This isolates the sample-extraction subprocesses from source-separation noise. Demucs timing depends heavily on model, hardware, track length, first-run downloads, and CPU/GPU availability, so it is analyzed separately.
|
| 15 |
|
| 16 |
## Significant subprocesses
|
| 17 |
|
| 18 |
+
| Subprocess | Current implementation | Timing behavior | Realtime suitability |
|
| 19 |
+
|---|---|---|---|
|
| 20 |
+
| Source load / stem extraction | `extract_stem`; full mix via `librosa`, stems via Demucs | Full-mix load is usually fast; Demucs dominates jobs that use it | Full mix: near-realtime. Demucs: no. |
|
| 21 |
+
| BPM detection | `detect_bpm` using onset envelope and beat tracking | Usually sub-second for short fixtures | Near-realtime with buffering; not critical path. |
|
| 22 |
+
| Onset detection + slicing | `detect_onsets` multi-band SuperFlux-style envelope | Often the largest pure-DSP stage | Near-realtime with bounded lookahead. |
|
| 23 |
+
| Classification | Rule-based spectral analysis per hit | Fast relative to onset/clustering | Near-realtime. |
|
| 24 |
+
| Batch clustering | Mel fingerprints + transient NCC + agglomerative clustering | Pairwise/batch; scales poorly with many hits | Not realtime. Final-quality batch mode. |
|
| 25 |
+
| Online clustering | Prototype assignment per hit | Scales with hit count × cluster count | Near-realtime preview path. |
|
| 26 |
+
| Representative selection | Scores each candidate hit | Moderate for many clusters/hits | Near-realtime for moderate hit counts. |
|
| 27 |
+
| Synthesis | Weighted aligned average per multi-hit cluster | Usually small | Near-realtime for moderate clusters. |
|
| 28 |
+
| Export/package | WAV/MIDI/render/ZIP writes | Disk-bound; ZIP is batch finalization | Not meaningful as realtime; finalization step. |
|
| 29 |
|
| 30 |
+
## Current benchmark summary
|
| 31 |
|
| 32 |
+
The checked-in benchmark files were refreshed on 2026-05-12 with synthetic 2-bar fixtures and Demucs bypassed:
|
| 33 |
|
| 34 |
+
- `docs/benchmark-subprocesses.json`: `batch_quality` clustering.
|
| 35 |
+
- `docs/benchmark-online-preview.json`: `online_preview` clustering.
|
| 36 |
|
| 37 |
+
| Stage | Batch quality mean | Online preview mean |
|
| 38 |
+
|---|---:|---:|
|
| 39 |
+
| source load | 0.011 s | 0.012 s |
|
| 40 |
+
| BPM detection | 0.185 s | 0.163 s |
|
| 41 |
+
| onset detection + slicing | 1.943 s | 1.834 s |
|
| 42 |
+
| classification | 0.019 s | 0.017 s |
|
| 43 |
+
| clustering | 0.148 s | 0.045 s |
|
| 44 |
+
| representative selection | 0.204 s | 0.115 s |
|
| 45 |
+
| synthesis | 0.001 s | 0.001 s |
|
| 46 |
+
| export/package | 0.156 s | 0.221 s |
|
| 47 |
|
| 48 |
+
On these small fixtures, `online_preview` reduced clustering time by about 3× compared with `batch_quality`. The total run is still dominated by onset detection, so the next realtime optimization target is streaming/incremental onset analysis rather than only clustering.
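The claim above can be checked with a little arithmetic on the table. The sketch below copies the stage means from the table and sums them per mode; it assumes that the sum of stage means is a fair proxy for total run time, which the benchmark JSON may measure differently.

```python
# Stage means in seconds, copied from the benchmark summary table.
batch_quality = {
    "source load": 0.011,
    "BPM detection": 0.185,
    "onset detection + slicing": 1.943,
    "classification": 0.019,
    "clustering": 0.148,
    "representative selection": 0.204,
    "synthesis": 0.001,
    "export/package": 0.156,
}
online_preview = {
    "source load": 0.012,
    "BPM detection": 0.163,
    "onset detection + slicing": 1.834,
    "classification": 0.017,
    "clustering": 0.045,
    "representative selection": 0.115,
    "synthesis": 0.001,
    "export/package": 0.221,
}


def summarize(stages):
    """Return (sum of stage means, fraction spent in onset detection)."""
    total = sum(stages.values())
    return total, stages["onset detection + slicing"] / total


batch_total, batch_onset_share = summarize(batch_quality)
online_total, online_onset_share = summarize(online_preview)
clustering_speedup = batch_quality["clustering"] / online_preview["clustering"]

print(f"batch total {batch_total:.3f} s, onset share {batch_onset_share:.0%}")
print(f"online total {online_total:.3f} s, onset share {online_onset_share:.0%}")
print(f"clustering speedup {clustering_speedup:.1f}x")
```

Under that assumption, onset detection accounts for roughly three quarters of the summed stage time in both modes, which is why a faster clustering step alone changes the total only slightly.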
|
| 49 |
|
| 50 |
+
First cold runs can be much slower because imports and library initialization are paid up front.
|
| 51 |
|
| 52 |
+
## Batch quality versus online preview clustering
|
| 53 |
|
| 54 |
+
### `batch_quality`
|
| 55 |
|
| 56 |
+
Current final-quality clustering path:
|
| 57 |
|
| 58 |
+
1. Compute mel fingerprint for each hit.
|
| 59 |
+
2. Compute pairwise mel cosine prefilter.
|
| 60 |
+
3. Compute transient NCC only for candidate pairs.
|
| 61 |
+
4. Build distance matrix.
|
| 62 |
+
5. Run agglomerative clustering.
|
| 63 |
+
6. Optionally merge singleton clusters.
|
| 64 |
|
| 65 |
+
This gives better global grouping, but it is fundamentally batch-oriented because it wants the full similarity matrix.
|
| 66 |
|
| 67 |
+
### `online_preview`
|
| 68 |
|
| 69 |
+
Current near-realtime-oriented clustering path:
|
| 70 |
|
| 71 |
+
1. Process hits in onset order.
|
| 72 |
+
2. Compute one mel fingerprint and one transient per hit.
|
| 73 |
+
3. Compare the new hit against existing cluster prototypes.
|
| 74 |
+
4. Assign it to the best prototype or create a new cluster until the target cap is reached.
|
| 75 |
+
5. Update prototype fingerprints/transients using energy-weighted rolling averages.
|
| 76 |
|
| 77 |
+
Complexity is roughly `O(number_of_hits × number_of_clusters)`, not `O(number_of_hits²)`, and does not require future hits before producing a current assignment. It is suitable for progressive preview and fast iteration, but it is not guaranteed to match the global batch clustering result.
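The prototype-assignment loop described above can be sketched as follows. This is a minimal illustration, not the code in `sample_extractor.py`: the function name `online_assign`, the `threshold` value, and the use of cosine similarity on fingerprints alone (no transient comparison) are assumptions made to keep the example short.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def online_assign(fingerprints, energies, threshold=0.9, max_clusters=8):
    """Assign hits to clusters one at a time, in onset order.

    Hypothetical sketch: each hit is compared only against the current
    prototypes, so the work per hit is O(number_of_clusters).
    """
    prototypes, weights, labels = [], [], []
    for fp, energy in zip(fingerprints, energies):
        sims = [cosine(fp, p) for p in prototypes]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        at_cap = len(prototypes) >= max_clusters
        if best >= 0 and (sims[best] >= threshold or at_cap):
            # Energy-weighted rolling average of the matched prototype.
            total = weights[best] + energy
            if total > 0:
                prototypes[best] = [
                    (p * weights[best] + x * energy) / total
                    for p, x in zip(prototypes[best], fp)
                ]
            weights[best] = total
            labels.append(best)
        else:
            # No close prototype and room left under the cap: new cluster.
            prototypes.append([float(x) for x in fp])
            weights.append(float(energy))
            labels.append(len(prototypes) - 1)
    return labels, prototypes


hits = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0]]
labels, prototypes = online_assign(hits, energies=[1.0, 1.0, 1.0, 1.0])
print(labels)
```

Because each assignment depends on the order in which hits arrive, an early outlier can seed a prototype that a global method would have merged, which is one reason the result may differ from `batch_quality`.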
|
| 78 |
|
| 79 |
+
## What can run in or near realtime
|
| 80 |
|
| 81 |
+
These can be performed progressively with small buffers:
|
| 82 |
|
| 83 |
+
- Source decode for already-separated/full-mix audio.
|
| 84 |
+
- Onset envelope computation.
|
| 85 |
+
- Peak picking with bounded lookahead.
|
| 86 |
+
- Hit slicing once enough tail audio is buffered.
|
| 87 |
+
- Rule-based classification.
|
| 88 |
+
- Mel fingerprint extraction.
|
| 89 |
+
- Online prototype clustering.
|
| 90 |
+
- Representative preview selection.
|
| 91 |
+
- Basic reconstruction preview.
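To illustrate what "peak picking with bounded lookahead" means in practice, here is a hedged sketch of a streaming peak picker over onset-envelope frames. The class name `StreamingPeakPicker` and its parameters are hypothetical and do not come from the project; the point is only that a peak can be confirmed after a fixed number of future frames, so latency stays bounded.

```python
from collections import deque


class StreamingPeakPicker:
    """Confirm envelope peaks once `lookahead` later frames have arrived.

    Hypothetical sketch: the first `lookbehind` frames of a stream are never
    reported because the window is not yet full when they are centred.
    """

    def __init__(self, lookbehind=3, lookahead=3, threshold=0.1):
        self.lookbehind = lookbehind
        self.lookahead = lookahead
        self.threshold = threshold
        self.window = deque(maxlen=lookbehind + 1 + lookahead)
        self.count = 0  # number of frames pushed so far

    def push(self, value):
        """Feed one envelope frame; return a confirmed peak index or None."""
        self.window.append(float(value))
        self.count += 1
        if len(self.window) < self.window.maxlen:
            return None
        frames = list(self.window)
        candidate = frames[self.lookbehind]
        before = frames[: self.lookbehind]
        after = frames[self.lookbehind + 1:]
        is_peak = (
            candidate >= self.threshold
            and all(candidate > v for v in before)   # strict: first of a plateau wins
            and all(candidate >= v for v in after)
        )
        return self.count - 1 - self.lookahead if is_peak else None


envelope = [0, 0, 0, 1, 0, 0, 0, 0.5, 0, 0, 0]
picker = StreamingPeakPicker(lookbehind=2, lookahead=2, threshold=0.3)
events = []
for arrival, value in enumerate(envelope):
    peak = picker.push(value)
    if peak is not None:
        events.append((arrival, peak))
print(events)
```

Each event pairs the frame at which the peak was confirmed with the frame of the peak itself, so the difference between the two is the added latency (here, the lookahead of two frames).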
|
| 92 |
|
| 93 |
+
## What should stay offline/batch
|
| 94 |
|
| 95 |
+
- Demucs source separation.
|
| 96 |
+
- All-pairs transient NCC for large hit sets.
|
| 97 |
+
- Agglomerative clustering.
|
| 98 |
+
- Final ZIP packaging.
|
| 99 |
+
- Full high-quality rerender/export.
|
| 100 |
|
| 101 |
+
## Recommended runtime strategy
|
| 102 |
|
| 103 |
+
| Phase | Mode | Purpose |
|
| 104 |
+
|---|---|---|
|
| 105 |
+
| Upload / first pass | `stem=all`, `clustering_mode=online_preview` | Fast inspection and parameter tuning. |
|
| 106 |
+
| Final extraction from a full mix or pre-separated stem | `stem=all`, `clustering_mode=batch_quality` | Better grouping without source separation. |
|
| 107 |
+
| Final extraction from full song | `stem=drums`, `clustering_mode=batch_quality`, disk cache on | Best quality with offline Demucs cost paid once. |
|
| 108 |
|
| 109 |
+
## Disk cache impact
|
| 110 |
|
| 111 |
+
Disk cache now stores decoded full mix or Demucs stem output under `.cache/stems/`, keyed by:
|
| 112 |
|
| 113 |
+
- Source SHA-256.
|
| 114 |
+
- Stem name.
|
| 115 |
+
- Demucs model.
|
| 116 |
+
- Demucs shifts.
|
| 117 |
+
- Demucs overlap.
|
| 118 |
+
- Device/decode mode.
|
| 119 |
|
| 120 |
+
This does not make Demucs realtime, but it prevents repeated source separation work when retuning onset/clustering parameters for the same source and stem settings.
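A cache key built from the fields listed above might look like the sketch below. The function name `stem_cache_key`, the JSON serialization, and the final hashing step are assumptions for illustration; the actual key layout under `.cache/stems/` is not shown in this document.

```python
import hashlib
import json


def stem_cache_key(source_bytes, stem, model, shifts, overlap, device):
    """Derive a deterministic key from the source content and stem settings.

    Hypothetical sketch: any change to the audio bytes or to a Demucs
    setting yields a different key, so stale stems are never reused.
    """
    source_hash = hashlib.sha256(source_bytes).hexdigest()
    params = json.dumps(
        {
            "stem": stem,
            "model": model,
            "shifts": shifts,
            "overlap": overlap,
            "device": device,
        },
        sort_keys=True,  # stable ordering keeps the key reproducible
    )
    return hashlib.sha256((source_hash + params).encode("utf-8")).hexdigest()


audio = b"fake audio bytes for illustration"
key_a = stem_cache_key(audio, "drums", "htdemucs", 1, 0.25, "cpu")
key_b = stem_cache_key(audio, "drums", "htdemucs", 1, 0.25, "cpu")
key_c = stem_cache_key(audio, "drums", "htdemucs", 2, 0.25, "cpu")
print(key_a == key_b, key_a == key_c)
```

Onset and clustering parameters are deliberately left out of the key in this sketch, matching the stated goal: retuning them should hit the cached stem rather than trigger a new separation.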
|
| 121 |
|
| 122 |
+
## Remaining realtime work
|
| 123 |
|
| 124 |
+
The current `online_preview` mode is invoked by the batch job API after onset detection. To make the application genuinely realtime/progressive, add:
|
| 125 |
|
| 126 |
+
1. A streaming/ranged audio analysis API.
|
| 127 |
+
2. Incremental onset detector state.
|
| 128 |
+
3. Incremental hit artifact writing.
|
| 129 |
+
4. SSE progress/results stream.
|
| 130 |
+
5. UI that appends hits/clusters as they arrive.
|
| 131 |
+
6. Optional final `batch_quality` consolidation pass.
|
docs/PROGRESS.md
ADDED
|
@@ -0,0 +1,63 @@
|
| 1 |
+
# Progress log
|
| 2 |
+
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
+
|
| 5 |
+
## Pass 1: project review, timing, and Gradio replacement
|
| 6 |
+
|
| 7 |
+
Completed:
|
| 8 |
+
|
| 9 |
+
1. Inspected the original project structure and active Gradio entrypoints.
|
| 10 |
+
2. Moved previous Gradio interfaces into `legacy/`.
|
| 11 |
+
3. Created `pipeline_runner.py` as the timed orchestration layer.
|
| 12 |
+
4. Created `app.py` as a FastAPI backend.
|
| 13 |
+
5. Created a custom no-build browser frontend under `web/`.
|
| 14 |
+
6. Added stage timing to each extraction run.
|
| 15 |
+
7. Added synthetic benchmarking via `scripts/benchmark_subprocesses.py`.
|
| 16 |
+
8. Added initial docs for project review, timing/realtime, API, UI, and remaining work.
|
| 17 |
+
|
| 18 |
+
Outcome:
|
| 19 |
+
|
| 20 |
+
The application became usable without Gradio and produced per-run manifests/artifacts.
|
| 21 |
+
|
| 22 |
+
## Pass 2: feature ledger and continued development
|
| 23 |
+
|
| 24 |
+
Completed in this pass:
|
| 25 |
+
|
| 26 |
+
1. Added first-class docs for features, tasks, and progress.
|
| 27 |
+
2. Added `GET /api/jobs` for active/completed run listing.
|
| 28 |
+
3. Added run-history UI panel that indexes `.runs/*/output/manifest.json`.
|
| 29 |
+
4. Added disk caching for decoded full mix and Demucs stem outputs.
|
| 30 |
+
5. Extended cache clearing to remove both memory and disk cache.
|
| 31 |
+
6. Added `clustering_mode` pipeline parameter.
|
| 32 |
+
7. Added `online_preview` clustering using prototype assignment.
|
| 33 |
+
8. Added frontend controls for clustering mode and disk cache.
|
| 34 |
+
9. Fixed duplicate sample writes in `sample_extractor.build_archive`.
|
| 35 |
+
10. Updated README and docs to reflect the new state.
|
| 36 |
+
|
| 37 |
+
Outcome:
|
| 38 |
+
|
| 39 |
+
The project now has a clearer product surface: final-quality batch extraction, faster online-style preview clustering, persistent run history, and explicit docs tracking what is done versus still missing.
|
| 40 |
+
|
| 41 |
+
## Current assessment
|
| 42 |
+
|
| 43 |
+
The application is not “fully complete” as an editing workstation, but it is substantially implemented as an extraction workstation. The remaining gaps are concentrated around interactive correction/editing, richer progress streaming, run comparison, and frontend engineering hardening.
|
| 44 |
+
|
| 45 |
+
## Next recommended pass
|
| 46 |
+
|
| 47 |
+
Implement the editing loop:
|
| 48 |
+
|
| 49 |
+
1. Click waveform onset marker or sample table row to audition.
|
| 50 |
+
2. Show selected hit metadata and audio snippet.
|
| 51 |
+
3. Allow onset shift, label change, cluster reassignment, merge, and split.
|
| 52 |
+
4. Re-export without rerunning Demucs/onset detection when only grouping changes.
|
| 53 |
+
5. Save edit decisions into the manifest.
|
| 54 |
+
|
| 55 |
+
## Validation performed in this pass
|
| 56 |
+
|
| 57 |
+
- Compiled active Python files with `python3 -m py_compile app.py pipeline_runner.py sample_extractor.py scripts/*.py`.
|
| 58 |
+
- Ran FastAPI smoke job through `scripts/test_api_job.py`.
|
| 59 |
+
- Ran an online-preview API smoke job with synthetic audio.
|
| 60 |
+
- Verified `GET /api/jobs` history output and `POST /api/cache/clear` behavior.
|
| 61 |
+
- Refreshed batch and online benchmark JSON files:
|
| 62 |
+
- `docs/benchmark-subprocesses.json`
|
| 63 |
+
- `docs/benchmark-online-preview.json`
|
docs/PROJECT_REVIEW.md
CHANGED
|
@@ -1,89 +1,52 @@
|
|
| 1 |
# Project review
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
|
| 10 |
-
- The core extraction process is callable independently of the UI.
|
| 11 |
-
- Every significant extraction subprocess is timed.
|
| 12 |
-
- Runtime artifacts are stable and downloadable.
|
| 13 |
-
- Documentation explains current behavior, tradeoffs, and remaining work.
|
| 14 |
-
- Legacy files are preserved but not part of the active path.
|
| 15 |
|
| 16 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 17 |
|
| 18 |
-
|
| 19 |
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
| `synth_generator.py` | Synthetic drum fixture generator |
|
| 27 |
-
| `evaluation.py` | Ground-truth matching and scoring |
|
| 28 |
-
| `optimizer.py`, `optimizer_v2.py` | Parameter search experiments |
|
| 29 |
-
| `quality_metrics.py` | Completeness, cleanness, onset, reference metrics |
|
| 30 |
-
| `config_store.py` | Config persistence and leaderboard helpers |
|
| 31 |
-
|
| 32 |
-
## Key findings
|
| 33 |
-
|
| 34 |
-
1. `sample_extractor.py` is the right core to keep. It is compact, stage-oriented, and already exposes most of the operations needed by a proper app/API.
|
| 35 |
-
2. `app.py` mixed UI code, runtime hotfixing, file conversion, extraction orchestration, and artifact packaging. That made it hard to test or replace the UI.
|
| 36 |
-
3. The previous Gradio UI was fast to build but not ideal for this use-case: extraction is a staged process with logs, timing, waveform review, downloadable artifacts, and a dense parameter surface that benefits from a purpose-built layout.
|
| 37 |
-
4. The previous `app.py` patched `sample_extractor.py` at runtime to fix `_sf(..., lag=2)` vs `_sf(..., l=2)`. The underlying bug is now fixed directly in `sample_extractor.py`.
|
| 38 |
-
5. There was no meaningful project documentation, no API documentation, and no benchmark/timing documentation.
|
| 39 |
-
6. `requirements.txt` still treated Gradio as first-class. The active app now uses FastAPI; Gradio dependencies have been moved to `requirements-legacy-gradio.txt`.
|
| 40 |
-
7. `.runs/`, generated audio, MIDI, ZIP files, and local caches needed explicit ignore rules.
|
| 41 |
|
| 42 |
-
##
|
| 43 |
|
| 44 |
-
|
|
| 45 |
|---|---|
|
| 46 |
-
|
|
| 47 |
-
|
|
| 48 |
-
|
|
| 49 |
-
|
|
| 50 |
-
|
|
| 51 |
-
|
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
▼
|
| 68 |
-
sample_extractor.py + quality_metrics.py
|
| 69 |
-
│
|
| 70 |
-
▼
|
| 71 |
-
.runs/<job-id>/output/{samples, MIDI, WAV, ZIP, manifest.json}
|
| 72 |
-
```
|
| 73 |
-
|
| 74 |
-
The UI only talks to the API. The API only calls the timed runner. The runner is now independently testable and usable from scripts.
|
| 75 |
-
|
| 76 |
-
## Risks and limitations
|
| 77 |
-
|
| 78 |
-
- Demucs can dominate runtime and may require a model download on first use.
|
| 79 |
-
- The current job store is in-memory. Completed jobs can be reloaded from `manifest.json`, but queued/running job state is lost on process restart.
|
| 80 |
-
- The clustering implementation is still batch-oriented. It can be optimized or adapted incrementally, but current agglomerative clustering is not a streaming algorithm.
|
| 81 |
-
- There is no authentication or quota control; this is intended as a local/Hugging Face style app, not a public multi-tenant service.
|
| 82 |
-
- The browser UI is currently no-build static JavaScript/CSS. That is intentional for deployability, but a larger UI should eventually move to TypeScript with a real component/test setup.
|
| 83 |
-
|
| 84 |
-
## Verification performed
|
| 85 |
-
|
| 86 |
-
- Python syntax compilation for `app.py`, `pipeline_runner.py`, `sample_extractor.py`, and benchmark scripts.
|
| 87 |
-
- FastAPI `TestClient` checks for `/`, `/api/health`, and `/api/config`.
|
| 88 |
-
- End-to-end API job test using a synthetic drum fixture with `stem=all`.
|
| 89 |
-
- Synthetic subprocess benchmark across rock, funk, and halftime patterns.
|
|
|
|
| 1 |
# Project review
|
| 2 |
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
|
| 5 |
+
## Summary
|
| 6 |
|
| 7 |
+
The project has evolved from a Gradio-driven prototype into a usable FastAPI + custom frontend extraction workstation. The core DSP pipeline is still compact and script-oriented, but it now has a clearer boundary between API/UI orchestration (`app.py`), timed pipeline execution (`pipeline_runner.py`), and lower-level sample extraction (`sample_extractor.py`).
|
| 8 |
|
| 9 |
+
## What is strong
|
| 10 |
|
| 11 |
+
1. **Useful core pipeline**: stem extraction, onset detection, classification, clustering, selection, synthesis, MIDI rendering, and packaging are all present.
|
| 12 |
+
2. **Small deployable surface**: active runtime is FastAPI plus static files; no frontend build is required.
|
| 13 |
+
3. **Good local iteration path**: `stem=all` bypasses Demucs for fast tuning.
|
| 14 |
+
4. **Per-stage timing**: every job manifest records stage durations and details.
|
| 15 |
+
5. **Artifacts are explicit**: stem WAV, reconstruction WAV, MIDI, sample WAVs, ZIP, and manifest are written per run.
|
| 16 |
+
6. **Legacy preservation**: old Gradio apps remain available under `legacy/` without being active.
|
| 17 |
+
7. **New near-realtime path**: `online_preview` clustering gives a practical alternative to all-pairs batch clustering.
|
| 18 |
|
| 19 |
+
## Main risks
|
| 20 |
|
| 21 |
+
1. **Interactive editing is missing**: users can inspect outputs but cannot correct onsets or cluster decisions in the UI yet.
|
| 22 |
+
2. **Job state is process-local**: active jobs disappear from memory on restart; completed history is recovered from manifests only.
|
| 23 |
+
3. **Progress is stage-level**: Demucs and clustering do not expose fine-grained progress.
|
| 24 |
+
4. **Frontend is plain JavaScript**: good for speed, weaker for long-term maintainability than TypeScript modules/tests.
|
| 25 |
+
5. **Demucs cost remains dominant**: source separation is necessarily offline; disk cache mitigates repeated runs but not first-run latency.
|
| 26 |
+
6. **DSP code is dense**: `sample_extractor.py` is effective but would benefit from smaller modules and stronger tests.
|
| 27 |
|
| 28 |
+
## Development decisions made
|
| 29 |
|
| 30 |
+
| Decision | Rationale |
|
| 31 |
|---|---|
|
| 32 |
+
| Replace Gradio with FastAPI/static UI | More control over workflow, layout, artifacts, and progress display. |
|
| 33 |
+
| Keep no-build frontend for now | Fastest robust replacement; avoids adding Node/Vite just to ship the first custom UI. |
|
| 34 |
+
| Preserve Gradio in `legacy/` | Avoids data loss and gives reference behavior. |
|
| 35 |
+
| Add `pipeline_runner.py` | Keeps API orchestration separate from DSP primitives. |
|
| 36 |
+
| Add disk cache in pipeline layer | Avoids invasive Demucs changes and caches both full mix and stems. |
|
| 37 |
+
| Add `online_preview` rather than replacing batch clustering | Preserves final-quality path while adding a near-realtime option. |
|
| 38 |
+
|
| 39 |
+
## Current implementation quality
|
| 40 |
+
|
| 41 |
+
| Area | Rating | Notes |
|
| 42 |
+
|---|---:|---|
|
| 43 |
+
| Extraction functionality | Good | Core path works on synthetic tests. |
|
| 44 |
+
| UI/UX foundation | Good | Custom flow is much better than generic Gradio controls. |
|
| 45 |
+
| Realtime architecture | Partial | Online clustering exists; streaming onset/audio pipeline does not. |
|
| 46 |
+
| Documentation | Good | Feature/task/progress/API/timing docs are now checked in under `docs/`. |
|
| 47 |
+
| Test coverage | Basic | Smoke tests exist; no formal unit/browser tests yet. |
|
| 48 |
+
| Maintainability | Medium | Better boundaries now, but DSP module remains dense. |
|
| 49 |
+
|
| 50 |
+
## Recommendation
|
| 51 |
+
|
| 52 |
+
Next development should not add more global parameters. It should add an editing loop: audition detected hits, manually fix bad onsets, merge/split clusters, relabel samples, then repack from edited state without rerunning expensive stages.
|
docs/REMAINING_WORK.md
CHANGED
|
@@ -1,27 +1,35 @@
|
|
| 1 |
# Remaining work
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
|
| 13 |
## Known constraints
|
| 14 |
|
| 15 |
-
- Demucs is not a realtime stage and should stay explicitly offline.
|
| 16 |
-
-
|
| 17 |
- First run on a fresh environment can be slower due to imports, model download, and library initialization.
|
| 18 |
- The current job queue is process-local and single-worker. That is fine for local use, but not enough for a shared public deployment.
|
|
|
|
| 19 |
|
| 20 |
## Suggested implementation order
|
| 21 |
|
| 22 |
-
1. Add
|
| 23 |
-
2.
|
| 24 |
-
3. Add
|
| 25 |
-
4.
|
| 26 |
-
5. Add comparison
|
| 27 |
-
6. Add SSE
|
|
|
|
|
|
| 1 |
# Remaining work
|
| 2 |
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
|
| 5 |
+
## Current gap assessment
|
| 6 |
+
|
| 7 |
+
The project is now a usable extraction workstation, not a complete interactive sample editor. The largest remaining gaps are UX/editor capabilities rather than core batch extraction.
|
| 8 |
+
|
| 9 |
+
## Highest-priority remaining gaps
|
| 10 |
+
|
| 11 |
+
1. **Hit audition and selection**: clicking an onset marker or sample row should audition that exact hit/sample.
|
| 12 |
+
2. **Waveform editing**: add onset adjustment, delete/add hit, and rerun-from-edited-onsets without redoing Demucs.
|
| 13 |
+
3. **Cluster editing**: allow merge, split, relabel, and manual reassignment of hits.
|
| 14 |
+
4. **Run comparison**: compare two manifests side-by-side for parameter tuning.
|
| 15 |
+
5. **Progress streaming**: replace polling or supplement it with SSE for lower-latency logs/progress.
|
| 16 |
+
6. **Frontend engineering hardening**: migrate the frontend to TypeScript after the UX stabilizes and add browser-level tests.
|
| 17 |
+
7. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
|
| 18 |
|
| 19 |
## Known constraints
|
| 20 |
|
| 21 |
+
- Demucs is not a realtime stage and should stay explicitly offline/cached.
|
| 22 |
+
- Batch agglomerative clustering is not realtime; `online_preview` is the progressive clustering path.
|
| 23 |
- First run on a fresh environment can be slower due to imports, model download, and library initialization.
|
| 24 |
- The current job queue is process-local and single-worker. That is fine for local use, but not enough for a shared public deployment.
|
| 25 |
+
- Run history is filesystem-backed via `.runs/`; deleting `.runs/` deletes history.
|
| 26 |
|
| 27 |
## Suggested implementation order
|
| 28 |
|
| 29 |
+
1. Add click-to-audition for sample table rows and waveform onsets.
|
| 30 |
+
2. Store detected hit snippets as individual review artifacts or expose ranged audio endpoints.
|
| 31 |
+
3. Add edit state to manifests: deleted hits, shifted onsets, labels, cluster overrides.
|
| 32 |
+
4. Add rerender/repack endpoint that starts from edited hit/cluster state.
|
| 33 |
+
5. Add run comparison view.
|
| 34 |
+
6. Add SSE progress streaming.
|
| 35 |
+
7. Convert frontend to TypeScript and add UI tests.
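Steps 3 and 4 above (edit state in manifests, then rerender from edited state) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `EditState` class, its field names, and the hit dictionary keys (`id`, `onset_sec`, `label`, `cluster`) are hypothetical and are not taken from the actual manifest schema.

```python
from dataclasses import dataclass, field


@dataclass
class EditState:
    """Hypothetical per-run edit state; all field names are illustrative."""
    deleted_hits: set = field(default_factory=set)          # hit ids to drop
    onset_shifts_sec: dict = field(default_factory=dict)    # hit id -> shift in seconds
    labels: dict = field(default_factory=dict)              # hit id -> new label
    cluster_overrides: dict = field(default_factory=dict)   # hit id -> new cluster id


def apply_edits(hits, edits):
    """Return a new hit list with edits applied; the input list is not mutated."""
    edited = []
    for hit in hits:
        hit_id = hit["id"]
        if hit_id in edits.deleted_hits:
            continue  # deleted hits are skipped entirely
        updated = dict(hit)  # shallow copy so the original hit stays untouched
        updated["onset_sec"] = hit["onset_sec"] + edits.onset_shifts_sec.get(hit_id, 0.0)
        updated["label"] = edits.labels.get(hit_id, hit["label"])
        updated["cluster"] = edits.cluster_overrides.get(hit_id, hit["cluster"])
        edited.append(updated)
    return edited
```

Keeping edits as a separate overlay, rather than rewriting detected hits in place, would let a rerender/repack step start from the cached detection results without rerunning Demucs or onset detection.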
|
docs/TASKS.md
ADDED
|
@@ -0,0 +1,54 @@
|
| 1 |
+
# Task ledger
|
| 2 |
+
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
+
|
| 5 |
+
## User-requested tasks
|
| 6 |
+
|
| 7 |
+
| Task | Status | Evidence |
|
| 8 |
+
|---|---:|---|
|
| 9 |
+
| Review the project | Done | `docs/PROJECT_REVIEW.md`. |
|
| 10 |
+
| Determine length of significant subprocesses | Done | `pipeline_runner.py`, `scripts/benchmark_subprocesses.py`, `docs/benchmark-subprocesses.json`, `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
|
| 11 |
+
| Identify near-realtime subprocesses | Done | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
|
| 12 |
+
| Add documentation to project | Done | `docs/*.md`, updated `README.md`. |
|
| 13 |
+
| Replace Gradio UI | Done | Active app is FastAPI + custom web UI; Gradio moved to `legacy/`. |
|
| 14 |
+
| Document features, tasks, and progress | Done | `docs/FEATURES.md`, this file, `docs/PROGRESS.md`. |
|
| 15 |
+
| Continue development while keeping docs up-to-date | In progress | This pass adds run history, disk cache, online clustering mode, and docs updates. |
|
| 16 |
+
|
| 17 |
+
## Completed implementation tasks
|
| 18 |
+
|
| 19 |
+
- [x] Preserve old Gradio apps in `legacy/`.
|
| 20 |
+
- [x] Expose extraction as a FastAPI job API.
|
| 21 |
+
- [x] Serve a custom browser UI from `web/`.
|
| 22 |
+
- [x] Add per-stage timing to the pipeline.
|
| 23 |
+
- [x] Write per-run `manifest.json`.
|
| 24 |
+
- [x] Add synthetic benchmark script.
|
| 25 |
+
- [x] Add API documentation.
|
| 26 |
+
- [x] Add UI replacement documentation.
|
| 27 |
+
- [x] Add project review and realtime analysis documentation.
|
| 28 |
+
- [x] Add run-history listing endpoint: `GET /api/jobs`.
|
| 29 |
+
- [x] Add run-history UI panel.
|
| 30 |
+
- [x] Add disk cache for stem/full-mix loads.
|
| 31 |
+
- [x] Extend cache clearing to disk cache.
|
| 32 |
+
- [x] Add prototype-based `online_preview` clustering mode.
|
| 33 |
+
- [x] Add UI controls for clustering mode and disk cache.
|
| 34 |
+
- [x] Fix duplicate sample writes in `build_archive`.
|
| 35 |
+
- [x] Add feature, task, and progress docs.
|
| 36 |
+
|
| 37 |
+
## Validation tasks
|
| 38 |
+
|
| 39 |
+
- [x] Python compile check for active Python files.
|
| 40 |
+
- [x] FastAPI smoke test for health/config/job flow.
|
| 41 |
+
- [x] Pipeline smoke test on synthetic audio.
|
| 42 |
+
- [x] API history/cache smoke test.
|
| 43 |
+
- [x] Git status reviewed before packaging.
|
| 44 |
+
- [x] Project archive excludes `.runs/`, `.cache/`, and dependency folders.
|
| 45 |
+
|
| 46 |
+
## Remaining high-value tasks
|
| 47 |
+
|
| 48 |
+
- [ ] Add click-to-audition onset markers and table rows.
|
| 49 |
+
- [ ] Add onset adjustment and rerun-from-onsets flow.
|
| 50 |
+
- [ ] Add cluster merge/split/relabel workflow.
|
| 51 |
+
- [ ] Add side-by-side run comparison.
|
| 52 |
+
- [ ] Add SSE progress stream for lower-latency updates.
|
| 53 |
+
- [ ] Convert frontend to TypeScript with a small Vite build once UX stabilizes.
|
| 54 |
+
- [ ] Add automated browser-level UI tests.
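For the SSE progress stream task, the wire format itself is small. The sketch below shows how one progress update might be serialized as a Server-Sent Events frame. The function name `format_sse`, the event name `stage`, and the payload keys are assumptions for illustration, not part of the current API.

```python
import json
from typing import Optional


def format_sse(event: str, payload: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so the payload fits on one data line.
    lines.append("data: " + json.dumps(payload))
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"
```

A streaming endpoint would yield one such frame per stage transition or log line, and the browser would consume them with `EventSource`.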
|
docs/UI_REPLACEMENT.md
CHANGED
|
@@ -1,26 +1,30 @@
|
|
| 1 |
# Custom UI replacement
|
| 2 |
|
|
|
|
|
|
|
| 3 |
## What changed
|
| 4 |
|
| 5 |
-
The active interface is
|
| 6 |
|
| 7 |
## UX goals
|
| 8 |
|
| 9 |
1. Make the process feel like a sample-extraction workstation, not a generic notebook form.
|
| 10 |
-
2. Keep upload, controls, pipeline status, logs, waveform review, audio previews, downloads, and sample rows visible without tab hunting.
|
| 11 |
3. Show stage timing as a first-class result, because extraction quality and speed tradeoffs matter.
|
| 12 |
4. Make `stem=all` obvious for fast iteration when Demucs is unnecessary.
|
| 13 |
-
5.
|
|
|
|
| 14 |
|
| 15 |
## UI structure
|
| 16 |
|
| 17 |
| Area | Purpose |
|
| 18 |
|---|---|
|
| 19 |
-
| Hero/status | Backend readiness and product framing |
|
| 20 |
-
| Source panel | Drag/drop upload and source audio preview |
|
| 21 |
-
| Controls panel | Stem, onset, clustering, MIDI, and
|
| 22 |
-
| Pipeline panel | Stage statuses, durations, and
|
| 23 |
-
|
|
|
|
|
| 24 |
|
| 25 |
## Frontend implementation
|
| 26 |
|
|
@@ -32,11 +36,11 @@ Files:
|
|
| 32 |
|
| 33 |
The frontend uses modern browser APIs directly:
|
| 34 |
|
| 35 |
-
- `fetch` for API calls
|
| 36 |
-
- `FormData` for upload
|
| 37 |
-
- `<audio>` for previews
|
| 38 |
-
- `<canvas>` for waveform/onset visualization
|
| 39 |
-
- CSS grid, responsive layout, custom properties, and backdrop filters for layout/polish
|
| 40 |
|
| 41 |
No Gradio runtime, iframe, or generated UI framework is involved.
|
| 42 |
|
|
@@ -50,6 +54,17 @@ The frontend creates a job with `POST /api/jobs`, then polls `GET /api/jobs/{id}
|
|
| 50 |
- reconstruction WAV
|
| 51 |
- individual sample WAVs
|
| 52 |
|
| 53 |
## Why polling instead of websockets/SSE
|
| 54 |
|
| 55 |
Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
|
|
@@ -62,5 +77,5 @@ Future improvement: use Server-Sent Events for lower-latency log streaming once
|
|
| 62 |
- Add inline controls for reassigning sample labels and merging/splitting clusters.
|
| 63 |
- Add A/B comparison between parameter runs.
|
| 64 |
- Add downloadable timing report per job.
|
| 65 |
-
- Add
|
| 66 |
-
-
|
|
|
|
| 1 |
# Custom UI replacement
|
| 2 |
|
| 3 |
+
Last updated: 2026-05-12
|
| 4 |
+
|
| 5 |
## What changed
|
| 6 |
|
| 7 |
+
The active interface is a custom browser UI served from `web/` by the FastAPI app in `app.py`. The old Gradio files live in `legacy/` and are no longer used by the active application.
|
| 8 |
|
| 9 |
## UX goals
|
| 10 |
|
| 11 |
1. Make the process feel like a sample-extraction workstation, not a generic notebook form.
|
| 12 |
+
2. Keep upload, controls, pipeline status, logs, waveform review, audio previews, downloads, run history, and sample rows visible without tab hunting.
|
| 13 |
3. Show stage timing as a first-class result, because extraction quality and speed tradeoffs matter.
|
| 14 |
4. Make `stem=all` obvious for fast iteration when Demucs is unnecessary.
|
| 15 |
+
5. Make `online_preview` obvious as the near-realtime clustering path.
|
| 16 |
+
6. Keep the frontend deployable without a JavaScript build step until the interaction model stabilizes.
|
| 17 |
|
| 18 |
## UI structure
|
| 19 |
|
| 20 |
| Area | Purpose |
|
| 21 |
|---|---|
|
| 22 |
+
| Hero/status | Backend readiness and product framing. |
|
| 23 |
+
| Source panel | Drag/drop upload and source audio preview. |
|
| 24 |
+
| Controls panel | Stem, onset, clustering, MIDI, synthesis, and disk-cache parameters. |
|
| 25 |
+
| Pipeline panel | Stage statuses, durations, and logs. |
|
| 26 |
+
| Run history panel | Loads completed manifests from `.runs/`. |
|
| 27 |
+
| Result panel | Summary, waveform/onsets, downloads, stem/reconstruction audio, sample table. |
|
| 28 |
|
| 29 |
## Frontend implementation
|
| 30 |
|
|
|
|
| 36 |
|
| 37 |
The frontend uses modern browser APIs directly:
|
| 38 |
|
| 39 |
+
- `fetch` for API calls.
|
| 40 |
+
- `FormData` for upload.
|
| 41 |
+
- `<audio>` for previews.
|
| 42 |
+
- `<canvas>` for waveform/onset visualization.
|
| 43 |
+
- CSS grid, responsive layout, custom properties, and backdrop filters for layout/polish.
|
| 44 |
|
| 45 |
No Gradio runtime, iframe, or generated UI framework is involved.
|
| 46 |
|
|
|
|
| 54 |
- reconstruction WAV
|
| 55 |
- individual sample WAVs
|
| 56 |
|
| 57 |
+
The run history panel calls `GET /api/jobs` and can reload any completed manifest still present under `.runs/`.
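As a rough sketch of what a filesystem-backed history listing can look like, the helper below scans a runs directory for manifests. It assumes a layout of `.runs/<job_id>/manifest.json`, which is a guess for illustration; the function name `list_completed_runs` and the returned keys are also hypothetical.

```python
import json
from pathlib import Path


def list_completed_runs(runs_dir: Path) -> list:
    """Collect readable manifests from runs_dir/<job_id>/manifest.json (assumed layout)."""
    runs = []
    for manifest_path in sorted(runs_dir.glob("*/manifest.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written manifests
        runs.append({"job_id": manifest_path.parent.name, "manifest": manifest})
    return runs
```

Because the listing is derived from disk on each call, removing a run directory removes it from history, which matches the constraint that deleting `.runs/` deletes history.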
|
| 58 |
+
|
| 59 |
+
## Clustering UX
|
| 60 |
+
|
| 61 |
+
Two modes are exposed:
|
| 62 |
+
|
| 63 |
+
| Mode | UX intent |
|
| 64 |
+
|---|---|
|
| 65 |
+
| `batch_quality` | Slower, final-quality clustering using all-pairs similarity plus agglomerative clustering. |
|
| 66 |
+
| `online_preview` | Faster near-realtime-style clustering using prototype assignment. Best for quick iteration after bypassing Demucs. |
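To make the difference between the two modes concrete, here is a minimal sketch of prototype assignment: each hit is compared only against the current cluster prototypes instead of against every other hit. It assumes plain feature vectors, cosine similarity, and a fixed threshold; the actual features (mel fingerprint, transient NCC) and thresholds used by the pipeline may differ, and the names `cosine` and `online_cluster` are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(features, threshold=0.9):
    """Single-pass clustering: join the most similar prototype or start a new one."""
    prototypes = []  # running mean vector per cluster
    counts = []      # number of members per cluster
    labels = []
    for vec in features:
        best_idx, best_sim = -1, -1.0
        for idx, proto in enumerate(prototypes):
            sim = cosine(vec, proto)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            counts[best_idx] += 1
            n = counts[best_idx]
            # Incremental mean update keeps the prototype representative.
            prototypes[best_idx] = [p + (x - p) / n for p, x in zip(prototypes[best_idx], vec)]
            labels.append(best_idx)
        else:
            prototypes.append(list(vec))
            counts.append(1)
            labels.append(len(prototypes) - 1)
    return labels
```

Each hit costs one comparison per existing prototype, so results can be shown as hits arrive. The tradeoff is that assignments depend on arrival order and are never revisited, which is one reason a batch mode remains useful for final output.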
|
| 67 |
+
|
| 68 |
## Why polling instead of websockets/SSE
|
| 69 |
|
| 70 |
Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
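The polling loop is simple enough to sketch in a few lines. The version below is written in Python for illustration (the real client is browser JavaScript) and takes the fetch function as a parameter so it does not depend on any HTTP library. The `status` key and the terminal values `done` and `error` are assumptions about the job payload, not confirmed fields.

```python
import time

TERMINAL_STATES = {"done", "error"}  # assumed job status values


def poll_job(fetch_job, job_id, interval_sec=0.8, timeout_sec=600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch_job(job_id) every interval_sec until the job reaches a terminal state."""
    deadline = clock() + timeout_sec
    while True:
        job = fetch_job(job_id)  # e.g. a wrapper around GET /api/jobs/{id}
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_sec} s")
        sleep(interval_sec)
```

With an 800 ms interval, a stage transition is seen at most about one interval after it happens, which is small compared with the multi-second stages recorded in the benchmarks.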
|
|
|
|
| 77 |
- Add inline controls for reassigning sample labels and merging/splitting clusters.
|
| 78 |
- Add A/B comparison between parameter runs.
|
| 79 |
- Add downloadable timing report per job.
|
| 80 |
+
- Add filters/search to the run history browser.
|
| 81 |
+
- Convert the frontend to TypeScript when the UX stops moving quickly.
|
docs/benchmark-online-preview.json
ADDED
|
@@ -0,0 +1,273 @@
|
| 1 |
+
{
|
| 2 |
+
"clustering_mode": "online_preview",
|
| 3 |
+
"runs": [
|
| 4 |
+
{
|
| 5 |
+
"pattern": "rock",
|
| 6 |
+
"bars": 2,
|
| 7 |
+
"bpm": 120.0,
|
| 8 |
+
"run_index": 0,
|
| 9 |
+
"clustering_mode": "online_preview",
|
| 10 |
+
"audio_duration_sec": 4.75,
|
| 11 |
+
"total_duration_sec": 2.394493,
|
| 12 |
+
"realtime_factor": 0.504104,
|
| 13 |
+
"hit_count": 14,
|
| 14 |
+
"cluster_count": 10,
|
| 15 |
+
"stages": [
|
| 16 |
+
{
|
| 17 |
+
"key": "stem",
|
| 18 |
+
"label": "Stem extraction / source load",
|
| 19 |
+
"duration_sec": 0.01333964500008733,
|
| 20 |
+
"status": "done",
|
| 21 |
+
"detail": "loaded full mix \u00b7 cached"
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"key": "bpm",
|
| 25 |
+
"label": "Tempo detection",
|
| 26 |
+
"duration_sec": 0.18073730900005103,
|
| 27 |
+
"status": "done",
|
| 28 |
+
"detail": "120.2 BPM"
|
| 29 |
+
},
|
| 30 |
+
{
|
| 31 |
+
"key": "onsets",
|
| 32 |
+
"label": "Onset detection + slicing",
|
| 33 |
+
"duration_sec": 1.8083914959997855,
|
| 34 |
+
"status": "done",
|
| 35 |
+
"detail": "14 hits"
|
| 36 |
+
},
|
| 37 |
+
{
|
| 38 |
+
"key": "classification",
|
| 39 |
+
"label": "Spectral rule classification",
|
| 40 |
+
"duration_sec": 0.015553790000012668,
|
| 41 |
+
"status": "done",
|
| 42 |
+
"detail": "bright:5, hihat_open:8, kick:1"
|
| 43 |
+
},
|
| 44 |
+
{
|
| 45 |
+
"key": "clustering",
|
| 46 |
+
"label": "Mel fingerprint + transient NCC clustering",
|
| 47 |
+
"duration_sec": 0.01717499700021108,
|
| 48 |
+
"status": "done",
|
| 49 |
+
"detail": "10 clusters \u00b7 online preview"
|
| 50 |
+
},
|
| 51 |
+
{
|
| 52 |
+
"key": "selection",
|
| 53 |
+
"label": "Best representative scoring",
|
| 54 |
+
"duration_sec": 0.06853683399981492,
|
| 55 |
+
"status": "done",
|
| 56 |
+
"detail": "quality-scored representatives"
|
| 57 |
+
},
|
| 58 |
+
{
|
| 59 |
+
"key": "synthesis",
|
| 60 |
+
"label": "Optional sample synthesis",
|
| 61 |
+
"duration_sec": 0.0004338460000781197,
|
| 62 |
+
"status": "done",
|
| 63 |
+
"detail": "2 synthesized alternates"
|
| 64 |
+
},
|
| 65 |
+
{
|
| 66 |
+
"key": "export",
|
| 67 |
+
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 68 |
+
"duration_sec": 0.2898033520000354,
|
| 69 |
+
"status": "done",
|
| 70 |
+
"detail": "10 WAVs + MIDI + ZIP"
|
| 71 |
+
}
|
| 72 |
+
]
|
| 73 |
+
},
|
| 74 |
+
{
|
| 75 |
+
"pattern": "funk",
|
| 76 |
+
"bars": 2,
|
| 77 |
+
"bpm": 120.0,
|
| 78 |
+
"run_index": 0,
|
| 79 |
+
"clustering_mode": "online_preview",
|
| 80 |
+
"audio_duration_sec": 4.874989,
|
| 81 |
+
"total_duration_sec": 2.422223,
|
| 82 |
+
"realtime_factor": 0.496867,
|
| 83 |
+
"hit_count": 30,
|
| 84 |
+
"cluster_count": 12,
|
| 85 |
+
"stages": [
|
| 86 |
+
{
|
| 87 |
+
"key": "stem",
|
| 88 |
+
"label": "Stem extraction / source load",
|
| 89 |
+
"duration_sec": 0.012654803000032189,
|
| 90 |
+
"status": "done",
|
| 91 |
+
"detail": "loaded full mix \u00b7 cached"
|
| 92 |
+
},
|
| 93 |
+
{
|
| 94 |
+
"key": "bpm",
|
| 95 |
+
"label": "Tempo detection",
|
| 96 |
+
"duration_sec": 0.10868702200014013,
|
| 97 |
+
"status": "done",
|
| 98 |
+
"detail": "120.2 BPM"
|
| 99 |
+
},
|
| 100 |
+
{
|
| 101 |
+
"key": "onsets",
|
| 102 |
+
"label": "Onset detection + slicing",
|
| 103 |
+
"duration_sec": 1.7981390029999602,
|
| 104 |
+
"status": "done",
|
| 105 |
+
"detail": "30 hits"
|
| 106 |
+
},
|
| 107 |
+
{
|
| 108 |
+
"key": "classification",
|
| 109 |
+
"label": "Spectral rule classification",
|
| 110 |
+
"duration_sec": 0.020911717999979373,
|
| 111 |
+
"status": "done",
|
| 112 |
+
"detail": "bright:12, cymbal:2, hihat_closed:9, hihat_open:3, kick:1, mid:3"
|
| 113 |
+
},
|
| 114 |
+
{
|
| 115 |
+
"key": "clustering",
|
| 116 |
+
"label": "Mel fingerprint + transient NCC clustering",
|
| 117 |
+
"duration_sec": 0.08173960800013447,
|
| 118 |
+
"status": "done",
|
| 119 |
+
"detail": "12 clusters \u00b7 online preview"
|
| 120 |
+
},
|
| 121 |
+
{
|
| 122 |
+
"key": "selection",
|
| 123 |
+
"label": "Best representative scoring",
|
| 124 |
+
"duration_sec": 0.18588780100003532,
|
| 125 |
+
"status": "done",
|
| 126 |
+
"detail": "quality-scored representatives"
|
| 127 |
+
},
|
| 128 |
+
{
|
| 129 |
+
"key": "synthesis",
|
| 130 |
+
"label": "Optional sample synthesis",
|
| 131 |
+
"duration_sec": 0.001146163000157685,
|
| 132 |
+
"status": "done",
|
| 133 |
+
"detail": "6 synthesized alternates"
|
| 134 |
+
},
|
| 135 |
+
{
|
| 136 |
+
"key": "export",
|
| 137 |
+
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 138 |
+
"duration_sec": 0.21253995300003226,
|
| 139 |
+
"status": "done",
|
| 140 |
+
"detail": "12 WAVs + MIDI + ZIP"
|
| 141 |
+
}
|
| 142 |
+
]
|
| 143 |
+
},
|
| 144 |
+
{
|
| 145 |
+
"pattern": "halftime",
|
| 146 |
+
"bars": 2,
|
| 147 |
+
"bpm": 120.0,
|
| 148 |
+
"run_index": 0,
|
| 149 |
+
"clustering_mode": "online_preview",
|
| 150 |
+
"audio_duration_sec": 4.874989,
|
| 151 |
+
"total_duration_sec": 2.406563,
|
| 152 |
+
"realtime_factor": 0.493655,
|
| 153 |
+
"hit_count": 28,
|
| 154 |
+
"cluster_count": 12,
|
| 155 |
+
"stages": [
|
| 156 |
+
{
|
| 157 |
+
"key": "stem",
|
| 158 |
+
"label": "Stem extraction / source load",
|
| 159 |
+
"duration_sec": 0.009107656999958635,
|
| 160 |
+
"status": "done",
|
| 161 |
+
"detail": "loaded full mix \u00b7 cached"
|
| 162 |
+
},
|
| 163 |
+
{
|
| 164 |
+
"key": "bpm",
|
| 165 |
+
"label": "Tempo detection",
|
| 166 |
+
"duration_sec": 0.19882379599994238,
|
| 167 |
+
"status": "done",
|
| 168 |
+
"detail": "118.8 BPM"
|
| 169 |
+
},
|
| 170 |
+
{
|
| 171 |
+
"key": "onsets",
|
| 172 |
+
"label": "Onset detection + slicing",
|
| 173 |
+
"duration_sec": 1.8942657120001059,
|
| 174 |
+
"status": "done",
|
| 175 |
+
"detail": "28 hits"
|
| 176 |
+
},
|
| 177 |
+
{
|
| 178 |
+
"key": "classification",
|
| 179 |
+
"label": "Spectral rule classification",
|
| 180 |
+
"duration_sec": 0.015083428000025378,
|
| 181 |
+
"status": "done",
|
| 182 |
+
"detail": "bright:5, cymbal:2, hihat_closed:19, hihat_open:2"
|
| 183 |
+
},
|
| 184 |
+
{
|
| 185 |
+
"key": "clustering",
|
| 186 |
+
"label": "Mel fingerprint + transient NCC clustering",
|
| 187 |
+
"duration_sec": 0.036892447000127504,
|
| 188 |
+
"status": "done",
|
| 189 |
+
"detail": "12 clusters \u00b7 online preview"
|
| 190 |
+
},
|
| 191 |
+
{
|
| 192 |
+
"key": "selection",
|
| 193 |
+
"label": "Best representative scoring",
|
| 194 |
+
"duration_sec": 0.0908485570000721,
|
| 195 |
+
"status": "done",
|
| 196 |
+
"detail": "quality-scored representatives"
|
| 197 |
+
},
|
| 198 |
+
{
|
| 199 |
+
"key": "synthesis",
|
| 200 |
+
"label": "Optional sample synthesis",
|
| 201 |
+
"duration_sec": 0.0007993310000529164,
|
| 202 |
+
"status": "done",
|
| 203 |
+
"detail": "4 synthesized alternates"
|
| 204 |
+
},
|
| 205 |
+
{
|
| 206 |
+
"key": "export",
|
| 207 |
+
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 208 |
+
"duration_sec": 0.1602465889998257,
|
| 209 |
+
"status": "done",
|
| 210 |
+
"detail": "12 WAVs + MIDI + ZIP"
|
| 211 |
+
}
|
| 212 |
+
]
|
| 213 |
+
}
|
| 214 |
+
],
|
| 215 |
+
"summary": [
|
| 216 |
+
{
|
| 217 |
+
"stage": "stem",
|
| 218 |
+
"mean_sec": 0.011701,
|
| 219 |
+
"median_sec": 0.012655,
|
| 220 |
+
"min_sec": 0.009108,
|
| 221 |
+
"max_sec": 0.01334
|
| 222 |
+
},
|
| 223 |
+
{
|
| 224 |
+
"stage": "bpm",
|
| 225 |
+
"mean_sec": 0.162749,
|
| 226 |
+
"median_sec": 0.180737,
|
| 227 |
+
"min_sec": 0.108687,
|
| 228 |
+
"max_sec": 0.198824
|
| 229 |
+
},
|
| 230 |
+
{
|
| 231 |
+
"stage": "onsets",
|
| 232 |
+
"mean_sec": 1.833599,
|
| 233 |
+
"median_sec": 1.808391,
|
| 234 |
+
"min_sec": 1.798139,
|
| 235 |
+
"max_sec": 1.894266
|
| 236 |
+
},
|
| 237 |
+
{
|
| 238 |
+
"stage": "classification",
|
| 239 |
+
"mean_sec": 0.017183,
|
| 240 |
+
"median_sec": 0.015554,
|
| 241 |
+
"min_sec": 0.015083,
|
| 242 |
+
"max_sec": 0.020912
|
| 243 |
+
},
|
| 244 |
+
{
|
| 245 |
+
"stage": "clustering",
|
| 246 |
+
"mean_sec": 0.045269,
|
| 247 |
+
"median_sec": 0.036892,
|
| 248 |
+
"min_sec": 0.017175,
|
| 249 |
+
"max_sec": 0.08174
|
| 250 |
+
},
|
| 251 |
+
{
|
| 252 |
+
"stage": "selection",
|
| 253 |
+
"mean_sec": 0.115091,
|
| 254 |
+
"median_sec": 0.090849,
|
| 255 |
+
"min_sec": 0.068537,
|
| 256 |
+
"max_sec": 0.185888
|
| 257 |
+
},
|
| 258 |
+
{
|
| 259 |
+
"stage": "synthesis",
|
| 260 |
+
"mean_sec": 0.000793,
|
| 261 |
+
"median_sec": 0.000799,
|
| 262 |
+
"min_sec": 0.000434,
|
| 263 |
+
"max_sec": 0.001146
|
| 264 |
+
},
|
| 265 |
+
{
|
| 266 |
+
"stage": "export",
|
| 267 |
+
"mean_sec": 0.220863,
|
| 268 |
+
"median_sec": 0.21254,
|
| 269 |
+
"min_sec": 0.160247,
|
| 270 |
+
"max_sec": 0.289803
|
| 271 |
+
}
|
| 272 |
+
]
|
| 273 |
+
}
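The derived fields in this file can be reproduced from the raw values. The sketch below shows one way to do it, assuming `realtime_factor` is processing time divided by audio duration and that the `summary` entries are per-stage statistics rounded to six decimals. Both assumptions are consistent with the numbers above, but the helper names are illustrative and are not taken from `scripts/benchmark_subprocesses.py`.

```python
from statistics import mean, median


def realtime_factor(total_duration_sec, audio_duration_sec):
    """Processing time per second of audio; values below 1.0 are faster than realtime."""
    return round(total_duration_sec / audio_duration_sec, 6)


def summarize_stage(runs, stage_key):
    """Aggregate one stage's duration_sec across all runs."""
    durations = [
        stage["duration_sec"]
        for run in runs
        for stage in run["stages"]
        if stage["key"] == stage_key
    ]
    return {
        "stage": stage_key,
        "mean_sec": round(mean(durations), 6),
        "median_sec": round(median(durations), 6),
        "min_sec": round(min(durations), 6),
        "max_sec": round(max(durations), 6),
    }
```

For the `rock` run, `realtime_factor(2.394493, 4.75)` gives `0.504104`, matching the recorded value. In all three runs, onset detection and slicing accounts for roughly 1.8 to 1.9 s of the roughly 2.4 s total, so that stage dominates the `online_preview` path.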
|
docs/benchmark-subprocesses.json
CHANGED
|
@@ -1,138 +1,141 @@
|
|
| 1 |
{
|
|
|
|
| 2 |
"runs": [
|
| 3 |
{
|
| 4 |
"pattern": "rock",
|
| 5 |
-
"bars":
|
| 6 |
"bpm": 120.0,
|
| 7 |
"run_index": 0,
|
| 8 |
-
"
|
| 9 |
-
"
|
| 10 |
-
"
|
| 11 |
-
"
|
| 12 |
-
"
|
|
|
|
| 13 |
"stages": [
|
| 14 |
{
|
| 15 |
"key": "stem",
|
| 16 |
"label": "Stem extraction / source load",
|
| 17 |
-
"duration_sec": 0.
|
| 18 |
"status": "done",
|
| 19 |
-
"detail": "loaded full mix"
|
| 20 |
},
|
| 21 |
{
|
| 22 |
"key": "bpm",
|
| 23 |
"label": "Tempo detection",
|
| 24 |
-
"duration_sec": 0.
|
| 25 |
"status": "done",
|
| 26 |
"detail": "120.2 BPM"
|
| 27 |
},
|
| 28 |
{
|
| 29 |
"key": "onsets",
|
| 30 |
"label": "Onset detection + slicing",
|
| 31 |
-
"duration_sec": 1.
|
| 32 |
"status": "done",
|
| 33 |
-
"detail": "
|
| 34 |
},
|
| 35 |
{
|
| 36 |
"key": "classification",
|
| 37 |
"label": "Spectral rule classification",
|
| 38 |
-
"duration_sec": 0.
|
| 39 |
"status": "done",
|
| 40 |
-
"detail": "bright:
|
| 41 |
},
|
| 42 |
{
|
| 43 |
"key": "clustering",
|
| 44 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 45 |
-
"duration_sec": 0.
|
| 46 |
"status": "done",
|
| 47 |
-
"detail": "
|
| 48 |
},
|
| 49 |
{
|
| 50 |
"key": "selection",
|
| 51 |
"label": "Best representative scoring",
|
| 52 |
-
"duration_sec": 0.
|
| 53 |
"status": "done",
|
| 54 |
"detail": "quality-scored representatives"
|
| 55 |
},
|
| 56 |
{
|
| 57 |
"key": "synthesis",
|
| 58 |
"label": "Optional sample synthesis",
|
| 59 |
-
"duration_sec": 0.
|
| 60 |
"status": "done",
|
| 61 |
-
"detail": "
|
| 62 |
},
|
| 63 |
{
|
| 64 |
"key": "export",
|
| 65 |
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 66 |
-
"duration_sec": 0.
|
| 67 |
"status": "done",
|
| 68 |
-
"detail": "
|
| 69 |
}
|
| 70 |
]
|
| 71 |
},
|
| 72 |
{
|
| 73 |
"pattern": "funk",
|
| 74 |
-
"bars":
|
| 75 |
"bpm": 120.0,
|
| 76 |
"run_index": 0,
|
| 77 |
-
"
|
| 78 |
-
"
|
| 79 |
-
"
|
| 80 |
-
"
|
|
|
|
| 81 |
"cluster_count": 2,
|
| 82 |
"stages": [
|
| 83 |
{
|
| 84 |
"key": "stem",
|
| 85 |
"label": "Stem extraction / source load",
|
| 86 |
-
"duration_sec": 0.
|
| 87 |
"status": "done",
|
| 88 |
-
"detail": "loaded full mix"
|
| 89 |
},
|
| 90 |
{
|
| 91 |
"key": "bpm",
|
| 92 |
"label": "Tempo detection",
|
| 93 |
-
"duration_sec": 0.
|
| 94 |
"status": "done",
|
| 95 |
"detail": "161.5 BPM"
|
| 96 |
},
|
| 97 |
{
|
| 98 |
"key": "onsets",
|
| 99 |
"label": "Onset detection + slicing",
|
| 100 |
-
"duration_sec": 2.
|
| 101 |
"status": "done",
|
| 102 |
-
"detail": "
|
| 103 |
},
|
| 104 |
{
|
| 105 |
"key": "classification",
|
| 106 |
"label": "Spectral rule classification",
|
| 107 |
-
"duration_sec": 0.
|
| 108 |
"status": "done",
|
| 109 |
-
"detail": "bright:
|
| 110 |
},
|
| 111 |
{
|
| 112 |
"key": "clustering",
|
| 113 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 114 |
-
"duration_sec": 0.
|
| 115 |
"status": "done",
|
| 116 |
-
"detail": "2 clusters"
|
| 117 |
},
|
| 118 |
{
|
| 119 |
"key": "selection",
|
| 120 |
"label": "Best representative scoring",
|
| 121 |
-
"duration_sec": 0.
|
| 122 |
"status": "done",
|
| 123 |
"detail": "quality-scored representatives"
|
| 124 |
},
|
| 125 |
{
|
| 126 |
"key": "synthesis",
|
| 127 |
"label": "Optional sample synthesis",
|
| 128 |
-
"duration_sec": 0.
|
| 129 |
"status": "done",
|
| 130 |
"detail": "2 synthesized alternates"
|
| 131 |
},
|
| 132 |
{
|
| 133 |
"key": "export",
|
| 134 |
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 135 |
-
"duration_sec": 0.
|
| 136 |
"status": "done",
|
| 137 |
"detail": "2 WAVs + MIDI + ZIP"
|
| 138 |
}
|
|
@@ -140,337 +143,131 @@
|
|
| 140 |
},
|
| 141 |
{
|
| 142 |
"pattern": "halftime",
|
| 143 |
-
"bars":
|
| 144 |
"bpm": 120.0,
|
| 145 |
"run_index": 0,
|
| 146 |
-
"
|
| 147 |
-
"
|
| 148 |
-
"
|
| 149 |
-
"
|
| 150 |
-
"
|
| 151 |
-
"stages": [
|
| 152 |
-
{
|
| 153 |
-
"key": "stem",
|
| 154 |
-
"label": "Stem extraction / source load",
|
| 155 |
-
"duration_sec": 0.009298575000002529,
|
| 156 |
-
"status": "done",
|
| 157 |
-
"detail": "loaded full mix"
|
| 158 |
-
},
|
| 159 |
-
{
|
| 160 |
-
"key": "bpm",
|
| 161 |
-
"label": "Tempo detection",
|
| 162 |
-
"duration_sec": 0.21581650399997443,
|
| 163 |
-
"status": "done",
|
| 164 |
-
"detail": "120.2 BPM"
|
| 165 |
-
},
|
| 166 |
-
{
|
| 167 |
-
"key": "onsets",
|
| 168 |
-
"label": "Onset detection + slicing",
|
| 169 |
-
"duration_sec": 1.9768937550000487,
|
| 170 |
-
"status": "done",
|
| 171 |
-
"detail": "66 hits"
|
| 172 |
-
},
|
| 173 |
-
{
|
| 174 |
-
"key": "classification",
|
| 175 |
-
"label": "Spectral rule classification",
|
| 176 |
-
"duration_sec": 0.03783250899999757,
|
| 177 |
-
"status": "done",
|
| 178 |
-
"detail": "bright:11, cymbal:2, hihat_closed:48, hihat_open:5"
|
| 179 |
-
},
|
| 180 |
-
{
|
| 181 |
-
"key": "clustering",
|
| 182 |
-
"label": "Mel fingerprint + transient NCC clustering",
|
| 183 |
-
"duration_sec": 0.7498706449999872,
|
| 184 |
-
"status": "done",
|
| 185 |
-
"detail": "2 clusters"
|
| 186 |
-
},
|
| 187 |
-
{
|
| 188 |
-
"key": "selection",
|
| 189 |
-
"label": "Best representative scoring",
|
| 190 |
-
"duration_sec": 0.6169061510000233,
|
| 191 |
-
"status": "done",
|
| 192 |
-
"detail": "quality-scored representatives"
|
| 193 |
-
},
|
| 194 |
-
{
|
| 195 |
-
"key": "synthesis",
|
| 196 |
-
"label": "Optional sample synthesis",
|
| 197 |
-
"duration_sec": 0.0028750459999855593,
|
| 198 |
-
"status": "done",
|
| 199 |
-
"detail": "2 synthesized alternates"
|
| 200 |
-
},
|
| 201 |
-
{
|
| 202 |
-
"key": "export",
|
| 203 |
-
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 204 |
-
"duration_sec": 0.09185817900004167,
|
| 205 |
-
"status": "done",
|
| 206 |
-
"detail": "2 WAVs + MIDI + ZIP"
|
| 207 |
-
}
|
| 208 |
-
]
|
| 209 |
-
},
|
| 210 |
-
{
|
| 211 |
-
"pattern": "rock",
|
| 212 |
-
"bars": 4,
|
| 213 |
-
"bpm": 120.0,
|
| 214 |
-
"run_index": 1,
|
| 215 |
-
"audio_duration_sec": 8.75,
|
| 216 |
-
"total_duration_sec": 2.848686,
|
| 217 |
-
"realtime_factor": 0.325564,
|
| 218 |
-
"hit_count": 24,
|
| 219 |
-
"cluster_count": 1,
|
| 220 |
-
"stages": [
|
| 221 |
-
{
|
| 222 |
-
"key": "stem",
|
| 223 |
-
"label": "Stem extraction / source load",
|
| 224 |
-
"duration_sec": 0.03869248300003392,
|
| 225 |
-
"status": "done",
|
| 226 |
-
"detail": "loaded full mix"
|
| 227 |
-
},
|
| 228 |
-
{
|
| 229 |
-
"key": "bpm",
|
| 230 |
-
"label": "Tempo detection",
|
| 231 |
-
"duration_sec": 0.24107510999999704,
|
| 232 |
-
"status": "done",
|
| 233 |
-
"detail": "120.2 BPM"
|
| 234 |
-
},
|
| 235 |
-
{
|
| 236 |
-
"key": "onsets",
|
| 237 |
-
"label": "Onset detection + slicing",
|
| 238 |
-
"duration_sec": 2.0721967459999746,
|
| 239 |
-
"status": "done",
|
| 240 |
-
"detail": "24 hits"
|
| 241 |
-
},
|
| 242 |
-
{
|
| 243 |
-
"key": "classification",
|
| 244 |
-
"label": "Spectral rule classification",
|
| 245 |
-
"duration_sec": 0.024016725000024053,
|
| 246 |
-
"status": "done",
|
| 247 |
-
"detail": "bright:7, hihat_closed:2, hihat_open:15"
|
| 248 |
-
},
|
| 249 |
-
{
|
| 250 |
-
"key": "clustering",
|
| 251 |
-
"label": "Mel fingerprint + transient NCC clustering",
|
| 252 |
-
"duration_sec": 0.05910233800000242,
|
| 253 |
-
"status": "done",
|
| 254 |
-
"detail": "1 clusters"
|
| 255 |
-
},
|
| 256 |
-
{
|
| 257 |
-
"key": "selection",
|
| 258 |
-
"label": "Best representative scoring",
|
| 259 |
-
"duration_sec": 0.3106304350000073,
|
| 260 |
-
"status": "done",
|
| 261 |
-
"detail": "quality-scored representatives"
|
| 262 |
-
},
|
| 263 |
-
{
|
| 264 |
-
"key": "synthesis",
|
| 265 |
-
"label": "Optional sample synthesis",
|
| 266 |
-
"duration_sec": 0.0015013799999792354,
|
| 267 |
-
"status": "done",
|
| 268 |
-
"detail": "1 synthesized alternates"
|
| 269 |
-
},
|
| 270 |
-
{
|
| 271 |
-
"key": "export",
|
| 272 |
-
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 273 |
-
"duration_sec": 0.10095534999999245,
|
| 274 |
-
"status": "done",
|
| 275 |
-
"detail": "1 WAVs + MIDI + ZIP"
|
| 276 |
-
}
|
| 277 |
-
]
|
| 278 |
-
},
|
| 279 |
-
{
|
| 280 |
-
"pattern": "funk",
|
| 281 |
-
"bars": 4,
|
| 282 |
-
"bpm": 120.0,
|
| 283 |
-
"run_index": 1,
|
| 284 |
-
"audio_duration_sec": 8.874989,
|
| 285 |
-
"total_duration_sec": 3.416797,
|
| 286 |
-
"realtime_factor": 0.384992,
|
| 287 |
-
"hit_count": 52,
|
| 288 |
"cluster_count": 3,
|
| 289 |
"stages": [
|
| 290 |
{
|
| 291 |
"key": "stem",
|
| 292 |
"label": "Stem extraction / source load",
|
| 293 |
-
"duration_sec": 0.
|
| 294 |
"status": "done",
|
| 295 |
-
"detail": "loaded full mix"
|
| 296 |
},
|
| 297 |
{
|
| 298 |
"key": "bpm",
|
| 299 |
"label": "Tempo detection",
|
| 300 |
-
"duration_sec": 0.
|
| 301 |
"status": "done",
|
| 302 |
"detail": "120.2 BPM"
|
| 303 |
},
|
| 304 |
{
|
| 305 |
"key": "onsets",
|
| 306 |
"label": "Onset detection + slicing",
|
| 307 |
-
"duration_sec": 1.
|
| 308 |
"status": "done",
|
| 309 |
-
"detail": "
|
| 310 |
},
|
| 311 |
{
|
| 312 |
"key": "classification",
|
| 313 |
"label": "Spectral rule classification",
|
| 314 |
-
"duration_sec": 0.
|
| 315 |
"status": "done",
|
| 316 |
-
"detail": "bright:
|
| 317 |
},
|
| 318 |
{
|
| 319 |
"key": "clustering",
|
| 320 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 321 |
-
"duration_sec": 0.
|
| 322 |
"status": "done",
|
| 323 |
-
"detail": "3 clusters"
|
| 324 |
},
|
| 325 |
{
|
| 326 |
"key": "selection",
|
| 327 |
"label": "Best representative scoring",
|
| 328 |
-
"duration_sec": 0.
|
| 329 |
"status": "done",
|
| 330 |
"detail": "quality-scored representatives"
|
| 331 |
},
|
| 332 |
{
|
| 333 |
"key": "synthesis",
|
| 334 |
"label": "Optional sample synthesis",
|
| 335 |
-
"duration_sec": 0.
|
| 336 |
"status": "done",
|
| 337 |
"detail": "3 synthesized alternates"
|
| 338 |
},
|
| 339 |
{
|
| 340 |
"key": "export",
|
| 341 |
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 342 |
-
"duration_sec": 0.
|
| 343 |
"status": "done",
|
| 344 |
"detail": "3 WAVs + MIDI + ZIP"
|
| 345 |
}
|
| 346 |
]
|
| 347 |
-
},
|
| 348 |
-
{
|
| 349 |
-
"pattern": "halftime",
|
| 350 |
-
"bars": 4,
|
| 351 |
-
"bpm": 120.0,
|
| 352 |
-
"run_index": 1,
|
| 353 |
-
"audio_duration_sec": 8.874989,
|
| 354 |
-
"total_duration_sec": 4.750472,
|
| 355 |
-
"realtime_factor": 0.535265,
|
| 356 |
-
"hit_count": 64,
|
| 357 |
-
"cluster_count": 1,
|
| 358 |
-
"stages": [
|
| 359 |
-
{
|
| 360 |
-
"key": "stem",
|
| 361 |
-
"label": "Stem extraction / source load",
|
| 362 |
-
"duration_sec": 0.016472632999978032,
|
| 363 |
-
"status": "done",
|
| 364 |
-
"detail": "loaded full mix"
|
| 365 |
-
},
|
| 366 |
-
{
|
| 367 |
-
"key": "bpm",
|
| 368 |
-
"label": "Tempo detection",
|
| 369 |
-
"duration_sec": 0.2141354419999857,
|
| 370 |
-
"status": "done",
|
| 371 |
-
"detail": "120.2 BPM"
|
| 372 |
-
},
|
| 373 |
-
{
|
| 374 |
-
"key": "onsets",
|
| 375 |
-
"label": "Onset detection + slicing",
|
| 376 |
-
"duration_sec": 2.8706004370000073,
|
| 377 |
-
"status": "done",
|
| 378 |
-
"detail": "64 hits"
|
| 379 |
-
},
|
| 380 |
-
{
|
| 381 |
-
"key": "classification",
|
| 382 |
-
"label": "Spectral rule classification",
|
| 383 |
-
"duration_sec": 0.036172296999950504,
|
| 384 |
-
"status": "done",
|
| 385 |
-
"detail": "bright:11, cymbal:2, hihat_closed:45, hihat_open:4, mid:2"
|
| 386 |
-
},
|
| 387 |
-
{
|
| 388 |
-
"key": "clustering",
|
| 389 |
-
"label": "Mel fingerprint + transient NCC clustering",
|
| 390 |
-
"duration_sec": 0.9130003360000387,
|
| 391 |
-
"status": "done",
|
| 392 |
-
"detail": "1 clusters"
|
| 393 |
-
},
|
| 394 |
-
{
|
| 395 |
-
"key": "selection",
|
| 396 |
-
"label": "Best representative scoring",
|
| 397 |
-
"duration_sec": 0.6508792970000172,
|
| 398 |
-
"status": "done",
|
| 399 |
-
"detail": "quality-scored representatives"
|
| 400 |
-
},
|
| 401 |
-
{
|
| 402 |
-
"key": "synthesis",
|
| 403 |
-
"label": "Optional sample synthesis",
|
| 404 |
-
"duration_sec": 0.0025003810000043813,
|
| 405 |
-
"status": "done",
|
| 406 |
-
"detail": "1 synthesized alternates"
|
| 407 |
-
},
|
| 408 |
-
{
|
| 409 |
-
"key": "export",
|
| 410 |
-
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 411 |
-
"duration_sec": 0.04621197200003735,
|
| 412 |
-
"status": "done",
|
| 413 |
-
"detail": "1 WAVs + MIDI + ZIP"
|
| 414 |
-
}
|
| 415 |
-
]
|
| 416 |
}
|
| 417 |
],
|
| 418 |
"summary": [
|
| 419 |
{
|
| 420 |
"stage": "stem",
|
| 421 |
-
"mean_sec": 0.
|
| 422 |
-
"median_sec": 0.
|
| 423 |
-
"min_sec": 0.
|
| 424 |
-
"max_sec": 0.
|
| 425 |
},
|
| 426 |
{
|
| 427 |
"stage": "bpm",
|
| 428 |
-
"mean_sec": 0.
|
| 429 |
-
"median_sec": 0.
|
| 430 |
-
"min_sec": 0.
|
| 431 |
-
"max_sec": 0.
|
| 432 |
},
|
| 433 |
{
|
| 434 |
"stage": "onsets",
|
| 435 |
-
"mean_sec":
|
| 436 |
-
"median_sec":
|
| 437 |
-
"min_sec": 1.
|
| 438 |
-
"max_sec": 2.
|
| 439 |
},
|
| 440 |
{
|
| 441 |
"stage": "classification",
|
| 442 |
-
"mean_sec": 0.
|
| 443 |
-
"median_sec": 0.
|
| 444 |
-
"min_sec": 0.
|
| 445 |
-
"max_sec": 0.
|
| 446 |
},
|
| 447 |
{
|
| 448 |
"stage": "clustering",
|
| 449 |
-
"mean_sec": 0.
|
| 450 |
-
"median_sec": 0.
|
| 451 |
-
"min_sec": 0.
|
| 452 |
-
"max_sec": 0.
|
| 453 |
},
|
| 454 |
{
|
| 455 |
"stage": "selection",
|
| 456 |
-
"mean_sec": 0.
|
| 457 |
-
"median_sec": 0.
|
| 458 |
-
"min_sec": 0.
|
| 459 |
-
"max_sec": 0.
|
| 460 |
},
|
| 461 |
{
|
| 462 |
"stage": "synthesis",
|
| 463 |
-
"mean_sec": 0.
|
| 464 |
-
"median_sec": 0.
|
| 465 |
-
"min_sec": 0.
|
| 466 |
-
"max_sec": 0.
|
| 467 |
},
|
| 468 |
{
|
| 469 |
"stage": "export",
|
| 470 |
-
"mean_sec": 0.
|
| 471 |
-
"median_sec": 0.
|
| 472 |
-
"min_sec": 0.
|
| 473 |
-
"max_sec": 0.
|
| 474 |
}
|
| 475 |
]
|
| 476 |
}
|
|
|
|
| 1 |
{
|
| 2 |
+
"clustering_mode": "batch_quality",
|
| 3 |
"runs": [
|
| 4 |
{
|
| 5 |
"pattern": "rock",
|
| 6 |
+
"bars": 2,
|
| 7 |
"bpm": 120.0,
|
| 8 |
"run_index": 0,
|
| 9 |
+
"clustering_mode": "batch_quality",
|
| 10 |
+
"audio_duration_sec": 4.75,
|
| 11 |
+
"total_duration_sec": 2.416794,
|
| 12 |
+
"realtime_factor": 0.508799,
|
| 13 |
+
"hit_count": 14,
|
| 14 |
+
"cluster_count": 7,
|
| 15 |
"stages": [
|
| 16 |
{
|
| 17 |
"key": "stem",
|
| 18 |
"label": "Stem extraction / source load",
|
| 19 |
+
"duration_sec": 0.011517213000161064,
|
| 20 |
"status": "done",
|
| 21 |
+
"detail": "loaded full mix \u00b7 cached"
|
| 22 |
},
|
| 23 |
{
|
| 24 |
"key": "bpm",
|
| 25 |
"label": "Tempo detection",
|
| 26 |
+
"duration_sec": 0.19438482000009571,
|
| 27 |
"status": "done",
|
| 28 |
"detail": "120.2 BPM"
|
| 29 |
},
|
| 30 |
{
|
| 31 |
"key": "onsets",
|
| 32 |
"label": "Onset detection + slicing",
|
| 33 |
+
"duration_sec": 1.8062190609998652,
|
| 34 |
"status": "done",
|
| 35 |
+
"detail": "14 hits"
|
| 36 |
},
|
| 37 |
{
|
| 38 |
"key": "classification",
|
| 39 |
"label": "Spectral rule classification",
|
| 40 |
+
"duration_sec": 0.016392102000054365,
|
| 41 |
"status": "done",
|
| 42 |
+
"detail": "bright:5, hihat_closed:1, hihat_open:7, kick:1"
|
| 43 |
},
|
| 44 |
{
|
| 45 |
"key": "clustering",
|
| 46 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 47 |
+
"duration_sec": 0.07352871200009758,
|
| 48 |
"status": "done",
|
| 49 |
+
"detail": "7 clusters \u00b7 batch quality"
|
| 50 |
},
|
| 51 |
{
|
| 52 |
"key": "selection",
|
| 53 |
"label": "Best representative scoring",
|
| 54 |
+
"duration_sec": 0.096273950000068,
|
| 55 |
"status": "done",
|
| 56 |
"detail": "quality-scored representatives"
|
| 57 |
},
|
| 58 |
{
|
| 59 |
"key": "synthesis",
|
| 60 |
"label": "Optional sample synthesis",
|
| 61 |
+
"duration_sec": 0.0006992359999458131,
|
| 62 |
"status": "done",
|
| 63 |
+
"detail": "2 synthesized alternates"
|
| 64 |
},
|
| 65 |
{
|
| 66 |
"key": "export",
|
| 67 |
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 68 |
+
"duration_sec": 0.2172303219999776,
|
| 69 |
"status": "done",
|
| 70 |
+
"detail": "7 WAVs + MIDI + ZIP"
|
| 71 |
}
|
| 72 |
]
|
| 73 |
},
|
| 74 |
{
|
| 75 |
"pattern": "funk",
|
| 76 |
+
"bars": 2,
|
| 77 |
"bpm": 120.0,
|
| 78 |
"run_index": 0,
|
| 79 |
+
"clustering_mode": "batch_quality",
|
| 80 |
+
"audio_duration_sec": 4.874989,
|
| 81 |
+
"total_duration_sec": 2.99188,
|
| 82 |
+
"realtime_factor": 0.61372,
|
| 83 |
+
"hit_count": 35,
|
| 84 |
"cluster_count": 2,
|
| 85 |
"stages": [
|
| 86 |
{
|
| 87 |
"key": "stem",
|
| 88 |
"label": "Stem extraction / source load",
|
| 89 |
+
"duration_sec": 0.010077079999973648,
|
| 90 |
"status": "done",
|
| 91 |
+
"detail": "loaded full mix \u00b7 cached"
|
| 92 |
},
|
| 93 |
{
|
| 94 |
"key": "bpm",
|
| 95 |
"label": "Tempo detection",
|
| 96 |
+
"duration_sec": 0.17334403699987888,
|
| 97 |
"status": "done",
|
| 98 |
"detail": "161.5 BPM"
|
| 99 |
},
|
| 100 |
{
|
| 101 |
"key": "onsets",
|
| 102 |
"label": "Onset detection + slicing",
|
| 103 |
+
"duration_sec": 2.1082552409998243,
|
| 104 |
"status": "done",
|
| 105 |
+
"detail": "35 hits"
|
| 106 |
},
|
| 107 |
{
|
| 108 |
"key": "classification",
|
| 109 |
"label": "Spectral rule classification",
|
| 110 |
+
"duration_sec": 0.021269321000090713,
|
| 111 |
"status": "done",
|
| 112 |
+
"detail": "bright:14, cymbal:1, hihat_closed:14, hihat_open:3, kick:1, mid:2"
|
| 113 |
},
|
| 114 |
{
|
| 115 |
"key": "clustering",
|
| 116 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 117 |
+
"duration_sec": 0.26927052900009585,
|
| 118 |
"status": "done",
|
| 119 |
+
"detail": "2 clusters \u00b7 batch quality"
|
| 120 |
},
|
| 121 |
{
|
| 122 |
"key": "selection",
|
| 123 |
"label": "Best representative scoring",
|
| 124 |
+
"duration_sec": 0.31629775500005053,
|
| 125 |
"status": "done",
|
| 126 |
"detail": "quality-scored representatives"
|
| 127 |
},
|
| 128 |
{
|
| 129 |
"key": "synthesis",
|
| 130 |
"label": "Optional sample synthesis",
|
| 131 |
+
"duration_sec": 0.0011716779999915161,
|
| 132 |
"status": "done",
|
| 133 |
"detail": "2 synthesized alternates"
|
| 134 |
},
|
| 135 |
{
|
| 136 |
"key": "export",
|
| 137 |
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 138 |
+
"duration_sec": 0.09167172899992693,
|
| 139 |
"status": "done",
|
| 140 |
"detail": "2 WAVs + MIDI + ZIP"
|
| 141 |
}
|
|
|
|
| 143 |
},
|
| 144 |
{
|
| 145 |
"pattern": "halftime",
|
| 146 |
+
"bars": 2,
|
| 147 |
"bpm": 120.0,
|
| 148 |
"run_index": 0,
|
| 149 |
+
"clustering_mode": "batch_quality",
|
| 150 |
+
"audio_duration_sec": 4.874989,
|
| 151 |
+
"total_duration_sec": 2.597859,
|
| 152 |
+
"realtime_factor": 0.532895,
|
| 153 |
+
"hit_count": 23,
|
|
| 154 |
"cluster_count": 3,
|
| 155 |
"stages": [
|
| 156 |
{
|
| 157 |
"key": "stem",
|
| 158 |
"label": "Stem extraction / source load",
|
| 159 |
+
"duration_sec": 0.012474630000042453,
|
| 160 |
"status": "done",
|
| 161 |
+
"detail": "loaded full mix \u00b7 cached"
|
| 162 |
},
|
| 163 |
{
|
| 164 |
"key": "bpm",
|
| 165 |
"label": "Tempo detection",
|
| 166 |
+
"duration_sec": 0.18858063699985905,
|
| 167 |
"status": "done",
|
| 168 |
"detail": "120.2 BPM"
|
| 169 |
},
|
| 170 |
{
|
| 171 |
"key": "onsets",
|
| 172 |
"label": "Onset detection + slicing",
|
| 173 |
+
"duration_sec": 1.9154837959999895,
|
| 174 |
"status": "done",
|
| 175 |
+
"detail": "23 hits"
|
| 176 |
},
|
| 177 |
{
|
| 178 |
"key": "classification",
|
| 179 |
"label": "Spectral rule classification",
|
| 180 |
+
"duration_sec": 0.0188920179998604,
|
| 181 |
"status": "done",
|
| 182 |
+
"detail": "bright:3, hihat_closed:17, hihat_open:3"
|
| 183 |
},
|
| 184 |
{
|
| 185 |
"key": "clustering",
|
| 186 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 187 |
+
"duration_sec": 0.10195718500017392,
|
| 188 |
"status": "done",
|
| 189 |
+
"detail": "3 clusters \u00b7 batch quality"
|
| 190 |
},
|
| 191 |
{
|
| 192 |
"key": "selection",
|
| 193 |
"label": "Best representative scoring",
|
| 194 |
+
"duration_sec": 0.19837312200002089,
|
| 195 |
"status": "done",
|
| 196 |
"detail": "quality-scored representatives"
|
| 197 |
},
|
| 198 |
{
|
| 199 |
"key": "synthesis",
|
| 200 |
"label": "Optional sample synthesis",
|
| 201 |
+
"duration_sec": 0.0011928339999940363,
|
| 202 |
"status": "done",
|
| 203 |
"detail": "3 synthesized alternates"
|
| 204 |
},
|
| 205 |
{
|
| 206 |
"key": "export",
|
| 207 |
"label": "MIDI, reconstruction, WAV, ZIP export",
|
| 208 |
+
"duration_sec": 0.1603816869999264,
|
| 209 |
"status": "done",
|
| 210 |
"detail": "3 WAVs + MIDI + ZIP"
|
| 211 |
}
|
| 212 |
]
|
| 213 |
}
|
| 214 |
],
|
| 215 |
"summary": [
|
| 216 |
{
|
| 217 |
"stage": "stem",
|
| 218 |
+
"mean_sec": 0.011356,
|
| 219 |
+
"median_sec": 0.011517,
|
| 220 |
+
"min_sec": 0.010077,
|
| 221 |
+
"max_sec": 0.012475
|
| 222 |
},
|
| 223 |
{
|
| 224 |
"stage": "bpm",
|
| 225 |
+
"mean_sec": 0.185436,
|
| 226 |
+
"median_sec": 0.188581,
|
| 227 |
+
"min_sec": 0.173344,
|
| 228 |
+
"max_sec": 0.194385
|
| 229 |
},
|
| 230 |
{
|
| 231 |
"stage": "onsets",
|
| 232 |
+
"mean_sec": 1.943319,
|
| 233 |
+
"median_sec": 1.915484,
|
| 234 |
+
"min_sec": 1.806219,
|
| 235 |
+
"max_sec": 2.108255
|
| 236 |
},
|
| 237 |
{
|
| 238 |
"stage": "classification",
|
| 239 |
+
"mean_sec": 0.018851,
|
| 240 |
+
"median_sec": 0.018892,
|
| 241 |
+
"min_sec": 0.016392,
|
| 242 |
+
"max_sec": 0.021269
|
| 243 |
},
|
| 244 |
{
|
| 245 |
"stage": "clustering",
|
| 246 |
+
"mean_sec": 0.148252,
|
| 247 |
+
"median_sec": 0.101957,
|
| 248 |
+
"min_sec": 0.073529,
|
| 249 |
+
"max_sec": 0.269271
|
| 250 |
},
|
| 251 |
{
|
| 252 |
"stage": "selection",
|
| 253 |
+
"mean_sec": 0.203648,
|
| 254 |
+
"median_sec": 0.198373,
|
| 255 |
+
"min_sec": 0.096274,
|
| 256 |
+
"max_sec": 0.316298
|
| 257 |
},
|
| 258 |
{
|
| 259 |
"stage": "synthesis",
|
| 260 |
+
"mean_sec": 0.001021,
|
| 261 |
+
"median_sec": 0.001172,
|
| 262 |
+
"min_sec": 0.000699,
|
| 263 |
+
"max_sec": 0.001193
|
| 264 |
},
|
| 265 |
{
|
| 266 |
"stage": "export",
|
| 267 |
+
"mean_sec": 0.156428,
|
| 268 |
+
"median_sec": 0.160382,
|
| 269 |
+
"min_sec": 0.091672,
|
| 270 |
+
"max_sec": 0.21723
|
| 271 |
}
|
| 272 |
]
|
| 273 |
}
|
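The `realtime_factor` values in the benchmark JSON above are consistent with `total_duration_sec / audio_duration_sec`, so a value below 1.0 means the pipeline finished faster than the audio plays. The sketch below reproduces that arithmetic from the three runs shown; the helper name `realtime_factor` and the six-decimal rounding are assumptions for illustration, not the project's code.

```python
def realtime_factor(total_duration_sec: float, audio_duration_sec: float) -> float:
    """Processing time divided by audio length; guards against empty audio."""
    if audio_duration_sec <= 0:
        return 0.0
    return round(total_duration_sec / audio_duration_sec, 6)


# (pattern, total_duration_sec, audio_duration_sec) copied from the runs above.
runs = [
    ("rock", 2.416794, 4.75),
    ("funk", 2.99188, 4.874989),
    ("halftime", 2.597859, 4.874989),
]

factors = {name: realtime_factor(total, audio) for name, total, audio in runs}
for name, value in factors.items():
    print(f"{name}: {value}")
```

All three runs land between roughly 0.5 and 0.6, with onset detection accounting for most of the time in each case.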
pipeline_runner.py
CHANGED
|
@@ -3,6 +3,7 @@
|
|
| 3 |
|
| 4 |
from __future__ import annotations
|
| 5 |
|
|
|
|
| 6 |
import json
|
| 7 |
import os
|
| 8 |
import shutil
|
|
@@ -23,6 +24,7 @@ from sample_extractor import (
|
|
| 23 |
build_archive,
|
| 24 |
classify_hits,
|
| 25 |
cluster_hits,
|
|
|
|
| 26 |
detect_bpm,
|
| 27 |
detect_onsets,
|
| 28 |
export_midi,
|
|
@@ -53,12 +55,14 @@ class PipelineParams:
|
|
| 53 |
attack_ms: float = 25.0
|
| 54 |
mel_threshold: float = 0.75
|
| 55 |
linkage: str = "average"
|
|
|
|
| 56 |
target_min: int = 5
|
| 57 |
target_max: int = 20
|
| 58 |
synthesize: bool = True
|
| 59 |
quantize_midi: bool = True
|
| 60 |
subdivision: int = 16
|
| 61 |
device: str = "cpu"
|
|
|
|
| 62 |
|
| 63 |
@classmethod
|
| 64 |
def from_mapping(cls, data: dict[str, Any] | None) -> "PipelineParams":
|
|
@@ -81,6 +85,8 @@ class PipelineParams:
|
|
| 81 |
raise ValueError(f"Unsupported onset mode: {self.onset_mode}")
|
| 82 |
if self.linkage not in {"average", "complete", "single"}:
|
| 83 |
raise ValueError(f"Unsupported clustering linkage: {self.linkage}")
|
| 84 |
if not 0 <= self.demucs_shifts <= 8:
|
| 85 |
raise ValueError("demucs_shifts must be between 0 and 8")
|
| 86 |
if not 0.0 <= self.demucs_overlap <= 0.9:
|
|
@@ -185,11 +191,66 @@ def _normalise_audio(audio: np.ndarray) -> np.ndarray:
|
|
| 185 |
return audio.astype(np.float32)
|
| 186 |
|
| 187 |
|
| 188 |
def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_24") -> None:
|
| 189 |
path.parent.mkdir(parents=True, exist_ok=True)
|
| 190 |
sf.write(path, audio.astype(np.float32), sr, subtype=subtype)
|
| 191 |
|
| 192 |
|
| 193 |
def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int = 1600) -> dict[str, Any]:
|
| 194 |
if len(audio) == 0:
|
| 195 |
return {"sample_rate": sr, "duration_sec": 0, "envelope": [], "onsets": []}
|
|
@@ -250,16 +311,9 @@ def run_extraction_pipeline(
|
|
| 250 |
_notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
|
| 251 |
|
| 252 |
with _timed_stage(stages, "stem", progress_cb) as stage:
|
| 253 |
-
stem_audio, stem_sr =
|
| 254 |
-
str(audio_path),
|
| 255 |
-
stem=params.stem,
|
| 256 |
-
device=params.device,
|
| 257 |
-
model_name=params.demucs_model,
|
| 258 |
-
shifts=int(params.demucs_shifts),
|
| 259 |
-
overlap=float(params.demucs_overlap),
|
| 260 |
-
)
|
| 261 |
stem_audio = _normalise_audio(stem_audio)
|
| 262 |
-
stage.detail =
|
| 263 |
_write_audio(out / "stem.wav", stem_audio, stem_sr, subtype="PCM_16")
|
| 264 |
|
| 265 |
audio_duration_sec = len(stem_audio) / stem_sr if stem_sr else 0.0
|
|
@@ -291,21 +345,34 @@ def run_extraction_pipeline(
|
|
| 291 |
stage.detail = ", ".join(f"{key}:{value}" for key, value in sorted(counts.items()))
|
| 292 |
|
| 293 |
with _timed_stage(stages, "clustering", progress_cb) as stage:
|
| 294 |
-
|
| 295 |
-
|
| 296 |
-
|
| 297 |
-
|
| 298 |
-
|
| 299 |
-
|
| 300 |
-
|
| 301 |
-
|
| 302 |
-
|
| 303 |
-
|
| 304 |
-
|
| 305 |
for cluster in clusters:
|
| 306 |
for hit in cluster.hits:
|
| 307 |
hit.cluster_id = cluster.cluster_id
|
| 308 |
-
stage.detail = f"{len(clusters)} clusters"
|
| 309 |
|
| 310 |
with _timed_stage(stages, "selection", progress_cb) as stage:
|
| 311 |
select_best(clusters)
|
|
|
|
| 3 |
|
| 4 |
from __future__ import annotations
|
| 5 |
|
| 6 |
+
import hashlib
|
| 7 |
import json
|
| 8 |
import os
|
| 9 |
import shutil
|
|
|
|
| 24 |
build_archive,
|
| 25 |
classify_hits,
|
| 26 |
cluster_hits,
|
| 27 |
+
cluster_hits_online,
|
| 28 |
detect_bpm,
|
| 29 |
detect_onsets,
|
| 30 |
export_midi,
|
|
|
|
| 55 |
attack_ms: float = 25.0
|
| 56 |
mel_threshold: float = 0.75
|
| 57 |
linkage: str = "average"
|
| 58 |
+
clustering_mode: str = "batch_quality"
|
| 59 |
target_min: int = 5
|
| 60 |
target_max: int = 20
|
| 61 |
synthesize: bool = True
|
| 62 |
quantize_midi: bool = True
|
| 63 |
subdivision: int = 16
|
| 64 |
device: str = "cpu"
|
| 65 |
+
use_disk_cache: bool = True
|
| 66 |
|
| 67 |
@classmethod
|
| 68 |
def from_mapping(cls, data: dict[str, Any] | None) -> "PipelineParams":
|
|
|
|
| 85 |
raise ValueError(f"Unsupported onset mode: {self.onset_mode}")
|
| 86 |
if self.linkage not in {"average", "complete", "single"}:
|
| 87 |
raise ValueError(f"Unsupported clustering linkage: {self.linkage}")
|
| 88 |
+
if self.clustering_mode not in {"batch_quality", "online_preview"}:
|
| 89 |
+
raise ValueError(f"Unsupported clustering mode: {self.clustering_mode}")
|
| 90 |
if not 0 <= self.demucs_shifts <= 8:
|
| 91 |
raise ValueError("demucs_shifts must be between 0 and 8")
|
| 92 |
if not 0.0 <= self.demucs_overlap <= 0.9:
|
|
|
|
| 191 |
return audio.astype(np.float32)
|
| 192 |
|
| 193 |
|
| 194 |
+
MODULE_ROOT = Path(__file__).resolve().parent
|
| 195 |
+
CACHE_DIR = Path(os.environ["DSE_CACHE_DIR"]) if os.environ.get("DSE_CACHE_DIR") else MODULE_ROOT / ".cache"
|
| 196 |
+
STEM_CACHE_DIR = CACHE_DIR / "stems"
|
| 197 |
+
CACHE_VERSION = "dse-cache-v2"
|
| 198 |
+
|
| 199 |
+
|
| 200 |
def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_24") -> None:
|
| 201 |
path.parent.mkdir(parents=True, exist_ok=True)
|
| 202 |
sf.write(path, audio.astype(np.float32), sr, subtype=subtype)
|
| 203 |
|
| 204 |
|
| 205 |
+
def _sha256_file(path: str | os.PathLike[str]) -> str:
|
| 206 |
+
h = hashlib.sha256()
|
| 207 |
+
with Path(path).open("rb") as handle:
|
| 208 |
+
for chunk in iter(lambda: handle.read(1024 * 1024), b""):
|
| 209 |
+
h.update(chunk)
|
| 210 |
+
return h.hexdigest()
|
| 211 |
+
|
| 212 |
+
|
| 213 |
+
def _stem_cache_path(audio_path: str | os.PathLike[str], params: PipelineParams) -> Path:
|
| 214 |
+
key_payload = {
|
| 215 |
+
"version": CACHE_VERSION,
|
| 216 |
+
"source_sha256": _sha256_file(audio_path),
|
| 217 |
+
"stem": params.stem,
|
| 218 |
+
"demucs_model": params.demucs_model,
|
| 219 |
+
"demucs_shifts": params.demucs_shifts,
|
| 220 |
+
"demucs_overlap": params.demucs_overlap,
|
| 221 |
+
"device": params.device if params.stem != "all" else "decode",
|
| 222 |
+
}
|
| 223 |
+
key = hashlib.sha256(json.dumps(key_payload, sort_keys=True).encode("utf-8")).hexdigest()
|
| 224 |
+
return STEM_CACHE_DIR / f"{key}.wav"
|
| 225 |
+
|
| 226 |
+
|
| 227 |
+
def clear_disk_cache() -> None:
|
| 228 |
+
if CACHE_DIR.exists():
|
| 229 |
+
shutil.rmtree(CACHE_DIR)
|
| 230 |
+
|
| 231 |
+
|
| 232 |
+
def _load_or_extract_stem(audio_path: str | os.PathLike[str], params: PipelineParams) -> tuple[np.ndarray, int, str]:
|
| 233 |
+
if params.use_disk_cache:
|
| 234 |
+
cache_path = _stem_cache_path(audio_path, params)
|
| 235 |
+
if cache_path.exists():
|
| 236 |
+
audio, sr = sf.read(cache_path, dtype="float32", always_2d=False)
|
| 237 |
+
return np.asarray(audio, dtype=np.float32), int(sr), f"{params.stem} disk-cache hit"
|
| 238 |
+
audio, sr = extract_stem(
|
| 239 |
+
str(audio_path),
|
| 240 |
+
stem=params.stem,
|
| 241 |
+
device=params.device,
|
| 242 |
+
model_name=params.demucs_model,
|
| 243 |
+
shifts=int(params.demucs_shifts),
|
| 244 |
+
overlap=float(params.demucs_overlap),
|
| 245 |
+
)
|
| 246 |
+
detail = f"{params.stem} via {params.demucs_model}" if params.stem != "all" else "loaded full mix"
|
| 247 |
+
if params.use_disk_cache:
|
| 248 |
+
cache_path = _stem_cache_path(audio_path, params)
|
| 249 |
+
_write_audio(cache_path, audio, sr, subtype="PCM_16")
|
| 250 |
+
detail += " · cached"
|
| 251 |
+
return audio, sr, detail
|
| 252 |
+
|
| 253 |
+
|
| 254 |
def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int = 1600) -> dict[str, Any]:
|
| 255 |
if len(audio) == 0:
|
| 256 |
return {"sample_rate": sr, "duration_sec": 0, "envelope": [], "onsets": []}
|
|
|
|
| 311 |
_notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
|
| 312 |
|
| 313 |
with _timed_stage(stages, "stem", progress_cb) as stage:
|
| 314 |
+
stem_audio, stem_sr, stem_detail = _load_or_extract_stem(audio_path, params)
|
| 315 |
stem_audio = _normalise_audio(stem_audio)
|
| 316 |
+
stage.detail = stem_detail
|
| 317 |
_write_audio(out / "stem.wav", stem_audio, stem_sr, subtype="PCM_16")
|
| 318 |
|
| 319 |
audio_duration_sec = len(stem_audio) / stem_sr if stem_sr else 0.0
|
|
|
|
| 345 |
stage.detail = ", ".join(f"{key}:{value}" for key, value in sorted(counts.items()))
|
| 346 |
|
| 347 |
with _timed_stage(stages, "clustering", progress_cb) as stage:
|
| 348 |
+
if params.clustering_mode == "online_preview":
|
| 349 |
+
clusters = cluster_hits_online(
|
| 350 |
+
hits,
|
| 351 |
+
audio=stem_audio,
|
| 352 |
+
sr=stem_sr,
|
| 353 |
+
ncc_threshold=float(params.ncc_threshold),
|
| 354 |
+
attack_ms=float(params.attack_ms),
|
| 355 |
+
mel_threshold=float(params.mel_threshold),
|
| 356 |
+
target_min=int(params.target_min),
|
| 357 |
+
target_max=int(params.target_max),
|
| 358 |
+
)
|
| 359 |
+
stage.detail = f"{len(clusters)} clusters · online preview"
|
| 360 |
+
else:
|
| 361 |
+
clusters = cluster_hits(
|
| 362 |
+
hits,
|
| 363 |
+
audio=stem_audio,
|
| 364 |
+
sr=stem_sr,
|
| 365 |
+
ncc_threshold=float(params.ncc_threshold),
|
| 366 |
+
attack_ms=float(params.attack_ms),
|
| 367 |
+
mel_threshold=float(params.mel_threshold),
|
| 368 |
+
target_min=int(params.target_min),
|
| 369 |
+
target_max=int(params.target_max),
|
| 370 |
+
linkage=params.linkage,
|
| 371 |
+
)
|
| 372 |
+
stage.detail = f"{len(clusters)} clusters · batch quality"
|
| 373 |
for cluster in clusters:
|
| 374 |
for hit in cluster.hits:
|
| 375 |
hit.cluster_id = cluster.cluster_id
|
|
|
|
| 376 |
|
| 377 |
with _timed_stage(stages, "selection", progress_cb) as stage:
|
| 378 |
select_best(clusters)
|
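The stem cache added in `pipeline_runner.py` derives its file name from a hash of the source file contents plus the settings that affect the extracted stem. The following is a minimal, standard-library sketch of that idea. The names `sha256_file`, `cache_key`, and the `demo-v1` version string are illustrative assumptions, and the settings dictionary is a stand-in for the real `PipelineParams` fields.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large audio files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_key(path: Path, settings: dict) -> str:
    """Combine the content hash with settings; sort_keys makes the JSON stable."""
    payload = {"version": "demo-v1", "source_sha256": sha256_file(path), **settings}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.wav"
    src.write_bytes(b"not really audio")
    copy = Path(tmp) / "b.wav"
    copy.write_bytes(b"not really audio")

    key_a = cache_key(src, {"stem": "drums", "shifts": 1})
    key_copy = cache_key(copy, {"stem": "drums", "shifts": 1})   # same bytes, same settings
    key_other = cache_key(src, {"stem": "drums", "shifts": 2})   # one setting changed
```

Because the key depends on file contents rather than the path, a renamed copy of the same audio reuses the cached stem, while any change to a setting in the payload produces a new cache entry.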
sample_extractor.py
CHANGED
|
@@ -267,6 +267,120 @@ def _merge_singletons(clusters, sim_matrix, hits, merge_ratio=2.0):
|
|
| 267 |
for i,c in enumerate(multi): c.cluster_id = i
|
| 268 |
return multi
|
| 269 |
| 270 |
def cluster_hits(hits, audio=None, sr=44100, ncc_threshold=0.80, attack_ms=25.0,
|
| 271 |
mel_threshold=0.75, target_min=0, target_max=0,
|
| 272 |
linkage='average', merge_singletons=True):
|
|
|
|
| 267 |
for i,c in enumerate(multi): c.cluster_id = i
|
| 268 |
return multi
|
| 269 |
|
| 270 |
+
|
| 271 |
+
def _cosine(a, b):
|
| 272 |
+
"""Fast cosine similarity for normalized or unnormalized one-dimensional vectors."""
|
| 273 |
+
n = min(len(a), len(b))
|
| 274 |
+
if n <= 0:
|
| 275 |
+
return 0.0
|
| 276 |
+
av = a[:n]
|
| 277 |
+
bv = b[:n]
|
| 278 |
+
denom = float(np.linalg.norm(av) * np.linalg.norm(bv))
|
| 279 |
+
if denom < 1e-8:
|
| 280 |
+
return 0.0
|
| 281 |
+
return float(np.dot(av, bv) / denom)
|
| 282 |
+
|
| 283 |
+
|
| 284 |
+
def _retitle_clusters(clusters):
|
| 285 |
+
"""Sort, re-index, and make labels stable after incremental assignment."""
|
| 286 |
+
clusters.sort(key=lambda c: c.count, reverse=True)
|
| 287 |
+
seen = defaultdict(int)
|
| 288 |
+
for i, c in enumerate(clusters):
|
| 289 |
+
c.cluster_id = i
|
| 290 |
+
majority = defaultdict(int)
|
| 291 |
+
for hit in c.hits:
|
| 292 |
+
majority[hit.label] += 1
|
| 293 |
+
base = max(majority, key=majority.get) if majority else c.label.rsplit('_', 1)[0]
|
| 294 |
+
suffix = seen[base]
|
| 295 |
+
seen[base] += 1
|
| 296 |
+
c.label = f"{base}_{suffix}"
|
| 297 |
+
return clusters
|
| 298 |
+
|
| 299 |
+
|
| 300 |
+
def cluster_hits_online(hits, audio=None, sr=44100, ncc_threshold=0.72, attack_ms=25.0,
|
| 301 |
+
mel_threshold=0.62, target_min=0, target_max=0,
|
| 302 |
+
max_clusters=0):
|
| 303 |
+
"""Prototype-based online clustering for near-realtime previews.
|
| 304 |
+
|
| 305 |
+
The batch algorithm builds an all-pairs matrix and then runs agglomerative
|
| 306 |
+
clustering. This mode instead processes hits in onset order and compares
|
| 307 |
+
each new hit only against current cluster prototypes. Complexity is roughly
|
| 308 |
+
O(number_of_hits × number_of_clusters), so it can update progressively while
|
| 309 |
+
audio is being analyzed. It is intentionally a preview (or fast-final) algorithm,
|
| 310 |
+
not a replacement for the highest-quality batch pass.
|
| 311 |
+
"""
|
| 312 |
+
if not hits:
|
| 313 |
+
return []
|
| 314 |
+
if len(hits) == 1:
|
| 315 |
+
return [Cluster(cluster_id=0, label=f"{hits[0].label}_0", hits=[hits[0]])]
|
| 316 |
+
if audio is None:
|
| 317 |
+
audio = np.concatenate([h.audio for h in hits])
|
| 318 |
+
|
| 319 |
+
cap = int(max_clusters or target_max or 0)
|
| 320 |
+
if cap <= 0:
|
| 321 |
+
cap = max(1, min(len(hits), int(target_min or 16)))
|
| 322 |
+
cap = max(1, min(cap, len(hits)))
|
| 323 |
+
|
| 324 |
+
print(f"[Cluster:online] {len(hits)} hits, cap={cap}, attack={attack_ms}ms")
|
| 325 |
+
ordered = sorted(hits, key=lambda h: h.onset_time)
|
| 326 |
+
clusters = []
|
| 327 |
+
proto_fp = []
|
| 328 |
+
proto_tr = []
|
| 329 |
+
proto_energy = []
|
| 330 |
+
|
| 331 |
+
for hit in ordered:
|
| 332 |
+
fp = _mel_fingerprint(audio, sr, hit.onset_time)
|
| 333 |
+
tr = _extract_transient(audio, sr, hit.onset_time, attack_ms)
|
| 334 |
+
best_idx = -1
|
| 335 |
+
best_score = -1.0
|
| 336 |
+
best_mel = 0.0
|
| 337 |
+
best_ncc = 0.0
|
| 338 |
+
for i, cluster in enumerate(clusters):
|
| 339 |
+
# Prefer same broad class when possible, but do not make it mandatory.
|
| 340 |
+
label_bonus = 0.05 if cluster.label.startswith(hit.label + "_") else 0.0
|
| 341 |
+
mel = _cosine(fp, proto_fp[i])
|
| 342 |
+
if mel < mel_threshold:
|
| 343 |
+
continue
|
| 344 |
+
ncc = _transient_ncc(tr, proto_tr[i])
|
| 345 |
+
score = (0.45 * mel) + (0.55 * ncc) + label_bonus
|
| 346 |
+
if score > best_score:
|
| 347 |
+
best_idx, best_score, best_mel, best_ncc = i, score, mel, ncc
|
| 348 |
+
|
| 349 |
+
should_create = best_idx < 0 or (best_ncc < ncc_threshold and best_score < ncc_threshold)
|
| 350 |
+
if should_create and len(clusters) < cap:
|
| 351 |
+
cluster = Cluster(cluster_id=len(clusters), label=f"{hit.label}_{len(clusters)}", hits=[hit])
|
| 352 |
+
clusters.append(cluster)
|
| 353 |
+
proto_fp.append(fp)
|
| 354 |
+
proto_tr.append(tr)
|
| 355 |
+
proto_energy.append(max(hit.rms_energy, 1e-8))
|
| 356 |
+
continue
|
| 357 |
+
|
| 358 |
+
if best_idx < 0:
|
| 359 |
+
# Cap reached and no good match: assign to the nearest prototype by mel.
|
| 360 |
+
similarities = [_cosine(fp, existing) for existing in proto_fp]
|
| 361 |
+
best_idx = int(np.argmax(similarities))
|
| 362 |
+
cluster = clusters[best_idx]
|
| 363 |
+
cluster.hits.append(hit)
|
| 364 |
+
|
| 365 |
+
# Energy-weighted rolling prototype update; keeps loud clean hits dominant.
|
| 366 |
+
w_old = proto_energy[best_idx]
|
| 367 |
+
w_new = max(hit.rms_energy, 1e-8)
|
| 368 |
+
total = w_old + w_new
|
| 369 |
+
max_len = max(len(proto_fp[best_idx]), len(fp))
|
| 370 |
+
old_fp = np.pad(proto_fp[best_idx], (0, max_len - len(proto_fp[best_idx])))
|
| 371 |
+
new_fp = np.pad(fp, (0, max_len - len(fp)))
|
| 372 |
+
proto_fp[best_idx] = ((old_fp * w_old) + (new_fp * w_new)) / total
|
| 373 |
+
max_tr = max(len(proto_tr[best_idx]), len(tr))
|
| 374 |
+
old_tr = np.pad(proto_tr[best_idx], (0, max_tr - len(proto_tr[best_idx])))
|
| 375 |
+
new_tr = np.pad(tr, (0, max_tr - len(tr)))
|
| 376 |
+
proto_tr[best_idx] = ((old_tr * w_old) + (new_tr * w_new)) / total
|
| 377 |
+
proto_energy[best_idx] = total
|
| 378 |
+
|
| 379 |
+
clusters = _retitle_clusters(clusters)
|
| 380 |
+
for c in clusters:
|
| 381 |
+
print(f" {c.label}: {c.count}")
|
| 382 |
+
return clusters
|
| 383 |
+
|
| 384 |
def cluster_hits(hits, audio=None, sr=44100, ncc_threshold=0.80, attack_ms=25.0,
|
| 385 |
mel_threshold=0.75, target_min=0, target_max=0,
|
| 386 |
linkage='average', merge_singletons=True):
|
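The docstring of `cluster_hits_online` describes prototype-based assignment: each hit is compared only against the current cluster prototypes, and a matched prototype is updated with an energy-weighted average. The sketch below shows that loop on toy 2-D vectors using only the standard library. It is a simplified illustration under stated assumptions: it uses a single cosine score instead of the mel gate plus transient NCC blend, and the names `cosine` and `online_cluster` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 for near-zero norms."""
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denom < 1e-8:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def online_cluster(vectors, weights, threshold=0.9, cap=4):
    """Greedy one-pass clustering: cost is about len(vectors) * len(prototypes)."""
    prototypes, totals, members = [], [], []
    for idx, (vec, w) in enumerate(zip(vectors, weights)):
        sims = [cosine(vec, p) for p in prototypes]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        # Open a new cluster when nothing matches well and the cap allows it.
        if best < 0 or (sims[best] < threshold and len(prototypes) < cap):
            prototypes.append(list(vec))
            totals.append(w)
            members.append([idx])
            continue
        # Otherwise join the nearest cluster and update its weighted prototype.
        total = totals[best] + w
        prototypes[best] = [
            (p * totals[best] + v * w) / total for p, v in zip(prototypes[best], vec)
        ]
        totals[best] = total
        members[best].append(idx)
    return members


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.95], [1.0, 0.05]]
weights = [1.0, 1.0, 0.5, 0.5, 2.0]
groups = online_cluster(vectors, weights)
print(groups)
```

Because assignment depends on arrival order and on prototypes that drift as hits are added, results can differ from an all-pairs agglomerative pass, which is why the commit keeps `batch_quality` as the default mode.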
scripts/benchmark_subprocesses.py
CHANGED
|
@@ -24,19 +24,20 @@ from sample_extractor import cache_clear
|
|
| 24 |
from synth_generator import generate_test_song
|
| 25 |
|
| 26 |
|
| 27 |
-
def run_case(pattern: str, bars: int, bpm: float, run_index: int) -> dict:
|
| 28 |
tmp = Path(tempfile.mkdtemp(prefix="dse-bench-"))
|
| 29 |
song = generate_test_song(pattern_name=pattern, bars=bars, bpm=bpm, add_bass=False, seed=42 + run_index)
|
| 30 |
src = tmp / f"{pattern}-{bars}bars.wav"
|
| 31 |
sf.write(src, song.drums_only, song.sr)
|
| 32 |
cache_clear()
|
| 33 |
-
params = PipelineParams(stem="all", target_min=4, target_max=12, synthesize=True)
|
| 34 |
result = run_extraction_pipeline(src, tmp / "out", params)
|
| 35 |
return {
|
| 36 |
"pattern": pattern,
|
| 37 |
"bars": bars,
|
| 38 |
"bpm": bpm,
|
| 39 |
"run_index": run_index,
|
|
|
|
| 40 |
"audio_duration_sec": result.audio_duration_sec,
|
| 41 |
"total_duration_sec": result.duration_sec,
|
| 42 |
"realtime_factor": result.realtime_factor,
|
|
@@ -52,15 +53,16 @@ def main() -> int:
|
|
| 52 |
parser.add_argument("--bars", type=int, default=4)
|
| 53 |
parser.add_argument("--bpm", type=float, default=120.0)
|
| 54 |
parser.add_argument("--output", default="docs/benchmark-subprocesses.json")
|
|
|
|
| 55 |
args = parser.parse_args()
|
| 56 |
|
| 57 |
# Warm imports/JIT and discard the result.
|
| 58 |
-
run_case("rock", 1, args.bpm, -1)
|
| 59 |
|
| 60 |
rows = []
|
| 61 |
for run_index in range(args.runs):
|
| 62 |
for pattern in ["rock", "funk", "halftime"]:
|
| 63 |
-
rows.append(run_case(pattern, args.bars, args.bpm, run_index))
|
| 64 |
|
| 65 |
stage_keys = [stage["key"] for stage in rows[0]["stages"]]
|
| 66 |
summary = []
|
|
@@ -74,7 +76,7 @@ def main() -> int:
|
|
| 74 |
"max_sec": round(max(values), 6),
|
| 75 |
})
|
| 76 |
|
| 77 |
-
payload = {"runs": rows, "summary": summary}
|
| 78 |
out = Path(args.output)
|
| 79 |
out.parent.mkdir(parents=True, exist_ok=True)
|
| 80 |
out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
|
|
|
|
| 24 |
from synth_generator import generate_test_song
|
| 25 |
|
| 26 |
|
| 27 |
+
def run_case(pattern: str, bars: int, bpm: float, run_index: int, clustering_mode: str) -> dict:
|
| 28 |
tmp = Path(tempfile.mkdtemp(prefix="dse-bench-"))
|
| 29 |
song = generate_test_song(pattern_name=pattern, bars=bars, bpm=bpm, add_bass=False, seed=42 + run_index)
|
| 30 |
src = tmp / f"{pattern}-{bars}bars.wav"
|
| 31 |
sf.write(src, song.drums_only, song.sr)
|
| 32 |
cache_clear()
|
| 33 |
+
params = PipelineParams(stem="all", clustering_mode=clustering_mode, target_min=4, target_max=12, synthesize=True)
|
| 34 |
result = run_extraction_pipeline(src, tmp / "out", params)
|
| 35 |
return {
|
| 36 |
"pattern": pattern,
|
| 37 |
"bars": bars,
|
| 38 |
"bpm": bpm,
|
| 39 |
"run_index": run_index,
|
| 40 |
+
"clustering_mode": clustering_mode,
|
| 41 |
"audio_duration_sec": result.audio_duration_sec,
|
| 42 |
"total_duration_sec": result.duration_sec,
|
| 43 |
"realtime_factor": result.realtime_factor,
|
|
|
|
| 53 |
parser.add_argument("--bars", type=int, default=4)
|
| 54 |
parser.add_argument("--bpm", type=float, default=120.0)
|
| 55 |
parser.add_argument("--output", default="docs/benchmark-subprocesses.json")
|
| 56 |
+
parser.add_argument("--clustering-mode", choices=["batch_quality", "online_preview"], default="batch_quality")
|
| 57 |
args = parser.parse_args()
|
| 58 |
|
| 59 |
# Warm imports/JIT and discard the result.
|
| 60 |
+
run_case("rock", 1, args.bpm, -1, args.clustering_mode)
|
| 61 |
|
| 62 |
rows = []
|
| 63 |
for run_index in range(args.runs):
|
| 64 |
for pattern in ["rock", "funk", "halftime"]:
|
| 65 |
+
rows.append(run_case(pattern, args.bars, args.bpm, run_index, args.clustering_mode))
|
| 66 |
|
| 67 |
stage_keys = [stage["key"] for stage in rows[0]["stages"]]
|
| 68 |
summary = []
|
|
|
|
| 76 |
"max_sec": round(max(values), 6),
|
| 77 |
})
|
| 78 |
|
| 79 |
+
payload = {"clustering_mode": args.clustering_mode, "runs": rows, "summary": summary}
|
| 80 |
out = Path(args.output)
|
| 81 |
out.parent.mkdir(parents=True, exist_ok=True)
|
| 82 |
out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
|
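With `clustering_mode` now recorded both on every run row and at the top level of the payload, two benchmark files (for example one written with `--clustering-mode batch_quality` and one with `--clustering-mode online_preview`) can be compared pattern by pattern. The sketch below is one possible way to do that. The field names `runs`, `pattern`, `total_duration_sec`, and `clustering_mode` come from the script above; `meanBy`, `compareModes`, and all numeric values are hypothetical and only illustrate the shape of the data.

```javascript
// Sketch only: compares two payloads shaped like the benchmark script's output.
// meanBy and compareModes are hypothetical helpers, not part of the repository.

// Average one numeric field per drum pattern ("rock", "funk", "halftime").
function meanBy(runs, field) {
  const groups = new Map();
  for (const run of runs) {
    const bucket = groups.get(run.pattern) ?? [];
    bucket.push(Number(run[field]));
    groups.set(run.pattern, bucket);
  }
  const out = {};
  for (const [pattern, values] of groups) {
    out[pattern] = values.reduce((a, b) => a + b, 0) / values.length;
  }
  return out;
}

// Build one comparison row per pattern present in both payloads.
function compareModes(first, second, field = "total_duration_sec") {
  const a = meanBy(first.runs, field);
  const b = meanBy(second.runs, field);
  return Object.keys(a)
    .filter((pattern) => pattern in b)
    .map((pattern) => ({
      pattern,
      [first.clustering_mode]: a[pattern],
      [second.clustering_mode]: b[pattern],
      ratio: b[pattern] / a[pattern], // below 1 means the second payload was faster
    }));
}

// Placeholder numbers, not measured results.
const batchPayload = {
  clustering_mode: "batch_quality",
  runs: [{ pattern: "rock", total_duration_sec: 2 }, { pattern: "rock", total_duration_sec: 4 }],
};
const previewPayload = {
  clustering_mode: "online_preview",
  runs: [{ pattern: "rock", total_duration_sec: 1.5 }],
};
console.log(compareModes(batchPayload, previewPayload));
```

Keeping the mode name as the column key means the comparison stays readable if further clustering modes are added to the `choices` list later.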
web/app.js
CHANGED
|
@@ -1,16 +1,20 @@
|
|
| 1 |
const $ = (id) => document.getElementById(id);
|
| 2 |
|
| 3 |
const fields = [
|
| 4 |
-
"stem", "demucs_model", "demucs_shifts", "demucs_overlap", "onset_mode", "onset_delta",
|
| 5 |
"energy_threshold_db", "pre_pad", "min_dur", "max_dur", "min_gap", "ncc_threshold",
|
| 6 |
"attack_ms", "mel_threshold", "linkage", "target_min", "target_max", "subdivision",
|
| 7 |
-
"synthesize", "quantize_midi"
|
| 8 |
];
|
| 9 |
|
| 10 |
let config = null;
|
| 11 |
let selectedFile = null;
|
| 12 |
let activePoll = null;
|
| 13 |
|
| 14 |
function fmtSec(value) {
|
| 15 |
if (value === null || value === undefined || Number.isNaN(Number(value))) return "—";
|
| 16 |
const n = Number(value);
|
|
@@ -19,6 +23,11 @@ function fmtSec(value) {
|
|
| 19 |
return `${n.toFixed(2)} s`;
|
| 20 |
}
|
| 21 |
|
| 22 |
function setHealth(ok, text, subtext) {
|
| 23 |
$("healthDot").className = `status-dot ${ok ? "ok" : "bad"}`;
|
| 24 |
$("healthText").textContent = text;
|
|
@@ -47,6 +56,7 @@ function setSelectOptions(select, values, labels = null) {
|
|
| 47 |
|
| 48 |
function populateConfig() {
|
| 49 |
setSelectOptions($("demucs_model"), config.demucs_models);
|
|
|
|
| 50 |
const defaults = config.defaults;
|
| 51 |
for (const field of fields) {
|
| 52 |
const el = $(field);
|
|
@@ -80,9 +90,9 @@ function collectParams() {
|
|
| 80 |
|
| 81 |
function renderStages(stages = []) {
|
| 82 |
$("stageList").innerHTML = stages.map((stage) => `
|
| 83 |
-
<div class="stage ${stage.status}" title="${stage.detail || ""}">
|
| 84 |
<span class="badge"></span>
|
| 85 |
-
<div><strong>${stage.label}</strong><small>${stage.detail || stage.status}</small></div>
|
| 86 |
<time>${fmtSec(stage.duration_sec)}</time>
|
| 87 |
</div>
|
| 88 |
`).join("");
|
|
@@ -138,26 +148,27 @@ function drawWaveform(overview) {
|
|
| 138 |
function renderResult(job) {
|
| 139 |
const result = job.result;
|
| 140 |
if (!result) return;
|
| 141 |
-
const rtf = result.realtime_factor.toFixed(2);
|
| 142 |
-
|
|
|
|
| 143 |
drawWaveform(result.overview);
|
| 144 |
|
| 145 |
const fileUrls = result.file_urls ?? {};
|
| 146 |
const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
|
| 147 |
-
$("downloads").innerHTML = Object.entries(fileUrls).map(([key, url]) => `<a href="${url}" download>${labels[key] ?? key}</a>`).join("");
|
| 148 |
$("stemAudio").src = fileUrls.stem ?? "";
|
| 149 |
$("reconAudio").src = fileUrls.reconstruction ?? "";
|
| 150 |
|
| 151 |
const tbody = $("samplesTable").querySelector("tbody");
|
| 152 |
tbody.innerHTML = (result.samples ?? []).map((sample) => `
|
| 153 |
<tr>
|
| 154 |
-
<td>${sample.label}</td>
|
| 155 |
-
<td>${sample.classification}</td>
|
| 156 |
-
<td>${sample.hits}</td>
|
| 157 |
-
<td>${sample.score}</td>
|
| 158 |
-
<td>${sample.duration_ms} ms</td>
|
| 159 |
-
<td>${sample.first_onset_sec} s</td>
|
| 160 |
-
<td><a href="${sample.url}" download>WAV</a></td>
|
| 161 |
</tr>
|
| 162 |
`).join("");
|
| 163 |
}
|
|
@@ -173,6 +184,38 @@ function renderJob(job) {
|
|
| 173 |
}
|
| 174 |
}
|
| 175 |
|
| 176 |
async function pollJob(id) {
|
| 177 |
if (activePoll) clearInterval(activePoll);
|
| 178 |
const tick = async () => {
|
|
@@ -183,6 +226,7 @@ async function pollJob(id) {
|
|
| 183 |
clearInterval(activePoll);
|
| 184 |
activePoll = null;
|
| 185 |
$("runButton").disabled = !selectedFile;
|
|
|
|
| 186 |
}
|
| 187 |
} catch (error) {
|
| 188 |
clearInterval(activePoll);
|
|
@@ -207,6 +251,7 @@ async function runExtraction() {
|
|
| 207 |
const job = await api("/api/jobs", { method: "POST", body: form });
|
| 208 |
renderJob(job);
|
| 209 |
await pollJob(job.id);
|
|
|
|
| 210 |
} catch (error) {
|
| 211 |
$("runButton").disabled = false;
|
| 212 |
$("resultSummary").textContent = error.message;
|
|
@@ -229,6 +274,7 @@ async function boot() {
|
|
| 229 |
await api("/api/health");
|
| 230 |
config = await api("/api/config");
|
| 231 |
populateConfig();
|
|
|
|
| 232 |
setHealth(true, "Ready", "Backend online");
|
| 233 |
} catch (error) {
|
| 234 |
setHealth(false, "Offline", error.message);
|
|
@@ -244,10 +290,20 @@ $("useFastButton").addEventListener("click", () => {
|
|
| 244 |
$("target_min").value = 4;
|
| 245 |
$("target_max").value = 16;
|
| 246 |
});
|
| 247 |
$("clearCacheButton").addEventListener("click", async () => {
|
| 248 |
try {
|
| 249 |
await api("/api/cache/clear", { method: "POST" });
|
| 250 |
-
$("logs").textContent = "Pipeline cache cleared.";
|
| 251 |
} catch (error) {
|
| 252 |
$("logs").textContent = error.message;
|
| 253 |
}
|
|
|
|
| 1 |
const $ = (id) => document.getElementById(id);
|
| 2 |
|
| 3 |
const fields = [
|
| 4 |
+
"stem", "demucs_model", "clustering_mode", "demucs_shifts", "demucs_overlap", "onset_mode", "onset_delta",
|
| 5 |
"energy_threshold_db", "pre_pad", "min_dur", "max_dur", "min_gap", "ncc_threshold",
|
| 6 |
"attack_ms", "mel_threshold", "linkage", "target_min", "target_max", "subdivision",
|
| 7 |
+
"synthesize", "quantize_midi", "use_disk_cache"
|
| 8 |
];
|
| 9 |
|
| 10 |
let config = null;
|
| 11 |
let selectedFile = null;
|
| 12 |
let activePoll = null;
|
| 13 |
|
| 14 |
+
function esc(value) {
|
| 15 |
+
return String(value ?? "").replace(/[&<>'"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" }[c]));
|
| 16 |
+
}
|
| 17 |
+
|
| 18 |
function fmtSec(value) {
|
| 19 |
if (value === null || value === undefined || Number.isNaN(Number(value))) return "—";
|
| 20 |
const n = Number(value);
|
|
|
|
| 23 |
return `${n.toFixed(2)} s`;
|
| 24 |
}
|
| 25 |
|
| 26 |
+
function fmtDate(epochSeconds) {
|
| 27 |
+
if (!epochSeconds) return "—";
|
| 28 |
+
return new Date(epochSeconds * 1000).toLocaleString(undefined, { dateStyle: "medium", timeStyle: "short" });
|
| 29 |
+
}
|
| 30 |
+
|
| 31 |
function setHealth(ok, text, subtext) {
|
| 32 |
$("healthDot").className = `status-dot ${ok ? "ok" : "bad"}`;
|
| 33 |
$("healthText").textContent = text;
|
|
|
|
| 56 |
|
| 57 |
function populateConfig() {
|
| 58 |
setSelectOptions($("demucs_model"), config.demucs_models);
|
| 59 |
+
setSelectOptions($("clustering_mode"), Object.keys(config.clustering_modes ?? { batch_quality: "", online_preview: "" }), config.clustering_modes);
|
| 60 |
const defaults = config.defaults;
|
| 61 |
for (const field of fields) {
|
| 62 |
const el = $(field);
|
|
|
|
| 90 |
|
| 91 |
function renderStages(stages = []) {
|
| 92 |
$("stageList").innerHTML = stages.map((stage) => `
|
| 93 |
+
<div class="stage ${esc(stage.status)}" title="${esc(stage.detail || "")}">
|
| 94 |
<span class="badge"></span>
|
| 95 |
+
<div><strong>${esc(stage.label)}</strong><small>${esc(stage.detail || stage.status)}</small></div>
|
| 96 |
<time>${fmtSec(stage.duration_sec)}</time>
|
| 97 |
</div>
|
| 98 |
`).join("");
|
|
|
|
| 148 |
function renderResult(job) {
|
| 149 |
const result = job.result;
|
| 150 |
if (!result) return;
|
| 151 |
+
const rtf = Number(result.realtime_factor).toFixed(2);
|
| 152 |
+
const mode = result.params?.clustering_mode ?? "—";
|
| 153 |
+
$("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime · ${mode}`;
|
| 154 |
drawWaveform(result.overview);
|
| 155 |
|
| 156 |
const fileUrls = result.file_urls ?? {};
|
| 157 |
const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
|
| 158 |
+
$("downloads").innerHTML = Object.entries(fileUrls).map(([key, url]) => `<a href="${esc(url)}" download>${esc(labels[key] ?? key)}</a>`).join("");
|
| 159 |
$("stemAudio").src = fileUrls.stem ?? "";
|
| 160 |
$("reconAudio").src = fileUrls.reconstruction ?? "";
|
| 161 |
|
| 162 |
const tbody = $("samplesTable").querySelector("tbody");
|
| 163 |
tbody.innerHTML = (result.samples ?? []).map((sample) => `
|
| 164 |
<tr>
|
| 165 |
+
<td>${esc(sample.label)}</td>
|
| 166 |
+
<td>${esc(sample.classification)}</td>
|
| 167 |
+
<td>${esc(sample.hits)}</td>
|
| 168 |
+
<td>${esc(sample.score)}</td>
|
| 169 |
+
<td>${esc(sample.duration_ms)} ms</td>
|
| 170 |
+
<td>${esc(sample.first_onset_sec)} s</td>
|
| 171 |
+
<td><a href="${esc(sample.url)}" download>WAV</a></td>
|
| 172 |
</tr>
|
| 173 |
`).join("");
|
| 174 |
}
|
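The render functions in this file build markup with template literals and assign it to `innerHTML`, so any job field that reaches the page unescaped would be parsed as HTML. That is the reason every interpolated value is now routed through `esc()`. A minimal, DOM-free sketch of the same idea follows. The entity table is an assumption based on the conventional HTML escapes, and `renderCell` is a hypothetical helper used only for illustration.

```javascript
// Sketch only: mirrors the shape of esc() in web/app.js.
// ENTITIES holds the conventional HTML escapes (assumed values).
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" };

function esc(value) {
  // null and undefined become "", everything else is stringified first.
  return String(value ?? "").replace(/[&<>'"]/g, (c) => ENTITIES[c]);
}

// Hypothetical helper: build one table cell from untrusted text.
function renderCell(text) {
  return `<td>${esc(text)}</td>`;
}

// A filename or label containing markup is rendered as inert text.
console.log(renderCell('<img src=x onerror="alert(1)">'));
```

Escaping quotes as well as angle brackets matters here because several values land inside attribute values (`title="..."`, `href="..."`, `data-job-id="..."`), not only in element content.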
|
|
|
| 184 |
}
|
| 185 |
}
|
| 186 |
|
| 187 |
+
function renderHistory(payload) {
|
| 188 |
+
const rows = [...(payload.active ?? []), ...(payload.history ?? [])];
|
| 189 |
+
if (!rows.length) {
|
| 190 |
+
$("historyList").innerHTML = `<p class="empty">No completed runs yet.</p>`;
|
| 191 |
+
return;
|
| 192 |
+
}
|
| 193 |
+
$("historyList").innerHTML = rows.map((row) => `
|
| 194 |
+
<button class="history-row" type="button" data-job-id="${esc(row.id)}">
|
| 195 |
+
<span><strong>${esc(row.filename || row.id)}</strong><small>${esc(row.stem || "—")} · ${esc(row.clustering_mode || "—")} · ${fmtDate(row.created_at)}</small></span>
|
| 196 |
+
<span>${esc(row.hit_count ?? "…")} hits</span>
|
| 197 |
+
<span>${esc(row.cluster_count ?? "…")} samples</span>
|
| 198 |
+
<span>${row.realtime_factor == null ? "—" : `${Number(row.realtime_factor).toFixed(2)}×`}</span>
|
| 199 |
+
</button>
|
| 200 |
+
`).join("");
|
| 201 |
+
for (const button of $("historyList").querySelectorAll(".history-row")) {
|
| 202 |
+
button.addEventListener("click", async () => {
|
| 203 |
+
const job = await api(`/api/jobs/${button.dataset.jobId}`);
|
| 204 |
+
renderJob(job);
|
| 205 |
+
window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" });
|
| 206 |
+
});
|
| 207 |
+
}
|
| 208 |
+
}
|
| 209 |
+
|
| 210 |
+
async function refreshHistory() {
|
| 211 |
+
try {
|
| 212 |
+
const payload = await api("/api/jobs?limit=50");
|
| 213 |
+
renderHistory(payload);
|
| 214 |
+
} catch (error) {
|
| 215 |
+
$("historyList").innerHTML = `<p class="empty">${esc(error.message)}</p>`;
|
| 216 |
+
}
|
| 217 |
+
}
|
| 218 |
+
|
| 219 |
async function pollJob(id) {
|
| 220 |
if (activePoll) clearInterval(activePoll);
|
| 221 |
const tick = async () => {
|
|
|
|
| 226 |
clearInterval(activePoll);
|
| 227 |
activePoll = null;
|
| 228 |
$("runButton").disabled = !selectedFile;
|
| 229 |
+
await refreshHistory();
|
| 230 |
}
|
| 231 |
} catch (error) {
|
| 232 |
clearInterval(activePoll);
|
|
|
|
| 251 |
const job = await api("/api/jobs", { method: "POST", body: form });
|
| 252 |
renderJob(job);
|
| 253 |
await pollJob(job.id);
|
| 254 |
+
await refreshHistory();
|
| 255 |
} catch (error) {
|
| 256 |
$("runButton").disabled = false;
|
| 257 |
$("resultSummary").textContent = error.message;
|
|
|
|
| 274 |
await api("/api/health");
|
| 275 |
config = await api("/api/config");
|
| 276 |
populateConfig();
|
| 277 |
+
await refreshHistory();
|
| 278 |
setHealth(true, "Ready", "Backend online");
|
| 279 |
} catch (error) {
|
| 280 |
setHealth(false, "Offline", error.message);
|
|
|
|
| 290 |
$("target_min").value = 4;
|
| 291 |
$("target_max").value = 16;
|
| 292 |
});
|
| 293 |
+
$("usePreviewButton").addEventListener("click", () => {
|
| 294 |
+
$("stem").value = "all";
|
| 295 |
+
$("clustering_mode").value = "online_preview";
|
| 296 |
+
$("demucs_shifts").value = 0;
|
| 297 |
+
$("target_min").value = 4;
|
| 298 |
+
$("target_max").value = 16;
|
| 299 |
+
$("mel_threshold").value = 0.62;
|
| 300 |
+
$("ncc_threshold").value = 0.72;
|
| 301 |
+
});
|
| 302 |
+
$("refreshHistoryButton").addEventListener("click", refreshHistory);
|
| 303 |
$("clearCacheButton").addEventListener("click", async () => {
|
| 304 |
try {
|
| 305 |
await api("/api/cache/clear", { method: "POST" });
|
| 306 |
+
$("logs").textContent = "Pipeline memory and disk cache cleared.";
|
| 307 |
} catch (error) {
|
| 308 |
$("logs").textContent = error.message;
|
| 309 |
}
|
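`renderHistory` mixes data shaping and DOM work in one function. The shaping part (merging `active` and `history`, choosing fallbacks for missing fields, formatting the realtime factor) can be pulled out and checked without a browser. The sketch below is one way to do that under stated assumptions: `summarizeRow`, `mergeRows`, and `PLACEHOLDER` are hypothetical names, the placeholder glyph is simplified to a hyphen, and the example payload values are invented.

```javascript
// Sketch only: the data-shaping half of renderHistory, separated from the DOM.
const PLACEHOLDER = "-"; // the app uses its own placeholder glyph; simplified here

function fmtFactor(value) {
  // == null matches both null and undefined, as in the original template.
  return value == null ? PLACEHOLDER : `${Number(value).toFixed(2)}×`;
}

function summarizeRow(row) {
  return {
    id: row.id,
    title: row.filename || row.id,
    meta: `${row.stem || PLACEHOLDER} · ${row.clustering_mode || PLACEHOLDER}`,
    // ?? keeps a legitimate 0, so a finished run with zero hits is not shown as pending.
    hits: `${row.hit_count ?? "…"} hits`,
    samples: `${row.cluster_count ?? "…"} samples`,
    factor: fmtFactor(row.realtime_factor),
  };
}

// Active jobs are listed first, then completed history, as in renderHistory.
function mergeRows(payload) {
  return [...(payload.active ?? []), ...(payload.history ?? [])].map(summarizeRow);
}

// Invented example payload, shaped like the /api/jobs response used above.
const examplePayload = {
  active: [{ id: "job-2" }],
  history: [{ id: "job-1", filename: "loop.wav", stem: "all", clustering_mode: "online_preview",
              hit_count: 32, cluster_count: 6, realtime_factor: 0.4 }],
};
console.log(mergeRows(examplePayload));
```

The returned strings are still untrusted text, so they would need to pass through `esc()` before being interpolated into `innerHTML`.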
web/index.html
CHANGED
|
@@ -44,7 +44,7 @@
|
|
| 44 |
<div class="panel-heading">
|
| 45 |
<div>
|
| 46 |
<h2>2. Extraction controls</h2>
|
| 47 |
-
<p>
|
| 48 |
</div>
|
| 49 |
<button id="clearCacheButton" class="ghost-button" type="button">Clear cache</button>
|
| 50 |
</div>
|
|
@@ -56,6 +56,12 @@
|
|
| 56 |
<label>Demucs model
|
| 57 |
<select id="demucs_model"></select>
|
| 58 |
</label>
|
| 59 |
<label>Shifts
|
| 60 |
<input id="demucs_shifts" type="number" min="0" max="8" step="1" />
|
| 61 |
</label>
|
|
@@ -123,11 +129,13 @@
|
|
| 123 |
<div class="toggles">
|
| 124 |
<label><input id="synthesize" type="checkbox" /> synthesize alternates</label>
|
| 125 |
<label><input id="quantize_midi" type="checkbox" /> quantize MIDI</label>
|
|
|
|
| 126 |
</div>
|
| 127 |
|
| 128 |
<div class="actions">
|
| 129 |
<button id="runButton" class="primary-button" type="button" disabled>Extract samples</button>
|
| 130 |
<button id="useFastButton" class="secondary-button" type="button">Use fast full-mix mode</button>
|
|
|
|
| 131 |
</div>
|
| 132 |
</section>
|
| 133 |
|
|
@@ -143,6 +151,17 @@
|
|
| 143 |
<pre id="logs" class="logs" aria-live="polite"></pre>
|
| 144 |
</section>
|
| 145 |
|
| 146 |
<section class="panel result-panel">
|
| 147 |
<div class="panel-heading">
|
| 148 |
<div>
|
|
|
|
| 44 |
<div class="panel-heading">
|
| 45 |
<div>
|
| 46 |
<h2>2. Extraction controls</h2>
|
| 47 |
+
<p>Batch quality gives the best final grouping. Online preview is the near-realtime clustering path.</p>
|
| 48 |
</div>
|
| 49 |
<button id="clearCacheButton" class="ghost-button" type="button">Clear cache</button>
|
| 50 |
</div>
|
|
|
|
| 56 |
<label>Demucs model
|
| 57 |
<select id="demucs_model"></select>
|
| 58 |
</label>
|
| 59 |
+
<label>Clustering mode
|
| 60 |
+
<select id="clustering_mode">
|
| 61 |
+
<option value="batch_quality">batch quality</option>
|
| 62 |
+
<option value="online_preview">online preview</option>
|
| 63 |
+
</select>
|
| 64 |
+
</label>
|
| 65 |
<label>Shifts
|
| 66 |
<input id="demucs_shifts" type="number" min="0" max="8" step="1" />
|
| 67 |
</label>
|
|
|
|
| 129 |
<div class="toggles">
|
| 130 |
<label><input id="synthesize" type="checkbox" /> synthesize alternates</label>
|
| 131 |
<label><input id="quantize_midi" type="checkbox" /> quantize MIDI</label>
|
| 132 |
+
<label><input id="use_disk_cache" type="checkbox" /> disk cache stems/source loads</label>
|
| 133 |
</div>
|
| 134 |
|
| 135 |
<div class="actions">
|
| 136 |
<button id="runButton" class="primary-button" type="button" disabled>Extract samples</button>
|
| 137 |
<button id="useFastButton" class="secondary-button" type="button">Use fast full-mix mode</button>
|
| 138 |
+
<button id="usePreviewButton" class="secondary-button" type="button">Use online preview mode</button>
|
| 139 |
</div>
|
| 140 |
</section>
|
| 141 |
|
|
|
|
| 151 |
<pre id="logs" class="logs" aria-live="polite"></pre>
|
| 152 |
</section>
|
| 153 |
|
| 154 |
+
<section class="panel history-panel">
|
| 155 |
+
<div class="panel-heading">
|
| 156 |
+
<div>
|
| 157 |
+
<h2>Run history</h2>
|
| 158 |
+
<p>Completed manifests under <code>.runs/</code> are indexed automatically. Load a run to compare timings and artifacts.</p>
|
| 159 |
+
</div>
|
| 160 |
+
<button id="refreshHistoryButton" class="ghost-button" type="button">Refresh</button>
|
| 161 |
+
</div>
|
| 162 |
+
<div id="historyList" class="history-list"></div>
|
| 163 |
+
</section>
|
| 164 |
+
|
| 165 |
<section class="panel result-panel">
|
| 166 |
<div class="panel-heading">
|
| 167 |
<div>
|
web/styles.css
CHANGED
|
@@ -78,3 +78,11 @@ td { color: #e5eaf7; }
|
|
| 78 |
tr:last-child td { border-bottom: 0; }
|
| 79 |
@media (max-width: 1100px) { .workspace, .hero { grid-template-columns: 1fr; } .control-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); } }
|
| 80 |
@media (max-width: 680px) { .shell { width: min(100% - 20px, 1520px); padding-top: 16px; } .panel { padding: 16px; border-radius: 22px; } .control-grid, .audio-grid { grid-template-columns: 1fr; } h1 { letter-spacing: -.045em; } }
|
|
|
|
| 78 |
tr:last-child td { border-bottom: 0; }
|
| 79 |
@media (max-width: 1100px) { .workspace, .hero { grid-template-columns: 1fr; } .control-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); } }
|
| 80 |
@media (max-width: 680px) { .shell { width: min(100% - 20px, 1520px); padding-top: 16px; } .panel { padding: 16px; border-radius: 22px; } .control-grid, .audio-grid { grid-template-columns: 1fr; } h1 { letter-spacing: -.045em; } }
|
| 81 |
+
.history-panel { align-self: stretch; }
|
| 82 |
+
.history-list { display: grid; gap: 8px; max-height: 360px; overflow: auto; }
|
| 83 |
+
.history-row { width: 100%; display: grid; grid-template-columns: minmax(0, 1fr) auto auto auto; gap: 12px; align-items: center; text-align: left; border: 1px solid var(--line); background: rgba(0,0,0,.16); border-radius: 16px; padding: 12px; }
|
| 84 |
+
.history-row strong { display: block; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; color: var(--text); }
|
| 85 |
+
.history-row small { display: block; color: var(--muted); margin-top: 3px; }
|
| 86 |
+
.history-row span:not(:first-child) { color: #dbe5f7; font-size: 12px; font-variant-numeric: tabular-nums; }
|
| 87 |
+
.empty { color: var(--muted); margin: 0; }
|
| 88 |
+
@media (max-width: 680px) { .history-row { grid-template-columns: 1fr 1fr; } }
|