ChatGPT committed on
Commit 0b5f0f0
Parent(s): e33cc90

feat: add full-context reproduction and clearer controls

- README.md +4 -2
- app.py +1 -1
- docs/API.md +16 -1
- docs/FEATURES.md +6 -5
- docs/FIXED_WORKSTATION_UI.md +10 -2
- docs/PROGRESS.md +24 -0
- docs/REMAINING_WORK.md +3 -0
- docs/REPRODUCED_AUDIO_AND_PARAMETERS.md +82 -0
- docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md +14 -2
- docs/TASKS.md +22 -1
- docs/UI_REPLACEMENT.md +23 -6
- docs/benchmark-online-preview.json +81 -81
- docs/benchmark-subprocesses.json +86 -86
- docs/interactive-ux/README.md +1 -1
- pipeline_runner.py +93 -14
- sample_extractor.py +9 -1
- scripts/test_api_job.py +2 -0
- scripts/test_supervised_export_and_force_onset.py +1 -1
- supervised_export.py +77 -3
- web/app.js +58 -17
- web/index.html +117 -88
- web/styles.css +72 -0
README.md
CHANGED

    @@ -12,7 +12,7 @@ pinned: false

     A custom FastAPI + browser workstation for extracting, reviewing, and now semantically supervising reusable drum samples from an audio file.

    -The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, manifests, and a complete ZIP sample pack. The interactive layer stores user corrections as replayable semantic state beside each run manifest.
    +The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, target-stem reconstruction, full-context reproduced audio, manifests, and a complete ZIP sample pack. The interactive layer stores user corrections as replayable semantic state beside each run manifest.

     ## Current status

    @@ -73,6 +73,7 @@ See:
     - `docs/REMAINING_WORK.md`
     - `docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md`
     - `docs/FIXED_WORKSTATION_UI.md`
    +- `docs/REPRODUCED_AUDIO_AND_PARAMETERS.md`

     ## Run locally

    @@ -155,7 +156,7 @@ curl http://127.0.0.1:7860/api/jobs
     | `sample_extractor.py` | Core DSP/sample extraction implementation |
     | `supervised_state.py` | Persistent semantic state, confidence, constraints, events, suggestions, force-onset, restore, undo |
     | `supervised_export.py` | Renders edited semantic state into supervised WAV/MIDI/reconstruction/ZIP artifacts |
    -| `web/` | Custom no-build browser frontend with fixed non-scrolling workstation layout, explicit upload/whole-page drag-drop, sidebars, bottom dock, sample-card grid, hidden-audio audition, add-onset mode, and edited export |
    +| `web/` | Custom no-build browser frontend with fixed non-scrolling workstation layout, explicit upload/whole-page drag-drop, source/stem/reproduced preview transport, common/advanced parameter separation, sidebars, bottom dock, sample-card grid, hidden-audio audition, add-onset mode, and edited export |
     | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
     | `scripts/test_interactive_supervision.py` | Smoke test for supervised state endpoints |
     | `scripts/test_supervised_export_and_force_onset.py` | Smoke test for force-onset, restore, suggestion diffs, and edited exports |
    @@ -180,6 +181,7 @@ Each run is stored under `.runs/<job-id>/output/`:
     - `supervised/samples/*.wav` after edited export
     - `supervised/reconstruction.mid` after edited export
     - `supervised/reconstruction.wav` after edited export
    +- `source.wav`, `context_bed.wav`, and `target_reconstruction.wav` for source/stem/reproduced A/B previews

     Generated runtime directories are ignored by git:

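The README change lists the preview layers that each run now writes under `.runs/<job-id>/output/`. The sketch below shows how a client-side script could check which of those A/B preview WAVs a finished run has produced. It assumes direct filesystem access to the runs directory, and the helper name `preview_files` is hypothetical: it is not part of the project's code.

```python
import tempfile
from pathlib import Path

# Preview layers named in this commit; the helper below is a hypothetical client-side check.
PREVIEW_FILES = (
    "source.wav",
    "stem.wav",
    "context_bed.wav",
    "target_reconstruction.wav",
    "reconstruction.wav",
)


def preview_files(runs_dir: Path, job_id: str) -> dict[str, bool]:
    """Report which preview WAVs exist under <runs_dir>/<job_id>/output/."""
    output_dir = runs_dir / job_id / "output"
    return {name: (output_dir / name).is_file() for name in PREVIEW_FILES}


if __name__ == "__main__":
    # Demo against a throwaway directory laid out like .runs/<job-id>/output/.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "58ca0db4ac74" / "output"
        out.mkdir(parents=True)
        (out / "source.wav").write_bytes(b"")
        print(preview_files(Path(tmp), "58ca0db4ac74"))
```

A run that finished before this commit would report `context_bed.wav` and `target_reconstruction.wav` as missing. A client can use that result to decide whether to offer the Reproduced preview at all.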
app.py
CHANGED

    @@ -48,7 +48,7 @@ WEB_DIR = ROOT / "web"
     RUNS_DIR = ROOT / ".runs"
     RUNS_DIR.mkdir(exist_ok=True)

    -app = FastAPI(title="Drum Sample Extractor", version="
    +app = FastAPI(title="Drum Sample Extractor", version="13.0.0")
     app.add_middleware(
         CORSMiddleware,
         allow_origins=["*"],
docs/API.md
CHANGED

    @@ -134,7 +134,7 @@ Completed jobs contain:
     | `samples` | Representative sample rows with score, duration, first onset, and playback/download URL. |
     | `hits` | Per-detected-hit review rows with onset, duration, label, cluster, representative flag, and playback/download URL. |
     | `overview` | Decimated envelope and clickable onset markers for waveform display. |
    -| `files` | Relative artifact paths. |
    +| `files` | Relative artifact paths. Includes `source`, `stem`, `context_bed`, `target_reconstruction`, `reconstruction`, `midi`, and `archive` when available. |
     | `file_urls` | Direct API URLs for top-level artifacts. |

     ## `GET /api/jobs/{job_id}/events`
    @@ -154,6 +154,18 @@ data: {"id":"58ca0db4ac74","status":"running","stages":[...]}

     The stream closes after `complete` or `error`. Completed historical jobs emit one final `job` event and close.

    +## Top-level artifact meanings
    +
    +| Key | Path | Meaning |
    +|---|---|---|
    +| `source` | `source.wav` | Normalized source mix used for source preview. |
    +| `stem` | `stem.wav` | Target stem being sampled. |
    +| `context_bed` | `context_bed.wav` | Non-target stems/context bed; silent for `stem=all`. |
    +| `target_reconstruction` | `target_reconstruction.wav` | Sample-triggered reconstruction of only the target stem. |
    +| `reconstruction` | `reconstruction.wav` | Full-context reproduced mix: context bed plus target reconstruction. |
    +| `midi` | `reconstruction.mid` | MIDI trigger reconstruction. |
    +| `archive` | `sample-pack.zip` | Complete sample pack and reproduction artifacts. |
    +
     ## `GET /api/jobs/{job_id}/files/{relative_path}`

     Downloads an artifact from a completed job.
    @@ -163,6 +175,8 @@ Examples:
     ```bash
     curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/sample-pack.zip
     curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/reconstruction.mid
    +curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/reconstruction.wav
    +curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/target_reconstruction.wav
     curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/samples/hihat_open_0.wav
     curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/review/hits/hit_00000_kick.wav
     ```
    @@ -313,6 +327,7 @@ Response shape:
     "files": {
       "archive": "supervised/sample-pack.zip",
       "midi": "supervised/reconstruction.mid",
    +  "target_reconstruction": "supervised/target_reconstruction.wav",
       "reconstruction": "supervised/reconstruction.wav"
     },
     "file_urls": {}
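The `docs/API.md` changes pair each top-level artifact key with a relative path, and every path is served through `GET /api/jobs/{job_id}/files/{relative_path}`. The sketch below builds those download URLs in Python. Its key table is copied from the new "Top-level artifact meanings" section. A real client should probably prefer the paths in a job's own `files` field, because the API says those keys appear only "when available". The function names are hypothetical.

```python
from urllib.parse import quote

# Key -> relative path, copied from the "Top-level artifact meanings" table in docs/API.md.
TOP_LEVEL_ARTIFACTS = {
    "source": "source.wav",
    "stem": "stem.wav",
    "context_bed": "context_bed.wav",
    "target_reconstruction": "target_reconstruction.wav",
    "reconstruction": "reconstruction.wav",
    "midi": "reconstruction.mid",
    "archive": "sample-pack.zip",
}


def artifact_url(base_url: str, job_id: str, relative_path: str) -> str:
    """Build a GET /api/jobs/{job_id}/files/{relative_path} download URL."""
    # Keep "/" unescaped so nested paths such as samples/hihat_open_0.wav still work.
    path = quote(relative_path, safe="/")
    return f"{base_url.rstrip('/')}/api/jobs/{quote(job_id, safe='')}/files/{path}"


def top_level_urls(base_url: str, job_id: str) -> dict[str, str]:
    """Map every known top-level artifact key to its download URL."""
    return {key: artifact_url(base_url, job_id, rel) for key, rel in TOP_LEVEL_ARTIFACTS.items()}
```

For example, `artifact_url("http://127.0.0.1:7860", "58ca0db4ac74", "target_reconstruction.wav")` yields the same URL as the new `curl -O` example for `target_reconstruction.wav`.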
docs/FEATURES.md
CHANGED

    @@ -4,7 +4,7 @@ Last updated: 2026-05-12

     ## Product goal

    -Turn an input audio file into a practical drum sample pack: detected hits, grouped sample classes, representative WAVs, optional synthesized alternates, MIDI reconstruction,
    +Turn an input audio file into a practical drum sample pack: detected hits, grouped sample classes, representative WAVs, optional synthesized alternates, MIDI reconstruction, target-stem reconstruction, full-context reproduced audio, and an inspectable manifest.

     ## Implemented features

    @@ -14,8 +14,8 @@ Turn an input audio file into a practical drum sample pack: detected hits, group
     | UI | Explicit upload button | Implemented | Top bar contains a visible `Upload audio` control. |
     | UI | Whole-app drag/drop audio upload | Implemented | Dropping files anywhere on the app selects the file and shows a drop overlay during drag. |
     | UI | Fixed non-scrolling workstation layout | Implemented | Body is viewport-locked; tools live in left/right sidebars and a bottom dock; long content scrolls inside panels only. |
    -| UI | Minimal custom transport | Implemented | One
    -| UI |
    +| UI | Minimal custom transport | Implemented | One play/time/progress row can audition Source, Stem, or Reproduced previews; completed runs default to Reproduced. |
    +| UI | Common vs advanced parameters | Implemented | Common controls show stem, sensitivity, group count, and two presets; advanced model/DSP/export controls are grouped by pipeline stage. |
     | UI | Streaming progress | Implemented | Uses `EventSource` over `GET /api/jobs/{id}/events`, with polling fallback. |
     | UI | Waveform/onset overview | Implemented | Canvas envelope plus clickable onset markers from `manifest.json`. |
     | UI | Result downloads | Implemented | ZIP, MIDI, stem WAV, reconstruction WAV, individual sample WAVs, and per-hit review WAVs. |
    @@ -37,9 +37,10 @@ Turn an input audio file into a practical drum sample pack: detected hits, group
     | Pipeline | Representative selection | Implemented | Quality score picks best hit per cluster. |
     | Pipeline | Optional synthesis | Implemented | Weighted aligned average for multi-hit clusters. |
     | Pipeline | MIDI export | Implemented | Quantized or unquantized reconstruction MIDI. |
    -| Pipeline |
    +| Pipeline | Target reconstruction render | Implemented | Renders the selected sample representatives from MIDI/onset timing and matches RMS to the target stem. |
    +| Pipeline | Full-context reproduced mix | Implemented | Writes `reconstruction.wav` as non-target context bed plus target reconstruction, so separated-stem runs incorporate all other stems. |
     | Pipeline | Per-hit review export | Implemented | Writes every accepted detected hit to `review/hits/*.wav` and records rows in the manifest. |
    -| Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI,
    +| Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI, full-context reproduced mix, and target-stem reconstruction. |
     | Supervision | Edited artifact re-export | Implemented | `supervised_export.py` writes edited samples, MIDI, reconstruction, ZIP, and `supervised/manifest.json`. |
     | Supervision | Force-onset from waveform | Implemented | Adds user-forced hit slices from cached `stem.wav`; UI add-onset mode posts to `/hits/force-onset`. |
     | Supervision | Suppressed-hit restore | Implemented | Restore endpoint and UI button reverse suppression without undoing unrelated edits. |
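The new "Target reconstruction render" feature row describes two steps: placing the chosen representatives at their onset times, then scaling the result so its RMS matches the target stem. The sketch below shows those two steps with plain Python lists. It assumes mono float buffers at a single sample rate, and the function names are hypothetical. The project's actual render code in `pipeline_runner.py`/`sample_extractor.py` is not shown in this diff and may differ, for example in how it derives timing from MIDI.

```python
import math


def rms(signal: list[float]) -> float:
    """Root-mean-square level of a mono signal (0.0 for empty input)."""
    if not signal:
        return 0.0
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def render_target(
    hits: list[tuple[float, list[float]]],
    length: int,
    sample_rate: int,
) -> list[float]:
    """Sum each (onset_seconds, sample) pair into a silent buffer of `length` frames."""
    out = [0.0] * length
    for onset_s, sample in hits:
        start = int(round(onset_s * sample_rate))
        for i, value in enumerate(sample):
            pos = start + i
            if 0 <= pos < length:
                out[pos] += value  # overlapping hits simply add together
    return out


def match_rms(rendered: list[float], reference: list[float]) -> list[float]:
    """Scale `rendered` so its RMS equals the RMS of `reference` (e.g. the target stem)."""
    current = rms(rendered)
    if current == 0.0:
        return list(rendered)  # silent render: nothing to scale
    gain = rms(reference) / current
    return [x * gain for x in rendered]
```

A single global gain preserves the relative dynamics between hits while bringing the sample layer to the stem's overall level. The commit's own remaining-work list notes that LUFS-based matching may suit longer previews better than RMS.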
docs/FIXED_WORKSTATION_UI.md
CHANGED

    @@ -12,8 +12,8 @@ The web app should behave like a compact workstation rather than a long document
     |---|---|---|
     | Top bar | App title, explicit `Upload audio` button, selected-file metadata, backend status, primary `Extract Samples` action | Fixed height; no scroll |
     | Left sidebar | Source/drop guidance, selected-hit/sample context, pipeline stages/logs, run history | Sidebar-internal scroll only |
    -| Center workspace | Large waveform/transport and representative sample cards | Sample grid scrolls internally when needed |
    -| Right sidebar |
    +| Center workspace | Large waveform, Source/Stem/Reproduced transport, and representative sample cards | Sample grid scrolls internally when needed |
    +| Right sidebar | Common extraction controls, exports, and advanced parameters grouped by stage | Sidebar-internal scroll only |
     | Bottom bar | Interactive review/edit tools and raw tables | Panel-internal scroll only |

     The document itself is locked with `overflow: hidden`; long content is constrained to the relevant tool panel.
    @@ -45,3 +45,11 @@ Static checks added/performed for this pass:
     - Duplicate id check for `web/index.html`
     - Python compile check for active Python files
     - FastAPI extraction smoke test
    +
    +
    +## Pass 9 additions
    +
    +- The center transport now exposes explicit `Source`, `Stem`, and `Reproduced` preview modes.
    +- The right sidebar now separates the normal workflow from advanced tuning.
    +- Common controls are limited to stem choice, hit sensitivity, sample group count, and two presets.
    +- Advanced parameters are grouped into stem separation, hit detection, grouping, export/cache sections.
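The Pass 9 transport has three preview modes, and completed runs open on Reproduced. The sketch below maps each mode to a top-level artifact key from `docs/API.md` and picks the default. The real transport lives in `web/app.js`; Python is used here only for consistency with the other examples. The fallback to Source, for runs that are not yet complete or that lack a reproduced mix, is an assumption, and all names are hypothetical.

```python
from enum import Enum
from typing import Optional


class PreviewMode(Enum):
    SOURCE = "Source"
    STEM = "Stem"
    REPRODUCED = "Reproduced"


# Mode -> top-level artifact key, following the key table added to docs/API.md.
MODE_TO_ARTIFACT = {
    PreviewMode.SOURCE: "source",
    PreviewMode.STEM: "stem",
    PreviewMode.REPRODUCED: "reconstruction",
}


def default_mode(completed: bool, files: dict[str, str]) -> PreviewMode:
    """Open completed runs on Reproduced; otherwise (assumed) fall back to Source."""
    if completed and "reconstruction" in files:
        return PreviewMode.REPRODUCED
    return PreviewMode.SOURCE


def preview_path(mode: PreviewMode, files: dict[str, str]) -> Optional[str]:
    """Relative artifact path for a mode, or None if the run lacks that layer."""
    return files.get(MODE_TO_ARTIFACT[mode])
```

Mapping Reproduced to the `reconstruction` key, rather than `target_reconstruction`, follows the commit's change that makes `reconstruction.wav` the full-context mix.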
docs/PROGRESS.md
CHANGED

    @@ -259,3 +259,27 @@ Completed in this pass:
     Outcome:

     The UI no longer behaves like a scrollable webpage. It now behaves like a compact desktop-style sample extraction workstation with simple expandable tool panels around a central waveform/sample workspace.
    +
    +## Pass 9: full-context reproduced audio and clearer parameters
    +
    +Completed in this pass:
    +
    +1. Added explicit source/context/reconstruction layers to the pipeline export:
    +   - `source.wav`
    +   - `stem.wav`
    +   - `context_bed.wav`
    +   - `target_reconstruction.wav`
    +   - `reconstruction.wav`
    +2. Changed `reconstruction.wav` to a full-context reproduced mix: non-target context bed plus sample-triggered target reconstruction.
    +3. Kept `target_reconstruction.wav` as the isolated sample-only target layer for debugging and focused listening.
    +4. Matched the target reconstruction RMS to the target stem before mixing it back into context.
    +5. Updated `sample-pack.zip` to include both the full-context reproduced mix and target-stem reconstruction.
    +6. Updated supervised edited export so edited packs follow the same audio-layer model under `supervised/`.
    +7. Added Source / Stem / Reproduced preview modes to the single transport row; completed jobs default to Reproduced.
    +8. Reworked the right sidebar into Common controls vs Advanced parameters.
    +9. Grouped advanced parameters by pipeline stage: stem separation, hit detection, grouping, export/cache.
    +10. Added `docs/REPRODUCED_AUDIO_AND_PARAMETERS.md`.
    +
    +Outcome:
    +
    +The app is easier to understand for normal use: the main right-side controls are now only the few controls users are likely to touch repeatedly, while lower-level DSP/model controls stay available but grouped by stage. Reproduced audio is now useful for musical judgment because separated-stem runs are previewed inside the rest of the mix rather than as an isolated sample-triggered stem only.
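Items 1 to 4 of Pass 9 describe a layer model. `context_bed.wav` is the source minus the target stem, or silence for `stem=all`, and `reconstruction.wav` is that bed plus the RMS-matched target reconstruction. `docs/REPRODUCED_AUDIO_AND_PARAMETERS.md` adds that the result is soft-limited. The sketch below expresses that model with plain Python lists, assuming aligned mono buffers at one sample rate. The function names are hypothetical. The `tanh` limiter is an assumed choice, because the commit does not say which soft limiter it uses.

```python
import math


def context_bed(source: list[float], stem: list[float], stem_is_all: bool) -> list[float]:
    """Non-target layer: source minus target stem, or silence when stem=all."""
    if stem_is_all:
        return [0.0] * len(source)
    n = min(len(source), len(stem))  # trim to the shorter buffer
    return [source[i] - stem[i] for i in range(n)]


def soft_limit(x: float, ceiling: float = 1.0) -> float:
    """tanh soft clipper; an assumed stand-in for the commit's unspecified soft limiter."""
    return ceiling * math.tanh(x / ceiling)


def full_context_mix(bed: list[float], target: list[float]) -> list[float]:
    """reconstruction = context_bed + target_reconstruction, then soft-limited."""
    n = max(len(bed), len(target))
    bed = bed + [0.0] * (n - len(bed))  # zero-pad the shorter layer
    target = target + [0.0] * (n - len(target))
    return [soft_limit(b + t) for b, t in zip(bed, target)]
```

Because the bed is computed by subtraction, any separation error in the stem also ends up in the bed. The new doc's remaining-work list addresses this by proposing cached explicit Demucs non-target stem sums.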
docs/REMAINING_WORK.md
CHANGED

    @@ -8,12 +8,15 @@ The project is now a usable extraction workstation, not a complete interactive s

     ## Highest-priority remaining gaps

    +Completed since the previous snapshot: reproduced audio now incorporates non-target context stems, and common/advanced parameters are separated in the right sidebar.
    +
     1. **Cluster editing**: allow merge, split, relabel, and manual reassignment of groups from the `Review & edit` workbench.
     2. **Waveform editing depth**: add onset drag/shift, hit trim boundaries, and rerun-from-edited-onsets without redoing Demucs.
     3. **Run comparison**: compare two manifests side-by-side for parameter tuning.
     4. **Lower-level progress**: expose internal Demucs/clustering progress where libraries make that possible.
     5. **Frontend engineering hardening**: migrate the frontend to TypeScript after the UX stabilizes and add browser-level tests.
     6. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
    +7. **Reproduction diagnostics**: add source-vs-reproduced A/B error visualization and region ranking.

     ## Known constraints

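Remaining-work item 7, reproduction diagnostics, is still open. The sketch below shows one simple way to rank regions. It makes several assumptions: `source.wav` and `reconstruction.wav` are decoded into aligned mono float lists at the same sample rate, the audio is cut into fixed-length windows, and the error metric is the RMS of the sample-wise difference. This is not the project's planned implementation, and a spectral or perceptual metric would likely rank regions differently. The function name is hypothetical.

```python
import math


def rank_mismatch_regions(
    source: list[float],
    reproduced: list[float],
    sample_rate: int,
    window_s: float = 0.5,
    top_k: int = 3,
) -> list[tuple[float, float, float]]:
    """Return (start_s, end_s, error_rms) for the worst-matching windows, worst first."""
    n = min(len(source), len(reproduced))
    win = max(1, int(window_s * sample_rate))
    regions = []
    for start in range(0, n, win):
        end = min(start + win, n)
        diff_sq = sum((source[i] - reproduced[i]) ** 2 for i in range(start, end))
        err = math.sqrt(diff_sq / (end - start))  # RMS of the difference signal
        regions.append((start / sample_rate, end / sample_rate, err))
    regions.sort(key=lambda r: r[2], reverse=True)
    return regions[:top_k]
```

When the context bed is computed as source minus stem, the source-minus-reproduced difference reduces to roughly the stem minus the target reconstruction, before soft limiting. This ranking therefore mostly measures how well the sampled layer matches.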
docs/REPRODUCED_AUDIO_AND_PARAMETERS.md
ADDED

    @@ -0,0 +1,82 @@
    +# Reproduced audio and parameter hierarchy
    +
    +Last updated: 2026-05-12
    +
    +## Goal
    +
    +The preview called “reproduced audio” should be understandable and useful for judging whether the extracted samples reproduce the source. It should not only play the extracted target stem in isolation when the user extracted a separated stem.
    +
    +## Implemented audio model
    +
    +The pipeline now writes three different preview layers per run:
    +
    +| File | Meaning | UI label |
    +|---|---|---|
    +| `source.wav` | The normalized source mix used as preview context. | Source mix WAV |
    +| `stem.wav` | The target audio being sampled, for example the separated drums stem. | Target stem WAV |
    +| `context_bed.wav` | The non-target context bed: source mix minus target stem. For `stem=all`, this is silence. | Non-target stems WAV |
    +| `target_reconstruction.wav` | The selected sample representatives rendered from MIDI/onset timing, matched to the target stem’s RMS. | Target reconstruction WAV |
    +| `reconstruction.wav` | Full-context reproduced mix: `context_bed.wav + target_reconstruction.wav`, soft-limited. | Reproduced mix WAV |
    +
    +For separated-stem runs, `reconstruction.wav` therefore incorporates the other stems instead of only replaying the drum/sample layer. This makes A/B listening more useful: the user hears the extracted sample layer inside the rest of the track.
    +
    +## Archive behavior
    +
    +`sample-pack.zip` now includes:
    +
    +- `rendered_reproduction_full_mix.wav` — the full-context reproduced mix.
    +- `rendered_reconstruction_target_stem.wav` — the sample-only target reconstruction.
    +- `rendered_reconstruction.wav` — backwards-compatible alias for the full-context reproduced mix.
    +- `reconstruction.mid` — the MIDI trigger reconstruction.
    +- `samples/*.wav` — representative samples.
    +- `index.json` — sample metadata and reproduction file references.
    +
    +## Supervised edited export behavior
    +
    +Edited exports under `supervised/` follow the same model:
    +
    +- `supervised/target_reconstruction.wav`
    +- `supervised/reconstruction.wav`
    +- `supervised/sample-pack.zip`
    +
    +The edited full-context reproduction reuses `context_bed.wav`, so semantic edits affect the sampled target layer while the non-target stems remain available for context.
    +
    +## UI parameter hierarchy
    +
    +The right sidebar now separates controls by frequency of use.
    +
    +### Common controls
    +
    +These are visible by default:
    +
    +1. **Stem to sample** — choose the target source, usually `drums` or `all`.
    +2. **Hit sensitivity** — tune onset density.
    +3. **Sample groups** — choose the approximate maximum number of sample cards.
    +4. **Fast preview** preset — full mix + online clustering + no Demucs shifts.
    +5. **Best quality** preset — separated drums + batch clustering + conservative thresholds.
    +
    +### Advanced parameters
    +
    +Advanced controls are hidden in a collapsed panel and grouped by stage:
    +
    +- Stem separation: model, shifts, overlap.
    +- Hit detection: onset mode, energy floor, gap, padding, min/max duration.
    +- Grouping: clustering mode, minimum groups, transient/mel/linkage thresholds.
    +- Export and cache: MIDI grid, synthesis, quantization, disk cache, cache clear.
    +
    +## Preview transport
    +
    +The single transport row now has explicit preview modes:
    +
    +- **Source** — original source mix.
    +- **Stem** — target stem being sampled.
    +- **Reproduced** — full-context reproduced mix after extraction.
    +
    +After a successful extraction, the transport switches to **Reproduced** by default because that is the most relevant quality check.
    +
    +## Remaining audio-quality work
    +
    +- Add a true A/B difference view between `source.wav` and `reconstruction.wav`.
    +- Compute per-region reconstruction error and expose the worst mismatch regions.
    +- Avoid residual subtraction artifacts by optionally caching explicit Demucs non-target stem sums when the full source separation bundle is available.
    +- Add loudness matching using LUFS instead of RMS for longer previews.
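The "Archive behavior" section lists what `sample-pack.zip` should now contain. The sketch below uses the standard-library `zipfile` module to report which of those entries are missing from a pack. It assumes the named files sit at the archive root and that representative samples live under `samples/`, as the list suggests; the real layout is not shown in this diff. The helper name is hypothetical.

```python
import io
import zipfile

# Entries listed under "Archive behavior"; samples/*.wav is checked as a pattern below.
EXPECTED_MEMBERS = (
    "rendered_reproduction_full_mix.wav",
    "rendered_reconstruction_target_stem.wav",
    "rendered_reconstruction.wav",  # backwards-compatible alias
    "reconstruction.mid",
    "index.json",
)


def missing_members(archive: zipfile.ZipFile) -> list[str]:
    """List expected sample-pack entries that are absent from the archive."""
    names = set(archive.namelist())
    missing = [m for m in EXPECTED_MEMBERS if m not in names]
    if not any(n.startswith("samples/") and n.endswith(".wav") for n in names):
        missing.append("samples/*.wav")
    return missing


if __name__ == "__main__":
    # Build a tiny in-memory pack to demonstrate the check.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("index.json", "{}")
    with zipfile.ZipFile(buf) as zf:
        print(missing_members(zf))
```

Against a real download, use `with zipfile.ZipFile("sample-pack.zip") as zf: print(missing_members(zf))`. Checking for the alias as well as the new full-mix name confirms that older consumers of `rendered_reconstruction.wav` still find it.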
docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md
CHANGED
|
@@ -11,6 +11,7 @@ manifest.json + review/hits/*.wav + supervision_state.json
|
|
| 11 |
→ supervised/manifest.json
|
| 12 |
→ supervised/samples/*.wav
|
| 13 |
→ supervised/reconstruction.mid
|
|
|
|
| 14 |
→ supervised/reconstruction.wav
|
| 15 |
→ supervised/sample-pack.zip
|
| 16 |
```
|
|
@@ -27,7 +28,7 @@ manifest.json + review/hits/*.wav + supervision_state.json
|
|
| 27 |
| Suppressed-hit restore | Implemented | `POST /api/jobs/{job_id}/hits/{hit_id}/restore` |
|
| 28 |
| Exact suggestion diff preview | Implemented | `suggestion.diff` in state responses and UI diff button. |
|
| 29 |
| UI add-onset mode | Implemented | Toggle in supervision header; waveform clicks add forced hits. |
|
| 30 |
-
| UI edited export downloads | Implemented | Edited ZIP/MIDI/reconstruction links render after export. |
|
| 31 |
|
| 32 |
## Export behavior
|
| 33 |
|
|
@@ -39,7 +40,7 @@ The supervised export builds clusters from current semantic state:
|
|
| 39 |
4. Preserve forced hits and moved/pulled hits through current cluster membership.
|
| 40 |
5. Pick representatives from semantic `representative_hit_id` or favorite hits first.
|
| 41 |
6. Quality-score representatives only for unpinned clusters.
|
| 42 |
-
7. Write edited samples, MIDI, reconstruction WAV, ZIP, and `supervised/manifest.json`.
|
| 43 |
8. Append a `supervised.exported` event and `latest_export` entry to `supervision_state.json`.
|
| 44 |
|
| 45 |
The original `manifest.json`, original `sample-pack.zip`, and original `samples/*.wav` are not modified.
|
|
@@ -108,3 +109,14 @@ This test verifies:
|
|
| 108 |
- supervised export creation,
|
| 109 |
- artifact download URLs for edited ZIP/MIDI/reconstruction,
|
| 110 |
- latest export state metadata.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
→ supervised/manifest.json
|
| 12 |
→ supervised/samples/*.wav
|
| 13 |
→ supervised/reconstruction.mid
|
| 14 |
+
→ supervised/target_reconstruction.wav
|
| 15 |
→ supervised/reconstruction.wav
|
| 16 |
→ supervised/sample-pack.zip
|
| 17 |
```
|
|
|
|
| 28 |
| Suppressed-hit restore | Implemented | `POST /api/jobs/{job_id}/hits/{hit_id}/restore` |
|
| 29 |
| Exact suggestion diff preview | Implemented | `suggestion.diff` in state responses and UI diff button. |
|
| 30 |
| UI add-onset mode | Implemented | Toggle in supervision header; waveform clicks add forced hits. |
|
| 31 |
+
| UI edited export downloads | Implemented | Edited ZIP/MIDI/target-reconstruction/full-context-reproduction links render after export. |
|
| 32 |
|
| 33 |
## Export behavior
|
| 34 |
|
|
|
|
| 40 |
4. Preserve forced hits and moved/pulled hits through current cluster membership.
|
| 41 |
5. Pick representatives from semantic `representative_hit_id` or favorite hits first.
|
| 42 |
6. Quality-score representatives only for unpinned clusters.
|
| 43 |
+
7. Write edited samples, MIDI, target reconstruction WAV, full-context reproduction WAV, ZIP, and `supervised/manifest.json`.
|
| 44 |
8. Append a `supervised.exported` event and `latest_export` entry to `supervision_state.json`.
|
| 45 |
|
| 46 |
The original `manifest.json`, original `sample-pack.zip`, and original `samples/*.wav` are not modified.
|
|
|
|
| 109 |
- supervised export creation,
|
| 110 |
- artifact download URLs for edited ZIP/MIDI/reconstruction,
|
| 111 |
- latest export state metadata.
|
| 112 |
+
|
| 113 |
+
|
| 114 |
+
## Reproduced-audio update
|
| 115 |
+
|
| 116 |
+
Supervised export now mirrors the batch export audio model:
|
| 117 |
+
|
| 118 |
+
- `supervised/target_reconstruction.wav` is the edited sample-triggered target layer.
|
| 119 |
+
- `supervised/reconstruction.wav` is the edited full-context reproduced mix.
|
| 120 |
+
- The full-context mix reuses the original run’s `context_bed.wav`, then adds the edited target reconstruction.
|
| 121 |
+
|
| 122 |
+
This keeps the original batch artifacts immutable while making edited exports useful for listening inside the whole track context.
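A sketch of this audio model, using plain Python lists of mono float samples so it runs without dependencies. It assumes the context bed is the residual `source - target_stem` (as the Pass 9 task list describes) and that the full-context mix is a sample-wise sum. `context_bed` and `full_context_mix` are hypothetical names; a real implementation would work on NumPy arrays and handle channels, sample-rate alignment, and clipping.

```python
from itertools import zip_longest
from typing import List, Sequence


def context_bed(source: Sequence[float], target_stem: Sequence[float], stem: str) -> List[float]:
    """Non-target layer: source minus the target stem (silent when stem == "all")."""
    if stem == "all":
        # The whole mix is the target, so nothing is left over for the bed.
        return [0.0] * len(source)
    # zip_longest zero-pads whichever layer is shorter.
    return [s - t for s, t in zip_longest(source, target_stem, fillvalue=0.0)]


def full_context_mix(bed: Sequence[float], target_reconstruction: Sequence[float]) -> List[float]:
    """Reproduced mix: context bed plus the (edited) sample-triggered target layer."""
    return [b + r for b, r in zip_longest(bed, target_reconstruction, fillvalue=0.0)]
```

Because the bed is a residual, passing the isolated stem itself in place of the target reconstruction returns the source up to float rounding. Under these assumptions, any difference between the reproduced mix and the source comes only from how the sample-triggered layer differs from the original stem.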
|
docs/TASKS.md
CHANGED
|
@@ -72,7 +72,7 @@ Last updated: 2026-05-12
|
|
| 72 |
| Add pull hit into new cluster | Done | `POST /api/jobs/{job_id}/hits/{hit_id}/pull-out`. |
|
| 73 |
| Add cluster lock/unlock | Done | `POST /api/jobs/{job_id}/clusters/{cluster_id}/lock`. |
|
| 74 |
| Add suppress hit as bleed/noise | Done | `POST /api/jobs/{job_id}/hits/{hit_id}/suppress`. |
|
| 75 |
-
| Add accept/favorite hit action | Done
|
| 76 |
| Add suggestion inbox | Done/Partial | UI/API supports accept/reject; exact diff preview still open. |
|
| 77 |
| Add cluster explanation drawer | Done | `GET /api/jobs/{job_id}/explain/cluster/{cluster_id}` plus UI drawer. |
|
| 78 |
| Add semantic undo | Done | `POST /api/jobs/{job_id}/undo`. |
|
|
@@ -106,3 +106,24 @@ Last updated: 2026-05-12
|
|
| 106 |
| Add explicit upload button | Done | Top bar now has a visible `Upload audio` control. |
|
| 107 |
| Make whole-app file dropping work | Done | Window-level drag/drop handlers select dropped files and prevent browser navigation. |
|
| 108 |
| Add drag overlay | Done | `globalDropOverlay` appears while dragging files over the app. |
|
|
| 72 |
| Add pull hit into new cluster | Done | `POST /api/jobs/{job_id}/hits/{hit_id}/pull-out`. |
|
| 73 |
| Add cluster lock/unlock | Done | `POST /api/jobs/{job_id}/clusters/{cluster_id}/lock`. |
|
| 74 |
| Add suppress hit as bleed/noise | Done | `POST /api/jobs/{job_id}/hits/{hit_id}/suppress`. |
|
| 75 |
+
| Add accept/favorite hit action | Done | `POST /api/jobs/{job_id}/hits/{hit_id}/review`; supervised re-export honors pinned/favorite representatives. |
|
| 76 |
| Add suggestion inbox | Done/Partial | UI/API supports accept/reject; exact diff preview still open. |
|
| 77 |
| Add cluster explanation drawer | Done | `GET /api/jobs/{job_id}/explain/cluster/{cluster_id}` plus UI drawer. |
|
| 78 |
| Add semantic undo | Done | `POST /api/jobs/{job_id}/undo`. |
|
|
|
|
| 106 |
| Add explicit upload button | Done | Top bar now has a visible `Upload audio` control. |
|
| 107 |
| Make whole-app file dropping work | Done | Window-level drag/drop handlers select dropped files and prevent browser navigation. |
|
| 108 |
| Add drag overlay | Done | `globalDropOverlay` appears while dragging files over the app. |
|
| 109 |
+
|
| 110 |
+
## Pass 9 tasks: reproduced audio and parameter hierarchy
|
| 111 |
+
|
| 112 |
+
| Task | Status | Notes |
|
| 113 |
+
|---|---:|---|
|
| 114 |
+
| Export normalized source mix | Done | `source.wav` written per run. |
|
| 115 |
+
| Export non-target context bed | Done | `context_bed.wav` is source minus target stem; silent for `stem=all`. |
|
| 116 |
+
| Keep isolated target reconstruction | Done | `target_reconstruction.wav` written per run and per supervised export. |
|
| 117 |
+
| Make reproduced audio incorporate context | Done | `reconstruction.wav` is context bed plus target reconstruction. |
|
| 118 |
+
| Add full-context reproduction to ZIP | Done | `rendered_reproduction_full_mix.wav` plus compatibility alias. |
|
| 119 |
+
| Add target-stem reconstruction to ZIP | Done | `rendered_reconstruction_target_stem.wav`. |
|
| 120 |
+
| Update supervised export audio model | Done | Edited export writes full-context and target-only previews. |
|
| 121 |
+
| Add Source/Stem/Reproduced transport modes | Done | Transport buttons added in `web/index.html` and wired in `web/app.js`. |
|
| 122 |
+
| Separate common controls from advanced parameters | Done | Common controls: stem, sensitivity, sample groups, presets. |
|
| 123 |
+
| Group advanced parameters by pipeline stage | Done | Stem separation, hit detection, grouping, export/cache. |
|
| 124 |
+
|
| 125 |
+
Remaining follow-up tasks:
|
| 126 |
+
|
| 127 |
+
- [ ] Add source-vs-reproduced waveform/error comparison.
|
| 128 |
+
- [ ] Add LUFS loudness matching for long previews.
|
| 129 |
+
- [ ] Optionally cache explicit Demucs non-target stem sums instead of residual subtraction.
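The Pass 9 table names two audio entries that now ship in the ZIP. The following standard-library check works on a downloaded pack. It matches on base names because the table gives file names rather than archive paths, and it ignores the compatibility alias, whose name is not listed here. `missing_pass9_members` is a hypothetical helper.

```python
import io
import zipfile
from typing import IO, List, Union

PASS9_MEMBERS = (
    "rendered_reproduction_full_mix.wav",
    "rendered_reconstruction_target_stem.wav",
)


def missing_pass9_members(zip_source: Union[str, IO[bytes]]) -> List[str]:
    """Return expected Pass 9 audio entries that are absent from a sample-pack ZIP.

    Matching is on base name, so entries pass whether they sit at the archive
    root or inside a folder.
    """
    with zipfile.ZipFile(zip_source) as zf:
        basenames = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return [m for m in PASS9_MEMBERS if m not in basenames]


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("rendered_reproduction_full_mix.wav", b"")
    print(missing_pass9_members(buf))  # ['rendered_reconstruction_target_stem.wav']
```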
|
docs/UI_REPLACEMENT.md
CHANGED
|
@@ -33,8 +33,8 @@ The UI was first restyled to the supplied minimal reference direction:
|
|
| 33 |
This pass closed the visual fidelity gaps from the previous approximation:
|
| 34 |
|
| 35 |
- removed the visible waveform header so the canvas is quiet like the reference image;
|
| 36 |
-
- replaced separate native stem/reconstruction audio controls with one minimal transport row: play button, time,
|
| 37 |
-
-
|
| 38 |
- collapsed pipeline/history/supervision/tables into one `Review & edit` workbench below the sample cards;
|
| 39 |
- hid selected-hit/sample audio elements from the default layout while preserving click-to-audition behavior;
|
| 40 |
- tightened card spacing, border radii, font scale, waveform height, and sample-card proportions to better match the supplied image.
|
|
@@ -61,8 +61,8 @@ This pass responds to the no-scroll workstation requirement and the missing-uplo
|
|
| 61 |
|---|---|
|
| 62 |
| Top bar | App identity, explicit upload button, selected-file metadata, backend status, and one primary purple `Extract Samples` action. |
|
| 63 |
| Left sidebar | Source/drop guidance, selected-hit/sample context, pipeline logs, and run history. |
|
| 64 |
-
| Center workspace | Quiet waveform canvas,
|
| 65 |
-
| Right sidebar |
|
| 66 |
| Bottom dock | Review/edit semantic supervision tools and raw tables in expandable panels. |
|
| 67 |
|
| 68 |
## Frontend implementation
|
|
@@ -89,8 +89,11 @@ The frontend creates a job with `POST /api/jobs`, then polls `GET /api/jobs/{id}
|
|
| 89 |
|
| 90 |
- sample pack ZIP
|
| 91 |
- MIDI reconstruction
|
| 92 |
-
-
|
| 93 |
-
-
|
| 94 |
- individual sample WAVs
|
| 95 |
|
| 96 |
The run history panel calls `GET /api/jobs` and can reload any completed manifest still present under `.runs/`.
|
|
@@ -128,3 +131,17 @@ The current UI now includes:
|
|
| 128 |
- Server-sent-events job progress via `GET /api/jobs/{job_id}/events`, with polling fallback.
|
| 129 |
|
| 130 |
This still stops short of destructive editing. The next UI layer should store edits as manifest overlays, then call a re-export endpoint that reuses cached hit audio instead of rerunning Demucs/onset detection.
|
| 33 |
This pass closed the visual fidelity gaps from the previous approximation:
|
| 34 |
|
| 35 |
- removed the visible waveform header so the canvas is quiet like the reference image;
|
| 36 |
+
- replaced separate native stem/reconstruction audio controls with one minimal transport row: play button, time, progress line, and Source/Stem/Reproduced preview modes;
|
| 37 |
+
- renamed the right card to `Common controls` and limited it to stem, hit sensitivity, sample groups, plus fast-preview/best-quality presets;
|
| 38 |
- collapsed pipeline/history/supervision/tables into one `Review & edit` workbench below the sample cards;
|
| 39 |
- hid selected-hit/sample audio elements from the default layout while preserving click-to-audition behavior;
|
| 40 |
- tightened card spacing, border radii, font scale, waveform height, and sample-card proportions to better match the supplied image.
|
|
|
|
| 61 |
|---|---|
|
| 62 |
| Top bar | App identity, explicit upload button, selected-file metadata, backend status, and one primary purple `Extract Samples` action. |
|
| 63 |
| Left sidebar | Source/drop guidance, selected-hit/sample context, pipeline logs, and run history. |
|
| 64 |
+
| Center workspace | Quiet waveform canvas, Source/Stem/Reproduced transport row, and representative sample cards. |
|
| 65 |
+
| Right sidebar | Common controls, exports, and collapsed advanced parameters grouped by stem separation, hit detection, grouping, export, and cache. |
|
| 66 |
| Bottom dock | Review/edit semantic supervision tools and raw tables in expandable panels. |
|
| 67 |
|
| 68 |
## Frontend implementation
|
|
|
|
| 89 |
|
| 90 |
- sample pack ZIP
|
| 91 |
- MIDI reconstruction
|
| 92 |
+
- source mix WAV
|
| 93 |
+
- target stem WAV
|
| 94 |
+
- non-target context bed WAV
|
| 95 |
+
- target reconstruction WAV
|
| 96 |
+
- full-context reproduced mix WAV
|
| 97 |
- individual sample WAVs
|
| 98 |
|
| 99 |
The run history panel calls `GET /api/jobs` and can reload any completed manifest still present under `.runs/`.
|
|
|
|
| 131 |
- Server-sent-events job progress via `GET /api/jobs/{job_id}/events`, with polling fallback.
|
| 132 |
|
| 133 |
This still stops short of destructive editing. The next UI layer should store edits as manifest overlays, then call a re-export endpoint that reuses cached hit audio instead of rerunning Demucs/onset detection.
|
| 134 |
+
|
| 135 |
+
|
| 136 |
+
## Pass 9 reproduced audio and parameter hierarchy
|
| 137 |
+
|
| 138 |
+
This pass made the audio preview and control model more explicit:
|
| 139 |
+
|
| 140 |
+
- `reconstruction.wav` is now a full-context reproduced mix, not just the sample-triggered target layer.
|
| 141 |
+
- `target_reconstruction.wav` preserves the sample-only target reconstruction for focused inspection.
|
| 142 |
+
- `source.wav`, `stem.wav`, and `context_bed.wav` are exported as explicit layers.
|
| 143 |
+
- The transport has Source, Stem, and Reproduced preview buttons and switches to Reproduced after extraction.
|
| 144 |
+
- The right sidebar now separates Common controls from Advanced parameters.
|
| 145 |
+
- Advanced parameters are grouped by pipeline stage: stem separation, hit detection, grouping, export/cache.
|
| 146 |
+
|
| 147 |
+
See `docs/REPRODUCED_AUDIO_AND_PARAMETERS.md`.
|
docs/benchmark-online-preview.json
CHANGED
|
@@ -8,66 +8,66 @@
|
|
| 8 |
"run_index": 0,
|
| 9 |
"clustering_mode": "online_preview",
|
| 10 |
"audio_duration_sec": 4.75,
|
| 11 |
-
"total_duration_sec":
|
| 12 |
-
"realtime_factor": 0.
|
| 13 |
-
"hit_count":
|
| 14 |
-
"cluster_count":
|
| 15 |
"stages": [
|
| 16 |
{
|
| 17 |
"key": "stem",
|
| 18 |
"label": "Stem extraction / source load",
|
| 19 |
-
"duration_sec": 0.
|
| 20 |
"status": "done",
|
| 21 |
-
"detail": "loaded full mix \u00b7 cached"
|
| 22 |
},
|
| 23 |
{
|
| 24 |
"key": "bpm",
|
| 25 |
"label": "Tempo detection",
|
| 26 |
-
"duration_sec": 0.
|
| 27 |
"status": "done",
|
| 28 |
"detail": "120.2 BPM"
|
| 29 |
},
|
| 30 |
{
|
| 31 |
"key": "onsets",
|
| 32 |
"label": "Onset detection + slicing",
|
| 33 |
-
"duration_sec":
|
| 34 |
"status": "done",
|
| 35 |
-
"detail": "
|
| 36 |
},
|
| 37 |
{
|
| 38 |
"key": "classification",
|
| 39 |
"label": "Spectral rule classification",
|
| 40 |
-
"duration_sec": 0.
|
| 41 |
"status": "done",
|
| 42 |
-
"detail": "bright:
|
| 43 |
},
|
| 44 |
{
|
| 45 |
"key": "clustering",
|
| 46 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 47 |
-
"duration_sec": 0.
|
| 48 |
"status": "done",
|
| 49 |
-
"detail": "
|
| 50 |
},
|
| 51 |
{
|
| 52 |
"key": "selection",
|
| 53 |
"label": "Best representative scoring",
|
| 54 |
-
"duration_sec": 0.
|
| 55 |
"status": "done",
|
| 56 |
"detail": "quality-scored representatives"
|
| 57 |
},
|
| 58 |
{
|
| 59 |
"key": "synthesis",
|
| 60 |
"label": "Optional sample synthesis",
|
| 61 |
-
"duration_sec": 0.
|
| 62 |
"status": "done",
|
| 63 |
"detail": "2 synthesized alternates"
|
| 64 |
},
|
| 65 |
{
|
| 66 |
"key": "export",
|
| 67 |
-
"label": "MIDI,
|
| 68 |
-
"duration_sec": 0.
|
| 69 |
"status": "done",
|
| 70 |
-
"detail": "
|
| 71 |
}
|
| 72 |
]
|
| 73 |
},
|
|
@@ -78,66 +78,66 @@
|
|
| 78 |
"run_index": 0,
|
| 79 |
"clustering_mode": "online_preview",
|
| 80 |
"audio_duration_sec": 4.874989,
|
| 81 |
-
"total_duration_sec":
|
| 82 |
-
"realtime_factor": 0.
|
| 83 |
-
"hit_count":
|
| 84 |
"cluster_count": 12,
|
| 85 |
"stages": [
|
| 86 |
{
|
| 87 |
"key": "stem",
|
| 88 |
"label": "Stem extraction / source load",
|
| 89 |
-
"duration_sec": 0.
|
| 90 |
"status": "done",
|
| 91 |
-
"detail": "loaded full mix \u00b7 cached"
|
| 92 |
},
|
| 93 |
{
|
| 94 |
"key": "bpm",
|
| 95 |
"label": "Tempo detection",
|
| 96 |
-
"duration_sec": 0.
|
| 97 |
"status": "done",
|
| 98 |
"detail": "161.5 BPM"
|
| 99 |
},
|
| 100 |
{
|
| 101 |
"key": "onsets",
|
| 102 |
"label": "Onset detection + slicing",
|
| 103 |
-
"duration_sec": 2.
|
| 104 |
"status": "done",
|
| 105 |
-
"detail": "
|
| 106 |
},
|
| 107 |
{
|
| 108 |
"key": "classification",
|
| 109 |
"label": "Spectral rule classification",
|
| 110 |
-
"duration_sec": 0.
|
| 111 |
"status": "done",
|
| 112 |
-
"detail": "bright:12, cymbal:1, hihat_closed:9, hihat_open:3, mid:
|
| 113 |
},
|
| 114 |
{
|
| 115 |
"key": "clustering",
|
| 116 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 117 |
-
"duration_sec": 0.
|
| 118 |
"status": "done",
|
| 119 |
"detail": "12 clusters \u00b7 online preview"
|
| 120 |
},
|
| 121 |
{
|
| 122 |
"key": "selection",
|
| 123 |
"label": "Best representative scoring",
|
| 124 |
-
"duration_sec": 0.
|
| 125 |
"status": "done",
|
| 126 |
"detail": "quality-scored representatives"
|
| 127 |
},
|
| 128 |
{
|
| 129 |
"key": "synthesis",
|
| 130 |
"label": "Optional sample synthesis",
|
| 131 |
-
"duration_sec": 0.
|
| 132 |
"status": "done",
|
| 133 |
"detail": "5 synthesized alternates"
|
| 134 |
},
|
| 135 |
{
|
| 136 |
"key": "export",
|
| 137 |
-
"label": "MIDI,
|
| 138 |
-
"duration_sec": 0.
|
| 139 |
"status": "done",
|
| 140 |
-
"detail": "12 samples +
|
| 141 |
}
|
| 142 |
]
|
| 143 |
},
|
|
@@ -148,66 +148,66 @@
|
|
| 148 |
"run_index": 0,
|
| 149 |
"clustering_mode": "online_preview",
|
| 150 |
"audio_duration_sec": 4.874989,
|
| 151 |
-
"total_duration_sec": 2.
|
| 152 |
-
"realtime_factor": 0.
|
| 153 |
"hit_count": 29,
|
| 154 |
"cluster_count": 12,
|
| 155 |
"stages": [
|
| 156 |
{
|
| 157 |
"key": "stem",
|
| 158 |
"label": "Stem extraction / source load",
|
| 159 |
-
"duration_sec": 0.
|
| 160 |
"status": "done",
|
| 161 |
-
"detail": "loaded full mix \u00b7 cached"
|
| 162 |
},
|
| 163 |
{
|
| 164 |
"key": "bpm",
|
| 165 |
"label": "Tempo detection",
|
| 166 |
-
"duration_sec": 0.
|
| 167 |
"status": "done",
|
| 168 |
"detail": "120.2 BPM"
|
| 169 |
},
|
| 170 |
{
|
| 171 |
"key": "onsets",
|
| 172 |
"label": "Onset detection + slicing",
|
| 173 |
-
"duration_sec": 1.
|
| 174 |
"status": "done",
|
| 175 |
"detail": "29 hits"
|
| 176 |
},
|
| 177 |
{
|
| 178 |
"key": "classification",
|
| 179 |
"label": "Spectral rule classification",
|
| 180 |
-
"duration_sec": 0.
|
| 181 |
"status": "done",
|
| 182 |
-
"detail": "bright:
|
| 183 |
},
|
| 184 |
{
|
| 185 |
"key": "clustering",
|
| 186 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 187 |
-
"duration_sec": 0.
|
| 188 |
"status": "done",
|
| 189 |
"detail": "12 clusters \u00b7 online preview"
|
| 190 |
},
|
| 191 |
{
|
| 192 |
"key": "selection",
|
| 193 |
"label": "Best representative scoring",
|
| 194 |
-
"duration_sec": 0.
|
| 195 |
"status": "done",
|
| 196 |
"detail": "quality-scored representatives"
|
| 197 |
},
|
| 198 |
{
|
| 199 |
"key": "synthesis",
|
| 200 |
"label": "Optional sample synthesis",
|
| 201 |
-
"duration_sec": 0.
|
| 202 |
"status": "done",
|
| 203 |
-
"detail": "
|
| 204 |
},
|
| 205 |
{
|
| 206 |
"key": "export",
|
| 207 |
-
"label": "MIDI,
|
| 208 |
-
"duration_sec": 0.
|
| 209 |
"status": "done",
|
| 210 |
-
"detail": "12 samples + 29 review hits + MIDI + ZIP"
|
| 211 |
}
|
| 212 |
]
|
| 213 |
}
|
|
@@ -215,59 +215,59 @@
|
|
| 215 |
"summary": [
|
| 216 |
{
|
| 217 |
"stage": "stem",
|
| 218 |
-
"mean_sec": 0.
|
| 219 |
-
"median_sec": 0.
|
| 220 |
-
"min_sec": 0.
|
| 221 |
-
"max_sec": 0.
|
| 222 |
},
|
| 223 |
{
|
| 224 |
"stage": "bpm",
|
| 225 |
-
"mean_sec": 0.
|
| 226 |
-
"median_sec": 0.
|
| 227 |
-
"min_sec": 0.
|
| 228 |
-
"max_sec": 0.
|
| 229 |
},
|
| 230 |
{
|
| 231 |
"stage": "onsets",
|
| 232 |
-
"mean_sec":
|
| 233 |
-
"median_sec":
|
| 234 |
-
"min_sec": 1.
|
| 235 |
-
"max_sec": 2.
|
| 236 |
},
|
| 237 |
{
|
| 238 |
"stage": "classification",
|
| 239 |
-
"mean_sec": 0.
|
| 240 |
-
"median_sec": 0.
|
| 241 |
-
"min_sec": 0.
|
| 242 |
-
"max_sec": 0.
|
| 243 |
},
|
| 244 |
{
|
| 245 |
"stage": "clustering",
|
| 246 |
-
"mean_sec": 0.
|
| 247 |
-
"median_sec": 0.
|
| 248 |
-
"min_sec": 0.
|
| 249 |
-
"max_sec": 0.
|
| 250 |
},
|
| 251 |
{
|
| 252 |
"stage": "selection",
|
| 253 |
-
"mean_sec": 0.
|
| 254 |
-
"median_sec": 0.
|
| 255 |
-
"min_sec": 0.
|
| 256 |
-
"max_sec": 0.
|
| 257 |
},
|
| 258 |
{
|
| 259 |
"stage": "synthesis",
|
| 260 |
-
"mean_sec": 0.
|
| 261 |
-
"median_sec": 0.
|
| 262 |
-
"min_sec": 0.
|
| 263 |
-
"max_sec": 0.
|
| 264 |
},
|
| 265 |
{
|
| 266 |
"stage": "export",
|
| 267 |
-
"mean_sec": 0.
|
| 268 |
-
"median_sec": 0.
|
| 269 |
-
"min_sec": 0.
|
| 270 |
-
"max_sec": 0.
|
| 271 |
}
|
| 272 |
]
|
| 273 |
}
|
|
|
|
| 8 |
"run_index": 0,
|
| 9 |
"clustering_mode": "online_preview",
|
| 10 |
"audio_duration_sec": 4.75,
|
| 11 |
+
"total_duration_sec": 2.619948,
|
| 12 |
+
"realtime_factor": 0.551568,
|
| 13 |
+
"hit_count": 14,
|
| 14 |
+
"cluster_count": 11,
|
| 15 |
"stages": [
|
| 16 |
{
|
| 17 |
"key": "stem",
|
| 18 |
"label": "Stem extraction / source load",
|
| 19 |
+
"duration_sec": 0.025709866000397597,
|
| 20 |
"status": "done",
|
| 21 |
+
"detail": "loaded full mix \u00b7 cached \u00b7 reproduction uses full mix"
|
| 22 |
},
|
| 23 |
{
|
| 24 |
"key": "bpm",
|
| 25 |
"label": "Tempo detection",
|
| 26 |
+
"duration_sec": 0.17244149500038475,
|
| 27 |
"status": "done",
|
| 28 |
"detail": "120.2 BPM"
|
| 29 |
},
|
| 30 |
{
|
| 31 |
"key": "onsets",
|
| 32 |
"label": "Onset detection + slicing",
|
| 33 |
+
"duration_sec": 2.01123834900136,
|
| 34 |
"status": "done",
|
| 35 |
+
"detail": "14 hits"
|
| 36 |
},
|
| 37 |
{
|
| 38 |
"key": "classification",
|
| 39 |
"label": "Spectral rule classification",
|
| 40 |
+
"duration_sec": 0.015600401000483544,
|
| 41 |
"status": "done",
|
| 42 |
+
"detail": "bright:4, hihat_closed:1, hihat_open:7, kick:1, mid:1"
|
| 43 |
},
|
| 44 |
{
|
| 45 |
"key": "clustering",
|
| 46 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 47 |
+
"duration_sec": 0.08166359800088685,
|
| 48 |
"status": "done",
|
| 49 |
+
"detail": "11 clusters \u00b7 online preview"
|
| 50 |
},
|
| 51 |
{
|
| 52 |
"key": "selection",
|
| 53 |
"label": "Best representative scoring",
|
| 54 |
+
"duration_sec": 0.02976710500115587,
|
| 55 |
"status": "done",
|
| 56 |
"detail": "quality-scored representatives"
|
| 57 |
},
|
| 58 |
{
|
| 59 |
"key": "synthesis",
|
| 60 |
"label": "Optional sample synthesis",
|
| 61 |
+
"duration_sec": 0.0003646059994935058,
|
| 62 |
"status": "done",
|
| 63 |
"detail": "2 synthesized alternates"
|
| 64 |
},
|
| 65 |
{
|
| 66 |
"key": "export",
|
| 67 |
+
"label": "MIDI, reproduced mix, WAV, ZIP export",
|
| 68 |
+
"duration_sec": 0.28268050299993774,
|
| 69 |
"status": "done",
|
| 70 |
+
"detail": "11 samples + 14 review hits + MIDI + full-context reproduction + ZIP"
|
| 71 |
}
|
| 72 |
]
|
| 73 |
},
|
|
|
|
| 78 |
"run_index": 0,
|
| 79 |
"clustering_mode": "online_preview",
|
| 80 |
"audio_duration_sec": 4.874989,
|
| 81 |
+
"total_duration_sec": 3.096466,
|
| 82 |
+
"realtime_factor": 0.635174,
|
| 83 |
+
"hit_count": 29,
|
| 84 |
"cluster_count": 12,
|
| 85 |
"stages": [
|
| 86 |
{
|
| 87 |
"key": "stem",
|
| 88 |
"label": "Stem extraction / source load",
|
| 89 |
+
"duration_sec": 0.02813513600085571,
|
| 90 |
"status": "done",
|
| 91 |
+
"detail": "loaded full mix \u00b7 cached \u00b7 reproduction uses full mix"
|
| 92 |
},
|
| 93 |
{
|
| 94 |
"key": "bpm",
|
| 95 |
"label": "Tempo detection",
|
| 96 |
+
"duration_sec": 0.0898798819998774,
|
| 97 |
"status": "done",
|
| 98 |
"detail": "161.5 BPM"
|
| 99 |
},
|
| 100 |
{
|
| 101 |
"key": "onsets",
|
| 102 |
"label": "Onset detection + slicing",
|
| 103 |
+
"duration_sec": 2.39328171499983,
|
| 104 |
"status": "done",
|
| 105 |
+
"detail": "29 hits"
|
| 106 |
},
|
| 107 |
{
|
| 108 |
"key": "classification",
|
| 109 |
"label": "Spectral rule classification",
|
| 110 |
+
"duration_sec": 0.01869549100047152,
|
| 111 |
"status": "done",
|
| 112 |
+
"detail": "bright:12, cymbal:1, hihat_closed:9, hihat_open:3, mid:4"
|
| 113 |
},
|
| 114 |
{
|
| 115 |
"key": "clustering",
|
| 116 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 117 |
+
"duration_sec": 0.03839712700028031,
|
| 118 |
"status": "done",
|
| 119 |
"detail": "12 clusters \u00b7 online preview"
|
| 120 |
},
|
| 121 |
{
|
| 122 |
"key": "selection",
|
| 123 |
"label": "Best representative scoring",
|
| 124 |
+
"duration_sec": 0.2286190050017467,
|
| 125 |
"status": "done",
|
| 126 |
"detail": "quality-scored representatives"
|
| 127 |
},
|
| 128 |
{
|
| 129 |
"key": "synthesis",
|
| 130 |
"label": "Optional sample synthesis",
|
| 131 |
+
"duration_sec": 0.0011234290013817372,
|
| 132 |
"status": "done",
|
| 133 |
"detail": "5 synthesized alternates"
|
| 134 |
},
|
| 135 |
{
|
| 136 |
"key": "export",
|
| 137 |
+
"label": "MIDI, reproduced mix, WAV, ZIP export",
|
| 138 |
+
"duration_sec": 0.29779865499949665,
|
| 139 |
"status": "done",
|
| 140 |
+
"detail": "12 samples + 29 review hits + MIDI + full-context reproduction + ZIP"
|
| 141 |
}
|
| 142 |
]
|
| 143 |
},
|
|
|
|
| 148 |
"run_index": 0,
|
| 149 |
"clustering_mode": "online_preview",
|
| 150 |
"audio_duration_sec": 4.874989,
|
| 151 |
+
"total_duration_sec": 2.58942,
|
| 152 |
+
"realtime_factor": 0.531164,
|
| 153 |
"hit_count": 29,
|
| 154 |
"cluster_count": 12,
|
| 155 |
"stages": [
|
| 156 |
{
|
| 157 |
"key": "stem",
|
| 158 |
"label": "Stem extraction / source load",
|
| 159 |
+
"duration_sec": 0.0707627699994191,
|
| 160 |
"status": "done",
|
| 161 |
+
"detail": "loaded full mix \u00b7 cached \u00b7 reproduction uses full mix"
|
| 162 |
},
|
| 163 |
{
|
| 164 |
"key": "bpm",
|
| 165 |
"label": "Tempo detection",
|
| 166 |
+
"duration_sec": 0.1129706889987574,
|
| 167 |
"status": "done",
|
| 168 |
"detail": "120.2 BPM"
|
| 169 |
},
|
| 170 |
{
|
| 171 |
"key": "onsets",
|
| 172 |
"label": "Onset detection + slicing",
|
| 173 |
+
"duration_sec": 1.902288953999232,
|
| 174 |
"status": "done",
|
| 175 |
"detail": "29 hits"
|
| 176 |
},
|
| 177 |
{
|
| 178 |
"key": "classification",
|
| 179 |
"label": "Spectral rule classification",
|
| 180 |
+
"duration_sec": 0.018896421999670565,
|
| 181 |
"status": "done",
|
| 182 |
+
"detail": "bright:4, hihat_closed:23, hihat_open:2"
|
| 183 |
},
|
| 184 |
{
|
| 185 |
"key": "clustering",
|
| 186 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 187 |
+
"duration_sec": 0.10450396400119644,
|
| 188 |
"status": "done",
|
| 189 |
"detail": "12 clusters \u00b7 online preview"
|
| 190 |
},
|
| 191 |
{
|
| 192 |
"key": "selection",
|
| 193 |
"label": "Best representative scoring",
|
| 194 |
+
"duration_sec": 0.11157589499998721,
|
| 195 |
"status": "done",
|
| 196 |
"detail": "quality-scored representatives"
|
| 197 |
},
|
| 198 |
{
|
| 199 |
"key": "synthesis",
|
| 200 |
"label": "Optional sample synthesis",
|
| 201 |
+
"duration_sec": 0.0010697859997890191,
|
| 202 |
"status": "done",
|
| 203 |
+
"detail": "6 synthesized alternates"
|
| 204 |
},
|
| 205 |
{
|
| 206 |
"key": "export",
|
| 207 |
+
"label": "MIDI, reproduced mix, WAV, ZIP export",
|
| 208 |
+
"duration_sec": 0.2668189519990847,
|
| 209 |
"status": "done",
|
| 210 |
+
"detail": "12 samples + 29 review hits + MIDI + full-context reproduction + ZIP"
|
| 211 |
}
|
| 212 |
]
|
| 213 |
}
|
|
|
|
| 215 |
"summary": [
|
| 216 |
{
|
| 217 |
"stage": "stem",
|
| 218 |
+
"mean_sec": 0.041536,
|
| 219 |
+
"median_sec": 0.028135,
|
| 220 |
+
"min_sec": 0.02571,
|
| 221 |
+
"max_sec": 0.070763
|
| 222 |
},
|
| 223 |
{
|
| 224 |
"stage": "bpm",
|
| 225 |
+
"mean_sec": 0.125097,
|
| 226 |
+
"median_sec": 0.112971,
|
| 227 |
+
"min_sec": 0.08988,
|
| 228 |
+
"max_sec": 0.172441
|
| 229 |
},
|
| 230 |
{
|
| 231 |
"stage": "onsets",
|
| 232 |
+
"mean_sec": 2.10227,
|
| 233 |
+
"median_sec": 2.011238,
|
| 234 |
+
"min_sec": 1.902289,
|
| 235 |
+
"max_sec": 2.393282
|
| 236 |
},
|
| 237 |
{
|
| 238 |
"stage": "classification",
|
| 239 |
+
"mean_sec": 0.017731,
|
| 240 |
+
"median_sec": 0.018695,
|
| 241 |
+
"min_sec": 0.0156,
|
| 242 |
+
"max_sec": 0.018896
|
| 243 |
},
|
| 244 |
{
|
| 245 |
"stage": "clustering",
|
| 246 |
+
"mean_sec": 0.074855,
|
| 247 |
+
"median_sec": 0.081664,
|
| 248 |
+
"min_sec": 0.038397,
|
| 249 |
+
"max_sec": 0.104504
|
| 250 |
},
|
| 251 |
{
|
| 252 |
"stage": "selection",
|
| 253 |
+
"mean_sec": 0.123321,
|
| 254 |
+
"median_sec": 0.111576,
|
| 255 |
+
"min_sec": 0.029767,
|
| 256 |
+
"max_sec": 0.228619
|
| 257 |
},
|
| 258 |
{
|
| 259 |
"stage": "synthesis",
|
| 260 |
+
"mean_sec": 0.000853,
|
| 261 |
+
"median_sec": 0.00107,
|
| 262 |
+
"min_sec": 0.000365,
|
| 263 |
+
"max_sec": 0.001123
|
| 264 |
},
|
| 265 |
{
|
| 266 |
"stage": "export",
|
| 267 |
+
"mean_sec": 0.282433,
|
| 268 |
+
"median_sec": 0.282681,
|
| 269 |
+
"min_sec": 0.266819,
|
| 270 |
+
"max_sec": 0.297799
|
| 271 |
}
|
| 272 |
]
|
| 273 |
}
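The `summary` block can be recomputed from the per-run stage timings. The values above match a per-stage mean, median, min, and max of `duration_sec`, each rounded to six decimal places. Each run's `realtime_factor` matches `total_duration_sec / audio_duration_sec`; for the first run, 2.619948 / 4.75 = 0.551568. The benchmark script is not part of this diff, so `summarize` and `realtime_factor` are hypothetical reimplementations that assume that shape.

```python
import statistics
from typing import Dict, List


def summarize(runs: List[dict]) -> List[Dict[str, float]]:
    """Per-stage mean/median/min/max of duration_sec across runs, rounded to 6 places."""
    by_stage: Dict[str, List[float]] = {}
    for run in runs:
        for stage in run["stages"]:
            by_stage.setdefault(stage["key"], []).append(stage["duration_sec"])
    return [
        {
            "stage": key,
            "mean_sec": round(statistics.mean(vals), 6),
            "median_sec": round(statistics.median(vals), 6),
            "min_sec": round(min(vals), 6),
            "max_sec": round(max(vals), 6),
        }
        for key, vals in by_stage.items()
    ]


def realtime_factor(run: dict) -> float:
    """Processing time over audio duration; values below 1.0 are faster than real time."""
    return round(run["total_duration_sec"] / run["audio_duration_sec"], 6)
```

All three online-preview runs finish in roughly 0.53 to 0.64 of real time, and onset detection accounts for most of that total.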
|
docs/benchmark-subprocesses.json
CHANGED
|
@@ -8,66 +8,66 @@
|
|
| 8 |
"run_index": 0,
|
| 9 |
"clustering_mode": "batch_quality",
|
| 10 |
"audio_duration_sec": 4.75,
|
| 11 |
-
"total_duration_sec": 2.
|
| 12 |
-
"realtime_factor": 0.
|
| 13 |
-
"hit_count":
|
| 14 |
"cluster_count": 7,
|
| 15 |
"stages": [
|
| 16 |
{
|
| 17 |
"key": "stem",
|
| 18 |
"label": "Stem extraction / source load",
|
| 19 |
-
"duration_sec": 0.
|
| 20 |
"status": "done",
|
| 21 |
-
"detail": "loaded full mix \u00b7 cached"
|
| 22 |
},
|
| 23 |
{
|
| 24 |
"key": "bpm",
|
| 25 |
"label": "Tempo detection",
|
| 26 |
-
"duration_sec": 0.
|
| 27 |
"status": "done",
|
| 28 |
"detail": "120.2 BPM"
|
| 29 |
},
|
| 30 |
{
|
| 31 |
"key": "onsets",
|
| 32 |
"label": "Onset detection + slicing",
|
| 33 |
-
"duration_sec":
|
| 34 |
"status": "done",
|
| 35 |
-
"detail": "
|
| 36 |
},
|
| 37 |
{
|
| 38 |
"key": "classification",
|
| 39 |
"label": "Spectral rule classification",
|
| 40 |
-
"duration_sec": 0.
|
| 41 |
"status": "done",
|
| 42 |
-
"detail": "bright:5,
|
| 43 |
},
|
| 44 |
{
|
| 45 |
"key": "clustering",
|
| 46 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 47 |
-
"duration_sec": 0.
|
| 48 |
"status": "done",
|
| 49 |
"detail": "7 clusters \u00b7 batch quality"
|
| 50 |
},
|
| 51 |
{
|
| 52 |
"key": "selection",
|
| 53 |
"label": "Best representative scoring",
|
| 54 |
-
"duration_sec": 0.
|
| 55 |
"status": "done",
|
| 56 |
"detail": "quality-scored representatives"
|
| 57 |
},
|
| 58 |
{
|
| 59 |
"key": "synthesis",
|
| 60 |
"label": "Optional sample synthesis",
|
| 61 |
-
"duration_sec": 0.
|
| 62 |
"status": "done",
|
| 63 |
"detail": "2 synthesized alternates"
|
| 64 |
},
|
| 65 |
{
|
| 66 |
"key": "export",
|
| 67 |
-
"label": "MIDI,
|
| 68 |
-
"duration_sec": 0.
|
| 69 |
"status": "done",
|
| 70 |
-
"detail": "7 samples +
|
| 71 |
}
|
| 72 |
]
|
| 73 |
},
|
|
@@ -78,66 +78,66 @@
|
|
| 78 |
"run_index": 0,
|
| 79 |
"clustering_mode": "batch_quality",
|
| 80 |
"audio_duration_sec": 4.874989,
|
| 81 |
-
"total_duration_sec": 2.
|
| 82 |
-
"realtime_factor": 0.
|
| 83 |
-
"hit_count":
|
| 84 |
-
"cluster_count":
|
| 85 |
"stages": [
|
| 86 |
{
|
| 87 |
"key": "stem",
|
| 88 |
"label": "Stem extraction / source load",
|
| 89 |
-
"duration_sec": 0.
|
| 90 |
"status": "done",
|
| 91 |
-
"detail": "loaded full mix \u00b7 cached"
|
| 92 |
},
|
| 93 |
{
|
| 94 |
"key": "bpm",
|
| 95 |
"label": "Tempo detection",
|
| 96 |
-
"duration_sec": 0.
|
| 97 |
"status": "done",
|
| 98 |
"detail": "161.5 BPM"
|
| 99 |
},
|
| 100 |
{
|
| 101 |
"key": "onsets",
|
| 102 |
"label": "Onset detection + slicing",
|
| 103 |
-
"duration_sec":
|
| 104 |
"status": "done",
|
| 105 |
-
"detail": "
|
| 106 |
},
|
| 107 |
{
|
| 108 |
"key": "classification",
|
| 109 |
"label": "Spectral rule classification",
|
| 110 |
-
"duration_sec": 0.
|
| 111 |
"status": "done",
|
| 112 |
-
"detail": "bright:
|
| 113 |
},
|
| 114 |
{
|
| 115 |
"key": "clustering",
|
| 116 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 117 |
-
"duration_sec": 0.
|
| 118 |
"status": "done",
|
| 119 |
-
"detail": "
|
| 120 |
},
|
| 121 |
{
|
| 122 |
"key": "selection",
|
| 123 |
"label": "Best representative scoring",
|
| 124 |
-
"duration_sec": 0.
|
| 125 |
"status": "done",
|
| 126 |
"detail": "quality-scored representatives"
|
| 127 |
},
|
| 128 |
{
|
| 129 |
"key": "synthesis",
|
| 130 |
"label": "Optional sample synthesis",
|
| 131 |
-
"duration_sec": 0.
|
| 132 |
"status": "done",
|
| 133 |
-
"detail": "
|
| 134 |
},
|
| 135 |
{
|
| 136 |
"key": "export",
|
| 137 |
-
"label": "MIDI,
|
| 138 |
-
"duration_sec": 0.
|
| 139 |
"status": "done",
|
| 140 |
-
"detail": "
|
| 141 |
}
|
| 142 |
]
|
| 143 |
},
|
|
@@ -148,66 +148,66 @@
|
|
| 148 |
"run_index": 0,
|
| 149 |
"clustering_mode": "batch_quality",
|
| 150 |
"audio_duration_sec": 4.874989,
|
| 151 |
-
"total_duration_sec": 2.
|
| 152 |
-
"realtime_factor": 0.
|
| 153 |
-
"hit_count":
|
| 154 |
-
"cluster_count":
|
| 155 |
"stages": [
|
| 156 |
{
|
| 157 |
"key": "stem",
|
| 158 |
"label": "Stem extraction / source load",
|
| 159 |
-
"duration_sec": 0.
|
| 160 |
"status": "done",
|
| 161 |
-
"detail": "loaded full mix \u00b7 cached"
|
| 162 |
},
|
| 163 |
{
|
| 164 |
"key": "bpm",
|
| 165 |
"label": "Tempo detection",
|
| 166 |
-
"duration_sec": 0.
|
| 167 |
"status": "done",
|
| 168 |
"detail": "120.2 BPM"
|
| 169 |
},
|
| 170 |
{
|
| 171 |
"key": "onsets",
|
| 172 |
"label": "Onset detection + slicing",
|
| 173 |
-
"duration_sec":
|
| 174 |
"status": "done",
|
| 175 |
-
"detail": "
|
| 176 |
},
|
| 177 |
{
|
| 178 |
"key": "classification",
|
| 179 |
"label": "Spectral rule classification",
|
| 180 |
-
"duration_sec": 0.
|
| 181 |
"status": "done",
|
| 182 |
-
"detail": "bright:
|
| 183 |
},
|
| 184 |
{
|
| 185 |
"key": "clustering",
|
| 186 |
"label": "Mel fingerprint + transient NCC clustering",
|
| 187 |
-
"duration_sec": 0.
|
| 188 |
"status": "done",
|
| 189 |
-
"detail": "
|
| 190 |
},
|
| 191 |
{
|
| 192 |
"key": "selection",
|
| 193 |
"label": "Best representative scoring",
|
| 194 |
-
"duration_sec": 0.
|
| 195 |
"status": "done",
|
| 196 |
"detail": "quality-scored representatives"
|
| 197 |
},
|
| 198 |
{
|
| 199 |
"key": "synthesis",
|
| 200 |
"label": "Optional sample synthesis",
|
| 201 |
-
"duration_sec": 0.
|
| 202 |
"status": "done",
|
| 203 |
-
"detail": "
|
| 204 |
},
|
| 205 |
{
|
| 206 |
"key": "export",
|
| 207 |
-
"label": "MIDI,
|
| 208 |
-
"duration_sec": 0.
|
| 209 |
"status": "done",
|
| 210 |
-
"detail": "
|
| 211 |
}
|
| 212 |
]
|
| 213 |
}
|

@@ -215,59 +215,59 @@
   "summary": [
     {
       "stage": "stem",
-      "mean_sec": 0.
-      "median_sec": 0.
-      "min_sec": 0.
-      "max_sec": 0.
     },
     {
       "stage": "bpm",
-      "mean_sec": 0.
-      "median_sec": 0.
-      "min_sec": 0.
-      "max_sec": 0.
     },
     {
       "stage": "onsets",
-      "mean_sec":
-      "median_sec":
-      "min_sec": 1.
-      "max_sec": 2.
     },
     {
       "stage": "classification",
-      "mean_sec": 0.
-      "median_sec": 0.
-      "min_sec": 0.
-      "max_sec": 0.
     },
     {
       "stage": "clustering",
-      "mean_sec": 0.
-      "median_sec": 0.
-      "min_sec": 0.
-      "max_sec": 0.
     },
     {
       "stage": "selection",
-      "mean_sec": 0.
-      "median_sec": 0.
-      "min_sec": 0.
-      "max_sec": 0.
     },
     {
       "stage": "synthesis",
-      "mean_sec": 0.
-      "median_sec": 0.
-      "min_sec": 0.
-      "max_sec": 0.
     },
     {
       "stage": "export",
-      "mean_sec": 0.
-      "median_sec": 0.
-      "min_sec": 0.
-      "max_sec": 0.
     }
   ]
 }

       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.75,
+      "total_duration_sec": 2.994395,
+      "realtime_factor": 0.630399,
+      "hit_count": 12,
       "cluster_count": 7,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.02592192699921725,
           "status": "done",
+          "detail": "loaded full mix \u00b7 cached \u00b7 reproduction uses full mix"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.1751658609991864,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 2.1905335589999595,
           "status": "done",
+          "detail": "12 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.09557517999928677,
           "status": "done",
+          "detail": "bright:5, hihat_open:6, kick:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.014000580998981604,
           "status": "done",
           "detail": "7 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.08321280500058492,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0006027010003890609,
           "status": "done",
           "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
+          "label": "MIDI, reproduced mix, WAV, ZIP export",
+          "duration_sec": 0.40873212500082445,
           "status": "done",
+          "detail": "7 samples + 12 review hits + MIDI + full-context reproduction + ZIP"
         }
       ]
     },

       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.802354,
+      "realtime_factor": 0.574843,
+      "hit_count": 23,
+      "cluster_count": 2,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.02253025699974387,
           "status": "done",
+          "detail": "loaded full mix \u00b7 cached \u00b7 reproduction uses full mix"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.09380031600085204,
           "status": "done",
           "detail": "161.5 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 2.1897132599988254,
           "status": "done",
+          "detail": "23 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.017409414000212564,
           "status": "done",
+          "detail": "bright:13, hihat_closed:4, hihat_open:4, kick:1, mid:1"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.03413462400021672,
           "status": "done",
+          "detail": "2 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.26413379800033,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0011682919994200347,
           "status": "done",
+          "detail": "2 synthesized alternates"
         },
         {
           "key": "export",
+          "label": "MIDI, reproduced mix, WAV, ZIP export",
+          "duration_sec": 0.17886992200146778,
           "status": "done",
+          "detail": "2 samples + 23 review hits + MIDI + full-context reproduction + ZIP"
         }
       ]
     },

       "run_index": 0,
       "clustering_mode": "batch_quality",
       "audio_duration_sec": 4.874989,
+      "total_duration_sec": 2.514399,
+      "realtime_factor": 0.515775,
+      "hit_count": 31,
+      "cluster_count": 3,
       "stages": [
         {
           "key": "stem",
           "label": "Stem extraction / source load",
+          "duration_sec": 0.0255989449997287,
           "status": "done",
+          "detail": "loaded full mix \u00b7 cached \u00b7 reproduction uses full mix"
         },
         {
           "key": "bpm",
           "label": "Tempo detection",
+          "duration_sec": 0.08472461699966516,
           "status": "done",
           "detail": "120.2 BPM"
         },
         {
           "key": "onsets",
           "label": "Onset detection + slicing",
+          "duration_sec": 1.905502139001328,
           "status": "done",
+          "detail": "31 hits"
         },
         {
           "key": "classification",
           "label": "Spectral rule classification",
+          "duration_sec": 0.018339307000132976,
           "status": "done",
+          "detail": "bright:4, hihat_closed:25, hihat_open:2"
         },
         {
           "key": "clustering",
           "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.05202830600137531,
           "status": "done",
+          "detail": "3 clusters \u00b7 batch quality"
         },
         {
           "key": "selection",
           "label": "Best representative scoring",
+          "duration_sec": 0.24863046999962535,
           "status": "done",
           "detail": "quality-scored representatives"
         },
         {
           "key": "synthesis",
           "label": "Optional sample synthesis",
+          "duration_sec": 0.0012351730001682881,
           "status": "done",
+          "detail": "3 synthesized alternates"
         },
         {
           "key": "export",
+          "label": "MIDI, reproduced mix, WAV, ZIP export",
+          "duration_sec": 0.17784613499861734,
           "status": "done",
+          "detail": "3 samples + 31 review hits + MIDI + full-context reproduction + ZIP"
         }
       ]
     }

   "summary": [
     {
       "stage": "stem",
+      "mean_sec": 0.024684,
+      "median_sec": 0.025599,
+      "min_sec": 0.02253,
+      "max_sec": 0.025922
     },
     {
       "stage": "bpm",
+      "mean_sec": 0.117897,
+      "median_sec": 0.0938,
+      "min_sec": 0.084725,
+      "max_sec": 0.175166
     },
     {
       "stage": "onsets",
+      "mean_sec": 2.09525,
+      "median_sec": 2.189713,
+      "min_sec": 1.905502,
+      "max_sec": 2.190534
     },
     {
       "stage": "classification",
+      "mean_sec": 0.043775,
+      "median_sec": 0.018339,
+      "min_sec": 0.017409,
+      "max_sec": 0.095575
     },
     {
       "stage": "clustering",
+      "mean_sec": 0.033388,
+      "median_sec": 0.034135,
+      "min_sec": 0.014001,
+      "max_sec": 0.052028
     },
     {
       "stage": "selection",
+      "mean_sec": 0.198659,
+      "median_sec": 0.24863,
+      "min_sec": 0.083213,
+      "max_sec": 0.264134
     },
     {
       "stage": "synthesis",
+      "mean_sec": 0.001002,
+      "median_sec": 0.001168,
+      "min_sec": 0.000603,
+      "max_sec": 0.001235
     },
     {
       "stage": "export",
+      "mean_sec": 0.255149,
+      "median_sec": 0.17887,
+      "min_sec": 0.177846,
+      "max_sec": 0.408732
     }
   ]
 }
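
The `summary` entries are consistent with a plain mean, median, minimum, and maximum of each stage's `duration_sec` across the three runs, rounded to six decimals. Each run's `realtime_factor` also matches `total_duration_sec / audio_duration_sec`. The minimal sketch below reproduces that aggregation. It assumes this is how the benchmark script derives the numbers, although that script is not part of this diff, and `summarize_stage` and `realtime_factor` are hypothetical names. It is checked against the stem-stage and onset-stage figures above:

```python
import statistics


def summarize_stage(stage: str, durations: list[float], places: int = 6) -> dict:
    """Aggregate one stage's per-run durations into a summary row (hypothetical helper)."""
    return {
        "stage": stage,
        "mean_sec": round(statistics.mean(durations), places),
        "median_sec": round(statistics.median(durations), places),
        "min_sec": round(min(durations), places),
        "max_sec": round(max(durations), places),
    }


def realtime_factor(total_sec: float, audio_sec: float, places: int = 6) -> float:
    """Processing time divided by audio length; below 1.0 means faster than realtime."""
    return round(total_sec / audio_sec, places) if audio_sec else 0.0


# Per-run stage durations copied from the three runs above.
stem = [0.02592192699921725, 0.02253025699974387, 0.0255989449997287]
onsets = [2.1905335589999595, 2.1897132599988254, 1.905502139001328]
print(summarize_stage("stem", stem))
print(summarize_stage("onsets", onsets))
print(realtime_factor(2.994395, 4.75))
```

All three runs land at a realtime factor of roughly 0.52 to 0.63. Onset detection accounts for most of the wall time, about 1.9 to 2.2 s of each 2.5 to 3.0 s run.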
docs/interactive-ux/README.md
CHANGED
@@ -12,7 +12,7 @@ The project now has a first supervised-editing foundation layered on top of the
 - The state contains hits, clusters, confidence scores, review queue entries, constraints, events, suggestions, and undo snapshots.
 - The FastAPI backend exposes state, move, pull-out, lock, suppress, review/favorite, suggestion, explanation, and undo endpoints.
 - The browser UI includes an interactive supervision panel with a review queue, cluster board, suggestion inbox, constraint/event log, and cluster explanation drawer.
-- The
+- The supervised layer now rewrites edited sample WAVs, MIDI, target reconstruction, full-context reproduced audio, and ZIP artifacts under `supervised/` while preserving original batch outputs.
 
 ## Documents
 
pipeline_runner.py
CHANGED
@@ -149,7 +149,7 @@ STAGE_DEFS = [
     ("clustering", "Mel fingerprint + transient NCC clustering"),
     ("selection", "Best representative scoring"),
     ("synthesis", "Optional sample synthesis"),
-    ("export", "MIDI,
+    ("export", "MIDI, reproduced mix, WAV, ZIP export"),
 ]
 
 
@@ -192,6 +192,62 @@ def _normalise_audio(audio: np.ndarray) -> np.ndarray:
     return audio.astype(np.float32)
 
 
+def _mono(audio: np.ndarray) -> np.ndarray:
+    audio = np.asarray(audio, dtype=np.float32)
+    if audio.ndim > 1:
+        audio = audio.mean(axis=1)
+    return audio.astype(np.float32)
+
+
+def _pad_or_trim(audio: np.ndarray, length: int) -> np.ndarray:
+    audio = _mono(audio)
+    if len(audio) == length:
+        return audio
+    if len(audio) > length:
+        return audio[:length]
+    return np.pad(audio, (0, max(0, length - len(audio)))).astype(np.float32)
+
+
+def _load_source_mix(audio_path: str | os.PathLike[str], sr: int) -> np.ndarray:
+    audio, _ = librosa.load(audio_path, sr=sr, mono=True)
+    return _mono(audio)
+
+
+def _common_gain(reference: np.ndarray, fallback: np.ndarray) -> float:
+    peak = float(np.max(np.abs(reference))) if reference.size else 0.0
+    if peak <= 1e-8:
+        peak = float(np.max(np.abs(fallback))) if fallback.size else 0.0
+    return peak if peak > 1e-8 else 1.0
+
+
+def _rms(audio: np.ndarray) -> float:
+    audio = _mono(audio)
+    return float(np.sqrt(np.mean(np.square(audio, dtype=np.float64)))) if audio.size else 0.0
+
+
+def _match_rms(rendered: np.ndarray, reference: np.ndarray, *, min_gain: float = 0.05, max_gain: float = 8.0) -> np.ndarray:
+    rendered_rms = _rms(rendered)
+    reference_rms = _rms(reference)
+    if rendered_rms <= 1e-10 or reference_rms <= 1e-10:
+        return _mono(rendered)
+    gain = float(np.clip(reference_rms / rendered_rms, min_gain, max_gain))
+    return (_mono(rendered) * gain).astype(np.float32)
+
+
+def _soft_limit(audio: np.ndarray, ceiling: float = 0.98) -> np.ndarray:
+    audio = _mono(audio).astype(np.float32)
+    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
+    if peak > ceiling > 0:
+        audio = audio * (ceiling / peak)
+    return audio.astype(np.float32)
+
+
+def _make_reproduction_mix(target_reconstruction: np.ndarray, context_bed: np.ndarray, length: int) -> np.ndarray:
+    target = _pad_or_trim(target_reconstruction, length)
+    context = _pad_or_trim(context_bed, length)
+    return _soft_limit(context + target)
+
+
 MODULE_ROOT = Path(__file__).resolve().parent
 CACHE_DIR = Path(os.environ["DSE_CACHE_DIR"]) if os.environ.get("DSE_CACHE_DIR") else MODULE_ROOT / ".cache"
 STEM_CACHE_DIR = CACHE_DIR / "stems"
@@ -313,20 +369,32 @@ def run_extraction_pipeline(
 
     bpm: float | None = None
     stem_audio: np.ndarray
+    source_audio: np.ndarray
+    context_bed: np.ndarray
     stem_sr: int
     hits: list[Any] = []
     clusters: list[Any] = []
-
+    target_rendered: np.ndarray | None = None
+    reproduced: np.ndarray | None = None
 
     _notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
 
     with _timed_stage(stages, "stem", progress_cb) as stage:
-
-
-
-
-
-
+        raw_stem_audio, stem_sr, stem_detail = _load_or_extract_stem(audio_path, params)
+        source_raw = _load_source_mix(audio_path, stem_sr)
+        length = max(len(raw_stem_audio), len(source_raw))
+        raw_stem_audio = _pad_or_trim(raw_stem_audio, length)
+        source_raw = _pad_or_trim(source_raw, length)
+        gain = _common_gain(raw_stem_audio if params.stem != "all" else source_raw, source_raw)
+        stem_audio = (raw_stem_audio / gain).astype(np.float32)
+        source_audio = (source_raw / gain).astype(np.float32)
+        context_bed = np.zeros_like(source_audio) if params.stem == "all" else (source_audio - stem_audio).astype(np.float32)
+        stage.detail = stem_detail + (" · reproduction uses full mix" if params.stem == "all" else " · reproduction uses residual non-target stems")
+        _write_audio(out / "source.wav", _soft_limit(source_audio), stem_sr, subtype="PCM_16")
+        _write_audio(out / "stem.wav", _soft_limit(stem_audio), stem_sr, subtype="PCM_16")
+        _write_audio(out / "context_bed.wav", _soft_limit(context_bed), stem_sr, subtype="PCM_16")
+
+    audio_duration_sec = len(source_audio) / stem_sr if stem_sr else 0.0
 
     with _timed_stage(stages, "bpm", progress_cb) as stage:
         bpm = detect_bpm(stem_audio, stem_sr)
@@ -411,7 +479,7 @@ def run_extraction_pipeline(
 
     sample_rows: list[dict[str, Any]] = []
     hit_rows: list[dict[str, Any]] = []
-    files: dict[str, str] = {"stem": "stem.wav"}
+    files: dict[str, str] = {"source": "source.wav", "stem": "stem.wav", "context_bed": "context_bed.wav"}
 
     with _timed_stage(stages, "export", progress_cb) as stage:
         midi_path = out / "reconstruction.mid"
@@ -423,12 +491,16 @@ def run_extraction_pipeline(
                 quantize=bool(params.quantize_midi),
                 subdivision=int(params.subdivision),
             )
-
+            target_rendered = render_midi_with_samples(clusters, sr=stem_sr)
+            target_rendered = _match_rms(target_rendered, stem_audio)
         else:
-
+            target_rendered = np.zeros_like(stem_audio)
             midi_path.write_bytes(b"")
 
-
+        reproduced = _make_reproduction_mix(target_rendered, context_bed, max(len(source_audio), len(target_rendered)))
+        _write_audio(out / "target_reconstruction.wav", _soft_limit(target_rendered), stem_sr, subtype="PCM_16")
+        _write_audio(out / "reconstruction.wav", reproduced, stem_sr, subtype="PCM_16")
+        files["target_reconstruction"] = "target_reconstruction.wav"
         files["reconstruction"] = "reconstruction.wav"
         files["midi"] = "reconstruction.mid"
 
@@ -481,14 +553,21 @@ def run_extraction_pipeline(
                 synth_path = samples_dir / f"{cluster.label}__synth.wav"
                 _write_audio(synth_path, cluster.synthesized, stem_sr)
 
-        archive_tmp = build_archive(
+        archive_tmp = build_archive(
+            clusters,
+            bpm or 120.0,
+            stem_sr,
+            midi_path=str(midi_path),
+            rendered_audio=reproduced,
+            target_rendered_audio=target_rendered,
+        )
         files["archive"] = _copy_temp_file(archive_tmp, out / "sample-pack.zip")
         files["archive"] = "sample-pack.zip"
         try:
             os.unlink(archive_tmp)
         except OSError:
             pass
-        stage.detail = f"{len(sample_rows)} samples + {len(hit_rows)} review hits + MIDI + ZIP"
+        stage.detail = f"{len(sample_rows)} samples + {len(hit_rows)} review hits + MIDI + full-context reproduction + ZIP"
 
     duration_sec = time.perf_counter() - started_total
     result = PipelineResult(
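
Taken together, the stem-stage and export-stage changes above form a small mixing recipe:

1. Normalise source and stem by one shared peak gain.
2. Keep `source - stem` as a context bed, or use silence when `params.stem == "all"`.
3. Loudness-match the MIDI render to the stem.
4. Sum the bed and the render, then rescale under a 0.98 ceiling.

The sketch below reproduces that recipe on synthetic NumPy signals. It is a simplified reading of the diff, not the project's code:

- `reproduce_full_context` and `peak_limit` are hypothetical names.
- The shared gain skips `_common_gain`'s fallback to the source peak.
- The render is trimmed to the source length instead of extending the mix.

```python
import numpy as np


def pad_or_trim(x: np.ndarray, n: int) -> np.ndarray:
    """Force a mono float32 signal to exactly n samples (zero-pad or cut)."""
    x = np.asarray(x, dtype=np.float32)
    return x[:n] if len(x) >= n else np.pad(x, (0, n - len(x)))


def rms(x: np.ndarray) -> float:
    return float(np.sqrt(np.mean(np.square(x, dtype=np.float64)))) if x.size else 0.0


def match_rms(rendered: np.ndarray, reference: np.ndarray, min_gain: float = 0.05, max_gain: float = 8.0) -> np.ndarray:
    """Scale rendered to the reference loudness, with the gain clamped to [min_gain, max_gain]."""
    r, ref = rms(rendered), rms(reference)
    if r <= 1e-10 or ref <= 1e-10:
        return rendered.astype(np.float32)
    return (rendered * float(np.clip(ref / r, min_gain, max_gain))).astype(np.float32)


def peak_limit(x: np.ndarray, ceiling: float = 0.98) -> np.ndarray:
    """Linear rescale when the peak exceeds ceiling (like the diff's _soft_limit; no saturation curve)."""
    peak = float(np.max(np.abs(x))) if x.size else 0.0
    return (x * (ceiling / peak) if peak > ceiling else x).astype(np.float32)


def reproduce_full_context(source, stem, target_rendered, target_is_full_mix=False):
    """Return (reproduced mix, context bed, loudness-matched target) on a shared scale."""
    n = max(len(source), len(stem))
    source, stem = pad_or_trim(source, n), pad_or_trim(stem, n)
    # One shared gain for both signals, so source - stem remains a valid residual.
    peak = float(np.max(np.abs(source if target_is_full_mix else stem)))
    gain = peak if peak > 1e-8 else 1.0
    source, stem = source / gain, stem / gain
    context_bed = np.zeros_like(source) if target_is_full_mix else source - stem
    target = match_rms(pad_or_trim(target_rendered, n), stem)
    return peak_limit(context_bed + target), context_bed, target


# Demo: a decaying 60 Hz "kick" stands in for the target stem, a 110 Hz tone for the rest of the mix.
sr = 8000
t = np.arange(sr) / sr
drums = (np.sin(2 * np.pi * 60 * t) * np.exp(-8 * t)).astype(np.float32)
bass = (0.3 * np.sin(2 * np.pi * 110 * t)).astype(np.float32)
mix, bed, target = reproduce_full_context(drums + bass, drums, 0.5 * drums)
print(f"peak={np.max(np.abs(mix)):.3f} target_rms={rms(target):.4f}")
```

When the render has the same shape as the stem, the bed plus the render collapses back to the rescaled source mix. This makes a useful sanity check: any audible difference in `reconstruction.wav` then comes from how the MIDI render differs from the extracted stem, not from the mixing step itself.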
sample_extractor.py
CHANGED
@@ -525,9 +525,13 @@ def render_midi_with_samples(clusters,sr=44100):
     pk=np.abs(buf).max(); return (buf/pk*0.9).astype(np.float32) if pk>1e-8 else buf.astype(np.float32)
 def build_sample_map(clusters):
     return {c.midi_note:{'label':c.label,'count':c.count,'duration_ms':int(c.best_hit.duration*1000)} for c in clusters}
-def build_archive(clusters,bpm,sr,midi_path=None,rendered_audio=None):
+def build_archive(clusters,bpm,sr,midi_path=None,rendered_audio=None,target_rendered_audio=None):
     import zipfile,tempfile,io; zp=tempfile.mktemp(suffix='.zip')
     idx={'bpm':round(bpm,1),'sample_rate':sr,'total_clusters':len(clusters),'total_hits':sum(c.count for c in clusters),'samples':{}}
+    if rendered_audio is not None:
+        idx['reproduction_file']='rendered_reproduction_full_mix.wav'
+    if target_rendered_audio is not None:
+        idx['target_reconstruction_file']='rendered_reconstruction_target_stem.wav'
     with zipfile.ZipFile(zp,'w',compression=zipfile.ZIP_STORED) as zf:
         for c in clusters:
             b=c.best_hit; fn=f"samples/{c.label}.wav"; buf=io.BytesIO()
@@ -544,7 +548,11 @@ def build_archive(clusters,bpm,sr,midi_path=None,rendered_audio=None):
         if midi_path and os.path.exists(midi_path): zf.write(midi_path,'reconstruction.mid')
         if rendered_audio is not None:
             rb=io.BytesIO(); sf.write(rb,rendered_audio,sr,format='WAV',subtype='PCM_16')
+            zf.writestr('rendered_reproduction_full_mix.wav',rb.getvalue())
             zf.writestr('rendered_reconstruction.wav',rb.getvalue())
+        if target_rendered_audio is not None:
+            tb=io.BytesIO(); sf.write(tb,target_rendered_audio,sr,format='WAV',subtype='PCM_16')
+            zf.writestr('rendered_reconstruction_target_stem.wav',tb.getvalue())
     return zp
 
 # ─── Auto-tuner with locking ─────────────────────────────────────────────────
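
After this change the ZIP stores the full-mix reproduction twice. It appears once as `rendered_reproduction_full_mix.wav` and once under the older `rendered_reconstruction.wav` name, presumably for backward compatibility. The archive also stores the target-only render as `rendered_reconstruction_target_stem.wav`, and the index gains `reproduction_file` and `target_reconstruction_file` keys.

The stdlib-only sketch below builds a miniature archive with the same entry names and inspects it. Several parts are illustrative assumptions rather than details taken from the diff:

- The `wave`-based `wav_bytes` encoder stands in for `soundfile`, which the diff uses.
- `build_demo_archive` is a hypothetical name.
- `index.json` is a hypothetical entry name, because the diff does not show where `idx` is written.

```python
import io
import json
import struct
import wave
import zipfile


def wav_bytes(samples: list[float], sr: int) -> bytes:
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV (stdlib stand-in for soundfile)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sr)
        w.writeframes(b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples))
    return buf.getvalue()


def build_demo_archive(reproduction, target, sr: int = 44100) -> bytes:
    """Mirror the new entry names and index keys from build_archive (hypothetical helper)."""
    idx = {"sample_rate": sr}
    if reproduction is not None:
        idx["reproduction_file"] = "rendered_reproduction_full_mix.wav"
    if target is not None:
        idx["target_reconstruction_file"] = "rendered_reconstruction_target_stem.wav"
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        if reproduction is not None:
            data = wav_bytes(reproduction, sr)
            zf.writestr("rendered_reproduction_full_mix.wav", data)
            zf.writestr("rendered_reconstruction.wav", data)  # same bytes under the older name
        if target is not None:
            zf.writestr("rendered_reconstruction_target_stem.wav", wav_bytes(target, sr))
        zf.writestr("index.json", json.dumps(idx))  # hypothetical entry name
    return out.getvalue()


archive = build_demo_archive([0.0, 0.5, -0.5], [0.0, 0.25, 0.0], sr=8000)
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = sorted(zf.namelist())
    index = json.loads(zf.read("index.json"))
    duplicated = zf.read("rendered_reconstruction.wav") == zf.read("rendered_reproduction_full_mix.wav")
print(names, duplicated)
```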
scripts/test_api_job.py
CHANGED
@@ -23,3 +23,5 @@ for _ in range(60):
 print(json.dumps({'status':job['status'], 'error':job.get('error'), 'hit_count': job.get('result',{}).get('hit_count'), 'files': job.get('result',{}).get('file_urls')}, indent=2))
 assert job['status']=='complete', job.get('error')
 assert job['result']['hit_count'] > 0
+for key in ['source', 'stem', 'context_bed', 'target_reconstruction', 'reconstruction', 'midi', 'archive']:
+    assert key in job['result']['file_urls'], key
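
The new assertions pin the set of artifacts that a completed job must expose. The same check can be packaged as a helper that reports every missing key at once, instead of failing on the first one. The sketch below assumes the job payload shape used in the test (`job['result']['file_urls']`). `missing_artifacts` and `EXPECTED_ARTIFACTS` are hypothetical names, and the URLs in the sample payload are made up:

```python
EXPECTED_ARTIFACTS = ("source", "stem", "context_bed", "target_reconstruction", "reconstruction", "midi", "archive")


def missing_artifacts(job: dict, expected=EXPECTED_ARTIFACTS) -> list[str]:
    """Return expected file keys absent from a job's result, in the expected order."""
    file_urls = (job.get("result") or {}).get("file_urls") or {}
    return [key for key in expected if key not in file_urls]


# Illustrative payload: only two artifacts are present.
job = {"status": "complete", "result": {"hit_count": 12, "file_urls": {"stem": "/files/stem.wav", "midi": "/files/reconstruction.mid"}}}
print(missing_artifacts(job))
```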
scripts/test_supervised_export_and_force_onset.py
CHANGED
@@ -89,7 +89,7 @@ def main() -> int:
     assert export["kind"] == "supervised-export"
     assert export["hit_count"] == state["summary"]["hit_count"] - state["summary"].get("suppressed_hit_count", 0)
     assert export["cluster_count"] >= 1
-    for key in ["archive", "midi", "reconstruction"]:
+    for key in ["archive", "midi", "target_reconstruction", "reconstruction"]:
         url = export["file_urls"][key]
         file_response = client.get(url)
         file_response.raise_for_status()
supervised_export.py
CHANGED
|
@@ -123,6 +123,54 @@ def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_16"
|
|
| 123 |
sf.write(path, audio, sr, subtype=subtype)
|
| 124 |
|
| 125 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 126 |
def export_supervised_state(
|
| 127 |
output_dir: str | os.PathLike[str],
|
| 128 |
job_id: str,
|
|
@@ -159,20 +207,39 @@ def export_supervised_state(
|
|
| 159 |
files: dict[str, str] = {}
|
| 160 |
samples: list[dict[str, Any]] = []
|
| 161 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 162 |
midi_path = export_dir / "reconstruction.mid"
|
| 163 |
if clusters:
|
| 164 |
export_midi(clusters, str(midi_path), bpm=bpm, quantize=quantize, subdivision=int(subdivision))
|
| 165 |
-
|
|
|
|
| 166 |
if synthesize:
|
| 167 |
for cluster in clusters:
|
| 168 |
if cluster.count >= 2:
|
| 169 |
cluster.synthesized = synthesize_from_cluster(cluster)
|
| 170 |
else:
|
| 171 |
midi_path.write_bytes(b"")
|
| 172 |
-
|
| 173 |
|
|
|
|
|
|
|
| 174 |
_write_audio(export_dir / "reconstruction.wav", rendered, sr, subtype="PCM_16")
|
| 175 |
files["midi"] = "supervised/reconstruction.mid"
|
|
|
|
| 176 |
files["reconstruction"] = "supervised/reconstruction.wav"
|
| 177 |
|
| 178 |
for cluster in sorted(clusters, key=lambda item: item.count, reverse=True):
|
|
@@ -197,7 +264,14 @@ def export_supervised_state(
|
|
| 197 |
if cluster.synthesized is not None:
|
| 198 |
_write_audio(out / f"supervised/samples/{cluster.label}__synth.wav", cluster.synthesized, sr, subtype="PCM_24")
|
| 199 |
|
| 200 |
-
archive_tmp = build_archive(
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 201 |
archive_rel = "supervised/sample-pack.zip"
|
| 202 |
shutil.copyfile(archive_tmp, out / archive_rel)
|
| 203 |
try:
|
|
|
|
| 123 |
sf.write(path, audio, sr, subtype=subtype)
|
| 124 |
|
| 125 |
|
| 126 |
+
def _mono(audio: np.ndarray) -> np.ndarray:
|
| 127 |
+
audio = np.asarray(audio, dtype=np.float32)
|
| 128 |
+
if audio.ndim > 1:
|
| 129 |
+
audio = audio.mean(axis=1)
|
| 130 |
+
return audio.astype(np.float32)
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
def _pad_or_trim(audio: np.ndarray, length: int) -> np.ndarray:
|
| 134 |
+
audio = _mono(audio)
|
| 135 |
+
if len(audio) == length:
|
| 136 |
+
return audio
|
| 137 |
+
if len(audio) > length:
|
| 138 |
+
return audio[:length]
|
| 139 |
+
return np.pad(audio, (0, max(0, length - len(audio)))).astype(np.float32)
|
| 140 |
+
|
| 141 |
+
|
| 142 |
+
def _rms(audio: np.ndarray) -> float:
|
| 143 |
+
audio = _mono(audio)
|
| 144 |
+
return float(np.sqrt(np.mean(np.square(audio, dtype=np.float64)))) if audio.size else 0.0
|
| 145 |
+
|
| 146 |
+
|
| 147 |
+
def _match_rms(rendered: np.ndarray, reference: np.ndarray, *, min_gain: float = 0.05, max_gain: float = 8.0) -> np.ndarray:
|
| 148 |
+
rendered_rms = _rms(rendered)
|
| 149 |
+
reference_rms = _rms(reference)
|
| 150 |
+
if rendered_rms <= 1e-10 or reference_rms <= 1e-10:
|
| 151 |
+
return _mono(rendered)
|
| 152 |
+
return (_mono(rendered) * float(np.clip(reference_rms / rendered_rms, min_gain, max_gain))).astype(np.float32)
|
| 153 |
+
|
| 154 |
+
|
| 155 |
+
def _soft_limit(audio: np.ndarray, ceiling: float = 0.98) -> np.ndarray:
|
| 156 |
+
audio = _mono(audio).astype(np.float32)
|
| 157 |
+
peak = float(np.max(np.abs(audio))) if audio.size else 0.0
|
| 158 |
+
if peak > ceiling > 0:
|
| 159 |
+
audio = audio * (ceiling / peak)
|
| 160 |
+
return audio.astype(np.float32)
|
| 161 |
+
|
| 162 |
+
|
| 163 |
+
def _read_optional_audio(path: Path) -> tuple[np.ndarray | None, int | None]:
|
| 164 |
+
if not path.exists():
|
| 165 |
+
return None, None
|
| 166 |
+
audio, sr = sf.read(path, dtype="float32", always_2d=False)
|
| 167 |
+
return _mono(audio), int(sr)
|
| 168 |
+
|
| 169 |
+
|
| 170 |
+
def _make_reproduction_mix(target_reconstruction: np.ndarray, context_bed: np.ndarray, length: int) -> np.ndarray:
|
| 171 |
+
return _soft_limit(_pad_or_trim(context_bed, length) + _pad_or_trim(target_reconstruction, length))
|
| 172 |
+
|
| 173 |
+
|
| 174 |
def export_supervised_state(
|
| 175 |
output_dir: str | os.PathLike[str],
|
| 176 |
job_id: str,
|
|
|
|
| 207 |
files: dict[str, str] = {}
|
| 208 |
samples: list[dict[str, Any]] = []
|
| 209 |
|
| 210 |
+
context_bed, context_sr = _read_optional_audio(out / "context_bed.wav")
|
| 211 |
+
source_audio, source_sr = _read_optional_audio(out / "source.wav")
|
| 212 |
+
+    stem_audio, stem_file_sr = _read_optional_audio(out / "stem.wav")
+    if context_sr:
+        sr = int(context_sr)
+    elif source_sr:
+        sr = int(source_sr)
+    elif stem_file_sr:
+        sr = int(stem_file_sr)
+    source_length = len(source_audio) if source_audio is not None else max((len(context_bed) if context_bed is not None else 0), sr)
+    if context_bed is None:
+        context_bed = np.zeros(source_length, dtype=np.float32)
+    if stem_audio is None:
+        stem_audio = np.zeros(source_length, dtype=np.float32)
+
     midi_path = export_dir / "reconstruction.mid"
     if clusters:
         export_midi(clusters, str(midi_path), bpm=bpm, quantize=quantize, subdivision=int(subdivision))
+        target_rendered = render_midi_with_samples(clusters, sr=sr)
+        target_rendered = _match_rms(target_rendered, stem_audio)
         if synthesize:
             for cluster in clusters:
                 if cluster.count >= 2:
                     cluster.synthesized = synthesize_from_cluster(cluster)
     else:
         midi_path.write_bytes(b"")
+        target_rendered = np.zeros(source_length, dtype=np.float32)

+    rendered = _make_reproduction_mix(target_rendered, context_bed, max(source_length, len(target_rendered)))
+    _write_audio(export_dir / "target_reconstruction.wav", _soft_limit(target_rendered), sr, subtype="PCM_16")
     _write_audio(export_dir / "reconstruction.wav", rendered, sr, subtype="PCM_16")
     files["midi"] = "supervised/reconstruction.mid"
+    files["target_reconstruction"] = "supervised/target_reconstruction.wav"
     files["reconstruction"] = "supervised/reconstruction.wav"

     for cluster in sorted(clusters, key=lambda item: item.count, reverse=True):

         if cluster.synthesized is not None:
             _write_audio(out / f"supervised/samples/{cluster.label}__synth.wav", cluster.synthesized, sr, subtype="PCM_24")

+    archive_tmp = build_archive(
+        clusters,
+        bpm,
+        sr,
+        midi_path=str(midi_path),
+        rendered_audio=rendered,
+        target_rendered_audio=target_rendered,
+    )
     archive_rel = "supervised/sample-pack.zip"
     shutil.copyfile(archive_tmp, out / archive_rel)
     try:
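The export hunk above picks the sample rate from the first available of `context_sr`, `source_sr`, and the stem file's rate, sizes the output from the source (or the context bed, at least one second long), and builds the reproduced mix from the rendered target plus the non-target context bed. The body of `_make_reproduction_mix` is not part of this diff, so the sketch below assumes it zero-pads or trims both signals to a common length and sums them. `pick_sample_rate`, `reference_length`, `make_reproduction_mix`, and the `fallback_sr` parameter are hypothetical names used only for illustration; NumPy is assumed.

```python
import numpy as np


def pick_sample_rate(context_sr, source_sr, stem_file_sr, fallback_sr):
    """Same precedence as the diff: context bed rate, then source rate, then stem file rate.

    `fallback_sr` is hypothetical; in the diff, `sr` simply keeps whatever value
    it had before these checks when all three candidates are falsy.
    """
    for candidate in (context_sr, source_sr, stem_file_sr):
        if candidate:
            return int(candidate)
    return int(fallback_sr)


def reference_length(source_audio, context_bed, sr):
    """Same expression as `source_length` in the diff.

    Uses the source length when available, otherwise the context bed length,
    but never fewer than `sr` samples (one second of audio).
    """
    if source_audio is not None:
        return len(source_audio)
    return max(len(context_bed) if context_bed is not None else 0, sr)


def make_reproduction_mix(target, context_bed, length):
    """Hypothetical stand-in for `_make_reproduction_mix`.

    Zero-pads or trims both signals to `length` samples and sums them.
    The real helper may scale, limit, or align the signals differently.
    """
    mix = np.zeros(length, dtype=np.float32)
    n_target = min(len(target), length)
    n_bed = min(len(context_bed), length)
    mix[:n_target] += target[:n_target]
    mix[:n_bed] += context_bed[:n_bed]
    return mix


if __name__ == "__main__":
    sr = pick_sample_rate(None, 44100, 48000, fallback_sr=22050)
    bed = np.full(6, 0.5, dtype=np.float32)
    target = np.ones(4, dtype=np.float32)
    length = max(reference_length(None, bed, 4), len(target))
    print(sr, make_reproduction_mix(target, bed, length))
```

Summing the target with the context bed is what turns "target reconstruction" into "reproduced mix"; keeping the two outputs separate (as the diff does with `target_reconstruction.wav` and `reconstruction.wav`) lets a user audition the resynthesized drums alone or in context.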
web/app.js
CHANGED
@@ -17,6 +17,7 @@ let activeJobId = null;
 let selectedHitIndex = null;
 let selectedSampleIndex = null;
 let forceOnsetMode = false;
+let previewMode = "source";
 let audioContext = null;

 const palette = ["#9b72ef", "#4f7df2", "#42b8b4", "#ef9343", "#ea5ca9", "#6d9be8", "#8abc59", "#805fe6"];
@@ -58,9 +59,24 @@ function fmtClock(seconds) {
 }

 function transportAudio() {
-  const
-
-
+  const candidates = {
+    source: $("sourcePreview"),
+    stem: $("stemAudio"),
+    reproduction: $("reconAudio"),
+  };
+  const selected = candidates[previewMode];
+  if (selected?.src) return selected;
+  return candidates.reproduction?.src || candidates.source?.src ? (candidates.reproduction?.src ? candidates.reproduction : candidates.source) : candidates.stem;
+}
+
+function setPreviewMode(mode) {
+  const previous = transportAudio();
+  if (previous && !previous.paused) previous.pause();
+  previewMode = mode;
+  for (const button of document.querySelectorAll("[data-preview-mode]")) {
+    button.classList.toggle("active", button.dataset.previewMode === previewMode);
+  }
+  updateTransport();
 }

 function updateTransport() {
@@ -78,9 +94,10 @@ function updateTransport() {
 }

 function pauseNonTransportAudio() {
-
+  const keep = transportAudio();
+  for (const id of ["sourcePreview", "stemAudio", "reconAudio", "hitAudio", "sampleAudio"]) {
     const el = $(id);
-    if (el && !el.paused) el.pause();
+    if (el && el !== keep && !el.paused) el.pause();
   }
 }

@@ -683,7 +700,7 @@ async function undoLastEdit() {

 function renderEditedExport(exportPayload) {
   const fileUrls = exportPayload?.file_urls ?? {};
-  const labels = { archive: "Edited sample pack ZIP", midi: "Edited MIDI", reconstruction: "Edited reconstruction WAV" };
+  const labels = { archive: "Edited sample pack ZIP", midi: "Edited MIDI", reconstruction: "Edited reproduced mix WAV", target_reconstruction: "Edited target reconstruction WAV" };
   $("editedDownloads").innerHTML = Object.entries(fileUrls)
     .map(([key, url]) => `<a href="${esc(url)}" download>${esc(labels[key] ?? key)}</a>`)
     .join("");
@@ -729,11 +746,25 @@ function renderResult(job) {
   $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime · ${mode}`;

   const fileUrls = result.file_urls ?? {};
-  const labels = {
-
+  const labels = {
+    archive: "Sample pack ZIP",
+    midi: "MIDI",
+    source: "Source mix WAV",
+    stem: "Target stem WAV",
+    context_bed: "Non-target stems WAV",
+    target_reconstruction: "Target reconstruction WAV",
+    reconstruction: "Reproduced mix WAV",
+  };
+  const downloadOrder = ["archive", "reconstruction", "target_reconstruction", "midi", "source", "stem", "context_bed"];
+  $("downloads").innerHTML = downloadOrder
+    .filter((key) => fileUrls[key])
+    .map((key) => `<a href="${esc(fileUrls[key])}" download>${esc(labels[key] ?? key)}</a>`)
+    .join("");
+  $("sourcePreview").src = fileUrls.source ?? $("sourcePreview").src ?? "";
   $("stemAudio").src = fileUrls.stem ?? "";
   $("reconAudio").src = fileUrls.reconstruction ?? "";
-
+  if (fileUrls.reconstruction) setPreviewMode("reproduction");
+  else updateTransport();

   renderSamples(result);
   renderHits(result);
@@ -870,8 +901,9 @@ function setFile(file) {
   }
   if (file) {
     $("stemAudio").removeAttribute("src");
+    $("reconAudio").removeAttribute("src");
     $("sourcePreview").src = URL.createObjectURL(file);
-
+    setPreviewMode("source");
   }
 }

@@ -910,12 +942,6 @@ async function boot() {
 $("demucs_model").addEventListener("change", updateStemOptions);
 $("fileInput").addEventListener("change", (event) => setFile(event.target.files?.[0] ?? null));
 $("runButton").addEventListener("click", runExtraction);
-$("useFastButton").addEventListener("click", () => {
-  $("stem").value = "all";
-  $("demucs_shifts").value = 0;
-  $("target_min").value = 4;
-  $("target_max").value = 16;
-});
 $("usePreviewButton").addEventListener("click", () => {
   $("stem").value = "all";
   $("clustering_mode").value = "online_preview";
@@ -924,6 +950,18 @@ $("usePreviewButton").addEventListener("click", () => {
   $("target_max").value = 16;
   $("mel_threshold").value = 0.62;
   $("ncc_threshold").value = 0.72;
+  $("resultSummary").textContent = "Fast preview preset applied: full mix, online grouping, no Demucs shifts.";
+});
+$("useQualityButton").addEventListener("click", () => {
+  if (($("stem").value || "") === "all") $("stem").value = "drums";
+  $("clustering_mode").value = "batch_quality";
+  $("demucs_shifts").value = 1;
+  $("demucs_overlap").value = 0.25;
+  $("target_min").value = 5;
+  $("target_max").value = 20;
+  $("mel_threshold").value = 0.75;
+  $("ncc_threshold").value = 0.80;
+  $("resultSummary").textContent = "Best quality preset applied: separated stem, batch clustering, conservative grouping.";
 });

 for (const [id, delta] of [["clusterMinusButton", -1], ["clusterPlusButton", 1]]) {
@@ -963,7 +1001,7 @@ $("targetClusterSelect").addEventListener("change", setActionButtons);
 $("waveform").addEventListener("click", selectNearestWaveformHit);
 $("transportPlayButton").addEventListener("click", () => { toggleTransportPlayback().catch(() => {}); });
 $("transportSeek").addEventListener("input", (event) => seekTransport(event.target.value));
-for (const id of ["sourcePreview", "stemAudio"]) {
+for (const id of ["sourcePreview", "stemAudio", "reconAudio"]) {
   const audio = $(id);
   audio.addEventListener("timeupdate", updateTransport);
   audio.addEventListener("durationchange", updateTransport);
@@ -971,6 +1009,9 @@ for (const id of ["sourcePreview", "stemAudio"]) {
   audio.addEventListener("pause", updateTransport);
   audio.addEventListener("ended", updateTransport);
 }
+for (const button of document.querySelectorAll("[data-preview-mode]")) {
+  button.addEventListener("click", () => setPreviewMode(button.dataset.previewMode));
+}

 const dropzone = $("dropzone");
 const globalDropOverlay = $("globalDropOverlay");
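The new `transportAudio()` drives the transport from the selected preview tab when that player has a `src`, and otherwise falls back to the reproduced mix, then the source, then the stem; `renderResult()` now lists downloads in a fixed order and skips missing files. The Python sketch below mirrors both rules so they can be checked in isolation. It assumes each player is represented only by its `src` string (`""` when unset); `pick_transport` and `download_links` are hypothetical names, while the labels and order are copied from the diff.

```python
# Labels and ordering copied from renderResult() in web/app.js.
DOWNLOAD_ORDER = ["archive", "reconstruction", "target_reconstruction", "midi", "source", "stem", "context_bed"]
LABELS = {
    "archive": "Sample pack ZIP",
    "midi": "MIDI",
    "source": "Source mix WAV",
    "stem": "Target stem WAV",
    "context_bed": "Non-target stems WAV",
    "target_reconstruction": "Target reconstruction WAV",
    "reconstruction": "Reproduced mix WAV",
}


def pick_transport(preview_mode, srcs):
    """Return which player the transport drives: "source", "stem", or "reproduction".

    `srcs` maps each player to its current `src` string ("" when unset),
    standing in for the `candidates` object in transportAudio().
    """
    if srcs.get(preview_mode):
        return preview_mode  # the selected tab has audio loaded
    if srcs.get("reproduction"):
        return "reproduction"
    if srcs.get("source"):
        return "source"
    return "stem"  # last resort, even if the stem player has no src either


def download_links(file_urls):
    """(label, url) pairs in DOWNLOAD_ORDER; missing, empty, or unlisted keys are skipped."""
    return [(LABELS.get(key, key), file_urls[key]) for key in DOWNLOAD_ORDER if file_urls.get(key)]


if __name__ == "__main__":
    print(pick_transport("reproduction", {"source": "mix.wav", "stem": "", "reproduction": ""}))
    print(download_links({"stem": "/stem.wav", "archive": "/pack.zip"}))
```

One consequence of the fixed order: unlike `renderEditedExport()`, which iterates `Object.entries(fileUrls)`, `renderResult()` silently omits any file key that is not in `downloadOrder`.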
web/index.html
CHANGED
@@ -88,6 +88,11 @@
     <button id="transportPlayButton" class="round-play" type="button" aria-label="Play preview">▶</button>
     <span id="transportTime" class="transport-time">0:00 / 0:00</span>
     <input id="transportSeek" class="transport-seek" type="range" min="0" max="1000" value="0" step="1" aria-label="Seek preview" />
+    <div class="preview-tabs" aria-label="Preview source">
+      <button id="previewSourceButton" class="preview-tab active" type="button" data-preview-mode="source">Source</button>
+      <button id="previewStemButton" class="preview-tab" type="button" data-preview-mode="stem">Stem</button>
+      <button id="previewReproductionButton" class="preview-tab" type="button" data-preview-mode="reproduction">Reproduced</button>
+    </div>
   </div>
   <div class="hidden-audio-bank" aria-hidden="true">
     <audio id="sourcePreview"></audio>
@@ -109,26 +114,35 @@

   <aside class="sidebar right-sidebar" aria-label="Right tool sidebar">
     <details class="tool-panel control-card" open>
-      <summary><span>
+      <summary><span>Common controls</span><small>The 3 knobs to try first</small></summary>
+      <p class="panel-help">Start here. These controls decide what gets detected, how many groups are produced, and which stem is sampled.</p>
       <div class="control-group">
-        <label>Stem
+        <label>Stem to sample
           <select id="stem"></select>
+          <small class="field-hint">Use <strong>drums</strong> for separated drum hits, or <strong>all</strong> for fast full-mix prototyping.</small>
         </label>
       </div>

       <div class="control-group sensitivity-group">
-        <label for="onset_delta">
+        <label for="onset_delta">Hit sensitivity</label>
         <input id="onset_delta" type="range" min="0.01" max="0.35" step="0.005" />
-        <div class="range-caption"><span>
+        <div class="range-caption"><span>Fewer hits</span><span>More hits</span></div>
+        <small class="field-hint">Increase when quiet hits are missed; decrease when bleed or ghost transients are over-detected.</small>
       </div>

       <div class="control-group">
-        <label>
-
-
-
-
-
+        <label>Sample groups
+          <div class="stepper">
+            <button id="clusterMinusButton" type="button" class="step-button" aria-label="Decrease cluster count">−</button>
+            <input id="target_max" type="number" min="0" max="256" step="1" aria-label="Sample group count" />
+            <button id="clusterPlusButton" type="button" class="step-button" aria-label="Increase cluster count">+</button>
+          </div>
+          <small class="field-hint">Approximate maximum number of sample cards to produce.</small>
+        </label>
+      </div>
+      <div class="preset-row common-presets">
+        <button id="usePreviewButton" class="ghost-button" type="button">Fast preview</button>
+        <button id="useQualityButton" class="ghost-button" type="button">Best quality</button>
       </div>
     </details>

@@ -140,84 +154,99 @@
     </details>

     <details class="tool-panel advanced-controls">
-      <summary><span>Advanced</span><small>
-<
-
-<
-
-
-
-<
-
-
-<
-
-<
-</
-</
-
-
-</
-<
-<
-
-
-
-
-
-<
-
-<
-
-
-<
-
-
-<
-
-
-<
-
-
-<
-
-
-
-
-
-
-<
-
-
-
-
-
-
-
-
-
-
-<
-</
-
-
-<
-
-
-<
-
-<
-
-
-
-
-
-
-
-
-<
-
-
+      <summary><span>Advanced parameters</span><small>Only adjust when the common controls are not enough</small></summary>
+      <p class="panel-help">Advanced controls are grouped by pipeline stage. They are intentionally hidden from the normal extraction loop.</p>
+      <section class="advanced-section">
+        <h4>Stem separation</h4>
+        <div class="control-grid compact-controls">
+          <label>Demucs model
+            <select id="demucs_model"></select>
+          </label>
+          <label>Shifts
+            <input id="demucs_shifts" type="number" min="0" max="8" step="1" />
+          </label>
+          <label>Overlap
+            <input id="demucs_overlap" type="number" min="0" max="0.9" step="0.05" />
+          </label>
+        </div>
+      </section>
+      <section class="advanced-section">
+        <h4>Hit detection</h4>
+        <div class="control-grid compact-controls">
+          <label>Onset mode
+            <select id="onset_mode">
+              <option value="auto">auto / multiband</option>
+              <option value="percussive">percussive</option>
+              <option value="harmonic">harmonic</option>
+              <option value="broadband">broadband</option>
+            </select>
+          </label>
+          <label>Energy threshold dB
+            <input id="energy_threshold_db" type="number" min="-100" max="0" step="1" />
+          </label>
+          <label>Minimum gap seconds
+            <input id="min_gap" type="number" min="0.001" max="1" step="0.005" />
+          </label>
+          <label>Pre-pad seconds
+            <input id="pre_pad" type="number" min="0" max="0.25" step="0.001" />
+          </label>
+          <label>Min duration seconds
+            <input id="min_dur" type="number" min="0.001" max="10" step="0.005" />
+          </label>
+          <label>Max duration seconds
+            <input id="max_dur" type="number" min="0.01" max="10" step="0.1" />
+          </label>
+        </div>
+      </section>
+      <section class="advanced-section">
+        <h4>Grouping</h4>
+        <div class="control-grid compact-controls">
+          <label>Clustering mode
+            <select id="clustering_mode">
+              <option value="batch_quality">batch quality</option>
+              <option value="online_preview">online preview</option>
+            </select>
+          </label>
+          <label>Target min clusters
+            <input id="target_min" type="number" min="0" max="256" step="1" />
+          </label>
+          <label>NCC threshold
+            <input id="ncc_threshold" type="number" min="0" max="1" step="0.01" />
+          </label>
+          <label>Attack window ms
+            <input id="attack_ms" type="number" min="1" max="250" step="1" />
+          </label>
+          <label>Mel prefilter
+            <input id="mel_threshold" type="number" min="0" max="1" step="0.01" />
+          </label>
+          <label>Linkage
+            <select id="linkage">
+              <option value="average">average</option>
+              <option value="complete">complete</option>
+              <option value="single">single</option>
+            </select>
+          </label>
+        </div>
+      </section>
+      <section class="advanced-section">
+        <h4>Export and cache</h4>
+        <div class="control-grid compact-controls">
+          <label>MIDI grid
+            <select id="subdivision">
+              <option value="8">8th</option>
+              <option value="16">16th</option>
+              <option value="32">32nd</option>
+              <option value="64">64th</option>
+            </select>
+          </label>
+        </div>
+        <div class="toggles">
+          <label><input id="synthesize" type="checkbox" /> synthesize alternates</label>
+          <label><input id="quantize_midi" type="checkbox" /> quantize MIDI</label>
+          <label><input id="use_disk_cache" type="checkbox" /> disk cache stems/source loads</label>
+        </div>
+        <button id="clearCacheButton" class="ghost-button full-width" type="button">Clear cache</button>
+      </section>
     </details>
   </aside>
 </main>
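The two preset buttons now live in the common-controls panel, and their handlers in `web/app.js` overwrite specific form fields, while each numeric input declares its own `min`/`max`. The sketch below models the form state as a plain dict: the preset values and numeric bounds are copied from the diff, but `apply_fast_preview`, `apply_best_quality`, and `clamp_to_bounds` are hypothetical helpers, and the clamping step is an assumed validation pass (the diff does not show whether the backend re-checks these bounds). Two lines of the `usePreviewButton` handler fall outside the shown hunks and are therefore omitted.

```python
# Numeric bounds copied from the min/max attributes of the inputs in web/index.html.
BOUNDS = {
    "onset_delta": (0.01, 0.35),
    "demucs_shifts": (0, 8),
    "demucs_overlap": (0.0, 0.9),
    "energy_threshold_db": (-100, 0),
    "min_gap": (0.001, 1.0),
    "pre_pad": (0.0, 0.25),
    "min_dur": (0.001, 10.0),
    "max_dur": (0.01, 10.0),
    "target_min": (0, 256),
    "target_max": (0, 256),
    "ncc_threshold": (0.0, 1.0),
    "attack_ms": (1, 250),
    "mel_threshold": (0.0, 1.0),
}


def apply_fast_preview(params):
    """Fields set by the usePreviewButton handler that are visible in the diff."""
    out = dict(params)
    out.update(stem="all", clustering_mode="online_preview", target_max=16,
               mel_threshold=0.62, ncc_threshold=0.72)
    return out


def apply_best_quality(params):
    """Fields set by the new useQualityButton handler."""
    out = dict(params)
    if (out.get("stem") or "") == "all":
        out["stem"] = "drums"  # only replace the full mix; keep any other chosen stem
    out.update(clustering_mode="batch_quality", demucs_shifts=1, demucs_overlap=0.25,
               target_min=5, target_max=20, mel_threshold=0.75, ncc_threshold=0.80)
    return out


def clamp_to_bounds(params):
    """Hypothetical validation pass: clamp known numeric fields, leave the rest untouched."""
    out = dict(params)
    for key, (low, high) in BOUNDS.items():
        if out.get(key) is not None:
            out[key] = min(max(out[key], low), high)
    return out


if __name__ == "__main__":
    print(clamp_to_bounds(apply_best_quality({"stem": "all", "onset_delta": 0.5})))
```

Note that "Best quality" deliberately leaves a non-`all` stem selection alone, so a user who picked `bass` keeps it while still getting the conservative grouping thresholds.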
web/styles.css
CHANGED
@@ -857,3 +857,75 @@ tr:last-child td { border-bottom: 0; }
   box-shadow: none;
 }
 .transport-row .transport-seek:hover::-moz-range-thumb { opacity: 1; }
+
+/* Pass 9: clearer preview and parameter hierarchy. */
+.transport-row {
+  grid-template-columns: 48px 100px minmax(120px, 1fr) auto;
+  gap: 14px;
+}
+.preview-tabs {
+  display: inline-flex;
+  align-items: center;
+  gap: 4px;
+  padding: 4px;
+  border: 1px solid var(--line);
+  border-radius: 999px;
+  background: #f7f7fa;
+  white-space: nowrap;
+}
+.preview-tab {
+  border: 0;
+  border-radius: 999px;
+  padding: 7px 10px;
+  background: transparent;
+  color: var(--muted);
+  font-size: 11px;
+  font-weight: 800;
+  cursor: pointer;
+}
+.preview-tab.active {
+  background: #fff;
+  color: var(--accent-strong);
+  box-shadow: 0 2px 8px rgba(18, 21, 30, .06);
+}
+.panel-help,
+.field-hint {
+  display: block;
+  color: var(--muted);
+  font-size: 11px;
+  line-height: 1.4;
+  font-weight: 560;
+}
+.panel-help {
+  margin: 10px 0 4px;
+}
+.field-hint {
+  margin-top: 7px;
+}
+.common-presets {
+  grid-template-columns: 1fr 1fr;
+}
+.advanced-section {
+  margin-top: 14px;
+  padding-top: 12px;
+  border-top: 1px solid rgba(228, 229, 233, .75);
+}
+.advanced-section:first-of-type {
+  border-top: 0;
+  padding-top: 0;
+}
+.advanced-section h4 {
+  margin: 0 0 8px;
+  color: var(--text);
+  font-size: 12px;
+  letter-spacing: -.01em;
+}
+@media (max-width: 920px) {
+  .transport-row {
+    grid-template-columns: 44px 86px minmax(0, 1fr);
+  }
+  .preview-tabs {
+    grid-column: 1 / -1;
+    justify-content: center;
+  }
+}