---
title: Drum Sample Extractor
emoji: 🥁
colorFrom: gray
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
---
# Drum Sample Extractor
A custom FastAPI + browser workstation for extracting, reviewing, and semantically supervising reusable drum samples from an audio file.
The pipeline defaults to Spleeter as the lightweight source-separation backend when it is available, falls back to full-mix processing when optional separation dependencies are missing, and keeps Demucs as an explicit higher-quality backend. It then detects onsets, classifies hits, clusters similar transients, chooses representative samples, and optionally synthesizes alternate samples. Exports include WAVs, MIDI, target-stem reconstruction, full-context reproduced audio, manifests, selected-only packs, and complete ZIP sample packs. The interactive layer stores user corrections as replayable semantic state beside each run manifest.
## Current status
The project is usable as a local/Hugging Face Space application. Gradio is no longer the active UI; the active app is a custom FastAPI backend plus a no-build browser frontend.
Implemented:
- Custom web frontend in `web/`, served by `app.py`.
- FastAPI job API with upload, polling, safe artifact downloads, config, health, cache clearing, run history, and SSE progress.
- Timed pipeline runner in `pipeline_runner.py`.
- Per-stage timing in every `manifest.json`.
- Two clustering modes:
  - `batch_quality`: all-pairs mel/NCC similarity plus agglomerative clustering.
  - `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
- Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
- Run history panel indexing `.runs/*/output/manifest.json`.
- Individual review WAVs for every detected hit under `review/hits/`.
- Click-to-audition workflow for waveform onsets, detected hit rows, and representative sample rows.
- Interactive supervised state in `supervised_state.py`:
  - persisted `supervision_state.json`,
  - hit/cluster confidence,
  - outlier-first review queue,
  - constraints,
  - event log,
  - suggestions,
  - undo stack.
- Clean, fixed, non-scrolling workstation UI: explicit top-bar upload button, whole-app drag/drop overlay, collapsed left/right/bottom tool panels by default, large center waveform/sample workspace, bottom dock for review/edit tools, and an explicit Start here flow.
- Immediate browser-side waveform rendering on file selection, before backend extraction starts.
- Waveform-based real progress visualization during extraction using backend `progress` events; no ETA or time-progress guessing.
- Visible API/runtime error banner in the UI, plus backend coercion for browser-form parameter values such as `subdivision="16"`.
- Supervision UI:
  - selected-hit actions,
  - move hit to cluster,
  - pull hit into a new cluster,
  - accept/favorite hit,
  - suppress hit as bleed,
  - lock/unlock cluster,
  - suggestion inbox with exact diff previews,
  - cluster explanation drawer,
  - force-onset waveform mode,
  - restore suppressed hits,
  - edited sample-pack export,
  - constraint/event log.
- Spleeter source-separation backend selected by default, with `spleeter:4stems`, `spleeter:2stems`, and `spleeter:5stems` support.
- Optional Demucs backend for explicit higher-quality separation; Spleeter failures now fall back to full-mix processing when fallback is enabled.
- True per-card checkbox selection and selected-only export under `selected/`.
- Persisted `draw another` card action that pins the next representative hit for the cluster.
- Immediate trim/extend card edits that rewrite preview WAVs under `overrides/hits/` and persist to supervised state.
- Documentation for features, progress, tasks, API, timing, hit review, realtime suitability, UI, remaining work, and interactive UX.
- Legacy Gradio apps preserved in `legacy/` for reference only.
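The disk cache listed above is described as keyed by source digest and extraction settings. A minimal sketch of one way such a key could be derived, assuming SHA-256 over the file bytes and a canonical JSON encoding of the settings. The function name `cache_key` and the key layout are illustrative and are not taken from `pipeline_runner.py`:

```python
import hashlib
import json


def cache_key(source_bytes: bytes, settings: dict) -> str:
    """Hypothetical cache key: digest of the audio bytes plus the extraction settings."""
    source_digest = hashlib.sha256(source_bytes).hexdigest()
    # sort_keys makes the encoding independent of dict insertion order,
    # so equal settings always produce the same key.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    settings_digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{source_digest[:16]}-{settings_digest[:16]}"


key_a = cache_key(b"fake audio", {"separation_backend": "spleeter", "stem": "drums"})
key_b = cache_key(b"fake audio", {"stem": "drums", "separation_backend": "spleeter"})
key_c = cache_key(b"fake audio", {"separation_backend": "none", "stem": "all"})
```

Here `key_a` and `key_b` match because only the key order differs, while `key_c` differs because the settings changed, so a cached stem would not be reused for a different backend.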
Not fully complete yet:
- No true cached feature-vector local reclustering yet.
- No cluster merge/split/relabel workflow beyond move/pull-to-new-cluster.
- No frontend TypeScript build/test harness yet.
- Spleeter progress is coarse-grained; Demucs progress exposes chunk-level work where available.
- Demucs remains offline/batch by design and is treated as the higher-cost explicit quality backend.
See:
- `docs/FEATURES.md`
- `docs/TASKS.md`
- `docs/PROGRESS.md`
- `docs/API.md`
- `docs/interactive-ux/README.md`
- `docs/REMAINING_WORK.md`
- `docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md`
- `docs/FIXED_WORKSTATION_UI.md`
- `docs/REPRODUCED_AUDIO_AND_PARAMETERS.md`
- `docs/CLEAN_DEFAULT_UI.md`
- `docs/IMMEDIATE_WAVEFORM_AND_REAL_PROGRESS.md`
- `docs/API_ERRORS_AND_PARAMETER_VALIDATION.md`
- `docs/UPLOAD_ERROR_AND_RUNTIME_FALLBACK.md`
## Run locally
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
```
Open `http://127.0.0.1:7860`.
For fast iteration, use the default automatic flow. To bypass source separation entirely, open `Advanced` and either use `Fast preview` or set:
- `Separation engine = none`
- `Stem = all`
- `Clustering mode = online_preview`
That uses the full mix and the near-realtime clustering path. The default engine is Spleeter. Install it separately with `pip install -r requirements-spleeter.txt` in an environment compatible with Spleeter/TensorFlow. If Spleeter is unavailable and fallback is enabled, the app falls back to full-mix processing so the UI still works. Choose Demucs explicitly under Expert controls for slower quality separation.
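The same fast-preview settings can probably be sent through the API as the `params` form field. The sketch below builds that JSON string. The key names mirror the `params` JSON in this README's API example, but the value `"none"` for `separation_backend` is an assumption inferred from the `Separation engine = none` option, and `params_form_field` is a hypothetical helper:

```python
import json

# Keys mirror the `params` JSON in this README's API example; the value
# "none" for separation_backend is an assumption based on the
# "Separation engine = none" UI option.
FAST_PREVIEW_PARAMS = {
    "separation_backend": "none",
    "stem": "all",
    "clustering_mode": "online_preview",
}


def params_form_field(overrides=None):
    """Return the JSON string to send as the `params` multipart form field."""
    params = dict(FAST_PREVIEW_PARAMS)
    params.update(overrides or {})
    return json.dumps(params)


field = params_form_field({"target_min": 4, "target_max": 12})
print(field)
```

Check `GET /api/config` or `docs/API.md` for the accepted values before relying on these names.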
## Run checks
```bash
python3 -m py_compile app.py pipeline_runner.py sample_extractor.py supervised_state.py supervised_export.py scripts/*.py
node --check web/app.js
python3 scripts/test_sse_and_review_hits.py
python3 scripts/test_interactive_supervision.py
python3 scripts/test_supervised_export_and_force_onset.py
python3 scripts/test_progress_contract.py
python3 scripts/test_param_validation_and_api_errors.py
python3 scripts/test_selected_export_card_actions.py
```
## Run benchmarks
```bash
python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchmark-subprocesses.json
```
The benchmark uses synthetic drum fixtures and `stem=all`, so the DSP stages are measured without source-separation model download or runtime noise.
## API example
```bash
curl http://127.0.0.1:7860/api/config
curl -F 'file=@song.wav' \
-F 'params={"separation_backend":"spleeter","spleeter_model":"spleeter:4stems","stem":"drums","clustering_mode":"online_preview","target_min":4,"target_max":12}' \
http://127.0.0.1:7860/api/jobs
```
Then poll the returned job id:
```bash
curl http://127.0.0.1:7860/api/jobs/<job-id>
```
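For scripted use, the polling step could be wrapped in a small loop. This is a sketch under stated assumptions: the README does not document the job JSON, so the `status` field and the terminal values `done` and `error` are hypothetical placeholders to be replaced with whatever `docs/API.md` specifies. The `fetch` argument is injectable so the loop can be exercised without a running server:

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:7860"
# Assumption: the job JSON has a "status" field and these values are terminal.
TERMINAL_STATUSES = {"done", "error"}


def fetch_job(job_id, base_url=BASE_URL):
    """GET /api/jobs/<job-id> and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/jobs/{job_id}") as response:
        return json.load(response)


def wait_for_job(job_id, fetch=fetch_job, interval_s=1.0, max_polls=600):
    """Poll until the job reports a terminal status or max_polls is reached."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Because the backend also exposes SSE progress, polling like this is mainly useful for simple scripts that only need the final result.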
Read supervised state:
```bash
curl http://127.0.0.1:7860/api/jobs/<job-id>/state
```
Move a hit into a target cluster:
```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/hits/hit%3A00003/move \
-H 'Content-Type: application/json' \
-d '{"target_cluster_id":"cluster:0"}'
```
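Note that the hit id `hit:00003` appears percent-encoded as `hit%3A00003` in the URL above. When building such URLs from a script, the encoding can be done with the standard library, as in this sketch; `move_hit_url` and the job id `job123` are illustrative names only:

```python
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:7860"


def move_hit_url(job_id: str, hit_id: str, base_url: str = BASE_URL) -> str:
    """Build the move endpoint URL, percent-encoding each path segment."""
    # safe="" also encodes "/" and ":" so the id stays a single path segment.
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/hits/{quote(hit_id, safe='')}/move"


url = move_hit_url("job123", "hit:00003")
print(url)
```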
Export selected cards only:
```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/export-selected \
-H 'Content-Type: application/json' \
-d '{"labels":["kick_0","snare_0"],"synthesize":true}'
```
Draw another representative for a card:
```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/draw
```
Trim/extend the current representative preview:
```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/edit \
-H 'Content-Type: application/json' \
-d '{"start_offset_ms":-8,"tail_offset_ms":24}'
```
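To make the millisecond offsets concrete, the sketch below converts them into a sample window. It assumes that a negative `start_offset_ms` moves the start earlier and a positive `tail_offset_ms` extends the end, which matches the "trim/extend" wording but is not confirmed by this README. The helper `edited_window` and the 48 kHz sample rate are hypothetical:

```python
def edited_window(start_sample, end_sample, start_offset_ms, tail_offset_ms,
                  sample_rate, total_samples):
    """Apply millisecond offsets to a [start, end) sample window (hypothetical helper)."""
    new_start = start_sample + round(start_offset_ms * sample_rate / 1000)
    new_end = end_sample + round(tail_offset_ms * sample_rate / 1000)
    # Clamp to the audio bounds and keep at least one sample.
    new_start = max(0, min(new_start, total_samples - 1))
    new_end = max(new_start + 1, min(new_end, total_samples))
    return new_start, new_end


# A hit spanning samples 48000..52800 at 48 kHz, edited with the offsets above.
window = edited_window(48000, 52800, -8, 24, sample_rate=48000, total_samples=480000)
print(window)
```

With these assumptions, -8 ms corresponds to 384 samples earlier and +24 ms to 1152 samples later at 48 kHz.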
List active/completed runs:
```bash
curl http://127.0.0.1:7860/api/jobs
```
## Important files
| Path | Purpose |
|---|---|
| `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads, supervised editing endpoints |
| `pipeline_runner.py` | Timed extraction pipeline, Spleeter/Demucs/none separation backends, real progress contract, disk source/stem/context cache, batch/online clustering routing |
| `sample_extractor.py` | Core DSP/sample extraction implementation, including chunk-progress callback support for Demucs stem extraction |
| `supervised_state.py` | Persistent semantic state, confidence, constraints, events, suggestions, force-onset, restore, undo |
| `supervised_export.py` | Renders edited semantic state into supervised and selected-only WAV/MIDI/reconstruction/ZIP artifacts |
| `web/` | Custom no-build browser frontend with clean fixed non-scrolling workstation layout, explicit upload/whole-page drag-drop, immediate uploaded waveform rendering, real-progress waveform tinting, source/stem/reproduced preview transport, common/advanced parameter separation, collapsed sidebars/bottom dock, sample-card grid, hidden-audio audition, add-onset mode, and edited export |
| `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
| `scripts/test_interactive_supervision.py` | Smoke test for supervised state endpoints |
| `scripts/test_supervised_export_and_force_onset.py` | Smoke test for force-onset, restore, suggestion diffs, and edited exports |
| `scripts/test_param_validation_and_api_errors.py` | Regression test for browser-style parameter coercion and visible API error details |
| `scripts/test_selected_export_card_actions.py` | Smoke test for selected-only export, draw-next persistence, and immediate preview timing edits |
| `docs/interactive-ux/` | Supplied interactive UX docs aligned to current implementation |
| `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
| `legacy/` | Previous Gradio apps retained for reference |
## Optional Spleeter backend
Spleeter is the default selected backend because it is much lighter than Demucs for the common path. It is not pinned into `requirements.txt` because TensorFlow/Spleeter compatibility depends on the Python environment. Use:
```bash
pip install -r requirements-spleeter.txt
```
Leave `allow_backend_fallback=true` for normal use so that missing or failing Spleeter installs automatically fall back to full-mix processing. Disable fallback only when debugging Spleeter itself.
## Output per run
Each run is stored under `.runs/<job-id>/output/`:
- `stem.wav`
- `reconstruction.wav`
- `reconstruction.mid`
- `sample-pack.zip`
- `samples/*.wav`
- `review/hits/*.wav`
- `manifest.json`
- `supervision_state.json`
- `supervised/manifest.json` after edited export
- `supervised/sample-pack.zip` after edited export
- `selected/sample-pack.zip` after selected-card export
- `overrides/hits/*.wav` after immediate card trim/extend edits
- `supervised/samples/*.wav` after edited export
- `supervised/reconstruction.mid` after edited export
- `supervised/reconstruction.wav` after edited export
- `source.wav`, `context_bed.wav`, and `target_reconstruction.wav` for source/stem/reproduced A/B previews
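A quick way to inspect a run directory against this layout is sketched below. The file names are copied from the list above; `summarize_run` is a hypothetical helper and not part of the project:

```python
from pathlib import Path

# Top-level artifact names copied from the list above.
EXPECTED_FILES = [
    "stem.wav",
    "reconstruction.wav",
    "reconstruction.mid",
    "sample-pack.zip",
    "manifest.json",
    "supervision_state.json",
]


def summarize_run(output_dir):
    """Report which expected files exist and count the per-sample and per-hit WAVs."""
    output_dir = Path(output_dir)
    return {
        "missing": [name for name in EXPECTED_FILES if not (output_dir / name).is_file()],
        "samples": len(list(output_dir.glob("samples/*.wav"))),
        "review_hits": len(list(output_dir.glob("review/hits/*.wav"))),
        "has_supervised_export": (output_dir / "supervised" / "manifest.json").is_file(),
    }
```

For example, `summarize_run(".runs/<job-id>/output")` returns a dictionary whose `missing` list is empty for a run that wrote every top-level artifact.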
Generated runtime directories are ignored by git:
- `.runs/`
- `.cache/`
## Automatic default workflow
The default UI is now intentionally simple:
1. Drop or upload an audio file.
2. The waveform renders immediately in the browser.
3. Upload and extraction start automatically.
4. Automatic tuning chooses practical onset sensitivity and sample-group bounds after the source/stem is available.
5. Sample cards appear in grouped columns as soon as their WAVs are written.
6. The user can audition, dismiss, draw another candidate, or trim/extend a card. Draw and timing choices are persisted as semantic overrides and affect selected/edited exports.
Advanced parameters, run history, raw tables, and supervised semantic editing remain available in collapsed panels, but they are no longer required for the common path.
See `docs/AUTOMATIC_CARD_FLOW_UI.md`.
### Reference-style UI update
The web UI now follows the supplied Sample Extractor reference: waveform-first canvas, grouped sample columns, persistent right settings panel, compact export bar, and a bottom selection/tools bar. Drop/upload still starts processing automatically.