---
title: Drum Sample Extractor
emoji: 🥁
colorFrom: gray
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
---

# Drum Sample Extractor

A custom FastAPI + browser workstation for extracting, reviewing, and semantically supervising reusable drum samples from an audio file.

The pipeline uses Spleeter as the lightweight source-separation default when it is available, falls back to full-mix processing when optional separation dependencies are missing, and keeps Demucs as an explicit quality backend. It then detects onsets, classifies hits, clusters similar transients, chooses representative samples, and optionally synthesizes alternate samples. Exports include WAVs, MIDI, a target-stem reconstruction, full-context reproduced audio, manifests, selected-only packs, and complete ZIP sample packs. The interactive layer stores user corrections as replayable semantic state beside each run manifest.

## Current status

The project is usable as a local/Hugging Face Space application. Gradio is no longer the active UI; the active app is a custom FastAPI backend plus a no-build browser frontend.

Implemented:

- Custom web frontend in `web/`, served by `app.py`.
- FastAPI job API with upload, polling, safe artifact downloads, config, health, cache clearing, run history, and SSE progress.
- Timed pipeline runner in `pipeline_runner.py`.
- Per-stage timing in every `manifest.json`.
- Two clustering modes:
  - `batch_quality`: all-pairs mel/NCC similarity plus agglomerative clustering.
  - `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
- Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
- Run history panel indexing `.runs/*/output/manifest.json`.
- Individual review WAVs for every detected hit under `review/hits/`.
- Click-to-audition workflow for waveform onsets, detected hit rows, and representative sample rows.
- Interactive supervised state in `supervised_state.py`:
  - persisted `supervision_state.json`,
  - hit/cluster confidence,
  - outlier-first review queue,
  - constraints,
  - event log,
  - suggestions,
  - undo stack.
- Clean, fixed, non-scrolling workstation UI: explicit top-bar upload button, whole-app drag/drop overlay, collapsed left/right/bottom tool panels by default, large center waveform/sample workspace, bottom dock for review/edit tools, and an explicit Start here flow.
- Immediate browser-side waveform rendering on file selection, before backend extraction starts.
- Waveform-based real progress visualization during extraction using backend `progress` events; no ETA or time-progress guessing.
- Visible API/runtime error banner in the UI, plus backend coercion for browser-form parameter values such as `subdivision="16"`.
- Supervision UI:
  - selected-hit actions,
  - move hit to cluster,
  - pull hit into a new cluster,
  - accept/favorite hit,
  - suppress hit as bleed,
  - lock/unlock cluster,
  - suggestion inbox with exact diff previews,
  - cluster explanation drawer,
  - force-onset waveform mode,
  - restore suppressed hits,
  - edited sample-pack export,
  - constraint/event log.
- Spleeter source-separation backend selected by default, with `spleeter:4stems`, `spleeter:2stems`, and `spleeter:5stems` support.
- Optional Demucs backend for explicit higher-quality separation; Spleeter failures now fall back to full-mix processing when fallback is enabled.
- True per-card checkbox selection and selected-only export under `selected/`.
- Persisted `draw another` card action that pins the next representative hit for the cluster.
- Immediate trim/extend card edits that rewrite preview WAVs under `overrides/hits/` and persist to supervised state.
- Documentation for features, progress, tasks, API, timing, hit review, realtime suitability, UI, remaining work, and interactive UX.
- Legacy Gradio apps preserved in `legacy/` for reference only.
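
The `online_preview` mode listed above is described as prototype-based incremental assignment. The sketch below illustrates that general idea only. It assumes cosine similarity on fixed-length feature vectors and a running-mean prototype update; the class name, threshold, and feature representation are hypothetical and are not taken from `pipeline_runner.py`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlinePrototypeClusterer:
    """Hypothetical incremental clusterer: one pass, no all-pairs matrix."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cut-off
        self.prototypes = []        # running-mean feature vector per cluster
        self.counts = []            # number of hits assigned to each cluster

    def assign(self, features):
        """Return the cluster index for one hit, creating a cluster if needed."""
        best, best_sim = -1, -1.0
        for i, proto in enumerate(self.prototypes):
            sim = cosine(features, proto)
            if sim > best_sim:
                best, best_sim = i, sim
        if best < 0 or best_sim < self.threshold:
            # Nothing is similar enough: start a new cluster from this hit.
            self.prototypes.append(list(features))
            self.counts.append(1)
            return len(self.prototypes) - 1
        # Fold the new hit into the running mean of the matched prototype.
        n = self.counts[best] + 1
        self.prototypes[best] = [
            p + (f - p) / n for p, f in zip(self.prototypes[best], features)
        ]
        self.counts[best] = n
        return best
```

Each hit is compared only against the current prototypes, so the cost per hit grows with the number of clusters rather than the number of hits. That is the property that makes this kind of scheme attractive for a near-realtime preview.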

Not fully complete yet:

- No true cached feature-vector local reclustering yet.
- No cluster merge/split/relabel workflow beyond move/pull-to-new-cluster.
- No frontend TypeScript build/test harness yet.
- Spleeter progress is coarse-grained; Demucs progress exposes chunk-level work where available.
- Demucs remains offline/batch by design and is treated as the higher-cost explicit quality backend.

See:

- `docs/FEATURES.md`
- `docs/TASKS.md`
- `docs/PROGRESS.md`
- `docs/API.md`
- `docs/interactive-ux/README.md`
- `docs/REMAINING_WORK.md`
- `docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md`
- `docs/FIXED_WORKSTATION_UI.md`
- `docs/REPRODUCED_AUDIO_AND_PARAMETERS.md`
- `docs/CLEAN_DEFAULT_UI.md`
- `docs/IMMEDIATE_WAVEFORM_AND_REAL_PROGRESS.md`
- `docs/API_ERRORS_AND_PARAMETER_VALIDATION.md`
- `docs/UPLOAD_ERROR_AND_RUNTIME_FALLBACK.md`

## Run locally

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
```

Open `http://127.0.0.1:7860`.

For fast iteration, use the default automatic flow. To bypass source separation entirely, open `Advanced`, use `Fast preview`, or set:

- `Separation engine = none`
- `Stem = all`
- `Clustering mode = online_preview`

These settings use the full mix and the near-realtime clustering path. The default engine is Spleeter. Install it separately with `pip install -r requirements-spleeter.txt` in an environment compatible with Spleeter/TensorFlow. If Spleeter is unavailable and fallback is enabled, the app falls back to full-mix processing so the UI still works. Choose Demucs explicitly under Expert controls for slower, higher-quality separation.

## Run checks

```bash
python3 -m py_compile app.py pipeline_runner.py sample_extractor.py supervised_state.py supervised_export.py scripts/*.py
node --check web/app.js
python3 scripts/test_sse_and_review_hits.py
python3 scripts/test_interactive_supervision.py
python3 scripts/test_supervised_export_and_force_onset.py
python3 scripts/test_progress_contract.py
python3 scripts/test_param_validation_and_api_errors.py
python3 scripts/test_selected_export_card_actions.py
```

## Run benchmarks

```bash
python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchmark-subprocesses.json
```

The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
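
Per-stage timing of the kind recorded in each `manifest.json` can be collected with a small context manager. This is a sketch only: the `StageTimer` name, the stage names, and the `timings` key are assumptions, not the format `pipeline_runner.py` actually writes.

```python
import json
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper that accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.timings = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed


timer = StageTimer()
with timer.stage("onset_detection"):
    sum(i * i for i in range(10_000))  # stand-in for real DSP work
with timer.stage("clustering"):
    sorted(range(10_000), reverse=True)

# Assumed manifest fragment; the real schema may differ.
manifest_fragment = json.dumps({"timings": timer.timings}, indent=2)
```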

## API example

```bash
curl http://127.0.0.1:7860/api/config

curl -F 'file=@song.wav' \
  -F 'params={"separation_backend":"spleeter","spleeter_model":"spleeter:4stems","stem":"drums","clustering_mode":"online_preview","target_min":4,"target_max":12}' \
  http://127.0.0.1:7860/api/jobs
```
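
The `params` form field is a JSON string, and browser forms tend to send numbers as strings (the status list mentions `subdivision="16"`). The following is a minimal sketch of that kind of coercion. The `PARAM_TYPES` table and the function name are hypothetical; the real validation lives in the backend and may differ.

```python
import json

# Hypothetical schema: parameter name -> expected Python type.
PARAM_TYPES = {"subdivision": int, "target_min": int, "target_max": int, "stem": str}


def coerce_params(raw_json):
    """Parse the `params` form field and coerce known keys to their expected types."""
    params = json.loads(raw_json)
    coerced = {}
    for key, value in params.items():
        expected = PARAM_TYPES.get(key)
        if expected is None:
            coerced[key] = value  # unknown keys pass through untouched
            continue
        try:
            coerced[key] = expected(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"invalid value for {key!r}: {value!r}") from exc
    return coerced
```

Raising a `ValueError` that names the offending key gives the API layer something concrete to surface in an error banner.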

Then poll the returned job id:

```bash
curl http://127.0.0.1:7860/api/jobs/<job-id>
```
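
Polling can also be scripted. The sketch below takes the fetch function as a parameter so that it can be exercised without a running server. The `status` field and its `done`/`error` values are assumptions about the job payload, so check `docs/API.md` for the actual shape.

```python
import json
import time
import urllib.request


def fetch_job(job_id, base_url="http://127.0.0.1:7860"):
    """GET /api/jobs/<job-id> and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/jobs/{job_id}") as resp:
        return json.load(resp)


def poll_job(job_id, fetch=fetch_job, interval=1.0, timeout=600.0):
    """Poll until the job reports a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        # "done" and "error" are assumed terminal states, not confirmed by the README.
        if job.get("status") in {"done", "error"}:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(interval)
```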

Read supervised state:

```bash
curl http://127.0.0.1:7860/api/jobs/<job-id>/state
```

Move a hit into a target cluster:

```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/hits/hit%3A00003/move \
  -H 'Content-Type: application/json' \
  -d '{"target_cluster_id":"cluster:0"}'
```
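
The hit id in the path above is percent-encoded: `hit:00003` becomes `hit%3A00003`. When building these URLs in a script, the standard library can do the encoding. The helper name and the example job id are illustrative.

```python
from urllib.parse import quote


def hit_move_url(job_id, hit_id, base_url="http://127.0.0.1:7860"):
    """Build the move endpoint URL, percent-encoding ids that contain ':'."""
    # safe="" also encodes "/" so an id can never add extra path segments.
    return f"{base_url}/api/jobs/{quote(job_id, safe='')}/hits/{quote(hit_id, safe='')}/move"


print(hit_move_url("abc123", "hit:00003"))
# http://127.0.0.1:7860/api/jobs/abc123/hits/hit%3A00003/move
```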


Export selected cards only:

```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/export-selected \
  -H 'Content-Type: application/json' \
  -d '{"labels":["kick_0","snare_0"],"synthesize":true}'
```

Draw another representative for a card:

```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/draw
```

Trim/extend the current representative preview:

```bash
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/edit \
  -H 'Content-Type: application/json' \
  -d '{"start_offset_ms":-8,"tail_offset_ms":24}'
```
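
To make the offsets in the request above concrete, here is one plausible interpretation in sample terms. It assumes a negative `start_offset_ms` moves the start earlier and a positive `tail_offset_ms` lengthens the tail. The sign convention, function name, and clamping behaviour are assumptions; this README does not define them.

```python
def apply_edit(start, end, sample_rate, total_samples, start_offset_ms=0, tail_offset_ms=0):
    """Return (new_start, new_end) sample indices after a trim/extend edit."""

    def ms_to_samples(ms):
        return round(ms * sample_rate / 1000)

    new_start = start + ms_to_samples(start_offset_ms)
    new_end = end + ms_to_samples(tail_offset_ms)
    # Clamp to the audio bounds and keep at least one sample.
    new_start = max(0, min(new_start, total_samples - 1))
    new_end = max(new_start + 1, min(new_end, total_samples))
    return new_start, new_end
```

Under these assumptions, at 48 kHz an offset of -8 ms moves the start 384 samples earlier and an offset of 24 ms moves the end 1152 samples later.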

List active/completed runs:

```bash
curl http://127.0.0.1:7860/api/jobs
```
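
The job API also exposes SSE progress. This README does not give the stream URL or payload, so the sketch below only shows the generic parsing step for `text/event-stream` lines. The `progress` event name mirrors the status list; the JSON payload fields in the sample stream are assumptions.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates one event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


# Sample stream with an assumed payload shape.
stream = ["event: progress\n", 'data: {"stage": "onsets", "fraction": 0.5}\n', "\n"]
for name, payload in parse_sse(stream):
    print(name, json.loads(payload)["fraction"])
```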

## Important files

| Path | Purpose |
|---|---|
| `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads, supervised editing endpoints |
| `pipeline_runner.py` | Timed extraction pipeline, Spleeter/Demucs/none separation backends, real progress contract, disk source/stem/context cache, batch/online clustering routing |
| `sample_extractor.py` | Core DSP/sample extraction implementation, including chunk-progress callback support for Demucs stem extraction |
| `supervised_state.py` | Persistent semantic state, confidence, constraints, events, suggestions, force-onset, restore, undo |
| `supervised_export.py` | Renders edited semantic state into supervised and selected-only WAV/MIDI/reconstruction/ZIP artifacts |
| `web/` | Custom no-build browser frontend with clean fixed non-scrolling workstation layout, explicit upload/whole-page drag-drop, immediate uploaded waveform rendering, real-progress waveform tinting, source/stem/reproduced preview transport, common/advanced parameter separation, collapsed sidebars/bottom dock, sample-card grid, hidden-audio audition, add-onset mode, and edited export |
| `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
| `scripts/test_interactive_supervision.py` | Smoke test for supervised state endpoints |
| `scripts/test_supervised_export_and_force_onset.py` | Smoke test for force-onset, restore, suggestion diffs, and edited exports |
| `scripts/test_param_validation_and_api_errors.py` | Regression test for browser-style parameter coercion and visible API error details |
| `scripts/test_selected_export_card_actions.py` | Smoke test for selected-only export, draw-next persistence, and immediate preview timing edits |
| `docs/interactive-ux/` | Supplied interactive UX docs aligned to current implementation |
| `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
| `legacy/` | Previous Gradio apps retained for reference |

## Optional Spleeter backend

Spleeter is the default selected backend because it is much lighter than Demucs for the common path. It is not pinned into `requirements.txt` because TensorFlow/Spleeter compatibility depends on the Python environment. Use:

```bash
pip install -r requirements-spleeter.txt
```

Leave `allow_backend_fallback=true` for normal use so missing or failing Spleeter installs automatically fall back to full-mix processing. Disable fallback only when debugging Spleeter itself.
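
The fallback decision can be pictured as follows. This is a sketch under assumptions: the function names and return values are illustrative, and it follows the behaviour described under "Run locally", where an unavailable Spleeter falls back to full-mix processing (`none`).

```python
import importlib.util


def spleeter_installed():
    """True if the optional `spleeter` package can be found in this environment."""
    return importlib.util.find_spec("spleeter") is not None


def resolve_backend(requested="spleeter", allow_backend_fallback=True, available=spleeter_installed):
    """Return the separation backend to use, or raise if fallback is disabled."""
    if requested != "spleeter" or available():
        return requested
    if allow_backend_fallback:
        return "none"  # process the full mix so the UI still works
    raise RuntimeError("Spleeter is not installed and allow_backend_fallback is false")
```

Passing the availability check as a parameter keeps the decision testable without installing or uninstalling TensorFlow.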

## Output per run

Each run is stored under `.runs/<job-id>/output/`:

- `stem.wav`
- `reconstruction.wav`
- `reconstruction.mid`
- `sample-pack.zip`
- `samples/*.wav`
- `review/hits/*.wav`
- `manifest.json`
- `supervision_state.json`
- `source.wav`, `context_bed.wav`, and `target_reconstruction.wav` for source/stem/reproduced A/B previews
- `overrides/hits/*.wav` after immediate card trim/extend edits
- `selected/sample-pack.zip` after selected-card export
- `supervised/manifest.json` after edited export
- `supervised/sample-pack.zip` after edited export
- `supervised/samples/*.wav` after edited export
- `supervised/reconstruction.mid` after edited export
- `supervised/reconstruction.wav` after edited export
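
Because every run writes a `manifest.json` at a predictable path, a run-history index can be built by globbing `.runs/*/output/manifest.json`. The sketch below assumes nothing about the manifest schema beyond it being JSON. The function name is illustrative and is not the implementation in `app.py`.

```python
import json
from pathlib import Path


def index_runs(runs_dir=".runs"):
    """Return {job_id: manifest_dict} for every readable run manifest."""
    index = {}
    for manifest_path in sorted(Path(runs_dir).glob("*/output/manifest.json")):
        # Layout: <runs_dir>/<job-id>/output/manifest.json
        job_id = manifest_path.parent.parent.name
        try:
            index[job_id] = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip partially written or corrupt manifests
    return index
```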

Generated runtime directories are ignored by git:

- `.runs/`
- `.cache/`
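
The status list describes the disk cache as keyed by source digest and extraction settings. One way such a key could be derived is sketched below. The hash choice, the settings field names, and the key layout are assumptions rather than the project's actual scheme.

```python
import hashlib
import json


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large audio files are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_key(source_digest, settings):
    """Combine the source digest with a canonical encoding of the extraction settings."""
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{source_digest}:{canonical}".encode("utf-8")).hexdigest()
```

With a key like this, the same file with the same settings reuses the cached stem, while changing any setting (for example the stem or the separation backend) produces a different key.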

## Automatic default workflow

The default UI is now intentionally simple:

1. Drop or upload an audio file.
2. The waveform renders immediately in the browser.
3. Upload and extraction start automatically.
4. Automatic tuning chooses practical onset sensitivity and sample-group bounds after the source/stem is available.
5. Sample cards appear in grouped columns as soon as their WAVs are written.
6. The user can audition, dismiss, draw another candidate, or trim/extend a card. Draw and timing choices are persisted as semantic overrides and affect selected/edited exports.

Advanced parameters, run history, raw tables, and supervised semantic editing remain available in collapsed panels, but they are no longer required for the common path.

See `docs/AUTOMATIC_CARD_FLOW_UI.md`.


### Reference-style UI update

The web UI now follows the supplied Sample Extractor reference: waveform-first canvas, grouped sample columns, persistent right settings panel, compact export bar, and a bottom selection/tools bar. Drop/upload still starts processing automatically.