Spaces: rikhoffbauer2/drum-sample-extractor


ChatGPT committed 16 days ago

Commit eb1a122 · 1 parent: 0821fc5

feat: replace gradio with custom extraction web app

Files changed (22)

.gitignore +22 -0
Dockerfile +18 -0
README.md +78 -5
app.py +199 -254
docs/API.md +140 -0
docs/PIPELINE_TIMING_AND_REALTIME.md +214 -0
docs/PROJECT_REVIEW.md +89 -0
docs/REMAINING_WORK.md +27 -0
docs/UI_REPLACEMENT.md +66 -0
docs/benchmark-subprocesses.json +476 -0
legacy/gradio_app.py +259 -0
app_v2.py → legacy/gradio_app_v2.py +0 -0
pipeline_runner.py +407 -0
requirements-legacy-gradio.txt +2 -0
requirements.txt +8 -6
sample_extractor.py +1 -1
scripts/benchmark_subprocesses.py +86 -0
scripts/smoke_benchmark.py +24 -0
scripts/test_api_job.py +25 -0
web/app.js +269 -0
web/index.html +174 -0
web/styles.css +80 -0

.gitignore ADDED

	@@ -0,0 +1,22 @@

+.runs/
+.command-logs/
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+*.egg-info/
+.venv/
+venv/
+dist/
+build/
+.DS_Store
+.env
+*.wav
+*.mp3
+*.flac
+*.aiff
+*.ogg
+*.m4a
+*.mid
+*.zip
+!drum-sample-extractor-updated.zip

Dockerfile ADDED

	@@ -0,0 +1,18 @@

+FROM python:3.11-slim
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1
+WORKDIR /app
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ffmpeg libsndfile1 git \
+    && rm -rf /var/lib/apt/lists/*
+COPY requirements.txt ./
+RUN pip install -r requirements.txt
+COPY . ./
+EXPOSE 7860
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]

README.md CHANGED

@@ -1,12 +1,85 @@
 ---
 title: Drum Sample Extractor
-emoji: 📊
 colorFrom: gray
 colorTo: pink
-sdk: gradio
-sdk_version: 6.13.0
-app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: Drum Sample Extractor
+emoji: 🥁
 colorFrom: gray
 colorTo: pink
+sdk: docker
+app_port: 7860
 pinned: false
 ---
+# Drum Sample Extractor
+A custom FastAPI + browser UI for extracting reusable drum samples from an audio file.
+The pipeline can isolate a stem with Demucs, detect onsets, classify hits, cluster similar transients, choose representative samples, optionally synthesize alternate samples, and export WAVs, MIDI, reconstruction audio, and a complete ZIP sample pack.
+## Current status
+- Gradio has been replaced by a custom web frontend in `web/` served by `app.py`.
+- The extraction pipeline is exposed through a JSON/multipart API and factored into `pipeline_runner.py`.
+- Per-stage timing is captured for every extraction run and written into `manifest.json`.
+- Benchmarking support is available in `scripts/benchmark_subprocesses.py`.
+- Legacy Gradio apps are preserved in `legacy/` for reference only.
+## Run locally
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+uvicorn app:app --host 0.0.0.0 --port 7860
+```
+Open `http://127.0.0.1:7860`.
+For fast iteration, set `Stem` to `all`. That bypasses Demucs and runs onset detection, classification, clustering, representative selection, synthesis, MIDI rendering, and packaging directly on the uploaded audio.
+## Run benchmarks
+```bash
+python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchmark-subprocesses.json
+```
+The benchmark uses synthetic drum fixtures and `stem=all` so the DSP stages are measured without Demucs model download/runtime noise.
+## API
+```bash
+curl http://127.0.0.1:7860/api/config
+curl -F 'file=@song.wav' \
+  -F 'params={"stem":"all","target_min":4,"target_max":12}' \
+  http://127.0.0.1:7860/api/jobs
+```
+Then poll the returned job id:
+```bash
+curl http://127.0.0.1:7860/api/jobs/<job-id>
+```
+## Important files
+| Path | Purpose |
+|---|---|
+| `app.py` | FastAPI app, static UI serving, job API, artifact downloads |
+| `pipeline_runner.py` | Timed extraction pipeline used by API and benchmarks |
+| `sample_extractor.py` | Core DSP/sample extraction implementation |
+| `web/` | Custom no-build browser frontend |
+| `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
+| `docs/` | Review, timing, API, and UI documentation |
+| `legacy/` | Previous Gradio apps retained for reference |
+## Output per run
+Each run is stored under `.runs/<job-id>/output/`:
+- `stem.wav`
+- `reconstruction.wav`
+- `reconstruction.mid`
+- `sample-pack.zip`
+- `samples/*.wav`
+- `manifest.json`
+`.runs/` is ignored by git.
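
The README states that per-stage timing is written into `manifest.json` for every run. A minimal sketch of reading those timings back, assuming the manifest has a top-level `stages` list whose entries carry `label` and `duration_sec` (the keys that `app.py` reads from progress events). The exact manifest schema is defined in `pipeline_runner.py`, which is not shown here, and `slowest_stages` and `load_manifest` are hypothetical helpers, not part of this commit.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_manifest(job_id: str, runs_dir: Path = Path(".runs")) -> dict[str, Any]:
    """Read .runs/<job-id>/output/manifest.json (layout taken from the README)."""
    manifest_path = runs_dir / job_id / "output" / "manifest.json"
    return json.loads(manifest_path.read_text(encoding="utf-8"))


def slowest_stages(manifest: dict[str, Any], top: int = 3) -> list[tuple[str, float]]:
    """Return up to `top` stages as (label, seconds), slowest first.

    Stages without a recorded duration (for example skipped ones) are ignored.
    The `stages`, `label`, and `duration_sec` keys are assumptions.
    """
    timed = [
        (str(stage.get("label", "?")), float(stage["duration_sec"]))
        for stage in manifest.get("stages", [])
        if stage.get("duration_sec") is not None
    ]
    return sorted(timed, key=lambda item: item[1], reverse=True)[:top]
```

Usage would look like `slowest_stages(load_manifest("<job-id>"))`, which is a quick way to confirm where a given run spent its time before reaching for the benchmark script.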

app.py CHANGED

@@ -1,259 +1,204 @@
 """
-Gradio UI — Sample Extractor v9.
-SuperFlux onsets, transient NCC, mel pre-filter, MIDI quantization, param locking.
-"""
-import os, sys
-sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
-# ─── HOTFIX: patch _sf() keyword argument bug ────────────────────────────────
-_src = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'sample_extractor.py')
-with open(_src, 'r') as _f: _content = _f.read()
-if '_sf(yh,lag=2,ms=5)' in _content:
-    _content = _content.replace('_sf(yh,lag=2,ms=5)', '_sf(yh,l=2,ms=5)')
-    with open(_src, 'w') as _f: _f.write(_content)
-    print("[HOTFIX] Fixed _sf() kwarg: lag=2 → l=2")
-del _src, _content
-# ──────────────────────────────────────────────────────────────────────────────
-import gradio as gr
-import numpy as np, pandas as pd, json, tempfile
-import soundfile as sf, librosa
-import matplotlib; matplotlib.use('Agg')
-import matplotlib.pyplot as plt
-from sample_extractor import (
-    extract_stem, detect_onsets, classify_hits,
-    cluster_hits, select_best, synthesize_from_cluster,
-    sample_quality_score, export_midi, detect_bpm,
-    render_midi_with_samples, build_archive, cache_clear, auto_tune,
-    DEMUCS_MODELS, DEMUCS_STEMS,
 )
-from synth_generator import generate_test_song
-from evaluation import evaluate_extraction
-from config_store import PipelineConfig, get_leaderboard
-from optimizer_v2 import run_optimization
-def audio_tuple(a, sr):
-    a = a.astype(np.float32); pk = np.abs(a).max()
-    if pk > 0: a = a / pk * 0.95
-    return (sr, a)
-def run_auto_tune(audio_in, stem_choice, demucs_model, demucs_shifts, demucs_overlap,
-                  onset_mode, cur_delta, cur_energy, cur_gap, cur_tmin, cur_tmax,
-                  lock_delta, lock_energy, lock_gap, lock_targets, progress=gr.Progress()):
-    if audio_in is None: return [gr.update()]*5 + ["Upload audio first", ""]
-    locks = {}
-    if lock_delta: locks['onset_delta'] = float(cur_delta)
-    if lock_energy: locks['energy_threshold_db'] = float(cur_energy)
-    if lock_gap: locks['min_gap'] = float(cur_gap)
-    if lock_targets: locks['target_min']=int(cur_tmin); locks['target_max']=int(cur_tmax)
-    progress(0.0); sr_in,data=audio_in; data=data.astype(np.float32)
-    if data.ndim>1: data=data.mean(axis=1)
-    pk=np.abs(data).max()
-    if pk>0: data/=pk
-    with tempfile.NamedTemporaryFile(suffix='.wav',delete=False) as f:
-        sf.write(f.name,data,sr_in); tmp=f.name
     try:
-        progress(0.05,desc=f"Stem..."); sa,ssr=extract_stem(tmp,stem=stem_choice,device="cpu",
-            model_name=demucs_model,shifts=int(demucs_shifts),overlap=float(demucs_overlap))
-        ld=', '.join(f'{k}={v}' for k,v in locks.items()) if locks else 'none'
-        progress(0.15,desc=f"Tuning (🔒 {ld})...")
-        bp,bs,log=auto_tune(sa,ssr,mode=onset_mode,locks=locks)
-        progress(1.0)
-        lt='\n'.join(log[-30:])
-        li=f"🔒 Locked: {ld}" if locks else "All params free"
-        sm=f"**Score: {bs:.1f}/100** · {li}\n\nClick **Extract** to use these settings."
-        return [
-            gr.update(value=bp['onset_delta']) if not lock_delta else gr.update(),
-            gr.update(value=bp['energy_threshold_db']) if not lock_energy else gr.update(),
-            gr.update(value=bp['min_gap']) if not lock_gap else gr.update(),
-            gr.update(value=bp.get('target_min',5)) if not lock_targets else gr.update(),
-            gr.update(value=bp.get('target_max',20)) if not lock_targets else gr.update(),
-            sm, lt]
-    finally: os.unlink(tmp)
-def run_extraction(audio_in, stem_choice, demucs_model, demucs_shifts, demucs_overlap,
-                   onset_mode, onset_delta, energy_db, pre_pad, min_dur, max_dur, min_gap,
-                   ncc_threshold, attack_ms, linkage, target_min, target_max,
-                   do_synthesize, quantize_midi, subdivision, progress=gr.Progress()):
-    if audio_in is None: return [None]*8
-    progress(0.0); sr_in,data=audio_in; data=data.astype(np.float32)
-    if data.ndim>1: data=data.mean(axis=1)
-    pk=np.abs(data).max()
-    if pk>0: data/=pk
-    with tempfile.NamedTemporaryFile(suffix='.wav',delete=False) as f:
-        sf.write(f.name,data,sr_in); tmp=f.name
     try:
-        progress(0.05,desc=f"Stem ({demucs_model})...")
-        sa,ssr=extract_stem(tmp,stem=stem_choice,device="cpu",
-            model_name=demucs_model,shifts=int(demucs_shifts),overlap=float(demucs_overlap))
-        progress(0.15,desc="BPM..."); bpm=detect_bpm(sa,ssr)
-        progress(0.25,desc="Onsets...")
-        hits=detect_onsets(sa,ssr,mode=onset_mode,onset_delta=float(onset_delta),
-            energy_threshold_db=float(energy_db),pre_pad=float(pre_pad),
-            min_dur=float(min_dur),max_dur=float(max_dur),min_gap=float(min_gap))
-        if not hits:
-            return (audio_tuple(sa,ssr),f"**BPM: {bpm}** — No hits.",None,None,None,None,"",pd.DataFrame())
-        progress(0.35,desc="Classify..."); hits=classify_hits(hits)
-        progress(0.45,desc="Cluster...")
-        cl=cluster_hits(hits,audio=sa,sr=ssr,ncc_threshold=float(ncc_threshold),
-            attack_ms=float(attack_ms),target_min=int(target_min),target_max=int(target_max),linkage=str(linkage))
-        progress(0.65,desc="Select..."); select_best(cl)
-        if do_synthesize:
-            progress(0.7,desc="Synth...")
-            for c in cl:
-                if c.count>=2: c.synthesized=synthesize_from_cluster(c)
-        progress(0.75,desc="MIDI..."); mp=tempfile.mktemp(suffix='.mid')
-        export_midi(cl,mp,bpm=bpm,quantize=bool(quantize_midi),subdivision=int(subdivision))
-        progress(0.8,desc="Render..."); rend=render_midi_with_samples(cl,sr=ssr)
-        progress(0.85,desc="Package...")
-        sd=tempfile.mkdtemp(); sp=[]
-        for c in sorted(cl,key=lambda x:x.count,reverse=True):
-            p=os.path.join(sd,f"{c.label}.wav"); c.best_hit.save(p); sp.append(p)
-        zp=build_archive(cl,bpm,ssr,midi_path=mp,rendered_audio=rend)
-        rows=[]
-        for c in sorted(cl,key=lambda x:x.count,reverse=True):
-            b=c.best_hit; sc=sample_quality_score(b.audio,b.sr,c.label.rsplit('_',1)[0])
-            rows.append({'Sample':c.label,'Hits':c.count,'MIDI':c.midi_note,
-                'Score':f"{sc['total']:.1f}",'Clean':f"{sc['cleanness']:.2f}",
-                'Complete':f"{sc['completeness']:.2f}",
-                'Dur':f"{b.duration*1000:.0f}ms",
-                'First':f"{sorted(h.onset_time for h in c.hits)[0]:.2f}s"})
-        sm=f"**BPM: {bpm}** · **{len(cl)} samples** from {len(hits)} hits\n\n"
-        sm+=f"`{demucs_model}` · δ=`{onset_delta}` · E=`{energy_db}dB` · attack=`{attack_ms}ms`"
-        if int(target_min)>0 and int(target_max)>0: sm+=f" · clusters `{int(target_min)}–{int(target_max)}`"
-        if quantize_midi: sm+=f" · MIDI 1/{int(subdivision)}"
-        sm+="\n\n| Sample | Hits | MIDI |\n|---|---|---|\n"
-        for c in sorted(cl,key=lambda x:x.count,reverse=True): sm+=f"| {c.label} | {c.count} | {c.midi_note} |\n"
-        progress(1.0)
-        return (audio_tuple(sa,ssr),sm,audio_tuple(rend,ssr),sp,mp,zp,"",pd.DataFrame(rows))
-    finally: os.unlink(tmp)
-def run_eval(pattern,bpm,bars,ncc_threshold,target_min,target_max,progress=gr.Progress()):
-    progress(0.0); song=generate_test_song(pattern_name=pattern,bars=int(bars),bpm=float(bpm),variation='medium',seed=42)
-    dbpm=detect_bpm(song.drums_only,song.sr); progress(0.2)
-    hits=detect_onsets(song.drums_only,song.sr)
-    if not hits: return None,None,None,None,"",""
-    hits=classify_hits(hits)
-    cl=cluster_hits(hits,audio=song.drums_only,sr=song.sr,ncc_threshold=float(ncc_threshold),
-                     target_min=int(target_min),target_max=int(target_max))
-    select_best(cl)
-    for c in cl:
-        if c.count>=2: c.synthesized=synthesize_from_cluster(c)
-    progress(0.5); rend=render_midi_with_samples(cl,sr=song.sr); progress(0.6)
-    gt={n:s.audio for n,s in song.samples.items()}
-    gh=[{'sample':h.sample_name,'onset':h.onset_time,'velocity':h.velocity} for h in song.hits]
-    r=evaluate_extraction(cl,gt,gh,song.sr,hits)
-    s=[{'Metric':'BPM','Value':f"{dbpm}",'Target':f"{song.bpm}"},
-       {'Metric':'Clusters','Value':str(len(cl)),'Target':str(len(gt))},
-       {'Metric':'Score','Value':f"{r.overall_score:.1f}/100",'Target':'> 70'}]
-    if r.unmatched_gt: s.append({'Metric':'⚠','Value':', '.join(r.unmatched_gt),'Target':'None'})
-    m=[{'Cluster':x.cluster_label,'GT':x.gt_name,'Score':f"{x.sample_score:.1f}"} for x in r.matches]
-    progress(1.0)
-    return (audio_tuple(song.mix,song.sr),audio_tuple(rend,song.sr),pd.DataFrame(s),pd.DataFrame(m) if m else None,"","")
-def run_optimize(n,name,author,save,progress=gr.Progress()):
-    logs=[]; progress(0.0)
-    state=run_optimization(n_iterations=int(n),config_name=name or "opt",
-        author=author or "anon",save_to_hub=bool(save),log_fn=lambda m:logs.append(m))
-    progress(1.0)
-    h=[{'Iter':r.iteration,'Score':f"{r.avg_score:.1f}"} for r in state.history]
-    if state.history:
-        fig,ax=plt.subplots(figsize=(10,4)); ax.plot([r.iteration for r in state.history],[r.avg_score for r in state.history],'b-o')
-        ax.grid(True,alpha=0.3); plt.tight_layout()
-    else: fig,ax=plt.subplots(); ax.text(0.5,0.5,"No data")
-    return '\n'.join(logs),pd.DataFrame(h),fig,json.dumps(state.best_config,indent=2)
-def refresh_lb():
-    try: lb=get_leaderboard(); return pd.DataFrame(lb) if lb else pd.DataFrame(),""
-    except Exception as e: return pd.DataFrame(),str(e)
-def build_app():
-    with gr.Blocks(title="🎵 Sample Extractor",theme=gr.themes.Soft(),
-                   css=".gradio-container{max-width:1300px!important}") as app:
-        gr.Markdown("# 🎵 Sample Extractor v9\n"
-                    "**SuperFlux** onsets · **Transient NCC** (25ms attack) · "
-                    "**Mel pre-filter** · **MIDI quantization** · **Auto-Tune** with 🔒 locks")
-        with gr.Tabs():
-            with gr.Tab("🎵 Extract"):
-                audio_in=gr.Audio(sources=['upload'],type='numpy',label='Upload Audio')
-                with gr.Accordion("🔧 Stem Separation",open=False):
-                    with gr.Row():
-                        dm=gr.Dropdown(DEMUCS_MODELS,value="htdemucs_ft",label="Model")
-                        st=gr.Dropdown(['drums','bass','other','vocals','all'],value='drums',label='Stem')
-                        dsh=gr.Slider(0,5,value=1,step=1,label='Shifts')
-                        dov=gr.Slider(0.0,0.5,value=0.25,step=0.05,label='Overlap')
-                with gr.Accordion("🎯 Onset Detection",open=False):
-                    with gr.Row(): om=gr.Dropdown(['auto','percussive','harmonic','broadband'],value='auto',label='Mode')
-                    with gr.Row():
-                        od=gr.Slider(0.01,0.5,value=0.12,step=0.01,label='Delta'); lock_od=gr.Checkbox(value=False,label='🔒',scale=0)
-                    with gr.Row():
-                        ed=gr.Slider(-70,-10,value=-35,step=1,label='Energy (dB)'); lock_ed=gr.Checkbox(value=False,label='🔒',scale=0)
-                    with gr.Row():
-                        mg=gr.Slider(0.005,0.2,value=0.03,step=0.005,label='Min gap'); lock_mg=gr.Checkbox(value=False,label='🔒',scale=0)
-                    with gr.Row():
-                        pp=gr.Slider(0.0,0.05,value=0.003,step=0.001,label='Pre-pad')
-                        mnd=gr.Slider(0.005,0.2,value=0.02,step=0.005,label='Min dur')
-                        mxd=gr.Slider(0.1,5.0,value=1.5,step=0.1,label='Max dur')
-                with gr.Accordion("🔗 Clustering",open=True):
-                    with gr.Row():
-                        tmin=gr.Number(value=5,label='Target min',precision=0)
-                        tmax=gr.Number(value=20,label='Target max',precision=0)
-                        lock_tgt=gr.Checkbox(value=True,label='🔒 Lock range',scale=0)
-                    gr.Markdown("*🔒 = auto-tune keeps this value fixed*")
-                    with gr.Row():
-                        nt=gr.Slider(0.3,0.99,value=0.80,step=0.01,label='NCC threshold')
-                        atk=gr.Slider(10,100,value=25,step=5,label='Attack (ms)')
-                        lnk=gr.Dropdown(['average','complete','single'],value='average',label='Linkage')
-                with gr.Accordion("🎹 MIDI & Post",open=False):
-                    with gr.Row():
-                        syn=gr.Checkbox(value=True,label='Synthesize')
-                        qmidi=gr.Checkbox(value=True,label='Quantize MIDI')
-                        subdiv=gr.Dropdown([('8th',8),('16th',16),('32nd',32)],value=16,label='Grid')
-                with gr.Row():
-                    tune_btn=gr.Button("🎛️ Auto-Tune",variant="secondary",size="lg")
-                    extract_btn=gr.Button("🔬 Extract",variant="primary",size="lg")
-                tune_summary=gr.Markdown(""); tune_log=gr.Textbox(label="Log",lines=8,max_lines=15,visible=False)
-                summary_md=gr.Markdown("*Upload → Auto-Tune or Extract*")
-                with gr.Row():
-                    stem_out=gr.Audio(type='numpy',label='Stem',interactive=False)
-                    rend_out=gr.Audio(type='numpy',label='🔊 Reconstruction',interactive=False)
-                gr.Markdown("### Downloads")
-                with gr.Row():
-                    arc=gr.File(label="📦 ZIP",interactive=False); mid=gr.File(label="🎹 MIDI",interactive=False)
-                smp=gr.File(label="WAVs",file_count="multiple",interactive=False)
-                met=gr.Dataframe(label="Samples"); stx=gr.Textbox(visible=False)
-                dm.change(fn=lambda m:gr.update(choices=DEMUCS_STEMS.get(m,["drums","bass","other","vocals"])+["all"]),inputs=[dm],outputs=[st])
-                tune_btn.click(run_auto_tune,[audio_in,st,dm,dsh,dov,om,od,ed,mg,tmin,tmax,lock_od,lock_ed,lock_mg,lock_tgt],
-                    [od,ed,mg,tmin,tmax,tune_summary,tune_log])
-                extract_btn.click(run_extraction,[audio_in,st,dm,dsh,dov,om,od,ed,pp,mnd,mxd,mg,nt,atk,lnk,tmin,tmax,syn,qmidi,subdiv],
-                    [stem_out,summary_md,rend_out,smp,mid,arc,stx,met])
-            with gr.Tab("📊 Evaluate"):
-                with gr.Row():
-                    ep=gr.Dropdown(['rock','funk','halftime'],value='rock',label='Pattern')
-                    eb=gr.Slider(80,200,value=120,step=2,label='BPM'); ebs=gr.Slider(2,8,value=4,step=1,label='Bars')
-                with gr.Row():
-                    en=gr.Slider(0.3,0.99,value=0.80,step=0.01,label='NCC')
-                    etm=gr.Number(value=0,label='Min',precision=0); etx=gr.Number(value=0,label='Max',precision=0)
-                evb=gr.Button("🧪 Evaluate",variant="primary",size="lg")
-                with gr.Row():
-                    evm=gr.Audio(type='numpy',label='Original',interactive=False)
-                    evr=gr.Audio(type='numpy',label='Reconstruction',interactive=False)
-                evs=gr.Dataframe(); evm2=gr.Dataframe()
-                es1=gr.Textbox(visible=False); es2=gr.Textbox(visible=False)
-                evb.click(run_eval,[ep,eb,ebs,en,etm,etx],[evm,evr,evs,evm2,es1,es2])
-            with gr.Tab("🔄 Optimize"):
-                with gr.Row():
-                    on=gr.Slider(2,30,value=5,step=1,label='Iters'); ocn=gr.Textbox(value="opt",label='Name')
-                    oa=gr.Textbox(value="",label='Author'); osv=gr.Checkbox(value=True,label='Save')
-                ob=gr.Button("🚀 Run",variant="primary",size="lg")
-                ol=gr.Textbox(label="Log",lines=20,max_lines=40); oh=gr.Dataframe(); op=gr.Plot()
-                oc=gr.Code(label="Config",language="json")
-                ob.click(run_optimize,[on,ocn,oa,osv],[ol,oh,op,oc])
-            with gr.Tab("🏆 Leaderboard"):
-                lbb=gr.Button("🔄 Refresh"); lt=gr.Dataframe(); ls=gr.Textbox(visible=False)
-                lbb.click(refresh_lb,[],[lt,ls])
-    return app
-if __name__=="__main__": build_app().launch(server_name="0.0.0.0",server_port=7860)

+#!/usr/bin/env python3
+"""Custom web application for the drum sample extractor.
+Run with:
+    uvicorn app:app --host 0.0.0.0 --port 7860
 """
+from __future__ import annotations
+import json
+import shutil
+import traceback
+import uuid
+from concurrent.futures import ThreadPoolExecutor
+from dataclasses import asdict
+from pathlib import Path
+from threading import Lock
+from typing import Any
+from fastapi import FastAPI, File, Form, HTTPException, UploadFile
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import FileResponse, JSONResponse
+from fastapi.staticfiles import StaticFiles
+from pipeline_runner import PipelineParams, initial_stages, run_extraction_pipeline
+from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
+ROOT = Path(__file__).resolve().parent
+WEB_DIR = ROOT / "web"
+RUNS_DIR = ROOT / ".runs"
+RUNS_DIR.mkdir(exist_ok=True)
+app = FastAPI(title="Drum Sample Extractor", version="10.0.0")
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=False,
+    allow_methods=["*"],
+    allow_headers=["*"],
 )
+executor = ThreadPoolExecutor(max_workers=1)
+jobs_lock = Lock()
+jobs: dict[str, dict[str, Any]] = {}
+def _job_url(job_id: str, relative_path: str) -> str:
+    return f"/api/jobs/{job_id}/files/{relative_path}"
+def _serialise_job(job: dict[str, Any]) -> dict[str, Any]:
+    payload = {key: value for key, value in job.items() if key not in {"input_path", "output_dir"}}
+    if payload.get("result"):
+        result = dict(payload["result"])
+        result["file_urls"] = {key: _job_url(job["id"], path) for key, path in result.get("files", {}).items()}
+        result["samples"] = [
+            {**sample, "url": _job_url(job["id"], sample["file"])}
+            for sample in result.get("samples", [])
+        ]
+        payload["result"] = result
+    return payload
+def _update_job(job_id: str, **patch: Any) -> None:
+    with jobs_lock:
+        jobs[job_id].update(patch)
+def _append_log(job_id: str, message: str) -> None:
+    with jobs_lock:
+        jobs[job_id].setdefault("logs", []).append(message)
+def _run_job(job_id: str) -> None:
+    with jobs_lock:
+        job = jobs[job_id]
+        input_path = Path(job["input_path"])
+        output_dir = Path(job["output_dir"])
+        params = job["params"]
+        job["status"] = "running"
+    def progress(event: dict[str, Any]) -> None:
+        if "stages" in event:
+            _update_job(job_id, stages=event["stages"])
+        if event.get("stage"):
+            stage = event["stage"]
+            if stage.get("status") == "running":
+                _append_log(job_id, f"Started: {stage['label']}")
+            elif stage.get("status") == "done":
+                _append_log(job_id, f"Finished: {stage['label']} in {stage['duration_sec']:.3f}s")
     try:
+        result = run_extraction_pipeline(input_path, output_dir, PipelineParams.from_mapping(params), progress_cb=progress)
+        _update_job(job_id, status="complete", result=asdict(result), error=None)
+    except Exception as exc:  # deliberately explicit for UI diagnostics
+        _update_job(job_id, status="error", error=str(exc), traceback=traceback.format_exc())
+        _append_log(job_id, f"Error: {exc}")
+@app.get("/api/health")
+def health() -> dict[str, str]:
+    return {"status": "ok"}
+@app.get("/api/config")
+def config() -> dict[str, Any]:
+    return {
+        "demucs_models": DEMUCS_MODELS,
+        "demucs_stems": {key: value + ["all"] for key, value in DEMUCS_STEMS.items()},
+        "defaults": asdict(PipelineParams()),
+        "stages": initial_stages(),
+    }
+@app.post("/api/cache/clear")
+def clear_cache() -> dict[str, str]:
+    cache_clear()
+    return {"status": "cleared"}
+@app.post("/api/jobs")
+async def create_job(file: UploadFile = File(...), params: str = Form("{}")) -> JSONResponse:
     try:
+        parsed_params = json.loads(params)
+        validated = PipelineParams.from_mapping(parsed_params)
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+    job_id = uuid.uuid4().hex[:12]
+    job_dir = RUNS_DIR / job_id
+    input_dir = job_dir / "input"
+    output_dir = job_dir / "output"
+    input_dir.mkdir(parents=True, exist_ok=True)
+    output_dir.mkdir(parents=True, exist_ok=True)
+    suffix = Path(file.filename or "input.wav").suffix or ".wav"
+    input_path = input_dir / f"source{suffix}"
+    with input_path.open("wb") as handle:
+        shutil.copyfileobj(file.file, handle)
+    job = {
+        "id": job_id,
+        "status": "pending",
+        "filename": file.filename,
+        "params": asdict(validated),
+        "stages": initial_stages(),
+        "logs": [],
+        "result": None,
+        "error": None,
+        "traceback": None,
+        "input_path": str(input_path),
+        "output_dir": str(output_dir),
+    }
+    with jobs_lock:
+        jobs[job_id] = job
+    executor.submit(_run_job, job_id)
+    return JSONResponse(_serialise_job(job), status_code=202)
+@app.get("/api/jobs/{job_id}")
+def get_job(job_id: str) -> dict[str, Any]:
+    with jobs_lock:
+        job = jobs.get(job_id)
+        if not job:
+            manifest = RUNS_DIR / job_id / "output" / "manifest.json"
+            if manifest.exists():
+                result = json.loads(manifest.read_text(encoding="utf-8"))
+                return _serialise_job(
+                    {
+                        "id": job_id,
+                        "status": "complete",
+                        "filename": None,
+                        "params": result.get("params", {}),
+                        "stages": result.get("stages", []),
+                        "logs": [],
+                        "result": result,
+                        "error": None,
+                        "traceback": None,
+                        "output_dir": str(manifest.parent),
+                    }
+                )
+            raise HTTPException(status_code=404, detail="Job not found")
+        return _serialise_job(dict(job))
+@app.get("/api/jobs/{job_id}/files/{relative_path:path}")
+def get_job_file(job_id: str, relative_path: str) -> FileResponse:
+    root = (RUNS_DIR / job_id / "output").resolve()
+    path = (root / relative_path).resolve()
+    if not str(path).startswith(str(root)) or not path.exists() or not path.is_file():
+        raise HTTPException(status_code=404, detail="File not found")
+    return FileResponse(path)
+if WEB_DIR.exists():
+    app.mount("/web", StaticFiles(directory=WEB_DIR), name="web")
+@app.get("/")
+def index() -> FileResponse:
+    index_path = WEB_DIR / "index.html"
+    if not index_path.exists():
+        raise HTTPException(status_code=500, detail="web/index.html is missing")
+    return FileResponse(index_path)
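
`get_job_file` guards downloads with `str(path).startswith(str(root))`. A plain string-prefix test also accepts a sibling directory whose name merely begins with the same characters (for example `output-old` next to `output`). The sketch below shows a stricter containment check using `Path.is_relative_to`, which is available from Python 3.9 (the Dockerfile pins 3.11). `is_within` is a hypothetical helper name and this is one possible hardening, not the committed behaviour.

```python
from pathlib import Path


def is_within(root: Path, candidate: Path) -> bool:
    """True only if `candidate` resolves to `root` itself or a path beneath it.

    Both paths are resolved first so `..` segments and symlinks are collapsed
    before the comparison, and the comparison is component-wise rather than
    character-wise.
    """
    resolved_root = root.resolve()
    resolved_candidate = candidate.resolve()
    return resolved_candidate.is_relative_to(resolved_root)  # Python 3.9+
```

In the endpoint, the first clause of the condition could then read `not is_within(root, path)`, with the `exists()` and `is_file()` checks left as they are.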

docs/API.md ADDED

	@@ -0,0 +1,140 @@

+# API documentation
+The active app is `app.py`, a FastAPI application.
+## Start server
+```bash
+uvicorn app:app --host 0.0.0.0 --port 7860
+```
+## `GET /api/health`
+Returns backend health.
+```json
+{"status":"ok"}
+```
+## `GET /api/config`
+Returns supported models, stems, default pipeline params, and stage definitions.
+```bash
+curl http://127.0.0.1:7860/api/config
+```
+## `POST /api/jobs`
+Creates an extraction job.
+Content type: `multipart/form-data`
+Fields:
+| Field | Type | Required | Description |
+|---|---|---:|---|
+| `file` | file | yes | Audio source |
+| `params` | JSON string | no | Partial or full pipeline params |
+Example:
+```bash
+curl -F 'file=@song.wav' \
+  -F 'params={"stem":"all","target_min":4,"target_max":12,"synthesize":true}' \
+  http://127.0.0.1:7860/api/jobs
+```
+Response status: `202 Accepted`
+```json
+{
+  "id": "58ca0db4ac74",
+  "status": "pending",
+  "filename": "song.wav",
+  "params": {"stem": "all"},
+  "stages": [],
+  "logs": [],
+  "result": null,
+  "error": null
+}
+```
+## `GET /api/jobs/{job_id}`
+Poll job status and retrieve results.
+Statuses:
+| Status | Meaning |
+|---|---|
+| `pending` | Job is queued |
+| `running` | Job is executing |
+| `complete` | Result and artifacts are ready |
+| `error` | Pipeline failed; `error` and `traceback` are populated |
+Completed jobs contain:
+| Key | Meaning |
+|---|---|
+| `duration_sec` | Total wall time |
+| `audio_duration_sec` | Duration of processed stem/source |
+| `realtime_factor` | `duration_sec / audio_duration_sec` |
+| `bpm` | Detected tempo |
+| `hit_count` | Number of accepted onsets/hits |
+| `cluster_count` | Number of sample clusters |
+| `stages` | Per-stage timing/status/detail list |
+| `samples` | Sample rows with score, duration, first onset, and download URL |
+| `overview` | Decimated envelope and onset markers for waveform display |
+| `files` | Relative artifact paths |
+| `file_urls` | Direct API URLs for artifacts |
+## `GET /api/jobs/{job_id}/files/{relative_path}`
+Downloads an artifact from a completed job.
+Examples:
+```bash
+curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/sample-pack.zip
+curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/reconstruction.mid
+curl -O http://127.0.0.1:7860/api/jobs/58ca0db4ac74/files/samples/hihat_open_0.wav
+```
+The endpoint prevents path traversal by resolving downloads under `.runs/<job-id>/output/`.
+## `POST /api/cache/clear`
+Clears the in-memory extraction cache.
+```bash
+curl -X POST http://127.0.0.1:7860/api/cache/clear
+```
+## Pipeline parameters
+Defined in `pipeline_runner.PipelineParams`.
+| Parameter | Default | Meaning |
+|---|---:|---|
+| `stem` | `drums` | Demucs source to extract, or `all` to bypass Demucs |
+| `demucs_model` | `htdemucs_ft` | Demucs model |
+| `demucs_shifts` | `1` | Test-time shifts for Demucs quality/speed tradeoff |
+| `demucs_overlap` | `0.25` | Demucs chunk overlap |
+| `onset_mode` | `auto` | `auto`, `percussive`, `harmonic`, or `broadband` |
+| `onset_delta` | `0.12` | Peak-pick threshold |
+| `energy_threshold_db` | `-35` | RMS gate for accepting hits |
+| `pre_pad` | `0.003` | Seconds of audio before onset |
+| `min_dur` | `0.02` | Minimum hit duration |
+| `max_dur` | `1.5` | Maximum hit duration |
+| `min_gap` | `0.03` | Minimum time between onsets |
+| `ncc_threshold` | `0.80` | Similarity threshold when not targeting cluster count |
+| `attack_ms` | `25` | Transient window used for NCC |
+| `mel_threshold` | `0.75` | Candidate prefilter threshold |
+| `linkage` | `average` | Agglomerative linkage |
+| `target_min` | `5` | Lower cluster target; `0` disables target mode |
+| `target_max` | `20` | Upper cluster target; `0` disables target mode |
+| `synthesize` | `true` | Write synthesized alternates for clusters with multiple hits |
+| `quantize_midi` | `true` | Snap MIDI notes to grid |
+| `subdivision` | `16` | MIDI grid subdivision |
+| `device` | `cpu` | Torch device for Demucs |
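
The job endpoints documented above are designed to be polled: `POST /api/jobs` returns `202 Accepted`, and `GET /api/jobs/{job_id}` reports `pending`, `running`, `complete`, or `error`. A minimal standard-library polling sketch follows. The function names, the one-second interval, and the ten-minute timeout are illustrative assumptions, and the base URL is the local address used in the `curl` examples.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Any, Callable

# Statuses after which the job will not change again (from the status table).
TERMINAL_STATUSES = {"complete", "error"}


def fetch_job(job_id: str, base_url: str = "http://127.0.0.1:7860") -> dict[str, Any]:
    """GET /api/jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/jobs/{job_id}") as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict[str, Any]] = fetch_job,
    interval_sec: float = 1.0,
    timeout_sec: float = 600.0,
) -> dict[str, Any]:
    """Poll until the job is `complete` or `error`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_sec
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout_sec}s")
        time.sleep(interval_sec)
```

Passing `fetch` as a parameter keeps the loop testable without a running server. A caller would typically check `job["status"]` on return and, for `complete`, follow the entries in `job["result"]["file_urls"]` to download artifacts.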

docs/PIPELINE_TIMING_AND_REALTIME.md ADDED

	@@ -0,0 +1,214 @@

+# Pipeline timing and near-real-time analysis
+## Measurement setup
+Benchmarks were run with `scripts/benchmark_subprocesses.py` using synthetic drum fixtures from `synth_generator.py`.
+Important constraints:
+- `stem=all` was used to bypass Demucs and measure the DSP/sample-extraction subprocesses directly.
+- The script performs one warm-up run first, so import/JIT overhead is not included in the summary.
+- Runs used 4 bars at 120 BPM across `rock`, `funk`, and `halftime` synthetic patterns.
+- The benchmark output is stored in `docs/benchmark-subprocesses.json`.
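As a rough illustration of how the per-stage summary could be recomputed from that report, here is a minimal sketch. The field names (`runs`, `stages`, `key`, `duration_sec`) are taken from the committed JSON; the `summarize` and `load_report` helpers are hypothetical and are not claimed to match what `scripts/benchmark_subprocesses.py` actually does.

```python
import json
import statistics
from collections import defaultdict
from pathlib import Path


def summarize(report: dict) -> list[dict]:
    """Aggregate per-stage durations across all runs in a benchmark report."""
    durations: dict[str, list[float]] = defaultdict(list)
    for run in report["runs"]:
        for stage in run["stages"]:
            durations[stage["key"]].append(stage["duration_sec"])
    return [
        {
            "stage": key,
            "mean_sec": round(statistics.mean(values), 6),
            "median_sec": round(statistics.median(values), 6),
            "min_sec": round(min(values), 6),
            "max_sec": round(max(values), 6),
        }
        for key, values in durations.items()
    ]


def load_report(path: str = "docs/benchmark-subprocesses.json") -> dict:
    """Read a report from disk (hypothetical helper, not called below)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# The six `stem` durations from the committed report, rounded to microseconds.
stem_durations = [0.014633, 0.009321, 0.009299, 0.038692, 0.011181, 0.016473]
report = {
    "runs": [
        {"stages": [{"key": "stem", "duration_sec": d}]} for d in stem_durations
    ]
}
summary = summarize(report)
print(summary[0])
```

With the real file, `summarize(load_report())` should yield one row per stage in the same shape as the `summary` array of the report.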
+## Measured subprocess lengths
+| Stage | Mean seconds | Median seconds | Min seconds | Max seconds |
+|---|---:|---:|---:|---:|
+| `stem` | 0.017 | 0.013 | 0.009 | 0.039 |
+| `bpm` | 0.224 | 0.223 | 0.206 | 0.241 |
+| `onsets` | 2.140 | 2.034 | 1.762 | 2.871 |
+| `classification` | 0.034 | 0.035 | 0.024 | 0.045 |
+| `clustering` | 0.496 | 0.597 | 0.059 | 0.913 |
+| `selection` | 0.499 | 0.551 | 0.311 | 0.651 |
+| `synthesis` | 0.002 | 0.002 | 0.002 | 0.003 |
+| `export` | 0.105 | 0.103 | 0.046 | 0.178 |
+Total runtime for warm synthetic 4-bar fixtures ranged from roughly `0.30×` to `0.54×` realtime with Demucs bypassed. In plain terms: the pure extraction stages took between 30% and 54% of the audio duration on these fixtures. The first cold run can be much slower because librosa, SciPy, and Numba-style initialization costs are paid up front.
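The realtime factor is processing time divided by audio duration, so values below 1.0 mean faster than realtime. A small sketch of that arithmetic, using the `total_duration_sec` and `audio_duration_sec` values recorded for the six runs in `docs/benchmark-subprocesses.json` (the `realtime_factor` helper name is only illustrative):

```python
def realtime_factor(processing_sec: float, audio_sec: float) -> float:
    """Return processing time as a fraction of audio duration."""
    if audio_sec <= 0:
        raise ValueError("audio duration must be positive")
    return processing_sec / audio_sec


# (pattern, run_index, total_duration_sec, audio_duration_sec) from the report.
runs = [
    ("rock", 0, 2.594698, 8.75),
    ("funk", 0, 3.790648, 8.874989),
    ("halftime", 0, 3.701891, 8.874989),
    ("rock", 1, 2.848686, 8.75),
    ("funk", 1, 3.416797, 8.874989),
    ("halftime", 1, 4.750472, 8.874989),
]

factors = [realtime_factor(total, audio) for _, _, total, audio in runs]
for (pattern, index, _, _), factor in zip(runs, factors):
    print(f"{pattern}[{index}]: {factor:.3f}x realtime")
```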
+## Significant subprocesses
+### 1. Stem extraction / source load
+Current implementation:
+- `stem=all`: load and normalize the source audio with librosa.
+- any other stem: run Demucs via `demucs.pretrained.get_model` and `demucs.apply.apply_model`.
+Timing profile:
+- `stem=all` is near-instant after warm-up on short fixtures.
+- Demucs is the offline bottleneck and should be treated as non-realtime in this project.
+Real-time suitability: **No for Demucs, yes for direct source load.**
+Recommended strategy:
+- Keep Demucs as an explicit offline preprocessing stage.
+- Cache stem output by content hash and model parameters.
+- Let users bypass Demucs for drum loops, already-separated stems, and iterative parameter tuning.
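One way the cache key for stem output might be derived is sketched below: hash the input bytes, then fold in the Demucs settings that affect the result. The function name and the exact set of key fields are assumptions for illustration; the project does not specify a key format.

```python
import hashlib
import json


def stem_cache_key(
    audio_bytes: bytes,
    *,
    model: str,
    stem: str,
    shifts: int,
    overlap: float,
) -> str:
    """Build a deterministic key from the input digest and Demucs settings."""
    content_digest = hashlib.sha256(audio_bytes).hexdigest()
    # sort_keys keeps the serialized parameters stable across calls.
    params = json.dumps(
        {"model": model, "stem": stem, "shifts": shifts, "overlap": overlap},
        sort_keys=True,
    )
    combined = f"{content_digest}|{params}".encode("utf-8")
    return hashlib.sha256(combined).hexdigest()


key_a = stem_cache_key(
    b"fake-audio", model="htdemucs_ft", stem="drums", shifts=1, overlap=0.25
)
key_b = stem_cache_key(
    b"fake-audio", model="htdemucs_ft", stem="drums", shifts=2, overlap=0.25
)
print(key_a != key_b)
```

Parameters that only affect later stages, such as `onset_delta`, are deliberately left out of the key so that tuning them can reuse a cached stem.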
+### 2. BPM / tempo detection
+Current implementation:
+- `librosa.onset.onset_strength`
+- `librosa.feature.tempo`
+- beat-track sanity adjustment
+Timing profile:
+- Measured around 0.22 s for ~9 s synthetic clips after warm-up.
+Real-time suitability: **Near-realtime with buffering.**
+A live version should estimate tempo over rolling windows and refine continuously. It does not need the entire file, but short windows can be unstable: even on the ~9 s benchmark fixtures, one `funk` run reported 161.5 BPM for a 120 BPM pattern.
+### 3. Onset detection + slicing
+Current implementation:
+- Multiband SuperFlux-style onset envelope in `auto` mode.
+- Optional percussive/harmonic/broadband modes.
+- Peak picking and hit slicing by onset-to-next-onset boundaries.
+- Energy threshold and duration filtering.
+Timing profile:
+- This is the largest non-Demucs DSP stage in the measured benchmark: about 2.14 s mean for ~9 s fixtures.
+- It is still faster than realtime in warm synthetic tests.
+Real-time suitability: **Yes, with a rolling window and bounded lookahead.**
+Why:
+- Onset strength and peak picking are local-window operations.
+- Backtracking and next-onset slicing require a small amount of future context.
+- A live system can emit provisional hits and finalize durations once the next onset or max-duration cutoff arrives.
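The bounded-lookahead idea can be sketched on a frame-level onset envelope as follows. This is a simplified local-maximum rule, not the SuperFlux envelope or the peak picking used in `sample_extractor.py`; the function name, the frame-based units, and the default values are assumptions for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Optional, Tuple


def stream_hits(
    envelope: Iterable[float],
    *,
    delta: float = 0.12,
    lookahead: int = 3,
    min_gap: int = 2,
    max_frames: int = 20,
) -> Iterator[Tuple[int, int]]:
    """Yield finalized (start, end) frame pairs, decided `lookahead` frames late."""
    window: deque = deque(maxlen=2 * lookahead + 1)
    pending: Optional[int] = None      # start frame of the provisional hit
    last_onset: Optional[int] = None
    frames_seen = 0
    for index, value in enumerate(envelope):
        frames_seen = index + 1
        window.append((index, value))
        if len(window) < window.maxlen:
            continue  # not enough future context yet
        centre_index, centre_value = window[lookahead]
        # Finalize a provisional hit once the max-duration cutoff is reached.
        if pending is not None and centre_index - pending >= max_frames:
            yield (pending, pending + max_frames)
            pending = None
        is_peak = centre_value >= delta and centre_value == max(
            v for _, v in window
        )
        far_enough = last_onset is None or centre_index - last_onset >= min_gap
        if is_peak and far_enough:
            # A new onset finalizes the previous provisional hit.
            if pending is not None:
                yield (pending, centre_index)
            pending = last_onset = centre_index
    if pending is not None:
        # End of stream: close the last hit at the cutoff or the final frame.
        yield (pending, min(frames_seen, pending + max_frames))


envelope = [0.0] * 5 + [1.0] + [0.0] * 9 + [0.8] + [0.0] * 30
hits = list(stream_hits(envelope))
print(hits)
```

The added latency is `lookahead` frames per decision. Peaks within the first or last `lookahead` frames of the stream are never evaluated in this sketch.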
+### 4. Spectral rule classification
+Current implementation:
+- STFT per hit.
+- Low/mid/high energy ratios.
+- Spectral centroid, zero-crossing rate, duration rules.
+Timing profile:
+- Measured around 34 ms mean for the benchmark fixtures.
+Real-time suitability: **Yes.**
+This is cheap per hit and can run immediately after a hit segment is finalized.
+### 5. Mel fingerprinting + transient NCC clustering
+Current implementation:
+- Build mel fingerprints for hits.
+- Use cosine similarity as a prefilter.
+- Compute transient normalized cross-correlation only for candidate pairs.
+- Run agglomerative clustering on the resulting precomputed distance matrix.
+- Optionally merge singleton clusters into nearby multi-hit clusters.
+Timing profile:
+- Measured around 0.50 s mean (0.06–0.91 s across runs); cost depends strongly on the number of hits and candidate pairs.
+- Complexity is roughly quadratic in hit count for pairwise similarity, with mel prefiltering reducing NCC work.
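To make the prefilter effect concrete, here is a minimal pure-Python sketch of the pairwise stage: cosine similarity on fingerprints gates the more expensive transient comparison. It uses zero-lag normalized cross-correlation on equal-length windows, which is a simplification; the actual NCC in `sample_extractor.py` may search over lags, and all names and toy vectors below are illustrative.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ncc(a, b):
    """Zero-lag normalized cross-correlation of two equal-length windows."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def pairwise_distances(fingerprints, transients, mel_threshold=0.75):
    """Return a distance matrix and the number of NCC evaluations performed."""
    n = len(fingerprints)
    # Pairs rejected by the prefilter keep the maximum distance of 1.0.
    distances = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    ncc_calls = 0
    for i, j in combinations(range(n), 2):
        if cosine(fingerprints[i], fingerprints[j]) < mel_threshold:
            continue
        ncc_calls += 1
        similarity = max(0.0, ncc(transients[i], transients[j]))
        distances[i][j] = distances[j][i] = 1.0 - similarity
    return distances, ncc_calls


# Toy data: hits 0 and 1 are similar, hit 2 is spectrally different.
fingerprints = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.1], [0.0, 1.0, 0.0]]
transients = [[0.0, 1.0, 0.5, -0.5], [0.0, 0.9, 0.6, -0.4], [0.0, -1.0, 1.0, -1.0]]
distances, ncc_calls = pairwise_distances(fingerprints, transients)
print(ncc_calls, round(distances[0][1], 3))
```

The loop still visits every pair, so the cosine step remains quadratic; the saving is in how many pairs reach the NCC step.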
+Real-time suitability: **Partially.**
+What can be realtime:
+- Mel fingerprint extraction per hit.
+- Transient NCC against a bounded set of existing cluster representatives.
+- Online assignment to existing clusters.
+What is not truly realtime in the current implementation:
+- Full agglomerative clustering over the complete distance matrix.
+- Target cluster count search through repeated clustering.
+Recommended live design:
+1. Maintain cluster prototypes: representative transient, mel centroid, count, label histogram.
+2. For each finalized hit, compute fingerprint and compare to prototypes first.
+3. Only run transient NCC against likely candidates.
+4. Assign immediately when above threshold; create a new cluster otherwise.
+5. Periodically run batch reclustering in the background to clean up early mistakes.
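Steps 1 to 4 of the live design could look roughly like the sketch below. It is not an existing module in this project: the `Prototype` and `OnlineClusterer` names, the running-mean centroid update, and the zero-lag NCC are all assumptions, and the label histogram and the background reclustering of step 5 are omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _ncc(a, b):
    """Zero-lag NCC: cosine similarity of the mean-removed windows."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return _cosine([x - mean_a for x in a], [y - mean_b for y in b])


@dataclass
class Prototype:
    mel_centroid: List[float]
    transient: List[float]
    count: int = 1


@dataclass
class OnlineClusterer:
    mel_threshold: float = 0.75
    ncc_threshold: float = 0.80
    prototypes: List[Prototype] = field(default_factory=list)

    def assign(self, mel: List[float], transient: List[float]) -> int:
        """Assign a finalized hit to a cluster id, creating one if needed."""
        best_id, best_score = None, self.ncc_threshold
        for cluster_id, proto in enumerate(self.prototypes):
            # Cheap fingerprint prefilter before the transient comparison.
            if _cosine(mel, proto.mel_centroid) < self.mel_threshold:
                continue
            score = _ncc(transient, proto.transient)
            if score >= best_score:
                best_id, best_score = cluster_id, score
        if best_id is None:
            self.prototypes.append(Prototype(list(mel), list(transient)))
            return len(self.prototypes) - 1
        proto = self.prototypes[best_id]
        proto.count += 1
        # Running mean keeps the centroid update O(fingerprint length).
        proto.mel_centroid = [
            c + (m - c) / proto.count for c, m in zip(proto.mel_centroid, mel)
        ]
        return best_id


clusterer = OnlineClusterer()
first = clusterer.assign([1.0, 0.0, 0.1], [0.0, 1.0, 0.5, -0.5])
second = clusterer.assign([0.0, 1.0, 0.0], [0.0, -1.0, 1.0, -1.0])
third = clusterer.assign([0.9, 0.1, 0.1], [0.0, 0.9, 0.6, -0.4])
print(first, second, third)
```

Per-hit cost is bounded by the number of prototypes rather than the number of hits seen so far, which is what makes this kind of assignment suitable for immediate feedback.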
+### 6. Best representative selection
+Current implementation:
+- Compute sample quality score per candidate hit.
+- Choose highest-scoring hit per cluster.
+Timing profile:
+- Measured around 0.50 s mean in the benchmark.
+- Cost scales with number of hits and quality scoring work.
+Real-time suitability: **Yes as an incremental update.**
+A live version can maintain the current best hit per cluster and only rescore new arrivals or candidates whose cluster changed.
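A best-so-far update of this kind can be very small. The sketch below assumes each hit already has a numeric quality score (for example from the existing scoring step) and uses hypothetical names throughout.

```python
from typing import Dict, Tuple


def update_best(
    best: Dict[int, Tuple[float, str]],
    cluster_id: int,
    hit_id: str,
    score: float,
) -> bool:
    """Record the hit if it beats the cluster's current best; report a change."""
    current = best.get(cluster_id)
    if current is None or score > current[0]:
        best[cluster_id] = (score, hit_id)
        return True
    return False


best: Dict[int, Tuple[float, str]] = {}
changes = [
    update_best(best, 0, "hit-1", 0.4),
    update_best(best, 0, "hit-2", 0.3),
    update_best(best, 0, "hit-3", 0.9),
]
print(best[0], changes)
```

The boolean return value lets a UI refresh a cluster row only when its representative actually changes.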
+### 7. Optional synthesis
+Current implementation:
+- Align cluster members by peak position.
+- Normalize and weighted-average hits to create an alternate synthesized sample.
+Timing profile:
+- Measured around 2 ms mean on benchmark fixtures.
+Real-time suitability: **Yes for small clusters, but better as deferred polish.**
+It is fast, but users usually do not need synthesized alternates before cluster membership stabilizes.
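The align, normalize, and average steps can be illustrated on plain sample lists. This is a sketch under stated assumptions rather than the project's `synthesize_from_cluster`: alignment uses the absolute peak of each hit, normalization divides by that peak, and the weights are supplied by the caller.

```python
from typing import List, Optional


def synthesize(
    hits: List[List[float]], weights: Optional[List[float]] = None
) -> List[float]:
    """Peak-align, peak-normalize, and weighted-average non-empty hits."""
    if not hits:
        raise ValueError("at least one hit is required")
    weights = weights or [1.0] * len(hits)
    peaks = [max(range(len(h)), key=lambda i: abs(h[i])) for h in hits]
    pre = max(peaks)                                      # samples before the peak
    post = max(len(h) - p for h, p in zip(hits, peaks))   # peak and everything after
    out = [0.0] * (pre + post)
    total_weight = sum(weights)
    for hit, peak, weight in zip(hits, peaks, weights):
        peak_amplitude = abs(hit[peak]) or 1.0  # avoid dividing by zero on silence
        offset = pre - peak                     # shift so all peaks coincide
        for i, sample in enumerate(hit):
            out[offset + i] += weight * (sample / peak_amplitude) / total_weight
    return out


averaged = synthesize([[0.0, 1.0, 0.5], [0.0, 0.0, 2.0, 1.0]])
print(averaged)
```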
+### 8. Export: MIDI, reconstruction, WAVs, ZIP
+Current implementation:
+- Build MIDI notes from hits and cluster sample notes.
+- Render reconstruction with representative samples.
+- Write samples, reconstruction audio, MIDI, archive, and manifest.
+Timing profile:
+- Measured around 0.10 s mean on benchmark fixtures.
+Real-time suitability: **No for ZIP packaging; yes for preview rendering chunks.**
+The final ZIP is a completion artifact. Reconstruction can be rendered progressively for UI preview.
+## Real-time feasibility summary
+| Subprocess | Current batch status | Near-real-time feasibility | Notes |
+|---|---|---|---|
+| Source load | Fast | Yes | Direct file/stream decode is not the bottleneck |
+| Demucs stem separation | Slow/offline | No | Keep offline and cached |
+| BPM detection | Buffered batch | Partial | Rolling estimate works, exact tempo should refine over time |
+| Onset detection | Batch but local-window | Yes | Needs bounded lookahead/backtracking |
+| Hit slicing | Depends on next onset | Yes | Emit provisional segment, finalize on next onset/max duration |
+| Rule classification | Per-hit | Yes | Cheap and stateless |
+| Mel fingerprinting | Per-hit | Yes | Compute once per finalized hit |
+| Transient NCC | Pairwise batch | Partial | Realtime against prototypes; batch all-pairs is not realtime |
+| Agglomerative clustering | Batch | No | Replace or complement with online prototype assignment |
+| Representative selection | Batch per cluster | Yes | Keep best-so-far per cluster |
+| Synthesis | Batch per cluster | Partial | Can update lazily after cluster changes |
+| MIDI/reconstruction preview | Batch export | Partial | Preview can stream; final MIDI is a completion artifact |
+| ZIP packaging | Final artifact | No | Keep as final step |
+## Recommended next technical move
+Implement a second clustering mode named `online`:
+```text
+onset event → segment finalized → classify → mel fingerprint → candidate prototypes → transient NCC → assign/create cluster → update best representative → UI update
+```
+Keep the existing agglomerative mode as `batch-quality`. Use online mode for immediate feedback and batch mode for final high-quality export.

docs/PROJECT_REVIEW.md ADDED

	@@ -0,0 +1,89 @@

+# Project review
+## Goal
+Review the uploaded drum sample extractor, identify architectural and UX gaps, replace the Gradio UI with a custom frontend, and document the extraction pipeline with timing and real-time feasibility notes.
+## Success checklist
+- The active app is no longer Gradio-based.
+- The core extraction process is callable independently of the UI.
+- Every significant extraction subprocess is timed.
+- Runtime artifacts are stable and downloadable.
+- Documentation explains current behavior, tradeoffs, and remaining work.
+- Legacy files are preserved but not part of the active path.
+## Existing project structure before changes
+The archive contained a compact Python project:
+| File | Role |
+|---|---|
+| `app.py` | Active Gradio UI, parameter controls, extraction, eval, optimization tabs |
+| `app_v2.py` | Older Gradio UI variant |
+| `sample_extractor.py` | Current extraction pipeline: Demucs/load, SuperFlux onsets, rule labels, mel+NCC clustering, MIDI/export |
+| `drum_extractor.py` | Older CLI-oriented pipeline with CLAP-era comments and broader experimental code |
+| `synth_generator.py` | Synthetic drum fixture generator |
+| `evaluation.py` | Ground-truth matching and scoring |
+| `optimizer.py`, `optimizer_v2.py` | Parameter search experiments |
+| `quality_metrics.py` | Completeness, cleanness, onset, reference metrics |
+| `config_store.py` | Config persistence and leaderboard helpers |
+## Key findings
+1. `sample_extractor.py` is the right core to keep. It is compact, stage-oriented, and already exposes most of the operations needed by a proper app/API.
+2. `app.py` mixed UI code, runtime hotfixing, file conversion, extraction orchestration, and artifact packaging. That made it hard to test or replace the UI.
+3. The previous Gradio UI was fast to build but not ideal for this use case: extraction is a staged process with logs, timing, waveform review, downloadable artifacts, and a dense parameter surface that benefits from a purpose-built layout.
+4. The previous `app.py` patched `sample_extractor.py` on disk at startup, rewriting the call `_sf(yh,lag=2,ms=5)` to `_sf(yh,l=2,ms=5)` to work around a keyword-argument mismatch. The underlying bug is now fixed directly in `sample_extractor.py`.
+5. There was no meaningful project documentation, no API documentation, and no benchmark/timing documentation.
+6. `requirements.txt` still treated Gradio as first-class. The active app now uses FastAPI; Gradio dependencies have been moved to `requirements-legacy-gradio.txt`.
+7. `.runs/`, generated audio, MIDI, ZIP files, and local caches needed explicit ignore rules.
+## Changes made
+| Area | Change |
+|---|---|
+| Active UI | Replaced Gradio with a FastAPI app (`app.py`) and a custom static frontend in `web/` |
+| Pipeline | Added `pipeline_runner.py` with validated params, stage timing, progress callbacks, manifests, and artifact writing |
+| Legacy | Moved old Gradio apps into `legacy/` |
+| Bugfix | Fixed the `_sf(yh, lag=2, ms=5)` keyword mismatch in `sample_extractor.py` |
+| API | Added job creation, polling, config, health, cache clear, and safe artifact download endpoints |
+| UX | Added drag/drop upload, dense controls, stage timeline, logs, waveform/onset overview, audio previews, sample table, downloads |
+| Benchmarking | Added `scripts/benchmark_subprocesses.py` and committed benchmark output JSON |
+| Packaging | Added Dockerfile, updated requirements, added `.gitignore` |
+| Docs | Added project review, timing/real-time analysis, API docs, UI notes, and remaining work |
+## Current architecture
+```text
+browser UI in web/
+        │
+        ▼
+FastAPI app.py
+        │
+        ▼
+pipeline_runner.py
+        │
+        ▼
+sample_extractor.py + quality_metrics.py
+        │
+        ▼
+.runs/<job-id>/output/{samples, MIDI, WAV, ZIP, manifest.json}
+```
+The UI only talks to the API. The API only calls the timed runner. The runner is now independently testable and usable from scripts.
+## Risks and limitations
+- Demucs can dominate runtime and may require a model download on first use.
+- The current job store is in-memory. Completed jobs can be reloaded from `manifest.json`, but queued/running job state is lost on process restart.
+- The clustering implementation is still batch-oriented. It can be optimized or adapted incrementally, but current agglomerative clustering is not a streaming algorithm.
+- There is no authentication or quota control; this is intended as a local or Hugging Face Space-style app, not a public multi-tenant service.
+- The browser UI is currently no-build static JavaScript/CSS. That is intentional for deployability, but a larger UI should eventually move to TypeScript with a real component/test setup.
+## Verification performed
+- Python syntax compilation for `app.py`, `pipeline_runner.py`, `sample_extractor.py`, and benchmark scripts.
+- FastAPI `TestClient` checks for `/`, `/api/health`, and `/api/config`.
+- End-to-end API job test using a synthetic drum fixture with `stem=all`.
+- Synthetic subprocess benchmark across rock, funk, and halftime patterns.

docs/REMAINING_WORK.md ADDED

	@@ -0,0 +1,27 @@

+# Remaining work
+## Highest value next steps
+1. **Online clustering mode**: add prototype-based incremental clustering for immediate feedback, while keeping agglomerative clustering as the final-quality batch mode.
+2. **Run history**: index `.runs/*/output/manifest.json` so prior runs are browsable and comparable in the UI.
+3. **Waveform editing**: add hit audition, onset adjustment, cluster merge/split, and label reassignment.
+4. **Demucs caching**: persist stem cache on disk by input digest + model + stem + shifts + overlap.
+5. **True progress reporting**: expose lower-level progress inside Demucs and pairwise clustering, not only stage transitions.
+6. **Benchmark panel**: add an in-app benchmark view that can run synthetic fixtures and compare parameter profiles.
+7. **Frontend test harness**: move the no-build UI to TypeScript once the interaction model stabilizes.
+## Known constraints
+- Demucs is not a realtime stage and should stay explicitly offline.
+- Agglomerative clustering is a batch algorithm; it should not be sold as realtime.
+- First run on a fresh environment can be slower due to imports, model download, and library initialization.
+- The current job queue is process-local and single-worker. That is fine for local use, but not enough for a shared public deployment.
+## Suggested implementation order
+1. Add disk cache for source decode/stem separation.
+2. Add run history index and UI browser.
+3. Add hit audition from `overview.onsets` and sample rows.
+4. Implement online prototype clustering.
+5. Add comparison mode between two job manifests.
+6. Add SSE log/progress streaming.

docs/UI_REPLACEMENT.md ADDED

	@@ -0,0 +1,66 @@

+# Custom UI replacement
+## What changed
+The active interface is now a custom browser UI served from `web/` by the FastAPI app in `app.py`. The old Gradio files were moved to `legacy/`.
+## UX goals
+1. Make the process feel like a sample-extraction workstation, not a generic notebook form.
+2. Keep upload, controls, pipeline status, logs, waveform review, audio previews, downloads, and sample rows visible without tab hunting.
+3. Show stage timing as a first-class result, because extraction quality and speed tradeoffs matter.
+4. Make `stem=all` obvious for fast iteration when Demucs is unnecessary.
+5. Keep the frontend deployable without a JavaScript build step.
+## UI structure
+| Area | Purpose |
+|---|---|
+| Hero/status | Backend readiness and product framing |
+| Source panel | Drag/drop upload and source audio preview |
+| Controls panel | Stem, onset, clustering, MIDI, and synthesis parameters |
+| Pipeline panel | Stage statuses, durations, and live logs |
+| Result panel | Summary, waveform/onsets, downloads, stem/reconstruction audio, sample table |
+## Frontend implementation
+Files:
+- `web/index.html`
+- `web/styles.css`
+- `web/app.js`
+The frontend uses modern browser APIs directly:
+- `fetch` for API calls
+- `FormData` for upload
+- `<audio>` for previews
+- `<canvas>` for waveform/onset visualization
+- CSS grid, responsive layout, custom properties, and backdrop filters for layout/polish
+No Gradio runtime, iframe, or generated UI framework is involved.
+## Backend integration
+The frontend creates a job with `POST /api/jobs`, then polls `GET /api/jobs/{id}` until completion. Completed jobs expose direct download URLs for:
+- sample pack ZIP
+- MIDI reconstruction
+- stem WAV
+- reconstruction WAV
+- individual sample WAVs
+## Why polling instead of websockets/SSE
+Polling is the simplest robust option here because the current pipeline is CPU-heavy and mostly stage-based. The UI polls every 800 ms, which is enough to show stage transitions and logs without introducing websocket lifecycle complexity.
+Future improvement: use Server-Sent Events for lower-latency log streaming once the backend has a persistent job store.
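The polling loop described above can be sketched in a transport-agnostic way. The real client is `web/app.js`; the Python version below only illustrates the control flow, and the `status` field name and the terminal values `done` and `error` are assumptions, since the job payload shape is not shown here.

```python
import time
from typing import Callable, Tuple


def poll_job(
    fetch_status: Callable[[], dict],
    *,
    interval_sec: float = 0.8,
    timeout_sec: float = 600.0,
    terminal: Tuple[str, ...] = ("done", "error"),  # assumed status values
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the job reaches a terminal status or times out."""
    deadline = clock() + timeout_sec
    while True:
        job = fetch_status()
        if job.get("status") in terminal:
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_sec)


# Simulated responses stand in for GET /api/jobs/{id}.
responses = iter([{"status": "running"}, {"status": "running"}, {"status": "done"}])
waits: list = []
result = poll_job(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; a real caller would pass a function that performs the HTTP request and returns the decoded JSON.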
+## Remaining UI improvements
+- Add waveform zoom and click-to-audition individual detected hits.
+- Add inline controls for reassigning sample labels and merging/splitting clusters.
+- Add A/B comparison between parameter runs.
+- Add downloadable timing report per job.
+- Add persistent run history browser for `.runs/`.
+- Add online clustering mode for near-realtime progressive preview.

docs/benchmark-subprocesses.json ADDED

	@@ -0,0 +1,476 @@

+{
+  "runs": [
+    {
+      "pattern": "rock",
+      "bars": 4,
+      "bpm": 120.0,
+      "run_index": 0,
+      "audio_duration_sec": 8.75,
+      "total_duration_sec": 2.594698,
+      "realtime_factor": 0.296537,
+      "hit_count": 28,
+      "cluster_count": 1,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.014633260999971753,
+          "status": "done",
+          "detail": "loaded full mix"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.23692302500001006,
+          "status": "done",
+          "detail": "120.2 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 1.762329765000004,
+          "status": "done",
+          "detail": "28 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.02908633100003044,
+          "status": "done",
+          "detail": "bright:9, cymbal:1, hihat_closed:1, hihat_open:15, mid:2"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.05944011799999771,
+          "status": "done",
+          "detail": "1 clusters"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.31093429700001707,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.0028187070000171843,
+          "status": "done",
+          "detail": "1 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.1779485609999938,
+          "status": "done",
+          "detail": "1 WAVs + MIDI + ZIP"
+        }
+      ]
+    },
+    {
+      "pattern": "funk",
+      "bars": 4,
+      "bpm": 120.0,
+      "run_index": 0,
+      "audio_duration_sec": 8.874989,
+      "total_duration_sec": 3.790648,
+      "realtime_factor": 0.427116,
+      "hit_count": 53,
+      "cluster_count": 2,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.009321340000042255,
+          "status": "done",
+          "detail": "loaded full mix"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.23110938799999303,
+          "status": "done",
+          "detail": "161.5 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 2.1605432889999747,
+          "status": "done",
+          "detail": "53 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.04475730899997643,
+          "status": "done",
+          "detail": "bright:25, hihat_closed:18, hihat_open:7, mid:3"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.6768225310000275,
+          "status": "done",
+          "detail": "2 clusters"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.559724416999984,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.0024601989999837315,
+          "status": "done",
+          "detail": "2 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.10532420399999864,
+          "status": "done",
+          "detail": "2 WAVs + MIDI + ZIP"
+        }
+      ]
+    },
+    {
+      "pattern": "halftime",
+      "bars": 4,
+      "bpm": 120.0,
+      "run_index": 0,
+      "audio_duration_sec": 8.874989,
+      "total_duration_sec": 3.701891,
+      "realtime_factor": 0.417115,
+      "hit_count": 66,
+      "cluster_count": 2,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.009298575000002529,
+          "status": "done",
+          "detail": "loaded full mix"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.21581650399997443,
+          "status": "done",
+          "detail": "120.2 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 1.9768937550000487,
+          "status": "done",
+          "detail": "66 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.03783250899999757,
+          "status": "done",
+          "detail": "bright:11, cymbal:2, hihat_closed:48, hihat_open:5"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.7498706449999872,
+          "status": "done",
+          "detail": "2 clusters"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.6169061510000233,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.0028750459999855593,
+          "status": "done",
+          "detail": "2 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.09185817900004167,
+          "status": "done",
+          "detail": "2 WAVs + MIDI + ZIP"
+        }
+      ]
+    },
+    {
+      "pattern": "rock",
+      "bars": 4,
+      "bpm": 120.0,
+      "run_index": 1,
+      "audio_duration_sec": 8.75,
+      "total_duration_sec": 2.848686,
+      "realtime_factor": 0.325564,
+      "hit_count": 24,
+      "cluster_count": 1,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.03869248300003392,
+          "status": "done",
+          "detail": "loaded full mix"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.24107510999999704,
+          "status": "done",
+          "detail": "120.2 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 2.0721967459999746,
+          "status": "done",
+          "detail": "24 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.024016725000024053,
+          "status": "done",
+          "detail": "bright:7, hihat_closed:2, hihat_open:15"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.05910233800000242,
+          "status": "done",
+          "detail": "1 clusters"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.3106304350000073,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.0015013799999792354,
+          "status": "done",
+          "detail": "1 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.10095534999999245,
+          "status": "done",
+          "detail": "1 WAVs + MIDI + ZIP"
+        }
+      ]
+    },
+    {
+      "pattern": "funk",
+      "bars": 4,
+      "bpm": 120.0,
+      "run_index": 1,
+      "audio_duration_sec": 8.874989,
+      "total_duration_sec": 3.416797,
+      "realtime_factor": 0.384992,
+      "hit_count": 52,
+      "cluster_count": 3,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.011181277999980921,
+          "status": "done",
+          "detail": "loaded full mix"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.20633040499996014,
+          "status": "done",
+          "detail": "120.2 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 1.9962494719999881,
+          "status": "done",
+          "detail": "52 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.03461634600000707,
+          "status": "done",
+          "detail": "bright:23, cymbal:3, hihat_closed:15, hihat_open:8, mid:3"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.51767344000001,
+          "status": "done",
+          "detail": "3 clusters"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.5431782379999959,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.001988787999948727,
+          "status": "done",
+          "detail": "3 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.10504587100001572,
+          "status": "done",
+          "detail": "3 WAVs + MIDI + ZIP"
+        }
+      ]
+    },
+    {
+      "pattern": "halftime",
+      "bars": 4,
+      "bpm": 120.0,
+      "run_index": 1,
+      "audio_duration_sec": 8.874989,
+      "total_duration_sec": 4.750472,
+      "realtime_factor": 0.535265,
+      "hit_count": 64,
+      "cluster_count": 1,
+      "stages": [
+        {
+          "key": "stem",
+          "label": "Stem extraction / source load",
+          "duration_sec": 0.016472632999978032,
+          "status": "done",
+          "detail": "loaded full mix"
+        },
+        {
+          "key": "bpm",
+          "label": "Tempo detection",
+          "duration_sec": 0.2141354419999857,
+          "status": "done",
+          "detail": "120.2 BPM"
+        },
+        {
+          "key": "onsets",
+          "label": "Onset detection + slicing",
+          "duration_sec": 2.8706004370000073,
+          "status": "done",
+          "detail": "64 hits"
+        },
+        {
+          "key": "classification",
+          "label": "Spectral rule classification",
+          "duration_sec": 0.036172296999950504,
+          "status": "done",
+          "detail": "bright:11, cymbal:2, hihat_closed:45, hihat_open:4, mid:2"
+        },
+        {
+          "key": "clustering",
+          "label": "Mel fingerprint + transient NCC clustering",
+          "duration_sec": 0.9130003360000387,
+          "status": "done",
+          "detail": "1 clusters"
+        },
+        {
+          "key": "selection",
+          "label": "Best representative scoring",
+          "duration_sec": 0.6508792970000172,
+          "status": "done",
+          "detail": "quality-scored representatives"
+        },
+        {
+          "key": "synthesis",
+          "label": "Optional sample synthesis",
+          "duration_sec": 0.0025003810000043813,
+          "status": "done",
+          "detail": "1 synthesized alternates"
+        },
+        {
+          "key": "export",
+          "label": "MIDI, reconstruction, WAV, ZIP export",
+          "duration_sec": 0.04621197200003735,
+          "status": "done",
+          "detail": "1 WAVs + MIDI + ZIP"
+        }
+      ]
+    }
+  ],
+  "summary": [
+    {
+      "stage": "stem",
+      "mean_sec": 0.0166,
+      "median_sec": 0.012907,
+      "min_sec": 0.009299,
+      "max_sec": 0.038692
+    },
+    {
+      "stage": "bpm",
+      "mean_sec": 0.224232,
+      "median_sec": 0.223463,
+      "min_sec": 0.20633,
+      "max_sec": 0.241075
+    },
+    {
+      "stage": "onsets",
+      "mean_sec": 2.139802,
+      "median_sec": 2.034223,
+      "min_sec": 1.76233,
+      "max_sec": 2.8706
+    },
+    {
+      "stage": "classification",
+      "mean_sec": 0.034414,
+      "median_sec": 0.035394,
+      "min_sec": 0.024017,
+      "max_sec": 0.044757
+    },
+    {
+      "stage": "clustering",
+      "mean_sec": 0.495985,
+      "median_sec": 0.597248,
+      "min_sec": 0.059102,
+      "max_sec": 0.913
+    },
+    {
+      "stage": "selection",
+      "mean_sec": 0.498709,
+      "median_sec": 0.551451,
+      "min_sec": 0.31063,
+      "max_sec": 0.650879
+    },
+    {
+      "stage": "synthesis",
+      "mean_sec": 0.002357,
+      "median_sec": 0.00248,
+      "min_sec": 0.001501,
+      "max_sec": 0.002875
+    },
+    {
+      "stage": "export",
+      "mean_sec": 0.104557,
+      "median_sec": 0.103001,
+      "min_sec": 0.046212,
+      "max_sec": 0.177949
+    }
+  ]
+}

legacy/gradio_app.py ADDED

	@@ -0,0 +1,259 @@

+"""
+Gradio UI — Sample Extractor v9.
+SuperFlux onsets, transient NCC, mel pre-filter, MIDI quantization, param locking.
+"""
+import os, sys
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+# ─── HOTFIX: patch _sf() keyword argument bug ────────────────────────────────
+_src = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'sample_extractor.py')
+with open(_src, 'r') as _f: _content = _f.read()
+if '_sf(yh,lag=2,ms=5)' in _content:
+    _content = _content.replace('_sf(yh,lag=2,ms=5)', '_sf(yh,l=2,ms=5)')
+    with open(_src, 'w') as _f: _f.write(_content)
+    print("[HOTFIX] Fixed _sf() kwarg: lag=2 → l=2")
+del _src, _content
+# ──────────────────────────────────────────────────────────────────────────────
+import gradio as gr
+import numpy as np, pandas as pd, json, tempfile
+import soundfile as sf, librosa
+import matplotlib; matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+from sample_extractor import (
+    extract_stem, detect_onsets, classify_hits,
+    cluster_hits, select_best, synthesize_from_cluster,
+    sample_quality_score, export_midi, detect_bpm,
+    render_midi_with_samples, build_archive, cache_clear, auto_tune,
+    DEMUCS_MODELS, DEMUCS_STEMS,
+)
+from synth_generator import generate_test_song
+from evaluation import evaluate_extraction
+from config_store import PipelineConfig, get_leaderboard
+from optimizer_v2 import run_optimization
+def audio_tuple(a, sr):
+    a = a.astype(np.float32); pk = np.abs(a).max()
+    if pk > 0: a = a / pk * 0.95
+    return (sr, a)
+def run_auto_tune(audio_in, stem_choice, demucs_model, demucs_shifts, demucs_overlap,
+                  onset_mode, cur_delta, cur_energy, cur_gap, cur_tmin, cur_tmax,
+                  lock_delta, lock_energy, lock_gap, lock_targets, progress=gr.Progress()):
+    if audio_in is None: return [gr.update()]*5 + ["Upload audio first", ""]
+    locks = {}
+    if lock_delta: locks['onset_delta'] = float(cur_delta)
+    if lock_energy: locks['energy_threshold_db'] = float(cur_energy)
+    if lock_gap: locks['min_gap'] = float(cur_gap)
+    if lock_targets: locks['target_min']=int(cur_tmin); locks['target_max']=int(cur_tmax)
+    progress(0.0); sr_in,data=audio_in; data=data.astype(np.float32)
+    if data.ndim>1: data=data.mean(axis=1)
+    pk=np.abs(data).max()
+    if pk>0: data/=pk
+    with tempfile.NamedTemporaryFile(suffix='.wav',delete=False) as f:
+        sf.write(f.name,data,sr_in); tmp=f.name
+    try:
+        progress(0.05,desc=f"Stem..."); sa,ssr=extract_stem(tmp,stem=stem_choice,device="cpu",
+            model_name=demucs_model,shifts=int(demucs_shifts),overlap=float(demucs_overlap))
+        ld=', '.join(f'{k}={v}' for k,v in locks.items()) if locks else 'none'
+        progress(0.15,desc=f"Tuning (🔒 {ld})...")
+        bp,bs,log=auto_tune(sa,ssr,mode=onset_mode,locks=locks)
+        progress(1.0)
+        lt='\n'.join(log[-30:])
+        li=f"🔒 Locked: {ld}" if locks else "All params free"
+        sm=f"**Score: {bs:.1f}/100** · {li}\n\nClick **Extract** to use these settings."
+        return [
+            gr.update(value=bp['onset_delta']) if not lock_delta else gr.update(),
+            gr.update(value=bp['energy_threshold_db']) if not lock_energy else gr.update(),
+            gr.update(value=bp['min_gap']) if not lock_gap else gr.update(),
+            gr.update(value=bp.get('target_min',5)) if not lock_targets else gr.update(),
+            gr.update(value=bp.get('target_max',20)) if not lock_targets else gr.update(),
+            sm, lt]
+    finally: os.unlink(tmp)
+def run_extraction(audio_in, stem_choice, demucs_model, demucs_shifts, demucs_overlap,
+                   onset_mode, onset_delta, energy_db, pre_pad, min_dur, max_dur, min_gap,
+                   ncc_threshold, attack_ms, linkage, target_min, target_max,
+                   do_synthesize, quantize_midi, subdivision, progress=gr.Progress()):
+    if audio_in is None: return [None]*8
+    progress(0.0); sr_in,data=audio_in; data=data.astype(np.float32)
+    if data.ndim>1: data=data.mean(axis=1)
+    pk=np.abs(data).max()
+    if pk>0: data/=pk
+    with tempfile.NamedTemporaryFile(suffix='.wav',delete=False) as f:
+        sf.write(f.name,data,sr_in); tmp=f.name
+    try:
+        progress(0.05,desc=f"Stem ({demucs_model})...")
+        sa,ssr=extract_stem(tmp,stem=stem_choice,device="cpu",
+            model_name=demucs_model,shifts=int(demucs_shifts),overlap=float(demucs_overlap))
+        progress(0.15,desc="BPM..."); bpm=detect_bpm(sa,ssr)
+        progress(0.25,desc="Onsets...")
+        hits=detect_onsets(sa,ssr,mode=onset_mode,onset_delta=float(onset_delta),
+            energy_threshold_db=float(energy_db),pre_pad=float(pre_pad),
+            min_dur=float(min_dur),max_dur=float(max_dur),min_gap=float(min_gap))
+        if not hits:
+            return (audio_tuple(sa,ssr),f"**BPM: {bpm}** — No hits.",None,None,None,None,"",pd.DataFrame())
+        progress(0.35,desc="Classify..."); hits=classify_hits(hits)
+        progress(0.45,desc="Cluster...")
+        cl=cluster_hits(hits,audio=sa,sr=ssr,ncc_threshold=float(ncc_threshold),
+            attack_ms=float(attack_ms),target_min=int(target_min),target_max=int(target_max),linkage=str(linkage))
+        progress(0.65,desc="Select..."); select_best(cl)
+        if do_synthesize:
+            progress(0.7,desc="Synth...")
+            for c in cl:
+                if c.count>=2: c.synthesized=synthesize_from_cluster(c)
+        progress(0.75,desc="MIDI..."); mp=tempfile.mktemp(suffix='.mid')
+        export_midi(cl,mp,bpm=bpm,quantize=bool(quantize_midi),subdivision=int(subdivision))
+        progress(0.8,desc="Render..."); rend=render_midi_with_samples(cl,sr=ssr)
+        progress(0.85,desc="Package...")
+        sd=tempfile.mkdtemp(); sp=[]
+        for c in sorted(cl,key=lambda x:x.count,reverse=True):
+            p=os.path.join(sd,f"{c.label}.wav"); c.best_hit.save(p); sp.append(p)
+        zp=build_archive(cl,bpm,ssr,midi_path=mp,rendered_audio=rend)
+        rows=[]
+        for c in sorted(cl,key=lambda x:x.count,reverse=True):
+            b=c.best_hit; sc=sample_quality_score(b.audio,b.sr,c.label.rsplit('_',1)[0])
+            rows.append({'Sample':c.label,'Hits':c.count,'MIDI':c.midi_note,
+                'Score':f"{sc['total']:.1f}",'Clean':f"{sc['cleanness']:.2f}",
+                'Complete':f"{sc['completeness']:.2f}",
+                'Dur':f"{b.duration*1000:.0f}ms",
+                'First':f"{sorted(h.onset_time for h in c.hits)[0]:.2f}s"})
+        sm=f"**BPM: {bpm}** · **{len(cl)} samples** from {len(hits)} hits\n\n"
+        sm+=f"`{demucs_model}` · δ=`{onset_delta}` · E=`{energy_db}dB` · attack=`{attack_ms}ms`"
+        if int(target_min)>0 and int(target_max)>0: sm+=f" · clusters `{int(target_min)}–{int(target_max)}`"
+        if quantize_midi: sm+=f" · MIDI 1/{int(subdivision)}"
+        sm+="\n\n| Sample | Hits | MIDI |\n|---|---|---|\n"
+        for c in sorted(cl,key=lambda x:x.count,reverse=True): sm+=f"| {c.label} | {c.count} | {c.midi_note} |\n"
+        progress(1.0)
+        return (audio_tuple(sa,ssr),sm,audio_tuple(rend,ssr),sp,mp,zp,"",pd.DataFrame(rows))
+    finally: os.unlink(tmp)
+def run_eval(pattern,bpm,bars,ncc_threshold,target_min,target_max,progress=gr.Progress()):
+    progress(0.0); song=generate_test_song(pattern_name=pattern,bars=int(bars),bpm=float(bpm),variation='medium',seed=42)
+    dbpm=detect_bpm(song.drums_only,song.sr); progress(0.2)
+    hits=detect_onsets(song.drums_only,song.sr)
+    if not hits: return None,None,None,None,"",""
+    hits=classify_hits(hits)
+    cl=cluster_hits(hits,audio=song.drums_only,sr=song.sr,ncc_threshold=float(ncc_threshold),
+                     target_min=int(target_min),target_max=int(target_max))
+    select_best(cl)
+    for c in cl:
+        if c.count>=2: c.synthesized=synthesize_from_cluster(c)
+    progress(0.5); rend=render_midi_with_samples(cl,sr=song.sr); progress(0.6)
+    gt={n:s.audio for n,s in song.samples.items()}
+    gh=[{'sample':h.sample_name,'onset':h.onset_time,'velocity':h.velocity} for h in song.hits]
+    r=evaluate_extraction(cl,gt,gh,song.sr,hits)
+    s=[{'Metric':'BPM','Value':f"{dbpm}",'Target':f"{song.bpm}"},
+       {'Metric':'Clusters','Value':str(len(cl)),'Target':str(len(gt))},
+       {'Metric':'Score','Value':f"{r.overall_score:.1f}/100",'Target':'> 70'}]
+    if r.unmatched_gt: s.append({'Metric':'⚠','Value':', '.join(r.unmatched_gt),'Target':'None'})
+    m=[{'Cluster':x.cluster_label,'GT':x.gt_name,'Score':f"{x.sample_score:.1f}"} for x in r.matches]
+    progress(1.0)
+    return (audio_tuple(song.mix,song.sr),audio_tuple(rend,song.sr),pd.DataFrame(s),pd.DataFrame(m) if m else None,"","")
+def run_optimize(n,name,author,save,progress=gr.Progress()):
+    logs=[]; progress(0.0)
+    state=run_optimization(n_iterations=int(n),config_name=name or "opt",
+        author=author or "anon",save_to_hub=bool(save),log_fn=lambda m:logs.append(m))
+    progress(1.0)
+    h=[{'Iter':r.iteration,'Score':f"{r.avg_score:.1f}"} for r in state.history]
+    if state.history:
+        fig,ax=plt.subplots(figsize=(10,4)); ax.plot([r.iteration for r in state.history],[r.avg_score for r in state.history],'b-o')
+        ax.grid(True,alpha=0.3); plt.tight_layout()
+    else: fig,ax=plt.subplots(); ax.text(0.5,0.5,"No data")
+    return '\n'.join(logs),pd.DataFrame(h),fig,json.dumps(state.best_config,indent=2)
+def refresh_lb():
+    try: lb=get_leaderboard(); return pd.DataFrame(lb) if lb else pd.DataFrame(),""
+    except Exception as e: return pd.DataFrame(),str(e)
+def build_app():
+    with gr.Blocks(title="🎵 Sample Extractor",theme=gr.themes.Soft(),
+                   css=".gradio-container{max-width:1300px!important}") as app:
+        gr.Markdown("# 🎵 Sample Extractor v9\n"
+                    "**SuperFlux** onsets · **Transient NCC** (25ms attack) · "
+                    "**Mel pre-filter** · **MIDI quantization** · **Auto-Tune** with 🔒 locks")
+        with gr.Tabs():
+            with gr.Tab("🎵 Extract"):
+                audio_in=gr.Audio(sources=['upload'],type='numpy',label='Upload Audio')
+                with gr.Accordion("🔧 Stem Separation",open=False):
+                    with gr.Row():
+                        dm=gr.Dropdown(DEMUCS_MODELS,value="htdemucs_ft",label="Model")
+                        st=gr.Dropdown(['drums','bass','other','vocals','all'],value='drums',label='Stem')
+                        dsh=gr.Slider(0,5,value=1,step=1,label='Shifts')
+                        dov=gr.Slider(0.0,0.5,value=0.25,step=0.05,label='Overlap')
+                with gr.Accordion("🎯 Onset Detection",open=False):
+                    with gr.Row(): om=gr.Dropdown(['auto','percussive','harmonic','broadband'],value='auto',label='Mode')
+                    with gr.Row():
+                        od=gr.Slider(0.01,0.5,value=0.12,step=0.01,label='Delta'); lock_od=gr.Checkbox(value=False,label='🔒',scale=0)
+                    with gr.Row():
+                        ed=gr.Slider(-70,-10,value=-35,step=1,label='Energy (dB)'); lock_ed=gr.Checkbox(value=False,label='🔒',scale=0)
+                    with gr.Row():
+                        mg=gr.Slider(0.005,0.2,value=0.03,step=0.005,label='Min gap'); lock_mg=gr.Checkbox(value=False,label='🔒',scale=0)
+                    with gr.Row():
+                        pp=gr.Slider(0.0,0.05,value=0.003,step=0.001,label='Pre-pad')
+                        mnd=gr.Slider(0.005,0.2,value=0.02,step=0.005,label='Min dur')
+                        mxd=gr.Slider(0.1,5.0,value=1.5,step=0.1,label='Max dur')
+                with gr.Accordion("🔗 Clustering",open=True):
+                    with gr.Row():
+                        tmin=gr.Number(value=5,label='Target min',precision=0)
+                        tmax=gr.Number(value=20,label='Target max',precision=0)
+                        lock_tgt=gr.Checkbox(value=True,label='🔒 Lock range',scale=0)
+                    gr.Markdown("*🔒 = auto-tune keeps this value fixed*")
+                    with gr.Row():
+                        nt=gr.Slider(0.3,0.99,value=0.80,step=0.01,label='NCC threshold')
+                        atk=gr.Slider(10,100,value=25,step=5,label='Attack (ms)')
+                        lnk=gr.Dropdown(['average','complete','single'],value='average',label='Linkage')
+                with gr.Accordion("🎹 MIDI & Post",open=False):
+                    with gr.Row():
+                        syn=gr.Checkbox(value=True,label='Synthesize')
+                        qmidi=gr.Checkbox(value=True,label='Quantize MIDI')
+                        subdiv=gr.Dropdown([('8th',8),('16th',16),('32nd',32)],value=16,label='Grid')
+                with gr.Row():
+                    tune_btn=gr.Button("🎛️ Auto-Tune",variant="secondary",size="lg")
+                    extract_btn=gr.Button("🔬 Extract",variant="primary",size="lg")
+                tune_summary=gr.Markdown(""); tune_log=gr.Textbox(label="Log",lines=8,max_lines=15,visible=False)
+                summary_md=gr.Markdown("*Upload → Auto-Tune or Extract*")
+                with gr.Row():
+                    stem_out=gr.Audio(type='numpy',label='Stem',interactive=False)
+                    rend_out=gr.Audio(type='numpy',label='🔊 Reconstruction',interactive=False)
+                gr.Markdown("### Downloads")
+                with gr.Row():
+                    arc=gr.File(label="📦 ZIP",interactive=False); mid=gr.File(label="🎹 MIDI",interactive=False)
+                smp=gr.File(label="WAVs",file_count="multiple",interactive=False)
+                met=gr.Dataframe(label="Samples"); stx=gr.Textbox(visible=False)
+                dm.change(fn=lambda m:gr.update(choices=DEMUCS_STEMS.get(m,["drums","bass","other","vocals"])+["all"]),inputs=[dm],outputs=[st])
+                tune_btn.click(run_auto_tune,[audio_in,st,dm,dsh,dov,om,od,ed,mg,tmin,tmax,lock_od,lock_ed,lock_mg,lock_tgt],
+                    [od,ed,mg,tmin,tmax,tune_summary,tune_log])
+                extract_btn.click(run_extraction,[audio_in,st,dm,dsh,dov,om,od,ed,pp,mnd,mxd,mg,nt,atk,lnk,tmin,tmax,syn,qmidi,subdiv],
+                    [stem_out,summary_md,rend_out,smp,mid,arc,stx,met])
+            with gr.Tab("📊 Evaluate"):
+                with gr.Row():
+                    ep=gr.Dropdown(['rock','funk','halftime'],value='rock',label='Pattern')
+                    eb=gr.Slider(80,200,value=120,step=2,label='BPM'); ebs=gr.Slider(2,8,value=4,step=1,label='Bars')
+                with gr.Row():
+                    en=gr.Slider(0.3,0.99,value=0.80,step=0.01,label='NCC')
+                    etm=gr.Number(value=0,label='Min',precision=0); etx=gr.Number(value=0,label='Max',precision=0)
+                evb=gr.Button("🧪 Evaluate",variant="primary",size="lg")
+                with gr.Row():
+                    evm=gr.Audio(type='numpy',label='Original',interactive=False)
+                    evr=gr.Audio(type='numpy',label='Reconstruction',interactive=False)
+                evs=gr.Dataframe(); evm2=gr.Dataframe()
+                es1=gr.Textbox(visible=False); es2=gr.Textbox(visible=False)
+                evb.click(run_eval,[ep,eb,ebs,en,etm,etx],[evm,evr,evs,evm2,es1,es2])
+            with gr.Tab("🔄 Optimize"):
+                with gr.Row():
+                    on=gr.Slider(2,30,value=5,step=1,label='Iters'); ocn=gr.Textbox(value="opt",label='Name')
+                    oa=gr.Textbox(value="",label='Author'); osv=gr.Checkbox(value=True,label='Save')
+                ob=gr.Button("🚀 Run",variant="primary",size="lg")
+                ol=gr.Textbox(label="Log",lines=20,max_lines=40); oh=gr.Dataframe(); op=gr.Plot()
+                oc=gr.Code(label="Config",language="json")
+                ob.click(run_optimize,[on,ocn,oa,osv],[ol,oh,op,oc])
+            with gr.Tab("🏆 Leaderboard"):
+                lbb=gr.Button("🔄 Refresh"); lt=gr.Dataframe(); ls=gr.Textbox(visible=False)
+                lbb.click(refresh_lb,[],[lt,ls])
+    return app
+if __name__=="__main__": build_app().launch(server_name="0.0.0.0",server_port=7860)
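
`run_extraction` above builds its MIDI path with `tempfile.mktemp`, which the Python documentation marks as deprecated because another process can claim the name before the file is created. A minimal sketch of an alternative follows, assuming the caller only needs a path string that a writer such as `export_midi` opens later. `reserve_temp_path` is a hypothetical name, not something defined in this commit.

```python
import os
import tempfile


def reserve_temp_path(suffix: str) -> str:
    """Create an empty temp file atomically and return its path.

    Hypothetical helper: unlike tempfile.mktemp, the file already exists
    when this returns, so the name cannot be taken by another process.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the writer reopens the path itself
    return path


if __name__ == "__main__":
    midi_path = reserve_temp_path(".mid")
    try:
        print(midi_path, os.path.exists(midi_path))
    finally:
        os.unlink(midi_path)
```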

app_v2.py → legacy/gradio_app_v2.py RENAMED Viewed

File without changes

pipeline_runner.py ADDED Viewed

	@@ -0,0 +1,407 @@

+#!/usr/bin/env python3
+"""Timed extraction pipeline used by the FastAPI app and benchmarks."""
+from __future__ import annotations
+import json
+import os
+import shutil
+import tempfile
+import time
+from contextlib import contextmanager
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Any, Callable
+import librosa
+import numpy as np
+import soundfile as sf
+from sample_extractor import (
+    DEMUCS_MODELS,
+    DEMUCS_STEMS,
+    build_archive,
+    classify_hits,
+    cluster_hits,
+    detect_bpm,
+    detect_onsets,
+    export_midi,
+    extract_stem,
+    render_midi_with_samples,
+    sample_quality_score,
+    select_best,
+    synthesize_from_cluster,
+)
+ProgressCallback = Callable[[dict[str, Any]], None]
+@dataclass
+class PipelineParams:
+    stem: str = "drums"
+    demucs_model: str = "htdemucs_ft"
+    demucs_shifts: int = 1
+    demucs_overlap: float = 0.25
+    onset_mode: str = "auto"
+    onset_delta: float = 0.12
+    energy_threshold_db: float = -35.0
+    pre_pad: float = 0.003
+    min_dur: float = 0.02
+    max_dur: float = 1.5
+    min_gap: float = 0.03
+    ncc_threshold: float = 0.80
+    attack_ms: float = 25.0
+    mel_threshold: float = 0.75
+    linkage: str = "average"
+    target_min: int = 5
+    target_max: int = 20
+    synthesize: bool = True
+    quantize_midi: bool = True
+    subdivision: int = 16
+    device: str = "cpu"
+    @classmethod
+    def from_mapping(cls, data: dict[str, Any] | None) -> "PipelineParams":
+        data = dict(data or {})
+        allowed = set(cls.__dataclass_fields__)
+        unknown = sorted(set(data) - allowed)
+        if unknown:
+            raise ValueError(f"Unknown pipeline parameter(s): {', '.join(unknown)}")
+        params = cls(**data)
+        params.validate()
+        return params
+    def validate(self) -> None:
+        if self.demucs_model not in DEMUCS_MODELS:
+            raise ValueError(f"Unsupported Demucs model: {self.demucs_model}")
+        allowed_stems = set(DEMUCS_STEMS.get(self.demucs_model, [])) | {"all"}
+        if self.stem not in allowed_stems:
+            raise ValueError(f"Stem '{self.stem}' is not available for {self.demucs_model}")
+        if self.onset_mode not in {"auto", "percussive", "harmonic", "broadband"}:
+            raise ValueError(f"Unsupported onset mode: {self.onset_mode}")
+        if self.linkage not in {"average", "complete", "single"}:
+            raise ValueError(f"Unsupported clustering linkage: {self.linkage}")
+        if not 0 <= self.demucs_shifts <= 8:
+            raise ValueError("demucs_shifts must be between 0 and 8")
+        if not 0.0 <= self.demucs_overlap <= 0.9:
+            raise ValueError("demucs_overlap must be between 0.0 and 0.9")
+        if not 0.001 <= self.onset_delta <= 1.0:
+            raise ValueError("onset_delta must be between 0.001 and 1.0")
+        if not -100.0 <= self.energy_threshold_db <= 0.0:
+            raise ValueError("energy_threshold_db must be between -100 and 0 dB")
+        if not 0.0 <= self.pre_pad <= 0.25:
+            raise ValueError("pre_pad must be between 0 and 0.25 seconds")
+        if not 0.001 <= self.min_dur <= self.max_dur <= 10.0:
+            raise ValueError("duration bounds must satisfy 0.001 <= min_dur <= max_dur <= 10")
+        if not 0.001 <= self.min_gap <= 1.0:
+            raise ValueError("min_gap must be between 0.001 and 1.0 seconds")
+        if not 0.0 <= self.ncc_threshold <= 1.0:
+            raise ValueError("ncc_threshold must be between 0 and 1")
+        if not 1.0 <= self.attack_ms <= 250.0:
+            raise ValueError("attack_ms must be between 1 and 250 ms")
+        if not 0.0 <= self.mel_threshold <= 1.0:
+            raise ValueError("mel_threshold must be between 0 and 1")
+        if self.target_min < 0 or self.target_max < 0:
+            raise ValueError("target_min and target_max must be non-negative")
+        if self.target_max and self.target_min and self.target_min > self.target_max:
+            raise ValueError("target_min cannot be greater than target_max")
+        if self.subdivision not in {4, 8, 16, 32, 64}:
+            raise ValueError("subdivision must be one of 4, 8, 16, 32, 64")
+@dataclass
+class StageTiming:
+    key: str
+    label: str
+    duration_sec: float = 0.0
+    status: str = "pending"
+    detail: str = ""
+@dataclass
+class PipelineResult:
+    params: dict[str, Any]
+    duration_sec: float
+    audio_duration_sec: float
+    realtime_factor: float
+    bpm: float | None
+    sample_rate: int
+    hit_count: int
+    cluster_count: int
+    stages: list[dict[str, Any]]
+    samples: list[dict[str, Any]]
+    overview: dict[str, Any]
+    files: dict[str, str]
+STAGE_DEFS = [
+    ("stem", "Stem extraction / source load"),
+    ("bpm", "Tempo detection"),
+    ("onsets", "Onset detection + slicing"),
+    ("classification", "Spectral rule classification"),
+    ("clustering", "Mel fingerprint + transient NCC clustering"),
+    ("selection", "Best representative scoring"),
+    ("synthesis", "Optional sample synthesis"),
+    ("export", "MIDI, reconstruction, WAV, ZIP export"),
+]
+def initial_stages() -> list[dict[str, Any]]:
+    return [asdict(StageTiming(key=key, label=label)) for key, label in STAGE_DEFS]
+def _notify(cb: ProgressCallback | None, payload: dict[str, Any]) -> None:
+    if cb:
+        cb(payload)
+@contextmanager
+def _timed_stage(stages: list[StageTiming], key: str, cb: ProgressCallback | None = None):
+    stage = next(stage for stage in stages if stage.key == key)
+    stage.status = "running"
+    _notify(cb, {"type": "stage", "stage": asdict(stage), "stages": [asdict(s) for s in stages]})
+    started = time.perf_counter()
+    try:
+        yield stage
+    except Exception as exc:
+        stage.duration_sec = time.perf_counter() - started
+        stage.status = "error"
+        stage.detail = str(exc)
+        _notify(cb, {"type": "stage", "stage": asdict(stage), "stages": [asdict(s) for s in stages]})
+        raise
+    else:
+        stage.duration_sec = time.perf_counter() - started
+        stage.status = "done"
+        _notify(cb, {"type": "stage", "stage": asdict(stage), "stages": [asdict(s) for s in stages]})
+def _normalise_audio(audio: np.ndarray) -> np.ndarray:
+    audio = audio.astype(np.float32)
+    if audio.ndim > 1:
+        audio = audio.mean(axis=1)
+    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
+    if peak > 0:
+        audio = audio / peak
+    return audio.astype(np.float32)
+def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_24") -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    sf.write(path, audio.astype(np.float32), sr, subtype=subtype)
+def _make_overview(audio: np.ndarray, sr: int, hits: list[Any], max_points: int = 1600) -> dict[str, Any]:
+    if len(audio) == 0:
+        return {"sample_rate": sr, "duration_sec": 0, "envelope": [], "onsets": []}
+    frame = max(1, int(np.ceil(len(audio) / max_points)))
+    usable = (len(audio) // frame) * frame
+    if usable == 0:
+        envelope = [float(np.max(np.abs(audio)))]
+    else:
+        envelope = np.max(np.abs(audio[:usable].reshape(-1, frame)), axis=1).astype(float).tolist()
+    return {
+        "sample_rate": sr,
+        "duration_sec": round(len(audio) / sr, 6),
+        "frame_duration_sec": round(frame / sr, 6),
+        "envelope": [round(float(x), 6) for x in envelope],
+        "onsets": [
+            {
+                "time_sec": round(float(h.onset_time), 6),
+                "label": h.label,
+                "energy": round(float(h.rms_energy), 6),
+                "cluster_id": int(getattr(h, "cluster_id", -1)),
+            }
+            for h in hits
+        ],
+    }
+def _copy_temp_file(src: str | os.PathLike[str], dst: Path) -> str:
+    dst.parent.mkdir(parents=True, exist_ok=True)
+    shutil.copyfile(src, dst)
+    return str(dst)
+def run_extraction_pipeline(
+    audio_path: str | os.PathLike[str],
+    output_dir: str | os.PathLike[str],
+    params: PipelineParams | dict[str, Any] | None = None,
+    progress_cb: ProgressCallback | None = None,
+) -> PipelineResult:
+    """Run extraction and write all runtime artifacts into output_dir."""
+    if not isinstance(params, PipelineParams):
+        params = PipelineParams.from_mapping(params)
+    out = Path(output_dir)
+    out.mkdir(parents=True, exist_ok=True)
+    samples_dir = out / "samples"
+    samples_dir.mkdir(parents=True, exist_ok=True)
+    stages = [StageTiming(key=key, label=label) for key, label in STAGE_DEFS]
+    started_total = time.perf_counter()
+    bpm: float | None = None
+    stem_audio: np.ndarray
+    stem_sr: int
+    hits: list[Any] = []
+    clusters: list[Any] = []
+    rendered: np.ndarray | None = None
+    _notify(progress_cb, {"type": "start", "stages": [asdict(s) for s in stages]})
+    with _timed_stage(stages, "stem", progress_cb) as stage:
+        stem_audio, stem_sr = extract_stem(
+            str(audio_path),
+            stem=params.stem,
+            device=params.device,
+            model_name=params.demucs_model,
+            shifts=int(params.demucs_shifts),
+            overlap=float(params.demucs_overlap),
+        )
+        stem_audio = _normalise_audio(stem_audio)
+        stage.detail = f"{params.stem} via {params.demucs_model}" if params.stem != "all" else "loaded full mix"
+        _write_audio(out / "stem.wav", stem_audio, stem_sr, subtype="PCM_16")
+    audio_duration_sec = len(stem_audio) / stem_sr if stem_sr else 0.0
+    with _timed_stage(stages, "bpm", progress_cb) as stage:
+        bpm = detect_bpm(stem_audio, stem_sr)
+        stage.detail = f"{bpm} BPM"
+    with _timed_stage(stages, "onsets", progress_cb) as stage:
+        hits = detect_onsets(
+            stem_audio,
+            stem_sr,
+            mode=params.onset_mode,
+            onset_delta=float(params.onset_delta),
+            energy_threshold_db=float(params.energy_threshold_db),
+            pre_pad=float(params.pre_pad),
+            min_dur=float(params.min_dur),
+            max_dur=float(params.max_dur),
+            min_gap=float(params.min_gap),
+        )
+        stage.detail = f"{len(hits)} hits"
+    if hits:
+        with _timed_stage(stages, "classification", progress_cb) as stage:
+            hits = classify_hits(hits)
+            counts: dict[str, int] = {}
+            for hit in hits:
+                counts[hit.label] = counts.get(hit.label, 0) + 1
+            stage.detail = ", ".join(f"{key}:{value}" for key, value in sorted(counts.items()))
+        with _timed_stage(stages, "clustering", progress_cb) as stage:
+            clusters = cluster_hits(
+                hits,
+                audio=stem_audio,
+                sr=stem_sr,
+                ncc_threshold=float(params.ncc_threshold),
+                attack_ms=float(params.attack_ms),
+                mel_threshold=float(params.mel_threshold),
+                target_min=int(params.target_min),
+                target_max=int(params.target_max),
+                linkage=params.linkage,
+            )
+            for cluster in clusters:
+                for hit in cluster.hits:
+                    hit.cluster_id = cluster.cluster_id
+            stage.detail = f"{len(clusters)} clusters"
+        with _timed_stage(stages, "selection", progress_cb) as stage:
+            select_best(clusters)
+            stage.detail = "quality-scored representatives"
+        with _timed_stage(stages, "synthesis", progress_cb) as stage:
+            if params.synthesize:
+                synth_count = 0
+                for cluster in clusters:
+                    if cluster.count >= 2:
+                        cluster.synthesized = synthesize_from_cluster(cluster)
+                        synth_count += int(cluster.synthesized is not None)
+                stage.detail = f"{synth_count} synthesized alternates"
+            else:
+                stage.detail = "disabled"
+    else:
+        for key, detail in [
+            ("classification", "skipped: no hits"),
+            ("clustering", "skipped: no hits"),
+            ("selection", "skipped: no hits"),
+            ("synthesis", "skipped: no hits"),
+        ]:
+            stage = next(s for s in stages if s.key == key)
+            stage.status = "done"
+            stage.detail = detail
+    sample_rows: list[dict[str, Any]] = []
+    files: dict[str, str] = {"stem": "stem.wav"}
+    with _timed_stage(stages, "export", progress_cb) as stage:
+        midi_path = out / "reconstruction.mid"
+        if clusters:
+            export_midi(
+                clusters,
+                str(midi_path),
+                bpm=bpm or 120.0,
+                quantize=bool(params.quantize_midi),
+                subdivision=int(params.subdivision),
+            )
+            rendered = render_midi_with_samples(clusters, sr=stem_sr)
+        else:
+            rendered = np.zeros_like(stem_audio)
+            midi_path.write_bytes(b"")
+        _write_audio(out / "reconstruction.wav", rendered, stem_sr, subtype="PCM_16")
+        files["reconstruction"] = "reconstruction.wav"
+        files["midi"] = "reconstruction.mid"
+        for cluster in sorted(clusters, key=lambda item: item.count, reverse=True):
+            best = cluster.best_hit
+            sample_path = samples_dir / f"{cluster.label}.wav"
+            best.save(str(sample_path))
+            quality = sample_quality_score(best.audio, best.sr, cluster.label.rsplit("_", 1)[0])
+            sample_rows.append(
+                {
+                    "label": cluster.label,
+                    "classification": cluster.label.rsplit("_", 1)[0],
+                    "hits": int(cluster.count),
+                    "midi_note": int(cluster.midi_note),
+                    "score": round(float(quality["total"]), 2),
+                    "cleanness": round(float(quality["cleanness"]), 4),
+                    "completeness": round(float(quality["completeness"]), 4),
+                    "duration_ms": round(float(best.duration * 1000), 1),
+                    "first_onset_sec": round(float(min(hit.onset_time for hit in cluster.hits)), 4),
+                    "file": f"samples/{cluster.label}.wav",
+                }
+            )
+            if cluster.synthesized is not None:
+                synth_path = samples_dir / f"{cluster.label}__synth.wav"
+                _write_audio(synth_path, cluster.synthesized, stem_sr)
+        archive_tmp = build_archive(clusters, bpm or 120.0, stem_sr, midi_path=str(midi_path), rendered_audio=rendered)
+        _copy_temp_file(archive_tmp, out / "sample-pack.zip")
+        files["archive"] = "sample-pack.zip"
+        try:
+            os.unlink(archive_tmp)
+        except OSError:
+            pass
+        stage.detail = f"{len(sample_rows)} WAVs + MIDI + ZIP"
+    duration_sec = time.perf_counter() - started_total
+    result = PipelineResult(
+        params=asdict(params),
+        duration_sec=round(duration_sec, 6),
+        audio_duration_sec=round(audio_duration_sec, 6),
+        realtime_factor=round(duration_sec / max(audio_duration_sec, 1e-9), 6),
+        bpm=bpm,
+        sample_rate=stem_sr,
+        hit_count=len(hits),
+        cluster_count=len(clusters),
+        stages=[asdict(stage) for stage in stages],
+        samples=sample_rows,
+        overview=_make_overview(stem_audio, stem_sr, hits),
+        files=files,
+    )
+    (out / "manifest.json").write_text(json.dumps(asdict(result), indent=2), encoding="utf-8")
+    _notify(progress_cb, {"type": "complete", "result": asdict(result), "stages": result.stages})
+    return result
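
The `_timed_stage` context manager above emits one event when a stage starts and another when it finishes or fails. The sketch below reproduces that pattern in standalone form so the event sequence can be inspected without the audio dependencies. `Stage`, `timed_stage`, `demo`, and the simplified payload are stand-ins for illustration, not the module's actual classes.

```python
import time
from contextlib import contextmanager
from dataclasses import asdict, dataclass


@dataclass
class Stage:
    # Simplified stand-in for StageTiming.
    key: str
    duration_sec: float = 0.0
    status: str = "pending"
    detail: str = ""


@contextmanager
def timed_stage(stage, callback):
    """Mark a stage running, time the body, then report done or error."""
    stage.status = "running"
    callback(asdict(stage))
    started = time.perf_counter()
    try:
        yield stage
    except Exception as exc:
        stage.duration_sec = time.perf_counter() - started
        stage.status = "error"
        stage.detail = str(exc)
        callback(asdict(stage))
        raise
    else:
        stage.duration_sec = time.perf_counter() - started
        stage.status = "done"
        callback(asdict(stage))


def demo():
    """Run one successful and one failing stage; return the status sequence."""
    events = []
    with timed_stage(Stage("bpm"), events.append) as stage:
        stage.detail = "120 BPM"
    try:
        with timed_stage(Stage("onsets"), events.append):
            raise RuntimeError("no audio")
    except RuntimeError:
        pass
    return [(event["key"], event["status"]) for event in events]


if __name__ == "__main__":
    print(demo())
```

Each payload is a snapshot produced by `asdict`, so later mutations of the stage object do not alter events that were already delivered to the callback.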

requirements-legacy-gradio.txt ADDED Viewed

	@@ -0,0 +1,2 @@


+-r requirements.txt
+gradio

requirements.txt CHANGED Viewed

@@ -1,12 +1,14 @@
-demucs==4.0.1
+fastapi>=0.110
+uvicorn[standard]>=0.27
+python-multipart>=0.0.9
+numpy>=1.26
+scipy>=1.11
 librosa>=0.10.0
-soundfile
-scikit-learn
-numpy
+soundfile>=0.12
+scikit-learn>=1.4
 torch
 torchaudio
-scipy
-gradio
+demucs==4.0.1
 matplotlib
 pandas
 pretty_midi

sample_extractor.py CHANGED Viewed

@@ -104,7 +104,7 @@ def detect_onsets(y,sr,pre_pad=0.003,min_dur=0.02,max_dur=1.5,min_gap=0.03,
                 sr=sr,hop_length=hop_length,lag=l,max_size=ms)
         def _n(x): m=x.max(); return x/m if m>0 else x
         oe = np.maximum.reduce([_n(_sf(y,20,300)), _n(_sf(y,300,4000)),
-                                 _n(_sf(y,4000,16000)), _n(_sf(yh,lag=2,ms=5))])
+                                 _n(_sf(y,4000,16000)), _n(_sf(yh,l=2,ms=5))])
     wait = max(1, int(min_gap * sr / hop_length))
     fr = librosa.onset.onset_detect(onset_envelope=oe,sr=sr,hop_length=hop_length,

scripts/benchmark_subprocesses.py ADDED

	@@ -0,0 +1,86 @@

+#!/usr/bin/env python3
+"""Benchmark significant sample-extraction subprocesses using synthetic fixtures.
+This intentionally uses `stem=all` so the DSP stages can be measured without
+Demucs download/runtime noise. The script does not expose a Demucs option yet; to
+benchmark stem separation on the current machine, run the pipeline on a real input
+file with a Demucs stem such as `stem="drums"`.
+"""
+from __future__ import annotations
+import argparse
+import json
+import statistics
+import sys
+import tempfile
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+import soundfile as sf
+from pipeline_runner import PipelineParams, run_extraction_pipeline
+from sample_extractor import cache_clear
+from synth_generator import generate_test_song
+def run_case(pattern: str, bars: int, bpm: float, run_index: int) -> dict:
+    tmp = Path(tempfile.mkdtemp(prefix="dse-bench-"))
+    song = generate_test_song(pattern_name=pattern, bars=bars, bpm=bpm, add_bass=False, seed=42 + run_index)
+    src = tmp / f"{pattern}-{bars}bars.wav"
+    sf.write(src, song.drums_only, song.sr)
+    cache_clear()
+    params = PipelineParams(stem="all", target_min=4, target_max=12, synthesize=True)
+    result = run_extraction_pipeline(src, tmp / "out", params)
+    return {
+        "pattern": pattern,
+        "bars": bars,
+        "bpm": bpm,
+        "run_index": run_index,
+        "audio_duration_sec": result.audio_duration_sec,
+        "total_duration_sec": result.duration_sec,
+        "realtime_factor": result.realtime_factor,
+        "hit_count": result.hit_count,
+        "cluster_count": result.cluster_count,
+        "stages": result.stages,
+    }
+def main() -> int:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--runs", type=int, default=2)
+    parser.add_argument("--bars", type=int, default=4)
+    parser.add_argument("--bpm", type=float, default=120.0)
+    parser.add_argument("--output", default="docs/benchmark-subprocesses.json")
+    args = parser.parse_args()
+    # Warm imports/JIT and discard the result.
+    run_case("rock", 1, args.bpm, -1)
+    rows = []
+    for run_index in range(args.runs):
+        for pattern in ["rock", "funk", "halftime"]:
+            rows.append(run_case(pattern, args.bars, args.bpm, run_index))
+    stage_keys = [stage["key"] for stage in rows[0]["stages"]]
+    summary = []
+    for key in stage_keys:
+        values = [next(stage for stage in row["stages"] if stage["key"] == key)["duration_sec"] for row in rows]
+        summary.append({
+            "stage": key,
+            "mean_sec": round(statistics.mean(values), 6),
+            "median_sec": round(statistics.median(values), 6),
+            "min_sec": round(min(values), 6),
+            "max_sec": round(max(values), 6),
+        })
+    payload = {"runs": rows, "summary": summary}
+    out = Path(args.output)
+    out.parent.mkdir(parents=True, exist_ok=True)
+    out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
+    print(json.dumps(payload, indent=2))
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
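The summary step in `main()` can be exercised on its own without running the pipeline. The sketch below is a hedged restatement of that aggregation; the function name `summarize_stages` and the sample rows are made up for illustration. Unlike the script, it collects every matching stage per row rather than the first one, so a row that lacks a stage is skipped instead of raising `StopIteration`.

```python
import statistics


def summarize_stages(rows):
    """Aggregate per-stage durations across benchmark rows."""
    # Stage order is taken from the first row, as in the benchmark script.
    stage_keys = [stage["key"] for stage in rows[0]["stages"]]
    summary = []
    for key in stage_keys:
        values = [
            stage["duration_sec"]
            for row in rows
            for stage in row["stages"]
            if stage["key"] == key
        ]
        summary.append({
            "stage": key,
            "mean_sec": round(statistics.mean(values), 6),
            "median_sec": round(statistics.median(values), 6),
            "min_sec": round(min(values), 6),
            "max_sec": round(max(values), 6),
        })
    return summary


if __name__ == "__main__":
    # Illustrative timings only, not measured results.
    rows = [
        {"stages": [{"key": "onsets", "duration_sec": 0.25},
                    {"key": "cluster", "duration_sec": 1.0}]},
        {"stages": [{"key": "onsets", "duration_sec": 0.75},
                    {"key": "cluster", "duration_sec": 3.0}]},
    ]
    for entry in summarize_stages(rows):
        print(entry)
```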

scripts/smoke_benchmark.py ADDED

	@@ -0,0 +1,24 @@

+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+import json
+import tempfile
+import soundfile as sf
+from synth_generator import generate_test_song
+from pipeline_runner import PipelineParams, run_extraction_pipeline
+song = generate_test_song(pattern_name='rock', bars=2, bpm=120, add_bass=False)
+out = Path(tempfile.mkdtemp(prefix='dse-test-'))
+inp = out / 'input.wav'
+sf.write(inp, song.drums_only, song.sr)
+params = PipelineParams(stem='all', target_min=4, target_max=8, synthesize=True)
+res = run_extraction_pipeline(inp, out / 'out', params)
+print(json.dumps({
+    'duration_sec': res.duration_sec,
+    'audio_duration_sec': res.audio_duration_sec,
+    'realtime_factor': res.realtime_factor,
+    'hit_count': res.hit_count,
+    'cluster_count': res.cluster_count,
+    'stages': res.stages,
+    'files': res.files,
+}, indent=2))

scripts/test_api_job.py ADDED

	@@ -0,0 +1,25 @@

+import io, json, sys, time
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+import soundfile as sf
+from fastapi.testclient import TestClient
+from app import app
+from synth_generator import generate_test_song
+song=generate_test_song(pattern_name='rock', bars=1, bpm=120, add_bass=False)
+buf=io.BytesIO()
+sf.write(buf, song.drums_only, song.sr, format='WAV')
+buf.seek(0)
+client=TestClient(app)
+params={'stem':'all','target_min':2,'target_max':6,'synthesize':True}
+r=client.post('/api/jobs', files={'file':('test.wav', buf, 'audio/wav')}, data={'params':json.dumps(params)})
+r.raise_for_status()
+job=r.json()
+for _ in range(60):
+    job=client.get(f"/api/jobs/{job['id']}").json()
+    if job['status'] in {'complete','error'}:
+        break
+    time.sleep(0.25)
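+# A failed job may report "result": null, so fall back to an empty dict before reading fields.
+result = job.get('result') or {}
+print(json.dumps({'status':job['status'], 'error':job.get('error'), 'hit_count': result.get('hit_count'), 'files': result.get('file_urls')}, indent=2))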
+assert job['status']=='complete', job.get('error')
+assert job['result']['hit_count'] > 0
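The poll-until-terminal loop in this script can be factored into a small helper that is testable without a server. This is a sketch under assumptions: `wait_for_job` and `hit_count` are hypothetical names, `fetch_job` stands in for `client.get(...).json()`, and the idea that a failed job may carry `"result": None` is a guess about the API rather than something the diff confirms.

```python
import time


def wait_for_job(fetch_job, attempts=60, delay_sec=0.25):
    """Call fetch_job() until the job is terminal or the attempts run out."""
    job = fetch_job()
    for _ in range(attempts):
        if job["status"] in {"complete", "error"}:
            break
        time.sleep(delay_sec)
        job = fetch_job()
    return job


def hit_count(job):
    """Read hit_count without failing when the result is missing or None."""
    return (job.get("result") or {}).get("hit_count")


if __name__ == "__main__":
    # Fake job states standing in for successive API responses.
    states = iter([
        {"status": "running", "result": None},
        {"status": "complete", "result": {"hit_count": 3}},
    ])
    job = wait_for_job(lambda: next(states), delay_sec=0)
    print(job["status"], hit_count(job))
```

Passing the fetcher as a callable keeps the helper independent of `TestClient`, so the same loop could be reused against a live server.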

web/app.js ADDED

	@@ -0,0 +1,269 @@

+const $ = (id) => document.getElementById(id);
+const fields = [
+  "stem", "demucs_model", "demucs_shifts", "demucs_overlap", "onset_mode", "onset_delta",
+  "energy_threshold_db", "pre_pad", "min_dur", "max_dur", "min_gap", "ncc_threshold",
+  "attack_ms", "mel_threshold", "linkage", "target_min", "target_max", "subdivision",
+  "synthesize", "quantize_midi"
+];
+let config = null;
+let selectedFile = null;
+let activePoll = null;
+function fmtSec(value) {
+  if (value === null || value === undefined || Number.isNaN(Number(value))) return "—";
+  const n = Number(value);
+  if (n < 0.001) return `${(n * 1000).toFixed(2)} ms`;
+  if (n < 1) return `${(n * 1000).toFixed(1)} ms`;
+  return `${n.toFixed(2)} s`;
+}
+function setHealth(ok, text, subtext) {
+  $("healthDot").className = `status-dot ${ok ? "ok" : "bad"}`;
+  $("healthText").textContent = text;
+  $("healthSubtext").textContent = subtext;
+}
+async function api(path, options = {}) {
+  const response = await fetch(path, options);
+  if (!response.ok) {
+    let detail = response.statusText;
+    try { detail = (await response.json()).detail ?? detail; } catch {}
+    throw new Error(detail);
+  }
+  return response.json();
+}
+function setSelectOptions(select, values, labels = null) {
+  select.innerHTML = "";
+  for (const value of values) {
+    const option = document.createElement("option");
+    option.value = String(value);
+    option.textContent = labels?.[value] ?? String(value);
+    select.appendChild(option);
+  }
+}
+function populateConfig() {
+  setSelectOptions($("demucs_model"), config.demucs_models);
+  const defaults = config.defaults;
+  for (const field of fields) {
+    const el = $(field);
+    if (!el || defaults[field] === undefined) continue;
+    if (el.type === "checkbox") el.checked = Boolean(defaults[field]);
+    else el.value = defaults[field];
+  }
+  updateStemOptions();
+  renderStages(config.stages);
+}
+function updateStemOptions() {
+  const model = $("demucs_model").value || config.defaults.demucs_model;
+  const stems = config.demucs_stems[model] ?? ["drums", "bass", "other", "vocals", "all"];
+  const current = $("stem").value || config.defaults.stem;
+  setSelectOptions($("stem"), stems);
+  $("stem").value = stems.includes(current) ? current : stems[0];
+}
+function collectParams() {
+  const params = {};
+  for (const field of fields) {
+    const el = $(field);
+    if (!el) continue;
+    if (el.type === "checkbox") params[field] = el.checked;
+    else if (el.type === "number") params[field] = Number(el.value);
+    else params[field] = el.value;
+  }
+  return params;
+}
+function renderStages(stages = []) {
+  $("stageList").innerHTML = stages.map((stage) => `
+    <div class="stage ${stage.status}" title="${stage.detail || ""}">
+      <span class="badge"></span>
+      <div><strong>${stage.label}</strong><small>${stage.detail || stage.status}</small></div>
+      <time>${fmtSec(stage.duration_sec)}</time>
+    </div>
+  `).join("");
+}
+function drawWaveform(overview) {
+  window.__lastOverview = overview;
+  const canvas = $("waveform");
+  const ctx = canvas.getContext("2d");
+  const ratio = window.devicePixelRatio || 1;
+  const rect = canvas.getBoundingClientRect();
+  canvas.width = Math.max(1, Math.floor(rect.width * ratio));
+  canvas.height = Math.max(160, Math.floor(160 * ratio));
+  ctx.scale(ratio, ratio);
+  const w = rect.width;
+  const h = 160;
+  ctx.clearRect(0, 0, w, h);
+  ctx.fillStyle = "rgba(139,211,255,.045)";
+  ctx.fillRect(0, 0, w, h);
+  const env = overview?.envelope ?? [];
+  if (!env.length) return;
+  ctx.strokeStyle = "rgba(139,211,255,.92)";
+  ctx.lineWidth = 1.4;
+  ctx.beginPath();
+  const mid = h / 2;
+  env.forEach((v, i) => {
+    const x = (i / Math.max(1, env.length - 1)) * w;
+    const y = mid - Math.min(1, v) * (h * 0.42);
+    if (i === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
+  });
+  for (let i = env.length - 1; i >= 0; i--) {
+    const v = env[i];
+    const x = (i / Math.max(1, env.length - 1)) * w;
+    const y = mid + Math.min(1, v) * (h * 0.42);
+    ctx.lineTo(x, y);
+  }
+  ctx.closePath();
+  ctx.fillStyle = "rgba(139,211,255,.28)";
+  ctx.fill();
+  ctx.stroke();
+  ctx.strokeStyle = "rgba(200,165,255,.55)";
+  ctx.lineWidth = 1;
+  for (const onset of overview.onsets ?? []) {
+    const x = (onset.time_sec / Math.max(overview.duration_sec, 0.001)) * w;
+    ctx.beginPath();
+    ctx.moveTo(x, 10);
+    ctx.lineTo(x, h - 10);
+    ctx.stroke();
+  }
+}
+function renderResult(job) {
+  const result = job.result;
+  if (!result) return;
+  const rtf = result.realtime_factor.toFixed(2);
+  $("resultSummary").textContent = `${result.hit_count} hits → ${result.cluster_count} samples · BPM ${result.bpm ?? "—"} · ${fmtSec(result.duration_sec)} total · ${rtf}× realtime`;
+  drawWaveform(result.overview);
+  const fileUrls = result.file_urls ?? {};
+  const labels = { archive: "Sample pack ZIP", midi: "MIDI", stem: "Stem WAV", reconstruction: "Reconstruction WAV" };
+  $("downloads").innerHTML = Object.entries(fileUrls).map(([key, url]) => `<a href="${url}" download>${labels[key] ?? key}</a>`).join("");
+  $("stemAudio").src = fileUrls.stem ?? "";
+  $("reconAudio").src = fileUrls.reconstruction ?? "";
+  const tbody = $("samplesTable").querySelector("tbody");
+  tbody.innerHTML = (result.samples ?? []).map((sample) => `
+    <tr>
+      <td>${sample.label}</td>
+      <td>${sample.classification}</td>
+      <td>${sample.hits}</td>
+      <td>${sample.score}</td>
+      <td>${sample.duration_ms} ms</td>
+      <td>${sample.first_onset_sec} s</td>
+      <td><a href="${sample.url}" download>WAV</a></td>
+    </tr>
+  `).join("");
+}
+function renderJob(job) {
+  $("jobPill").textContent = `${job.status}${job.id ? ` · ${job.id}` : ""}`;
+  renderStages(job.stages ?? []);
+  $("logs").textContent = (job.logs ?? []).join("\n");
+  if (job.status === "complete") renderResult(job);
+  if (job.status === "error") {
+    $("resultSummary").textContent = `Extraction failed: ${job.error}`;
+    $("logs").textContent = `${(job.logs ?? []).join("\n")}\n\n${job.traceback ?? ""}`;
+  }
+}
+async function pollJob(id) {
+  if (activePoll) clearInterval(activePoll);
+  activePoll = null;
+  // Resolves to true once the job is terminal or polling failed, so no further ticks are needed.
+  const tick = async () => {
+    try {
+      const job = await api(`/api/jobs/${id}`);
+      renderJob(job);
+      if (!["complete", "error"].includes(job.status)) return false;
+    } catch (error) {
+      $("resultSummary").textContent = error.message;
+    }
+    if (activePoll) clearInterval(activePoll);
+    activePoll = null;
+    $("runButton").disabled = !selectedFile;
+    return true;
+  };
+  // Start the interval only if the first poll did not already finish the job.
+  if (!(await tick())) activePoll = setInterval(tick, 800);
+}
+async function runExtraction() {
+  if (!selectedFile) return;
+  $("runButton").disabled = true;
+  $("jobPill").textContent = "uploading";
+  $("logs").textContent = "Uploading source and starting extraction…";
+  const form = new FormData();
+  form.append("file", selectedFile, selectedFile.name);
+  form.append("params", JSON.stringify(collectParams()));
+  try {
+    const job = await api("/api/jobs", { method: "POST", body: form });
+    renderJob(job);
+    await pollJob(job.id);
+  } catch (error) {
+    $("runButton").disabled = false;
+    $("resultSummary").textContent = error.message;
+  }
+}
+function setFile(file) {
+  selectedFile = file;
+  $("dropTitle").textContent = file ? file.name : "Drop audio here or click to browse";
+  $("dropMeta").textContent = file ? `${(file.size / 1024 / 1024).toFixed(2)} MB · ${file.type || "audio"}` : "No file selected";
+  $("runButton").disabled = !file;
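+  const preview = $("sourcePreview");
+  // Release the previous object URL and hide the player when the selection is cleared.
+  if (preview.src.startsWith("blob:")) URL.revokeObjectURL(preview.src);
+  preview.hidden = !file;
+  if (file) preview.src = URL.createObjectURL(file);
+  else preview.removeAttribute("src");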
+}
+async function boot() {
+  try {
+    await api("/api/health");
+    config = await api("/api/config");
+    populateConfig();
+    setHealth(true, "Ready", "Backend online");
+  } catch (error) {
+    setHealth(false, "Offline", error.message);
+  }
+}
+$("demucs_model").addEventListener("change", updateStemOptions);
+$("fileInput").addEventListener("change", (event) => setFile(event.target.files?.[0] ?? null));
+$("runButton").addEventListener("click", runExtraction);
+$("useFastButton").addEventListener("click", () => {
+  $("stem").value = "all";
+  $("demucs_shifts").value = 0;
+  $("target_min").value = 4;
+  $("target_max").value = 16;
+});
+$("clearCacheButton").addEventListener("click", async () => {
+  try {
+    await api("/api/cache/clear", { method: "POST" });
+    $("logs").textContent = "Pipeline cache cleared.";
+  } catch (error) {
+    $("logs").textContent = error.message;
+  }
+});
+const dropzone = $("dropzone");
+for (const eventName of ["dragenter", "dragover"]) {
+  dropzone.addEventListener(eventName, (event) => { event.preventDefault(); dropzone.classList.add("dragging"); });
+}
+for (const eventName of ["dragleave", "drop"]) {
+  dropzone.addEventListener(eventName, (event) => { event.preventDefault(); dropzone.classList.remove("dragging"); });
+}
+dropzone.addEventListener("drop", (event) => setFile(event.dataTransfer.files?.[0] ?? null));
+window.addEventListener("resize", () => {
+  const current = window.__lastOverview;
+  if (current) drawWaveform(current);
+});
+boot();

web/index.html ADDED

	@@ -0,0 +1,174 @@

+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <title>Drum Sample Extractor</title>
+    <link rel="stylesheet" href="/web/styles.css" />
+  </head>
+  <body>
+    <div class="shell">
+      <header class="hero">
+        <div>
+          <p class="eyebrow">Sample extraction workstation</p>
+          <h1>Extract reusable drum samples from one audio file.</h1>
+          <p class="lede">Upload a track, isolate or bypass the stem, detect hits, cluster similar transients, export WAVs, MIDI, reconstruction audio, and a complete sample pack.</p>
+        </div>
+        <div class="hero-card" aria-live="polite">
+          <span class="status-dot" id="healthDot"></span>
+          <div>
+            <strong id="healthText">Connecting</strong>
+            <span id="healthSubtext">FastAPI backend</span>
+          </div>
+        </div>
+      </header>
+      <main class="workspace">
+        <section class="panel ingest-panel">
+          <div class="panel-heading">
+            <div>
+              <h2>1. Source</h2>
+              <p>Drop a WAV, MP3, FLAC, AIFF, OGG, or M4A file. Use the <code>all</code> stem for fast iteration without Demucs.</p>
+            </div>
+          </div>
+          <label class="dropzone" id="dropzone">
+            <input id="fileInput" type="file" accept="audio/*,.wav,.mp3,.flac,.aiff,.ogg,.m4a" />
+            <span class="drop-icon">↥</span>
+            <strong id="dropTitle">Drop audio here or click to browse</strong>
+            <small id="dropMeta">No file selected</small>
+          </label>
+          <audio id="sourcePreview" controls hidden></audio>
+        </section>
+        <section class="panel controls-panel">
+          <div class="panel-heading">
+            <div>
+              <h2>2. Extraction controls</h2>
+              <p>Defaults favor quick full-song extraction. Tighten thresholds after reviewing the timeline.</p>
+            </div>
+            <button id="clearCacheButton" class="ghost-button" type="button">Clear cache</button>
+          </div>
+          <div class="control-grid">
+            <label>Stem
+              <select id="stem"></select>
+            </label>
+            <label>Demucs model
+              <select id="demucs_model"></select>
+            </label>
+            <label>Shifts
+              <input id="demucs_shifts" type="number" min="0" max="8" step="1" />
+            </label>
+            <label>Overlap
+              <input id="demucs_overlap" type="number" min="0" max="0.9" step="0.05" />
+            </label>
+            <label>Onset mode
+              <select id="onset_mode">
+                <option value="auto">auto / multiband</option>
+                <option value="percussive">percussive</option>
+                <option value="harmonic">harmonic</option>
+                <option value="broadband">broadband</option>
+              </select>
+            </label>
+            <label>Onset delta
+              <input id="onset_delta" type="number" min="0.001" max="1" step="0.01" />
+            </label>
+            <label>Energy threshold dB
+              <input id="energy_threshold_db" type="number" min="-100" max="0" step="1" />
+            </label>
+            <label>Minimum gap seconds
+              <input id="min_gap" type="number" min="0.001" max="1" step="0.005" />
+            </label>
+            <label>Pre-pad seconds
+              <input id="pre_pad" type="number" min="0" max="0.25" step="0.001" />
+            </label>
+            <label>Min duration seconds
+              <input id="min_dur" type="number" min="0.001" max="10" step="0.005" />
+            </label>
+            <label>Max duration seconds
+              <input id="max_dur" type="number" min="0.01" max="10" step="0.1" />
+            </label>
+            <label>NCC threshold
+              <input id="ncc_threshold" type="number" min="0" max="1" step="0.01" />
+            </label>
+            <label>Attack window ms
+              <input id="attack_ms" type="number" min="1" max="250" step="1" />
+            </label>
+            <label>Mel prefilter
+              <input id="mel_threshold" type="number" min="0" max="1" step="0.01" />
+            </label>
+            <label>Linkage
+              <select id="linkage">
+                <option value="average">average</option>
+                <option value="complete">complete</option>
+                <option value="single">single</option>
+              </select>
+            </label>
+            <label>Target min clusters
+              <input id="target_min" type="number" min="0" max="256" step="1" />
+            </label>
+            <label>Target max clusters
+              <input id="target_max" type="number" min="0" max="256" step="1" />
+            </label>
+            <label>MIDI grid
+              <select id="subdivision">
+                <option value="8">8th</option>
+                <option value="16">16th</option>
+                <option value="32">32nd</option>
+                <option value="64">64th</option>
+              </select>
+            </label>
+          </div>
+          <div class="toggles">
+            <label><input id="synthesize" type="checkbox" /> synthesize alternates</label>
+            <label><input id="quantize_midi" type="checkbox" /> quantize MIDI</label>
+          </div>
+          <div class="actions">
+            <button id="runButton" class="primary-button" type="button" disabled>Extract samples</button>
+            <button id="useFastButton" class="secondary-button" type="button">Use fast full-mix mode</button>
+          </div>
+        </section>
+        <section class="panel progress-panel">
+          <div class="panel-heading">
+            <div>
+              <h2>3. Pipeline</h2>
+              <p>Stage timings are captured per run. Stem separation is deliberately isolated because it dominates offline extraction.</p>
+            </div>
+            <span class="job-pill" id="jobPill">idle</span>
+          </div>
+          <div id="stageList" class="stage-list"></div>
+          <pre id="logs" class="logs" aria-live="polite"></pre>
+        </section>
+        <section class="panel result-panel">
+          <div class="panel-heading">
+            <div>
+              <h2>4. Results</h2>
+              <p id="resultSummary">Run extraction to populate samples, timing, MIDI, reconstruction, and downloads.</p>
+            </div>
+          </div>
+          <canvas id="waveform" class="waveform" height="160"></canvas>
+          <div class="downloads" id="downloads"></div>
+          <div class="audio-grid">
+            <label>Stem audio<audio id="stemAudio" controls></audio></label>
+            <label>Reconstruction<audio id="reconAudio" controls></audio></label>
+          </div>
+          <div class="table-wrap">
+            <table id="samplesTable">
+              <thead>
+                <tr>
+                  <th>Sample</th><th>Class</th><th>Hits</th><th>Score</th><th>Duration</th><th>First hit</th><th>File</th>
+                </tr>
+              </thead>
+              <tbody></tbody>
+            </table>
+          </div>
+        </section>
+      </main>
+    </div>
+    <script type="module" src="/web/app.js"></script>
+  </body>
+</html>

web/styles.css ADDED

	@@ -0,0 +1,80 @@

+:root {
+  color-scheme: dark;
+  --bg: #08090d;
+  --panel: rgba(18, 22, 32, 0.84);
+  --panel-strong: rgba(28, 34, 48, 0.92);
+  --line: rgba(255, 255, 255, 0.1);
+  --muted: #8b93a7;
+  --text: #eef2ff;
+  --accent: #8bd3ff;
+  --accent-2: #c8a5ff;
+  --good: #55e6a5;
+  --bad: #ff6d7a;
+  --warn: #ffca6b;
+  --shadow: 0 24px 90px rgba(0,0,0,.38);
+}
+* { box-sizing: border-box; }
+html, body { margin: 0; min-height: 100%; background: radial-gradient(circle at 20% 0%, rgba(139,211,255,.20), transparent 30rem), radial-gradient(circle at 88% 8%, rgba(200,165,255,.18), transparent 28rem), var(--bg); color: var(--text); font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "SF Pro Display", "Segoe UI", sans-serif; }
+button, input, select { font: inherit; }
+code { color: var(--accent); }
+.shell { width: min(1520px, calc(100% - 32px)); margin: 0 auto; padding: 32px 0 56px; }
+.hero { display: grid; grid-template-columns: 1fr auto; gap: 24px; align-items: end; margin-bottom: 24px; }
+.eyebrow { margin: 0 0 10px; text-transform: uppercase; letter-spacing: .16em; color: var(--accent); font-size: 12px; font-weight: 800; }
+h1 { margin: 0; font-size: clamp(36px, 6vw, 76px); line-height: .92; letter-spacing: -.07em; max-width: 980px; }
+.lede { margin: 18px 0 0; color: #cbd3e5; font-size: 17px; max-width: 860px; line-height: 1.55; }
+.hero-card { min-width: 250px; display: flex; align-items: center; gap: 14px; padding: 18px; border: 1px solid var(--line); background: rgba(255,255,255,.06); border-radius: 24px; box-shadow: var(--shadow); backdrop-filter: blur(18px); }
+.hero-card strong, .hero-card span { display: block; }
+.hero-card span:last-child { color: var(--muted); font-size: 13px; margin-top: 3px; }
+.status-dot { width: 12px; height: 12px; border-radius: 999px; color: var(--warn); background: currentColor; box-shadow: 0 0 26px currentColor; }
+.status-dot.ok { color: var(--good); }
+.status-dot.bad { color: var(--bad); }
+.workspace { display: grid; grid-template-columns: minmax(320px, .9fr) minmax(520px, 1.35fr); gap: 18px; align-items: start; }
+.panel { border: 1px solid var(--line); border-radius: 28px; background: linear-gradient(180deg, var(--panel-strong), var(--panel)); box-shadow: var(--shadow); backdrop-filter: blur(22px); padding: 22px; }
+.result-panel { grid-column: 1 / -1; }
+.panel-heading { display: flex; align-items: flex-start; justify-content: space-between; gap: 18px; margin-bottom: 18px; }
+h2 { margin: 0; font-size: 20px; letter-spacing: -.025em; }
+.panel p { margin: 7px 0 0; color: var(--muted); line-height: 1.45; }
+.dropzone { position: relative; display: grid; place-items: center; gap: 8px; min-height: 260px; padding: 22px; border: 1.5px dashed rgba(139,211,255,.42); border-radius: 24px; background: linear-gradient(145deg, rgba(139,211,255,.08), rgba(200,165,255,.05)); text-align: center; cursor: pointer; transition: transform .2s ease, border-color .2s ease, background .2s ease; }
+.dropzone:hover, .dropzone.dragging { transform: translateY(-1px); border-color: var(--accent); background: rgba(139,211,255,.12); }
+.dropzone input { position: absolute; inset: 0; opacity: 0; cursor: pointer; }
+.drop-icon { width: 74px; height: 74px; display: grid; place-items: center; border-radius: 22px; background: rgba(255,255,255,.08); color: var(--accent); font-size: 42px; line-height: 1; }
+.dropzone strong { font-size: 18px; }
+.dropzone small { color: var(--muted); }
+audio { width: 100%; margin-top: 12px; }
+.control-grid { display: grid; grid-template-columns: repeat(4, minmax(0, 1fr)); gap: 12px; }
+label { display: block; color: #c7d0e4; font-size: 12px; font-weight: 750; letter-spacing: .02em; }
+input, select { width: 100%; margin-top: 7px; border: 1px solid var(--line); border-radius: 14px; padding: 11px 12px; color: var(--text); background: rgba(5, 7, 12, .62); outline: none; }
+input:focus, select:focus { border-color: rgba(139,211,255,.8); box-shadow: 0 0 0 4px rgba(139,211,255,.12); }
+.toggles { display: flex; flex-wrap: wrap; gap: 14px; margin: 16px 0 0; }
+.toggles label { display: flex; align-items: center; gap: 8px; font-size: 13px; font-weight: 700; }
+.toggles input { width: auto; margin: 0; }
+.actions { display: flex; flex-wrap: wrap; gap: 12px; margin-top: 18px; }
+button { border: 0; border-radius: 16px; padding: 12px 16px; color: var(--text); cursor: pointer; transition: transform .16s ease, opacity .16s ease, border-color .16s ease; }
+button:hover:not(:disabled) { transform: translateY(-1px); }
+button:disabled { opacity: .45; cursor: not-allowed; }
+.primary-button { background: linear-gradient(135deg, var(--accent), var(--accent-2)); color: #07101d; font-weight: 900; }
+.secondary-button, .ghost-button { border: 1px solid var(--line); background: rgba(255,255,255,.07); }
+.ghost-button { padding: 9px 12px; color: #cbd3e5; }
+.job-pill { display: inline-flex; align-items: center; border: 1px solid var(--line); border-radius: 999px; padding: 7px 10px; color: var(--muted); background: rgba(255,255,255,.06); font-size: 12px; }
+.stage-list { display: grid; gap: 9px; }
+.stage { display: grid; grid-template-columns: 24px 1fr auto; gap: 10px; align-items: center; padding: 12px; border: 1px solid var(--line); border-radius: 18px; background: rgba(0,0,0,.16); }
+.stage .badge { width: 18px; height: 18px; border-radius: 999px; background: rgba(255,255,255,.16); }
+.stage.running .badge { background: var(--accent); box-shadow: 0 0 22px rgba(139,211,255,.8); }
+.stage.done .badge { background: var(--good); }
+.stage.error .badge { background: var(--bad); }
+.stage strong { display: block; font-size: 14px; }
+.stage small { display: block; color: var(--muted); margin-top: 2px; }
+.stage time { color: #d7def0; font-variant-numeric: tabular-nums; }
+.logs { min-height: 140px; max-height: 240px; overflow: auto; border: 1px solid var(--line); border-radius: 18px; padding: 14px; margin: 14px 0 0; background: #05070b; color: #9db8c8; font-size: 12px; line-height: 1.45; white-space: pre-wrap; }
+.waveform { width: 100%; min-height: 160px; border: 1px solid var(--line); border-radius: 20px; background: rgba(0,0,0,.18); margin: 4px 0 16px; }
+.downloads { display: flex; flex-wrap: wrap; gap: 10px; margin-bottom: 16px; }
+.downloads a, .table-wrap a { color: #07101d; text-decoration: none; font-weight: 850; background: var(--accent); border-radius: 999px; padding: 8px 11px; }
+.audio-grid { display: grid; grid-template-columns: repeat(2, minmax(0, 1fr)); gap: 16px; margin-bottom: 16px; }
+.table-wrap { overflow: auto; border: 1px solid var(--line); border-radius: 20px; }
+table { width: 100%; border-collapse: collapse; min-width: 860px; }
+th, td { text-align: left; padding: 12px 14px; border-bottom: 1px solid var(--line); font-size: 13px; }
+th { position: sticky; top: 0; background: #101521; color: #aeb9ce; z-index: 1; }
+td { color: #e5eaf7; }
+tr:last-child td { border-bottom: 0; }
+@media (max-width: 1100px) { .workspace, .hero { grid-template-columns: 1fr; } .control-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); } }
+@media (max-width: 680px) { .shell { width: min(100% - 20px, 1520px); padding-top: 16px; } .panel { padding: 16px; border-radius: 22px; } .control-grid, .audio-grid { grid-template-columns: 1fr; } h1 { letter-spacing: -.045em; } }