Commit e07820e · 1 Parent(s): 03d531b
Committed by ChatGPT

feat: render supervised edits into artifacts
README.md CHANGED
@@ -46,18 +46,19 @@ Implemented:
  - accept/favorite hit,
  - suppress hit as bleed,
  - lock/unlock cluster,
- - suggestion inbox,
+ - suggestion inbox with exact diff previews,
  - cluster explanation drawer,
+ - force-onset waveform mode,
+ - restore suppressed hits,
+ - edited sample-pack export,
  - constraint/event log.
  - Documentation for features, progress, tasks, API, timing, hit review, realtime suitability, UI, remaining work, and interactive UX.
  - Legacy Gradio apps preserved in `legacy/` for reference only.

  Not fully complete yet:

- - Semantic edits do not yet regenerate WAV/MIDI/ZIP exports.
- - No force-onset/click-to-add missed onset yet.
- - No restore for suppressed hits yet.
  - No true cached feature-vector local reclustering yet.
+ - No cluster merge/split/relabel workflow beyond move/pull-to-new-cluster.
  - No frontend TypeScript build/test harness yet.
  - Demucs remains offline/batch by design.

@@ -69,6 +70,7 @@ See:
  - `docs/API.md`
  - `docs/interactive-ux/README.md`
  - `docs/REMAINING_WORK.md`
+ - `docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md`

  ## Run locally

@@ -91,10 +93,11 @@ That bypasses Demucs and uses the near-realtime clustering path.
  ## Run checks

  ```bash
- python3 -m py_compile app.py pipeline_runner.py sample_extractor.py supervised_state.py scripts/*.py
+ python3 -m py_compile app.py pipeline_runner.py sample_extractor.py supervised_state.py supervised_export.py scripts/*.py
  node --check web/app.js
  python3 scripts/test_sse_and_review_hits.py
  python3 scripts/test_interactive_supervision.py
+ python3 scripts/test_supervised_export_and_force_onset.py
  ```

  ## Run benchmarks

@@ -148,10 +151,12 @@ curl http://127.0.0.1:7860/api/jobs
  | `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads, supervised editing endpoints |
  | `pipeline_runner.py` | Timed extraction pipeline, disk stem/source cache, batch/online clustering routing |
  | `sample_extractor.py` | Core DSP/sample extraction implementation |
- | `supervised_state.py` | Persistent semantic state, confidence, constraints, events, suggestions, undo |
- | `web/` | Custom no-build browser frontend with waveform, hit review, sample audition, and supervision panel |
+ | `supervised_state.py` | Persistent semantic state, confidence, constraints, events, suggestions, force-onset, restore, undo |
+ | `supervised_export.py` | Renders edited semantic state into supervised WAV/MIDI/reconstruction/ZIP artifacts |
+ | `web/` | Custom no-build browser frontend with waveform, hit review, sample audition, add-onset mode, edited export, and supervision panel |
  | `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
  | `scripts/test_interactive_supervision.py` | Smoke test for supervised state endpoints |
+ | `scripts/test_supervised_export_and_force_onset.py` | Smoke test for force-onset, restore, suggestion diffs, and edited exports |
  | `docs/interactive-ux/` | Supplied interactive UX docs aligned to current implementation |
  | `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
  | `legacy/` | Previous Gradio apps retained for reference |

@@ -168,6 +173,11 @@ Each run is stored under `.runs/<job-id>/output/`:
  - `review/hits/*.wav`
  - `manifest.json`
  - `supervision_state.json`
+ - `supervised/manifest.json` after edited export
+ - `supervised/sample-pack.zip` after edited export
+ - `supervised/samples/*.wav` after edited export
+ - `supervised/reconstruction.mid` after edited export
+ - `supervised/reconstruction.wav` after edited export

  Generated runtime directories are ignored by git:
app.py CHANGED
@@ -29,16 +29,19 @@ from sample_extractor import DEMUCS_MODELS, DEMUCS_STEMS, cache_clear
  from supervised_state import (
      accept_suggestion,
      explain_cluster as build_cluster_explanation,
+     force_onset as apply_force_onset,
      load_or_create_state,
      lock_cluster as apply_cluster_lock,
      move_hit as apply_hit_move,
      public_state,
      pull_hit_to_new_cluster,
      reject_suggestion,
+     restore_hit as apply_hit_restore,
      set_hit_review_status,
      suppress_hit as apply_hit_suppression,
      undo_last as apply_undo,
  )
+ from supervised_export import export_supervised_state

  ROOT = Path(__file__).resolve().parent
  WEB_DIR = ROOT / "web"

@@ -63,6 +66,16 @@ def _job_url(job_id: str, relative_path: str) -> str:
      return f"/api/jobs/{job_id}/files/{relative_path}"


+ def _serialise_export(job_id: str, export_manifest: dict[str, Any]) -> dict[str, Any]:
+     payload = dict(export_manifest)
+     payload["file_urls"] = {key: _job_url(job_id, path) for key, path in payload.get("files", {}).items()}
+     payload["samples"] = [
+         {**sample, "url": _job_url(job_id, sample["file"])}
+         for sample in payload.get("samples", [])
+     ]
+     return payload
+
+
  def _serialise_job(job: dict[str, Any]) -> dict[str, Any]:
      payload = {key: value for key, value in job.items() if key not in {"input_path", "output_dir"}}
      if payload.get("result"):

@@ -334,6 +347,46 @@ def get_job_state(job_id: str) -> dict[str, Any]:
      return _state_payload(job_id)


+ @app.post("/api/jobs/{job_id}/export")
+ def post_supervised_export(job_id: str, payload: dict[str, Any] = Body(default_factory=dict)) -> dict[str, Any]:
+     patch = _json_patch(payload)
+     try:
+         export_manifest = export_supervised_state(
+             _job_output_dir(job_id),
+             job_id,
+             synthesize=bool(patch.get("synthesize", True)),
+             quantize=patch.get("quantize"),
+             subdivision=patch.get("subdivision"),
+         )
+     except Exception as exc:
+         raise HTTPException(status_code=500, detail=str(exc)) from exc
+     return {"export": _serialise_export(job_id, export_manifest), "state": _state_payload(job_id)}
+
+
+ @app.post("/api/jobs/{job_id}/hits/force-onset")
+ def post_force_onset(job_id: str, payload: dict[str, Any] = Body(default_factory=dict)) -> dict[str, Any]:
+     patch = _json_patch(payload)
+     if "onset_sec" not in patch:
+         raise HTTPException(status_code=400, detail="onset_sec is required")
+     try:
+         apply_force_onset(
+             _job_output_dir(job_id),
+             job_id,
+             float(patch["onset_sec"]),
+             duration_ms=patch.get("duration_ms"),
+             label=patch.get("label"),
+             target_cluster_id=patch.get("target_cluster_id"),
+             pre_pad_sec=float(patch.get("pre_pad_sec", 0.003)),
+         )
+     except KeyError as exc:
+         raise HTTPException(status_code=404, detail=str(exc)) from exc
+     except ValueError as exc:
+         raise HTTPException(status_code=400, detail=str(exc)) from exc
+     except Exception as exc:
+         raise HTTPException(status_code=500, detail=str(exc)) from exc
+     return _state_payload(job_id)
+
+
  @app.post("/api/jobs/{job_id}/hits/{hit_id}/move")
  def post_move_hit(job_id: str, hit_id: str, payload: dict[str, Any] = Body(default_factory=dict)) -> dict[str, Any]:
      target_cluster_id = _json_patch(payload).get("target_cluster_id")

@@ -372,6 +425,17 @@ def post_suppress_hit(job_id: str, hit_id: str, payload: dict[str, Any] = Body(d
      return _state_payload(job_id)


+ @app.post("/api/jobs/{job_id}/hits/{hit_id}/restore")
+ def post_restore_hit(job_id: str, hit_id: str) -> dict[str, Any]:
+     try:
+         apply_hit_restore(_job_output_dir(job_id), job_id, hit_id)
+     except KeyError as exc:
+         raise HTTPException(status_code=404, detail=str(exc)) from exc
+     except Exception as exc:
+         raise HTTPException(status_code=500, detail=str(exc)) from exc
+     return _state_payload(job_id)
+
+
  @app.post("/api/jobs/{job_id}/hits/{hit_id}/review")
  def post_review_hit(job_id: str, hit_id: str, payload: dict[str, Any] = Body(default_factory=dict)) -> dict[str, Any]:
      status = str(_json_patch(payload).get("status") or "accepted")
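The new `_serialise_export` helper only rewrites manifest-relative paths into per-job download URLs. A self-contained sketch of that behavior (the simplified `job_url` stand-in and the sample manifest data are illustrative assumptions, not the project's exact code):

```python
from typing import Any


def job_url(job_id: str, relative_path: str) -> str:
    # Simplified stand-in for app.py's _job_url helper.
    return f"/api/jobs/{job_id}/files/{relative_path}"


def serialise_export(job_id: str, export_manifest: dict[str, Any]) -> dict[str, Any]:
    # Copy the manifest, then attach download URLs for every file entry
    # and for each per-sample "file" path, without mutating the input.
    payload = dict(export_manifest)
    payload["file_urls"] = {key: job_url(job_id, path) for key, path in payload.get("files", {}).items()}
    payload["samples"] = [
        {**sample, "url": job_url(job_id, sample["file"])}
        for sample in payload.get("samples", [])
    ]
    return payload


manifest = {
    "files": {"archive": "supervised/sample-pack.zip"},
    "samples": [{"label": "snare", "file": "supervised/samples/snare.wav"}],
}
out = serialise_export("job-1", manifest)
print(out["file_urls"]["archive"])  # /api/jobs/job-1/files/supervised/sample-pack.zip
```

Keeping URL construction in one place means the export manifest on disk stays path-relative while API responses carry ready-to-use links.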
docs/API.md CHANGED
@@ -221,7 +221,7 @@ The interactive supervision API is backed by `supervised_state.py` and persists
  .runs/<job_id>/output/supervision_state.json
  ```

- The batch `manifest.json` remains immutable. Supervised edits currently update semantic state only; they do not yet regenerate WAV/MIDI/ZIP artifacts.
+ The batch `manifest.json` remains immutable. Supervised edits update semantic state and can be rendered into a separate edited export under `supervised/` without mutating original artifacts.

  ### `GET /api/jobs/{job_id}/state`

@@ -231,18 +231,105 @@ Response keys:

  | Key | Meaning |
  |---|---|
- | `summary` | Counts for hits, clusters, constraints, events, suggestions, suppressed hits, locked clusters, undo availability. |
+ | `summary` | Counts for hits, clusters, constraints, events, suggestions, suppressed/forced hits, locked clusters, latest export, undo availability. |
  | `hits` | Semantic hit rows with confidence, suppression/favorite/review flags, file URLs, and current cluster assignment. |
  | `clusters` | Semantic clusters with hit IDs, representative hit, confidence, locked state, and suppressed count. |
  | `review_queue` | Low-confidence/high-priority hits sorted for review. |
  | `constraints` | Recent replayable constraints. |
  | `events` | Recent state mutation events. |
- | `suggestions` | Open move/split/suppress suggestions. |
+ | `suggestions` | Open move/split/suppress suggestions, including exact `diff` previews. |

  ```bash
  curl http://127.0.0.1:7860/api/jobs/<job-id>/state
  ```

+ ### `POST /api/jobs/{job_id}/hits/force-onset`
+
+ Creates a user-forced hit slice from `stem.wav` and adds it to semantic state.
+
+ Body:
+
+ ```json
+ {
+   "onset_sec": 0.123,
+   "duration_ms": 160,
+   "target_cluster_id": "cluster:0",
+   "label": "snare"
+ }
+ ```
+
+ Fields:
+
+ | Field | Required | Meaning |
+ |---|---:|---|
+ | `onset_sec` | yes | Onset location in seconds. |
+ | `duration_ms` | no | Slice length. If omitted, the system slices until the next active onset or a bounded default. |
+ | `target_cluster_id` | no | Existing cluster to place the hit into. If omitted, a new user cluster is created. |
+ | `label` | no | Override label. If omitted, the rule-based classifier labels the forced slice. |
+
+ Effects:
+
+ - writes `review/hits/hit_NNNNN_<label>_forced.wav`,
+ - creates a semantic hit with `source=forced`,
+ - creates `force-onset` and `force-cluster` constraints,
+ - appends a `hit.force_onset` event,
+ - recomputes confidence and the review queue.
+
+ ### `POST /api/jobs/{job_id}/hits/{hit_id}/restore`
+
+ Restores a suppressed hit.
+
+ Effects:
+
+ - sets `suppressed=false`,
+ - resets `review_status` from `suppressed` back to `unreviewed`,
+ - creates a `restore-hit` constraint,
+ - appends a `hit.restored` event,
+ - recomputes confidence and the review queue.
+
+ ### `POST /api/jobs/{job_id}/export`
+
+ Renders the current semantic state into edited artifacts under `supervised/`. This does not modify the original `manifest.json`, original samples, or original ZIP.
+
+ Body:
+
+ ```json
+ {
+   "synthesize": true,
+   "quantize": true,
+   "subdivision": 16
+ }
+ ```
+
+ Response shape:
+
+ ```json
+ {
+   "export": {
+     "kind": "supervised-export",
+     "hit_count": 17,
+     "cluster_count": 10,
+     "files": {
+       "archive": "supervised/sample-pack.zip",
+       "midi": "supervised/reconstruction.mid",
+       "reconstruction": "supervised/reconstruction.wav"
+     },
+     "file_urls": {}
+   },
+   "state": {}
+ }
+ ```
+
+ Export rules:
+
+ - suppressed hits are excluded,
+ - forced hits are included,
+ - moved/pulled hits use current semantic cluster membership,
+ - favorite/pinned representatives are honored before quality scoring,
+ - cluster labels are sanitized for filenames,
+ - `supervision_state.json` receives `latest_export` and a `supervised.exported` event.
+
  ### `POST /api/jobs/{job_id}/hits/{hit_id}/move`

  Moves a hit into an existing target cluster.
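The `duration_ms` fallback documented for force-onset (slice until the next active onset, bounded by a default) can be sketched as a pure function. The 0.5 s default bound and the exact tie rules here are assumptions, not the implementation's values:

```python
def forced_slice_bounds(onset_sec, active_onsets, duration_ms=None,
                        pre_pad_sec=0.003, max_default_sec=0.5):
    """Compute (start_sec, end_sec) for a forced hit slice.

    An explicit duration_ms wins; otherwise the slice runs until the
    next active onset, capped by an assumed default maximum length.
    """
    start = max(0.0, onset_sec - pre_pad_sec)
    if duration_ms is not None:
        return start, onset_sec + duration_ms / 1000.0
    later = [t for t in active_onsets if t > onset_sec]
    end = min(later) if later else onset_sec + max_default_sec
    return start, min(end, onset_sec + max_default_sec)


print(forced_slice_bounds(1.0, [0.5, 1.2, 2.0]))         # bounded by the next onset at 1.2 s
print(forced_slice_bounds(1.0, [0.5], duration_ms=160))  # explicit 160 ms slice
```

The small `pre_pad_sec` mirrors the endpoint's default and keeps the transient's attack inside the slice.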
docs/FEATURES.md CHANGED
@@ -38,6 +38,10 @@ Turn an input audio file into a practical drum sample pack: detected hits, group
  | Pipeline | Reconstruction render | Implemented | Renders MIDI-like reconstruction using selected samples. |
  | Pipeline | Per-hit review export | Implemented | Writes every accepted detected hit to `review/hits/*.wav` and records rows in the manifest. |
  | Pipeline | Sample pack ZIP | Implemented | Includes WAVs, index JSON, MIDI, rendered reconstruction. |
+ | Supervision | Edited artifact re-export | Implemented | `supervised_export.py` writes edited samples, MIDI, reconstruction, ZIP, and `supervised/manifest.json`. |
+ | Supervision | Force-onset from waveform | Implemented | Adds user-forced hit slices from cached `stem.wav`; UI add-onset mode posts to `/hits/force-onset`. |
+ | Supervision | Suppressed-hit restore | Implemented | Restore endpoint and UI button reverse suppression without undoing unrelated edits. |
+ | Supervision | Suggestion diff previews | Implemented | Open suggestions include exact hit/cluster before-after previews and a UI `Diff` button. |
  | Docs | Project review | Implemented | `docs/PROJECT_REVIEW.md`. |
  | Docs | Timing/realtime analysis | Implemented | `docs/PIPELINE_TIMING_AND_REALTIME.md`. |
  | Docs | API docs | Implemented | `docs/API.md`. |

@@ -77,8 +81,8 @@ Turn an input audio file into a practical drum sample pack: detected hits, group
  | Supervision | Pull hit into new cluster | Implemented | Creates a user cluster and cannot-link/force-cluster constraints. |
  | Supervision | Lock cluster | Implemented | Lock state persists and updates confidence/UI. |
  | Supervision | Suppress hit as bleed | Implemented | Marks hit suppressed, stores suppress-pattern, may suggest similar suppressions. |
- | Supervision | Favorite representative | Partial | Pins semantic representative; supervised export does not yet honor it. |
- | Supervision | Suggestion inbox | Partial | Move/split/suppress suggestions can be accepted/rejected; exact diff preview is not implemented. |
+ | Supervision | Favorite representative | Implemented | Pins semantic representative and supervised export honors it before quality scoring. |
+ | Supervision | Suggestion inbox | Implemented | Move/split/suppress suggestions can be accepted/rejected and inspected with exact diff previews. |
  | Supervision | Cluster explanation | Implemented | Backend and UI show confidence reasons, label distribution, outliers, and constraints. |
- | Supervision | Edited artifact re-export | Not implemented | Semantic edits do not yet regenerate sample WAVs, MIDI, reconstruction, or ZIP. |
- | Supervision | Force-onset from waveform | Not implemented | Waveform click currently auditions nearest existing hit only. |
+ | Supervision | Edited artifact re-export | Implemented | Exports edited state into `supervised/` without mutating original batch artifacts. |
+ | Supervision | Force-onset from waveform | Implemented | Add-onset mode turns waveform clicks into forced hit slices from `stem.wav`. |
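The favorite-representative rows above describe a precedence order: a pinned representative first, then favorited hits, then quality scoring. A minimal sketch under assumed data shapes (hit dicts with `favorite`/`suppressed` flags and the tie-break by hit id are illustrative assumptions):

```python
def pick_representative(cluster, hits_by_id, quality_score):
    """Pick a cluster's representative hit id.

    Precedence: explicit representative_hit_id pin, then any favorited
    member, then the highest quality score among remaining members.
    Suppressed hits are never eligible.
    """
    pinned = cluster.get("representative_hit_id")
    if pinned and pinned in hits_by_id and not hits_by_id[pinned].get("suppressed"):
        return pinned
    members = [hits_by_id[h] for h in cluster["hit_ids"] if not hits_by_id[h].get("suppressed")]
    favorites = [h for h in members if h.get("favorite")]
    pool = favorites or members
    # Quality scoring only decides among the remaining candidates.
    return max(pool, key=lambda h: (quality_score(h), h["id"]))["id"]


hits = {
    "hit:1": {"id": "hit:1", "favorite": False},
    "hit:2": {"id": "hit:2", "favorite": True},
    "hit:3": {"id": "hit:3", "favorite": False, "suppressed": True},
}
cluster = {"hit_ids": ["hit:1", "hit:2", "hit:3"], "representative_hit_id": None}
print(pick_representative(cluster, hits, quality_score=lambda h: 0.0))  # hit:2
```

This matches the table's claim that pins and favorites are honored before quality scoring rather than competing with it.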
docs/PROGRESS.md CHANGED
@@ -134,14 +134,56 @@ analyze audio
  → reload completed run with decisions intact
  ```

- It does not yet satisfy the full workstation loop because edited semantic state is not yet rendered into updated sample WAVs, MIDI, reconstruction, or ZIP output.
+ It now satisfies the first full semantic-edit loop because edited semantic state can be rendered into separate supervised sample WAVs, MIDI, reconstruction, and ZIP output.

  ## Next recommended pass after Pass 4

- 1. Add supervised re-export endpoint.
- 2. Exclude suppressed hits from supervised exports.
- 3. Honor favorite/pinned representatives in supervised sample WAVs.
- 4. Add force-onset endpoint using cached `stem.wav`.
- 5. Add add-onset mode to the waveform UI.
- 6. Add restore suppressed hit and batch restore.
- 7. Add feature-vector cache for true local reclustering.
+ 1. Add cluster merge/relabel/split workflows.
+ 2. Add feature-vector cache for true local reclustering.
+ 3. Add edited-vs-original run comparison.
+ 4. Add browser-level UI tests and migrate the frontend to TypeScript/Vite after the UX stabilizes.
+
+ ## Pass 5: supervised export, force-onset, restore, and suggestion diffs
+
+ Completed in this pass:
+
+ 1. Added `supervised_export.py` to render `supervision_state.json` into edited artifacts under `supervised/`.
+ 2. Added `POST /api/jobs/{job_id}/export` for edited sample-pack export.
+ 3. Added `POST /api/jobs/{job_id}/hits/force-onset` to create user-forced hit slices from `stem.wav`.
+ 4. Added an add-onset waveform mode in the frontend.
+ 5. Added `POST /api/jobs/{job_id}/hits/{hit_id}/restore` and a restore button for suppressed hits.
+ 6. Added exact suggestion diff previews through `suggestion.diff` and a UI `Diff` action.
+ 7. Updated supervised export to exclude suppressed hits and honor favorite/pinned representatives.
+ 8. Added `scripts/test_supervised_export_and_force_onset.py`.
+ 9. Added `docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md` and updated feature/API/task/progress docs.
+
+ Outcome:
+
+ The project now closes the main semantic-edit loop:
+
+ ```text
+ analyze audio
+ → inspect hits/clusters
+ → move/pull/suppress/restore/favorite/lock/force-onset
+ → inspect suggestions and diffs
+ → export edited WAV/MIDI/reconstruction/ZIP artifacts
+ ```
+
+ The original batch artifacts remain immutable. Edited outputs are written separately under `supervised/`.
+
+ Validation performed in this pass:
+
+ - `python3 -m py_compile app.py pipeline_runner.py sample_extractor.py supervised_state.py supervised_export.py scripts/*.py`
+ - `node --check web/app.js`
+ - `python3 scripts/test_supervised_export_and_force_onset.py`
+ - `python3 scripts/test_sse_and_review_hits.py`
+ - `python3 scripts/test_interactive_supervision.py`
+ - `python3 scripts/test_api_job.py`
+
+ Next recommended pass after Pass 5:
+
+ 1. Add cluster merge/relabel/split workflows.
+ 2. Add cached feature-vector local reclustering around edited hits.
+ 3. Add edited-vs-original run comparison.
+ 4. Add browser-level UI tests and migrate the frontend to TypeScript/Vite once the UX stops shifting.
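The `quantize`/`subdivision` options accepted by the Pass 5 export endpoint imply snapping onsets to a metrical grid. A minimal sketch assuming a fixed tempo and a sixteenth-note grid for `subdivision=16` (the real export's grid semantics may differ):

```python
def quantize_onsets(onsets_sec, tempo_bpm, subdivision=16):
    """Snap onset times to the nearest grid line.

    With subdivision=16, the grid step is one sixteenth note:
    (60 / tempo_bpm) seconds per beat, divided by 4 steps per beat.
    The exact grid semantics here are an assumption.
    """
    step = (60.0 / tempo_bpm) / (subdivision / 4)
    return [round(t / step) * step for t in onsets_sec]


# 120 BPM gives a sixteenth-note grid step of 0.125 s.
print(quantize_onsets([0.01, 0.13, 0.26], tempo_bpm=120))  # [0.0, 0.125, 0.25]
```

Passing `"quantize": false` (or omitting it) would leave the edited reconstruction at the hits' detected times.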
docs/REMAINING_WORK.md CHANGED
@@ -54,3 +54,21 @@ Highest-priority remaining work now:
  5. **Suggestion diff preview**: show exact before/after membership changes before accepting a suggestion.
  6. **Constraint violation detection**: explicitly report conflicting user constraints.
  7. **Frontend tests and TypeScript migration**: harden the increasingly stateful UI.
+
+ ## Closed in Pass 5
+
+ - Supervised edited-state export now writes `supervised/manifest.json`, edited samples, edited MIDI, edited reconstruction WAV, and an edited ZIP.
+ - Suppressed hits are excluded from edited exports.
+ - Favorite/pinned representatives are honored by edited exports.
+ - Add-onset mode writes forced hit slices from `stem.wav`.
+ - Suppressed hits can be restored without undoing unrelated edits.
+ - Suggestions expose exact before/after diffs and the UI can preview them.
+
+ ## Current top remaining gaps
+
+ 1. Cluster merge/relabel/split workflows.
+ 2. Cached feature-vector local reclustering around edited hits.
+ 3. Edited-vs-original comparison view.
+ 4. Batch restore / bulk operations for suppressed hits.
+ 5. Browser-level UI tests and TypeScript/Vite hardening.
docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md ADDED
@@ -0,0 +1,110 @@
+ # Supervised export and force-onset workflow
+
+ Last updated: 2026-05-12
+
+ ## Purpose
+
+ The batch extraction manifest is immutable. Interactive edits are stored in `supervision_state.json`. This pass adds the missing rendering step that turns those semantic edits into new downloadable artifacts without rerunning Demucs or onset detection.
+
+ ```text
+ manifest.json + review/hits/*.wav + supervision_state.json
+ → supervised/manifest.json
+ → supervised/samples/*.wav
+ → supervised/reconstruction.mid
+ → supervised/reconstruction.wav
+ → supervised/sample-pack.zip
+ ```
+
+ ## Implemented features
+
+ | Feature | Status | Files |
+ |---|---:|---|
+ | Edited-state export | Implemented | `supervised_export.py`, `POST /api/jobs/{job_id}/export` |
+ | Suppressed-hit exclusion | Implemented | Export ignores hits with `suppressed=true`. |
+ | Favorite/pinned representative export | Implemented | Export honors `representative_hit_id` / favorite hits before quality scoring. |
+ | Force-onset from existing stem audio | Implemented | `POST /api/jobs/{job_id}/hits/force-onset` |
+ | Forced-hit review WAV writing | Implemented | `review/hits/hit_NNNNN_<label>_forced.wav` |
+ | Suppressed-hit restore | Implemented | `POST /api/jobs/{job_id}/hits/{hit_id}/restore` |
+ | Exact suggestion diff preview | Implemented | `suggestion.diff` in state responses and UI diff button. |
+ | UI add-onset mode | Implemented | Toggle in supervision header; waveform clicks add forced hits. |
+ | UI edited export downloads | Implemented | Edited ZIP/MIDI/reconstruction links render after export. |
+
+ ## Export behavior
+
+ The supervised export builds clusters from current semantic state:
+
+ 1. Skip suppressed hits.
+ 2. Load hit audio from each hit's `file` path under the job output directory.
+ 3. Sanitize cluster labels for output filenames.
+ 4. Preserve forced hits and moved/pulled hits through current cluster membership.
+ 5. Pick representatives from semantic `representative_hit_id` or favorite hits first.
+ 6. Quality-score representatives only for unpinned clusters.
+ 7. Write edited samples, MIDI, reconstruction WAV, ZIP, and `supervised/manifest.json`.
+ 8. Append a `supervised.exported` event and `latest_export` entry to `supervision_state.json`.
+
+ The original `manifest.json`, original `sample-pack.zip`, and original `samples/*.wav` are not modified.
+
+ ## Force-onset behavior
+
+ `POST /api/jobs/{job_id}/hits/force-onset` requires a completed run with `stem.wav`.
+
+ Body:
+
+ ```json
+ {
+   "onset_sec": 0.123,
+   "duration_ms": 160,
+   "target_cluster_id": "cluster:0",
+   "label": "snare"
+ }
+ ```
+
+ Rules:
+
+ - `onset_sec` is required.
+ - `duration_ms` is optional. If omitted, the system slices until the next active onset or a bounded default duration.
+ - `target_cluster_id` is optional. If omitted, a new user cluster is created.
+ - `label` is optional. If omitted, the current rule-based classifier labels the slice.
+ - A short fade-out is applied to avoid clicks.
+ - The forced hit is marked `source="forced"`, `explicit=true`, and `review_status="accepted"`.
+
+ ## Suggestion diffs
+
+ Open suggestions now include exact previews:
+
+ ```json
+ {
+   "type": "move-hits",
+   "affected_hit_count": 3,
+   "hits": [
+     {
+       "hit_id": "hit:00007",
+       "from_cluster_label": "bright_1",
+       "to_cluster_label": "snare_user_1",
+       "before_suppressed": false,
+       "after_suppressed": false
+     }
+   ],
+   "clusters_before": {},
+   "clusters_after": {}
+ }
+ ```
+
+ The frontend exposes this through the `Diff` button in the suggestion inbox.
+
+ ## Validation
+
+ Covered by:
+
+ ```bash
+ python3 scripts/test_supervised_export_and_force_onset.py
+ ```
+
+ This test verifies:
+
+ - suppression and restore,
+ - forced-hit creation and download,
+ - suggestion diff presence when suggestions exist,
+ - supervised export creation,
+ - artifact download URLs for edited ZIP/MIDI/reconstruction,
+ - latest export state metadata.
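The suggestion diff payload documented above can be assembled from simple before/after membership data. A minimal sketch (field names follow the documented example; the assembly logic and defaulted suppressed flags are assumptions):

```python
def build_suggestion_diff(suggestion_type, moves, clusters_before, clusters_after):
    """Assemble a diff preview for a move-hits style suggestion.

    `moves` is a list of (hit_id, from_label, to_label) tuples; the
    suppressed flags default to False here for brevity (an assumption).
    """
    return {
        "type": suggestion_type,
        "affected_hit_count": len(moves),
        "hits": [
            {
                "hit_id": hit_id,
                "from_cluster_label": src,
                "to_cluster_label": dst,
                "before_suppressed": False,
                "after_suppressed": False,
            }
            for hit_id, src, dst in moves
        ],
        # Hit counts per cluster label, before and after applying the suggestion.
        "clusters_before": clusters_before,
        "clusters_after": clusters_after,
    }


diff = build_suggestion_diff(
    "move-hits",
    [("hit:00007", "bright_1", "snare_user_1")],
    clusters_before={"bright_1": 4, "snare_user_1": 2},
    clusters_after={"bright_1": 3, "snare_user_1": 3},
)
print(diff["affected_hit_count"])  # 1
```

Computing the diff at suggestion time, rather than at accept time, is what lets the inbox preview a change without mutating state.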
docs/TASKS.md CHANGED
@@ -76,9 +76,10 @@ Last updated: 2026-05-12
  | Add suggestion inbox | Done/Partial | UI/API supports accept/reject; exact diff preview still open. |
  | Add cluster explanation drawer | Done | `GET /api/jobs/{job_id}/explain/cluster/{cluster_id}` plus UI drawer. |
  | Add semantic undo | Done | `POST /api/jobs/{job_id}/undo`. |
- | Add supervised export from edited state | Todo | Needed so corrections affect ZIP/MIDI/WAV outputs. |
- | Add click-to-add missed onset | Todo | Needed for `force-onset` constraints and direct onset correction. |
- | Add suppressed-hit restore | Todo | Needed as the safety counterpart to suppression. |
+ | Add supervised export from edited state | Done | `supervised_export.py`; `POST /api/jobs/{job_id}/export`; UI edited download links. |
+ | Add click-to-add missed onset | Done | Add-onset waveform mode creates forced hits from `stem.wav`. |
+ | Add suppressed-hit restore | Done | `POST /api/jobs/{job_id}/hits/{hit_id}/restore`; UI restore button. |
+ | Add exact suggestion diff previews | Done | Suggestions expose `diff`; UI has `Diff` preview. |
  | Add true local feature-neighborhood reclustering | Todo | Requires cached feature vectors and constraint-aware assignment. |

  ## Latest validation tasks
@@ -87,3 +88,4 @@ Last updated: 2026-05-12
  - [x] `node --check web/app.js`
  - [x] `python3 scripts/test_sse_and_review_hits.py`
  - [x] `python3 scripts/test_interactive_supervision.py`
+ - [x] `python3 scripts/test_supervised_export_and_force_onset.py`
docs/interactive-ux/ARCHITECTURE_NOTES.md CHANGED
@@ -14,7 +14,7 @@ Current interactive foundation:
  audio/cache → immutable manifest/artifacts → supervision_state.json → reactive UI → user constraints/events/suggestions
  ```
  
- The current implementation deliberately keeps the batch extraction artifacts immutable. Interactive edits mutate `supervision_state.json`, not the original `manifest.json`, hit WAVs, representative WAVs, MIDI, reconstruction, or ZIP. This keeps edits cheap and reversible, but supervised re-export is the next architectural step.
+ The current implementation keeps the batch extraction artifacts immutable. Interactive edits mutate `supervision_state.json`, then `supervised_export.py` can render those edits into a separate `supervised/` artifact tree. The original `manifest.json`, original sample WAVs, original MIDI/reconstruction, and original ZIP remain untouched.
  
  ## Implemented modules
  
@@ -22,8 +22,9 @@ The current implementation deliberately keeps the batch extraction artifacts imm
  |---|---|
  | `pipeline_runner.py` | Batch extraction, timing, manifests, review-hit WAV exports |
  | `sample_extractor.py` | Audio analysis, classification, batch/online clustering, export helpers |
- | `supervised_state.py` | Persistent semantic job state, constraints, events, confidence, suggestions, undo |
- | `app.py` | FastAPI endpoints for batch jobs and supervised state mutations |
+ | `supervised_state.py` | Persistent semantic job state, constraints, events, confidence, suggestions, force-onset, restore, undo |
+ | `app.py` | FastAPI endpoints for batch jobs, supervised state mutations, force-onset, restore, and edited export |
+ | `supervised_export.py` | Converts semantic state into edited WAV/MIDI/reconstruction/ZIP artifacts under `supervised/` |
  | `web/app.js` | Browser state rendering, review queue, cluster board, suggestions, actions |
  | `web/index.html` | Workstation layout and interactive supervision panel |
  | `web/styles.css` | Visual treatment for low confidence, suppression, locks, panels |
@@ -85,6 +86,8 @@ Each completed run now gets:
  ```text
  .runs/<job_id>/output/manifest.json
  .runs/<job_id>/output/supervision_state.json
+ .runs/<job_id>/output/supervised/manifest.json     # after edited export
+ .runs/<job_id>/output/supervised/sample-pack.zip   # after edited export
  ```
  
  `manifest.json` is the immutable batch result. `supervision_state.json` is the mutable, replayable semantic state.
@@ -97,6 +100,9 @@ POST /api/jobs/{job_id}/hits/{hit_id}/move
  POST /api/jobs/{job_id}/hits/{hit_id}/pull-out
  POST /api/jobs/{job_id}/hits/{hit_id}/suppress
  POST /api/jobs/{job_id}/hits/{hit_id}/review
+ POST /api/jobs/{job_id}/hits/{hit_id}/restore
+ POST /api/jobs/{job_id}/hits/force-onset
+ POST /api/jobs/{job_id}/export
  POST /api/jobs/{job_id}/clusters/{cluster_id}/lock
  GET /api/jobs/{job_id}/suggestions
  POST /api/jobs/{job_id}/suggestions/{suggestion_id}/accept
@@ -132,7 +138,7 @@ type Suggestion =
  | { type: "suppress-hits"; hit_ids: string[]; confidence: number; reason: string };
  ```
  
- Suggestion generation currently uses label, spectral centroid, and RMS-energy similarity. Accepted suggestions become explicit constraints/examples.
+ Suggestion generation currently uses label, spectral centroid, and RMS-energy similarity. Open suggestions include exact `diff` previews showing affected hits and cluster counts before/after. Accepted suggestions become explicit constraints/examples.
  
  ## Event log
  
@@ -151,6 +157,9 @@ suggestion.created
  suggestion.accepted
  suggestion.rejected
  state.undo
+ hit.force_onset
+ hit.restored
+ supervised.exported
  ```
  
  The UI renders recent events and constraints in the supervision panel.
@@ -167,15 +176,22 @@ semantic edit
  → recompute hit/cluster confidence and review queue
  ```
  
- Not implemented yet:
+ Still not implemented:
  
  ```text
  semantic edit
  → load cached feature vectors
  → choose affected neighborhood
  → run constrained local reclustering
- → create preview diff
- → optionally apply/re-export artifacts
+ → update suggestion/recluster preview from real feature margins
+ ```
+
+ Implemented now:
+
+ ```text
+ semantic edit
+ → export supervised state
+ → write edited WAV/MIDI/reconstruction/ZIP under supervised/
  ```
@@ -194,10 +210,9 @@ Implemented panels:
  
  Still missing:
  
- - edited export panel,
- - force-onset mode,
- - suppression restore UI,
- - side-by-side before/after diff preview.
+ - cluster merge/relabel/split controls,
+ - edited-vs-original comparison view,
+ - cached feature-neighborhood local reclustering.
  
  ## Implementation warning
  
docs/interactive-ux/FEASIBILITY_MATRIX.md CHANGED
@@ -98,3 +98,21 @@ That foundation should be implemented before adding higher-risk semantic-space o
  | 5 | Diff preview for suggestions | Makes batch suggestions safer and more trustworthy. |
  | 6 | Constraint violation detection | Prevents silent conflicts once constraints become richer. |
  | 7 | Browser tests | Protects the increasingly stateful UI from regressions. |
+
+
+ ## Pass 5 implementation status
+
+ Implemented after initial alignment:
+
+ - supervised edited-state export under `supervised/`,
+ - add-onset waveform mode backed by `POST /api/jobs/{job_id}/hits/force-onset`,
+ - suppressed-hit restore backed by `POST /api/jobs/{job_id}/hits/{hit_id}/restore`,
+ - exact suggestion diff previews in API state and UI,
+ - validation via `scripts/test_supervised_export_and_force_onset.py`.
+
+ Still open:
+
+ - cluster merge/relabel/split workflows,
+ - cached feature-vector local reclustering,
+ - edited-vs-original comparison,
+ - browser-level UI tests.
docs/interactive-ux/FEATURE_REQUIREMENTS.md CHANGED
@@ -36,10 +36,10 @@ Partially implemented:
  
  - Local recomputation is currently semantic-state recomputation, not full feature-neighborhood reclustering.
  - Suggestions are heuristic and preview-count based, not full diff previews.
- - Favorite/pin changes semantic representative but does not yet regenerate the sample pack.
+ - Favorite/pin changes semantic representative and supervised export honors it when generating the edited sample pack.
  - Confidence scoring is heuristic, not feature-margin/stability based.
  
- Not implemented yet:
+ Implemented in Pass 5 where noted; remaining items listed below:
  
  - Click-to-add missed onset.
  - Restore suppressed hit.
@@ -160,3 +160,21 @@ Status: **partial**. Constraints/events persist and can be reloaded. A dedicated
  ### NFR-006: No silent override of explicit user intent
  
  Status: **implemented in current semantic layer**. Explicit moves, locks, suppressions, and favorites persist unless undone or explicitly changed.
+
+
+ ## Pass 5 implementation status
+
+ Implemented after initial alignment:
+
+ - supervised edited-state export under `supervised/`,
+ - add-onset waveform mode backed by `POST /api/jobs/{job_id}/hits/force-onset`,
+ - suppressed-hit restore backed by `POST /api/jobs/{job_id}/hits/{hit_id}/restore`,
+ - exact suggestion diff previews in API state and UI,
+ - validation via `scripts/test_supervised_export_and_force_onset.py`.
+
+ Still open:
+
+ - cluster merge/relabel/split workflows,
+ - cached feature-vector local reclustering,
+ - edited-vs-original comparison,
+ - browser-level UI tests.
docs/interactive-ux/PROGRESS.md CHANGED
@@ -85,3 +85,10 @@ analyze audio
  ```
  
  The remaining missing piece is that edited semantic state is not yet reflected in a regenerated sample pack.
+
+
+ ## Pass 5 alignment
+
+ The interactive UX docs are now aligned with the implemented semantic edit/export loop. The project supports move, pull-out, suppress, restore, favorite, lock, force-onset, suggestion diff preview, undo, and edited artifact export. The current boundary is no longer “semantic only”; edits can now produce separate supervised WAV/MIDI/reconstruction/ZIP artifacts while original batch outputs remain immutable.
+
+ Remaining UX work is concentrated around cluster-level editing and comparison: merge/relabel/split, feature-vector local reclustering, edited-vs-original diff views, and browser tests.
docs/interactive-ux/README.md CHANGED
@@ -57,3 +57,21 @@ Next implementation should close the gap between semantic edits and artifact out
  6. Add browser-level tests for the interactive supervision panel.
  
  The strongest technical foundation remains constraint-aware clustering plus uncertainty-driven review. The current pass implements the persistent state and UI/API shell needed for that foundation.
+
+
+ ## Pass 5 implementation status
+
+ Implemented after initial alignment:
+
+ - supervised edited-state export under `supervised/`,
+ - add-onset waveform mode backed by `POST /api/jobs/{job_id}/hits/force-onset`,
+ - suppressed-hit restore backed by `POST /api/jobs/{job_id}/hits/{hit_id}/restore`,
+ - exact suggestion diff previews in API state and UI,
+ - validation via `scripts/test_supervised_export_and_force_onset.py`.
+
+ Still open:
+
+ - cluster merge/relabel/split workflows,
+ - cached feature-vector local reclustering,
+ - edited-vs-original comparison,
+ - browser-level UI tests.
docs/interactive-ux/SCOPE.md CHANGED
@@ -171,3 +171,21 @@ Not achieved yet:
  → edited state can be exported reproducibly as updated WAV/MIDI/ZIP artifacts
  → local feature reclustering updates related hits without manually accepting suggestions
  ```
+
+
+ ## Pass 5 implementation status
+
+ Implemented after initial alignment:
+
+ - supervised edited-state export under `supervised/`,
+ - add-onset waveform mode backed by `POST /api/jobs/{job_id}/hits/force-onset`,
+ - suppressed-hit restore backed by `POST /api/jobs/{job_id}/hits/{hit_id}/restore`,
+ - exact suggestion diff previews in API state and UI,
+ - validation via `scripts/test_supervised_export_and_force_onset.py`.
+
+ Still open:
+
+ - cluster merge/relabel/split workflows,
+ - cached feature-vector local reclustering,
+ - edited-vs-original comparison,
+ - browser-level UI tests.
docs/interactive-ux/TASKS.md CHANGED
@@ -102,4 +102,20 @@ Start with `UX-401` plus supervised export.
  
  Reason:
  
- The project now has a replayable state/events/constraints foundation. The largest UX gap is that semantic edits do not yet regenerate edited artifacts. Force-onset is the next direct correction primitive after move/pull/lock/suppress.
+ The project now has a replayable state/events/constraints foundation plus supervised edited artifact export. The largest UX gaps are cluster merge/relabel/split, cached feature-neighborhood reclustering, and edited-vs-original comparison.
+
+
+ ## Pass 5 completed
+
+ - [x] Render semantic edits into `supervised/` artifacts.
+ - [x] Add force-onset endpoint and waveform add-onset mode.
+ - [x] Add suppressed-hit restore endpoint and button.
+ - [x] Add exact suggestion diff previews.
+ - [x] Add validation script for supervised export and force-onset.
+
+ ## Next interactive tasks
+
+ - [ ] Cluster merge/relabel/split controls.
+ - [ ] Cached feature-vector local reclustering.
+ - [ ] Edited-vs-original comparison view.
+ - [ ] Browser tests for waveform add-onset and edited export.
scripts/test_supervised_export_and_force_onset.py ADDED
@@ -0,0 +1,113 @@
+ #!/usr/bin/env python3
+ """Smoke-test force-onset, restore, suggestion diffs, and supervised export."""
+
+ from __future__ import annotations
+
+ import io
+ import json
+ import sys
+ import time
+ from pathlib import Path
+ from urllib.parse import quote
+
+ import soundfile as sf
+ from fastapi.testclient import TestClient
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+ from app import app  # noqa: E402
+ from synth_generator import generate_test_song  # noqa: E402
+
+
+ def wait_for_job(client: TestClient, job_id: str) -> dict:
+     for _ in range(100):
+         payload = client.get(f"/api/jobs/{job_id}").json()
+         if payload["status"] in {"complete", "error"}:
+             return payload
+         time.sleep(0.12)
+     raise TimeoutError(job_id)
+
+
+ def post_json(client: TestClient, path: str, body: dict | None = None) -> dict:
+     response = client.post(path, json=body or {})
+     response.raise_for_status()
+     return response.json()
+
+
+ def main() -> int:
+     song = generate_test_song(pattern_name="funk", bars=1, bpm=124, add_bass=False)
+     buf = io.BytesIO()
+     sf.write(buf, song.drums_only, song.sr, format="WAV")
+     buf.seek(0)
+
+     client = TestClient(app)
+     response = client.post(
+         "/api/jobs",
+         files={"file": ("supervised.wav", buf, "audio/wav")},
+         data={"params": json.dumps({"stem": "all", "clustering_mode": "online_preview", "target_min": 3, "target_max": 10})},
+     )
+     response.raise_for_status()
+     job_id = response.json()["id"]
+     job = wait_for_job(client, job_id)
+     assert job["status"] == "complete", job.get("error")
+
+     state = client.get(f"/api/jobs/{job_id}/state").json()
+     assert state["summary"]["hit_count"] > 0
+     assert state["summary"]["cluster_count"] > 0
+     first_hit = state["hits"][0]
+     first_cluster = state["clusters"][0]
+
+     # Suppression is reversible.
+     q_hit = quote(first_hit["id"], safe="")
+     state = post_json(client, f"/api/jobs/{job_id}/hits/{q_hit}/suppress", {"reason": "bleed"})
+     assert state["summary"]["suppressed_hit_count"] >= 1
+     state = post_json(client, f"/api/jobs/{job_id}/hits/{q_hit}/restore", {})
+     assert state["summary"]["suppressed_hit_count"] == 0
+
+     # Forced onset writes a review WAV and appears in semantic state.
+     state = post_json(
+         client,
+         f"/api/jobs/{job_id}/hits/force-onset",
+         {"onset_sec": 0.123, "target_cluster_id": first_cluster["id"], "duration_ms": 160},
+     )
+     forced = [hit for hit in state["hits"] if hit.get("source") == "forced"]
+     assert forced, "expected forced hit"
+     forced_url = forced[-1]["url"]
+     forced_response = client.get(forced_url)
+     forced_response.raise_for_status()
+     assert forced_response.content[:4] == b"RIFF"
+
+     # Suggestion diffs are exact previews when suggestions exist.
+     suggestions = state.get("suggestions", [])
+     if suggestions:
+         assert "diff" in suggestions[0]
+         assert "affected_hit_count" in suggestions[0]["diff"]
+
+     # Supervised export excludes suppressed hits, includes forced hit, and writes downloadable artifacts.
+     export_response = post_json(client, f"/api/jobs/{job_id}/export", {"synthesize": True})
+     export = export_response["export"]
+     assert export["kind"] == "supervised-export"
+     assert export["hit_count"] == state["summary"]["hit_count"] - state["summary"].get("suppressed_hit_count", 0)
+     assert export["cluster_count"] >= 1
+     for key in ["archive", "midi", "reconstruction"]:
+         url = export["file_urls"][key]
+         file_response = client.get(url)
+         file_response.raise_for_status()
+         assert file_response.content, key
+
+     refreshed = export_response["state"]
+     assert refreshed["summary"]["latest_export"]["path"] == "supervised/manifest.json"
+
+     print(json.dumps({
+         "status": "ok",
+         "job_id": job_id,
+         "forced_hit_count": refreshed["summary"].get("forced_hit_count"),
+         "export_hit_count": export["hit_count"],
+         "export_cluster_count": export["cluster_count"],
+         "archive": export["files"]["archive"],
+     }, indent=2))
+     return 0
+
+
+ if __name__ == "__main__":
+     raise SystemExit(main())
supervised_export.py ADDED
@@ -0,0 +1,261 @@
+ #!/usr/bin/env python3
+ """Render supervised semantic state into edited sample-pack artifacts.
+
+ The batch manifest remains immutable. This module takes the mutable
+ ``supervision_state.json`` layer, excludes suppressed hits, honors explicit
+ representatives/favorites, and writes a separate ``supervised/`` export tree.
+ """
+
+ from __future__ import annotations
+
+ import json
+ import os
+ import re
+ import shutil
+ import time
+ from dataclasses import asdict
+ from pathlib import Path
+ from typing import Any
+
+ import numpy as np
+ import soundfile as sf
+
+ from sample_extractor import (
+     Cluster,
+     Hit,
+     build_archive,
+     export_midi,
+     render_midi_with_samples,
+     sample_quality_score,
+     select_best,
+     synthesize_from_cluster,
+ )
+ from supervised_state import load_manifest, load_or_create_state, now, recompute_scores
+
+
+ def _safe_label(value: Any, fallback: str) -> str:
+     text = str(value or fallback).strip() or fallback
+     text = re.sub(r"[^A-Za-z0-9._-]+", "_", text)
+     text = re.sub(r"_+", "_", text).strip("._-")
+     return text or fallback
+
+
+ def _read_hit_audio(output_dir: Path, hit: dict[str, Any]) -> tuple[np.ndarray, int]:
+     rel = hit.get("file")
+     if not rel:
+         raise FileNotFoundError(f"Hit {hit.get('id')} does not have a file path")
+     path = (output_dir / rel).resolve()
+     path.relative_to(output_dir.resolve())
+     if not path.exists():
+         raise FileNotFoundError(f"Hit audio missing for {hit.get('id')}: {rel}")
+     audio, sr = sf.read(path, dtype="float32", always_2d=False)
+     if audio.ndim > 1:
+         audio = audio.mean(axis=1)
+     return np.asarray(audio, dtype=np.float32), int(sr)
+
+
+ def _state_to_clusters(output_dir: Path, state: dict[str, Any]) -> list[Cluster]:
+     hits_by_id = state.get("hits", {})
+     clusters: list[Cluster] = []
+     used_labels: set[str] = set()
+
+     for ordinal, raw_cluster in enumerate(state.get("clusters", {}).values()):
+         active_hit_ids = [
+             hid
+             for hid in raw_cluster.get("hit_ids", [])
+             if hid in hits_by_id and not hits_by_id[hid].get("suppressed")
+         ]
+         if not active_hit_ids:
+             continue
+
+         label = _safe_label(raw_cluster.get("label"), f"cluster_{ordinal}")
+         base = label
+         suffix = 1
+         while label in used_labels:
+             suffix += 1
+             label = f"{base}_{suffix}"
+         used_labels.add(label)
+
+         converted_hits: list[Hit] = []
+         for hid in active_hit_ids:
+             raw_hit = hits_by_id[hid]
+             audio, sr = _read_hit_audio(output_dir, raw_hit)
+             converted_hits.append(
+                 Hit(
+                     audio=audio,
+                     sr=sr,
+                     onset_time=float(raw_hit.get("onset_sec") or 0.0),
+                     duration=float(raw_hit.get("duration_ms") or 0.0) / 1000.0 or (len(audio) / max(sr, 1)),
+                     index=int(raw_hit.get("index") or 0),
+                     rms_energy=float(raw_hit.get("rms_energy") or 0.0),
+                     spectral_centroid=float(raw_hit.get("spectral_centroid_hz") or 0.0),
+                     label=str(raw_hit.get("label") or raw_cluster.get("classification") or "other"),
+                     cluster_id=ordinal,
+                 )
+             )
+
+         cluster = Cluster(cluster_id=ordinal, label=label, hits=converted_hits)
+         representative = raw_cluster.get("representative_hit_id")
+         pinned = False
+         if representative in active_hit_ids:
+             cluster.best_hit_idx = active_hit_ids.index(representative)
+             pinned = True
+         else:
+             favorite_idx = next((i for i, hid in enumerate(active_hit_ids) if hits_by_id[hid].get("favorite")), None)
+             if favorite_idx is not None:
+                 cluster.best_hit_idx = favorite_idx
+                 pinned = True
+         setattr(cluster, "_pinned_by_state", pinned)
+         clusters.append(cluster)
+
+     # Score representatives only where the supervision state did not pin one.
+     for cluster in clusters:
+         if cluster.count <= 1:
+             cluster.best_hit_idx = 0
+     unpinned = [cluster for cluster in clusters if cluster.count > 1 and not getattr(cluster, "_pinned_by_state", False)]
+     if unpinned:
+         select_best(unpinned)
+     return clusters
+
+
+ def _write_audio(path: Path, audio: np.ndarray, sr: int, subtype: str = "PCM_16") -> None:
+     path.parent.mkdir(parents=True, exist_ok=True)
+     sf.write(path, audio, sr, subtype=subtype)
+
+
+ def export_supervised_state(
+     output_dir: str | os.PathLike[str],
+     job_id: str,
+     *,
+     synthesize: bool = True,
+     quantize: bool | None = None,
+     subdivision: int | None = None,
+ ) -> dict[str, Any]:
+     """Create edited artifacts from ``supervision_state.json``.
+
+     Returns the JSON manifest written to ``supervised/manifest.json``.
+     """
+     out = Path(output_dir)
+     manifest = load_manifest(out)
+     state = load_or_create_state(job_id, out)
+     recompute_scores(state)
+
+     export_dir = out / "supervised"
+     if export_dir.exists():
+         shutil.rmtree(export_dir)
+     samples_dir = export_dir / "samples"
+     samples_dir.mkdir(parents=True, exist_ok=True)
+
+     clusters = _state_to_clusters(out, state)
+     bpm = float(manifest.get("bpm") or 120.0)
+     sr = int(manifest.get("sample_rate") or 44100)
+     params = manifest.get("params") or {}
+     if quantize is None:
+         quantize = bool(params.get("quantize_midi", True))
+     if subdivision is None:
+         subdivision = int(params.get("subdivision", 16))
+
+     started = time.perf_counter()
+     files: dict[str, str] = {}
+     samples: list[dict[str, Any]] = []
+
+     midi_path = export_dir / "reconstruction.mid"
+     if clusters:
+         export_midi(clusters, str(midi_path), bpm=bpm, quantize=quantize, subdivision=int(subdivision))
+         rendered = render_midi_with_samples(clusters, sr=sr)
+         if synthesize:
+             for cluster in clusters:
+                 if cluster.count >= 2:
+                     cluster.synthesized = synthesize_from_cluster(cluster)
+     else:
+         midi_path.write_bytes(b"")
+         rendered = np.zeros(sr, dtype=np.float32)
+
+     _write_audio(export_dir / "reconstruction.wav", rendered, sr, subtype="PCM_16")
+     files["midi"] = "supervised/reconstruction.mid"
+     files["reconstruction"] = "supervised/reconstruction.wav"
+
+     for cluster in sorted(clusters, key=lambda item: item.count, reverse=True):
+         best = cluster.best_hit
+         sample_file = f"supervised/samples/{cluster.label}.wav"
+         best.save(str(out / sample_file))
+         quality = sample_quality_score(best.audio, best.sr, cluster.label.rsplit("_", 1)[0])
+         samples.append(
+             {
+                 "label": cluster.label,
+                 "classification": cluster.label.rsplit("_", 1)[0],
+                 "hits": int(cluster.count),
+                 "midi_note": int(cluster.midi_note),
+                 "score": round(float(quality["total"]), 2),
+                 "cleanness": round(float(quality["cleanness"]), 4),
+                 "completeness": round(float(quality["completeness"]), 4),
+                 "duration_ms": round(float(best.duration * 1000), 1),
+                 "first_onset_sec": round(float(min(hit.onset_time for hit in cluster.hits)), 4),
+                 "file": sample_file,
+             }
+         )
+         if cluster.synthesized is not None:
+             _write_audio(out / f"supervised/samples/{cluster.label}__synth.wav", cluster.synthesized, sr, subtype="PCM_24")
+
+     archive_tmp = build_archive(clusters, bpm, sr, midi_path=str(midi_path), rendered_audio=rendered)
+     archive_rel = "supervised/sample-pack.zip"
+     shutil.copyfile(archive_tmp, out / archive_rel)
+     try:
+         os.unlink(archive_tmp)
+     except OSError:
+         pass
+     files["archive"] = archive_rel
+
+     active_hits = [hit for hit in state.get("hits", {}).values() if not hit.get("suppressed")]
+     export_manifest = {
+         "kind": "supervised-export",
+         "job_id": job_id,
+         "created_at": now(),
+         "duration_sec": round(time.perf_counter() - started, 6),
+         "source_manifest_fingerprint": state.get("manifest_fingerprint"),
+         "state_updated_at": state.get("updated_at"),
+         "bpm": bpm,
+         "sample_rate": sr,
+         "hit_count": len(active_hits),
+         "suppressed_hit_count": sum(1 for hit in state.get("hits", {}).values() if hit.get("suppressed")),
+         "cluster_count": len(clusters),
+         "quantize_midi": bool(quantize),
+         "subdivision": int(subdivision),
+         "samples": samples,
+         "files": files,
+         "state_summary": {
+             "constraint_count": len(state.get("constraints", [])),
+             "event_count": len(state.get("events", [])),
+             "open_suggestion_count": len([s for s in state.get("suggestions", []) if s.get("status") == "open"]),
+         },
+     }
+     (export_dir / "manifest.json").write_text(json.dumps(export_manifest, indent=2, sort_keys=True), encoding="utf-8")
+
+     state.setdefault("exports", []).append(
+         {
+             "created_at": export_manifest["created_at"],
+             "path": "supervised/manifest.json",
+             "hit_count": export_manifest["hit_count"],
+             "cluster_count": export_manifest["cluster_count"],
+             "suppressed_hit_count": export_manifest["suppressed_hit_count"],
+         }
+     )
+     state["latest_export"] = state["exports"][-1]
+     state.setdefault("events", []).append(
+         {
+             "id": f"event:export:{int(export_manifest['created_at'] * 1000)}",
+             "type": "supervised.exported",
+             "source": "system",
+             "created_at": export_manifest["created_at"],
+             "payload": {
+                 "hit_count": export_manifest["hit_count"],
+                 "cluster_count": export_manifest["cluster_count"],
+                 "archive": archive_rel,
+             },
+         }
+     )
+     state_path = out / "supervision_state.json"
+     state["updated_at"] = now()
+     state_path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
+
+     return export_manifest
supervised_state.py CHANGED
```diff
@@ -5,9 +5,9 @@ The extraction pipeline produces immutable audio artifacts and a batch manifest.
 This module layers replayable semantic state on top of that manifest: hits,
 clusters, constraints, events, suggestions, confidence, and undo snapshots.
 
-The first implementation intentionally avoids rewriting audio artifacts. It makes
-supervised edits cheap, explicit, inspectable, and reproducible, then leaves
-artifact re-export as a later step.
+Supervised edits are cheap, explicit, inspectable, and reproducible. A
+separate supervised export step renders the mutable state into edited WAV/MIDI/ZIP
+artifacts without mutating the original batch manifest.
 """
 
 from __future__ import annotations
@@ -70,6 +70,21 @@ def _safe_float(value: Any, default: float = 0.0) -> float:
         pass
     return default
 
+
+def _safe_int(value: Any, default: int = 0) -> int:
+    try:
+        return int(value)
+    except Exception:
+        return default
+
+
+def _safe_file_component(value: str) -> str:
+    import re
+
+    text = str(value or "hit").strip().lower()
+    text = re.sub(r"[^a-z0-9._-]+", "_", text)
+    text = re.sub(r"_+", "_", text).strip("._-")
+    return text or "hit"
+
 
 def _snapshot(state: dict[str, Any]) -> dict[str, Any]:
     snap = copy.deepcopy(state)
@@ -367,6 +382,77 @@ def _find_similar_hits(state: dict[str, Any], hit_id: str, *, exclude_cluster: s
     return scored[:limit]
 
 
+def suggestion_diff(state: dict[str, Any], suggestion: dict[str, Any]) -> dict[str, Any]:
+    """Build an exact before/after preview for a suggestion against current state."""
+    hits = state.get("hits", {})
+    clusters = state.get("clusters", {})
+    stype = suggestion.get("type")
+    hit_ids = [hid for hid in suggestion.get("hit_ids", []) if hid in hits]
+
+    def cluster_snapshot(cluster_id: str | None) -> dict[str, Any]:
+        cluster = clusters.get(cluster_id or "", {})
+        members = [hid for hid in cluster.get("hit_ids", []) if hid in hits]
+        active = [hid for hid in members if not hits[hid].get("suppressed")]
+        return {
+            "cluster_id": cluster_id,
+            "label": cluster.get("label", cluster_id),
+            "active_count": len(active),
+            "total_count": len(members),
+            "suppressed_count": sum(1 for hid in members if hits[hid].get("suppressed")),
+        }
+
+    rows = []
+    cluster_ids: set[str] = set()
+    for hid in hit_ids:
+        hit = hits[hid]
+        source_cluster_id = hit.get("cluster_id")
+        target_cluster_id = suggestion.get("target_cluster_id") if stype in {"move-hits", "split-hits"} else source_cluster_id
+        cluster_ids.add(str(source_cluster_id))
+        if target_cluster_id:
+            cluster_ids.add(str(target_cluster_id))
+        rows.append(
+            {
+                "hit_id": hid,
+                "hit_index": hit.get("index"),
+                "label": hit.get("label"),
+                "from_cluster_id": source_cluster_id,
+                "from_cluster_label": clusters.get(source_cluster_id, {}).get("label"),
+                "to_cluster_id": target_cluster_id,
+                "to_cluster_label": clusters.get(target_cluster_id, {}).get("label") if target_cluster_id else None,
+                "before_suppressed": bool(hit.get("suppressed")),
+                "after_suppressed": bool(hit.get("suppressed")) or stype == "suppress-hits",
+                "confidence": hit.get("confidence"),
+            }
+        )
+
+    before = {cid: cluster_snapshot(cid) for cid in sorted(cluster_ids)}
+    after = copy.deepcopy(before)
+    if stype in {"move-hits", "split-hits"}:
+        target = suggestion.get("target_cluster_id")
+        for row in rows:
+            source = row.get("from_cluster_id")
+            if source in after and source != target:
+                after[source]["active_count"] = max(0, after[source]["active_count"] - 1)
+                after[source]["total_count"] = max(0, after[source]["total_count"] - 1)
+            if target in after and source != target:
+                after[target]["active_count"] += 1
+                after[target]["total_count"] += 1
+    elif stype == "suppress-hits":
+        for row in rows:
+            source = row.get("from_cluster_id")
+            if source in after and not row.get("before_suppressed"):
+                after[source]["active_count"] = max(0, after[source]["active_count"] - 1)
+                after[source]["suppressed_count"] += 1
+
+    return {
+        "type": stype,
+        "affected_hit_count": len(rows),
+        "hits": rows,
+        "clusters_before": before,
+        "clusters_after": after,
+    }
+
+
 def _add_suggestion(state: dict[str, Any], suggestion_type: str, payload: dict[str, Any], confidence: float, reason: str) -> dict[str, Any]:
     suggestion = {
         "id": _new_id("suggestion"),
@@ -377,6 +463,7 @@ def _add_suggestion(state: dict[str, Any], suggestion_type: str, payload: dict[s
         "reason": reason,
         **payload,
     }
+    suggestion["diff"] = suggestion_diff(state, suggestion)
     state.setdefault("suggestions", []).append(suggestion)
     _event(state, "suggestion.created", {"suggestion_id": suggestion["id"], "type": suggestion_type, "reason": reason})
     return suggestion
@@ -528,6 +615,143 @@ def suppress_hit(output_dir: str | Path, job_id: str, hit_id: str, reason: str =
     return _write_state(output_dir, state)
 
 
+def restore_hit(output_dir: str | Path, job_id: str, hit_id: str, source: str = "user") -> dict[str, Any]:
+    state = load_or_create_state(job_id, output_dir)
+    hits = state.get("hits", {})
+    if hit_id not in hits:
+        raise KeyError(f"Unknown hit: {hit_id}")
+    _push_undo(state)
+    hit = hits[hit_id]
+    hit["suppressed"] = False
+    hit["review_status"] = "unreviewed" if hit.get("review_status") == "suppressed" else hit.get("review_status", "unreviewed")
+    hit["explicit"] = True
+    _constraint(state, "restore-hit", {"hit_id": hit_id}, source=source)
+    _event(state, "hit.restored", {"hit_id": hit_id}, source=source)
+    recompute_scores(state)
+    return _write_state(output_dir, state)
+
+
+def force_onset(
+    output_dir: str | Path,
+    job_id: str,
+    onset_sec: float,
+    *,
+    duration_ms: float | None = None,
+    label: str | None = None,
+    target_cluster_id: str | None = None,
+    pre_pad_sec: float = 0.003,
+    source: str = "user",
+) -> dict[str, Any]:
+    """Create a user-forced hit from ``stem.wav`` and add it to semantic state."""
+    import librosa
+    import numpy as np
+    import soundfile as sf
+
+    from sample_extractor import Hit as AudioHit, classify_hit
+
+    out = Path(output_dir)
+    stem_path = out / "stem.wav"
+    if not stem_path.exists():
+        raise FileNotFoundError("stem.wav is required before forcing onsets")
+
+    state = load_or_create_state(job_id, out)
+    hits = state.setdefault("hits", {})
+    clusters = state.setdefault("clusters", {})
+    onset = max(0.0, _safe_float(onset_sec))
+
+    audio, sr = sf.read(stem_path, dtype="float32", always_2d=False)
+    if audio.ndim > 1:
+        audio = audio.mean(axis=1)
+    audio = np.asarray(audio, dtype=np.float32)
+    duration = (_safe_float(duration_ms, 0.0) / 1000.0) if duration_ms else 0.0
+    if duration <= 0:
+        future_onsets = sorted(
+            _safe_float(hit.get("onset_sec"))
+            for hit in hits.values()
+            if not hit.get("suppressed") and _safe_float(hit.get("onset_sec")) > onset + 0.01
+        )
+        next_onset = future_onsets[0] if future_onsets else None
+        duration = min(1.5, max(0.08, (next_onset - onset) if next_onset is not None else 0.45))
+    duration = max(0.02, min(10.0, duration))
+
+    start = max(0, int((onset - max(0.0, pre_pad_sec)) * sr))
+    end = min(len(audio), start + int(duration * sr))
+    if end <= start:
+        raise ValueError("Forced onset is outside the available stem audio")
+    segment = audio[start:end].copy()
+    fade_len = min(int(0.003 * sr), len(segment) // 4)
+    if fade_len > 0:
+        segment[-fade_len:] *= np.linspace(1, 0, fade_len)
+    rms = float(np.sqrt(np.mean(segment**2))) if len(segment) else 0.0
+    spectral_centroid = float(librosa.feature.spectral_centroid(y=segment, sr=sr).mean()) if len(segment) >= 32 else 0.0
+
+    index = max((_safe_int(hit.get("index"), -1) for hit in hits.values()), default=-1) + 1
+    tmp_hit = AudioHit(audio=segment, sr=sr, onset_time=onset, duration=len(segment) / sr, index=index, rms_energy=rms, spectral_centroid=spectral_centroid)
+    inferred_label = label or classify_hit(tmp_hit)
+    tmp_hit.label = inferred_label
+    hit_id = _hit_id({"index": index})
+    safe_label = _safe_file_component(inferred_label or "forced")
+    rel_file = f"review/hits/hit_{index:05d}_{safe_label}_forced.wav"
+    full_path = out / rel_file
+    full_path.parent.mkdir(parents=True, exist_ok=True)
+    sf.write(full_path, segment, sr, subtype="PCM_24")
+
+    _push_undo(state)
+    if target_cluster_id and target_cluster_id not in clusters:
+        raise KeyError(f"Unknown cluster: {target_cluster_id}")
+    if not target_cluster_id:
+        state.setdefault("counters", {})["user_clusters"] = int(state.get("counters", {}).get("user_clusters", 0)) + 1
+        target_cluster_id = _new_id("cluster:user")
+        cluster_label = f"{_safe_file_component(inferred_label)}_forced_{state['counters']['user_clusters']}"
+        clusters[target_cluster_id] = {
+            "id": target_cluster_id,
+            "label": cluster_label,
+            "classification": _base_label(cluster_label),
+            "hit_ids": [],
+            "representative_hit_id": hit_id,
+            "locked": False,
+            "user_named": bool(label),
+            "confidence": 0.0,
+            "confidence_reasons": [],
+            "suppressed_count": 0,
+            "original_id": None,
+        }
+    cluster_label = clusters[target_cluster_id].get("label", target_cluster_id)
+    hits[hit_id] = {
+        "id": hit_id,
+        "index": index,
+        "label": str(inferred_label or "other"),
+        "cluster_id": target_cluster_id,
+        "original_cluster_id": None,
+        "cluster_label": cluster_label,
+        "onset_sec": round(onset, 6),
+        "duration_ms": round((len(segment) / sr) * 1000.0, 1),
+        "rms_energy": round(rms, 6),
+        "spectral_centroid_hz": round(spectral_centroid, 1),
+        "file": rel_file,
+        "is_representative": False,
+        "source": "forced",
+        "suppressed": False,
+        "favorite": False,
+        "review_status": "accepted",
+        "confidence": 0.0,
+        "confidence_reasons": [],
+        "explicit": True,
+    }
+    clusters[target_cluster_id].setdefault("hit_ids", [])
+    if hit_id not in clusters[target_cluster_id]["hit_ids"]:
+        clusters[target_cluster_id]["hit_ids"].append(hit_id)
+    if not clusters[target_cluster_id].get("representative_hit_id"):
+        clusters[target_cluster_id]["representative_hit_id"] = hit_id
+
+    _constraint(state, "force-onset", {"hit_id": hit_id, "onset_sec": round(onset, 6)}, source=source)
+    _constraint(state, "force-cluster", {"hit_id": hit_id, "cluster_id": target_cluster_id}, source=source)
+    _event(state, "hit.force_onset", {"hit_id": hit_id, "onset_sec": round(onset, 6), "cluster_id": target_cluster_id}, source=source)
+    _rebuild_cluster_labels(state)
+    recompute_scores(state)
+    return _write_state(out, state)
+
+
 def set_hit_review_status(output_dir: str | Path, job_id: str, hit_id: str, status: str = "accepted", source: str = "user") -> dict[str, Any]:
     if status not in {"unreviewed", "accepted", "favorite"}:
         raise ValueError("status must be unreviewed, accepted, or favorite")
@@ -649,8 +873,13 @@ def public_state(state: dict[str, Any], url_for: Callable[[str], str] | None = N
             hit["url"] = url_for(hit["file"])
     clusters.sort(key=lambda c: (-len(c.get("hit_ids", [])), c.get("label", "")))
     hits.sort(key=lambda h: h.get("index", 0))
-    open_suggestions = [s for s in state.get("suggestions", []) if s.get("status") == "open"]
+    open_suggestions = [copy.deepcopy(s) for s in state.get("suggestions", []) if s.get("status") == "open"]
+    for suggestion in open_suggestions:
+        suggestion["diff"] = suggestion.get("diff") or suggestion_diff(state, suggestion)
     open_suggestions.sort(key=lambda s: (-_safe_float(s.get("confidence")), s.get("created_at", 0)))
+    latest_export = copy.deepcopy(state.get("latest_export"))
+    if latest_export and url_for and latest_export.get("path"):
+        latest_export["url"] = url_for(latest_export["path"])
     return {
         "version": state.get("version"),
         "job_id": state.get("job_id"),
@@ -665,6 +894,8 @@ def public_state(state: dict[str, Any], url_for: Callable[[str], str] | None = N
             "suppressed_hit_count": sum(1 for h in hits if h.get("suppressed")),
             "locked_cluster_count": sum(1 for c in clusters if c.get("locked")),
             "undo_available": bool(state.get("undo_stack")),
+            "forced_hit_count": sum(1 for h in hits if h.get("source") == "forced"),
+            "latest_export": latest_export,
         },
        "hits": hits,
        "clusters": clusters,
```
web/app.js CHANGED
@@ -15,6 +15,7 @@ let lastResult = null;
15
  let lastSupervisionState = null;
16
  let activeJobId = null;
17
  let selectedHitIndex = null;
 
18
 
19
  function esc(value) {
20
  return String(value ?? "").replace(/[&<>'"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" }[c]));
@@ -90,16 +91,20 @@ function currentTargetCluster() {
90
  function setActionButtons() {
91
  const hasState = Boolean(activeJobId && lastSupervisionState);
92
  const hasHit = hasState && selectedHitIndex !== null;
93
- for (const id of ["moveHitButton", "pullHitButton", "acceptHitButton", "favoriteHitButton", "suppressHitButton"]) {
94
  const button = $(id);
95
  if (button) button.disabled = !hasHit;
96
  }
97
- for (const id of ["refreshStateButton", "undoButton", "lockClusterButton", "explainClusterButton"]) {
98
  const button = $(id);
99
  if (button) button.disabled = !hasState;
100
  }
101
  const target = currentTargetCluster();
102
  if ($("lockClusterButton")) $("lockClusterButton").textContent = target?.locked ? "Unlock target cluster" : "Lock target cluster";
 
 
 
 
103
  if ($("undoButton") && lastSupervisionState) $("undoButton").disabled = !lastSupervisionState.summary?.undo_available;
104
  }
105
 
@@ -203,6 +208,17 @@ function drawWaveform(overview) {
203
  ctx.lineTo(x, selected ? h - 3 : h - 10);
204
  ctx.stroke();
205
  }
 
 
 
 
 
 
 
 
 
 
 
206
  }
207
 
208
  function playAudio(el, url) {
@@ -213,9 +229,27 @@ function playAudio(el, url) {
213
  if (promise && typeof promise.catch === "function") promise.catch(() => {});
214
  }
215
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
216
  function selectHit(index, shouldPlay = true) {
217
- if (!lastResult) return;
218
- const rawHit = (lastResult.hits ?? []).find((item) => Number(item.index) === Number(index));
219
  if (!rawHit) return;
220
  const hit = decorateHit(rawHit);
221
  selectedHitIndex = hit.index;
@@ -260,7 +294,12 @@ function renderSamples(result) {
260
 
261
  function renderHits(result) {
262
  const tbody = $("hitsTable").querySelector("tbody");
263
- const hits = (result.hits ?? []).map(decorateHit);
 
 
 
 
 
264
  tbody.innerHTML = hits.map((hit) => {
265
  const confidence = hit.confidence === undefined ? "—" : `${Math.round(Number(hit.confidence) * 100)}%`;
266
  const flags = [hit.is_representative ? "rep" : null, hit.favorite ? "fav" : null, hit.suppressed ? "suppressed" : null, hit.review_status !== "unreviewed" ? hit.review_status : null].filter(Boolean);
@@ -301,8 +340,13 @@ function renderSupervisionState(state) {
301
  <span>${esc(summary.constraint_count ?? 0)} constraints</span>
302
  <span>${esc(summary.open_suggestion_count ?? 0)} suggestions</span>
303
  <span>${esc(summary.suppressed_hit_count ?? 0)} suppressed</span>
 
304
  <span>${esc(summary.locked_cluster_count ?? 0)} locked</span>
 
305
  `;
 
 
 
306
 
307
  const currentTarget = $("targetClusterSelect").value;
308
  $("targetClusterSelect").innerHTML = (state.clusters ?? []).map((cluster) => `
@@ -334,15 +378,30 @@ function renderSupervisionState(state) {
334
  });
335
  }
336
 
337
- $("suggestionInbox").innerHTML = (state.suggestions ?? []).map((suggestion) => `
338
- <div class="suggestion-row">
339
- <div><strong>${esc(suggestion.type)}</strong><small>${esc(suggestion.reason)} · ${Math.round(Number(suggestion.confidence ?? 0) * 100)}% · ${esc(suggestion.preview_count ?? suggestion.hit_ids?.length ?? 0)} hits</small></div>
340
- <div class="row-actions">
341
- <button class="mini-button" type="button" data-accept-suggestion="${esc(suggestion.id)}">Accept</button>
342
- <button class="mini-button" type="button" data-reject-suggestion="${esc(suggestion.id)}">Reject</button>
 
 
 
 
 
 
 
 
343
  </div>
344
- </div>
345
- `).join("") || `<p class="empty">No open suggestions.</p>`;
 
 
 
 
 
 
 
346
  for (const button of $("suggestionInbox").querySelectorAll("[data-accept-suggestion]")) {
347
  button.addEventListener("click", () => acceptSuggestion(button.dataset.acceptSuggestion));
348
  }
@@ -390,6 +449,12 @@ async function suppressSelectedHit() {
390
  await applyStateAction(`/api/jobs/${encodeURIComponent(activeJobId)}/hits/${encodeURIComponent(hitId)}/suppress`, { reason: "bleed" });
391
  }
392
 
 
 
 
 
 
 
393
  async function reviewSelectedHit(status) {
394
  const hitId = hitIdFromIndex(selectedHitIndex);
395
  if (!activeJobId || !hitId) return;
@@ -425,6 +490,40 @@ async function undoLastEdit() {
425
  await applyStateAction(`/api/jobs/${encodeURIComponent(activeJobId)}/undo`, {});
426
  }
427
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
428
  function renderResult(job) {
429
  const result = job.result;
430
  if (!result) return;
@@ -580,10 +679,15 @@ function setFile(file) {
580
  }
581
 
582
  function selectNearestWaveformHit(event) {
583
- if (!lastResult?.overview?.onsets?.length) return;
584
  const rect = $("waveform").getBoundingClientRect();
585
  const ratio = Math.min(1, Math.max(0, (event.clientX - rect.left) / Math.max(1, rect.width)));
586
  const time = ratio * Math.max(lastResult.overview.duration_sec, 0.001);
 
 
 
 
 
587
  let best = null;
588
  let bestDelta = Infinity;
589
  for (const onset of lastResult.overview.onsets) {
@@ -639,6 +743,9 @@ $("pullHitButton").addEventListener("click", () => pullSelectedHit().catch((erro
639
  $("acceptHitButton").addEventListener("click", () => reviewSelectedHit("accepted").catch((error) => { $("clusterExplanation").textContent = error.message; }));
640
  $("favoriteHitButton").addEventListener("click", () => reviewSelectedHit("favorite").catch((error) => { $("clusterExplanation").textContent = error.message; }));
641
  $("suppressHitButton").addEventListener("click", () => suppressSelectedHit().catch((error) => { $("clusterExplanation").textContent = error.message; }));
 
 
 
642
  $("lockClusterButton").addEventListener("click", () => toggleTargetClusterLock().catch((error) => { $("clusterExplanation").textContent = error.message; }));
643
  $("explainClusterButton").addEventListener("click", () => explainTargetCluster().catch((error) => { $("clusterExplanation").textContent = error.message; }));
644
  $("targetClusterSelect").addEventListener("change", setActionButtons);
 
15
  let lastSupervisionState = null;
16
  let activeJobId = null;
17
  let selectedHitIndex = null;
18
+ let forceOnsetMode = false;
19
 
20
  function esc(value) {
21
  return String(value ?? "").replace(/[&<>'"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&#39;", '"': "&quot;" }[c]));
 
91
  function setActionButtons() {
92
  const hasState = Boolean(activeJobId && lastSupervisionState);
93
  const hasHit = hasState && selectedHitIndex !== null;
94
+ for (const id of ["moveHitButton", "pullHitButton", "acceptHitButton", "favoriteHitButton", "suppressHitButton", "restoreHitButton"]) {
95
  const button = $(id);
96
  if (button) button.disabled = !hasHit;
97
  }
98
+ for (const id of ["refreshStateButton", "undoButton", "lockClusterButton", "explainClusterButton", "exportStateButton", "forceOnsetButton"]) {
99
  const button = $(id);
100
  if (button) button.disabled = !hasState;
101
  }
102
  const target = currentTargetCluster();
103
  if ($("lockClusterButton")) $("lockClusterButton").textContent = target?.locked ? "Unlock target cluster" : "Lock target cluster";
104
+ if ($("forceOnsetButton")) {
105
+ $("forceOnsetButton").textContent = forceOnsetMode ? "Add-onset mode on" : "Add-onset mode off";
106
+ $("forceOnsetButton").classList.toggle("active", forceOnsetMode);
107
+ }
108
  if ($("undoButton") && lastSupervisionState) $("undoButton").disabled = !lastSupervisionState.summary?.undo_available;
109
  }
110
 
 
208
  ctx.lineTo(x, selected ? h - 3 : h - 10);
209
  ctx.stroke();
210
  }
211
+ for (const hit of lastSupervisionState?.hits ?? []) {
212
+ if (hit.source !== "forced") continue;
213
+ const x = (Number(hit.onset_sec) / Math.max(overview.duration_sec, 0.001)) * w;
214
+ const selected = Number(hit.index) === Number(selectedHitIndex);
215
+ ctx.strokeStyle = selected ? "rgba(255,255,255,.98)" : "rgba(85,230,165,.9)";
216
+ ctx.lineWidth = selected ? 2.8 : 1.6;
217
+ ctx.beginPath();
218
+ ctx.moveTo(x, 2);
219
+ ctx.lineTo(x, h - 2);
220
+ ctx.stroke();
221
+ }
222
  }
223
 
224
  function playAudio(el, url) {
 
229
  if (promise && typeof promise.catch === "function") promise.catch(() => {});
230
  }
231
 
232
+ function stateOnlyHitByIndex(index) {
233
+ const stateHit = stateHitByIndex(index);
234
+ if (!stateHit) return null;
235
+ return {
236
+ index: stateHit.index,
237
+ label: stateHit.label,
238
+ cluster_id: stateHit.cluster_id,
239
+ cluster_label: stateHit.cluster_label,
240
+ is_representative: false,
241
+ onset_sec: stateHit.onset_sec,
242
+ duration_ms: stateHit.duration_ms,
243
+ rms_energy: stateHit.rms_energy,
244
+ spectral_centroid_hz: stateHit.spectral_centroid_hz,
245
+ file: stateHit.file,
246
+ url: stateHit.url,
247
+ };
248
+ }
249
+
250
  function selectHit(index, shouldPlay = true) {
251
+ if (!lastResult && !lastSupervisionState) return;
252
+ const rawHit = (lastResult?.hits ?? []).find((item) => Number(item.index) === Number(index)) ?? stateOnlyHitByIndex(index);
253
  if (!rawHit) return;
254
  const hit = decorateHit(rawHit);
255
  selectedHitIndex = hit.index;
 
294
 
295
  function renderHits(result) {
296
  const tbody = $("hitsTable").querySelector("tbody");
297
+ const baseHits = (result.hits ?? []).map(decorateHit);
298
+ const seen = new Set(baseHits.map((hit) => Number(hit.index)));
299
+ const stateOnlyHits = (lastSupervisionState?.hits ?? [])
300
+ .filter((hit) => !seen.has(Number(hit.index)))
301
+ .map((hit) => decorateHit(stateOnlyHitByIndex(hit.index)));
302
+ const hits = [...baseHits, ...stateOnlyHits].sort((a, b) => Number(a.index) - Number(b.index));
303
  tbody.innerHTML = hits.map((hit) => {
304
  const confidence = hit.confidence === undefined ? "—" : `${Math.round(Number(hit.confidence) * 100)}%`;
305
  const flags = [hit.is_representative ? "rep" : null, hit.favorite ? "fav" : null, hit.suppressed ? "suppressed" : null, hit.review_status !== "unreviewed" ? hit.review_status : null].filter(Boolean);
 
340
  <span>${esc(summary.constraint_count ?? 0)} constraints</span>
341
  <span>${esc(summary.open_suggestion_count ?? 0)} suggestions</span>
342
  <span>${esc(summary.suppressed_hit_count ?? 0)} suppressed</span>
343
+ <span>${esc(summary.forced_hit_count ?? 0)} forced</span>
344
  <span>${esc(summary.locked_cluster_count ?? 0)} locked</span>
345
+ ${summary.latest_export ? `<span>latest edited export · ${esc(summary.latest_export.hit_count)} hits / ${esc(summary.latest_export.cluster_count)} clusters</span>` : ""}
346
  `;
347
+ if (summary.latest_export?.url) {
348
+ $("editedDownloads").innerHTML = `<a href="${esc(summary.latest_export.url)}" download>Edited export manifest</a>`;
349
+ }
350
 
351
  const currentTarget = $("targetClusterSelect").value;
352
  $("targetClusterSelect").innerHTML = (state.clusters ?? []).map((cluster) => `
 
378
  });
379
  }
380
 
381
+ $("suggestionInbox").innerHTML = (state.suggestions ?? []).map((suggestion) => {
382
+ const diff = suggestion.diff ?? {};
383
+ const diffText = diff.affected_hit_count == null
384
+ ? "no diff"
385
+ : `${diff.affected_hit_count} affected · ${Object.keys(diff.clusters_before ?? {}).length} clusters`;
386
+ const hitPreview = (diff.hits ?? []).slice(0, 4).map((hit) => `#${hit.hit_index}: ${hit.from_cluster_label ?? hit.from_cluster_id} → ${hit.after_suppressed ? "suppressed" : (hit.to_cluster_label ?? hit.to_cluster_id)}`).join("; ");
387
+ return `
388
+ <div class="suggestion-row">
389
+ <div><strong>${esc(suggestion.type)}</strong><small>${esc(suggestion.reason)} · ${Math.round(Number(suggestion.confidence ?? 0) * 100)}% · ${esc(diffText)}${hitPreview ? ` · ${esc(hitPreview)}` : ""}</small></div>
390
+ <div class="row-actions">
391
+ <button class="mini-button" type="button" data-preview-suggestion="${esc(suggestion.id)}">Diff</button>
392
+ <button class="mini-button" type="button" data-accept-suggestion="${esc(suggestion.id)}">Accept</button>
393
+ <button class="mini-button" type="button" data-reject-suggestion="${esc(suggestion.id)}">Reject</button>
394
+ </div>
395
  </div>
396
+ `;
397
+ }).join("") || `<p class="empty">No open suggestions.</p>`;
398
+ for (const button of $("suggestionInbox").querySelectorAll("[data-preview-suggestion]")) {
399
+ button.addEventListener("click", () => {
400
+ const suggestion = (lastSupervisionState?.suggestions ?? []).find((item) => item.id === button.dataset.previewSuggestion);
401
+ $("clusterExplanation").classList.remove("empty");
402
+ $("clusterExplanation").textContent = JSON.stringify(suggestion?.diff ?? {}, null, 2);
403
+ });
404
+ }
405
  for (const button of $("suggestionInbox").querySelectorAll("[data-accept-suggestion]")) {
406
  button.addEventListener("click", () => acceptSuggestion(button.dataset.acceptSuggestion));
407
  }
 
449
  await applyStateAction(`/api/jobs/${encodeURIComponent(activeJobId)}/hits/${encodeURIComponent(hitId)}/suppress`, { reason: "bleed" });
450
  }
451
 
452
+ async function restoreSelectedHit() {
453
+ const hitId = hitIdFromIndex(selectedHitIndex);
454
+ if (!activeJobId || !hitId) return;
455
+ await applyStateAction(`/api/jobs/${encodeURIComponent(activeJobId)}/hits/${encodeURIComponent(hitId)}/restore`, {});
456
+ }
457
+
458
  async function reviewSelectedHit(status) {
459
  const hitId = hitIdFromIndex(selectedHitIndex);
460
  if (!activeJobId || !hitId) return;
 
490
  await applyStateAction(`/api/jobs/${encodeURIComponent(activeJobId)}/undo`, {});
491
  }
492
 
493
+ function renderEditedExport(exportPayload) {
494
+ const fileUrls = exportPayload?.file_urls ?? {};
495
+ const labels = { archive: "Edited sample pack ZIP", midi: "Edited MIDI", reconstruction: "Edited reconstruction WAV" };
496
+ $("editedDownloads").innerHTML = Object.entries(fileUrls)
497
+ .map(([key, url]) => `<a href="${esc(url)}" download>${esc(labels[key] ?? key)}</a>`)
498
+ .join("");
499
+ }
500
+
501
+ async function exportEditedPack() {
502
+ if (!activeJobId) return;
503
+ $("exportStateButton").disabled = true;
504
+ try {
505
+ const payload = await jsonApi(`/api/jobs/${encodeURIComponent(activeJobId)}/export`, { synthesize: true });
506
+ renderEditedExport(payload.export);
507
+ renderSupervisionState(payload.state);
508
+ $("clusterExplanation").classList.remove("empty");
509
+ $("clusterExplanation").textContent = JSON.stringify(payload.export, null, 2);
510
+ } finally {
511
+ setActionButtons();
512
+ }
513
+ }
514
+
515
+ async function forceOnsetAtTime(timeSec) {
516
+ if (!activeJobId) return;
517
+ const body = { onset_sec: Number(timeSec) };
518
+ const target = currentTargetCluster();
519
+ if (target) body.target_cluster_id = target.id;
520
+ const before = new Set((lastSupervisionState?.hits ?? []).map((hit) => hit.id));
521
+ const state = await jsonApi(`/api/jobs/${encodeURIComponent(activeJobId)}/hits/force-onset`, body);
522
+ renderSupervisionState(state);
523
+ const added = (state.hits ?? []).find((hit) => !before.has(hit.id) && hit.source === "forced");
524
+ if (added) selectHit(added.index);
525
+ }
526
+
527
  function renderResult(job) {
528
  const result = job.result;
529
  if (!result) return;
 
679
  }
680
 
681
  function selectNearestWaveformHit(event) {
682
+ if (!lastResult?.overview) return;
683
  const rect = $("waveform").getBoundingClientRect();
684
  const ratio = Math.min(1, Math.max(0, (event.clientX - rect.left) / Math.max(1, rect.width)));
685
  const time = ratio * Math.max(lastResult.overview.duration_sec, 0.001);
686
+ if (forceOnsetMode) {
687
+ forceOnsetAtTime(time).catch((error) => { $("clusterExplanation").textContent = error.message; });
688
+ return;
689
+ }
690
+ if (!lastResult.overview.onsets?.length) return;
691
  let best = null;
692
  let bestDelta = Infinity;
693
  for (const onset of lastResult.overview.onsets) {
 
743
  $("acceptHitButton").addEventListener("click", () => reviewSelectedHit("accepted").catch((error) => { $("clusterExplanation").textContent = error.message; }));
744
  $("favoriteHitButton").addEventListener("click", () => reviewSelectedHit("favorite").catch((error) => { $("clusterExplanation").textContent = error.message; }));
745
  $("suppressHitButton").addEventListener("click", () => suppressSelectedHit().catch((error) => { $("clusterExplanation").textContent = error.message; }));
746
+ $("restoreHitButton").addEventListener("click", () => restoreSelectedHit().catch((error) => { $("clusterExplanation").textContent = error.message; }));
747
+ $("exportStateButton").addEventListener("click", () => exportEditedPack().catch((error) => { $("clusterExplanation").textContent = error.message; setActionButtons(); }));
748
+ $("forceOnsetButton").addEventListener("click", () => { forceOnsetMode = !forceOnsetMode; setActionButtons(); });
749
  $("lockClusterButton").addEventListener("click", () => toggleTargetClusterLock().catch((error) => { $("clusterExplanation").textContent = error.message; }));
750
  $("explainClusterButton").addEventListener("click", () => explainTargetCluster().catch((error) => { $("clusterExplanation").textContent = error.message; }));
751
  $("targetClusterSelect").addEventListener("change", setActionButtons);
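The force-onset response handling selects the newly created hit by snapshotting hit IDs before the call and then looking for an ID that is absent from the snapshot and carries `source === "forced"`. That selection can be sketched as a pure function (the name `findForcedHit` is hypothetical; the logic mirrors the diff):

```javascript
// Hypothetical helper mirroring the force-onset flow: given the hit lists from
// before and after the API call, return the newly added forced hit (or null).
function findForcedHit(beforeHits, afterHits) {
  const before = new Set((beforeHits ?? []).map((hit) => hit.id));
  return (afterHits ?? []).find((hit) => !before.has(hit.id) && hit.source === "forced") ?? null;
}

const beforeHits = [{ id: "a", source: "detected" }];
const afterHits = [
  { id: "a", source: "detected" },
  { id: "b", source: "forced", index: 7 },
];
findForcedHit(beforeHits, afterHits); // returns the { id: "b", … } hit, whose index drives selectHit
```

Diffing by ID rather than by array position keeps the selection correct even if the server re-sorts the hit list when inserting the forced onset.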
web/index.html CHANGED
@@ -196,10 +196,13 @@
   </div>
   <div class="supervision-actions">
     <button id="refreshStateButton" class="ghost-button" type="button">Refresh state</button>
+    <button id="exportStateButton" class="ghost-button" type="button" disabled>Export edited pack</button>
+    <button id="forceOnsetButton" class="ghost-button" type="button" disabled>Add-onset mode off</button>
     <button id="undoButton" class="ghost-button" type="button" disabled>Undo edit</button>
   </div>
 </div>
 <div id="supervisionSummary" class="state-summary">No interactive state loaded.</div>
+<div id="editedDownloads" class="downloads edited-downloads"></div>
 <div class="supervision-tools">
   <label>Target cluster
     <select id="targetClusterSelect"></select>
@@ -209,6 +212,7 @@
   <button id="acceptHitButton" class="secondary-button" type="button" disabled>Accept hit</button>
   <button id="favoriteHitButton" class="secondary-button" type="button" disabled>Favorite as representative</button>
   <button id="suppressHitButton" class="secondary-button danger-button" type="button" disabled>Suppress as bleed</button>
+  <button id="restoreHitButton" class="secondary-button" type="button" disabled>Restore selected hit</button>
   <button id="lockClusterButton" class="secondary-button" type="button" disabled>Lock target cluster</button>
   <button id="explainClusterButton" class="secondary-button" type="button" disabled>Explain target cluster</button>
 </div>
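The `#forceOnsetButton` label ("Add-onset mode off") doubles as the mode indicator, so the click handler that flips `forceOnsetMode` needs a render step to keep the label and styling in sync. A minimal sketch of that step (the helper name `renderForceOnsetButton` and the `fakeButton` stand-in are hypothetical; the `active` class is assumed to carry the highlighted styling):

```javascript
// Hypothetical render step: reflect forceOnsetMode in the toggle button's
// label and in an assumed "active" CSS class.
function renderForceOnsetButton(button, forceOnsetMode) {
  button.textContent = forceOnsetMode ? "Add-onset mode on" : "Add-onset mode off";
  button.classList.toggle("active", forceOnsetMode); // second arg forces add/remove
}

// Minimal stand-in for a DOM button so the sketch runs outside a browser.
function fakeButton() {
  const classes = new Set();
  return {
    textContent: "",
    classList: {
      toggle(name, on) { on ? classes.add(name) : classes.delete(name); },
      contains(name) { return classes.has(name); },
    },
  };
}

const button = fakeButton();
renderForceOnsetButton(button, true);
// button.textContent is now "Add-onset mode on" and the "active" class is set
```

Calling this from `setActionButtons()` (which already runs after every toggle) keeps the button truthful without any extra state.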
web/styles.css CHANGED
@@ -120,3 +120,6 @@ tr.low-confidence td { background: rgba(255,202,107,.06); }
 tr.low-confidence.selected td { background: rgba(139,211,255,.15); }
 @media (max-width: 1320px) { .supervision-tools { grid-template-columns: repeat(3, minmax(0, 1fr)); } .supervision-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); } }
 @media (max-width: 760px) { .supervision-header { display: block; } .supervision-actions { justify-content: flex-start; margin-top: 10px; } .supervision-tools, .supervision-grid { grid-template-columns: 1fr; } }
+.ghost-button.active, .secondary-button.active { border-color: rgba(85,230,165,.7); background: rgba(85,230,165,.13); color: #d9ffe9; }
+.edited-downloads { margin: -4px 0 14px; }
+.edited-downloads:empty { display: none; }
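The `.edited-downloads:empty { display: none; }` rule means the `#editedDownloads` container needs no show/hide JavaScript: rendering an empty string collapses it, and rendering any link markup reveals it. A sketch of the markup builder under that design (the function name `editedDownloadsHtml` and the `{ label, url }` artifact shape are assumptions, not the repo's actual export payload):

```javascript
// Hypothetical builder for #editedDownloads: an empty artifact list yields "",
// so the CSS :empty rule hides the container until an export has produced files.
function editedDownloadsHtml(artifacts) {
  return (artifacts ?? [])
    .map(({ label, url }) => `<a href="${url}" download>${label}</a>`)
    .join("");
}

editedDownloadsHtml([]); // returns "" → container stays hidden
editedDownloadsHtml([{ label: "Edited WAV pack", url: "/api/jobs/1/edited.zip" }]);
// returns one <a> tag → container becomes visible
```

Note that `:empty` matches only an element with no child nodes at all, which is why the initial markup leaves the div completely empty rather than holding placeholder text.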