Space: rikhoffbauer2/ai-techno-dj

Rik Hoffbauer committed 24 days ago

Commit 3a6aebd (1 parent: 6362e08)

Update documentation and add smoke tests for remaining gaps

Files changed (3)

README.md +8 -3
docs/remaining-gaps-addressed.md +84 -0
tests/smoke_remaining_gaps.py +134 -0

README.md CHANGED

@@ -17,7 +17,7 @@ tags:
   - mixing
   - stem-separation
   - demucs
-short_description: AI DJ analyzes songs and renders DJ sets
 ---
 # AI DJ Set Builder
@@ -46,6 +46,10 @@ The original approach was too confident: it used one asserted mix-in, one assert
 This version adds:
 - ranked cue candidates with evidence in `cue_graph.py`
 - downbeat phase confidence instead of blind `beats[::4]`
 - transition edge scoring in `transition_optimizer.py`
@@ -58,10 +62,10 @@ This version adds:
 The system still does not prove DJ quality automatically. Metrics are diagnostics, not human preference. The remaining limits are empirical rather than missing product mechanisms:
-- cue learning requires labeled examples in `data/cue_model.json` or JSONL training data
 - listening benchmarks require real audition/rating data before they can validate quality
 - drum-lane decomposition is heuristic band splitting, not true instrument separation
-- full-set rendering and transition previews both use the AutomationIR renderer
 See:
@@ -69,6 +73,7 @@ See:
 - [`docs/architecture-after-review.md`](docs/architecture-after-review.md)
 - [`docs/shortcomings-addressed.md`](docs/shortcomings-addressed.md)
 - [`docs/implementation-completion.md`](docs/implementation-completion.md)
 ## Local run

   - mixing
   - stem-separation
   - demucs
+short_description: AI analyzes songs, plans cue-aware transitions, renders DJ sets
 ---
 # AI DJ Set Builder
 This version adds:
+- waveform-backed cue editor with ranked cue overlays and manual feedback export
+- stem-file-aware AutomationIR rendering with Demucs-cache detection and explicit diagnostics
+- feedback-to-learning path from cue edits and listening ratings
+- transition diagnostics for silence, low-end jumps, HF spikes, clipping risk, and crest factor
 - ranked cue candidates with evidence in `cue_graph.py`
 - downbeat phase confidence instead of blind `beats[::4]`
 - transition edge scoring in `transition_optimizer.py`
 The system still does not prove DJ quality automatically. Metrics are diagnostics, not human preference. The remaining limits are empirical rather than missing product mechanisms:
+- cue learning requires labeled examples in `data/cue_model.json`, `data/manual-cue-edits.jsonl`, or decisive listening ratings
 - listening benchmarks require real audition/rating data before they can validate quality
 - drum-lane decomposition is heuristic band splitting, not true instrument separation
+- full-set rendering and transition previews both use the AutomationIR renderer; stem-style transitions now use component lanes when possible
 See:
 - [`docs/architecture-after-review.md`](docs/architecture-after-review.md)
 - [`docs/shortcomings-addressed.md`](docs/shortcomings-addressed.md)
 - [`docs/implementation-completion.md`](docs/implementation-completion.md)
+- [`docs/remaining-gaps-addressed.md`](docs/remaining-gaps-addressed.md)
 ## Local run

docs/remaining-gaps-addressed.md ADDED

@@ -0,0 +1,84 @@

+# Remaining gaps addressed
+Date: 2026-05-02
+This pass addresses the remaining gaps that were previously called out as not honestly complete.
+## 1. Waveform-backed manual cue editor
+Added `cue_editor.py` and integrated it into the Gradio transition tab.
+The editor now:
+- renders real waveform overviews for track A and track B;
+- overlays selected transition anchors;
+- overlays ranked cue candidates from analysis;
+- exposes cue candidate dropdowns for A mix-out, B mix-in, and B drop;
+- applies those choices back into `TransitionPlan.selected_cues`;
+- persists manual cue edits as positive cue-training examples in `data/manual-cue-edits.jsonl`.
+This is still not a DAW-grade draggable waveform editor, but it is no longer a numeric-only form.
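
A waveform overview like the one described in this section is commonly built by reducing the signal to per-column min/max peaks and mapping cue times onto those columns. The sketch below illustrates that idea with NumPy only. `waveform_overview` and `cue_to_column` are hypothetical names chosen for illustration; they are not taken from `cue_editor.py`, whose implementation is not shown in this commit view.

```python
import numpy as np


def waveform_overview(y, n_bins=800):
    """Reduce a signal to per-bin (min, max) envelopes for drawing an overview.

    Assumes `y` is mono with shape (samples,) or multichannel with shape
    (channels, samples); multichannel input is averaged to mono first.
    """
    y = np.asarray(y, dtype=np.float64)
    if y.ndim > 1:
        y = y.mean(axis=0)
    if y.size == 0:
        return np.zeros(0), np.zeros(0)
    n_bins = max(1, min(int(n_bins), y.size))
    # Integer bin edges, so every bin holds at least one sample.
    edges = (np.arange(n_bins + 1) * y.size) // n_bins
    mins = np.array([y[a:b].min() for a, b in zip(edges[:-1], edges[1:])])
    maxs = np.array([y[a:b].max() for a, b in zip(edges[:-1], edges[1:])])
    return mins, maxs


def cue_to_column(time_s, duration_s, n_bins):
    """Map a cue time in seconds to the overview column where it would be drawn."""
    if duration_s <= 0 or n_bins <= 0:
        return 0
    col = int(round(time_s / duration_s * (n_bins - 1)))
    return min(max(col, 0), n_bins - 1)
```

A plotting layer could then draw one vertical line per column from `mins[i]` to `maxs[i]` and overlay each ranked cue candidate at `cue_to_column(cue["time"], track.duration, n_bins)`.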
+## 2. Stem-file-aware AutomationIR rendering
+Added `stem_provider.py` and connected it to both transition preview rendering and full-set AutomationIR rendering.
+The renderer now:
+- looks for existing Demucs-style stem files before using heuristic component lanes;
+- supports common layouts such as `separated/htdemucs/<track>/{drums,bass,vocals,other}.wav` and `data/stems/htdemucs/<track>/*.wav`;
+- can optionally invoke Demucs when `AI_DJ_ENABLE_DEMUCS=1` is set;
+- uses real broad stems for bass/vocals/other and splits the drum stem into kick/snare-hat/top lanes;
+- records stem-provider diagnostics so fallback behavior is visible.
+Full-set rendering now keeps component lanes for tracks participating in stem-style transitions instead of silently collapsing those sections back to full-track fades.
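
The cache lookup described in this section can be sketched as a plain directory probe over the two documented layouts. This is a minimal illustration under stated assumptions: `find_cached_stems` is a hypothetical helper, not the `StemProvider` API, and it assumes a cache only counts as usable when all four Demucs stems are present.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

STEM_NAMES = ("drums", "bass", "vocals", "other")


def find_cached_stems(track_name: str, roots: Iterable[Path]) -> Optional[Dict[str, Path]]:
    """Return {stem_name: path} from the first root holding all four stems, else None.

    Each root is expected to contain `htdemucs/<track_name>/<stem>.wav`,
    matching the `separated/` and `data/stems/` layouts listed above.
    """
    for root in roots:
        stem_dir = Path(root) / "htdemucs" / track_name
        found = {name: stem_dir / f"{name}.wav" for name in STEM_NAMES}
        if all(path.is_file() for path in found.values()):
            return found
    # Caller falls back to heuristic component lanes (or an opt-in Demucs run).
    return None
```

For example, `find_cached_stems("a", [Path("separated"), Path("data/stems")])` returns the stem paths for track `a` if either layout is complete. Returning `None` rather than raising keeps the fallback decision, and the diagnostic that records it, in the caller.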
+## 3. Feedback-to-learning path
+Extended `cue_learning.py` so feedback is not dead data.
+New paths:
+- manual waveform/numeric cue edits append supervised examples;
+- accepted/high-rated transitions become positive cue examples;
+- rejected/low-rated transitions become negative cue examples;
+- the UI can train `data/cue_model.json` from accumulated manual edits and listening ratings.
+This is not a large neural cue detector. It is a practical local learning loop that lets the prototype adapt to user corrections and audition outcomes.
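
The mapping from feedback to labeled examples can be sketched as follows. The thresholds, the record fields, and the names `label_from_feedback` and `append_cue_example` are all assumptions made for illustration; the actual behavior of `cue_learning.py` is not shown in this commit view.

```python
import json
from pathlib import Path
from typing import Optional


def label_from_feedback(rating: int, accepted: Optional[bool] = None) -> Optional[int]:
    """Map a listening rating to 1 (positive), 0 (negative), or None (not decisive)."""
    if accepted is not None:
        # An explicit accept/reject decision wins over the numeric rating.
        return 1 if accepted else 0
    if rating >= 4:  # assumed threshold on a 1-5 scale
        return 1
    if rating <= 2:  # assumed threshold on a 1-5 scale
        return 0
    return None  # middling ratings produce no training example


def append_cue_example(path: Path, cue_time: float, duration: float, label: int) -> dict:
    """Append one labeled cue example as a JSON line and return the record."""
    position = cue_time / duration if duration > 0 else 0.0
    record = {"time": cue_time, "position": position, "label": label}
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Skipping indecisive ratings instead of forcing them into a class is one way to read the README's requirement for "decisive listening ratings".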
+## 4. Listening diagnostics beyond spectral smoothness
+Added `transition_diagnostics.py` and integrated it into candidate previews.
+The diagnostics now check for:
+- accidental silence;
+- low-end discontinuity;
+- high-frequency spike risk;
+- clipping/limiter risk;
+- unstable crest factor.
+These are still diagnostics, not proof of musical quality.
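
Three of these checks (silence, clipping risk, and crest factor) reduce to peak and RMS measurements. The sketch below is a simplified illustration, not the code in `transition_diagnostics.py`: the function name `basic_transition_checks` and both thresholds are assumptions. Crest factor is the standard peak-to-RMS ratio, reported here in dB.

```python
import numpy as np


def basic_transition_checks(audio, *, silence_rms=1e-4, clip_peak=0.999):
    """Flag accidental silence and clipping risk, and report crest factor in dB.

    Thresholds are illustrative assumptions for float audio in [-1.0, 1.0].
    """
    x = np.asarray(audio, dtype=np.float64)
    peak = float(np.max(np.abs(x))) if x.size else 0.0
    rms = float(np.sqrt(np.mean(x ** 2))) if x.size else 0.0

    warnings = []
    if rms < silence_rms:
        warnings.append("silence")
    if peak >= clip_peak:
        warnings.append("clipping_risk")

    # Crest factor is undefined for an all-zero signal.
    crest_db = float(20.0 * np.log10(peak / rms)) if rms > 0.0 else None
    return {
        "passed": not warnings,
        "warnings": warnings,
        "peak": peak,
        "rms": rms,
        "crest_db": crest_db,
    }
```

A pure sine has a crest factor of about 3 dB. Checking stability as the section describes would mean computing this per window across the transition and comparing the values, which this sketch leaves out.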
+## 5. Verification coverage
+Added `tests/smoke_remaining_gaps.py` covering:
+- waveform cue editor image generation;
+- cue choice parsing/application;
+- existing Demucs-style stem cache detection;
+- component-lane generation in transition and full-set IR;
+- transition diagnostic warnings;
+- cue-model training from ratings and manual examples.
+## Still intentionally not claimed
+The project still does not claim:
+- a production DAW timeline with draggable cue handles;
+- a validated large-dataset deep cue detector;
+- perfect stem isolation;
+- human-preference validation without real human ratings;
+- release-grade psychoacoustic quality scoring.
+Those are product/research milestones, not something that can be honestly proven by a local smoke test.

tests/smoke_remaining_gaps.py ADDED

@@ -0,0 +1,134 @@

+"""Smoke checks for the remaining-gap implementation pass."""
+from __future__ import annotations
+import math
+import tempfile
+from pathlib import Path
+from types import SimpleNamespace
+import sys
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+import numpy as np
+import soundfile as sf
+from automation_ir import ClipRef, build_transition_ir
+from automation_set_renderer import build_set_automation_ir
+from cue_editor import render_transition_cue_editor, choices_for_transition, apply_choices_to_plan
+from cue_learning import train_from_listening_ratings, append_training_example, train_from_jsonl
+from listening_benchmarks import record_transition_rating
+from stem_provider import StemProvider
+from transition_diagnostics import diagnose_transition_audio
+def _tone(path: Path, freq: float, *, duration: float = 5.0, sr: int = 44100) -> str:
+    t = np.arange(int(sr * duration)) / sr
+    y = (0.1 * np.sin(2 * math.pi * freq * t)).astype(np.float32)
+    sf.write(path, y, sr)
+    return str(path)
+def _track(path: str, name: str, *, bpm: float = 120.0) -> SimpleNamespace:
+    return SimpleNamespace(
+        path=path,
+        filename=name,
+        duration=5.0,
+        bpm=bpm,
+        avg_energy=0.2,
+        cue_points=[
+            {"kind": "mix_out", "time": 2.0, "label": "out", "confidence": 0.9, "evidence": {"source": "test", "phrase_score": 1.0}},
+            {"kind": "mix_in", "time": 0.5, "label": "in", "confidence": 0.8, "evidence": {"source": "test", "phrase_score": 1.0}},
+            {"kind": "first_drop", "time": 2.5, "label": "drop", "confidence": 0.85, "evidence": {"source": "test", "energy_delta": 1.0}},
+            {"kind": "drop", "time": 2.5, "label": "drop", "confidence": 0.85, "evidence": {"source": "test", "energy_delta": 1.0}},
+        ],
+        segments=[
+            {"start": 0, "end": 2, "label": "intro", "energy": 0.1},
+            {"start": 2, "end": 5, "label": "drop", "energy": 0.3},
+        ],
+    )
+def main() -> None:
+    with tempfile.TemporaryDirectory() as td:
+        td_path = Path(td)
+        a_path = _tone(td_path / "a.wav", 110)
+        b_path = _tone(td_path / "b.wav", 220)
+        track_a = _track(a_path, "a.wav")
+        track_b = _track(b_path, "b.wav")
+        plan = SimpleNamespace(
+            transition_type="bass_swap",
+            mix_out_point=2.0,
+            mix_in_point=0.5,
+            duration_seconds=2.0,
+            duration_beats=16,
+            bpm_adjustment=1.0,
+            selected_cues={
+                "a_out": {"time": 2.0, "confidence": 0.9, "label": "out"},
+                "b_in": {"time": 0.5, "confidence": 0.8, "label": "in"},
+                "b_drop": {"time": 2.5, "confidence": 0.85, "label": "drop"},
+            },
+            score_breakdown={"overall": 0.75},
+            alternatives=[],
+        )
+        image, summary = render_transition_cue_editor(track_a, track_b, plan, output_dir=td_path)
+        assert Path(image).exists()
+        assert "Waveform cue editor" in summary
+        choices = choices_for_transition(track_a, track_b, plan)
+        assert choices["a_choices"] and choices["b_in_choices"] and choices["b_drop_choices"]
+        mix_out, mix_in, duration, selected = apply_choices_to_plan(
+            plan,
+            a_choice=choices["a_choices"][0][1],
+            b_in_choice=choices["b_in_choices"][0][1],
+            b_drop_choice=choices["b_drop_choices"][0][1],
+            transition_type="drums_first",
+        )
+        assert mix_out == 2.0 and mix_in == 0.5 and duration == 2.0
+        assert selected["b_drop"]["time"] == 2.5
+        # Existing Demucs-style stem cache should be used when present.
+        stem_dir = td_path / "data" / "stems" / "htdemucs" / "a"
+        stem_dir.mkdir(parents=True)
+        for stem, freq in [("drums", 60), ("bass", 90), ("vocals", 300), ("other", 600)]:
+            _tone(stem_dir / f"{stem}.wav", freq)
+        provider = StemProvider(cache_dir=td_path / "data" / "stems", enable_demucs=False)
+        clip = ClipRef("A", "A", a_path, 0.0, 5.0, 0.0, 1.0)
+        full = np.zeros((2, 44100), dtype=np.float64)
+        kick = provider.resolve(clip, "kick", full, 44100)
+        melody = provider.resolve(clip, "melody", full, 44100)
+        assert kick is not None and kick.shape[0] == 2
+        assert melody is not None and melody.shape[0] == 2
+        assert provider.diagnostics
+        ir = build_transition_ir(plan, track_a, track_b, sr=44100)
+        assert any(lane.component == "kick" for lane in ir.lanes)
+        set_ir = build_set_automation_ir([track_a, track_b], [0, 1], [plan], sr=44100)
+        assert any(lane.component == "kick" for lane in set_ir.lanes)
+        assert set_ir.metadata["transitions"][0]["component_lanes"] is True
+        diag = diagnose_transition_audio(np.zeros((2, 44100), dtype=np.float32), sr=44100)
+        assert not diag["passed"]
+        assert diag["warnings"]
+        # Feedback-derived cue model training path.
+        rating_path = td_path / "ratings.jsonl"
+        record_transition_rating(transition=plan, track_a=track_a, track_b=track_b, rating=5, accepted=True, path=rating_path)
+        model = train_from_listening_ratings(rating_path, output_path=td_path / "cue_model.json")
+        assert model.training_examples >= 3
+        examples_path = td_path / "manual.jsonl"
+        append_training_example(examples_path, selected["a_out"], duration=5.0, label=1, source="test")
+        model2 = train_from_jsonl(examples_path, output_path=td_path / "manual_model.json")
+        assert model2.training_examples == 1
+    print("smoke_remaining_gaps ok")
+if __name__ == "__main__":
+    import os
+    main()
+    sys.stdout.flush()
+    os._exit(0)