Rik Hoffbauer committed
Commit 3a6aebd · 1 parent: 6362e08

Update documentation and add smoke tests for remaining gaps

README.md CHANGED
@@ -17,7 +17,7 @@ tags:
  - mixing
  - stem-separation
  - demucs
- short_description: AI DJ analyzes songs and renders DJ sets
+ short_description: AI analyzes songs, plans cue-aware transitions, renders DJ sets
  ---
 
  # AI DJ Set Builder
@@ -46,6 +46,10 @@ The original approach was too confident: it used one asserted mix-in, one assert
 
  This version adds:
 
+ - waveform-backed cue editor with ranked cue overlays and manual feedback export
+ - stem-file-aware AutomationIR rendering with Demucs-cache detection and explicit diagnostics
+ - feedback-to-learning path from cue edits and listening ratings
+ - transition diagnostics for silence, low-end jumps, HF spikes, clipping risk, and crest factor
  - ranked cue candidates with evidence in `cue_graph.py`
  - downbeat phase confidence instead of blind `beats[::4]`
  - transition edge scoring in `transition_optimizer.py`
@@ -58,10 +62,10 @@ This version adds:
 
  The system still does not prove DJ quality automatically. Metrics are diagnostics, not human preference. The remaining limits are empirical rather than missing product mechanisms:
 
- - cue learning requires labeled examples in `data/cue_model.json` or JSONL training data
+ - cue learning requires labeled examples in `data/cue_model.json`, `data/manual-cue-edits.jsonl`, or decisive listening ratings
  - listening benchmarks require real audition/rating data before they can validate quality
  - drum-lane decomposition is heuristic band splitting, not true instrument separation
- - full-set rendering and transition previews both use the AutomationIR renderer
+ - full-set rendering and transition previews both use the AutomationIR renderer; stem-style transitions now use component lanes when possible
 
  See:
 
@@ -69,6 +73,7 @@ See:
  - [`docs/architecture-after-review.md`](docs/architecture-after-review.md)
  - [`docs/shortcomings-addressed.md`](docs/shortcomings-addressed.md)
  - [`docs/implementation-completion.md`](docs/implementation-completion.md)
+ - [`docs/remaining-gaps-addressed.md`](docs/remaining-gaps-addressed.md)
 
  ## Local run
 
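The new README bullet about Demucs-cache detection comes down to probing a few conventional directories before falling back to heuristic lanes. Below is a minimal sketch. It assumes the `htdemucs/<track>/{drums,bass,vocals,other}.wav` layout that `docs/remaining-gaps-addressed.md` lists under `separated/` and `data/stems/`. It also assumes the track key is the audio file name without its extension. `find_cached_stems` and `STEM_NAMES` are hypothetical names, not the project's `StemProvider` API:

```python
from pathlib import Path
from typing import Dict, Optional, Sequence, Union

# Hypothetical helper: the project itself resolves stems through stem_provider.StemProvider.
STEM_NAMES = ("drums", "bass", "vocals", "other")  # the four broad htdemucs outputs


def find_cached_stems(
    track_path: Union[str, Path], roots: Sequence[Union[str, Path]]
) -> Optional[Dict[str, Path]]:
    """Return {stem: wav path} from the first root holding all four stems, else None.

    Each root is assumed to contain `htdemucs/<track key>/<stem>.wav`, where the
    track key is the audio file name without its extension ("song.wav" -> "song").
    """
    track_key = Path(track_path).stem
    for root in roots:
        stem_dir = Path(root) / "htdemucs" / track_key
        candidates = {name: stem_dir / f"{name}.wav" for name in STEM_NAMES}
        # Only a complete stem set counts; a partial cache is treated as a miss.
        if all(path.is_file() for path in candidates.values()):
            return candidates
    return None
```

For example, `find_cached_stems("music/song.wav", ["separated", "data/stems"])` returns the four paths from the first root that holds a complete set. Treating a partial cache as a miss is a design choice of this sketch. A `None` result is the point where a renderer would fall back to heuristic component lanes and record a diagnostic, so the fallback stays visible.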
docs/remaining-gaps-addressed.md ADDED
@@ -0,0 +1,84 @@
+ # Remaining gaps addressed
+
+ Date: 2026-05-02
+
+ This pass addresses the remaining gaps that were previously called out as not honestly complete.
+
+ ## 1. Waveform-backed manual cue editor
+
+ Added `cue_editor.py` and integrated it into the Gradio transition tab.
+
+ The editor now:
+
+ - renders real waveform overviews for track A and track B;
+ - overlays selected transition anchors;
+ - overlays ranked cue candidates from analysis;
+ - exposes cue candidate dropdowns for A mix-out, B mix-in, and B drop;
+ - applies those choices back into `TransitionPlan.selected_cues`;
+ - persists manual cue edits as positive cue-training examples in `data/manual-cue-edits.jsonl`.
+
+ This is still not a DAW-grade draggable waveform editor, but it is no longer a numeric-only form.
+
+ ## 2. Stem-file-aware AutomationIR rendering
+
+ Added `stem_provider.py` and connected it to both transition preview rendering and full-set AutomationIR rendering.
+
+ The renderer now:
+
+ - looks for existing Demucs-style stem files before using heuristic component lanes;
+ - supports common layouts such as `separated/htdemucs/<track>/{drums,bass,vocals,other}.wav` and `data/stems/htdemucs/<track>/*.wav`;
+ - can optionally invoke Demucs when `AI_DJ_ENABLE_DEMUCS=1` is set;
+ - uses real broad stems for bass/vocals/other and splits the drum stem into kick/snare-hat/top lanes;
+ - records stem-provider diagnostics so fallback behavior is visible.
+
+ Full-set rendering now keeps component lanes for tracks participating in stem-style transitions instead of silently collapsing those sections back to full-track fades.
+
+ ## 3. Feedback-to-learning path
+
+ Extended `cue_learning.py` so feedback is not dead data.
+
+ New paths:
+
+ - manual waveform/numeric cue edits append supervised examples;
+ - accepted/high-rated transitions become positive cue examples;
+ - rejected/low-rated transitions become negative cue examples;
+ - the UI can train `data/cue_model.json` from accumulated manual edits and listening ratings.
+
+ This is not a large neural cue detector. It is a practical local learning loop that lets the prototype adapt to user corrections and audition outcomes.
+
+ ## 4. Listening diagnostics beyond spectral smoothness
+
+ Added `transition_diagnostics.py` and integrated it into candidate previews.
+
+ The diagnostics now check for:
+
+ - accidental silence;
+ - low-end discontinuity;
+ - high-frequency spike risk;
+ - clipping/limiter risk;
+ - unstable crest factor.
+
+ These are still diagnostics, not proof of musical quality.
+
+ ## 5. Verification coverage
+
+ Added `tests/smoke_remaining_gaps.py` covering:
+
+ - waveform cue editor image generation;
+ - cue choice parsing/application;
+ - existing Demucs-style stem cache detection;
+ - component-lane generation in transition and full-set IR;
+ - transition diagnostic warnings;
+ - cue-model training from ratings and manual examples.
+
+ ## Still intentionally not claimed
+
+ The project still does not claim:
+
+ - a production DAW timeline with draggable cue handles;
+ - a validated large-dataset deep cue detector;
+ - perfect stem isolation;
+ - human-preference validation without real human ratings;
+ - release-grade psychoacoustic quality scoring.
+
+ Those are product/research milestones, not something that can be honestly proven by a local smoke test.
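Section 3 above describes how audition outcomes become labeled cue examples. The sketch below is one hedged reading of that rule, based on three assumptions:

- ratings use a 1-5 scale;
- an explicit accept/reject flag is optional;
- each selected cue yields one example.

The last assumption matches the smoke test, which expects at least three examples from one five-star rating of a plan with three selected cues. The cut-offs, `rating_label`, `examples_from_rating`, `append_jsonl`, and the JSON field names are all hypothetical. They are not the schema `cue_learning.py` writes:

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical cut-offs on an assumed 1-5 scale; the doc only says "high-rated"/"low-rated".
POSITIVE_MIN_RATING = 4
NEGATIVE_MAX_RATING = 2


def rating_label(rating: int, accepted: Optional[bool] = None) -> Optional[int]:
    """Map one audition outcome to 1 (positive), 0 (negative), or None (not decisive)."""
    if accepted is not None:
        return 1 if accepted else 0  # an explicit accept/reject decision wins
    if rating >= POSITIVE_MIN_RATING:
        return 1
    if rating <= NEGATIVE_MAX_RATING:
        return 0
    return None  # middling ratings carry no clear signal, so they are skipped


def examples_from_rating(
    selected_cues: Dict[str, Dict], rating: int, accepted: Optional[bool] = None
) -> List[Dict]:
    """Emit one labeled example per selected cue role (e.g. a_out, b_in, b_drop)."""
    label = rating_label(rating, accepted)
    if label is None:
        return []
    return [
        {"role": role, "time": cue["time"], "confidence": cue.get("confidence"),
         "label": label, "source": "listening_rating"}
        for role, cue in sorted(selected_cues.items())
    ]


def append_jsonl(path: Path, rows: List[Dict]) -> int:
    """Append rows as JSON Lines so feedback accumulates across sessions."""
    with Path(path).open("a", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    return len(rows)
```

Skipping middling ratings is one way to read the README's "decisive listening ratings". A real trainer would also need per-cue features such as the `phrase_score` and `energy_delta` evidence used in the smoke test, and this sketch leaves those out.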
tests/smoke_remaining_gaps.py ADDED
@@ -0,0 +1,134 @@
+ """Smoke checks for the remaining-gap implementation pass."""
+
+ from __future__ import annotations
+
+ import math
+ import tempfile
+ from pathlib import Path
+ from types import SimpleNamespace
+ import sys
+
+ ROOT = Path(__file__).resolve().parents[1]
+ if str(ROOT) not in sys.path:
+     sys.path.insert(0, str(ROOT))
+
+ import numpy as np
+ import soundfile as sf
+
+ from automation_ir import ClipRef, build_transition_ir
+ from automation_set_renderer import build_set_automation_ir
+ from cue_editor import render_transition_cue_editor, choices_for_transition, apply_choices_to_plan
+ from cue_learning import train_from_listening_ratings, append_training_example, train_from_jsonl
+ from listening_benchmarks import record_transition_rating
+ from stem_provider import StemProvider
+ from transition_diagnostics import diagnose_transition_audio
+
+
+ def _tone(path: Path, freq: float, *, duration: float = 5.0, sr: int = 44100) -> str:
+     t = np.arange(int(sr * duration)) / sr
+     y = (0.1 * np.sin(2 * math.pi * freq * t)).astype(np.float32)
+     sf.write(path, y, sr)
+     return str(path)
+
+
+ def _track(path: str, name: str, *, bpm: float = 120.0) -> SimpleNamespace:
+     return SimpleNamespace(
+         path=path,
+         filename=name,
+         duration=5.0,
+         bpm=bpm,
+         avg_energy=0.2,
+         cue_points=[
+             {"kind": "mix_out", "time": 2.0, "label": "out", "confidence": 0.9, "evidence": {"source": "test", "phrase_score": 1.0}},
+             {"kind": "mix_in", "time": 0.5, "label": "in", "confidence": 0.8, "evidence": {"source": "test", "phrase_score": 1.0}},
+             {"kind": "first_drop", "time": 2.5, "label": "drop", "confidence": 0.85, "evidence": {"source": "test", "energy_delta": 1.0}},
+             {"kind": "drop", "time": 2.5, "label": "drop", "confidence": 0.85, "evidence": {"source": "test", "energy_delta": 1.0}},
+         ],
+         segments=[
+             {"start": 0, "end": 2, "label": "intro", "energy": 0.1},
+             {"start": 2, "end": 5, "label": "drop", "energy": 0.3},
+         ],
+     )
+
+
+ def main() -> None:
+     with tempfile.TemporaryDirectory() as td:
+         td_path = Path(td)
+         a_path = _tone(td_path / "a.wav", 110)
+         b_path = _tone(td_path / "b.wav", 220)
+         track_a = _track(a_path, "a.wav")
+         track_b = _track(b_path, "b.wav")
+         plan = SimpleNamespace(
+             transition_type="bass_swap",
+             mix_out_point=2.0,
+             mix_in_point=0.5,
+             duration_seconds=2.0,
+             duration_beats=16,
+             bpm_adjustment=1.0,
+             selected_cues={
+                 "a_out": {"time": 2.0, "confidence": 0.9, "label": "out"},
+                 "b_in": {"time": 0.5, "confidence": 0.8, "label": "in"},
+                 "b_drop": {"time": 2.5, "confidence": 0.85, "label": "drop"},
+             },
+             score_breakdown={"overall": 0.75},
+             alternatives=[],
+         )
+
+         image, summary = render_transition_cue_editor(track_a, track_b, plan, output_dir=td_path)
+         assert Path(image).exists()
+         assert "Waveform cue editor" in summary
+         choices = choices_for_transition(track_a, track_b, plan)
+         assert choices["a_choices"] and choices["b_in_choices"] and choices["b_drop_choices"]
+         mix_out, mix_in, duration, selected = apply_choices_to_plan(
+             plan,
+             a_choice=choices["a_choices"][0][1],
+             b_in_choice=choices["b_in_choices"][0][1],
+             b_drop_choice=choices["b_drop_choices"][0][1],
+             transition_type="drums_first",
+         )
+         assert mix_out == 2.0 and mix_in == 0.5 and duration == 2.0
+         assert selected["b_drop"]["time"] == 2.5
+
+         # Existing Demucs-style stem cache should be used when present.
+         stem_dir = td_path / "data" / "stems" / "htdemucs" / "a"
+         stem_dir.mkdir(parents=True)
+         for stem, freq in [("drums", 60), ("bass", 90), ("vocals", 300), ("other", 600)]:
+             _tone(stem_dir / f"{stem}.wav", freq)
+         provider = StemProvider(cache_dir=td_path / "data" / "stems", enable_demucs=False)
+         clip = ClipRef("A", "A", a_path, 0.0, 5.0, 0.0, 1.0)
+         full = np.zeros((2, 44100), dtype=np.float64)
+         kick = provider.resolve(clip, "kick", full, 44100)
+         melody = provider.resolve(clip, "melody", full, 44100)
+         assert kick is not None and kick.shape[0] == 2
+         assert melody is not None and melody.shape[0] == 2
+         assert provider.diagnostics
+
+         ir = build_transition_ir(plan, track_a, track_b, sr=44100)
+         assert any(lane.component == "kick" for lane in ir.lanes)
+         set_ir = build_set_automation_ir([track_a, track_b], [0, 1], [plan], sr=44100)
+         assert any(lane.component == "kick" for lane in set_ir.lanes)
+         assert set_ir.metadata["transitions"][0]["component_lanes"] is True
+
+         diag = diagnose_transition_audio(np.zeros((2, 44100), dtype=np.float32), sr=44100)
+         assert not diag["passed"]
+         assert diag["warnings"]
+
+         # Feedback-derived cue model training path.
+         rating_path = td_path / "ratings.jsonl"
+         record_transition_rating(transition=plan, track_a=track_a, track_b=track_b, rating=5, accepted=True, path=rating_path)
+         model = train_from_listening_ratings(rating_path, output_path=td_path / "cue_model.json")
+         assert model.training_examples >= 3
+
+         examples_path = td_path / "manual.jsonl"
+         append_training_example(examples_path, selected["a_out"], duration=5.0, label=1, source="test")
+         model2 = train_from_jsonl(examples_path, output_path=td_path / "manual_model.json")
+         assert model2.training_examples == 1
+
+         print("smoke_remaining_gaps ok")
+
+
+ if __name__ == "__main__":
+     import os
+     main()
+     sys.stdout.flush()
+     os._exit(0)
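The smoke test only checks that an all-zero buffer fails `diagnose_transition_audio` with warnings. As a hedged illustration of the five checks listed in `docs/remaining-gaps-addressed.md`, here is a NumPy-only sketch. The following are all assumptions, not the behavior of the project's `transition_diagnostics.py`:

- the name `diagnose_sketch` and the extra `peak` key in its result;
- the 100 ms frames with 50% overlap;
- the band edges at 150 Hz and 8 kHz;
- every dB threshold.

```python
import numpy as np


def _frames(mono: np.ndarray, size: int, hop: int) -> np.ndarray:
    """Slice a mono signal into hop-spaced frames of shape (n_frames, size)."""
    if len(mono) < size:
        mono = np.pad(mono, (0, size - len(mono)))
    n = 1 + (len(mono) - size) // hop
    idx = np.arange(size)[None, :] + hop * np.arange(n)[:, None]
    return mono[idx]


def _db(power: np.ndarray) -> np.ndarray:
    """Power (or power ratio) to dB, floored at -120 dB to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(power, 1e-12))


def diagnose_sketch(audio: np.ndarray, sr: int, frame_s: float = 0.1) -> dict:
    """Heuristic warnings for a rendered transition; `audio` is (channels, samples).

    Every threshold below is an illustrative guess, not a tuned value.
    """
    audio = np.atleast_2d(np.asarray(audio, dtype=np.float64))
    size = max(256, int(sr * frame_s))
    hop = size // 2  # 50% overlap
    frames = _frames(audio.mean(axis=0), size, hop)
    warnings = []

    # 1) Accidental silence: longest run of frames whose RMS is below -60 dBFS.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    silent = _db(rms ** 2) < -60.0
    longest = run = 0
    for is_silent in silent:
        run = run + 1 if is_silent else 0
        longest = max(longest, run)
    if longest * hop / sr >= 0.25:
        warnings.append(f"silence: ~{longest * hop / sr:.2f}s below -60 dBFS")

    # 2) Clipping/limiter risk: any sample peak at or above -0.1 dBFS.
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak >= 10 ** (-0.1 / 20):
        warnings.append(f"clipping risk: peak {peak:.3f}")

    loud = ~silent
    if loud.any():
        # Per-frame band energy as a share of total frame energy (Hann-windowed FFT).
        spec = np.abs(np.fft.rfft(frames * np.hanning(size), axis=1)) ** 2
        freqs = np.fft.rfftfreq(size, d=1.0 / sr)
        total = spec.sum(axis=1) + 1e-20
        low_db = np.maximum(_db(spec[:, freqs < 150].sum(axis=1) / total), -40.0)
        high_db = _db(spec[:, freqs > 8000].sum(axis=1) / total)

        # 3) Low-end discontinuity: sub-150 Hz share jumps > 9 dB between adjacent loud frames.
        both_loud = loud[1:] & loud[:-1]
        low_jumps = np.abs(np.diff(low_db))[both_loud]
        if low_jumps.size and low_jumps.max() > 9.0:
            warnings.append(f"low-end discontinuity: {low_jumps.max():.1f} dB jump")

        # 4) HF spike: >8 kHz share rises 12 dB above its median and is non-negligible.
        hf = high_db[loud]
        if np.any((hf > np.median(hf) + 12.0) & (hf > -30.0)):
            warnings.append("high-frequency spike risk")

        # 5) Unstable crest factor: spread of per-frame peak/RMS (dB) across loud frames.
        crest_db = 20 * np.log10(np.max(np.abs(frames[loud]), axis=1) / rms[loud])
        if crest_db.size > 1 and crest_db.std() > 4.0:
            warnings.append(f"unstable crest factor: std {crest_db.std():.1f} dB")

    return {"passed": not warnings, "warnings": warnings, "peak": peak}
```

The sketch measures the low and high bands as a share of each frame's energy rather than in absolute dB, which keeps the checks independent of overall gain. The -40 dB floor on the low-band share stops noise-floor wobble from registering as a bass jump.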