# AI DJ Project Catch-Up Note

Last updated: 2026-02-19
## 1) Project Goal (Current Direction)

Build a **domain-specific AI DJ transition demo** for coursework Option 1 (Refinement):

- user uploads Song A and Song B
- system auto-detects cue points + BPM
- Song B is time-stretched to Song A's BPM
- a generative model creates transition audio from text (the "transition vibe")
- output is a **short transition clip only** (not a full-song mix)

This scope is intentionally optimized for Hugging Face Spaces reliability.

---
## 2) Coursework Fit (Why this is Option 1)

This is a refinement of existing pipelines/models:

- an existing generative pipeline (currently MusicGen, with ACE-Step planned)
- wrapped in a domain-specific DJ UX (cue/BPM/mix controls)
- not raw prompting only; structured controls for practical use

---
## 3) Current Implemented Pipeline (Already in `app.py`)

Current app file: `AI_DJ_Project/app.py`

### 3.1 Input + UI

- Upload `Song A` and `Song B`
- Set:
  - transition vibe text
  - transition type (`riser`, `drum fill`, `sweep`, `brake`, `scratch`, `impact`)
  - mode (`Overlay` or `Insert`)
  - pre/mix/post seconds
  - transition length + gain
  - optional BPM and cue overrides
### 3.2 Audio analysis and cueing

1. Probe duration with `ffprobe` (if available)
2. Decode only the needed segments (ffmpeg first, librosa fallback)
3. Estimate BPM + beat times with `librosa.beat.beat_track`
4. Auto-cue strategy:
   - Song A: choose a beat near the end of the analysis window
   - Song B: choose the first beat after ~2 seconds
5. Optional manual override for BPM and cue points
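The auto-cue heuristic in step 4 can be sketched as a small helper, assuming beat times (in seconds) have already been extracted via `librosa.beat.beat_track` + `librosa.frames_to_time`. The function name and fallback values are illustrative, not the exact `app.py` code:

```python
import numpy as np

def choose_cues(beats_a: np.ndarray, beats_b: np.ndarray,
                min_offset: float = 2.0) -> tuple[float, float]:
    """Pick cue points (seconds) from two arrays of beat times.

    Song A: a beat near the end of its analysis window (the outgoing cue).
    Song B: the first beat after ~2 s, skipping silence/intro at the head.
    """
    # Outgoing cue: last detected beat in A's window, or 0.0 if none found.
    cue_a = float(beats_a[-1]) if len(beats_a) else 0.0

    # Incoming cue: first beat at or after the minimum offset.
    later = beats_b[beats_b >= min_offset]
    if len(later):
        cue_b = float(later[0])
    elif len(beats_b):
        cue_b = float(beats_b[-1])   # fallback: last beat we have
    else:
        cue_b = min_offset           # fallback: no beats detected at all
    return cue_a, cue_b
```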
### 3.3 Tempo matching

- Compute stretch rate = `bpm_A / bpm_B` (clamped)
- Time-stretch the Song B segment via `librosa.effects.time_stretch`
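A minimal sketch of the rate computation; the clamp bounds `0.8`/`1.25` are assumed for illustration (the note says "clamped" but does not state the bounds used in `app.py`):

```python
def stretch_rate(bpm_a: float, bpm_b: float,
                 lo: float = 0.8, hi: float = 1.25) -> float:
    """Rate that maps Song B's tempo onto Song A's, clamped to limit artifacts.

    rate > 1 speeds B up; the result is then applied as
    librosa.effects.time_stretch(y_b, rate=rate).
    """
    rate = bpm_a / bpm_b
    return max(lo, min(hi, rate))
```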
### 3.4 AI transition generation

- `@spaces.GPU` function `_generate_ai_transition(...)`
- Uses `facebook/musicgen-small`
- Prompt is domain-steered toward DJ transition behavior
- Returns a short generated transition clip
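The "domain-steered" part is essentially prompt assembly from the UI controls; a hypothetical sketch (the function name and wording are illustrative, not taken from `app.py`):

```python
def build_transition_prompt(vibe: str, transition_type: str, bpm: float) -> str:
    """Assemble a DJ-domain prompt for the generation backend from UI inputs.

    Illustrative only: steers a general text-to-music model (MusicGen here)
    toward short transition material rather than full songs.
    """
    return (
        f"DJ transition, {transition_type}, {vibe}, "
        f"{bpm:.0f} BPM, short buildup and release, club mix"
    )
```

The resulting string would then be fed to the MusicGen processor/model inside the `@spaces.GPU`-decorated function.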
### 3.5 Assembly

- **Overlay mode**: crossfade A/B + overlay the AI transition
- **Insert mode**: A -> AI transition -> B (with short anti-click fades)
- Edge fades + peak normalization before output
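The crossfade and anti-click fades can be sketched as follows. An equal-power crossfade and 10 ms linear edge fades are assumed here; the exact curves in `app.py` may differ:

```python
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Equal-power crossfade of two mono segments (trimmed to equal length)."""
    n = min(len(a), len(b))
    t = np.linspace(0.0, np.pi / 2, n)
    # cos^2 + sin^2 = 1, so perceived power stays roughly constant.
    return a[:n] * np.cos(t) + b[:n] * np.sin(t)

def edge_fade(y: np.ndarray, sr: int, ms: float = 10.0) -> np.ndarray:
    """Short linear fade-in/out to suppress clicks at clip boundaries."""
    n = min(len(y) // 2, int(sr * ms / 1000))
    out = y.astype(np.float64)          # returns a copy; input left untouched
    ramp = np.linspace(0.0, 1.0, n)
    out[:n] *= ramp                     # fade in
    out[-n:] *= ramp[::-1]              # fade out
    return out
```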
### 3.6 Output

- Output audio clip (NumPy audio to Gradio)
- JSON details:
  - BPM estimates
  - cue points
  - stretch rate
  - analysis settings

---
## 4) Full End-to-End Pipeline (Conceptual)

Upload A/B
-> decode limited windows
-> BPM + beat analysis
-> auto-cue points
-> stretch B to A's BPM
-> generate transition (GenAI)
-> overlay/insert assembly
-> normalize/fades
-> return short transition clip + diagnostics

---
## 5) Planned Upgrade: ACE-Step + Custom LoRA

### 5.1 What ACE-Step is

ACE-Step 1.5 is a **full music-generation foundation-model stack** (text-to-audio/music with editing/control workflows), not just a small SFX model.

Planned usage in this project:

- keep the deterministic DJ logic (cue/BPM/stretch/assemble)
- swap the transition-generation backend from MusicGen to ACE-Step
- load custom LoRA adapter(s) to enforce DJ transition style

### 5.2 Integration strategy (recommended)

1. Keep the current `app.py` flow unchanged for analysis/mixing
2. Introduce a backend abstraction:
   - `MusicGenBackend` (fallback)
   - `AceStepBackend` (main target)
3. Add LoRA controls:
   - adapter selection
   - adapter scale
4. Continue returning short transition clips only
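The backend abstraction in step 2 could take this shape. The `generate` signature and constructor parameters are assumptions for illustration, not existing code:

```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np

class TransitionBackend(ABC):
    """Common interface so the DJ pipeline can swap generation models."""

    @abstractmethod
    def generate(self, prompt: str, duration_s: float, sr: int) -> np.ndarray:
        """Return mono float audio of roughly duration_s seconds at rate sr."""

class MusicGenBackend(TransitionBackend):
    """Fallback: would wrap the existing facebook/musicgen-small call."""
    def generate(self, prompt: str, duration_s: float, sr: int) -> np.ndarray:
        raise NotImplementedError("wraps the current MusicGen path")

class AceStepBackend(TransitionBackend):
    """Main target: ACE-Step inference with an optional LoRA adapter."""
    def __init__(self, lora_name: Optional[str] = None, lora_scale: float = 1.0):
        self.lora_name = lora_name
        self.lora_scale = lora_scale

    def generate(self, prompt: str, duration_s: float, sr: int) -> np.ndarray:
        raise NotImplementedError("ACE-Step + LoRA inference goes here")
```

With this in place, `_generate_ai_transition(...)` only needs to call `backend.generate(...)`, and the analysis/mixing code never sees which model produced the audio.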
---

## 6) Genre-Specific LoRA Idea (Pop / Electronic / House / Dubstep / Techno)

### Is this a good idea?

**Yes, as a staged plan.** It is a strong product and coursework idea because:

- a user-selected genre can map to a distinct transition style
- it demonstrates clear domain-specific refinement
- it supports an explainable UX: "You picked House -> House-style transition LoRA"

### Important caveats

- Training one LoRA per genre substantially increases data and compute requirements
- Early quality may vary by genre and dataset size
- More adapters mean more evaluation and QA burden

### Practical rollout (recommended)

Phase 1 (safe):
- base model + one "general DJ transition" LoRA

Phase 2 (coursework-strong):
- 2-3 genre LoRAs (e.g., Pop / House / Dubstep)

Phase 3 (optional extension):
- larger genre library + auto-genre suggestion from uploaded songs

---
## 7) Proposed Genre LoRA Routing Logic

The user selects the uploaded songs' genre (or manually selects a transition style profile):

- Pop -> `lora_pop_transition`
- Electronic -> `lora_electronic_transition`
- House -> `lora_house_transition`
- Dubstep -> `lora_dubstep_transition`
- Techno -> `lora_techno_transition`
- Auto/Unknown -> `lora_general_transition`

Then:

1. load the chosen LoRA
2. set the LoRA scale
3. run ACE-Step generation for the short transition duration
4. mix with the A/B boundary clip
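The routing table above is a direct dictionary lookup with a default; a sketch:

```python
# Genre -> adapter name, mirroring the mapping listed above.
GENRE_TO_LORA = {
    "Pop": "lora_pop_transition",
    "Electronic": "lora_electronic_transition",
    "House": "lora_house_transition",
    "Dubstep": "lora_dubstep_transition",
    "Techno": "lora_techno_transition",
}

def route_lora(genre: str) -> str:
    """Map a user-selected genre to an adapter, defaulting to the general one."""
    return GENRE_TO_LORA.get(genre, "lora_general_transition")
```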
---

## 8) Data and Training Notes for LoRA

- Use only licensed/royalty-free/self-owned audio for the dataset and demos
- The dataset should emphasize transition-like content (risers, fills, drops, sweeps, impacts)
- Include metadata/captions describing genre + transition intent
- Keep track of:
  - adapter name
  - dataset source and license
  - training config and epoch checkpoints
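One lightweight way to keep that bookkeeping honest is a record per adapter; the field names and example values below are illustrative only:

```python
from dataclasses import dataclass, asdict

@dataclass
class LoraRecord:
    """Bookkeeping for one trained adapter (fields mirror the list above)."""
    adapter_name: str
    dataset_source: str
    dataset_license: str
    training_config: str   # e.g. path to the config file used
    checkpoint: str        # e.g. an epoch checkpoint label

# Hypothetical example entry; serializes cleanly to JSON via asdict().
record = LoraRecord(
    adapter_name="lora_house_transition",
    dataset_source="self-recorded loops",
    dataset_license="CC0",
    training_config="configs/house_v1.yaml",
    checkpoint="epoch_12",
)
```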
---

## 9) Current Risks / Constraints

- The ACE-Step stack is heavier than MusicGen and needs careful deployment tuning
- Cold starts and memory behavior can be challenging on Spaces
- Auto-cueing is heuristic and may fail on hard tracks (manual override should remain)
- Time-stretching can introduce artifacts (expected in DJ contexts)

---
## 10) Fallback and Reliability Plan

- Keep the MusicGen backend as a fallback while integrating ACE-Step
- If ACE-Step init fails:
  - fail over to the MusicGen backend
  - still return a valid transition clip
- Preserve the deterministic DSP path as a model-agnostic baseline
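The failover can be sketched backend-agnostically; for brevity the backends here are plain callables rather than the backend classes from section 5.2:

```python
def generate_with_fallback(backends, prompt: str, duration_s: float, sr: int):
    """Try each backend in priority order (e.g. ACE-Step first, MusicGen second).

    Returns the first successful result; raises only if every backend fails,
    so a valid transition clip is produced whenever any backend works.
    """
    last_err = None
    for backend in backends:
        try:
            return backend(prompt, duration_s, sr)
        except Exception as err:   # init or inference failure
            last_err = err
    raise RuntimeError("all transition backends failed") from last_err
```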
---

## 11) "If I lost track" Quick Resume Checklist

1. Open `app.py` and confirm the current backend still works end-to-end
2. Verify the demo still does:
   - cue detection
   - BPM matching
   - transition generation
   - clip output
3. Re-read sections 5/6/7 of this note
4. Continue with the next implementation milestone:
   - backend abstraction
   - ACE-Step backend skeleton
   - single-LoRA integration
   - then genre-LoRA expansion

---
## 12) Next Concrete Milestones

- M1: Refactor transition generation into a backend interface
- M2: Implement `AceStepBackend` with base-model inference
- M3: Add LoRA load/select/scale UI + runtime controls
- M4: Train the first "general DJ transition" LoRA
- M5: Train 2-3 genre LoRAs and add genre routing
- M6: Compare outputs (base vs LoRA, genre A vs genre B) for coursework evidence