# AI DJ Project Catch-Up Note
Last updated: 2026-02-19
## 1) Project Goal (Current Direction)
Build a **domain-specific AI DJ transition demo** for coursework Option 1 (Refinement):
- user uploads Song A and Song B
- system auto-detects cue points + BPM
- Song B is time-stretched to Song A BPM
- a generative model creates transition audio from text ("transition vibe")
- output is a **short transition clip only** (not full-song mix)
This scope is intentionally optimized for Hugging Face Spaces reliability.
---
## 2) Coursework Fit (Why this is Option 1)
This is a refinement of existing pipelines/models:
- existing generative pipeline (currently MusicGen, planned ACE-Step)
- wrapped in domain-specific DJ UX (cue/BPM/mix controls)
- not raw prompting only; structured controls for practical use
---
## 3) Current Implemented Pipeline (Already in `app.py`)
Current app file: `AI_DJ_Project/app.py`
### 3.1 Input + UI
- Upload `Song A` and `Song B`
- Set:
- transition vibe text
- transition type (`riser`, `drum fill`, `sweep`, `brake`, `scratch`, `impact`)
- mode (`Overlay` or `Insert`)
- pre/mix/post seconds
- transition length + gain
- optional BPM and cue overrides
### 3.2 Audio analysis and cueing
1. Probe duration with `ffprobe` (if available)
2. Decode only needed segments (ffmpeg first, librosa fallback)
3. Estimate BPM + beat times with `librosa.beat.beat_track`
4. Auto-cue strategy:
   - Song A: choose a beat near the end of the analysis window
   - Song B: choose the first beat after ~2 seconds
5. Optional manual override for BPM and cue points
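The auto-cue heuristic above can be sketched as a small helper. This is an illustrative reconstruction, not the exact code in `app.py` (the function name and arguments are assumptions); the beat times would come from `librosa.beat.beat_track`.

```python
import numpy as np

def auto_cue_points(beats_a, beats_b, a_window_end, b_min_offset=2.0):
    """Pick cue points from beat times (hypothetical helper mirroring
    the heuristic described above; real app.py code may differ).

    beats_a / beats_b: beat times in seconds (e.g. from beat_track).
    a_window_end: end of Song A's analysis window, in seconds.
    b_min_offset: skip Song B beats earlier than this, in seconds.
    """
    beats_a = np.asarray(beats_a, dtype=float)
    beats_b = np.asarray(beats_b, dtype=float)

    # Song A: the beat closest to the end of the analysis window.
    cue_a = float(beats_a[np.argmin(np.abs(beats_a - a_window_end))])

    # Song B: the first beat at or after ~2 seconds; fall back to the
    # first detected beat if none qualifies.
    later = beats_b[beats_b >= b_min_offset]
    cue_b = float(later[0]) if later.size else float(beats_b[0])
    return cue_a, cue_b
```

Keeping this pure array logic separate from decoding makes the manual-override path trivial: overridden cue values simply replace the returned pair.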
### 3.3 Tempo matching
- Compute stretch rate = `bpm_A / bpm_B` (clamped)
- Time-stretch Song B segment via `librosa.effects.time_stretch`
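A minimal sketch of the rate computation, assuming illustrative clamp bounds (the actual limits in `app.py` may differ):

```python
def stretch_rate(bpm_a, bpm_b, lo=0.5, hi=2.0):
    """Rate that stretches Song B toward Song A's tempo, clamped so an
    extreme or wrong BPM estimate cannot produce unusable audio.
    With librosa.effects.time_stretch, rate > 1 speeds audio up, so
    bpm_A / bpm_B moves Song B to Song A's tempo."""
    return min(max(bpm_a / bpm_b, lo), hi)

# The rate then feeds the phase-vocoder stretch, e.g.:
#   y_b = librosa.effects.time_stretch(y_b, rate=stretch_rate(bpm_a, bpm_b))
```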
### 3.4 AI transition generation
- `@spaces.GPU` function `_generate_ai_transition(...)`
- Uses `facebook/musicgen-small`
- Prompt is domain-steered for DJ transition behavior
- Returns short generated transition audio
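"Domain-steered" prompting can be as simple as composing the user's vibe text with DJ-specific keywords before it reaches the model. The helper below is a hypothetical sketch (its name, wording, and field order are assumptions, not the actual `app.py` prompt):

```python
def build_transition_prompt(vibe, transition_type, bpm):
    """Assemble a domain-steered text prompt for the generation
    backend. Hypothetical helper: the exact phrasing used by the
    real app may differ."""
    parts = [
        f"DJ transition, {transition_type}",
        f"{round(bpm)} BPM",
        "short, club mix, seamless blend",
    ]
    if vibe:
        # User's "transition vibe" text goes right after the type.
        parts.insert(1, vibe.strip())
    return ", ".join(parts)
```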
### 3.5 Assembly
- **Overlay mode**: crossfade A/B + overlay AI transition
- **Insert mode**: A -> AI transition -> B (with short anti-click fades)
- Edge fades + peak normalization before output
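Overlay mode can be sketched as an equal-power crossfade with the generated clip summed on top. This is a mono, single-sample-rate sketch of the idea, not the exact `app.py` implementation:

```python
import numpy as np

def crossfade(tail_a, head_b, transition=None, gain=1.0):
    """Equal-power crossfade of Song A's tail into Song B's head,
    optionally overlaying a generated transition clip (Overlay mode).
    Inputs are mono float arrays at the same sample rate."""
    n = min(len(tail_a), len(head_b))
    # cos/sin envelopes keep perceived loudness roughly constant.
    t = np.linspace(0.0, np.pi / 2, n)
    mix = tail_a[:n] * np.cos(t) + head_b[:n] * np.sin(t)
    if transition is not None:
        m = min(n, len(transition))
        mix[:m] += gain * transition[:m]
    # Peak-normalize only if the overlay pushed the signal past 1.0.
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

Insert mode would instead concatenate `tail_a`, the transition, and `head_b` with short anti-click fades at each seam.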
### 3.6 Output
- Output audio clip (NumPy audio to Gradio)
- JSON details:
- BPM estimates
- cue points
- stretch rate
- analysis settings
---
## 4) Full End-to-End Pipeline (Conceptual)
Upload A/B
-> decode limited windows
-> BPM + beat analysis
-> auto-cue points
-> stretch B to A BPM
-> generate transition (GenAI)
-> overlay/insert assembly
-> normalize/fades
-> return short transition clip + diagnostics
---
## 5) Planned Upgrade: ACE-Step + Custom LoRA
### 5.1 What ACE-Step is
ACE-Step 1.5 is a **full music-generation foundation model stack** (text-to-audio/music with editing/control workflows), not just a tiny SFX model.
Planned usage in this project:
- keep deterministic DJ logic (cue/BPM/stretch/assemble)
- swap transition generation backend from MusicGen to ACE-Step
- load custom LoRA adapter(s) to enforce DJ transition style
### 5.2 Integration strategy (recommended)
1. Keep current `app.py` flow unchanged for analysis/mixing
2. Introduce backend abstraction:
- `MusicGenBackend` (fallback)
- `AceStepBackend` (main target)
3. Add LoRA controls:
- adapter selection
- adapter scale
4. Continue returning short transition clips only
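The backend abstraction in step 2 could look like the sketch below. Everything here is an assumption about future code (interface name, `generate()` signature, sample rates, placeholder bodies), not existing `app.py` API:

```python
from typing import Protocol
import numpy as np

class TransitionBackend(Protocol):
    """Minimal interface both backends would implement (sketch)."""
    sample_rate: int
    def generate(self, prompt: str, seconds: float) -> np.ndarray: ...

class MusicGenBackend:
    """Fallback backend; a real version would wrap facebook/musicgen-small."""
    sample_rate = 32000

    def generate(self, prompt, seconds):
        # Placeholder: silence of the requested length stands in for
        # actual model inference.
        return np.zeros(int(self.sample_rate * seconds), dtype=np.float32)

class AceStepBackend:
    """Main target; would wrap ACE-Step plus optional LoRA adapters."""
    sample_rate = 44100

    def __init__(self, lora_name=None, lora_scale=1.0):
        self.lora_name = lora_name
        self.lora_scale = lora_scale

    def generate(self, prompt, seconds):
        # Placeholder for ACE-Step inference with the selected adapter.
        return np.zeros(int(self.sample_rate * seconds), dtype=np.float32)
```

Because the rest of the pipeline only sees `generate(prompt, seconds)`, swapping MusicGen for ACE-Step (or adding LoRA scale controls) never touches the analysis/mixing code.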
---
## 6) Genre-Specific LoRA Idea (Pop / Electronic / House / Dubstep / Techno)
### Is this a good idea?
**Yes, as a staged plan.**
It is a strong product and coursework idea because:
- user-selected genre can map to distinct transition style
- demonstrates clear domain-specific refinement
- supports explainable UX: "You picked House -> House-style transition LoRA"
### Important caveats
- Training one LoRA per genre substantially increases data and compute requirements
- Early quality may vary by genre and dataset size
- More adapters mean more evaluation and QA burden
### Practical rollout (recommended)
Phase 1 (safe):
- base model + one "general DJ transition" LoRA
Phase 2 (coursework-strong):
- 2-3 genre LoRAs (e.g., Pop / House / Dubstep)
Phase 3 (optional extension):
- larger genre library + auto-genre suggestion from uploaded songs
---
## 7) Proposed Genre LoRA Routing Logic
User selects uploaded-song genre (or manually selects transition style profile):
- Pop -> `lora_pop_transition`
- Electronic -> `lora_electronic_transition`
- House -> `lora_house_transition`
- Dubstep -> `lora_dubstep_transition`
- Techno -> `lora_techno_transition`
- Auto/Unknown -> `lora_general_transition`
Then:
1. load chosen LoRA
2. set LoRA scale
3. run ACE-Step generation for short transition duration
4. mix with A/B boundary clip
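The routing table above maps directly to a dictionary lookup with a general-purpose default:

```python
# Genre -> adapter name, per the routing table above.
GENRE_LORA_MAP = {
    "Pop": "lora_pop_transition",
    "Electronic": "lora_electronic_transition",
    "House": "lora_house_transition",
    "Dubstep": "lora_dubstep_transition",
    "Techno": "lora_techno_transition",
}

def route_lora(genre):
    """Return the adapter for a user-selected genre, defaulting to the
    general adapter for Auto/Unknown or any unlisted genre."""
    return GENRE_LORA_MAP.get(genre, "lora_general_transition")
```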
---
## 8) Data and Training Notes for LoRA
- Use only licensed/royalty-free/self-owned audio for dataset and demos
- Dataset should emphasize transition-like content (risers, fills, drops, sweeps, impacts)
- Include metadata/captions describing genre + transition intent
- Keep track of:
- adapter name
- dataset source and license
- training config and epoch checkpoints
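The bookkeeping fields listed above fit naturally into a small record type. The schema below is illustrative, not a required format:

```python
from dataclasses import dataclass, field

@dataclass
class LoraTrainingRecord:
    """Tracks provenance for one trained adapter (illustrative schema)."""
    adapter_name: str          # e.g. "lora_house_transition"
    dataset_source: str        # where the audio came from
    dataset_license: str       # must be licensed/royalty-free/self-owned
    training_config: dict = field(default_factory=dict)
    checkpoints: list = field(default_factory=list)  # epoch checkpoint paths
```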
---
## 9) Current Risks / Constraints
- ACE-Step stack is heavier than MusicGen and needs careful deployment tuning
- Cold starts and memory behavior can be challenging on Spaces
- Auto-cueing is heuristic and may fail on difficult tracks, so the manual override should remain available
- Time-stretch can introduce artifacts (expected in DJ contexts)
---
## 10) Fallback and Reliability Plan
- Keep MusicGen backend as fallback while integrating ACE-Step
- If ACE-Step init fails:
- fail over to MusicGen backend
- still return valid transition clip
- Preserve deterministic DSP path as model-agnostic baseline
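The failover plan can be sketched as a wrapper that always returns a valid clip. Factory-function names and the returned source tag are illustrative:

```python
def generate_with_fallback(prompt, seconds, primary_factory, fallback_factory):
    """Try the primary backend (ACE-Step); on any init or inference
    error, fail over to the fallback (MusicGen) so the user still
    receives a valid transition clip. Returns (audio, source_tag)."""
    try:
        backend = primary_factory()
        return backend.generate(prompt, seconds), "primary"
    except Exception:
        # Swallow the error and fall back; a real app would also log it.
        backend = fallback_factory()
        return backend.generate(prompt, seconds), "fallback"
```

Constructing backends lazily via factories matters here: ACE-Step failures often happen at load time (weights, VRAM), so initialization must occur inside the `try` block.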
---
## 11) "If I lost track" Quick Resume Checklist
1. Open `app.py` and confirm current backend is still working end-to-end
2. Verify demo still does:
- cue detect
- BPM match
- transition generation
- clip output
3. Re-read this note section 5/6/7
4. Continue with next implementation milestone:
- backend abstraction
- ACE-Step backend skeleton
- single LoRA integration
- then genre LoRA expansion
---
## 12) Next Concrete Milestones
M1: Refactor transition generation into backend interface
M2: Implement `AceStepBackend` with base model inference
M3: Add LoRA load/select/scale UI + runtime controls
M4: Train first "general DJ transition" LoRA
M5: Train 2-3 genre LoRAs and add genre routing
M6: Compare outputs (base vs LoRA, genre A vs genre B) for coursework evidence
|