# AI DJ Project Catch-Up Note
Last updated: 2026-02-19
## 1) Project Goal (Current Direction)
Build a **domain-specific AI DJ transition demo** for coursework Option 1 (Refinement):
- user uploads Song A and Song B
- system auto-detects cue points + BPM
- Song B is time-stretched to Song A BPM
- a generative model creates transition audio from text ("transition vibe")
- output is a **short transition clip only** (not full-song mix)
This scope is intentionally optimized for Hugging Face Spaces reliability.
---
## 2) Coursework Fit (Why this is Option 1)
This is a refinement of existing pipelines/models:
- existing generative pipeline (currently MusicGen, planned ACE-Step)
- wrapped in domain-specific DJ UX (cue/BPM/mix controls)
- not raw prompting only; structured controls for practical use
---
## 3) Current Implemented Pipeline (Already in `app.py`)
Current app file: `AI_DJ_Project/app.py`
### 3.1 Input + UI
- Upload `Song A` and `Song B`
- Set:
- transition vibe text
- transition type (`riser`, `drum fill`, `sweep`, `brake`, `scratch`, `impact`)
- mode (`Overlay` or `Insert`)
- pre/mix/post seconds
- transition length + gain
- optional BPM and cue overrides
### 3.2 Audio analysis and cueing
1. Probe duration with `ffprobe` (if available)
2. Decode only needed segments (ffmpeg first, librosa fallback)
3. Estimate BPM + beat times with `librosa.beat.beat_track`
4. Auto-cue strategy:
   - Song A: choose a beat near the end of the analysis window
   - Song B: choose the first beat after ~2 seconds
5. Optional manual override for BPM and cue points
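The auto-cue heuristic above can be sketched as a small helper. This is an illustrative reconstruction, not the exact code in `app.py` (the function name and arguments are assumptions); the beat times would come from `librosa.beat.beat_track`.

```python
import numpy as np

def auto_cue_points(beats_a, beats_b, a_window_end, b_min_offset=2.0):
    """Pick cue points from beat times (hypothetical helper mirroring
    the heuristic described above; real app.py code may differ).

    beats_a / beats_b: beat times in seconds (e.g. from beat_track).
    a_window_end: end of Song A's analysis window, in seconds.
    b_min_offset: skip Song B beats earlier than this, in seconds.
    """
    beats_a = np.asarray(beats_a, dtype=float)
    beats_b = np.asarray(beats_b, dtype=float)

    # Song A: the beat closest to the end of the analysis window.
    cue_a = float(beats_a[np.argmin(np.abs(beats_a - a_window_end))])

    # Song B: the first beat at or after ~2 seconds; fall back to the
    # first detected beat if none qualifies.
    later = beats_b[beats_b >= b_min_offset]
    cue_b = float(later[0]) if later.size else float(beats_b[0])
    return cue_a, cue_b
```

Keeping this pure array logic separate from decoding makes the manual-override path trivial: overridden cue values simply replace the returned pair.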
### 3.3 Tempo matching
- Compute stretch rate = `bpm_A / bpm_B` (clamped)
- Time-stretch Song B segment via `librosa.effects.time_stretch`
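A minimal sketch of the rate computation, assuming illustrative clamp bounds (the actual limits in `app.py` may differ):

```python
def stretch_rate(bpm_a, bpm_b, lo=0.5, hi=2.0):
    """Rate that stretches Song B toward Song A's tempo, clamped so an
    extreme or wrong BPM estimate cannot produce unusable audio.
    With librosa.effects.time_stretch, rate > 1 speeds audio up, so
    bpm_A / bpm_B moves Song B to Song A's tempo."""
    return min(max(bpm_a / bpm_b, lo), hi)

# The rate then feeds the phase-vocoder stretch, e.g.:
#   y_b = librosa.effects.time_stretch(y_b, rate=stretch_rate(bpm_a, bpm_b))
```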
### 3.4 AI transition generation
- `@spaces.GPU` function `_generate_ai_transition(...)`
- Uses `facebook/musicgen-small`
- Prompt is domain-steered for DJ transition behavior
- Returns short generated transition audio
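"Domain-steered" prompting can be as simple as composing the user's vibe text with DJ-specific keywords before it reaches the model. The helper below is a hypothetical sketch (its name, wording, and field order are assumptions, not the actual `app.py` prompt):

```python
def build_transition_prompt(vibe, transition_type, bpm):
    """Assemble a domain-steered text prompt for the generation
    backend. Hypothetical helper: the exact phrasing used by the
    real app may differ."""
    parts = [
        f"DJ transition, {transition_type}",
        f"{round(bpm)} BPM",
        "short, club mix, seamless blend",
    ]
    if vibe:
        # User's "transition vibe" text goes right after the type.
        parts.insert(1, vibe.strip())
    return ", ".join(parts)
```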
### 3.5 Assembly
- **Overlay mode**: crossfade A/B + overlay AI transition
- **Insert mode**: A -> AI transition -> B (with short anti-click fades)
- Edge fades + peak normalization before output
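Overlay mode can be sketched as an equal-power crossfade with the generated clip summed on top. This is a mono, single-sample-rate sketch of the idea, not the exact `app.py` implementation:

```python
import numpy as np

def crossfade(tail_a, head_b, transition=None, gain=1.0):
    """Equal-power crossfade of Song A's tail into Song B's head,
    optionally overlaying a generated transition clip (Overlay mode).
    Inputs are mono float arrays at the same sample rate."""
    n = min(len(tail_a), len(head_b))
    # cos/sin envelopes keep perceived loudness roughly constant.
    t = np.linspace(0.0, np.pi / 2, n)
    mix = tail_a[:n] * np.cos(t) + head_b[:n] * np.sin(t)
    if transition is not None:
        m = min(n, len(transition))
        mix[:m] += gain * transition[:m]
    # Peak-normalize only if the overlay pushed the signal past 1.0.
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

Insert mode would instead concatenate `tail_a`, the transition, and `head_b` with short anti-click fades at each seam.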
### 3.6 Output
- Output audio clip (NumPy audio to Gradio)
- JSON details:
- BPM estimates
- cue points
- stretch rate
- analysis settings
---
## 4) Full End-to-End Pipeline (Conceptual)
Upload A/B
-> decode limited windows
-> BPM + beat analysis
-> auto-cue points
-> stretch B to A BPM
-> generate transition (GenAI)
-> overlay/insert assembly
-> normalize/fades
-> return short transition clip + diagnostics
---
## 5) Planned Upgrade: ACE-Step + Custom LoRA
### 5.1 What ACE-Step is
ACE-Step 1.5 is a **full music-generation foundation model stack** (text-to-audio/music with editing/control workflows), not just a tiny SFX model.
Planned usage in this project:
- keep deterministic DJ logic (cue/BPM/stretch/assemble)
- swap transition generation backend from MusicGen to ACE-Step
- load custom LoRA adapter(s) to enforce DJ transition style
### 5.2 Integration strategy (recommended)
1. Keep current `app.py` flow unchanged for analysis/mixing
2. Introduce backend abstraction:
- `MusicGenBackend` (fallback)
- `AceStepBackend` (main target)
3. Add LoRA controls:
- adapter selection
- adapter scale
4. Continue returning short transition clips only
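The backend abstraction in step 2 could look like the sketch below. Everything here is an assumption about future code (interface name, `generate()` signature, sample rates, placeholder bodies), not existing `app.py` API:

```python
from typing import Protocol
import numpy as np

class TransitionBackend(Protocol):
    """Minimal interface both backends would implement (sketch)."""
    sample_rate: int
    def generate(self, prompt: str, seconds: float) -> np.ndarray: ...

class MusicGenBackend:
    """Fallback backend; a real version would wrap facebook/musicgen-small."""
    sample_rate = 32000

    def generate(self, prompt, seconds):
        # Placeholder: silence of the requested length stands in for
        # actual model inference.
        return np.zeros(int(self.sample_rate * seconds), dtype=np.float32)

class AceStepBackend:
    """Main target; would wrap ACE-Step plus optional LoRA adapters."""
    sample_rate = 44100

    def __init__(self, lora_name=None, lora_scale=1.0):
        self.lora_name = lora_name
        self.lora_scale = lora_scale

    def generate(self, prompt, seconds):
        # Placeholder for ACE-Step inference with the selected adapter.
        return np.zeros(int(self.sample_rate * seconds), dtype=np.float32)
```

Because the rest of the pipeline only sees `generate(prompt, seconds)`, swapping MusicGen for ACE-Step (or adding LoRA scale controls) never touches the analysis/mixing code.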
---
## 6) Genre-Specific LoRA Idea (Pop / Electronic / House / Dubstep / Techno)
### Is this a good idea?
**Yes, as a staged plan.**
It is a strong product and coursework idea because:
- user-selected genre can map to distinct transition style
- demonstrates clear domain-specific refinement
- supports explainable UX: "You picked House -> House-style transition LoRA"
### Important caveats
- Training one LoRA per genre substantially increases data and compute requirements
- Early quality may vary by genre and dataset size
- More adapters mean more evaluation and QA burden
### Practical rollout (recommended)
Phase 1 (safe):
- base model + one "general DJ transition" LoRA
Phase 2 (coursework-strong):
- 2-3 genre LoRAs (e.g., Pop / House / Dubstep)
Phase 3 (optional extension):
- larger genre library + auto-genre suggestion from uploaded songs
---
## 7) Proposed Genre LoRA Routing Logic
User selects uploaded-song genre (or manually selects transition style profile):
- Pop -> `lora_pop_transition`
- Electronic -> `lora_electronic_transition`
- House -> `lora_house_transition`
- Dubstep -> `lora_dubstep_transition`
- Techno -> `lora_techno_transition`
- Auto/Unknown -> `lora_general_transition`
Then:
1. load chosen LoRA
2. set LoRA scale
3. run ACE-Step generation for short transition duration
4. mix with A/B boundary clip
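The routing table above maps directly to a dictionary lookup with a general-purpose default:

```python
# Genre -> adapter name, per the routing table above.
GENRE_LORA_MAP = {
    "Pop": "lora_pop_transition",
    "Electronic": "lora_electronic_transition",
    "House": "lora_house_transition",
    "Dubstep": "lora_dubstep_transition",
    "Techno": "lora_techno_transition",
}

def route_lora(genre):
    """Return the adapter for a user-selected genre, defaulting to the
    general adapter for Auto/Unknown or any unlisted genre."""
    return GENRE_LORA_MAP.get(genre, "lora_general_transition")
```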
---
## 8) Data and Training Notes for LoRA
- Use only licensed/royalty-free/self-owned audio for dataset and demos
- Dataset should emphasize transition-like content (risers, fills, drops, sweeps, impacts)
- Include metadata/captions describing genre + transition intent
- Keep track of:
- adapter name
- dataset source and license
- training config and epoch checkpoints
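The bookkeeping fields listed above fit naturally into a small record type. The schema below is illustrative, not a required format:

```python
from dataclasses import dataclass, field

@dataclass
class LoraTrainingRecord:
    """Tracks provenance for one trained adapter (illustrative schema)."""
    adapter_name: str          # e.g. "lora_house_transition"
    dataset_source: str        # where the audio came from
    dataset_license: str       # must be licensed/royalty-free/self-owned
    training_config: dict = field(default_factory=dict)
    checkpoints: list = field(default_factory=list)  # epoch checkpoint paths
```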
---
## 9) Current Risks / Constraints
- ACE-Step stack is heavier than MusicGen and needs careful deployment tuning
- Cold starts and memory behavior can be challenging on Spaces
- Auto-cueing is heuristic and may fail on difficult tracks, so the manual override should remain available
- Time-stretch can introduce artifacts (expected in DJ contexts)
---
## 10) Fallback and Reliability Plan
- Keep MusicGen backend as fallback while integrating ACE-Step
- If ACE-Step init fails:
- fail over to MusicGen backend
- still return valid transition clip
- Preserve deterministic DSP path as model-agnostic baseline
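The failover plan can be sketched as a wrapper that always returns a valid clip. Factory-function names and the returned source tag are illustrative:

```python
def generate_with_fallback(prompt, seconds, primary_factory, fallback_factory):
    """Try the primary backend (ACE-Step); on any init or inference
    error, fail over to the fallback (MusicGen) so the user still
    receives a valid transition clip. Returns (audio, source_tag)."""
    try:
        backend = primary_factory()
        return backend.generate(prompt, seconds), "primary"
    except Exception:
        # Swallow the error and fall back; a real app would also log it.
        backend = fallback_factory()
        return backend.generate(prompt, seconds), "fallback"
```

Constructing backends lazily via factories matters here: ACE-Step failures often happen at load time (weights, VRAM), so initialization must occur inside the `try` block.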
---
## 11) "If I lost track" Quick Resume Checklist
1. Open `app.py` and confirm current backend is still working end-to-end
2. Verify demo still does:
- cue detect
- BPM match
- transition generation
- clip output
3. Re-read this note section 5/6/7
4. Continue with next implementation milestone:
- backend abstraction
- ACE-Step backend skeleton
- single LoRA integration
- then genre LoRA expansion
---
## 12) Next Concrete Milestones
M1: Refactor transition generation into backend interface
M2: Implement `AceStepBackend` with base model inference
M3: Add LoRA load/select/scale UI + runtime controls
M4: Train first "general DJ transition" LoRA
M5: Train 2-3 genre LoRAs and add genre routing
M6: Compare outputs (base vs LoRA, genre A vs genre B) for coursework evidence
|