AI DJ Project Catch-Up Note

Last updated: 2026-02-19

1) Project Goal (Current Direction)

Build a domain-specific AI DJ transition demo for coursework Option 1 (Refinement):

  • user uploads Song A and Song B
  • system auto-detects cue points + BPM
  • Song B is time-stretched to Song A BPM
  • a generative model creates transition audio from text ("transition vibe")
  • output is a short transition clip only (not full-song mix)

This scope is intentionally optimized for Hugging Face Spaces reliability.


2) Coursework Fit (Why this is Option 1)

This is a refinement of existing pipelines/models:

  • existing generative pipeline (currently MusicGen, planned ACE-Step)
  • wrapped in domain-specific DJ UX (cue/BPM/mix controls)
  • goes beyond raw prompting: structured controls for practical DJ use

3) Current Implemented Pipeline (Already in app.py)

Current app file: AI_DJ_Project/app.py

3.1 Input + UI

  • Upload Song A and Song B
  • Set:
    • transition vibe text
    • transition type (riser, drum fill, sweep, brake, scratch, impact)
    • mode (Overlay or Insert)
    • pre/mix/post seconds
    • transition length + gain
    • optional BPM and cue overrides

3.2 Audio analysis and cueing

  1. Probe duration with ffprobe (if available)
  2. Decode only needed segments (ffmpeg first, librosa fallback)
  3. Estimate BPM + beat times with librosa.beat.beat_track
  4. Auto-cue strategy:
    • Song A: choose a beat near the end of the analysis window
    • Song B: choose first beat after ~2 seconds
  5. Optional manual override for BPM and cue points
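The auto-cue heuristic above can be sketched as a small function over beat times (e.g. from librosa.beat.beat_track with units="time"). The function name, window size, and lead-in value are illustrative placeholders, not the exact code in app.py:

```python
import numpy as np

def pick_cue_points(beats_a, beats_b, dur_a,
                    analysis_window=30.0, lead_in=2.0):
    """Sketch of the auto-cue heuristic: beats_a / beats_b are beat
    times in seconds, dur_a is Song A's duration in seconds."""
    beats_a = np.asarray(beats_a, dtype=float)
    beats_b = np.asarray(beats_b, dtype=float)

    # Song A: first beat inside the end-of-song analysis window,
    # so the transition starts near the outro.
    target_a = max(dur_a - analysis_window, 0.0)
    in_window = beats_a[beats_a >= target_a]
    cue_a = float(in_window[0]) if in_window.size else float(dur_a)

    # Song B: first beat after a short lead-in (~2 s), so the incoming
    # track enters on a clean beat rather than mid-intro.
    after_lead = beats_b[beats_b >= lead_in]
    cue_b = float(after_lead[0]) if after_lead.size else 0.0
    return cue_a, cue_b
```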

3.3 Tempo matching

  • Compute stretch rate = bpm_A / bpm_B (clamped)
  • Time-stretch Song B segment via librosa.effects.time_stretch
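The stretch-rate step is small but worth pinning down; here is a sketch with illustrative clamp bounds (the app's actual limits may differ):

```python
def stretch_rate(bpm_a: float, bpm_b: float,
                 lo: float = 0.5, hi: float = 2.0) -> float:
    """Rate passed to librosa.effects.time_stretch(y, rate=...).

    rate > 1 speeds Song B up, rate < 1 slows it down.  Clamping keeps
    extreme BPM mismatches from producing unlistenable artifacts; the
    bounds here are illustrative, not the app's configured values.
    """
    if bpm_b <= 0:
        return 1.0  # bad BPM estimate: fall back to no stretching
    return min(max(bpm_a / bpm_b, lo), hi)
```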

3.4 AI transition generation

  • @spaces.GPU function _generate_ai_transition(...)
  • Uses facebook/musicgen-small
  • Prompt is domain-steered for DJ transition behavior
  • Returns short generated transition audio
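The "domain-steered" prompt can be illustrated with a small builder. The wording below is a hypothetical example of the framing, not the literal prompt template in app.py; the key idea is that the user's vibe text is wrapped in DJ-specific context (transition type, BPM, duration):

```python
def build_transition_prompt(vibe: str, transition_type: str,
                            bpm: float, seconds: float) -> str:
    """Hypothetical domain-steered prompt for facebook/musicgen-small."""
    return (
        f"DJ transition, {transition_type} style, {vibe}, "
        f"{bpm:.0f} BPM, {seconds:.0f} seconds, no vocals, "
        "builds tension then resolves into the next track"
    )
```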

3.5 Assembly

  • Overlay mode: crossfade A/B + overlay AI transition
  • Insert mode: A -> AI transition -> B (with short anti-click fades)
  • Edge fades + peak normalization before output

3.6 Output

  • Output audio clip (NumPy audio to Gradio)
  • JSON details:
    • BPM estimates
    • cue points
    • stretch rate
    • analysis settings

4) Full End-to-End Pipeline (Conceptual)

Upload A/B
-> decode limited windows
-> BPM + beat analysis
-> auto-cue points
-> stretch B to A BPM
-> generate transition (GenAI)
-> overlay/insert assembly
-> normalize/fades
-> return short transition clip + diagnostics


5) Planned Upgrade: ACE-Step + Custom LoRA

5.1 What ACE-Step is

ACE-Step 1.5 is a full music-generation foundation model stack (text-to-audio/music with editing/control workflows), not just a tiny SFX model.

Planned usage in this project:

  • keep deterministic DJ logic (cue/BPM/stretch/assemble)
  • swap transition generation backend from MusicGen to ACE-Step
  • load custom LoRA adapter(s) to enforce DJ transition style

5.2 Integration strategy (recommended)

  1. Keep current app.py flow unchanged for analysis/mixing
  2. Introduce backend abstraction:
    • MusicGenBackend (fallback)
    • AceStepBackend (main target)
  3. Add LoRA controls:
    • adapter selection
    • adapter scale
  4. Continue returning short transition clips only
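The backend abstraction in step 2 could look like the following. Class and method names are proposals for the refactor, not code that exists in app.py yet:

```python
from abc import ABC, abstractmethod

import numpy as np


class TransitionBackend(ABC):
    """Proposed interface: analysis/mixing code depends only on this."""

    @abstractmethod
    def generate(self, prompt: str, seconds: float, sr: int) -> np.ndarray:
        """Return a mono float array of transition audio."""


class MusicGenBackend(TransitionBackend):
    # Fallback path: would wrap the existing MusicGen call.
    def generate(self, prompt, seconds, sr):
        raise NotImplementedError("wraps the current MusicGen pipeline")


class AceStepBackend(TransitionBackend):
    # Main target: would load ACE-Step plus an optional LoRA adapter.
    def __init__(self, lora_name=None, lora_scale=1.0):
        self.lora_name = lora_name
        self.lora_scale = lora_scale

    def generate(self, prompt, seconds, sr):
        raise NotImplementedError("ACE-Step inference goes here")
```

The analysis/mixing code then takes a TransitionBackend, so swapping MusicGen for ACE-Step (step 2) and adding LoRA controls (step 3) never touches the DSP path.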

6) Genre-Specific LoRA Idea (Pop / Electronic / House / Dubstep / Techno)

Is this a good idea?

Yes, as a staged plan.

It is a strong product and coursework idea because:

  • user-selected genre can map to distinct transition style
  • demonstrates clear domain-specific refinement
  • supports explainable UX: "You picked House -> House-style transition LoRA"

Important caveats

  • Training one LoRA per genre substantially increases data and compute requirements
  • Early quality may vary by genre and dataset size
  • More adapters mean more evaluation and QA burden

Practical rollout (recommended)

Phase 1 (safe):

  • base model + one "general DJ transition" LoRA

Phase 2 (coursework-strong):

  • 2-3 genre LoRAs (e.g., Pop / House / Dubstep)

Phase 3 (optional extension):

  • larger genre library + auto-genre suggestion from uploaded songs

7) Proposed Genre LoRA Routing Logic

User selects uploaded-song genre (or manually selects transition style profile):

  • Pop -> lora_pop_transition
  • Electronic -> lora_electronic_transition
  • House -> lora_house_transition
  • Dubstep -> lora_dubstep_transition
  • Techno -> lora_techno_transition
  • Auto/Unknown -> lora_general_transition

Then:

  1. load chosen LoRA
  2. set LoRA scale
  3. run ACE-Step generation for short transition duration
  4. mix with A/B boundary clip
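The routing table above is simple enough to write down directly; adapter names are the placeholders from this note, not published adapters:

```python
# Proposed genre -> LoRA routing table (adapter names are placeholders).
GENRE_LORAS = {
    "Pop": "lora_pop_transition",
    "Electronic": "lora_electronic_transition",
    "House": "lora_house_transition",
    "Dubstep": "lora_dubstep_transition",
    "Techno": "lora_techno_transition",
}

def route_lora(genre: str) -> str:
    """Auto/Unknown (or any unrecognized label) falls back to the
    general-purpose transition adapter."""
    return GENRE_LORAS.get(genre, "lora_general_transition")
```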

8) Data and Training Notes for LoRA

  • Use only licensed/royalty-free/self-owned audio for dataset and demos
  • Dataset should emphasize transition-like content (risers, fills, drops, sweeps, impacts)
  • Include metadata/captions describing genre + transition intent
  • Keep track of:
    • adapter name
    • dataset source and license
    • training config and epoch checkpoints

9) Current Risks / Constraints

  • ACE-Step stack is heavier than MusicGen and needs careful deployment tuning
  • Cold starts and memory behavior can be challenging on Spaces
  • Auto-cueing is heuristic and may fail on difficult tracks (manual override should remain available)
  • Time-stretch can introduce artifacts (expected in DJ contexts)

10) Fallback and Reliability Plan

  • Keep MusicGen backend as fallback while integrating ACE-Step
  • If ACE-Step init fails:
    • fail over to MusicGen backend
    • still return valid transition clip
  • Preserve deterministic DSP path as model-agnostic baseline
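The failover control flow can be sketched as below. The backend classes here are trivial stubs standing in for the real (heavy) model wrappers, to show only the try/fallback shape:

```python
def make_backend(prefer_ace: bool = True):
    """Hypothetical failover: try ACE-Step first, fall back to MusicGen,
    so a transition clip is always returned."""

    class AceStepBackend:
        def __init__(self):
            # Stand-in for a real init failure (cold start, OOM, etc.).
            raise RuntimeError("simulated ACE-Step init failure")

    class MusicGenBackend:
        name = "musicgen"

    if prefer_ace:
        try:
            return AceStepBackend()
        except Exception:
            pass  # log the failure, then fail over to the fallback
    return MusicGenBackend()
```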

11) "If I lost track" Quick Resume Checklist

  1. Open app.py and confirm current backend is still working end-to-end
  2. Verify demo still does:
    • cue detect
    • BPM match
    • transition generation
    • clip output
  3. Re-read this note section 5/6/7
  4. Continue with next implementation milestone:
    • backend abstraction
    • ACE-Step backend skeleton
    • single LoRA integration
    • then genre LoRA expansion

12) Next Concrete Milestones

M1: Refactor transition generation into backend interface
M2: Implement AceStepBackend with base model inference
M3: Add LoRA load/select/scale UI + runtime controls
M4: Train first "general DJ transition" LoRA
M5: Train 2-3 genre LoRAs and add genre routing
M6: Compare outputs (base vs LoRA, genre A vs genre B) for coursework evidence