AI DJ Project Catch-Up Note
Last updated: 2026-02-19
1) Project Goal (Current Direction)
Build a domain-specific AI DJ transition demo for coursework Option 1 (Refinement):
- user uploads Song A and Song B
- system auto-detects cue points + BPM
- Song B is time-stretched to Song A BPM
- a generative model creates transition audio from text ("transition vibe")
- output is a short transition clip only (not full-song mix)
This scope is intentionally optimized for Hugging Face Spaces reliability.
2) Coursework Fit (Why this is Option 1)
This is a refinement of existing pipelines/models:
- existing generative pipeline (currently MusicGen, planned ACE-Step)
- wrapped in domain-specific DJ UX (cue/BPM/mix controls)
- not raw prompting only; structured controls for practical use
3) Current Implemented Pipeline (Already in app.py)
Current app file: AI_DJ_Project/app.py
3.1 Input + UI
- Upload `Song A` and `Song B`
- Set (see the Gradio sketch after this list):
  - transition vibe text
  - transition type (`riser`, `drum fill`, `sweep`, `brake`, `scratch`, `impact`)
  - mode (`Overlay` or `Insert`)
  - pre/mix/post seconds
  - transition length + gain
  - optional BPM and cue overrides
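A minimal Gradio sketch of this control surface; component names, ranges, and defaults here are illustrative assumptions, not the actual `app.py` code:

```python
import gradio as gr

# Illustrative control surface only; names/ranges are assumptions.
with gr.Blocks() as demo:
    song_a = gr.Audio(label="Song A", type="filepath")
    song_b = gr.Audio(label="Song B", type="filepath")
    vibe = gr.Textbox(label="Transition vibe", placeholder="dark rolling techno riser")
    t_type = gr.Dropdown(
        ["riser", "drum fill", "sweep", "brake", "scratch", "impact"],
        value="riser", label="Transition type")
    mode = gr.Radio(["Overlay", "Insert"], value="Overlay", label="Mode")
    pre_s = gr.Slider(1, 15, value=6, label="Pre seconds")
    mix_s = gr.Slider(1, 15, value=4, label="Mix seconds")
    post_s = gr.Slider(1, 15, value=6, label="Post seconds")
    bpm_a = gr.Number(label="BPM override: Song A (optional)")
    cue_a = gr.Number(label="Cue override: Song A, seconds (optional)")
```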
3.2 Audio analysis and cueing
- Probe duration with `ffprobe` (if available)
- Decode only the needed segments (ffmpeg first, librosa fallback)
- Estimate BPM + beat times with `librosa.beat.beat_track`
- Auto-cue strategy (sketched below):
  - Song A: choose a beat near the end of its analysis window
  - Song B: choose the first beat after ~2 seconds
- Optional manual override for BPM and cue points
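A minimal sketch of this analysis step; the helper name `estimate_bpm_and_cue` and the window defaults are hypothetical, not the real `app.py` function:

```python
import librosa

def estimate_bpm_and_cue(path, offset=0.0, duration=30.0, min_cue_s=2.0):
    # Decode only the analysis window, mirroring the limited-decode strategy.
    y, sr = librosa.load(path, sr=None, mono=True, offset=offset, duration=duration)
    bpm, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    # Song-B-style cue: first beat at least ~2 s into the window.
    cue = next((t for t in beat_times if t >= min_cue_s), 0.0)
    return float(bpm), offset + float(cue)
```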
3.3 Tempo matching
- Compute stretch rate = `bpm_A / bpm_B` (clamped)
- Time-stretch the Song B segment via `librosa.effects.time_stretch` (sketched below)
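A minimal sketch of the tempo match; the clamp bounds here are assumptions, not the values in `app.py`:

```python
import numpy as np
import librosa

def match_tempo(y_b, bpm_a, bpm_b, min_rate=0.75, max_rate=1.33):
    # Clamp the rate so extreme stretches don't wreck transients.
    rate = float(np.clip(bpm_a / bpm_b, min_rate, max_rate))
    return librosa.effects.time_stretch(y_b, rate=rate), rate
```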
3.4 AI transition generation
- `@spaces.GPU` function `_generate_ai_transition(...)`
- Uses `facebook/musicgen-small`
- Prompt is domain-steered for DJ transition behavior
- Returns short generated transition audio
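A minimal sketch of that pattern using the `transformers` MusicGen API; the prompt wording, clip length, and function name are assumptions, not the actual `app.py` code:

```python
import spaces  # ZeroGPU decorator on Hugging Face Spaces
import torch
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

@spaces.GPU
def generate_ai_transition(vibe: str, seconds: float = 4.0):
    # Domain-steered prompt (illustrative wording).
    prompt = f"DJ transition, {vibe}, short build and release, club-ready"
    inputs = processor(text=[prompt], return_tensors="pt")
    # MusicGen emits roughly 50 audio tokens per second of output.
    with torch.no_grad():
        audio = model.generate(**inputs, max_new_tokens=int(seconds * 50))
    sr = model.config.audio_encoder.sampling_rate  # 32 kHz for musicgen-small
    return sr, audio[0, 0].cpu().numpy()
```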
3.5 Assembly
- Overlay mode: crossfade A/B + overlay AI transition
- Insert mode: A -> AI transition -> B (with short anti-click fades)
- Edge fades + peak normalization before output
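A minimal sketch of the crossfade piece of Overlay assembly, assuming mono NumPy buffers; an equal-power curve is one common choice, and the AI overlay and anti-click fades are omitted here:

```python
import numpy as np

def equal_power_crossfade(tail_a, head_b, sr, mix_s=4.0):
    # Blend the last mix_s seconds of A into the first mix_s seconds of B.
    n = min(int(sr * mix_s), len(tail_a), len(head_b))
    t = np.linspace(0.0, np.pi / 2, n)
    blend = tail_a[-n:] * np.cos(t) + head_b[:n] * np.sin(t)
    return np.concatenate([tail_a[:-n], blend, head_b[n:]])
```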
3.6 Output
- Output audio clip (NumPy audio to Gradio)
- JSON details:
- BPM estimates
- cue points
- stretch rate
- analysis settings
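A hypothetical shape for that JSON payload; field names and values are illustrative only:

```python
details = {
    "bpm": {"song_a": 124.0, "song_b": 128.3},
    "cues_s": {"song_a_out": 182.5, "song_b_in": 2.4},
    "stretch_rate": 0.966,
    "analysis": {"window_s": 30.0, "decoder": "ffmpeg"},
}
```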
4) Full End-to-End Pipeline (Conceptual)
Upload A/B
-> decode limited windows
-> BPM + beat analysis
-> auto-cue points
-> stretch B to A BPM
-> generate transition (GenAI)
-> overlay/insert assembly
-> normalize/fades
-> return short transition clip + diagnostics
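Expressed as code, reusing the hypothetical helpers sketched in section 3 (window lengths and sample rate are placeholders):

```python
import numpy as np
import librosa

def make_transition(path_a, path_b, vibe):
    bpm_a, cue_a = estimate_bpm_and_cue(path_a)
    bpm_b, cue_b = estimate_bpm_and_cue(path_b)
    sr = 32000
    y_a, _ = librosa.load(path_a, sr=sr, offset=max(cue_a - 8.0, 0.0), duration=8.0)
    y_b, _ = librosa.load(path_b, sr=sr, offset=cue_b, duration=8.0)
    y_b, _rate = match_tempo(y_b, bpm_a, bpm_b)   # stretch B to A's BPM
    mix = equal_power_crossfade(y_a, y_b, sr)
    # AI transition overlay/insert, edge fades, and gain staging omitted.
    return sr, mix / max(1e-9, float(np.abs(mix).max()))  # peak normalize
```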
5) Planned Upgrade: ACE-Step + Custom LoRA
5.1 What ACE-Step is
ACE-Step 1.5 is a full music-generation foundation model stack (text-to-audio/music with editing/control workflows), not just a tiny SFX model.
Planned usage in this project:
- keep deterministic DJ logic (cue/BPM/stretch/assemble)
- swap transition generation backend from MusicGen to ACE-Step
- load custom LoRA adapter(s) to enforce DJ transition style
5.2 Integration strategy (recommended)
- Keep the current `app.py` flow unchanged for analysis/mixing
- Introduce a backend abstraction (sketched after this list):
  - `MusicGenBackend` (fallback)
  - `AceStepBackend` (main target)
- Add LoRA controls:
- adapter selection
- adapter scale
- Continue returning short transition clips only
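A minimal sketch of that abstraction; the class and method names are proposals, not existing code:

```python
from abc import ABC, abstractmethod
import numpy as np

class TransitionBackend(ABC):
    @abstractmethod
    def generate(self, prompt: str, seconds: float) -> tuple[int, np.ndarray]:
        """Return (sample_rate, mono audio) for a short transition clip."""

class MusicGenBackend(TransitionBackend):  # fallback
    def generate(self, prompt, seconds):
        raise NotImplementedError  # wrap the existing musicgen-small path

class AceStepBackend(TransitionBackend):   # main target
    def __init__(self, lora_name=None, lora_scale=1.0):
        self.lora_name, self.lora_scale = lora_name, lora_scale

    def generate(self, prompt, seconds):
        raise NotImplementedError  # ACE-Step inference + optional LoRA
```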
6) Genre-Specific LoRA Idea (Pop / Electronic / House / Dubstep / Techno)
Is this a good idea?
Yes, as a staged plan.
It is a strong product and coursework idea because:
- user-selected genre can map to distinct transition style
- demonstrates clear domain-specific refinement
- supports explainable UX: "You picked House -> House-style transition LoRA"
Important caveats
- Training one LoRA per genre substantially increases data and compute requirements
- Early quality may vary by genre and dataset size
- More adapters mean more evaluation and QA burden
Practical rollout (recommended)
Phase 1 (safe):
- base model + one "general DJ transition" LoRA
Phase 2 (coursework-strong):
- 2-3 genre LoRAs (e.g., Pop / House / Dubstep)
Phase 3 (optional extension):
- larger genre library + auto-genre suggestion from uploaded songs
7) Proposed Genre LoRA Routing Logic
User selects the uploaded songs' genre (or manually picks a transition style profile); each option maps to an adapter (see the lookup sketch below):
- Pop -> `lora_pop_transition`
- Electronic -> `lora_electronic_transition`
- House -> `lora_house_transition`
- Dubstep -> `lora_dubstep_transition`
- Techno -> `lora_techno_transition`
- Auto/Unknown -> `lora_general_transition`
Then:
- load chosen LoRA
- set LoRA scale
- run ACE-Step generation for short transition duration
- mix with A/B boundary clip
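As code, the routing could be a simple lookup; the adapter names match section 7, while the function name is hypothetical:

```python
GENRE_TO_LORA = {
    "Pop": "lora_pop_transition",
    "Electronic": "lora_electronic_transition",
    "House": "lora_house_transition",
    "Dubstep": "lora_dubstep_transition",
    "Techno": "lora_techno_transition",
}

def pick_adapter(genre):
    # Auto/unknown genres fall back to the general-purpose adapter.
    return GENRE_TO_LORA.get(genre, "lora_general_transition")
```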
8) Data and Training Notes for LoRA
- Use only licensed/royalty-free/self-owned audio for dataset and demos
- Dataset should emphasize transition-like content (risers, fills, drops, sweeps, impacts)
- Include metadata/captions describing genre + transition intent
- Keep track of:
- adapter name
- dataset source and license
- training config and epoch checkpoints
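A hypothetical per-adapter record covering those tracking fields; all values are placeholders:

```python
adapter_record = {
    "adapter_name": "lora_house_transition",
    "dataset": {"source": "royalty-free packs + self-owned stems", "license": "CC0 / owned"},
    "training": {"base_model": "ACE-Step", "epochs": 10, "lora_rank": 16},
    "checkpoints": ["epoch_05.safetensors", "epoch_10.safetensors"],
}
```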
9) Current Risks / Constraints
- ACE-Step stack is heavier than MusicGen and needs careful deployment tuning
- Cold starts and memory behavior can be challenging on Spaces
- Auto-cueing is heuristic; may fail on hard tracks (manual override should remain)
- Time-stretch can introduce artifacts (expected in DJ contexts)
10) Fallback and Reliability Plan
- Keep MusicGen backend as fallback while integrating ACE-Step
- If ACE-Step init fails:
- fail over to MusicGen backend
- still return valid transition clip
- Preserve deterministic DSP path as model-agnostic baseline
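A failover sketch using the hypothetical backend classes from section 5.2:

```python
def build_backend():
    # Prefer ACE-Step; fall back to MusicGen so a valid clip is always returned.
    try:
        return AceStepBackend(lora_name="lora_general_transition")
    except Exception as err:  # e.g., missing weights or OOM on a Spaces cold start
        print(f"ACE-Step init failed ({err}); falling back to MusicGen")
        return MusicGenBackend()
```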
11) "If I lost track" Quick Resume Checklist
- Open `app.py` and confirm the current backend still works end-to-end
- Verify the demo still does:
- cue detect
- BPM match
- transition generation
- clip output
- Re-read sections 5, 6, and 7 of this note
- Continue with next implementation milestone:
- backend abstraction
- ACE-Step backend skeleton
- single LoRA integration
- then genre LoRA expansion
12) Next Concrete Milestones
M1: Refactor transition generation into backend interface
M2: Implement AceStepBackend with base model inference
M3: Add LoRA load/select/scale UI + runtime controls
M4: Train first "general DJ transition" LoRA
M5: Train 2-3 genre LoRAs and add genre routing
M6: Compare outputs (base vs LoRA, genre A vs genre B) for coursework evidence