---
title: SyncAI
emoji: 🎵
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: "6.8.0"
app_file: app.py
pinned: false
short_description: AI Music Ads Generator
---

# SyncAI: AI Music Video Generator

Generate beat-synced music video ads from a song clip. Upload ~15 seconds of audio, pick a visual style, and SyncAI produces a fully assembled vertical video with AI-generated visuals cut to the beat.

## How It Works

```
Song (audio file)
├─► Stem Separation (LALAL.AI) → Vocals + Drums
├─► Lyrics Extraction (WhisperX) → Word-level timestamps
├─► Beat Detection (madmom RNN + DBN) → Beat timestamps + drop detection
├─► Segmentation → Lyrics mapped to beat intervals
├─► Prompt Generation (Claude Sonnet 4.6) → Image + video motion prompts
├─► Image Generation (SDXL + Hyper-SD + style LoRA) → 768x1344 images
├─► Image-to-Video (Wan 2.1 14B) → Animated clips
└─► Assembly (FFmpeg) → Beat-synced video with lyrics overlay
```

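The segmentation step in the pipeline above maps word-level timestamps onto beat intervals. The following is a minimal sketch of one way to do that, not the project's actual code: the `Segment` class, the `segment_lyrics` function, and the `(word, start_time)` input shape are assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # beat time that opens the interval, in seconds
    end: float    # next beat time, in seconds
    words: list[str] = field(default_factory=list)


def segment_lyrics(beats: list[float], words: list[tuple[str, float]]) -> list[Segment]:
    """Assign each (word, start_time) pair to the beat interval containing its start.

    Hypothetical helper: assumes `beats` is sorted in ascending order.
    """
    segments = [Segment(a, b) for a, b in zip(beats, beats[1:])]
    for text, t in words:
        i = bisect_right(beats, t) - 1  # index of the last beat at or before t
        if 0 <= i < len(segments):      # skip words that fall outside the beat grid
            segments[i].words.append(text)
    return segments


# Made-up beat grid and lyrics, purely for illustration.
beats = [0.0, 0.5, 1.0, 1.5]
words = [("hold", 0.1), ("on", 0.45), ("tight", 1.2)]
segments = segment_lyrics(beats, words)
```

Binary search keeps the lookup cheap for long beat grids. Intervals that receive no words stay empty, which a prompt generator could treat as instrumental passages.
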
## Visual Styles

Each style applies a different LoRA to SDXL and sets a unique scene world for the LLM prompt generator. The Sunset Coastal Drive LoRA was custom-trained for this project; the others are community LoRAs from the Hugging Face Hub:

| Style | LoRA | Setting |
|-------|------|---------|
| **Sunset Coastal Drive** | Custom-trained (`samuelsattler/warm-sunset-lora`) | Car cruising a coastal highway at golden hour |
| **Rainy City Night** | Film grain (`artificialguybr/filmgrain-redmond`) | Walking rain-soaked city streets after dark |
| **Cyberpunk** | Cyberpunk 2077 (`jbilcke-hf/sdxl-cyberpunk-2077`) | Neon-drenched futuristic megacity at night |
| **Watercolour Harbour** | Watercolor (`ostris/watercolor_style_lora_sdxl`) | Coastal fishing village during a storm |

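A style can be thought of as a small configuration record that pairs a LoRA repository with a scene setting. The sketch below shows one possible layout, using the repo IDs and settings from the table above. The `Style` class, the `STYLES` dictionary, and `build_scene_instruction` are hypothetical names, and the wording of the instruction string is an assumption rather than the prompt SyncAI actually sends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    lora_repo: str  # Hub repo ID, copied from the styles table
    setting: str    # scene world handed to the prompt generator


# Hypothetical registry keyed by the style name shown in the UI.
STYLES = {
    "Sunset Coastal Drive": Style(
        "samuelsattler/warm-sunset-lora",
        "Car cruising a coastal highway at golden hour",
    ),
    "Rainy City Night": Style(
        "artificialguybr/filmgrain-redmond",
        "Walking rain-soaked city streets after dark",
    ),
    "Cyberpunk": Style(
        "jbilcke-hf/sdxl-cyberpunk-2077",
        "Neon-drenched futuristic megacity at night",
    ),
    "Watercolour Harbour": Style(
        "ostris/watercolor_style_lora_sdxl",
        "Coastal fishing village during a storm",
    ),
}


def build_scene_instruction(style_name: str) -> str:
    """Return an illustrative instruction line for the prompt generator."""
    style = STYLES[style_name]
    return f"Every scene takes place in this world: {style.setting}."


instruction = build_scene_instruction("Sunset Coastal Drive")
```

Keeping the LoRA and the setting in one record means a new style only needs one new entry, with no changes to the image or prompt code.
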
## Assembly Features

- **Dynamic pacing**: 4-beat cuts before the drop, 2-beat cuts after it for higher energy
- **Clip shuffling**: Each clip is used twice (first half and second half) in randomised order for visual variety
- **Ken Burns**: Alternating zoom in/out on every cut
- **Lyrics overlay**: Word-level timing with gap closing
- **Cover art overlay**: Album art + Spotify badge appear from the drop onwards
- **Reshuffle**: Re-run assembly with a new random clip order without regenerating

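The pacing, shuffling, and Ken Burns rules above can be combined into a simple shot plan before any FFmpeg work happens. This is a rough sketch under stated assumptions: `plan_cuts`, `shuffle_halves`, and `build_shots` are hypothetical helpers, the drop is assumed to arrive as a timestamp in seconds, and forcing a cut exactly on the drop is a design choice made here, not something the feature list confirms.

```python
import random


def plan_cuts(beats, drop_time, before=4, after=2):
    """Cut every `before` beats until the drop, then every `after` beats from it."""
    # Index of the first beat at or after the drop (end of list if there is none).
    drop_idx = next((i for i, b in enumerate(beats) if b >= drop_time), len(beats))
    idxs = list(range(0, drop_idx, before)) + list(range(drop_idx, len(beats), after))
    return [beats[i] for i in idxs]


def shuffle_halves(num_clips, seed=None):
    """Return every (clip_index, half) pair once, in random order."""
    halves = [(c, h) for c in range(num_clips) for h in ("first", "second")]
    random.Random(seed).shuffle(halves)  # a new seed gives a new order (reshuffle)
    return halves


def build_shots(cuts, end_time, halves):
    """Pair each cut interval with a clip half and an alternating zoom direction."""
    bounds = cuts + [end_time]
    shots = []
    for n, (start, end) in enumerate(zip(bounds, bounds[1:])):
        clip, half = halves[n % len(halves)]  # wrap around if cuts outnumber halves
        zoom = "in" if n % 2 == 0 else "out"  # Ken Burns direction flips on every cut
        shots.append({"start": start, "end": end, "clip": clip, "half": half, "zoom": zoom})
    return shots


# Made-up example: 16 beats at 120 BPM with the drop at 4.0 s.
beats = [i * 0.5 for i in range(16)]
cuts = plan_cuts(beats, drop_time=4.0)
halves = shuffle_halves(num_clips=3, seed=7)
shots = build_shots(cuts, end_time=8.0, halves=halves)
```

Because the clip order depends only on the seed, a reshuffle in this sketch amounts to calling `shuffle_halves` with a different seed and rebuilding the shot list, leaving the generated clips untouched.
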
## Tech Stack

| Component | Tool |
|-----------|------|
| Stem separation | LALAL.AI API (Andromeda) |
| Lyrics (ASR) | WhisperX (large-v2 + wav2vec2) |
| Beat detection | madmom (RNN + DBN) |
| Prompt generation | Claude Sonnet 4.6 (Anthropic API) |
| Image generation | SDXL + Hyper-SD 8-step + style LoRA |
| Image-to-video | Wan 2.1 14B (ZeroGPU with FP8) |
| Video assembly | FFmpeg |
| UI | Gradio |