---
title: SyncAI
emoji: 🎵
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: "6.8.0"
app_file: app.py
pinned: false
short_description: AI Music Ads Generator
---
# SyncAI: AI Music Video Generator
Generate beat-synced music video ads from a song clip. Upload ~15 seconds of audio, pick a visual style, and SyncAI produces a fully assembled vertical video with AI-generated visuals cut to the beat.
## How It Works
```
Song (audio file)
──► Stem Separation (LALAL.AI) → Vocals + Drums
──► Lyrics Extraction (WhisperX) → Word-level timestamps
──► Beat Detection (madmom RNN + DBN) → Beat timestamps + drop detection
──► Segmentation → Lyrics mapped to beat intervals
──► Prompt Generation (Claude Sonnet 4.6) → Image + video motion prompts
──► Image Generation (SDXL + Hyper-SD + style LoRA) → 768x1344 images
──► Image-to-Video (Wan 2.1 14B) → Animated clips
──► Assembly (FFmpeg) → Beat-synced video with lyrics overlay
```
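The segmentation step can be pictured with a small sketch. This is not the project's actual code: the function name `map_words_to_beats`, the `(text, start_seconds)` word format, and the rule of assigning a word by its start time are all assumptions made for illustration.

```python
from bisect import bisect_right


def map_words_to_beats(words, beats):
    """Group words by the beat interval that contains their start time.

    words: list of (text, start_seconds) tuples (assumed format).
    beats: ascending list of beat timestamps in seconds.
    Returns one list of words per interval [beats[i], beats[i + 1]).
    Words that start before the first beat or at/after the last beat
    are dropped in this simplified version.
    """
    segments = [[] for _ in range(max(len(beats) - 1, 0))]
    for text, start in words:
        # Index of the last beat that is <= start.
        i = bisect_right(beats, start) - 1
        if 0 <= i < len(segments):
            segments[i].append(text)
    return segments


if __name__ == "__main__":
    words = [("hello", 0.1), ("world", 0.6), ("again", 1.2)]
    beats = [0.0, 0.5, 1.0, 1.5]
    print(map_words_to_beats(words, beats))
```

A real pipeline would likely also need to handle words that span a beat boundary; this sketch ignores word end times entirely.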
## Visual Styles
Each style applies a different LoRA to SDXL and gives the LLM prompt generator its own scene world to write within. The Sunset Coastal Drive LoRA was custom-trained for this project; the others are community LoRAs from the Hugging Face Hub:
| Style | LoRA | Setting |
|-------|------|---------|
| **Sunset Coastal Drive** | Custom-trained (`samuelsattler/warm-sunset-lora`) | Car cruising a coastal highway at golden hour |
| **Rainy City Night** | Film grain (`artificialguybr/filmgrain-redmond`) | Walking rain-soaked city streets after dark |
| **Cyberpunk** | Cyberpunk 2077 (`jbilcke-hf/sdxl-cyberpunk-2077`) | Neon-drenched futuristic megacity at night |
| **Watercolour Harbour** | Watercolor (`ostris/watercolor_style_lora_sdxl`) | Coastal fishing village during a storm |
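One way to represent the table above in code is a small style registry. The `Style` dataclass, the `STYLES` dictionary, and the `scene_preamble` helper are hypothetical names for this sketch; only the repository IDs and setting descriptions come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    lora_repo: str  # Hugging Face Hub repository ID of the LoRA
    setting: str    # scene world handed to the prompt generator


# Hypothetical registry mirroring the table above.
STYLES = {
    "Sunset Coastal Drive": Style(
        "samuelsattler/warm-sunset-lora",
        "Car cruising a coastal highway at golden hour",
    ),
    "Rainy City Night": Style(
        "artificialguybr/filmgrain-redmond",
        "Walking rain-soaked city streets after dark",
    ),
    "Cyberpunk": Style(
        "jbilcke-hf/sdxl-cyberpunk-2077",
        "Neon-drenched futuristic megacity at night",
    ),
    "Watercolour Harbour": Style(
        "ostris/watercolor_style_lora_sdxl",
        "Coastal fishing village during a storm",
    ),
}


def scene_preamble(style_name):
    """Build an illustrative opening line for the LLM prompt (assumed wording)."""
    style = STYLES[style_name]
    return f"Scene world: {style.setting}."
```

Keeping the LoRA ID and the scene description in one record means a new style only needs one new entry.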
## Assembly Features
- **Dynamic pacing**: A cut every 4 beats before the drop and every 2 beats after it, for more energy
- **Clip shuffling**: Each clip used twice (first/second half) in randomised order for visual variety
- **Ken Burns**: Alternating zoom in/out on every cut
- **Lyrics overlay**: Word-level timing with gap closing
- **Cover art overlay**: Album art + Spotify badge appear from the drop onwards
- **Reshuffle**: Re-run assembly with a new random clip order without regenerating
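The pacing, shuffling, and Ken Burns rules above can be sketched as a few small planning functions. All names and signatures here are hypothetical, and details such as starting the zoom with "in" or seeding the shuffle are assumptions rather than the project's confirmed behaviour.

```python
import random


def plan_cuts(beats, drop_time, before=4, after=2):
    """Return cut timestamps: every `before` beats until the drop,
    then every `after` beats from the drop onwards."""
    cuts = []
    i = 0
    while i < len(beats):
        cuts.append(beats[i])
        i += before if beats[i] < drop_time else after
    return cuts


def shuffled_halves(clip_ids, seed=None):
    """Split each clip into a first and second half and shuffle the
    resulting segments, so every clip appears exactly twice."""
    segments = [(clip, half) for clip in clip_ids for half in ("first", "second")]
    rng = random.Random(seed)  # a new seed yields a new order (reshuffle)
    rng.shuffle(segments)
    return segments


def ken_burns_directions(n_cuts):
    """Alternate zoom direction on every cut (starting with 'in' is assumed)."""
    return ["in" if i % 2 == 0 else "out" for i in range(n_cuts)]


if __name__ == "__main__":
    beats = [float(b) for b in range(12)]
    cuts = plan_cuts(beats, drop_time=8.0)
    print(cuts)
    print(shuffled_halves(["clip_a", "clip_b"], seed=1))
    print(ken_burns_directions(len(cuts)))
```

Because the shuffle only reorders references to existing clips, re-running it with a different seed changes the edit without touching image or video generation.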
## Tech Stack
| Component | Tool |
|-----------|------|
| Stem separation | LALAL.AI API (Andromeda) |
| Lyrics (ASR) | WhisperX (large-v2 + wav2vec2) |
| Beat detection | madmom (RNN + DBN) |
| Prompt generation | Claude Sonnet 4.6 (Anthropic API) |
| Image generation | SDXL + Hyper-SD 8-step + style LoRA |
| Image-to-video | Wan 2.1 14B (ZeroGPU with FP8) |
| Video assembly | FFmpeg |
| UI | Gradio |