---
title: SyncAI
emoji: 🎡
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: "6.8.0"
app_file: app.py
pinned: false
short_description: AI Music Ads Generator
---
# SyncAI: AI Music Video Generator
Generate beat-synced music video ads from a song clip. Upload ~15 seconds of audio, pick a visual style, and SyncAI produces a fully assembled vertical video with AI-generated visuals cut to the beat.
## How It Works
```
Song (audio file)
├─► Stem Separation (LALAL.AI) → Vocals + Drums
├─► Lyrics Extraction (WhisperX) → Word-level timestamps
├─► Beat Detection (madmom RNN + DBN) → Beat timestamps + drop detection
├─► Segmentation → Lyrics mapped to beat intervals
├─► Prompt Generation (Claude Sonnet 4.6) → Image + video motion prompts
├─► Image Generation (SDXL + Hyper-SD + style LoRA) → 768x1344 images
├─► Image-to-Video (Wan 2.1 14B) → Animated clips
└─► Assembly (FFmpeg) → Beat-synced video with lyrics overlay
```
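The Segmentation step combines WhisperX word timestamps with madmom beat times, but the README does not say which rule assigns a word to a beat interval. A minimal sketch, assuming a word belongs to the interval `[beats[i], beats[i+1])` that contains its start time; `Word` and `map_words_to_beats` are hypothetical names, not SyncAI's code:

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, e.g. from WhisperX word-level alignment
    end: float    # seconds


def map_words_to_beats(words: list[Word], beats: list[float]) -> dict[int, list[str]]:
    """Group words by the beat interval [beats[i], beats[i+1]) containing their start.

    Assumption: words that start before the first beat or at/after the last beat
    are dropped; a real pipeline might attach them to the nearest interval instead.
    """
    segments: dict[int, list[str]] = {i: [] for i in range(len(beats) - 1)}
    for word in words:
        # bisect_right gives the index of the first beat strictly after word.start,
        # so the interval that contains the word starts one beat earlier.
        idx = bisect_right(beats, word.start) - 1
        if 0 <= idx < len(beats) - 1:
            segments[idx].append(word.text)
    return segments
```

With beats every 0.5 s, a word starting at 0.6 s lands in interval 1 (0.5 s to 1.0 s). Using the start time rather than the midpoint keeps a sung word on the beat where it begins.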
## Visual Styles
Each style applies a different LoRA to SDXL and defines a distinct scene setting for the LLM prompt generator. The Sunset Coastal Drive LoRA was custom-trained for this project; the others are community LoRAs from the Hugging Face Hub:
| Style | LoRA | Setting |
|-------|------|---------|
| **Sunset Coastal Drive** | Custom-trained (`samuelsattler/warm-sunset-lora`) | Car cruising a coastal highway at golden hour |
| **Rainy City Night** | Film grain (`artificialguybr/filmgrain-redmond`) | Walking rain-soaked city streets after dark |
| **Cyberpunk** | Cyberpunk 2077 (`jbilcke-hf/sdxl-cyberpunk-2077`) | Neon-drenched futuristic megacity at night |
| **Watercolour Harbour** | Watercolor (`ostris/watercolor_style_lora_sdxl`) | Coastal fishing village during a storm |
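Because each style is just a LoRA plus a scene setting, it can be held in a small registry. The sketch below copies the repo ids and settings from the table; the `Style` dataclass, the `STYLES` layout and the `scene_instruction` helper are hypothetical and only illustrate how both halves of a style could travel together:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    lora_repo: str  # Hugging Face Hub repo id of the style LoRA
    setting: str    # scene world handed to the LLM prompt generator


# Values from the Visual Styles table; the registry structure itself is an assumption.
STYLES: dict[str, Style] = {
    "Sunset Coastal Drive": Style(
        "samuelsattler/warm-sunset-lora",
        "Car cruising a coastal highway at golden hour"),
    "Rainy City Night": Style(
        "artificialguybr/filmgrain-redmond",
        "Walking rain-soaked city streets after dark"),
    "Cyberpunk": Style(
        "jbilcke-hf/sdxl-cyberpunk-2077",
        "Neon-drenched futuristic megacity at night"),
    "Watercolour Harbour": Style(
        "ostris/watercolor_style_lora_sdxl",
        "Coastal fishing village during a storm"),
}


def scene_instruction(style_name: str) -> str:
    """Hypothetical helper: the setting constraint included in the LLM request."""
    style = STYLES[style_name]
    return f"Every scene must take place in this setting: {style.setting}."
```

On the image side, a diffusers SDXL pipeline can load a Hub LoRA with `pipe.load_lora_weights(style.lora_repo)`; some repos also need a `weight_name` argument naming the `.safetensors` file.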
## Assembly Features
- **Dynamic pacing**: 4-beat cuts before the drop, 2-beat cuts after for energy
- **Clip shuffling**: Each clip is used twice, once for its first half and once for its second half, in randomised order for visual variety
- **Ken Burns**: Alternating zoom in/out on every cut
- **Lyrics overlay**: Word-level timing with gap closing
- **Cover art overlay**: Album art + Spotify badge appear from the drop onwards
- **Reshuffle**: Re-run assembly with a new random clip order without regenerating
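The pacing, shuffling, Ken Burns and reshuffle features together describe a cut plan. A minimal sketch, assuming the drop time comes from the drop detection step and that a cut is forced onto the drop beat (so the last pre-drop cut may be shorter than 4 beats); `Cut` and `plan_cuts` are hypothetical names:

```python
import random
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cut:
    start: float   # cut start in the song timeline (seconds)
    end: float     # cut end (seconds)
    clip: int      # index of the generated clip shown in this cut
    half: int      # 0 = first half of that clip, 1 = second half
    zoom_in: bool  # Ken Burns direction; alternates on every cut


def plan_cuts(beats: list[float], drop_time: float, num_clips: int,
              seed: Optional[int] = None) -> list[Cut]:
    """Hypothetical planner: 4-beat cuts before the drop, 2-beat cuts after it."""
    if len(beats) < 2:
        return []
    # Index of the first beat at or after the drop.
    drop_idx = bisect_left(beats, drop_time)
    # Beat indices where cuts happen; the drop beat always starts a new cut.
    idxs = list(range(0, drop_idx, 4)) + list(range(drop_idx, len(beats), 2))
    if idxs[-1] != len(beats) - 1:
        idxs.append(len(beats) - 1)  # close the final, possibly shorter, cut
    bounds = [beats[k] for k in idxs]

    rng = random.Random(seed)  # a different seed gives a new clip order
    pool: list[tuple[int, int]] = []
    cuts: list[Cut] = []
    for n, (start, end) in enumerate(zip(bounds, bounds[1:])):
        if not pool:
            # Each clip contributes one entry per half; refill if cuts outnumber halves.
            pool = [(c, h) for c in range(num_clips) for h in (0, 1)]
            rng.shuffle(pool)
        clip, half = pool.pop()
        cuts.append(Cut(start, end, clip, half, zoom_in=(n % 2 == 0)))
    return cuts
```

Calling `plan_cuts` again with another seed keeps the beat-aligned timing but changes which clip half fills each cut, which is roughly what Reshuffle needs: only assembly reruns, and no images or clips are regenerated.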
## Tech Stack
| Component | Tool |
|-----------|------|
| Stem separation | LALAL.AI API (Andromeda) |
| Lyrics (ASR) | WhisperX (large-v2 + wav2vec2) |
| Beat detection | madmom (RNN + DBN) |
| Prompt generation | Claude Sonnet 4.6 (Anthropic API) |
| Image generation | SDXL + Hyper-SD 8-step + style LoRA |
| Image-to-video | Wan 2.1 14B (ZeroGPU with FP8) |
| Video assembly | FFmpeg |
| UI | Gradio |
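The README only states that FFmpeg performs the assembly. One plausible approach, sketched here under that assumption: trim each cut from its clip half into a separate segment, join the segments with the concat demuxer, then lay the song underneath. The helper names and file names are hypothetical; the `ffconcat` header, `file` directive and FFmpeg flags are standard. Ken Burns zoom and the lyrics and cover-art overlays would need extra filter passes and are omitted.

```python
def half_inpoint(half: int, clip_length: float, cut_duration: float) -> float:
    """Start time inside a clip for its first (0) or second (1) half."""
    half_len = clip_length / 2
    if cut_duration > half_len:
        raise ValueError("cut is longer than half a clip")
    return half * half_len


def trim_command(clip_path: str, inpoint: float, duration: float, out_path: str) -> list[str]:
    # -ss before -i seeks the input; because the segment is re-encoded, FFmpeg
    # decodes up to the exact timestamp instead of snapping to a keyframe.
    return ["ffmpeg", "-y", "-ss", f"{inpoint:.3f}", "-i", clip_path,
            "-t", f"{duration:.3f}", "-an", "-c:v", "libx264", out_path]


def concat_script(segment_paths: list[str]) -> str:
    """Build an ffconcat script listing the trimmed segments in playback order."""
    lines = ["ffconcat version 1.0"]
    for path in segment_paths:
        escaped = path.replace("'", "'\\''")  # escape single quotes inside quoted paths
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def mux_command(list_file: str, audio_path: str, out_path: str) -> list[str]:
    # Video from the concat list (segments share codec settings, so it can be copied),
    # audio from the original song; stop at whichever stream ends first.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file,
            "-i", audio_path, "-map", "0:v", "-map", "1:a",
            "-c:v", "copy", "-c:a", "aac", "-shortest", out_path]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Re-encoding every segment with the same encoder is what makes the final `-c:v copy` concat safe.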