SyncAI / README.md
ICGenAIShare04's picture
Upload 52 files
72f552e verified

A newer version of the Gradio SDK is available: 6.9.0

Upgrade
metadata
title: SyncAI
emoji: 🎡
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
short_description: AI Music Ads Generator

SyncAI β€” AI Music Video Generator

Generate beat-synced music video ads from a song clip. Upload ~15 seconds of audio, pick a visual style, and SyncAI produces a fully assembled vertical video with AI-generated visuals cut to the beat.

How It Works

Song (audio file)
  β”œβ”€β–Ί Stem Separation (LALAL.AI) β†’ Vocals + Drums
  β”œβ”€β–Ί Lyrics Extraction (WhisperX) β†’ Word-level timestamps
  β”œβ”€β–Ί Beat Detection (madmom RNN + DBN) β†’ Beat timestamps + drop detection
  β”œβ”€β–Ί Segmentation β†’ Lyrics mapped to beat intervals
  β”œβ”€β–Ί Prompt Generation (Claude Sonnet 4.6) β†’ Image + video motion prompts
  β”œβ”€β–Ί Image Generation (SDXL + Hyper-SD + style LoRA) β†’ 768x1344 images
  β”œβ”€β–Ί Image-to-Video (Wan 2.1 14B) β†’ Animated clips
  └─► Assembly (FFmpeg) β†’ Beat-synced video with lyrics overlay

Visual Styles

Each style applies a different LoRA to SDXL and sets a unique scene world for the LLM prompt generator. The Sunset Coastal Drive LoRA was custom-trained for this project; the others are community LoRAs from HuggingFace Hub:

Style LoRA Setting
Sunset Coastal Drive Custom-trained (samuelsattler/warm-sunset-lora) Car cruising a coastal highway at golden hour
Rainy City Night Film grain (artificialguybr/filmgrain-redmond) Walking rain-soaked city streets after dark
Cyberpunk Cyberpunk 2077 (jbilcke-hf/sdxl-cyberpunk-2077) Neon-drenched futuristic megacity at night
Watercolour Harbour Watercolor (ostris/watercolor_style_lora_sdxl) Coastal fishing village during a storm

Assembly Features

  • Dynamic pacing: 4-beat cuts before the drop, 2-beat cuts after for energy
  • Clip shuffling: Each clip used twice (first/second half) in randomised order for visual variety
  • Ken Burns: Alternating zoom in/out on every cut
  • Lyrics overlay: Word-level timing with gap closing
  • Cover art overlay: Album art + Spotify badge appear from the drop onwards
  • Reshuffle: Re-run assembly with a new random clip order without regenerating

Tech Stack

Component Tool
Stem separation LALAL.AI API (Andromeda)
Lyrics (ASR) WhisperX (large-v2 + wav2vec2)
Beat detection madmom (RNN + DBN)
Prompt generation Claude Sonnet 4.6 (Anthropic API)
Image generation SDXL + Hyper-SD 8-step + style LoRA
Image-to-video Wan 2.1 14B (ZeroGPU with FP8)
Video assembly FFmpeg
UI Gradio