	---
	title: SyncAI
	emoji: 🎵
	colorFrom: indigo
	colorTo: pink
	sdk: gradio
	sdk_version: "6.8.0"
	app_file: app.py
	pinned: false
	short_description: AI Music Ads Generator
	---

	# SyncAI — AI Music Video Generator

	Generate beat-synced music video ads from a song clip. Upload ~15 seconds of audio, pick a visual style, and SyncAI produces a fully assembled vertical video with AI-generated visuals cut to the beat.

	## How It Works

	```
	Song (audio file)
	├─► Stem Separation (LALAL.AI) → Vocals + Drums
	├─► Lyrics Extraction (WhisperX) → Word-level timestamps
	├─► Beat Detection (madmom RNN + DBN) → Beat timestamps + drop detection
	├─► Segmentation → Lyrics mapped to beat intervals
	├─► Prompt Generation (Claude Sonnet 4.6) → Image + video motion prompts
	├─► Image Generation (SDXL + Hyper-SD + style LoRA) → 768x1344 images
	├─► Image-to-Video (Wan 2.1 14B) → Animated clips
	└─► Assembly (FFmpeg) → Beat-synced video with lyrics overlay
	```
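
The segmentation step maps lyrics onto beat intervals. The sketch below shows one way to do that, assuming words arrive as `(text, start_time)` tuples and beats as a sorted list of timestamps in seconds. `Segment` and `map_words_to_beats` are hypothetical names for illustration, not functions taken from the SyncAI code.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # beat timestamp that opens the interval (seconds)
    end: float    # next beat timestamp (seconds)
    words: list = field(default_factory=list)


def map_words_to_beats(words, beats):
    """Group (text, start_time) tuples into consecutive beat intervals.

    A word lands in the interval [beats[i], beats[i + 1]) that contains its
    start time. Words before the first beat or at/after the last beat are
    dropped in this simplified version.
    """
    segments = [Segment(a, b) for a, b in zip(beats, beats[1:])]
    for text, t in words:
        i = bisect_right(beats, t) - 1  # index of the last beat <= t
        if 0 <= i < len(segments):
            segments[i].words.append(text)
    return segments


if __name__ == "__main__":
    beats = [0.0, 0.5, 1.0, 1.5]
    words = [("hello", 0.1), ("world", 0.6), ("again", 0.9)]
    for seg in map_words_to_beats(words, beats):
        print(f"{seg.start:.1f}-{seg.end:.1f}: {' '.join(seg.words)}")
```

Binary search keeps the lookup at O(log n) per word. A real pipeline would probably also need a policy for words that straddle a beat boundary.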

	## Visual Styles

	Each style applies a different LoRA to SDXL and gives the LLM prompt generator its own scene setting. The Sunset Coastal Drive LoRA was custom-trained for this project; the others are community LoRAs from the Hugging Face Hub:

	| Style | LoRA | Setting |
	|-------|------|---------|
	| Sunset Coastal Drive | Custom-trained (`samuelsattler/warm-sunset-lora`) | Car cruising a coastal highway at golden hour |
	| Rainy City Night | Film grain (`artificialguybr/filmgrain-redmond`) | Walking rain-soaked city streets after dark |
	| Cyberpunk | Cyberpunk 2077 (`jbilcke-hf/sdxl-cyberpunk-2077`) | Neon-drenched futuristic megacity at night |
	| Watercolour Harbour | Watercolor (`ostris/watercolor_style_lora_sdxl`) | Coastal fishing village during a storm |
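
The table can be expressed as a small lookup structure. This is only a sketch: the repo IDs and settings are copied from the table, while the `Style` dataclass, the `STYLES` dict, and the `scene_instruction` helper are hypothetical names. How the app actually stores styles or words its prompts is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    lora_repo: str  # Hugging Face Hub repo ID, as listed in the table
    setting: str    # scene setting passed to the LLM prompt generator


STYLES = {
    "Sunset Coastal Drive": Style(
        "samuelsattler/warm-sunset-lora",
        "Car cruising a coastal highway at golden hour",
    ),
    "Rainy City Night": Style(
        "artificialguybr/filmgrain-redmond",
        "Walking rain-soaked city streets after dark",
    ),
    "Cyberpunk": Style(
        "jbilcke-hf/sdxl-cyberpunk-2077",
        "Neon-drenched futuristic megacity at night",
    ),
    "Watercolour Harbour": Style(
        "ostris/watercolor_style_lora_sdxl",
        "Coastal fishing village during a storm",
    ),
}


def scene_instruction(style_name: str) -> str:
    """Hypothetical helper: one sentence that pins the LLM to a style's setting."""
    style = STYLES[style_name]
    return f"All scenes take place in this setting: {style.setting}."


if __name__ == "__main__":
    print(STYLES["Cyberpunk"].lora_repo)
    print(scene_instruction("Rainy City Night"))
```

Keeping the LoRA and the setting in one record means that picking a style in the UI selects both together.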

	## Assembly Features

	- Dynamic pacing: a cut every 4 beats before the drop and every 2 beats after it, for more energy
	- Clip shuffling: Each clip is used twice (first half and second half) in randomised order for visual variety
	- Ken Burns: Alternating zoom in/out on every cut
	- Lyrics overlay: Word-level timing with gap closing
	- Cover art overlay: Album art + Spotify badge appear from the drop onwards
	- Reshuffle: Re-run assembly with a new random clip order without regenerating
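
The pacing, shuffling, and Ken Burns rules above can be sketched as a shot planner. This is a simplified illustration under stated assumptions, not the project's assembly code. `cut_times`, `shuffled_halves`, and `plan_shots` are hypothetical names, beats are assumed to be a sorted list of timestamps in seconds, and the clip list wraps around if there are more cuts than clip halves.

```python
import random


def cut_times(beats, drop_time, before=4, after=2):
    """Pick cut points: every `before` beats until the drop, then every `after`.

    Simplification: a cut is not forced onto the drop itself, so the switch to
    faster pacing happens at the first cut at or after `drop_time`.
    """
    cuts, i = [], 0
    while i < len(beats):
        cuts.append(beats[i])
        i += before if beats[i] < drop_time else after
    return cuts


def shuffled_halves(num_clips, seed=None):
    """Return (clip_index, half) pairs, each clip used twice, in random order."""
    rng = random.Random(seed)
    parts = [(c, h) for c in range(num_clips) for h in ("first", "second")]
    rng.shuffle(parts)
    return parts


def plan_shots(beats, drop_time, num_clips, seed=None):
    """Combine cut points, shuffled clip halves, and alternating zoom direction.

    Assumes num_clips >= 1.
    """
    parts = shuffled_halves(num_clips, seed)
    shots = []
    for n, start in enumerate(cut_times(beats, drop_time)):
        clip, half = parts[n % len(parts)]    # wrap around if cuts outnumber halves
        zoom = "in" if n % 2 == 0 else "out"  # Ken Burns alternates on every cut
        shots.append({"start": start, "clip": clip, "half": half, "zoom": zoom})
    return shots


if __name__ == "__main__":
    beats = [float(b) for b in range(12)]  # one beat per second, for illustration
    for shot in plan_shots(beats, drop_time=8.0, num_clips=3, seed=1):
        print(shot)
```

In this sketch, a reshuffle amounts to calling `plan_shots` again with a different `seed`. Only the clip order changes, so the generated clips can be reused as they are.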

	## Tech Stack

	| Component | Tool |
	|-----------|------|
	| Stem separation | LALAL.AI API (Andromeda) |
	| Lyrics (ASR) | WhisperX (large-v2 + wav2vec2) |
	| Beat detection | madmom (RNN + DBN) |
	| Prompt generation | Claude Sonnet 4.6 (Anthropic API) |
	| Image generation | SDXL + Hyper-SD 8-step + style LoRA |
	| Image-to-video | Wan 2.1 14B (ZeroGPU with FP8) |
	| Video assembly | FFmpeg |
	\| UI \| Gradio \|