Spaces:
Running
A newer version of the Gradio SDK is available:
6.2.0
title: Director's Cut
emoji: π¬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
tags:
- mcp-server-track
- building-mcp-creative
- gradio
- modal
- elevenlabs
- gemini
- nebius
- openai
- chatgpt-app
license: mit
short_description: AI Video Editor - YouTube to Viral Shorts
π¬ Director's Cut
The autonomous multi-agent system that transforms any YouTube video into viral vertical content. Zero editing skills. Five AI agents. One click.
π€ Why We Need This
Content creators are drowning. They have hours of amazing landscape YouTube content sitting thereβcompletely worthless on TikTok, Instagram Reels, and YouTube Shorts.
The current "solutions" are a joke:
- β Center crop = butchers your content, cuts off 60% of what matters
- β Manual editing = 2-3 hours per video, soul-crushing repetitive work
- β Hiring editors = $30-50/hour, burns a hole in your pocket
- β "AI" tools = glorified filters, no actual intelligence
The vertical video revolution is here, and creators are being left behind.
Every day, millions of hours of incredible content stay trapped in 16:9 format while the algorithm rewards 9:16. Something had to change.
π What We Created
Director's Cut is an autonomous multi-agent AI system that doesn't just crop your videoβit thinks about your video.
We built a 5-agent pipeline that:
- π Analyzes your entire video for viral-worthy moments
- β Verifies clip quality using vision AI
- π¬ Plans the perfect edit with pacing and transitions
- ποΈ Executes with FFmpeg precision
- π Polishes with intros, smart crop, subtitles, and music
One YouTube URL β Production-ready vertical content in 3-5 minutes.
πΊ Demo Video
βΆοΈ Click to Watch Full Demo on YouTube
πΌ Social Media Post
π Read the Full Post on LinkedIn
ποΈ Architecture 1: Creating Viral Clips from Long-Form Content
The first half of our pipeline takes a massive YouTube video and identifies the golden moments worth sharing.
Scout β Verifier β Director β Hands: From YouTube URL to raw compiled clip
Gradio App - Clip Generation Interface
The Gradio interface for generating clips from YouTube URLs
ποΈ Architecture 2: Production Polish & Refinement
The second half takes that raw clip and transforms it into viral-ready vertical content.
Showrunner: Smart Crop β Intro β Subtitles β Final Assembly
Gradio App - Production Studio Interface
The Production Studio for adding polish to your clips
π€ Partner Technologies
π Modal β The Backend Powerhouse
Modal isn't just part of our stackβit IS our stack. Without Modal, this project would've been impossible.
Why Modal Changed Everything:
When you're processing videos, you need:
- 50-500MB file uploads that don't timeout
- FFmpeg with all codecs pre-installed
- GPU compute for Whisper transcription
- Parallel processing without infrastructure management
- Pay-per-use so you don't burn money on idle servers
Modal delivered ALL of this out of the box.
@app.function(
image=base_image, # FFmpeg, ImageMagick, fonts pre-installed
volumes={STORAGE_PATH: storage_volume}, # Instant file transfers
timeout=3600,
memory=32768, # 32GB RAM for video processing
cpu=8.0
)
@modal.web_endpoint(method="POST")
def process_video(request: dict):
# This just works. No Docker. No K8s. No DevOps nightmares.
# Files transfer at lightning speed via Modal volumes.
# Scales to zero when idleβwe only pay when processing.
The Impact:
| Before Modal | With Modal |
|---|---|
| 45min upload times | < 30s file transfers |
| Docker dependency hell | Zero config FFmpeg |
| $200/month idle servers | Pay only when processing |
| Manual scaling | Auto-scales to demand |
Huge thanks to Modal for the generous credits that made this possible. We pushed their infrastructure HARD and it never flinched.
How Modal powers the entire Director's Cut backend
π¨ Nebius AI Studio β Qwen VL + FLUX
Nebius provides lightning-fast inference for two critical features:
Qwen 2.5-VL-72B β Intelligent Subject Tracking
This is NOT center crop. We built genuine AI-powered smart cropping:
# For each key frame, Qwen VL detects the main subject position
qwen_response = requests.post(
"https://api.studio.nebius.ai/v1/chat/completions",
headers={"Authorization": f"Bearer {nebius_key}"},
json={
"model": "Qwen/Qwen2.5-VL-72B-Instruct",
"messages": [{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}},
{"type": "text", "text": "Find the horizontal position of the main subject. Return decimal 0.0-1.0"}
]
}]
}
)
# Result: 92% subject retention vs 40% with dumb center crop
Qwen VL tracking subjects for intelligent 9:16 framing
FLUX β Custom Intro Image Generation
Every video gets a unique AI-generated intro card that matches its mood:
response = requests.post(
"https://api.studio.nebius.ai/v1/images/generations",
headers={"Authorization": f"Bearer {nebius_key}"},
json={
"model": "black-forest-labs/flux-schnell",
"prompt": f"High-energy social media intro, vertical 9:16, "
f"bold typography '{title}', vibrant neon gradients",
"width": 1080,
"height": 1920,
"num_inference_steps": 4
}
)
# Generates in < 5 seconds on Nebius
Mood-Matched Styles:
- π₯ Hype β Neon gradients, bold typography, TikTok energy
- π¬ Suspense β Cinematic noir, dramatic shadows
- πΏ Chill β Soft pastels, minimal aesthetic
FLUX generating mood-matched intro cards via Nebius
ποΈ ElevenLabs β Professional Voiceover
Every video gets a content-aware AI voiceover for the intro:
from elevenlabs import ElevenLabs
client = ElevenLabs(api_key=elevenlabs_key)
# Gemini writes a hook based on actual video content
intro_script = "Joe Rogan just dropped some insane knowledge about AI. " \
"This take is gonna blow your mind, check it out..."
audio = client.text_to_speech.convert(
text=intro_script,
voice_id="21m00Tcm4TlvDq8ikWAM", # Rachel - engaging, professional
model_id="eleven_turbo_v2_5"
)
What Makes It Special:
- π§ Scripts reference actual video content (not generic templates)
- π Voice selection adapts to video mood
- β‘ Sub-2s generation time
ElevenLabs generating professional voiceover intros
π MCP Server Integration
π€ ChatGPT Integration (GPT Apps SDK)
We built a ChatGPT App using the GPT Apps SDK that turns ChatGPT into your personal video production assistant.
How It Works:
- Open ChatGPT
- Find "Director's Cut" in Apps
- Upload your 15-50 second clip
- Tell ChatGPT what you want: "Add subtitles and a hype intro"
- Download your polished video
This is insanely cool because ChatGPT becomes a conversational video editor. No UI to learn, no buttons to clickβjust describe what you want.
User: "Take this clip and make it TikTok ready"
β
ChatGPT: Understands intent, calls Director's Cut MCP tools
β
MCP Server: Processes video (smart crop, subtitles, music)
β
ChatGPT: "Here's your viral-ready video! π¬"
ChatGPT as your personal video production assistant
π₯οΈ Claude Desktop MCP Server
For the full autonomous pipeline, connect Claude Desktop to our MCP server.
Option 1: Run Locally (Recommended)
Why Local? Modal cloud processing requires credits that aren't available to everyone. Running locally gives you full control and works with just API keys.
Step 1: Clone the repository
git clone https://github.com/tayyab415/directors-cut.git
cd directors-cut
Step 2: Install dependencies
pip install -r requirements.txt
Step 3: Set up environment variables
Create a .env file:
# Required API Keys
GEMINI_API_KEY=your_gemini_key
NEBIUS_API_KEY=your_nebius_key
ELEVENLABS_API_KEY=your_elevenlabs_key
# Optional: Modal (for cloud processing - requires Modal credits)
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
Step 4: Run the MCP server
python app.py
Step 5: Configure Claude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"directors-cut": {
"type": "sse",
"url": "http://localhost:7860/gradio_api/mcp/sse"
}
}
}
Step 6: Restart Claude Desktop and start creating!
Then just ask Claude:
"Process this YouTube video into a viral TikTok: https://youtube.com/watch?v=..."
Option 2: Use the Hosted Server
If you have Modal credits or just want to try the hosted version:
Add to your claude_desktop_config.json:
{
"mcpServers": {
"directors-cut": {
"type": "sse",
"url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse"
}
}
}
MCP Tools Available:
| Tool | Description |
|---|---|
process_video |
Full pipeline: YouTube URL β Viral video |
step1_analyze_video_mcp |
Analyze and classify video content |
step2_scout_hotspots_mcp |
Find viral-worthy moments |
step3_verify_hotspots_mcp |
Vision AI verification |
step4_create_plan_mcp |
Generate edit plan |
render_and_produce_mcp |
Render + production polish |
smart_crop_video |
Standalone 9:16 smart crop |
add_production_value |
Add intro, subtitles, music |
Claude Desktop orchestrating the full Director's Cut pipeline
β‘ Performance
| Metric | Value |
|---|---|
| Processing Time | 3-5 min for 10-min video |
| Smart Crop Accuracy | 92% subject retention |
| Subtitle Accuracy | 95%+ (Whisper large-v3) |
| Cost Per Video | ~$0.15 |
| Human Editor Equivalent | $30-50/hour saved |
π οΈ Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| MCP Server | Gradio 5.x | Claude/ChatGPT integration |
| Backend Compute | Modal Labs | Video processing at scale |
| Video Analysis | Gemini 2.0 Flash | Hotspot detection, planning |
| Smart Crop | Qwen VL (Nebius) | Subject tracking |
| Intro Images | FLUX (Nebius) | Custom title cards |
| Voiceover | ElevenLabs | Professional narration |
| Subtitles | WhisperX | Word-level captions |
| Video Processing | FFmpeg + MoviePy | Rendering |
π Project Structure
directors-cut/
βββ app.py # Main Gradio app + MCP tools
βββ modal_simple.py # Modal backend endpoints
βββ src/
β βββ scout.py # Hotspot detection agent
β βββ verifier.py # Vision-based verification agent
β βββ director.py # Edit plan generation agent
β βββ hands.py # FFmpeg execution agent
β βββ showrunner.py # Production polish agent
β βββ server.py # Standalone MCP server
β βββ paths.py # File management
βββ assets/music/ # Mood-matched background tracks
β βββ hype/
β βββ chill/
β βββ suspense/
βββ requirements.txt
βββ README.md
π What We Learned
Agent Coordination is Harder Than It Looks
Early versions had agents stepping on each other. Solution: Clear responsibility boundaries + Verifier as quality gate.
Smart Crop is a Game-Changer
Center crop loses 60% of content. Using Qwen VL for actual subject trackingβthe difference is night and day.
Modal is Insanely Good
We tried local FFmpeg first. Disaster. Modal's pre-configured containers + instant volumes saved 40+ hours of DevOps.
MCP Makes AI Actually Useful
Without MCP, this is "another AI tool." With MCP, Claude/ChatGPT become genuine creative assistants.
π₯ Team One_Horizon
- Tayyab Khan (tyb343) β Full-stack Development, Multi-agent Architecture, MCP Integration
- Sahil Tanna (sahiltanna7) β Development & Testing, Prompt Engineering
- Nikunj (nikunj30) β Development & Testing, MCP Integration
β οΈ Disclaimer & Responsible Use
Important Notice on Copyright and Intended Use:
Director's Cut is designed to help content creators repurpose their own content for different platforms. The intended use cases are:
β Legitimate Uses:
- Creators repurposing their own YouTube content for TikTok/Reels/Shorts
- Businesses creating short-form content from their long-form material
- Educational content being reformatted for different audiences
- Personal projects and creative experimentation
β This tool should NOT be used for:
- Downloading and repurposing content you don't own
- Creating content that infringes on others' copyrights
- Removing watermarks or attribution from original creators
- Monetizing content without proper rights or licensing
By using Director's Cut, you agree to:
- Only process content you have rights to use
- Respect copyright laws in your jurisdiction
- Properly attribute original creators when required
- Not use this tool for deceptive or harmful purposes
We are not responsible for misuse of this tool. The technology is built to empower creators, not to enable copyright infringement. Please use responsibly.
π License
MIT License - Build cool stuff with this, but build it ethically!
π Acknowledgments
Massive thanks to:
- Modal β For infrastructure that actually works and generous hackathon credits
- Nebius β For blazing-fast Qwen VL and FLUX inference
- ElevenLabs β For voices that sound genuinely human
- Google Gemini β For the multimodal reasoning powering our agents
- Anthropic & Gradio β For MCP and hosting this incredible hackathon
Built with β€οΈ for content creators who refuse to let great content die in landscape format.