---
title: Director's Cut
emoji: 🎬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
tags:
  - mcp-server-track
  - building-mcp-creative
  - gradio
  - modal
  - elevenlabs
  - gemini
  - nebius
  - openai
  - chatgpt-app
license: mit
short_description: AI Video Editor - YouTube to Viral Shorts
---

🎬 Director's Cut

The autonomous multi-agent system that transforms any YouTube video into viral vertical content. Zero editing skills. Five AI agents. One click.



😀 Why We Need This

Content creators are drowning. They have hours of great landscape-format YouTube content sitting there, completely worthless on TikTok, Instagram Reels, and YouTube Shorts.

The current "solutions" are a joke:

  • ❌ Center crop = butchers your content, cuts off 60% of what matters
  • ❌ Manual editing = 2-3 hours per video, soul-crushing repetitive work
  • ❌ Hiring editors = $30-50/hour, burns a hole in your pocket
  • ❌ "AI" tools = glorified filters, no actual intelligence

The vertical video revolution is here, and creators are being left behind.

Every day, millions of hours of incredible content stay trapped in 16:9 format while the algorithm rewards 9:16. Something had to change.


🚀 What We Created

Director's Cut is an autonomous multi-agent AI system that doesn't just crop your video; it thinks about your video.

We built a 5-agent pipeline that:

  • 🔍 Analyzes your entire video for viral-worthy moments
  • ✅ Verifies clip quality using vision AI
  • 🎬 Plans the perfect edit with pacing and transitions
  • 🖐️ Executes with FFmpeg precision
  • 🎭 Polishes with intros, smart crop, subtitles, and music

One YouTube URL → Production-ready vertical content in 3-5 minutes (sketched below).
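A rough, illustrative sketch of the hand-off between the five agents. The module names match the layout under src/, but the function names are hypothetical stand-ins, not the real API:

```python
# Illustrative pipeline only: modules are real, function names are hypothetical
from src import scout, verifier, director, hands, showrunner

def make_short(youtube_url: str) -> str:
    hotspots = scout.find_hotspots(youtube_url)    # 1. viral-worthy moments
    verified = verifier.check(hotspots)            # 2. vision-AI quality gate
    plan = director.create_edit_plan(verified)     # 3. pacing + transitions
    raw_clip = hands.render(plan)                  # 4. FFmpeg execution
    return showrunner.polish(raw_clip)             # 5. crop, intro, subtitles, music
```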


📺 Demo Video

Demo Video Thumbnail

▶️ Click to Watch Full Demo on YouTube


💼 Social Media Post

LinkedIn Post

📄 Read the Full Post on LinkedIn


πŸ—οΈ Architecture 1: Creating Viral Clips from Long-Form Content

The first half of our pipeline takes a massive YouTube video and identifies the golden moments worth sharing.

Architecture 1 - Clip Generation Pipeline
Scout → Verifier → Director → Hands: From YouTube URL to raw compiled clip

Gradio App - Clip Generation Interface

Gradio Clip Generation Demo
The Gradio interface for generating clips from YouTube URLs


πŸ—οΈ Architecture 2: Production Polish & Refinement

The second half takes that raw clip and transforms it into viral-ready vertical content.

Architecture 2 - Production Polish Pipeline
Showrunner: Smart Crop → Intro → Subtitles → Final Assembly

Gradio App - Production Studio Interface

Gradio Production Studio Demo
The Production Studio for adding polish to your clips


🤝 Partner Technologies

🚀 Modal: The Backend Powerhouse

Modal isn't just part of our stack; it IS our stack. Without Modal, this project would've been impossible.

Why Modal Changed Everything:

When you're processing videos, you need:

  • 50-500MB file uploads that don't timeout
  • FFmpeg with all codecs pre-installed
  • GPU compute for Whisper transcription
  • Parallel processing without infrastructure management
  • Pay-per-use so you don't burn money on idle servers

Modal delivered ALL of this out of the box.

```python
import modal

# `app`, `base_image`, `STORAGE_PATH`, and `storage_volume` are defined earlier
# in the Modal backend module.
@app.function(
    image=base_image,  # FFmpeg, ImageMagick, fonts pre-installed
    volumes={STORAGE_PATH: storage_volume},  # instant file transfers
    timeout=3600,
    memory=32768,  # 32GB RAM for video processing
    cpu=8.0,
)
@modal.web_endpoint(method="POST")
def process_video(request: dict):
    # This just works. No Docker. No K8s. No DevOps nightmares.
    # Files transfer at lightning speed via Modal volumes.
    # Scales to zero when idle; we only pay when processing.
    ...
```

The Impact:

| Before Modal | With Modal |
|---|---|
| 45-min upload times | < 30s file transfers |
| Docker dependency hell | Zero-config FFmpeg |
| $200/month idle servers | Pay only when processing |
| Manual scaling | Auto-scales to demand |

Huge thanks to Modal for the generous credits that made this possible. We pushed their infrastructure HARD and it never flinched.

Modal Backend Architecture
How Modal powers the entire Director's Cut backend


🎨 Nebius AI Studio: Qwen VL + FLUX

Nebius provides lightning-fast inference for two critical features:

Qwen 2.5-VL-72B: Intelligent Subject Tracking

This is NOT center crop. We built genuine AI-powered smart cropping:

```python
import requests

# `nebius_key` and `frame` (a base64-encoded JPEG key frame) are prepared by the caller.
# For each key frame, Qwen VL detects the main subject position.
qwen_response = requests.post(
    "https://api.studio.nebius.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {nebius_key}"},
    json={
        "model": "Qwen/Qwen2.5-VL-72B-Instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}},
                {"type": "text", "text": "Find the horizontal position of the main subject. Return decimal 0.0-1.0"}
            ]
        }]
    }
)
# Result: 92% subject retention vs 40% with dumb center crop
```
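To show how that normalized position turns into an actual vertical frame, here is a minimal sketch (not the production code). It assumes the model replies with a bare number, as the prompt requests, and a 1920x1080 source:

```python
def crop_filter(subject_x: float, src_w: int = 1920, src_h: int = 1080) -> str:
    """Build an FFmpeg crop+scale filter centered on the detected subject."""
    crop_w = int(src_h * 9 / 16)             # 9:16 window, e.g. 607 px wide for 1080p
    x = int(subject_x * src_w - crop_w / 2)  # center the window on the subject
    x = max(0, min(src_w - crop_w, x))       # clamp to the frame
    return f"crop={crop_w}:{src_h}:{x}:0,scale=1080:1920"

# Assumes the reply is a bare number such as "0.62" (per the prompt above)
subject_x = float(qwen_response.json()["choices"][0]["message"]["content"].strip())
print(crop_filter(subject_x))  # e.g. crop=607:1080:886:0,scale=1080:1920
```

The resulting filter string can be passed to `ffmpeg -vf ...` per key frame, or the x positions could be interpolated between key frames for smooth panning.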

Smart Crop Demo
Qwen VL tracking subjects for intelligent 9:16 framing

FLUX: Custom Intro Image Generation

Every video gets a unique AI-generated intro card that matches its mood:

```python
import requests

response = requests.post(
    "https://api.studio.nebius.ai/v1/images/generations",
    headers={"Authorization": f"Bearer {nebius_key}"},
    json={
        "model": "black-forest-labs/flux-schnell",
        "prompt": f"High-energy social media intro, vertical 9:16, "
                  f"bold typography '{title}', vibrant neon gradients",
        "width": 1080,
        "height": 1920,
        "num_inference_steps": 4
    }
)
# Generates in < 5 seconds on Nebius
```

Mood-Matched Styles (sketched as a simple mapping after this list):

  • 🔥 Hype → Neon gradients, bold typography, TikTok energy
  • 🎬 Suspense → Cinematic noir, dramatic shadows
  • 🌿 Chill → Soft pastels, minimal aesthetic
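A minimal, illustrative version of that mapping (the exact production prompt fragments may differ, and `video_mood` is a hypothetical name for the mood coming out of the analysis step):

```python
# Illustrative mood -> prompt-style mapping fed into the FLUX prompt above
MOOD_STYLES = {
    "hype":     "vibrant neon gradients, bold typography, TikTok energy",
    "suspense": "cinematic noir, dramatic shadows, high contrast",
    "chill":    "soft pastels, minimal aesthetic",
}

style = MOOD_STYLES.get(video_mood, MOOD_STYLES["hype"])  # video_mood: assumed output of the analysis step
```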

FLUX Intro Generation
FLUX generating mood-matched intro cards via Nebius


🎙️ ElevenLabs: Professional Voiceover

Every video gets a content-aware AI voiceover for the intro:

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=elevenlabs_key)

# Gemini writes a hook based on actual video content
intro_script = "Joe Rogan just dropped some insane knowledge about AI. " \
               "This take is gonna blow your mind, check it out..."

audio = client.text_to_speech.convert(
    text=intro_script,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel - engaging, professional
    model_id="eleven_turbo_v2_5"
)
```
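A small follow-up sketch for persisting the voiceover so the assembly step can mux it under the intro. It assumes `convert()` returns either raw MP3 bytes or an iterator of byte chunks, which varies by SDK version:

```python
# Write the generated voiceover to disk for the FFmpeg/MoviePy assembly step
with open("intro_voiceover.mp3", "wb") as f:
    if isinstance(audio, (bytes, bytearray)):
        f.write(audio)        # single bytes object
    else:
        for chunk in audio:   # streamed chunks
            f.write(chunk)
```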

What Makes It Special:

  • 🧠 Scripts reference actual video content (not generic templates)
  • 🎭 Voice selection adapts to video mood
  • ⚡ Sub-2s generation time

ElevenLabs Voiceover Demo
ElevenLabs generating professional voiceover intros


🔌 MCP Server Integration

🤖 ChatGPT Integration (GPT Apps SDK)

We built a ChatGPT App using the GPT Apps SDK that turns ChatGPT into your personal video production assistant.

How It Works:

  1. Open ChatGPT
  2. Find "Director's Cut" in Apps
  3. Upload your 15-50 second clip
  4. Tell ChatGPT what you want: "Add subtitles and a hype intro"
  5. Download your polished video

This is insanely cool because ChatGPT becomes a conversational video editor. No UI to learn, no buttons to click; just describe what you want.

User: "Take this clip and make it TikTok ready"
       ↓
ChatGPT: Understands intent, calls Director's Cut MCP tools
       ↓
MCP Server: Processes video (smart crop, subtitles, music)
       ↓
ChatGPT: "Here's your viral-ready video! 🎬"

ChatGPT App Demo
ChatGPT as your personal video production assistant


🖥️ Claude Desktop MCP Server

For the full autonomous pipeline, connect Claude Desktop to our MCP server.

Option 1: Run Locally (Recommended)

Why Local? Modal cloud processing requires credits that aren't available to everyone. Running locally gives you full control and works with just API keys.

Step 1: Clone the repository

```bash
git clone https://github.com/tayyab415/directors-cut.git
cd directors-cut
```

Step 2: Install dependencies

```bash
pip install -r requirements.txt
```

Step 3: Set up environment variables

Create a .env file:

```env
# Required API Keys
GEMINI_API_KEY=your_gemini_key
NEBIUS_API_KEY=your_nebius_key
ELEVENLABS_API_KEY=your_elevenlabs_key

# Optional: Modal (for cloud processing - requires Modal credits)
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
```

Step 4: Run the MCP server

```bash
python app.py
```

Step 5: Configure Claude Desktop

Add to claude_desktop_config.json:

```json
{
  "mcpServers": {
    "directors-cut": {
      "type": "sse",
      "url": "http://localhost:7860/gradio_api/mcp/sse"
    }
  }
}
```

Step 6: Restart Claude Desktop and start creating!

Then just ask Claude:

"Process this YouTube video into a viral TikTok: https://youtube.com/watch?v=..."


Option 2: Use the Hosted Server

If you have Modal credits or just want to try the hosted version:

Add to your claude_desktop_config.json:

```json
{
  "mcpServers": {
    "directors-cut": {
      "type": "sse",
      "url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse"
    }
  }
}
```

MCP Tools Available:

| Tool | Description |
|---|---|
| `process_video` | Full pipeline: YouTube URL → viral video |
| `step1_analyze_video_mcp` | Analyze and classify video content |
| `step2_scout_hotspots_mcp` | Find viral-worthy moments |
| `step3_verify_hotspots_mcp` | Vision AI verification |
| `step4_create_plan_mcp` | Generate edit plan |
| `render_and_produce_mcp` | Render + production polish |
| `smart_crop_video` | Standalone 9:16 smart crop |
| `add_production_value` | Add intro, subtitles, music |

Claude Desktop MCP Demo
Claude Desktop orchestrating the full Director's Cut pipeline


⚡ Performance

| Metric | Value |
|---|---|
| Processing Time | 3-5 min for a 10-min video |
| Smart Crop Accuracy | 92% subject retention |
| Subtitle Accuracy | 95%+ (Whisper large-v3) |
| Cost Per Video | ~$0.15 |
| Human Editor Equivalent | $30-50/hour saved |

🛠️ Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| MCP Server | Gradio 5.x | Claude/ChatGPT integration |
| Backend Compute | Modal Labs | Video processing at scale |
| Video Analysis | Gemini 2.0 Flash | Hotspot detection, planning |
| Smart Crop | Qwen VL (Nebius) | Subject tracking |
| Intro Images | FLUX (Nebius) | Custom title cards |
| Voiceover | ElevenLabs | Professional narration |
| Subtitles | WhisperX | Word-level captions |
| Video Processing | FFmpeg + MoviePy | Rendering |

πŸ“ Project Structure

```text
directors-cut/
├── app.py                 # Main Gradio app + MCP tools
├── modal_simple.py        # Modal backend endpoints
├── src/
│   ├── scout.py           # Hotspot detection agent
│   ├── verifier.py        # Vision-based verification agent
│   ├── director.py        # Edit plan generation agent
│   ├── hands.py           # FFmpeg execution agent
│   ├── showrunner.py      # Production polish agent
│   ├── server.py          # Standalone MCP server
│   └── paths.py           # File management
├── assets/music/          # Mood-matched background tracks
│   ├── hype/
│   ├── chill/
│   └── suspense/
├── requirements.txt
└── README.md
```

🎓 What We Learned

Agent Coordination is Harder Than It Looks

Early versions had agents stepping on each other. Solution: Clear responsibility boundaries + Verifier as quality gate.

Smart Crop is a Game-Changer

Center crop loses 60% of the content. With Qwen VL doing actual subject tracking, the difference is night and day.

Modal is Insanely Good

We tried local FFmpeg first. Disaster. Modal's pre-configured containers + instant volumes saved 40+ hours of DevOps.

MCP Makes AI Actually Useful

Without MCP, this is "another AI tool." With MCP, Claude/ChatGPT become genuine creative assistants.


👥 Team One_Horizon

  • Tayyab Khan (tyb343) - Full-stack Development, Multi-agent Architecture, MCP Integration
  • Sahil Tanna (sahiltanna7) - Development & Testing, Prompt Engineering
  • Nikunj (nikunj30) - Development & Testing, MCP Integration

⚠️ Disclaimer & Responsible Use

Important Notice on Copyright and Intended Use:

Director's Cut is designed to help content creators repurpose their own content for different platforms. The intended use cases are:

✅ Legitimate Uses:

  • Creators repurposing their own YouTube content for TikTok/Reels/Shorts
  • Businesses creating short-form content from their long-form material
  • Educational content being reformatted for different audiences
  • Personal projects and creative experimentation

❌ This tool should NOT be used for:

  • Downloading and repurposing content you don't own
  • Creating content that infringes on others' copyrights
  • Removing watermarks or attribution from original creators
  • Monetizing content without proper rights or licensing

By using Director's Cut, you agree to:

  1. Only process content you have rights to use
  2. Respect copyright laws in your jurisdiction
  3. Properly attribute original creators when required
  4. Not use this tool for deceptive or harmful purposes

We are not responsible for misuse of this tool. The technology is built to empower creators, not to enable copyright infringement. Please use responsibly.


📜 License

MIT License - Build cool stuff with this, but build it ethically!


πŸ™ Acknowledgments

Massive thanks to:

  • Modal - For infrastructure that actually works and generous hackathon credits
  • Nebius - For blazing-fast Qwen VL and FLUX inference
  • ElevenLabs - For voices that sound genuinely human
  • Google Gemini - For the multimodal reasoning powering our agents
  • Anthropic & Gradio - For MCP and hosting this incredible hackathon

Built with ❀️ for content creators who refuse to let great content die in landscape format.

🚀 Try Director's Cut Now