Spaces:

tyb343
/

directors-cut

Running

App Files Files Community

directors-cut / README.md

tayyab415

Update Team One_Horizon + Add README as first tab

27aa36e about 1 month ago

preview code

raw

history blame contribute delete

18.2 kB

A newer version of the Gradio SDK is available: 6.2.0

Upgrade

metadata

title: Director's Cut
emoji: 🎬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
tags:
  - mcp-server-track
  - building-mcp-creative
  - gradio
  - modal
  - elevenlabs
  - gemini
  - nebius
  - openai
  - chatgpt-app
license: mit
short_description: AI Video Editor - YouTube to Viral Shorts

🎬 Director's Cut

The autonomous multi-agent system that transforms any YouTube video into viral vertical content. Zero editing skills. Five AI agents. One click.

😤 Why We Need This

Content creators are drowning. They have hours of amazing landscape YouTube content sitting there—completely worthless on TikTok, Instagram Reels, and YouTube Shorts.

The current "solutions" are a joke:

❌ Center crop = butchers your content, cuts off 60% of what matters
❌ Manual editing = 2-3 hours per video, soul-crushing repetitive work
❌ Hiring editors = $30-50/hour, burns a hole in your pocket
❌ "AI" tools = glorified filters, no actual intelligence

The vertical video revolution is here, and creators are being left behind.

Every day, millions of hours of incredible content stay trapped in 16:9 format while the algorithm rewards 9:16. Something had to change.

🚀 What We Created

Director's Cut is an autonomous multi-agent AI system that doesn't just crop your video—it thinks about your video.

We built a 5-agent pipeline that:

🔍 Analyzes your entire video for viral-worthy moments
✅ Verifies clip quality using vision AI
🎬 Plans the perfect edit with pacing and transitions
🖐️ Executes with FFmpeg precision
🎭 Polishes with intros, smart crop, subtitles, and music

One YouTube URL → Production-ready vertical content in 3-5 minutes.

📺 Demo Video

▶️ Click to Watch Full Demo on YouTube

💼 Social Media Post

📄 Read the Full Post on LinkedIn

🏗️ Architecture 1: Creating Viral Clips from Long-Form Content

The first half of our pipeline takes a massive YouTube video and identifies the golden moments worth sharing.

Architecture 1 - Clip Generation Pipeline
Scout → Verifier → Director → Hands: From YouTube URL to raw compiled clip

Gradio App - Clip Generation Interface

Gradio Clip Generation Demo
The Gradio interface for generating clips from YouTube URLs

🏗️ Architecture 2: Production Polish & Refinement

The second half takes that raw clip and transforms it into viral-ready vertical content.

Architecture 2 - Production Polish Pipeline
Showrunner: Smart Crop → Intro → Subtitles → Final Assembly

Gradio App - Production Studio Interface

Gradio Production Studio Demo
The Production Studio for adding polish to your clips

🤝 Partner Technologies

🚀 Modal — The Backend Powerhouse

Modal isn't just part of our stack—it IS our stack. Without Modal, this project would've been impossible.

Why Modal Changed Everything:

When you're processing videos, you need:

50-500MB file uploads that don't timeout
FFmpeg with all codecs pre-installed
GPU compute for Whisper transcription
Parallel processing without infrastructure management
Pay-per-use so you don't burn money on idle servers

Modal delivered ALL of this out of the box.

@app.function(
    image=base_image,  # FFmpeg, ImageMagick, fonts pre-installed
    volumes={STORAGE_PATH: storage_volume},  # Instant file transfers
    timeout=3600,
    memory=32768,  # 32GB RAM for video processing
    cpu=8.0
)
@modal.web_endpoint(method="POST")
def process_video(request: dict):
    # This just works. No Docker. No K8s. No DevOps nightmares.
    # Files transfer at lightning speed via Modal volumes.
    # Scales to zero when idle—we only pay when processing.

The Impact:

Before Modal	With Modal
45min upload times	< 30s file transfers
Docker dependency hell	Zero config FFmpeg
$200/month idle servers	Pay only when processing
Manual scaling	Auto-scales to demand

Huge thanks to Modal for the generous credits that made this possible. We pushed their infrastructure HARD and it never flinched.

Modal Backend Architecture
How Modal powers the entire Director's Cut backend

🎨 Nebius AI Studio — Qwen VL + FLUX

Nebius provides lightning-fast inference for two critical features:

Qwen 2.5-VL-72B — Intelligent Subject Tracking

This is NOT center crop. We built genuine AI-powered smart cropping:

# For each key frame, Qwen VL detects the main subject position
qwen_response = requests.post(
    "https://api.studio.nebius.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {nebius_key}"},
    json={
        "model": "Qwen/Qwen2.5-VL-72B-Instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}},
                {"type": "text", "text": "Find the horizontal position of the main subject. Return decimal 0.0-1.0"}
            ]
        }]
    }
)
# Result: 92% subject retention vs 40% with dumb center crop

Smart Crop Demo
Qwen VL tracking subjects for intelligent 9:16 framing

FLUX — Custom Intro Image Generation

Every video gets a unique AI-generated intro card that matches its mood:

response = requests.post(
    "https://api.studio.nebius.ai/v1/images/generations",
    headers={"Authorization": f"Bearer {nebius_key}"},
    json={
        "model": "black-forest-labs/flux-schnell",
        "prompt": f"High-energy social media intro, vertical 9:16, "
                  f"bold typography '{title}', vibrant neon gradients",
        "width": 1080,
        "height": 1920,
        "num_inference_steps": 4
    }
)
# Generates in < 5 seconds on Nebius

Mood-Matched Styles:

🔥 Hype → Neon gradients, bold typography, TikTok energy
🎬 Suspense → Cinematic noir, dramatic shadows
🌿 Chill → Soft pastels, minimal aesthetic

FLUX Intro Generation
FLUX generating mood-matched intro cards via Nebius

🎙️ ElevenLabs — Professional Voiceover

Every video gets a content-aware AI voiceover for the intro:

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=elevenlabs_key)

# Gemini writes a hook based on actual video content
intro_script = "Joe Rogan just dropped some insane knowledge about AI. " \
               "This take is gonna blow your mind, check it out..."

audio = client.text_to_speech.convert(
    text=intro_script,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel - engaging, professional
    model_id="eleven_turbo_v2_5"
)

What Makes It Special:

🧠 Scripts reference actual video content (not generic templates)
🎭 Voice selection adapts to video mood
⚡ Sub-2s generation time

ElevenLabs Voiceover Demo
ElevenLabs generating professional voiceover intros

🔌 MCP Server Integration

🤖 ChatGPT Integration (GPT Apps SDK)

We built a ChatGPT App using the GPT Apps SDK that turns ChatGPT into your personal video production assistant.

How It Works:

Open ChatGPT
Find "Director's Cut" in Apps
Upload your 15-50 second clip
Tell ChatGPT what you want: "Add subtitles and a hype intro"
Download your polished video

This is insanely cool because ChatGPT becomes a conversational video editor. No UI to learn, no buttons to click—just describe what you want.

User: "Take this clip and make it TikTok ready"
       ↓
ChatGPT: Understands intent, calls Director's Cut MCP tools
       ↓
MCP Server: Processes video (smart crop, subtitles, music)
       ↓
ChatGPT: "Here's your viral-ready video! 🎬"

ChatGPT App Demo
ChatGPT as your personal video production assistant

🖥️ Claude Desktop MCP Server

For the full autonomous pipeline, connect Claude Desktop to our MCP server.

Option 1: Run Locally (Recommended)

Why Local? Modal cloud processing requires credits that aren't available to everyone. Running locally gives you full control and works with just API keys.

Step 1: Clone the repository

git clone https://github.com/tayyab415/directors-cut.git
cd directors-cut

Step 2: Install dependencies

pip install -r requirements.txt

Step 3: Set up environment variables

Create a .env file:

# Required API Keys
GEMINI_API_KEY=your_gemini_key
NEBIUS_API_KEY=your_nebius_key
ELEVENLABS_API_KEY=your_elevenlabs_key

# Optional: Modal (for cloud processing - requires Modal credits)
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret

Step 4: Run the MCP server

python app.py

Step 5: Configure Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "directors-cut": {
      "type": "sse",
      "url": "http://localhost:7860/gradio_api/mcp/sse"
    }
  }
}

Step 6: Restart Claude Desktop and start creating!

Then just ask Claude:

"Process this YouTube video into a viral TikTok: https://youtube.com/watch?v=..."

Option 2: Use the Hosted Server

If you have Modal credits or just want to try the hosted version:

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "directors-cut": {
      "type": "sse",
      "url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse"
    }
  }
}

MCP Tools Available:

Tool	Description
`process_video`	Full pipeline: YouTube URL → Viral video
`step1_analyze_video_mcp`	Analyze and classify video content
`step2_scout_hotspots_mcp`	Find viral-worthy moments
`step3_verify_hotspots_mcp`	Vision AI verification
`step4_create_plan_mcp`	Generate edit plan
`render_and_produce_mcp`	Render + production polish
`smart_crop_video`	Standalone 9:16 smart crop
`add_production_value`	Add intro, subtitles, music

Claude Desktop MCP Demo
Claude Desktop orchestrating the full Director's Cut pipeline

⚡ Performance

Metric	Value
Processing Time	3-5 min for 10-min video
Smart Crop Accuracy	92% subject retention
Subtitle Accuracy	95%+ (Whisper large-v3)
Cost Per Video	~$0.15
Human Editor Equivalent	$30-50/hour saved

🛠️ Tech Stack

Component	Technology	Purpose
MCP Server	Gradio 5.x	Claude/ChatGPT integration
Backend Compute	Modal Labs	Video processing at scale
Video Analysis	Gemini 2.0 Flash	Hotspot detection, planning
Smart Crop	Qwen VL (Nebius)	Subject tracking
Intro Images	FLUX (Nebius)	Custom title cards
Voiceover	ElevenLabs	Professional narration
Subtitles	WhisperX	Word-level captions
Video Processing	FFmpeg + MoviePy	Rendering

📁 Project Structure

directors-cut/
├── app.py                 # Main Gradio app + MCP tools
├── modal_simple.py        # Modal backend endpoints
├── src/
│   ├── scout.py           # Hotspot detection agent
│   ├── verifier.py        # Vision-based verification agent
│   ├── director.py        # Edit plan generation agent
│   ├── hands.py           # FFmpeg execution agent
│   ├── showrunner.py      # Production polish agent
│   ├── server.py          # Standalone MCP server
│   └── paths.py           # File management
├── assets/music/          # Mood-matched background tracks
│   ├── hype/
│   ├── chill/
│   └── suspense/
├── requirements.txt
└── README.md

🎓 What We Learned

Agent Coordination is Harder Than It Looks

Early versions had agents stepping on each other. Solution: Clear responsibility boundaries + Verifier as quality gate.

Smart Crop is a Game-Changer

Center crop loses 60% of content. Using Qwen VL for actual subject tracking—the difference is night and day.

Modal is Insanely Good

We tried local FFmpeg first. Disaster. Modal's pre-configured containers + instant volumes saved 40+ hours of DevOps.

MCP Makes AI Actually Useful

Without MCP, this is "another AI tool." With MCP, Claude/ChatGPT become genuine creative assistants.

👥 Team One_Horizon

Tayyab Khan (tyb343) — Full-stack Development, Multi-agent Architecture, MCP Integration
Sahil Tanna (sahiltanna7) — Development & Testing, Prompt Engineering
Nikunj (nikunj30) — Development & Testing, MCP Integration

⚠️ Disclaimer & Responsible Use

Important Notice on Copyright and Intended Use:

Director's Cut is designed to help content creators repurpose their own content for different platforms. The intended use cases are:

✅ Legitimate Uses:

Creators repurposing their own YouTube content for TikTok/Reels/Shorts
Businesses creating short-form content from their long-form material
Educational content being reformatted for different audiences
Personal projects and creative experimentation

❌ This tool should NOT be used for:

Downloading and repurposing content you don't own
Creating content that infringes on others' copyrights
Removing watermarks or attribution from original creators
Monetizing content without proper rights or licensing

By using Director's Cut, you agree to:

Only process content you have rights to use
Respect copyright laws in your jurisdiction
Properly attribute original creators when required
Not use this tool for deceptive or harmful purposes

We are not responsible for misuse of this tool. The technology is built to empower creators, not to enable copyright infringement. Please use responsibly.

📜 License

MIT License - Build cool stuff with this, but build it ethically!

🙏 Acknowledgments

Massive thanks to:

Modal — For infrastructure that actually works and generous hackathon credits
Nebius — For blazing-fast Qwen VL and FLUX inference
ElevenLabs — For voices that sound genuinely human
Google Gemini — For the multimodal reasoning powering our agents
Anthropic & Gradio — For MCP and hosting this incredible hackathon

Built with ❤️ for content creators who refuse to let great content die in landscape format.

🚀 Try Director's Cut Now