---
title: Director's Cut
emoji: 🎬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
tags:
- mcp-server-track
- building-mcp-creative
- gradio
- modal
- elevenlabs
- gemini
- nebius
- openai
- chatgpt-app
license: mit
short_description: AI Video Editor - YouTube to Viral Shorts
---
# 🎬 **Director's Cut**
### *The autonomous multi-agent system that transforms any YouTube video into viral vertical content. Zero editing skills. Five AI agents. One click.*
<p align="center">
<a href="https://huggingface.co/spaces/tyb343/directors-cut"><img src="https://img.shields.io/badge/🚀_Live_Demo-HuggingFace-yellow?style=for-the-badge" alt="Live Demo"></a>
<a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ"><img src="https://img.shields.io/badge/📹_Demo_Video-YouTube-red?style=for-the-badge" alt="Demo Video"></a>
<a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY"><img src="https://img.shields.io/badge/💼_Social-LinkedIn-blue?style=for-the-badge" alt="Social Post"></a>
</p>
<p align="center">
<img src="https://img.shields.io/badge/MCP-Server-green?style=flat-square" alt="MCP Server">
<img src="https://img.shields.io/badge/Modal-Powered-purple?style=flat-square" alt="Modal">
<img src="https://img.shields.io/badge/Gemini_2.0-Flash-blue?style=flat-square" alt="Gemini">
<img src="https://img.shields.io/badge/FLUX-Nebius-orange?style=flat-square" alt="FLUX">
<img src="https://img.shields.io/badge/ElevenLabs-Voice-pink?style=flat-square" alt="ElevenLabs">
<img src="https://img.shields.io/badge/Qwen_VL-Smart_Crop-cyan?style=flat-square" alt="Qwen VL">
</p>
---
## 😤 **Why We Need This**
Content creators are **drowning**. They have hours of amazing landscape YouTube content sitting there, completely **worthless** on TikTok, Instagram Reels, and YouTube Shorts.
The current "solutions" are a joke:
- ❌ **Center crop** = butchers your content, cuts off 60% of what matters
- ❌ **Manual editing** = 2-3 hours per video, soul-crushing repetitive work
- ❌ **Hiring editors** = $30-50/hour, burns a hole in your pocket
- ❌ **"AI" tools** = glorified filters, no actual intelligence
**The vertical video revolution is here, and creators are being left behind.**
Every day, millions of hours of incredible content stay trapped in 16:9 format while the algorithm rewards 9:16. Something had to change.
---
## 🚀 **What We Created**
**Director's Cut** is an **autonomous multi-agent AI system** that doesn't just crop your video; it *thinks* about your video.
We built a 5-agent pipeline that:
- 🔍 **Analyzes** your entire video for viral-worthy moments
- ✅ **Verifies** clip quality using vision AI
- 🎬 **Plans** the perfect edit with pacing and transitions
- 🖐️ **Executes** with FFmpeg precision
- 🎭 **Polishes** with intros, smart crop, subtitles, and music
**One YouTube URL → Production-ready vertical content in 3-5 minutes.**
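As a sketch, the hand-off between the five agents looks like this; the stub functions are illustrative stand-ins for the real Scout, Verifier, Director, Hands, and Showrunner modules, not the actual API:

```python
# Illustrative five-agent hand-off; each stage consumes the previous stage's output.

def scout(url):
    # 1. Scan the video for candidate "hotspot" moments (stubbed).
    return [{"start": 120, "end": 150, "reason": "punchline"}]

def verifier(hotspots):
    # 2. Vision-AI quality gate: keep only clips long enough to stand alone.
    return [h for h in hotspots if h["end"] - h["start"] >= 15]

def director(verified):
    # 3. Turn verified hotspots into an ordered edit plan with transitions.
    return {"clips": verified, "transitions": ["cut"] * len(verified)}

def hands(plan):
    # 4. Execute the plan with FFmpeg (stubbed as an output filename).
    return "raw_clip.mp4"

def showrunner(raw_clip):
    # 5. Production polish: smart crop, intro, subtitles, music.
    return raw_clip.replace("raw", "final")

def run_pipeline(youtube_url: str) -> str:
    return showrunner(hands(director(verifier(scout(youtube_url)))))
```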
---
## 📺 **Demo Video**
<p align="center">
<a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ">
<img src="https://img.youtube.com/vi/h8U5oW4UIVQ/hqdefault.jpg" width="600" alt="Demo Video Thumbnail"/>
</a>
</p>
<p align="center">
<strong>▶️ <a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ">Click to Watch Full Demo on YouTube</a></strong>
</p>
---
## 💼 **Social Media Post**
<p align="center">
<a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY">
<img src="./resources/linkedin-post.jpg" width="600" alt="LinkedIn Post"/>
</a>
</p>
<p align="center">
<strong>📄 <a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY">Read the Full Post on LinkedIn</a></strong>
</p>
---
## πŸ—οΈ **Architecture 1: Creating Viral Clips from Long-Form Content**
The first half of our pipeline takes a massive YouTube video and identifies the **golden moments** worth sharing.
<p align="center">
<img src="./resources/diagram1.png" width="700" alt="Architecture 1 - Clip Generation Pipeline">
<br>
<em>Scout → Verifier → Director → Hands: From YouTube URL to raw compiled clip</em>
</p>
### **Gradio App - Clip Generation Interface**
<p align="center">
<img src="./resources/gradio-clip.gif" width="700" alt="Gradio Clip Generation Demo">
<br>
<em>The Gradio interface for generating clips from YouTube URLs</em>
</p>
---
## πŸ—οΈ **Architecture 2: Production Polish & Refinement**
The second half takes that raw clip and transforms it into **viral-ready vertical content**.
<p align="center">
<img src="./resources/diagram2.png" width="700" alt="Architecture 2 - Production Polish Pipeline">
<br>
<em>Showrunner: Smart Crop → Intro → Subtitles → Final Assembly</em>
</p>
### **Gradio App - Production Studio Interface**
<p align="center">
<img src="./resources/gradio2-productionstudio.gif" width="700" alt="Gradio Production Studio Demo">
<br>
<em>The Production Studio for adding polish to your clips</em>
</p>
---
## 🤝 **Partner Technologies**
### **🚀 Modal: The Backend Powerhouse**
Modal isn't just part of our stack; it **IS** our stack. Without Modal, this project would've been impossible.
**Why Modal Changed Everything:**
When you're processing videos, you need:
- 50-500MB file uploads that don't timeout
- FFmpeg with all codecs pre-installed
- GPU compute for Whisper transcription
- Parallel processing without infrastructure management
- Pay-per-use so you don't burn money on idle servers
**Modal delivered ALL of this out of the box.**
```python
import modal

app = modal.App("directors-cut")

# FFmpeg, ImageMagick, and fonts baked into the container image
base_image = modal.Image.debian_slim().apt_install("ffmpeg", "imagemagick")
storage_volume = modal.Volume.from_name("dc-storage", create_if_missing=True)
STORAGE_PATH = "/storage"

@app.function(
    image=base_image,                        # no Docker, no dependency hell
    volumes={STORAGE_PATH: storage_volume},  # instant file transfers
    timeout=3600,
    memory=32768,                            # 32 GB RAM for video processing
    cpu=8.0,
)
@modal.web_endpoint(method="POST")
def process_video(request: dict):
    # This just works: no K8s, no DevOps nightmares. Files move at
    # lightning speed via Modal volumes, and the app scales to zero
    # when idle, so we only pay while processing.
    ...
```
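Calling the deployed endpoint from a client might look like this; the URL is a placeholder for the one Modal prints at deploy time, and the payload keys are assumptions about what `process_video` reads:

```python
# Hypothetical client-side call to the Modal web endpoint.
MODAL_ENDPOINT = "https://your-workspace--process-video.modal.run"  # placeholder

def build_payload(youtube_url: str, mood: str = "hype") -> dict:
    # Mirror the dict shape process_video(request: dict) expects server-side.
    return {"youtube_url": youtube_url, "mood": mood}

def submit(youtube_url: str, mood: str = "hype") -> dict:
    import requests  # deferred so build_payload works without requests installed
    resp = requests.post(MODAL_ENDPOINT, json=build_payload(youtube_url, mood), timeout=3600)
    resp.raise_for_status()
    return resp.json()
```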
**The Impact:**
| Before Modal | With Modal |
|--------------|------------|
| 45min upload times | **< 30s** file transfers |
| Docker dependency hell | **Zero config** FFmpeg |
| $200/month idle servers | **Pay only when processing** |
| Manual scaling | **Auto-scales to demand** |
**Huge thanks to Modal for the generous credits that made this possible.** We pushed their infrastructure HARD and it never flinched.
<p align="center">
<img src="./resources/modal-arch.png" width="700" alt="Modal Backend Architecture">
<br>
<em>How Modal powers the entire Director's Cut backend</em>
</p>
---
### **🎨 Nebius AI Studio: Qwen VL + FLUX**
Nebius provides lightning-fast inference for two critical features:
#### **Qwen 2.5-VL-72B: Intelligent Subject Tracking**
This is **NOT center crop**. We built genuine AI-powered smart cropping:
```python
import base64
import requests

# For each key frame, ask Qwen VL for the main subject's horizontal position
def subject_position(frame_jpeg: bytes, nebius_key: str) -> float:
    frame_b64 = base64.b64encode(frame_jpeg).decode()
    qwen_response = requests.post(
        "https://api.studio.nebius.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {nebius_key}"},
        json={
            "model": "Qwen/Qwen2.5-VL-72B-Instruct",
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
                    {"type": "text", "text": "Find the horizontal position of the main subject. Return a decimal 0.0-1.0"},
                ],
            }],
        },
        timeout=60,
    )
    qwen_response.raise_for_status()
    return float(qwen_response.json()["choices"][0]["message"]["content"].strip())

# Result: 92% subject retention vs 40% with dumb center crop
```
<p align="center">
<img src="./resources/smartcrop.gif" width="600" alt="Smart Crop Demo">
<br>
<em>Qwen VL tracking subjects for intelligent 9:16 framing</em>
</p>
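Once Qwen VL has returned per-keyframe subject positions, they still have to become a stable 9:16 crop window. A minimal sketch of that step, with illustrative smoothing constants and helper names (not the actual module API):

```python
# Turn per-keyframe subject positions (0.0-1.0) into smoothed crop x offsets.
def crop_x_positions(subject_xs, src_w=1920, src_h=1080, alpha=0.3):
    crop_w = int(src_h * 9 / 16)      # 607 px wide for a 1080p source
    smoothed = subject_xs[0]
    offsets = []
    for x in subject_xs:
        smoothed = alpha * x + (1 - alpha) * smoothed   # damp frame-to-frame jitter
        center = smoothed * src_w
        # Clamp so the crop window never leaves the frame.
        left = min(max(center - crop_w / 2, 0), src_w - crop_w)
        offsets.append(int(left))
    return offsets

# Each offset can then drive ffmpeg's crop filter per segment,
# e.g. crop=607:1080:<x>:0
```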
#### **FLUX: Custom Intro Image Generation**
Every video gets a **unique AI-generated intro card** that matches its mood:
```python
import requests

def generate_intro_image(title: str, nebius_key: str) -> dict:
    response = requests.post(
        "https://api.studio.nebius.ai/v1/images/generations",
        headers={"Authorization": f"Bearer {nebius_key}"},
        json={
            "model": "black-forest-labs/flux-schnell",
            "prompt": f"High-energy social media intro, vertical 9:16, "
                      f"bold typography '{title}', vibrant neon gradients",
            "width": 1080,
            "height": 1920,
            "num_inference_steps": 4,
        },
        timeout=60,
    )
    response.raise_for_status()
    # Generates in < 5 seconds on Nebius; the image comes back in the
    # OpenAI-compatible response body (URL or base64, per response_format)
    return response.json()["data"][0]
```
**Mood-Matched Styles:**
- 🔥 **Hype** → Neon gradients, bold typography, TikTok energy
- 🎬 **Suspense** → Cinematic noir, dramatic shadows
- 🌿 **Chill** → Soft pastels, minimal aesthetic
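The mood-to-prompt mapping can be sketched as a simple lookup; the exact style strings here are illustrative, not the production prompts:

```python
# Illustrative mood-to-style lookup used to build the FLUX prompt.
MOOD_STYLES = {
    "hype": "vibrant neon gradients, bold typography, high-energy TikTok aesthetic",
    "suspense": "cinematic noir, dramatic shadows, moody lighting",
    "chill": "soft pastels, minimal clean aesthetic, calm composition",
}

def build_intro_prompt(title: str, mood: str) -> str:
    style = MOOD_STYLES.get(mood, MOOD_STYLES["hype"])  # fall back to hype
    return f"Social media intro card, vertical 9:16, bold title text '{title}', {style}"
```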
<p align="center">
<img src="./resources/intro-example.png" width="600" alt="FLUX Intro Generation">
<br>
<em>FLUX generating mood-matched intro cards via Nebius</em>
</p>
---
### **🎙️ ElevenLabs: Professional Voiceover**
Every video gets a **content-aware AI voiceover** for the intro:
```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=elevenlabs_key)

# Gemini writes a hook based on the actual video content
intro_script = (
    "Joe Rogan just dropped some insane knowledge about AI. "
    "This take is gonna blow your mind, check it out..."
)

audio = client.text_to_speech.convert(
    text=intro_script,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel: engaging, professional
    model_id="eleven_turbo_v2_5",
)

# convert() streams MP3 bytes; write the chunks to a file
with open("intro_voiceover.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```
**What Makes It Special:**
- 🧠 Scripts reference actual video content (not generic templates)
- 🎭 Voice selection adapts to video mood
- ⚡ Sub-2s generation time
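Mood-adaptive voice selection can be sketched as a lookup as well; Rachel's ID appears in the snippet above, but the other IDs are placeholders, not real ElevenLabs voice IDs:

```python
# Sketch of mood-adaptive voice selection; only Rachel's ID is real.
VOICE_BY_MOOD = {
    "hype": "21m00Tcm4TlvDq8ikWAM",    # Rachel: engaging, professional
    "suspense": "<deep-narrator-id>",  # placeholder
    "chill": "<soft-calm-id>",         # placeholder
}

def pick_voice(mood: str) -> str:
    # Unknown moods fall back to the default hype voice.
    return VOICE_BY_MOOD.get(mood, VOICE_BY_MOOD["hype"])
```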
<p align="center">
<img src="./resources/intro-example.png" width="600" alt="ElevenLabs Voiceover Demo">
<br>
<em>ElevenLabs generating professional voiceover intros</em>
</p>
---
## 🔌 **MCP Server Integration**
### **🤖 ChatGPT Integration (GPT Apps SDK)**
We built a **ChatGPT App** using the GPT Apps SDK that turns ChatGPT into your personal video production assistant.
**How It Works:**
1. Open ChatGPT
2. Find "Director's Cut" in Apps
3. Upload your 15-50 second clip
4. Tell ChatGPT what you want: *"Add subtitles and a hype intro"*
5. Download your polished video
**This is insanely cool** because ChatGPT becomes a conversational video editor. No UI to learn, no buttons to click; just describe what you want.
```
User: "Take this clip and make it TikTok ready"
↓
ChatGPT: Understands intent, calls Director's Cut MCP tools
↓
MCP Server: Processes video (smart crop, subtitles, music)
↓
ChatGPT: "Here's your viral-ready video! 🎬"
```
<p align="center">
<img src="./resources/gifgpt1.gif" width="600" alt="ChatGPT App Demo">
<br>
<em>ChatGPT as your personal video production assistant</em>
</p>
---
### **🖥️ Claude Desktop MCP Server**
For the full autonomous pipeline, connect Claude Desktop to our MCP server.
#### **Option 1: Run Locally (Recommended)**
> **Why Local?** Modal cloud processing requires credits that aren't available to everyone. Running locally gives you full control and works with just API keys.
**Step 1: Clone the repository**
```bash
git clone https://github.com/tayyab415/directors-cut.git
cd directors-cut
```
**Step 2: Install dependencies**
```bash
pip install -r requirements.txt
```
**Step 3: Set up environment variables**
Create a `.env` file:
```env
# Required API Keys
GEMINI_API_KEY=your_gemini_key
NEBIUS_API_KEY=your_nebius_key
ELEVENLABS_API_KEY=your_elevenlabs_key
# Optional: Modal (for cloud processing - requires Modal credits)
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
```
**Step 4: Run the MCP server**
```bash
python app.py
```
**Step 5: Configure Claude Desktop**
Add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"directors-cut": {
"type": "sse",
"url": "http://localhost:7860/gradio_api/mcp/sse"
}
}
}
```
**Step 6: Restart Claude Desktop and start creating!**
Then just ask Claude:
> *"Process this YouTube video into a viral TikTok: https://youtube.com/watch?v=..."*
---
#### **Option 2: Use the Hosted Server**
If you have Modal credits or just want to try the hosted version:
Add to your `claude_desktop_config.json`:
```json
{
"mcpServers": {
"directors-cut": {
"type": "sse",
"url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse"
}
}
}
```
#### **MCP Tools Available:**
| Tool | Description |
|------|-------------|
| `process_video` | Full pipeline: YouTube URL → Viral video |
| `step1_analyze_video_mcp` | Analyze and classify video content |
| `step2_scout_hotspots_mcp` | Find viral-worthy moments |
| `step3_verify_hotspots_mcp` | Vision AI verification |
| `step4_create_plan_mcp` | Generate edit plan |
| `render_and_produce_mcp` | Render + production polish |
| `smart_crop_video` | Standalone 9:16 smart crop |
| `add_production_value` | Add intro, subtitles, music |
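Under the hood, tools like these boil down to a name-to-handler dispatch; a minimal sketch of that pattern (the stub handler and registry are illustrative, not the actual server code, which lets Gradio do the exposing):

```python
# Illustrative MCP tool registry; the real server registers Gradio endpoints.
def smart_crop_video(path: str) -> str:
    # Stub: the real tool runs the Qwen VL smart-crop pipeline.
    return path.replace(".mp4", "_9x16.mp4")

TOOLS = {
    "smart_crop_video": smart_crop_video,
    # the remaining tools from the table register the same way
}

def call_tool(name: str, *args):
    if name not in TOOLS:
        raise KeyError(f"Unknown MCP tool: {name}")
    return TOOLS[name](*args)
```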
<p align="center">
<img src="./resources/claudegif.gif" width="600" alt="Claude Desktop MCP Demo">
<br>
<em>Claude Desktop orchestrating the full Director's Cut pipeline</em>
</p>
---
## ⚡ **Performance**
| Metric | Value |
|--------|-------|
| **Processing Time** | 3-5 min for 10-min video |
| **Smart Crop Accuracy** | 92% subject retention |
| **Subtitle Accuracy** | 95%+ (Whisper large-v3) |
| **Cost Per Video** | ~$0.15 |
| **Human Editor Equivalent** | $30-50/hour saved |
---
## 🛠️ **Tech Stack**
| Component | Technology | Purpose |
|-----------|------------|---------|
| **MCP Server** | Gradio 6.x | Claude/ChatGPT integration |
| **Backend Compute** | Modal Labs | Video processing at scale |
| **Video Analysis** | Gemini 2.0 Flash | Hotspot detection, planning |
| **Smart Crop** | Qwen VL (Nebius) | Subject tracking |
| **Intro Images** | FLUX (Nebius) | Custom title cards |
| **Voiceover** | ElevenLabs | Professional narration |
| **Subtitles** | WhisperX | Word-level captions |
| **Video Processing** | FFmpeg + MoviePy | Rendering |
---
## 📁 **Project Structure**
```
directors-cut/
├── app.py                 # Main Gradio app + MCP tools
├── modal_simple.py        # Modal backend endpoints
├── src/
│   ├── scout.py           # Hotspot detection agent
│   ├── verifier.py        # Vision-based verification agent
│   ├── director.py        # Edit plan generation agent
│   ├── hands.py           # FFmpeg execution agent
│   ├── showrunner.py      # Production polish agent
│   ├── server.py          # Standalone MCP server
│   └── paths.py           # File management
├── assets/music/          # Mood-matched background tracks
│   ├── hype/
│   ├── chill/
│   └── suspense/
├── requirements.txt
└── README.md
```
---
## 🎓 **What We Learned**
### **Agent Coordination is Harder Than It Looks**
Early versions had agents stepping on each other. The solution: clear responsibility boundaries, with the Verifier acting as a quality gate.
### **Smart Crop is a Game-Changer**
Center crop loses 60% of the content. With Qwen VL doing actual subject tracking, the difference is night and day.
### **Modal is Insanely Good**
We tried local FFmpeg first. Disaster. Modal's pre-configured containers + instant volumes saved 40+ hours of DevOps.
### **MCP Makes AI Actually Useful**
Without MCP, this is "another AI tool." With MCP, Claude/ChatGPT become genuine creative assistants.
---
## 👥 **Team One_Horizon**
- **Tayyab Khan** ([tyb343](https://huggingface.co/tyb343)): Full-stack Development, Multi-agent Architecture, MCP Integration
- **Sahil Tanna** ([sahiltanna7](https://huggingface.co/sahiltanna7)): Development & Testing, Prompt Engineering
- **Nikunj** ([nikunj30](https://huggingface.co/nikunj30)): Development & Testing, MCP Integration
---
## ⚠️ **Disclaimer & Responsible Use**
**Important Notice on Copyright and Intended Use:**
Director's Cut is designed to help **content creators repurpose their own content** for different platforms. The intended use cases are:
✅ **Legitimate Uses:**
- Creators repurposing their own YouTube content for TikTok/Reels/Shorts
- Businesses creating short-form content from their long-form material
- Educational content being reformatted for different audiences
- Personal projects and creative experimentation
❌ **This tool should NOT be used for:**
- Downloading and repurposing content you don't own
- Creating content that infringes on others' copyrights
- Removing watermarks or attribution from original creators
- Monetizing content without proper rights or licensing
**By using Director's Cut, you agree to:**
1. Only process content you have rights to use
2. Respect copyright laws in your jurisdiction
3. Properly attribute original creators when required
4. Not use this tool for deceptive or harmful purposes
**We are not responsible for misuse of this tool.** The technology is built to empower creators, not to enable copyright infringement. Please use responsibly.
---
## 📜 **License**
MIT License - Build cool stuff with this, but build it ethically!
---
## 🙏 **Acknowledgments**
Massive thanks to:
- **Modal**: For infrastructure that actually works and generous hackathon credits
- **Nebius**: For blazing-fast Qwen VL and FLUX inference
- **ElevenLabs**: For voices that sound genuinely human
- **Google Gemini**: For the multimodal reasoning powering our agents
- **Anthropic & Gradio**: For MCP and hosting this incredible hackathon
---
<p align="center">
<b>Built with ❀️ for content creators who refuse to let great content die in landscape format.</b>
</p>
<p align="center">
<a href="https://huggingface.co/spaces/tyb343/directors-cut">🚀 Try Director's Cut Now</a>
</p>