---
title: Director's Cut
emoji: 🎬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
tags:
  - mcp-server-track
  - building-mcp-creative
  - gradio
  - modal
  - elevenlabs
  - gemini
  - nebius
  - openai
  - chatgpt-app
license: mit
short_description: AI Video Editor - YouTube to Viral Shorts
---
# 🎬 **Director's Cut**

### *The autonomous multi-agent system that transforms any YouTube video into viral vertical content. Zero editing skills. Five AI agents. One click.*

<p align="center">
  <a href="https://huggingface.co/spaces/tyb343/directors-cut"><img src="https://img.shields.io/badge/🚀_Live_Demo-HuggingFace-yellow?style=for-the-badge" alt="Live Demo"></a>
  <a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ"><img src="https://img.shields.io/badge/📹_Demo_Video-YouTube-red?style=for-the-badge" alt="Demo Video"></a>
  <a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY"><img src="https://img.shields.io/badge/💼_Social-LinkedIn-blue?style=for-the-badge" alt="Social Post"></a>
</p>

<p align="center">
  <img src="https://img.shields.io/badge/MCP-Server-green?style=flat-square" alt="MCP Server">
  <img src="https://img.shields.io/badge/Modal-Powered-purple?style=flat-square" alt="Modal">
  <img src="https://img.shields.io/badge/Gemini_2.0-Flash-blue?style=flat-square" alt="Gemini">
  <img src="https://img.shields.io/badge/FLUX-Nebius-orange?style=flat-square" alt="FLUX">
  <img src="https://img.shields.io/badge/ElevenLabs-Voice-pink?style=flat-square" alt="ElevenLabs">
  <img src="https://img.shields.io/badge/Qwen_VL-Smart_Crop-cyan?style=flat-square" alt="Qwen VL">
</p>
| --- | |
| ## π€ **Why We Need This** | |
| Content creators are **drowning**. They have hours of amazing landscape YouTube content sitting thereβcompletely **worthless** on TikTok, Instagram Reels, and YouTube Shorts. | |
| The current "solutions" are a joke: | |
| - β **Center crop** = butchers your content, cuts off 60% of what matters | |
| - β **Manual editing** = 2-3 hours per video, soul-crushing repetitive work | |
| - β **Hiring editors** = $30-50/hour, burns a hole in your pocket | |
| - β **"AI" tools** = glorified filters, no actual intelligence | |
| **The vertical video revolution is here, and creators are being left behind.** | |
| Every day, millions of hours of incredible content stay trapped in 16:9 format while the algorithm rewards 9:16. Something had to change. | |
| --- | |
| ## π **What We Created** | |
| **Director's Cut** is an **autonomous multi-agent AI system** that doesn't just crop your videoβit *thinks* about your video. | |
| We built a 5-agent pipeline that: | |
| - π **Analyzes** your entire video for viral-worthy moments | |
| - β **Verifies** clip quality using vision AI | |
| - π¬ **Plans** the perfect edit with pacing and transitions | |
| - ποΈ **Executes** with FFmpeg precision | |
| - π **Polishes** with intros, smart crop, subtitles, and music | |
| **One YouTube URL β Production-ready vertical content in 3-5 minutes.** | |
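The hand-off between the five agents is strictly linear. As a minimal sketch of that flow, the stubs below just record which stage ran (the function names mirror the agent roles above, but they are illustrative, not the actual module API in `src/`):

```python
# Illustrative 5-agent hand-off; each stub records its stage so the
# ordering is visible. The real agents live in src/ and do the work.
def scout(url):        return {"url": url, "stages": ["scout"]}            # find hotspots
def verifier(state):   state["stages"].append("verifier");   return state  # vision QA
def director(state):   state["stages"].append("director");   return state  # edit plan
def hands(state):      state["stages"].append("hands");      return state  # FFmpeg cuts
def showrunner(state): state["stages"].append("showrunner"); return state  # polish

def run_pipeline(youtube_url):
    return showrunner(hands(director(verifier(scout(youtube_url)))))

print(run_pipeline("https://youtube.com/watch?v=...")["stages"])
# ['scout', 'verifier', 'director', 'hands', 'showrunner']
```

Each stage only consumes the previous stage's output, which is what lets the Verifier act as a hard quality gate between Scout and Director.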
| --- | |
| ## πΊ **Demo Video** | |
| <p align="center"> | |
| <a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ"> | |
| <img src="https://img.youtube.com/vi/h8U5oW4UIVQ/hqdefault.jpg" width="600" alt="Demo Video Thumbnail"/> | |
| </a> | |
| </p> | |
| <p align="center"> | |
| <strong>βΆοΈ <a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ">Click to Watch Full Demo on YouTube</a></strong> | |
| </p> | |
| --- | |
| ## πΌ **Social Media Post** | |
| <p align="center"> | |
| <a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY"> | |
| <img src="./resources/linkedin-post.jpg" width="600" alt="LinkedIn Post"/> | |
| </a> | |
| </p> | |
| <p align="center"> | |
| <strong>π <a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY">Read the Full Post on LinkedIn</a></strong> | |
| </p> | |
| --- | |
| ## ποΈ **Architecture 1: Creating Viral Clips from Long-Form Content** | |
| The first half of our pipeline takes a massive YouTube video and identifies the **golden moments** worth sharing. | |
| <p align="center"> | |
| <img src="./resources/diagram1.png" width="700" alt="Architecture 1 - Clip Generation Pipeline"> | |
| <br> | |
| <em>Scout β Verifier β Director β Hands: From YouTube URL to raw compiled clip</em> | |
| </p> | |
| ### **Gradio App - Clip Generation Interface** | |
| <p align="center"> | |
| <img src="./resources/gradio-clip.gif" width="700" alt="Gradio Clip Generation Demo"> | |
| <br> | |
| <em>The Gradio interface for generating clips from YouTube URLs</em> | |
| </p> | |
| --- | |
| ## ποΈ **Architecture 2: Production Polish & Refinement** | |
| The second half takes that raw clip and transforms it into **viral-ready vertical content**. | |
| <p align="center"> | |
| <img src="./resources/diagram2.png" width="700" alt="Architecture 2 - Production Polish Pipeline"> | |
| <br> | |
| <em>Showrunner: Smart Crop β Intro β Subtitles β Final Assembly</em> | |
| </p> | |
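The final-assembly step of this pipeline reduces to one FFmpeg invocation: concatenate the intro card with the cropped clip, then burn in the subtitles. A simplified sketch of building that command follows; the file names and filter settings are assumptions for illustration, not the Showrunner's actual invocation:

```python
def build_assembly_cmd(intro="intro.mp4", clip="clip_cropped.mp4",
                       subs="subs.ass", out="final.mp4"):
    """Build an FFmpeg arg list: concat intro + clip, then burn in subtitles."""
    filter_graph = (
        "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a];"  # join intro, then clip
        f"[v]ass={subs}[vout]"                            # burn in styled captions
    )
    return ["ffmpeg", "-y", "-i", intro, "-i", clip,
            "-filter_complex", filter_graph,
            "-map", "[vout]", "-map", "[a]", out]
```

The `concat` filter (rather than the concat demuxer) is used here because the intro card and clip may differ in codec parameters, and the filter re-encodes both into one stream.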
### **Gradio App - Production Studio Interface**

<p align="center">
  <img src="./resources/gradio2-productionstudio.gif" width="700" alt="Gradio Production Studio Demo">
  <br>
  <em>The Production Studio for adding polish to your clips</em>
</p>
| --- | |
| ## π€ **Partner Technologies** | |
| ### **π Modal β The Backend Powerhouse** | |
| Modal isn't just part of our stackβit **IS** our stack. Without Modal, this project would've been impossible. | |
| **Why Modal Changed Everything:** | |
| When you're processing videos, you need: | |
| - 50-500MB file uploads that don't timeout | |
| - FFmpeg with all codecs pre-installed | |
| - GPU compute for Whisper transcription | |
| - Parallel processing without infrastructure management | |
| - Pay-per-use so you don't burn money on idle servers | |
| **Modal delivered ALL of this out of the box.** | |
```python
@app.function(
    image=base_image,  # FFmpeg, ImageMagick, fonts pre-installed
    volumes={STORAGE_PATH: storage_volume},  # instant file transfers
    timeout=3600,
    memory=32768,  # 32 GB RAM for video processing
    cpu=8.0,
)
@modal.web_endpoint(method="POST")
def process_video(request: dict):
    # This just works: no Docker, no K8s, no DevOps nightmares.
    # Files transfer at high speed via Modal volumes, and the app
    # scales to zero when idle, so we only pay while processing.
    ...
```
**The Impact:**

| Before Modal | With Modal |
|--------------|------------|
| 45 min upload times | **< 30 s** file transfers |
| Docker dependency hell | **Zero-config** FFmpeg |
| $200/month idle servers | **Pay only when processing** |
| Manual scaling | **Auto-scales to demand** |

**Huge thanks to Modal for the generous credits that made this possible.** We pushed their infrastructure HARD and it never flinched.

<p align="center">
  <img src="./resources/modal-arch.png" width="700" alt="Modal Backend Architecture">
  <br>
  <em>How Modal powers the entire Director's Cut backend</em>
</p>
| --- | |
| ### **π¨ Nebius AI Studio β Qwen VL + FLUX** | |
| Nebius provides lightning-fast inference for two critical features: | |
| #### **Qwen 2.5-VL-72B β Intelligent Subject Tracking** | |
| This is **NOT center crop**. We built genuine AI-powered smart cropping: | |
| ```python | |
| # For each key frame, Qwen VL detects the main subject position | |
| qwen_response = requests.post( | |
| "https://api.studio.nebius.ai/v1/chat/completions", | |
| headers={"Authorization": f"Bearer {nebius_key}"}, | |
| json={ | |
| "model": "Qwen/Qwen2.5-VL-72B-Instruct", | |
| "messages": [{ | |
| "role": "user", | |
| "content": [ | |
| {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}}, | |
| {"type": "text", "text": "Find the horizontal position of the main subject. Return decimal 0.0-1.0"} | |
| ] | |
| }] | |
| } | |
| ) | |
| # Result: 92% subject retention vs 40% with dumb center crop | |
| ``` | |
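The returned 0.0-1.0 position then has to drive the 9:16 crop window. A minimal sketch of that mapping, assuming per-frame clamping to the frame bounds (this is the idea, not the exact production code):

```python
def crop_x(subject_pos, src_w, src_h, out_aspect=9 / 16):
    """Map a 0.0-1.0 subject position to the x-offset of a 9:16 crop window."""
    crop_w = int(src_h * out_aspect)           # width of the vertical window
    x = int(subject_pos * src_w - crop_w / 2)  # center the window on the subject
    return max(0, min(x, src_w - crop_w))      # clamp so the window stays in-frame

# For a 1920x1080 source, a subject at the far left keeps the window at x=0,
# while a subject at the far right pins it against the right edge.
```

In practice the per-frame offsets would also be smoothed over time, since feeding raw detections straight into the crop makes the framing jitter.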
<p align="center">
  <img src="./resources/smartcrop.gif" width="600" alt="Smart Crop Demo">
  <br>
  <em>Qwen VL tracking subjects for intelligent 9:16 framing</em>
</p>
#### **FLUX – Custom Intro Image Generation**

Every video gets a **unique AI-generated intro card** that matches its mood:

```python
import requests

response = requests.post(
    "https://api.studio.nebius.ai/v1/images/generations",
    headers={"Authorization": f"Bearer {nebius_key}"},
    json={
        "model": "black-forest-labs/flux-schnell",
        "prompt": f"High-energy social media intro, vertical 9:16, "
                  f"bold typography '{title}', vibrant neon gradients",
        "width": 1080,
        "height": 1920,
        "num_inference_steps": 4
    }
)
# Generates in < 5 seconds on Nebius
```
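The generated image then gets decoded to a PNG for the intro card. The helper below assumes an OpenAI-style base64 images payload (`data[0].b64_json`); that field layout is an assumption about the Nebius response shape, so verify it against an actual response:

```python
import base64

def save_intro_card(payload, path="intro_card.png"):
    """Write a base64-encoded image payload to disk and return the path.

    Assumes an OpenAI-style images schema ({"data": [{"b64_json": ...}]});
    confirm against the real Nebius response before relying on this.
    """
    img_bytes = base64.b64decode(payload["data"][0]["b64_json"])
    with open(path, "wb") as f:
        f.write(img_bytes)
    return path
```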
**Mood-Matched Styles:**

- 🔥 **Hype** → Neon gradients, bold typography, TikTok energy
- 🎬 **Suspense** → Cinematic noir, dramatic shadows
- 🌿 **Chill** → Soft pastels, minimal aesthetic

<p align="center">
  <img src="./resources/intro-example.png" width="600" alt="FLUX Intro Generation">
  <br>
  <em>FLUX generating mood-matched intro cards via Nebius</em>
</p>
| --- | |
| ### **ποΈ ElevenLabs β Professional Voiceover** | |
| Every video gets a **content-aware AI voiceover** for the intro: | |
| ```python | |
| from elevenlabs import ElevenLabs | |
| client = ElevenLabs(api_key=elevenlabs_key) | |
| # Gemini writes a hook based on actual video content | |
| intro_script = "Joe Rogan just dropped some insane knowledge about AI. " \ | |
| "This take is gonna blow your mind, check it out..." | |
| audio = client.text_to_speech.convert( | |
| text=intro_script, | |
| voice_id="21m00Tcm4TlvDq8ikWAM", # Rachel - engaging, professional | |
| model_id="eleven_turbo_v2_5" | |
| ) | |
| ``` | |
**What Makes It Special:**

- 🧠 Scripts reference the actual video content (not generic templates)
- 🎭 Voice selection adapts to the video's mood
- ⚡ Sub-2s generation time

<p align="center">
  <img src="./resources/intro-example.png" width="600" alt="ElevenLabs Voiceover Demo">
  <br>
  <em>ElevenLabs generating professional voiceover intros</em>
</p>
| --- | |
| ## π **MCP Server Integration** | |
| ### **π€ ChatGPT Integration (GPT Apps SDK)** | |
| We built a **ChatGPT App** using the GPT Apps SDK that turns ChatGPT into your personal video production assistant. | |
| **How It Works:** | |
| 1. Open ChatGPT | |
| 2. Find "Director's Cut" in Apps | |
| 3. Upload your 15-50 second clip | |
| 4. Tell ChatGPT what you want: *"Add subtitles and a hype intro"* | |
| 5. Download your polished video | |
| **This is insanely cool** because ChatGPT becomes a conversational video editor. No UI to learn, no buttons to clickβjust describe what you want. | |
```
User: "Take this clip and make it TikTok ready"
        ↓
ChatGPT: Understands intent, calls Director's Cut MCP tools
        ↓
MCP Server: Processes video (smart crop, subtitles, music)
        ↓
ChatGPT: "Here's your viral-ready video! 🎬"
```

<p align="center">
  <img src="./resources/gifgpt1.gif" width="600" alt="ChatGPT App Demo">
  <br>
  <em>ChatGPT as your personal video production assistant</em>
</p>
| --- | |
| ### **π₯οΈ Claude Desktop MCP Server** | |
| For the full autonomous pipeline, connect Claude Desktop to our MCP server. | |
| #### **Option 1: Run Locally (Recommended)** | |
| > **Why Local?** Modal cloud processing requires credits that aren't available to everyone. Running locally gives you full control and works with just API keys. | |
| **Step 1: Clone the repository** | |
| ```bash | |
| git clone https://github.com/tayyab415/directors-cut.git | |
| cd directors-cut | |
| ``` | |
**Step 2: Install dependencies**

```bash
pip install -r requirements.txt
```

**Step 3: Set up environment variables**

Create a `.env` file:

```env
# Required API keys
GEMINI_API_KEY=your_gemini_key
NEBIUS_API_KEY=your_nebius_key
ELEVENLABS_API_KEY=your_elevenlabs_key

# Optional: Modal (for cloud processing - requires Modal credits)
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
```
**Step 4: Run the MCP server**

```bash
python app.py
```

**Step 5: Configure Claude Desktop**

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "directors-cut": {
      "type": "sse",
      "url": "http://localhost:7860/gradio_api/mcp/sse"
    }
  }
}
```

**Step 6: Restart Claude Desktop and start creating!**

Then just ask Claude:

> *"Process this YouTube video into a viral TikTok: https://youtube.com/watch?v=..."*
| --- | |
| #### **Option 2: Use the Hosted Server** | |
| If you have Modal credits or just want to try the hosted version: | |
| Add to your `claude_desktop_config.json`: | |
| ```json | |
| { | |
| "mcpServers": { | |
| "directors-cut": { | |
| "type": "sse", | |
| "url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse" | |
| } | |
| } | |
| } | |
| ``` | |
#### **MCP Tools Available:**

| Tool | Description |
|------|-------------|
| `process_video` | Full pipeline: YouTube URL → viral video |
| `step1_analyze_video_mcp` | Analyze and classify video content |
| `step2_scout_hotspots_mcp` | Find viral-worthy moments |
| `step3_verify_hotspots_mcp` | Vision AI verification |
| `step4_create_plan_mcp` | Generate the edit plan |
| `render_and_produce_mcp` | Render + production polish |
| `smart_crop_video` | Standalone 9:16 smart crop |
| `add_production_value` | Add intro, subtitles, music |
<p align="center">
  <img src="./resources/claudegif.gif" width="600" alt="Claude Desktop MCP Demo">
  <br>
  <em>Claude Desktop orchestrating the full Director's Cut pipeline</em>
</p>
| --- | |
| ## β‘ **Performance** | |
| | Metric | Value | | |
| |--------|-------| | |
| | **Processing Time** | 3-5 min for 10-min video | | |
| | **Smart Crop Accuracy** | 92% subject retention | | |
| | **Subtitle Accuracy** | 95%+ (Whisper large-v3) | | |
| | **Cost Per Video** | ~$0.15 | | |
| | **Human Editor Equivalent** | $30-50/hour saved | | |
| --- | |
| ## π οΈ **Tech Stack** | |
| | Component | Technology | Purpose | | |
| |-----------|------------|---------| | |
| | **MCP Server** | Gradio 5.x | Claude/ChatGPT integration | | |
| | **Backend Compute** | Modal Labs | Video processing at scale | | |
| | **Video Analysis** | Gemini 2.0 Flash | Hotspot detection, planning | | |
| | **Smart Crop** | Qwen VL (Nebius) | Subject tracking | | |
| | **Intro Images** | FLUX (Nebius) | Custom title cards | | |
| | **Voiceover** | ElevenLabs | Professional narration | | |
| | **Subtitles** | WhisperX | Word-level captions | | |
| | **Video Processing** | FFmpeg + MoviePy | Rendering | | |
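On the subtitles row: WhisperX produces word-level timestamps, which is what makes punchy one-word-at-a-time captions possible. A minimal sketch of turning those timestamps into SRT cues (a simplified format for illustration, not the app's actual caption styling):

```python
def to_srt(words):
    """Convert [(word, start_s, end_s), ...] into SRT blocks, one word per cue."""
    def ts(t):  # seconds -> "HH:MM:SS,mmm"
        h, rem = divmod(int(t * 1000), 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = []
    for i, (word, start, end) in enumerate(words, 1):
        blocks.append(f"{i}\n{ts(start)} --> {ts(end)}\n{word}\n")
    return "\n".join(blocks)

print(to_srt([("insane", 0.0, 0.42), ("knowledge", 0.42, 1.0)]))
```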
| --- | |
| ## π **Project Structure** | |
| ``` | |
| directors-cut/ | |
| βββ app.py # Main Gradio app + MCP tools | |
| βββ modal_simple.py # Modal backend endpoints | |
| βββ src/ | |
| β βββ scout.py # Hotspot detection agent | |
| β βββ verifier.py # Vision-based verification agent | |
| β βββ director.py # Edit plan generation agent | |
| β βββ hands.py # FFmpeg execution agent | |
| β βββ showrunner.py # Production polish agent | |
| β βββ server.py # Standalone MCP server | |
| β βββ paths.py # File management | |
| βββ assets/music/ # Mood-matched background tracks | |
| β βββ hype/ | |
| β βββ chill/ | |
| β βββ suspense/ | |
| βββ requirements.txt | |
| βββ README.md | |
| ``` | |
| --- | |
| ## π **What We Learned** | |
| ### **Agent Coordination is Harder Than It Looks** | |
| Early versions had agents stepping on each other. Solution: Clear responsibility boundaries + Verifier as quality gate. | |
| ### **Smart Crop is a Game-Changer** | |
| Center crop loses 60% of content. Using Qwen VL for actual subject trackingβthe difference is night and day. | |
| ### **Modal is Insanely Good** | |
| We tried local FFmpeg first. Disaster. Modal's pre-configured containers + instant volumes saved 40+ hours of DevOps. | |
| ### **MCP Makes AI Actually Useful** | |
| Without MCP, this is "another AI tool." With MCP, Claude/ChatGPT become genuine creative assistants. | |
| --- | |
| ## π₯ **Team One_Horizon** | |
| - **Tayyab Khan** ([tyb343](https://huggingface.co/tyb343)) β Full-stack Development, Multi-agent Architecture, MCP Integration | |
| - **Sahil Tanna** ([sahiltanna7](https://huggingface.co/sahiltanna7)) β Development & Testing, Prompt Engineering | |
| - **Nikunj** ([nikunj30](https://huggingface.co/nikunj30)) β Development & Testing, MCP Integration | |
| --- | |
| ## β οΈ **Disclaimer & Responsible Use** | |
| **Important Notice on Copyright and Intended Use:** | |
| Director's Cut is designed to help **content creators repurpose their own content** for different platforms. The intended use cases are: | |
| β **Legitimate Uses:** | |
| - Creators repurposing their own YouTube content for TikTok/Reels/Shorts | |
| - Businesses creating short-form content from their long-form material | |
| - Educational content being reformatted for different audiences | |
| - Personal projects and creative experimentation | |
| β **This tool should NOT be used for:** | |
| - Downloading and repurposing content you don't own | |
| - Creating content that infringes on others' copyrights | |
| - Removing watermarks or attribution from original creators | |
| - Monetizing content without proper rights or licensing | |
| **By using Director's Cut, you agree to:** | |
| 1. Only process content you have rights to use | |
| 2. Respect copyright laws in your jurisdiction | |
| 3. Properly attribute original creators when required | |
| 4. Not use this tool for deceptive or harmful purposes | |
| **We are not responsible for misuse of this tool.** The technology is built to empower creators, not to enable copyright infringement. Please use responsibly. | |
| --- | |
| ## π **License** | |
| MIT License - Build cool stuff with this, but build it ethically! | |
| --- | |
| ## π **Acknowledgments** | |
| Massive thanks to: | |
| - **Modal** β For infrastructure that actually works and generous hackathon credits | |
| - **Nebius** β For blazing-fast Qwen VL and FLUX inference | |
| - **ElevenLabs** β For voices that sound genuinely human | |
| - **Google Gemini** β For the multimodal reasoning powering our agents | |
| - **Anthropic & Gradio** β For MCP and hosting this incredible hackathon | |
| --- | |
| <p align="center"> | |
| <b>Built with β€οΈ for content creators who refuse to let great content die in landscape format.</b> | |
| </p> | |
| <p align="center"> | |
| <a href="https://huggingface.co/spaces/tyb343/directors-cut">π Try Director's Cut Now</a> | |
| </p> |