---
title: Director's Cut
emoji: 🎬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
tags:
- mcp-server-track
- building-mcp-creative
- gradio
- modal
- elevenlabs
- gemini
- nebius
- openai
- chatgpt-app
license: mit
short_description: AI Video Editor - YouTube to Viral Shorts
---
# 🎬 **Director's Cut**
### *The autonomous multi-agent system that transforms any YouTube video into viral vertical content. Zero editing skills. Five AI agents. One click.*
<p align="center">
<a href="https://huggingface.co/spaces/tyb343/directors-cut"><img src="https://img.shields.io/badge/🚀_Live_Demo-HuggingFace-yellow?style=for-the-badge" alt="Live Demo"></a>
<a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ"><img src="https://img.shields.io/badge/📹_Demo_Video-YouTube-red?style=for-the-badge" alt="Demo Video"></a>
<a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY"><img src="https://img.shields.io/badge/💼_Social-LinkedIn-blue?style=for-the-badge" alt="Social Post"></a>
</p>
<p align="center">
<img src="https://img.shields.io/badge/MCP-Server-green?style=flat-square" alt="MCP Server">
<img src="https://img.shields.io/badge/Modal-Powered-purple?style=flat-square" alt="Modal">
<img src="https://img.shields.io/badge/Gemini_2.0-Flash-blue?style=flat-square" alt="Gemini">
<img src="https://img.shields.io/badge/FLUX-Nebius-orange?style=flat-square" alt="FLUX">
<img src="https://img.shields.io/badge/ElevenLabs-Voice-pink?style=flat-square" alt="ElevenLabs">
<img src="https://img.shields.io/badge/Qwen_VL-Smart_Crop-cyan?style=flat-square" alt="Qwen VL">
</p>
---
## 😤 **Why We Need This**
Content creators are **drowning**. They have hours of amazing landscape YouTube content sitting there, completely **worthless** on TikTok, Instagram Reels, and YouTube Shorts.
The current "solutions" are a joke:
- ❌ **Center crop** = butchers your content, cuts off 60% of what matters
- ❌ **Manual editing** = 2-3 hours per video, soul-crushing repetitive work
- ❌ **Hiring editors** = $30-50/hour, burns a hole in your pocket
- ❌ **"AI" tools** = glorified filters, no actual intelligence
**The vertical video revolution is here, and creators are being left behind.**
Every day, millions of hours of incredible content stay trapped in 16:9 format while the algorithm rewards 9:16. Something had to change.
---
## 🚀 **What We Created**
**Director's Cut** is an **autonomous multi-agent AI system** that doesn't just crop your video; it *thinks* about your video.
We built a 5-agent pipeline that:
- 🔍 **Analyzes** your entire video for viral-worthy moments
- ✅ **Verifies** clip quality using vision AI
- 🎬 **Plans** the perfect edit with pacing and transitions
- 🖐️ **Executes** with FFmpeg precision
- 🎭 **Polishes** with intros, smart crop, subtitles, and music
**One YouTube URL → Production-ready vertical content in 3-5 minutes.**
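As a sketch, the hand-off between the five agents looks like this; the stub functions are illustrative stand-ins for the real Scout, Verifier, Director, Hands, and Showrunner modules, not the actual API:

```python
# Illustrative five-agent hand-off; each stage consumes the previous stage's output.

def scout(url):
    # 1. Scan the video for candidate "hotspot" moments (stubbed).
    return [{"start": 120, "end": 150, "reason": "punchline"}]

def verifier(hotspots):
    # 2. Vision-AI quality gate: keep only clips long enough to stand alone.
    return [h for h in hotspots if h["end"] - h["start"] >= 15]

def director(verified):
    # 3. Turn verified hotspots into an ordered edit plan with transitions.
    return {"clips": verified, "transitions": ["cut"] * len(verified)}

def hands(plan):
    # 4. Execute the plan with FFmpeg (stubbed as an output filename).
    return "raw_clip.mp4"

def showrunner(raw_clip):
    # 5. Production polish: smart crop, intro, subtitles, music.
    return raw_clip.replace("raw", "final")

def run_pipeline(youtube_url: str) -> str:
    return showrunner(hands(director(verifier(scout(youtube_url)))))
```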
---
## 📺 **Demo Video**
<p align="center">
<a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ">
<img src="https://img.youtube.com/vi/h8U5oW4UIVQ/hqdefault.jpg" width="600" alt="Demo Video Thumbnail"/>
</a>
</p>
<p align="center">
<strong>▶️ <a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ">Click to Watch Full Demo on YouTube</a></strong>
</p>
---
## 💼 **Social Media Post**
<p align="center">
<a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY">
<img src="./resources/linkedin-post.jpg" width="600" alt="LinkedIn Post"/>
</a>
</p>
<p align="center">
<strong>📄 <a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAES4lhUBjyG38IZnp2meH1RiGNnldSYW8qY">Read the Full Post on LinkedIn</a></strong>
</p>
---
## πŸ—οΈ **Architecture 1: Creating Viral Clips from Long-Form Content**
The first half of our pipeline takes a massive YouTube video and identifies the **golden moments** worth sharing.
<p align="center">
<img src="./resources/diagram1.png" width="700" alt="Architecture 1 - Clip Generation Pipeline">
<br>
<em>Scout → Verifier → Director → Hands: From YouTube URL to raw compiled clip</em>
</p>
### **Gradio App - Clip Generation Interface**
<p align="center">
<img src="./resources/gradio-clip.gif" width="700" alt="Gradio Clip Generation Demo">
<br>
<em>The Gradio interface for generating clips from YouTube URLs</em>
</p>
---
## πŸ—οΈ **Architecture 2: Production Polish & Refinement**
The second half takes that raw clip and transforms it into **viral-ready vertical content**.
<p align="center">
<img src="./resources/diagram2.png" width="700" alt="Architecture 2 - Production Polish Pipeline">
<br>
<em>Showrunner: Smart Crop → Intro → Subtitles → Final Assembly</em>
</p>
### **Gradio App - Production Studio Interface**
<p align="center">
<img src="./resources/gradio2-productionstudio.gif" width="700" alt="Gradio Production Studio Demo">
<br>
<em>The Production Studio for adding polish to your clips</em>
</p>
---
## 🤝 **Partner Technologies**
### **🚀 Modal: The Backend Powerhouse**
Modal isn't just part of our stack; it **IS** our stack. Without Modal, this project would've been impossible.
**Why Modal Changed Everything:**
When you're processing videos, you need:
- 50-500MB file uploads that don't timeout
- FFmpeg with all codecs pre-installed
- GPU compute for Whisper transcription
- Parallel processing without infrastructure management
- Pay-per-use so you don't burn money on idle servers
**Modal delivered ALL of this out of the box.**
```python
import modal

app = modal.App("directors-cut")

# FFmpeg, ImageMagick, and fonts baked into the container image
base_image = modal.Image.debian_slim().apt_install("ffmpeg", "imagemagick")
storage_volume = modal.Volume.from_name("dc-storage", create_if_missing=True)
STORAGE_PATH = "/storage"

@app.function(
    image=base_image,                        # no Docker, no dependency hell
    volumes={STORAGE_PATH: storage_volume},  # instant file transfers
    timeout=3600,
    memory=32768,                            # 32 GB RAM for video processing
    cpu=8.0,
)
@modal.web_endpoint(method="POST")
def process_video(request: dict):
    # This just works: no K8s, no DevOps nightmares. Files move at
    # lightning speed via Modal volumes, and the app scales to zero
    # when idle, so we only pay while processing.
    ...
```
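Calling the deployed endpoint from a client might look like this; the URL is a placeholder for the one Modal prints at deploy time, and the payload keys are assumptions about what `process_video` reads:

```python
# Hypothetical client-side call to the Modal web endpoint.
MODAL_ENDPOINT = "https://your-workspace--process-video.modal.run"  # placeholder

def build_payload(youtube_url: str, mood: str = "hype") -> dict:
    # Mirror the dict shape process_video(request: dict) expects server-side.
    return {"youtube_url": youtube_url, "mood": mood}

def submit(youtube_url: str, mood: str = "hype") -> dict:
    import requests  # deferred so build_payload works without requests installed
    resp = requests.post(MODAL_ENDPOINT, json=build_payload(youtube_url, mood), timeout=3600)
    resp.raise_for_status()
    return resp.json()
```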
**The Impact:**
| Before Modal | With Modal |
|--------------|------------|
| 45min upload times | **< 30s** file transfers |
| Docker dependency hell | **Zero config** FFmpeg |
| $200/month idle servers | **Pay only when processing** |
| Manual scaling | **Auto-scales to demand** |
**Huge thanks to Modal for the generous credits that made this possible.** We pushed their infrastructure HARD and it never flinched.
<p align="center">
<img src="./resources/modal-arch.png" width="700" alt="Modal Backend Architecture">
<br>
<em>How Modal powers the entire Director's Cut backend</em>
</p>
---
### **🎨 Nebius AI Studio: Qwen VL + FLUX**
Nebius provides lightning-fast inference for two critical features:
#### **Qwen 2.5-VL-72B: Intelligent Subject Tracking**
This is **NOT center crop**. We built genuine AI-powered smart cropping:
```python
import base64
import requests

# For each key frame, ask Qwen VL for the main subject's horizontal position
def subject_position(frame_jpeg: bytes, nebius_key: str) -> float:
    frame_b64 = base64.b64encode(frame_jpeg).decode()
    qwen_response = requests.post(
        "https://api.studio.nebius.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {nebius_key}"},
        json={
            "model": "Qwen/Qwen2.5-VL-72B-Instruct",
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
                    {"type": "text", "text": "Find the horizontal position of the main subject. Return a decimal 0.0-1.0"},
                ],
            }],
        },
        timeout=60,
    )
    qwen_response.raise_for_status()
    return float(qwen_response.json()["choices"][0]["message"]["content"].strip())

# Result: 92% subject retention vs 40% with dumb center crop
```
<p align="center">
<img src="./resources/smartcrop.gif" width="600" alt="Smart Crop Demo">
<br>
<em>Qwen VL tracking subjects for intelligent 9:16 framing</em>
</p>
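Once Qwen VL has returned per-keyframe subject positions, they still have to become a stable 9:16 crop window. A minimal sketch of that step, with illustrative smoothing constants and helper names (not the actual module API):

```python
# Turn per-keyframe subject positions (0.0-1.0) into smoothed crop x offsets.
def crop_x_positions(subject_xs, src_w=1920, src_h=1080, alpha=0.3):
    crop_w = int(src_h * 9 / 16)      # 607 px wide for a 1080p source
    smoothed = subject_xs[0]
    offsets = []
    for x in subject_xs:
        smoothed = alpha * x + (1 - alpha) * smoothed   # damp frame-to-frame jitter
        center = smoothed * src_w
        # Clamp so the crop window never leaves the frame.
        left = min(max(center - crop_w / 2, 0), src_w - crop_w)
        offsets.append(int(left))
    return offsets

# Each offset can then drive ffmpeg's crop filter per segment,
# e.g. crop=607:1080:<x>:0
```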
#### **FLUX: Custom Intro Image Generation**
Every video gets a **unique AI-generated intro card** that matches its mood:
```python
import requests

def generate_intro_image(title: str, nebius_key: str) -> dict:
    response = requests.post(
        "https://api.studio.nebius.ai/v1/images/generations",
        headers={"Authorization": f"Bearer {nebius_key}"},
        json={
            "model": "black-forest-labs/flux-schnell",
            "prompt": f"High-energy social media intro, vertical 9:16, "
                      f"bold typography '{title}', vibrant neon gradients",
            "width": 1080,
            "height": 1920,
            "num_inference_steps": 4,
        },
        timeout=60,
    )
    response.raise_for_status()
    # Generates in < 5 seconds on Nebius; the image comes back in the
    # OpenAI-compatible response body (URL or base64, per response_format)
    return response.json()["data"][0]
```
**Mood-Matched Styles:**
- 🔥 **Hype** → Neon gradients, bold typography, TikTok energy
- 🎬 **Suspense** → Cinematic noir, dramatic shadows
- 🌿 **Chill** → Soft pastels, minimal aesthetic
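The mood-to-prompt mapping can be sketched as a simple lookup; the exact style strings here are illustrative, not the production prompts:

```python
# Illustrative mood-to-style lookup used to build the FLUX prompt.
MOOD_STYLES = {
    "hype": "vibrant neon gradients, bold typography, high-energy TikTok aesthetic",
    "suspense": "cinematic noir, dramatic shadows, moody lighting",
    "chill": "soft pastels, minimal clean aesthetic, calm composition",
}

def build_intro_prompt(title: str, mood: str) -> str:
    style = MOOD_STYLES.get(mood, MOOD_STYLES["hype"])  # fall back to hype
    return f"Social media intro card, vertical 9:16, bold title text '{title}', {style}"
```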
<p align="center">
<img src="./resources/intro-example.png" width="600" alt="FLUX Intro Generation">
<br>
<em>FLUX generating mood-matched intro cards via Nebius</em>
</p>
---
### **🎙️ ElevenLabs: Professional Voiceover**
Every video gets a **content-aware AI voiceover** for the intro:
```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=elevenlabs_key)

# Gemini writes a hook based on the actual video content
intro_script = (
    "Joe Rogan just dropped some insane knowledge about AI. "
    "This take is gonna blow your mind, check it out..."
)

audio = client.text_to_speech.convert(
    text=intro_script,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel: engaging, professional
    model_id="eleven_turbo_v2_5",
)

# convert() streams MP3 bytes; write the chunks to a file
with open("intro_voiceover.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```
**What Makes It Special:**
- 🧠 Scripts reference actual video content (not generic templates)
- 🎭 Voice selection adapts to video mood
- ⚡ Sub-2s generation time
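Mood-adaptive voice selection can be sketched as a lookup as well; Rachel's ID appears in the snippet above, but the other IDs are placeholders, not real ElevenLabs voice IDs:

```python
# Sketch of mood-adaptive voice selection; only Rachel's ID is real.
VOICE_BY_MOOD = {
    "hype": "21m00Tcm4TlvDq8ikWAM",    # Rachel: engaging, professional
    "suspense": "<deep-narrator-id>",  # placeholder
    "chill": "<soft-calm-id>",         # placeholder
}

def pick_voice(mood: str) -> str:
    # Unknown moods fall back to the default hype voice.
    return VOICE_BY_MOOD.get(mood, VOICE_BY_MOOD["hype"])
```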
<p align="center">
<img src="./resources/intro-example.png" width="600" alt="ElevenLabs Voiceover Demo">
<br>
<em>ElevenLabs generating professional voiceover intros</em>
</p>
---
## 🔌 **MCP Server Integration**
### **🤖 ChatGPT Integration (GPT Apps SDK)**
We built a **ChatGPT App** using the GPT Apps SDK that turns ChatGPT into your personal video production assistant.
**How It Works:**
1. Open ChatGPT
2. Find "Director's Cut" in Apps
3. Upload your 15-50 second clip
4. Tell ChatGPT what you want: *"Add subtitles and a hype intro"*
5. Download your polished video
**This is insanely cool** because ChatGPT becomes a conversational video editor. No UI to learn, no buttons to click; just describe what you want.
```
User: "Take this clip and make it TikTok ready"
↓
ChatGPT: Understands intent, calls Director's Cut MCP tools
↓
MCP Server: Processes video (smart crop, subtitles, music)
↓
ChatGPT: "Here's your viral-ready video! 🎬"
```
<p align="center">
<img src="./resources/gifgpt1.gif" width="600" alt="ChatGPT App Demo">
<br>
<em>ChatGPT as your personal video production assistant</em>
</p>
---
### **🖥️ Claude Desktop MCP Server**
For the full autonomous pipeline, connect Claude Desktop to our MCP server.
#### **Option 1: Run Locally (Recommended)**
> **Why Local?** Modal cloud processing requires credits that aren't available to everyone. Running locally gives you full control and works with just API keys.
**Step 1: Clone the repository**
```bash
git clone https://github.com/tayyab415/directors-cut.git
cd directors-cut
```
**Step 2: Install dependencies**
```bash
pip install -r requirements.txt
```
**Step 3: Set up environment variables**
Create a `.env` file:
```env
# Required API Keys
GEMINI_API_KEY=your_gemini_key
NEBIUS_API_KEY=your_nebius_key
ELEVENLABS_API_KEY=your_elevenlabs_key
# Optional: Modal (for cloud processing - requires Modal credits)
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
```
**Step 4: Run the MCP server**
```bash
python app.py
```
**Step 5: Configure Claude Desktop**
Add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"directors-cut": {
"type": "sse",
"url": "http://localhost:7860/gradio_api/mcp/sse"
}
}
}
```
**Step 6: Restart Claude Desktop and start creating!**
Then just ask Claude:
> *"Process this YouTube video into a viral TikTok: https://youtube.com/watch?v=..."*
---
#### **Option 2: Use the Hosted Server**
If you have Modal credits or just want to try the hosted version:
Add to your `claude_desktop_config.json`:
```json
{
"mcpServers": {
"directors-cut": {
"type": "sse",
"url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse"
}
}
}
```
#### **MCP Tools Available:**
| Tool | Description |
|------|-------------|
| `process_video` | Full pipeline: YouTube URL → Viral video |
| `step1_analyze_video_mcp` | Analyze and classify video content |
| `step2_scout_hotspots_mcp` | Find viral-worthy moments |
| `step3_verify_hotspots_mcp` | Vision AI verification |
| `step4_create_plan_mcp` | Generate edit plan |
| `render_and_produce_mcp` | Render + production polish |
| `smart_crop_video` | Standalone 9:16 smart crop |
| `add_production_value` | Add intro, subtitles, music |
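Under the hood, tools like these boil down to a name-to-handler dispatch; a minimal sketch of that pattern (the stub handler and registry are illustrative, not the actual server code, which lets Gradio do the exposing):

```python
# Illustrative MCP tool registry; the real server registers Gradio endpoints.
def smart_crop_video(path: str) -> str:
    # Stub: the real tool runs the Qwen VL smart-crop pipeline.
    return path.replace(".mp4", "_9x16.mp4")

TOOLS = {
    "smart_crop_video": smart_crop_video,
    # the remaining tools from the table register the same way
}

def call_tool(name: str, *args):
    if name not in TOOLS:
        raise KeyError(f"Unknown MCP tool: {name}")
    return TOOLS[name](*args)
```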
<p align="center">
<img src="./resources/claudegif.gif" width="600" alt="Claude Desktop MCP Demo">
<br>
<em>Claude Desktop orchestrating the full Director's Cut pipeline</em>
</p>
---
## ⚡ **Performance**
| Metric | Value |
|--------|-------|
| **Processing Time** | 3-5 min for 10-min video |
| **Smart Crop Accuracy** | 92% subject retention |
| **Subtitle Accuracy** | 95%+ (Whisper large-v3) |
| **Cost Per Video** | ~$0.15 |
| **Human Editor Equivalent** | $30-50/hour saved |
---
## 🛠️ **Tech Stack**
| Component | Technology | Purpose |
|-----------|------------|---------|
| **MCP Server** | Gradio 6.x | Claude/ChatGPT integration |
| **Backend Compute** | Modal Labs | Video processing at scale |
| **Video Analysis** | Gemini 2.0 Flash | Hotspot detection, planning |
| **Smart Crop** | Qwen VL (Nebius) | Subject tracking |
| **Intro Images** | FLUX (Nebius) | Custom title cards |
| **Voiceover** | ElevenLabs | Professional narration |
| **Subtitles** | WhisperX | Word-level captions |
| **Video Processing** | FFmpeg + MoviePy | Rendering |
---
## 📁 **Project Structure**
```
directors-cut/
├── app.py                 # Main Gradio app + MCP tools
├── modal_simple.py        # Modal backend endpoints
├── src/
│   ├── scout.py           # Hotspot detection agent
│   ├── verifier.py        # Vision-based verification agent
│   ├── director.py        # Edit plan generation agent
│   ├── hands.py           # FFmpeg execution agent
│   ├── showrunner.py      # Production polish agent
│   ├── server.py          # Standalone MCP server
│   └── paths.py           # File management
├── assets/music/          # Mood-matched background tracks
│   ├── hype/
│   ├── chill/
│   └── suspense/
├── requirements.txt
└── README.md
```
---
## 🎓 **What We Learned**
### **Agent Coordination is Harder Than It Looks**
Early versions had agents stepping on each other. The solution: clear responsibility boundaries, with the Verifier acting as a quality gate.
### **Smart Crop is a Game-Changer**
Center crop loses 60% of the content. With Qwen VL doing actual subject tracking, the difference is night and day.
### **Modal is Insanely Good**
We tried local FFmpeg first. Disaster. Modal's pre-configured containers + instant volumes saved 40+ hours of DevOps.
### **MCP Makes AI Actually Useful**
Without MCP, this is "another AI tool." With MCP, Claude/ChatGPT become genuine creative assistants.
---
## 👥 **Team One_Horizon**
- **Tayyab Khan** ([tyb343](https://huggingface.co/tyb343)): Full-stack Development, Multi-agent Architecture, MCP Integration
- **Sahil Tanna** ([sahiltanna7](https://huggingface.co/sahiltanna7)): Development & Testing, Prompt Engineering
- **Nikunj** ([nikunj30](https://huggingface.co/nikunj30)): Development & Testing, MCP Integration
---
## ⚠️ **Disclaimer & Responsible Use**
**Important Notice on Copyright and Intended Use:**
Director's Cut is designed to help **content creators repurpose their own content** for different platforms. The intended use cases are:
✅ **Legitimate Uses:**
- Creators repurposing their own YouTube content for TikTok/Reels/Shorts
- Businesses creating short-form content from their long-form material
- Educational content being reformatted for different audiences
- Personal projects and creative experimentation
❌ **This tool should NOT be used for:**
- Downloading and repurposing content you don't own
- Creating content that infringes on others' copyrights
- Removing watermarks or attribution from original creators
- Monetizing content without proper rights or licensing
**By using Director's Cut, you agree to:**
1. Only process content you have rights to use
2. Respect copyright laws in your jurisdiction
3. Properly attribute original creators when required
4. Not use this tool for deceptive or harmful purposes
**We are not responsible for misuse of this tool.** The technology is built to empower creators, not to enable copyright infringement. Please use responsibly.
---
## 📜 **License**
MIT License - Build cool stuff with this, but build it ethically!
---
## 🙏 **Acknowledgments**
Massive thanks to:
- **Modal**: For infrastructure that actually works and generous hackathon credits
- **Nebius**: For blazing-fast Qwen VL and FLUX inference
- **ElevenLabs**: For voices that sound genuinely human
- **Google Gemini**: For the multimodal reasoning powering our agents
- **Anthropic & Gradio**: For MCP and hosting this incredible hackathon
---
<p align="center">
<b>Built with ❀️ for content creators who refuse to let great content die in landscape format.</b>
</p>
<p align="center">
<a href="https://huggingface.co/spaces/tyb343/directors-cut">🚀 Try Director's Cut Now</a>
</p>