tayyab415 committed
Commit ad2c461 · 1 Parent(s): ba7ebb7

Update README for MCP hackathon submission
.gitattributes CHANGED
@@ -1,3 +1,5 @@
 assets/music/chill/chill.mp3 filter=lfs diff=lfs merge=lfs -text
 assets/music/hype/hype.mp3 filter=lfs diff=lfs merge=lfs -text
 assets/music/suspense/suspense.mp3 filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,54 +1,652 @@
 ---
 title: Director's Cut
 emoji: 🎬
-colorFrom: purple
-colorTo: blue
 sdk: gradio
-sdk_version: 6.0.1
 app_file: app.py
 pinned: false
 license: mit
-short_description: AI-Powered Video Editor - YouTube to Viral Clips
 ---
-
-# 🎬 Director's Cut
-
-Transform YouTube videos into viral short-form content using AI.
-
-## Features
-
-- **Video Analysis** - Get video info and transcripts from YouTube
-- **Hotspot Detection** - Find the most engaging moments
-- **Smart Crop** - AI-powered 9:16 vertical conversion
-- **AI Intro** - Generate custom intros with FLUX
-- **Subtitles** - Whisper-powered transcription
-
-## How It Works
-
-This Space provides a frontend that connects to a Modal Labs backend for GPU-accelerated processing. All heavy compute (video download, AI analysis, rendering) happens on Modal.
-
-## MCP Integration
-
-Connect Claude Desktop or other MCP clients:
 
 ```json
 {
-  "mcpServers": {
-    "directors-cut": {
-      "url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse"
-    }
   }
 }
 ```
 
-## Available Tools
-
 | Tool | Description |
 |------|-------------|
-| `process_video` | Full pipeline - extract viral clips from YouTube |
-| `get_video_info` | Get video metadata |
-| `get_transcript` | Get video transcript |
-| `health_check` | Check backend status |
 
 ---
-*Powered by Modal Labs • Built for the Anthropic MCP Hackathon*
---
title: Director's Cut
emoji: 🎬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "6.0.1"
app_file: app.py
pinned: false
tags:
  - mcp-server-track
  - building-mcp-creative
license: mit
short_description: AI Video Editor - YouTube to Viral Shorts
---

# 🎬 **Director's Cut**

### *The autonomous multi-agent system that transforms any YouTube video into viral vertical content. Zero editing skills. Five AI agents. One click.*

<p align="center">
  <a href="https://huggingface.co/spaces/tyb343/directors-cut"><img src="https://img.shields.io/badge/🚀_Live_Demo-HuggingFace-yellow?style=for-the-badge" alt="Live Demo"></a>
  <a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ"><img src="https://img.shields.io/badge/📹_Demo_Video-YouTube-red?style=for-the-badge" alt="Demo Video"></a>
  <a href="https://www.linkedin.com/posts/tayyab-khan-159153282_mcphackathon-mcp1stbirthday-anthropic-share-7400949504557957120-pE0i"><img src="https://img.shields.io/badge/💼_Social-LinkedIn-blue?style=for-the-badge" alt="Social Post"></a>
</p>

<p align="center">
  <img src="https://img.shields.io/badge/MCP-Server-green?style=flat-square" alt="MCP Server">
  <img src="https://img.shields.io/badge/Modal-Powered-purple?style=flat-square" alt="Modal">
  <img src="https://img.shields.io/badge/Gemini_2.0-Flash-blue?style=flat-square" alt="Gemini">
  <img src="https://img.shields.io/badge/FLUX-Nebius-orange?style=flat-square" alt="FLUX">
  <img src="https://img.shields.io/badge/ElevenLabs-Voice-pink?style=flat-square" alt="ElevenLabs">
  <img src="https://img.shields.io/badge/Qwen_VL-Smart_Crop-cyan?style=flat-square" alt="Qwen VL">
</p>

---

## 😤 **Why We Need This**

Content creators are **drowning**. They have hours of great landscape YouTube content that is effectively **worthless** on TikTok, Instagram Reels, and YouTube Shorts.

The current "solutions" all fall short:
- ❌ **Center crop**: butchers your content, cutting off up to 60% of what matters
- ❌ **Manual editing**: 2-3 hours of soul-crushing, repetitive work per video
- ❌ **Hiring editors**: $30-50/hour adds up fast
- ❌ **"AI" tools**: glorified filters with no actual intelligence

**The vertical video revolution is here, and creators are being left behind.**

Every day, millions of hours of incredible content stay trapped in 16:9 format while the algorithm rewards 9:16. Something had to change.

---

## 🚀 **What We Created**

**Director's Cut** is an **autonomous multi-agent AI system** that doesn't just crop your video; it *thinks* about your video.

We built a 5-agent pipeline that:
- 🔍 **Analyzes** your entire video for viral-worthy moments
- ✅ **Verifies** clip quality using vision AI
- 🎬 **Plans** the edit with pacing and transitions
- 🖐️ **Executes** with FFmpeg precision
- 🎭 **Polishes** with intros, smart crop, subtitles, and music

**One YouTube URL → production-ready vertical content in 3-5 minutes.**

### 📺 **Watch It In Action**

<p align="center">
  <a href="https://www.youtube.com/watch?v=h8U5oW4UIVQ">
    <img src="https://img.youtube.com/vi/h8U5oW4UIVQ/maxresdefault.jpg" width="600" alt="Director's Cut Demo Video">
  </a>
  <br>
  <em>👆 Click to watch the full demo</em>
</p>

---

## 🏗️ **Architecture 1: Creating Viral Clips from Long-Form Content**

The first half of the pipeline takes a long YouTube video and identifies the **golden moments** worth sharing.

### **The Agent Workflow**

```
┌──────────────────────────────────────────────────────────────
│ 📥 PHASE 1: HOTSPOT DETECTION & CLIP CREATION
└──────────────────────────────────────────────────────────────

YouTube URL (10 min - 3 hr video)
        │
        ▼
┌─ 🔍 SCOUT AGENT ── Model: Gemini 2.0 Flash ──────────────────
│ • Downloads audio + fetches transcript
│ • Analyzes audio energy peaks (detects hype moments)
│ • Semantic analysis of transcript for viral potential
│ • Classifies video type (podcast/tutorial/vlog/gaming)
│ • Outputs: top 10-20 timestamp candidates with reasoning
└──────────────────────────────────────────────────────────────
        │
        ▼
┌─ ✅ VERIFIER AGENT ── Model: Gemini 2.0 Flash + Vision API ──
│ • Downloads 5-10 s video segments around each hotspot
│ • Uploads to Gemini Vision for visual quality analysis
│ • Scores each clip: visual quality | engagement | shareability
│ • Filters out low-quality, blurry, or boring segments
│ • Outputs: verified clips ranked by viral potential
└──────────────────────────────────────────────────────────────
        │
        ▼
┌─ 🎬 DIRECTOR AGENT ── Model: Gemini 2.0 Flash Lite ──────────
│ • Creates the edit plan from verified clips
│ • Determines optimal clip order for narrative flow
│ • Sets duration per clip (8-15 s each, 30-60 s total)
│ • Plans transition types and pacing
│ • Outputs: JSON edit plan with precise timestamps
└──────────────────────────────────────────────────────────────
        │
        ▼
┌─ 🖐️ HANDS AGENT ── Tech: FFmpeg + Modal (cloud compute) ─────
│ • Extracts clips at exact timestamps (no re-download)
│ • Concatenates with crossfade transitions
│ • Handles audio normalization
│ • Outputs: raw compiled video (30-60 s landscape)
└──────────────────────────────────────────────────────────────
        │
        ▼
🎥 RAW COMPILED CLIP
(ready for production polish)
```
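The Scout's energy-peak pass can be sketched in a few lines. This is an illustrative toy, not the actual implementation: the real agent works on decoded audio and combines these scores with Gemini's transcript analysis, and `energy_peaks`, the window size, and the sample values here are all made up for the example.

```python
def energy_peaks(samples, window=4, top_k=2):
    """Score fixed-size windows of an audio signal by mean squared
    amplitude and return the start indices of the top_k loudest
    windows, in chronological order."""
    scored = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        energy = sum(x * x for x in chunk) / window
        scored.append((energy, start))
    scored.sort(reverse=True)  # loudest windows first
    return sorted(start for _, start in scored[:top_k])

# Toy waveform: quiet, a loud burst, then quiet again.
wave = [0.1, 0.1, 0.1, 0.1, 0.9, 1.0, 0.8, 0.9, 0.1, 0.2, 0.1, 0.1]
print(energy_peaks(wave))  # [4, 8] - the burst at index 4 ranks first
```

In the real pipeline the "samples" would be decoded audio frames and the peaks would become candidate hotspot timestamps for the Verifier to check.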
### **Gradio App - Clip Generation Interface**

<p align="center">
  <img src="./resources/gradio-clip.gif" width="700" alt="Gradio Clip Generation Demo">
  <br>
  <em>The Gradio interface for generating clips from YouTube URLs</em>
</p>

### **Architecture Block Diagram**

<p align="center">
  <img src="./resources/diagram1.png" width="700" alt="Architecture 1 - Clip Generation Pipeline">
</p>

---

## 🏗️ **Architecture 2: Production Polish & Refinement**

The second half takes that raw clip and turns it into **viral-ready vertical content**.

### **The Showrunner Pipeline**

```
┌──────────────────────────────────────────────────────────────
│ 🎭 PHASE 2: PRODUCTION POLISH & REFINEMENT
└──────────────────────────────────────────────────────────────

Raw compiled clip (30-60 s landscape)
        │
        ▼
┌─ 🎭 SHOWRUNNER - Step 1: Smart Crop ─────────────────────────
│ Tech: Gemini Flash Lite + Qwen VL (Nebius)
│ • Gemini detects scene-change timestamps
│ • Qwen VL analyzes each key frame for subject position
│ • Calculates optimal crop window (tracks faces/subjects)
│ • Smooth interpolation between positions (no jarring jumps)
│ • Renders 9:16 vertical with intelligent framing
└──────────────────────────────────────────────────────────────
        │
        ▼
┌─ 🎭 SHOWRUNNER - Step 2: Intro Generation ───────────────────
│ Tech: FLUX (Nebius) + ElevenLabs
│ • Gemini writes a content-aware hook script
│ • FLUX generates a custom intro image (mood-matched)
│ • ElevenLabs synthesizes a professional voiceover
│ • Combines them into a 3-5 s animated intro sequence
└──────────────────────────────────────────────────────────────
        │
        ▼
┌─ 🎭 SHOWRUNNER - Step 3: Audio & Subtitles ──────────────────
│ Tech: WhisperX + FFmpeg
│ • WhisperX transcribes with word-level timestamps
│ • Mood detection selects background music (hype/chill/suspense)
│ • Audio mixing: voice + music at optimal levels
│ • Burns stylized subtitles into the video
└──────────────────────────────────────────────────────────────
        │
        ▼
┌─ 🎭 SHOWRUNNER - Step 4: Final Assembly ─────────────────────
│ Tech: FFmpeg + MoviePy
│ • Concatenates intro + main content
│ • Applies final color grading
│ • Exports in a TikTok/Reels/Shorts-optimized format
│ • Delivers a download-ready MP4
└──────────────────────────────────────────────────────────────
        │
        ▼
📱 VIRAL-READY VERTICAL VIDEO
(9:16, subtitled, intro, music)
```
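The "smooth interpolation" in Step 1 is essentially keyframed easing of the crop-window center. A minimal sketch of the idea, assuming per-keyframe subject positions from the vision model; the function name and the linear-easing choice are illustrative, not the exact implementation:

```python
def interpolate_crop_centers(keyframes, total_frames):
    """keyframes maps frame index -> subject x-position in [0.0, 1.0]
    (as reported by the vision model). Returns one crop-window center
    per frame, linearly interpolated so the 9:16 window glides between
    subject positions instead of jumping at scene changes."""
    idxs = sorted(keyframes)
    centers = []
    for f in range(total_frames):
        if f <= idxs[0]:
            centers.append(keyframes[idxs[0]])
        elif f >= idxs[-1]:
            centers.append(keyframes[idxs[-1]])
        else:
            # Find the two keyframes surrounding frame f and blend them.
            for a, b in zip(idxs, idxs[1:]):
                if a <= f <= b:
                    t = (f - a) / (b - a)
                    centers.append(keyframes[a] + t * (keyframes[b] - keyframes[a]))
                    break
    return centers

# Subject drifts from the left third to the right third over 10 frames.
centers = interpolate_crop_centers({0: 0.3, 10: 0.7}, 11)
```

Each center then becomes the x-offset of an FFmpeg `crop` window for that frame's segment.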
### **Gradio App - Production Studio Interface**

<p align="center">
  <img src="./resources/gradio2-productionstudio.gif" width="700" alt="Gradio Production Studio Demo">
  <br>
  <em>The Production Studio for adding polish to your clips</em>
</p>

### **Architecture Block Diagram**

<p align="center">
  <img src="./resources/diagram2.png" width="700" alt="Architecture 2 - Production Polish Pipeline">
</p>
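Step 3 of the Showrunner pipeline burns word-level subtitles into the video. A minimal sketch of turning WhisperX-style word timestamps into SRT cues (the grouping size and function name are illustrative; the real styling is richer):

```python
def to_srt(words, max_words=3):
    """words: list of (text, start_seconds, end_seconds) tuples, as
    produced by word-level alignment. Groups words into short cues and
    formats them as SRT, which FFmpeg's subtitles filter can burn in."""
    def ts(t):
        ms = int(round(t * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    lines = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        lines += [
            str(i // max_words + 1),                      # cue number
            f"{ts(group[0][1])} --> {ts(group[-1][2])}",  # cue time range
            " ".join(w[0] for w in group),                # cue text
            "",                                           # blank separator
        ]
    return "\n".join(lines)

srt = to_srt([("Hello", 0.0, 0.4), ("world", 0.4, 0.9)])
```

Short two-to-three-word cues are what give short-form video its punchy "karaoke" caption feel.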
---

## 🤝 **Partner Technologies**

### **🚀 Modal: The Backend Powerhouse**

Modal isn't just part of our stack; it **IS** our stack. Without Modal, this project would have been impossible.

**Why Modal Changed Everything:**

When you're processing videos, you need:
- 50-500 MB file uploads that don't time out
- FFmpeg with all codecs pre-installed
- GPU compute for Whisper transcription
- Parallel processing without infrastructure management
- Pay-per-use pricing so you don't burn money on idle servers

**Modal delivered ALL of this out of the box.**

```python
@app.function(
    image=base_image,                        # FFmpeg, ImageMagick, fonts pre-installed
    volumes={STORAGE_PATH: storage_volume},  # instant file transfers
    timeout=3600,
    memory=32768,                            # 32 GB RAM for video processing
    cpu=8.0,
)
@modal.web_endpoint(method="POST")
def process_video(request: dict):
    # This just works. No Docker, no K8s, no DevOps nightmares.
    # Files transfer quickly via Modal volumes, and the function
    # scales to zero when idle, so we only pay while processing.
    ...
```
**The Impact:**

| Before Modal | With Modal |
|--------------|------------|
| 45 min upload times | **< 30 s** file transfers |
| Docker dependency hell | **Zero-config** FFmpeg |
| $200/month idle servers | **Pay only when processing** |
| Manual scaling | **Auto-scales to demand** |

**Huge thanks to Modal for the generous credits that made this possible.** We pushed their infrastructure hard and it never flinched.

<p align="center">
  <img src="./resources/modal-arch.png" width="700" alt="Modal Backend Architecture">
  <br>
  <em>How Modal powers the entire Director's Cut backend</em>
</p>

---

### **🎨 Nebius AI Studio: Qwen VL + FLUX**

Nebius provides fast inference for two critical features:

#### **Qwen 2.5-VL-72B: Intelligent Subject Tracking**

This is **NOT** center crop. We built genuine AI-powered smart cropping:

```python
# For each key frame, Qwen VL reports the main subject's position.
qwen_response = requests.post(
    "https://api.studio.nebius.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {nebius_key}"},
    json={
        "model": "Qwen/Qwen2.5-VL-72B-Instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame}"}},
                {"type": "text",
                 "text": "Find the horizontal position of the main subject. "
                         "Return a decimal between 0.0 and 1.0."},
            ],
        }],
    },
)
# Result: 92% subject retention vs ~40% with naive center crop.
```

<p align="center">
  <img src="./resources/smartcrop.gif" width="600" alt="Smart Crop Demo">
  <br>
  <em>Qwen VL tracking subjects for intelligent 9:16 framing</em>
</p>

#### **FLUX: Custom Intro Image Generation**

Every video gets a **unique AI-generated intro card** that matches its mood:

```python
response = requests.post(
    "https://api.studio.nebius.ai/v1/images/generations",
    headers={"Authorization": f"Bearer {nebius_key}"},
    json={
        "model": "black-forest-labs/flux-schnell",
        "prompt": f"High-energy social media intro, vertical 9:16, "
                  f"bold typography '{title}', vibrant neon gradients",
        "width": 1080,
        "height": 1920,
        "num_inference_steps": 4,
    },
)
# Generates in under 5 seconds on Nebius.
```

**Mood-Matched Styles:**
- 🔥 **Hype** → neon gradients, bold typography, TikTok energy
- 🎬 **Suspense** → cinematic noir, dramatic shadows
- 🌿 **Chill** → soft pastels, minimal aesthetic

<p align="center">
  <img src="./resources/Screenshot 2025-11-30 at 11.27.46 PM.png" width="600" alt="FLUX Intro Generation">
  <br>
  <em>FLUX generating mood-matched intro cards via Nebius</em>
</p>

---

### **🎙️ ElevenLabs: Professional Voiceover**

Every video gets a **content-aware AI voiceover** for the intro:

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=elevenlabs_key)

# Gemini writes a hook based on the actual video content.
intro_script = "Joe Rogan just dropped some insane knowledge about AI. " \
               "This take is gonna blow your mind, check it out..."

audio = client.text_to_speech.convert(
    text=intro_script,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel - engaging, professional
    model_id="eleven_turbo_v2_5",
)
```
**What Makes It Special:**
- 🧠 Scripts reference the actual video content (not generic templates)
- 🎭 Voice selection adapts to the video's mood
- ⚡ Sub-2 s generation time

<p align="center">
  <img src="./resources/Screenshot 2025-11-30 at 11.27.46 PM.png" width="600" alt="ElevenLabs Voiceover Demo">
  <br>
  <em>ElevenLabs generating professional voiceover intros</em>
</p>
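The voiceover above is later mixed with mood-matched background music in Step 3 of the Showrunner. A toy sketch of the ducking idea on raw sample lists; the actual pipeline does this with FFmpeg audio filters, and the gain value here is illustrative:

```python
def mix(voice, music, music_gain=0.2):
    """Mix a voiceover over background music by summing samples, with
    the music ducked to music_gain so speech stays intelligible.
    Clamps the result to [-1.0, 1.0] to avoid clipping."""
    n = max(len(voice), len(music))
    out = []
    for i in range(n):
        v = voice[i] if i < len(voice) else 0.0  # pad shorter track
        m = music[i] if i < len(music) else 0.0
        out.append(max(-1.0, min(1.0, v + music_gain * m)))
    return out
```

With `music_gain=0.2` the bed sits roughly 14 dB under the voice, which keeps narration front and center.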
---

## 🔌 **MCP Server Integration**

### **🤖 ChatGPT Integration (GPT Apps SDK)**

We built a **ChatGPT App** using the GPT Apps SDK that turns ChatGPT into your personal video production assistant.

**How It Works:**
1. Open ChatGPT
2. Find "Director's Cut" in Apps
3. Upload your 15-50 second clip
4. Tell ChatGPT what you want: *"Add subtitles and a hype intro"*
5. Download your polished video

**This is insanely cool** because ChatGPT becomes a conversational video editor. No UI to learn, no buttons to click; just describe what you want.

```
User: "Take this clip and make it TikTok ready"
        ↓
ChatGPT: Understands intent, calls Director's Cut MCP tools
        ↓
MCP Server: Processes video (smart crop, subtitles, music)
        ↓
ChatGPT: "Here's your viral-ready video! 🎬"
```
<p align="center">
  <img src="./resources/gifgpt1.gif" width="600" alt="ChatGPT App Demo">
  <br>
  <em>ChatGPT as your personal video production assistant</em>
</p>

---

### **🖥️ Claude Desktop MCP Server**

For the full autonomous pipeline, connect Claude Desktop to our MCP server.

#### **Option 1: Run Locally (Recommended)**

> **Why Local?** Modal cloud processing requires credits that aren't available to everyone. Running locally gives you full control and works with just API keys.

**Step 1: Clone the repository**
```bash
git clone https://github.com/tayyab415/directors-cut.git
cd directors-cut
```

**Step 2: Install dependencies**
```bash
pip install -r requirements.txt
```

**Step 3: Set up environment variables**

Create a `.env` file:
```env
# Required API keys
GEMINI_API_KEY=your_gemini_key
NEBIUS_API_KEY=your_nebius_key
ELEVENLABS_API_KEY=your_elevenlabs_key

# Optional: Modal (for cloud processing - requires Modal credits)
MODAL_TOKEN_ID=your_modal_token_id
MODAL_TOKEN_SECRET=your_modal_token_secret
```

**Step 4: Run the MCP server**
```bash
python app.py
```
**Step 5: Configure Claude Desktop**

Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "directors-cut": {
      "type": "sse",
      "url": "http://localhost:7860/gradio_api/mcp/sse"
    }
  }
}
```

**Step 6: Restart Claude Desktop and start creating!**

Then just ask Claude:
> *"Process this YouTube video into a viral TikTok: https://youtube.com/watch?v=..."*

---

#### **Option 2: Use the Hosted Server**

If you have Modal credits or just want to try the hosted version, add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "directors-cut": {
      "type": "sse",
      "url": "https://tyb343-directors-cut.hf.space/gradio_api/mcp/sse"
    }
  }
}
```

#### **MCP Tools Available:**

| Tool | Description |
|------|-------------|
| `process_video` | Full pipeline: YouTube URL → viral video |
| `step1_analyze_video_mcp` | Analyze and classify video content |
| `step2_scout_hotspots_mcp` | Find viral-worthy moments |
| `step3_verify_hotspots_mcp` | Vision AI verification |
| `step4_create_plan_mcp` | Generate edit plan |
| `render_and_produce_mcp` | Render + production polish |
| `smart_crop_video` | Standalone 9:16 smart crop |
| `add_production_value` | Add intro, subtitles, music |
517
+
518
+ <p align="center">
519
+ <img src="./resources/claudegif.gif" width="600" alt="Claude Desktop MCP Demo">
520
+ <br>
521
+ <em>Claude Desktop orchestrating the full Director's Cut pipeline</em>
522
+ </p>
523
+
524
+ ---
525
+
526
+ ## ⚑ **Performance**
527
+
528
+ | Metric | Value |
529
+ |--------|-------|
530
+ | **Processing Time** | 3-5 min for 10-min video |
531
+ | **Smart Crop Accuracy** | 92% subject retention |
532
+ | **Subtitle Accuracy** | 95%+ (Whisper large-v3) |
533
+ | **Cost Per Video** | ~$0.15 |
534
+ | **Human Editor Equivalent** | $30-50/hour saved |
535
+
536
+ ---
537
+
538
+ ## πŸ› οΈ **Tech Stack**
539
+
540
+ | Component | Technology | Purpose |
541
+ |-----------|------------|---------|
542
+ | **MCP Server** | Gradio 5.x | Claude/ChatGPT integration |
543
+ | **Backend Compute** | Modal Labs | Video processing at scale |
544
+ | **Video Analysis** | Gemini 2.0 Flash | Hotspot detection, planning |
545
+ | **Smart Crop** | Qwen VL (Nebius) | Subject tracking |
546
+ | **Intro Images** | FLUX (Nebius) | Custom title cards |
547
+ | **Voiceover** | ElevenLabs | Professional narration |
548
+ | **Subtitles** | WhisperX | Word-level captions |
549
+ | **Video Processing** | FFmpeg + MoviePy | Rendering |
550
+
551
+ ---
552
+
553
+ ## πŸ“ **Project Structure**
554
+
555
+ ```
556
+ directors-cut/
557
+ β”œβ”€β”€ app.py # Main Gradio app + MCP tools
558
+ β”œβ”€β”€ modal_simple.py # Modal backend endpoints
559
+ β”œβ”€β”€ src/
560
+ β”‚ β”œβ”€β”€ scout.py # Hotspot detection agent
561
+ β”‚ β”œβ”€β”€ verifier.py # Vision-based verification agent
562
+ β”‚ β”œβ”€β”€ director.py # Edit plan generation agent
563
+ β”‚ β”œβ”€β”€ hands.py # FFmpeg execution agent
564
+ β”‚ β”œβ”€β”€ showrunner.py # Production polish agent
565
+ β”‚ β”œβ”€β”€ server.py # Standalone MCP server
566
+ β”‚ └── paths.py # File management
567
+ β”œβ”€β”€ assets/music/ # Mood-matched background tracks
568
+ β”‚ β”œβ”€β”€ hype/
569
+ β”‚ β”œβ”€β”€ chill/
570
+ β”‚ └── suspense/
571
+ β”œβ”€β”€ requirements.txt
572
+ └── README.md
573
+ ```
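The `assets/music/` layout above is what the Showrunner's mood detection selects from. A minimal sketch of the mood-to-track lookup; the function name and the fall-back-to-chill choice are assumptions for illustration:

```python
import random
from pathlib import Path

MOODS = ("hype", "chill", "suspense")

def pick_music(mood, music_root="assets/music"):
    """Map a detected mood to a background track under
    assets/music/<mood>/. Falls back to 'chill' for unknown moods
    and returns None when no track is available."""
    mood = mood if mood in MOODS else "chill"
    tracks = sorted(Path(music_root, mood).glob("*.mp3"))
    return random.choice(tracks) if tracks else None
```

Keeping the lookup purely directory-driven means new moods can be added just by dropping a folder of tracks next to the existing ones.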

---

## 🎓 **What We Learned**

### **Agent Coordination is Harder Than It Looks**
Early versions had agents stepping on each other. The fix: clear responsibility boundaries, with the Verifier acting as a quality gate.

### **Smart Crop is a Game-Changer**
Center crop loses up to 60% of the content. With Qwen VL doing actual subject tracking, the difference is night and day.

### **Modal is Insanely Good**
We tried local FFmpeg first. Disaster. Modal's pre-configured containers and instant volumes saved 40+ hours of DevOps.

### **MCP Makes AI Actually Useful**
Without MCP, this is "another AI tool." With MCP, Claude and ChatGPT become genuine creative assistants.

---

## 👥 **Team**

- **Tayyab Khan** ([tyb343](https://huggingface.co/tyb343)): full-stack development, multi-agent architecture, MCP integration
- **Sahil Tanna** ([sahiltanna7](https://huggingface.co/sahiltanna7)): development & testing
- **Nikunj** ([nikunj30](https://huggingface.co/nikunj30)): development & testing

---

## ⚠️ **Disclaimer & Responsible Use**

**Important Notice on Copyright and Intended Use:**

Director's Cut is designed to help **content creators repurpose their own content** for different platforms. The intended use cases are:

✅ **Legitimate Uses:**
- Creators repurposing their own YouTube content for TikTok/Reels/Shorts
- Businesses creating short-form content from their long-form material
- Educational content being reformatted for different audiences
- Personal projects and creative experimentation

❌ **This tool should NOT be used for:**
- Downloading and repurposing content you don't own
- Creating content that infringes on others' copyrights
- Removing watermarks or attribution from original creators
- Monetizing content without proper rights or licensing

**By using Director's Cut, you agree to:**
1. Only process content you have rights to use
2. Respect copyright laws in your jurisdiction
3. Properly attribute original creators when required
4. Not use this tool for deceptive or harmful purposes

**We are not responsible for misuse of this tool.** The technology is built to empower creators, not to enable copyright infringement. Please use it responsibly.

---

## 📜 **License**

MIT License. Build cool stuff with this, but build it ethically!

---

## 🙏 **Acknowledgments**

Massive thanks to:
- **Modal** for infrastructure that actually works, and for generous hackathon credits
- **Nebius** for fast Qwen VL and FLUX inference
- **ElevenLabs** for voices that sound genuinely human
- **Google Gemini** for the multimodal reasoning powering our agents
- **Anthropic & Gradio** for MCP and for hosting this hackathon

---

<p align="center">
  <b>Built with ❤️ for content creators who refuse to let great content die in landscape format.</b>
</p>

<p align="center">
  <a href="https://huggingface.co/spaces/tyb343/directors-cut">🚀 Try Director's Cut Now</a>
</p>
resources/Screenshot 2025-11-30 at 11.27.46 PM.png ADDED

Git LFS Details

  • SHA256: 8646be0190bb7051e53a0656de2caa8842cb270483bb7ee67fe59e0b3387d537
  • Pointer size: 132 Bytes
  • Size of remote file: 1.19 MB
resources/claudegif.gif ADDED

Git LFS Details

  • SHA256: bf3a4882933872844af4bcf2bd192735f45b86910c4633155fa847026e9f6f10
  • Pointer size: 132 Bytes
  • Size of remote file: 3.16 MB
resources/clip-gen-diagram.png ADDED

Git LFS Details

  • SHA256: 37c1f8e145e70c9678512d459ba9cec9bda6f6da5218a09750a982e7f2637145
  • Pointer size: 132 Bytes
  • Size of remote file: 4.63 MB
resources/diagram1.png ADDED

Git LFS Details

  • SHA256: 01546d21e1d505cbf2191600c0eaa1ab62f10f816d3ce4df216acb76a1740740
  • Pointer size: 132 Bytes
  • Size of remote file: 5.23 MB
resources/diagram2.png ADDED

Git LFS Details

  • SHA256: 3a16b42bb79c121253c79a1b26a4b98168839e389dd2ebc0165527bb50e1a3d9
  • Pointer size: 132 Bytes
  • Size of remote file: 4.53 MB
resources/gifgpt1.gif ADDED

Git LFS Details

  • SHA256: 843547f2ed2e2e97a3726d62b1c9af2ae777dca77afdcd6dd34545630fbe0fa4
  • Pointer size: 132 Bytes
  • Size of remote file: 4.57 MB
resources/gradio-clip.gif ADDED

Git LFS Details

  • SHA256: 3ca0635b239d107f84faec518628200062a875238fc5934da53117a896a65396
  • Pointer size: 132 Bytes
  • Size of remote file: 4.53 MB
resources/gradio2-productionstudio.gif ADDED

Git LFS Details

  • SHA256: 0da91206979cd955260ce97ec8e1e5fff7793031a25f733970a489ccab5b54e5
  • Pointer size: 133 Bytes
  • Size of remote file: 12.9 MB
resources/modal-arch.png ADDED

Git LFS Details

  • SHA256: 66d34d2f42b998d6aea29b54b99e6def9afc5e9436d11159806863e567a2853d
  • Pointer size: 132 Bytes
  • Size of remote file: 4.78 MB
resources/modal.png ADDED

Git LFS Details

  • SHA256: 727510356e7183d8f089a61d1098d68ec999529e1ba0859ce7c61edd3d0bda23
  • Pointer size: 132 Bytes
  • Size of remote file: 4.75 MB
resources/prod-stduio.png ADDED

Git LFS Details

  • SHA256: 46ae5dd1bd4723d4a75eb3d8e52d3affbb90435b1eae38fbdc8ee843569f6695
  • Pointer size: 132 Bytes
  • Size of remote file: 5.06 MB
resources/smartcrop.gif ADDED

Git LFS Details

  • SHA256: be3c5fa15e962e6f4a6e7ddcfb686ed6023434bfb5b92392702748c36473ac20
  • Pointer size: 132 Bytes
  • Size of remote file: 3.14 MB