tayyab415
committed on
Commit
·
fe95fe1
1
Parent(s):
96c309b
Simplify architecture sections - diagrams first, remove ASCII art
README.md
CHANGED
|
@@ -105,70 +105,11 @@ We built a 5-agent pipeline that:
|
|
| 105 |
|
| 106 |
The first half of our pipeline takes a massive YouTube video and identifies the **golden moments** worth sharing.
|
| 107 |
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
─────────────────────────────────────────────────────────────────
|
| 114 |
-
|
| 115 |
-
YouTube URL (10min - 3hr video)
|
| 116 |
-
│
|
| 117 |
-
▼
|
| 118 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 119 |
-
│ SCOUT AGENT │
|
| 120 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 121 |
-
│ Model: Gemini 2.0 Flash │
|
| 122 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 123 |
-
│ • Downloads audio + fetches transcript │
|
| 124 |
-
│ • Analyzes audio energy peaks (detects hype moments) │
|
| 125 |
-
│ • Semantic analysis of transcript for viral potential │
|
| 126 |
-
│ • Classifies video type (podcast/tutorial/vlog/gaming) │
|
| 127 |
-
│ • Outputs: Top 10-20 timestamp candidates with reasoning │
|
| 128 |
-
└───────────────────────────┬───────────────────────────────────┘
|
| 129 |
-
│
|
| 130 |
-
▼
|
| 131 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 132 |
-
│ ✅ VERIFIER AGENT │
|
| 133 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 134 |
-
│ Model: Gemini 2.0 Flash + Vision API │
|
| 135 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 136 |
-
│ • Downloads 5-10s video segments around each hotspot │
|
| 137 |
-
│ • Uploads to Gemini Vision for visual quality analysis │
|
| 138 |
-
│ • Scores each clip: Visual Quality | Engagement | Shareability │
|
| 139 |
-
│ • Filters out low-quality, blurry, or boring segments │
|
| 140 |
-
│ • Outputs: Verified clips ranked by viral potential │
|
| 141 |
-
└───────────────────────────┬───────────────────────────────────┘
|
| 142 |
-
│
|
| 143 |
-
▼
|
| 144 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 145 |
-
│ 🎬 DIRECTOR AGENT │
|
| 146 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 147 |
-
│ Model: Gemini 2.0 Flash Lite │
|
| 148 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 149 |
-
│ • Creates the edit plan from verified clips │
|
| 150 |
-
│ • Determines optimal clip order for narrative flow │
|
| 151 |
-
│ • Sets duration per clip (8-15s each, 30-60s total) │
|
| 152 |
-
│ • Plans transition types and pacing │
|
| 153 |
-
│ • Outputs: JSON edit plan with precise timestamps │
|
| 154 |
-
└───────────────────────────┬───────────────────────────────────┘
|
| 155 |
-
│
|
| 156 |
-
▼
|
| 157 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 158 |
-
│ HANDS AGENT │
|
| 159 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 160 |
-
│ Tech: FFmpeg + Modal (Cloud Compute) │
|
| 161 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 162 |
-
│ • Extracts clips at exact timestamps (no re-download) │
|
| 163 |
-
│ • Concatenates with crossfade transitions │
|
| 164 |
-
│ • Handles audio normalization │
|
| 165 |
-
│ • Outputs: Raw compiled video (30-60s landscape) │
|
| 166 |
-
└───────────────────────────────────────────────────────────────┘
|
| 167 |
-
│
|
| 168 |
-
▼
|
| 169 |
-
🎥 RAW COMPILED CLIP
|
| 170 |
-
(Ready for Production Polish)
|
| 171 |
-
```
|
| 172 |
|
| 173 |
### **Gradio App - Clip Generation Interface**
|
| 174 |
|
|
@@ -178,80 +119,17 @@ YouTube URL (10min - 3hr video)
|
|
| 178 |
<em>The Gradio interface for generating clips from YouTube URLs</em>
|
| 179 |
</p>
|
| 180 |
|
| 181 |
-
### **Architecture Block Diagram**
|
| 182 |
-
|
| 183 |
-
<p align="center">
|
| 184 |
-
<img src="./resources/diagram1.png" width="700" alt="Architecture 1 - Clip Generation Pipeline">
|
| 185 |
-
</p>
|
| 186 |
-
|
| 187 |
---
|
| 188 |
|
| 189 |
## **Architecture 2: Production Polish & Refinement**
|
| 190 |
|
| 191 |
The second half takes that raw clip and transforms it into **viral-ready vertical content**.
|
| 192 |
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
─────────────────────────────────────────────────────────────────
|
| 199 |
-
|
| 200 |
-
Raw Compiled Clip (30-60s landscape)
|
| 201 |
-
│
|
| 202 |
-
▼
|
| 203 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 204 |
-
│ SHOWRUNNER AGENT - Step 1: Smart Crop │
|
| 205 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 206 |
-
│ Tech: Gemini Flash Lite + Qwen VL (Nebius) │
|
| 207 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 208 |
-
│ • Gemini detects scene change timestamps │
|
| 209 |
-
│ • Qwen VL analyzes each key frame for subject position │
|
| 210 |
-
│ • Calculates optimal crop window (tracks faces/subjects) │
|
| 211 |
-
│ • Smooth interpolation between positions (no jarring jumps) │
|
| 212 |
-
│ • Renders 9:16 vertical with intelligent framing │
|
| 213 |
-
└───────────────────────────┬───────────────────────────────────┘
|
| 214 |
-
│
|
| 215 |
-
▼
|
| 216 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 217 |
-
│ SHOWRUNNER AGENT - Step 2: Intro Generation │
|
| 218 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 219 |
-
│ Tech: FLUX (Nebius) + ElevenLabs │
|
| 220 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 221 |
-
│ • Gemini writes content-aware hook script │
|
| 222 |
-
│ • FLUX generates custom intro image (mood-matched) │
|
| 223 |
-
│ • ElevenLabs synthesizes professional voiceover │
|
| 224 |
-
│ • Combines into 3-5s animated intro sequence │
|
| 225 |
-
└───────────────────────────┬───────────────────────────────────┘
|
| 226 |
-
│
|
| 227 |
-
▼
|
| 228 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 229 |
-
│ SHOWRUNNER AGENT - Step 3: Audio & Subtitles │
|
| 230 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 231 |
-
│ Tech: WhisperX + FFmpeg │
|
| 232 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 233 |
-
│ • WhisperX transcribes with word-level timestamps │
|
| 234 |
-
│ • Mood detection selects background music (hype/chill/suspense) │
|
| 235 |
-
│ • Audio mixing: voice + music at optimal levels │
|
| 236 |
-
│ • Burns stylized subtitles into video │
|
| 237 |
-
└───────────────────────────┬───────────────────────────────────┘
|
| 238 |
-
│
|
| 239 |
-
▼
|
| 240 |
-
┌───────────────────────────────────────────────────────────────┐
|
| 241 |
-
│ SHOWRUNNER AGENT - Step 4: Final Assembly │
|
| 242 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 243 |
-
│ Tech: FFmpeg + MoviePy │
|
| 244 |
-
│ ───────────────────────────────────────────────────────────── │
|
| 245 |
-
│ • Concatenates intro + main content │
|
| 246 |
-
│ • Applies final color grading │
|
| 247 |
-
│ • Exports in TikTok/Reels/Shorts optimized format │
|
| 248 |
-
│ • Delivers download-ready MP4 │
|
| 249 |
-
└───────────────────────────────────────────────────────────────┘
|
| 250 |
-
│
|
| 251 |
-
▼
|
| 252 |
-
📱 VIRAL-READY VERTICAL VIDEO
|
| 253 |
-
(9:16, subtitled, intro, music)
|
| 254 |
-
```
|
| 255 |
|
| 256 |
### **Gradio App - Production Studio Interface**
|
| 257 |
|
|
@@ -261,12 +139,6 @@ Raw Compiled Clip (30-60s landscape)
|
|
| 261 |
<em>The Production Studio for adding polish to your clips</em>
|
| 262 |
</p>
|
| 263 |
|
| 264 |
-
### **Architecture Block Diagram**
|
| 265 |
-
|
| 266 |
-
<p align="center">
|
| 267 |
-
<img src="./resources/diagram2.png" width="700" alt="Architecture 2 - Production Polish Pipeline">
|
| 268 |
-
</p>
|
| 269 |
-
|
| 270 |
---
|
| 271 |
|
| 272 |
## 🤝 **Partner Technologies**
|
|
|
|
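The Director bullets removed from the README in this commit state per-clip and total duration limits (8-15s each, 30-60s total). As a rough illustration only, the selection step could look like the sketch below. The `Clip` class, its `score` field, the greedy strategy, and the chronological ordering are all assumptions made for this example; the README does not say how the Director actually builds its plan.

```python
from dataclasses import dataclass

MIN_CLIP_S, MAX_CLIP_S = 8.0, 15.0      # per-clip bounds quoted in the README
MIN_TOTAL_S, MAX_TOTAL_S = 30.0, 60.0   # total-length bounds quoted in the README


@dataclass
class Clip:
    start: float   # seconds into the source video
    end: float     # seconds into the source video
    score: float   # hypothetical Verifier score; higher is better

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_edit_plan(candidates: list[Clip]) -> list[Clip]:
    """Greedily pick the best-scoring clips that satisfy the duration rules."""
    plan: list[Clip] = []
    total = 0.0
    for clip in sorted(candidates, key=lambda c: c.score, reverse=True):
        if not MIN_CLIP_S <= clip.duration <= MAX_CLIP_S:
            continue  # clip is too short or too long
        if total + clip.duration > MAX_TOTAL_S:
            continue  # would push the compilation past the maximum length
        plan.append(clip)
        total += clip.duration
    if total < MIN_TOTAL_S:
        raise ValueError(f"only {total:.0f}s of usable clips; need {MIN_TOTAL_S:.0f}s")
    # Assumption: play the selected clips in source order.
    return sorted(plan, key=lambda c: c.start)
```

Sorting by start time is the simplest possible stand-in for "optimal clip order for narrative flow"; a real planner would likely weigh more than chronology.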
| 105 |
|
| 106 |
The first half of our pipeline takes a massive YouTube video and identifies the **golden moments** worth sharing.
|
| 107 |
|
| 108 |
+
<p align="center">
|
| 109 |
+
<img src="./resources/diagram1.png" width="700" alt="Architecture 1 - Clip Generation Pipeline">
|
| 110 |
+
<br>
|
| 111 |
+
<em>Scout → Verifier → Director → Hands: From YouTube URL to raw compiled clip</em>
|
| 112 |
+
</p>
|
| 113 |
|
| 114 |
### **Gradio App - Clip Generation Interface**
|
| 115 |
|
|
|
|
| 119 |
<em>The Gradio interface for generating clips from YouTube URLs</em>
|
| 120 |
</p>
|
| 121 |
|
| 122 |
---
|
| 123 |
|
| 124 |
## **Architecture 2: Production Polish & Refinement**
|
| 125 |
|
| 126 |
The second half takes that raw clip and transforms it into **viral-ready vertical content**.
|
| 127 |
|
| 128 |
+
<p align="center">
|
| 129 |
+
<img src="./resources/diagram2.png" width="700" alt="Architecture 2 - Production Polish Pipeline">
|
| 130 |
+
<br>
|
| 131 |
+
<em>Showrunner: Smart Crop → Intro → Subtitles → Final Assembly</em>
|
| 132 |
+
</p>
|
| 133 |
|
| 134 |
### **Gradio App - Production Studio Interface**
|
| 135 |
|
|
|
|
| 139 |
<em>The Production Studio for adding polish to your clips</em>
|
| 140 |
</p>
|
| 141 |
|
| 142 |
---
|
| 143 |
|
| 144 |
## 🤝 **Partner Technologies**
|
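The Smart Crop step described in the removed diagram (a 9:16 window that follows the subject, with smooth interpolation between positions) can be sketched in plain Python. The function names, the linear interpolation, and the even-width rounding are assumptions for illustration; the project's actual crop logic is not shown in this diff.

```python
def crop_width(src_h: int) -> int:
    """Width of a full-height 9:16 window, rounded down to an even number."""
    w = src_h * 9 // 16
    return w - (w % 2)  # many H.264 encoders require even dimensions


def crop_x(subject_x: float, src_w: int, src_h: int) -> int:
    """Left edge of a 9:16 window centred on subject_x, clamped to the frame."""
    w = crop_width(src_h)
    left = round(subject_x - w / 2)
    return max(0, min(left, src_w - w))


def interpolate_x(keyframes: list[tuple[float, int]], t: float) -> float:
    """Linearly interpolate the crop's left edge between (time, x) keyframes.

    Assumes keyframes are sorted by strictly increasing time.
    """
    if t <= keyframes[0][0]:
        return float(keyframes[0][1])
    for (t0, x0), (t1, x1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            return x0 + (x1 - x0) * (t - t0) / (t1 - t0)
    return float(keyframes[-1][1])
```

Clamping keeps the window inside the frame when the subject is near an edge, and interpolating between keyframes is one way to avoid the "jarring jumps" the diagram mentions; easing curves would be a natural refinement.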