Commit 5c2da8e by ryu34 · verified · 1 Parent(s): fdcb45f

Upload README.md

Files changed (1)
  1. README.md +196 -0
README.md ADDED
@@ -0,0 +1,196 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ tags:
+ - video-editing
+ - social-media
+ - agent
+ - tool-calling
+ - sft
+ - trl
+ - viralcut
+ datasets:
+ - ryu34/viralcut-agent-data
+ - benxh/tiktok-hooks-finetune
+ - NousResearch/hermes-function-calling-v1
+ pipeline_tag: text-generation
+ ---
+
+ # 🎬 ViralCut Agent
+
+ **An autonomous AI agent that transforms raw video footage into professional, viral-worthy social media content.**
+
+ ViralCut Agent is a fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) model trained with QLoRA SFT on tool-calling trajectories for video editing, social media optimization, and content strategy.
+
+ ## What It Does
+
+ | Capability | How |
+ |---|---|
+ | 🎬 **Video Analysis** | Analyze raw footage, find best moments, detect scenes |
+ | ✂️ **Professional Editing** | Trim, transitions, effects, text overlays, color grading via FFmpeg |
+ | 🎵 **Audio Production** | Search & add trending royalty-free music, sound effects, audio mixing |
+ | 📊 **Viral Optimization** | Score content for TikTok/Instagram/YouTube, optimize for algorithms |
+ | 🔍 **Trend Research** | Search current trends, hooks, sounds via web search |
+ | 🚫 **AI Slop Detection** | Filter out AI-generated junk content |
+ | ✍️ **Caption Generation** | Platform-optimized captions, hashtags, posting strategy |
+
+ ## Tools
+
+ The agent was trained to call these tools autonomously:
+
+ ```python
+ # 1. FFmpeg for video processing
+ ffmpeg_cmd(command="ffmpeg -y -i input.mp4 -vf 'eq=saturation=1.3' output.mp4",
+            description="Boost color saturation")
+
+ # 2. Web search for assets and trends
+ web_search(query="trending TikTok sounds food 2025", search_type="trending_content")
+ web_search(query="royalty free lo-fi beat", search_type="royalty_free_music")
+
+ # 3. Video analysis
+ analyze_video(video_path="raw.mp4", analysis_type="full")
+
+ # 4. Virality scoring
+ score_virality(video_path="edit.mp4", platform="tiktok", niche="food")
+
+ # 5. Caption generation
+ generate_caption(video_description="...", platform="tiktok", tone="casual")
+
+ # 6. AI content detection
+ detect_ai_slop(content_path="broll.mp4", check_type="video")
+ ```
+
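The six calls above imply a dispatch layer that maps a tool name chosen by the model to a real function. A minimal sketch of such a layer follows. It assumes each tool call arrives as a dict with `name` and `arguments` keys, which the README does not confirm; the `TOOLS` registry, the `tool` decorator, `dispatch`, and the stub bodies are hypothetical and are not taken from the repository's `agent.py`.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names (as listed in the README) to callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def score_virality(video_path: str, platform: str, niche: str) -> dict:
    # Stub: a real implementation would inspect the video file.
    return {"video_path": video_path, "platform": platform, "niche": niche, "score": None}


@tool
def generate_caption(video_description: str, platform: str, tone: str) -> dict:
    # Stub: echoes its inputs so the call can be traced.
    return {"platform": platform, "tone": tone, "caption": None}


def dispatch(call: dict) -> dict:
    """Run one tool call; report failures as data so the agent can recover."""
    name = call.get("name")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": TOOLS[name](**call.get("arguments", {}))}
    except TypeError as exc:  # wrong or missing arguments
        return {"error": str(exc)}
```

Returning errors as values rather than raising lets the error text be fed back to the model as a tool result, so it can retry with corrected arguments.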
+ ## Quick Start
+
+ ### Install
+ ```bash
+ pip install transformers torch peft bitsandbytes duckduckgo-search
+ ```
+
+ ### Use as Agent (with real tools)
+ ```bash
+ # Clone the repo
+ git clone https://huggingface.co/ryu34/viralcut-agent
+ cd viralcut-agent
+
+ # Edit a video
+ python agent.py --video raw_footage.mp4 --platform tiktok --niche food
+
+ # Get a content plan (no video needed)
+ python agent.py --plan --niche "coffee shop" --platform tiktok
+
+ # Check files for AI slop
+ python agent.py --check-slop clip1.mp4 clip2.mp4
+
+ # Interactive mode
+ python agent.py
+ ```
+
+ ### Use as Model (inference only)
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the fine-tuned weights and the matching tokenizer from the Hub
+ model = AutoModelForCausalLM.from_pretrained("ryu34/viralcut-agent", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("ryu34/viralcut-agent")
+
+ messages = [
+     {"role": "system", "content": "You are ViralCut Agent..."},
+     {"role": "user", "content": "Edit my beach video into a TikTok with trending music and effects"}
+ ]
+
+ # Render the conversation with the model's chat template, then generate
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=1024)
+ # Decode only the newly generated tokens, skipping the prompt
+ print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:]))
+ ```
+
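In inference-only mode the model emits tool calls as text, so the caller has to extract them before anything runs. The sketch below assumes the output wraps each call in `<tool_call>...</tool_call>` tags containing a JSON object with `name` and `arguments`, the convention used by the Qwen2.5 chat template and the Hermes function-calling data; the README does not state the exact output format of this fine-tune, so treat the pattern as an assumption. `extract_tool_calls` is a hypothetical helper.

```python
import json
import re
from typing import List

# Assumed output format: JSON objects wrapped in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> List[dict]:
    """Return every well-formed tool call found in generated text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than crash the agent
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


sample = (
    "I'll analyze the footage first.\n"
    '<tool_call>\n{"name": "analyze_video", "arguments": '
    '{"video_path": "raw.mp4", "analysis_type": "full"}}\n</tool_call>'
)
print(extract_tool_calls(sample))
```

A 3B model can occasionally produce truncated or invalid JSON, which is why malformed payloads are skipped instead of raising.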
+ ## Training
+
+ ### Data
+ Mixed dataset of ~2,800 examples:
+ - **10 synthetic video editing trajectories**: multi-turn conversations showing full edit pipelines (analyze → search → edit → score → caption)
+ - **~1,300 TikTok hooks/captions**: real viral content data from [benxh/tiktok-hooks-finetune](https://huggingface.co/datasets/benxh/tiktok-hooks-finetune)
+ - **~1,200 general function-calling**: tool-use backbone from [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1)
+
+ Full dataset: [ryu34/viralcut-agent-data](https://huggingface.co/datasets/ryu34/viralcut-agent-data)
+
+ ### Method
+ - **Base model**: Qwen/Qwen2.5-3B-Instruct
+ - **Method**: QLoRA SFT (4-bit quantization, rank 16, alpha 32)
+ - **Training**: 3 epochs, lr=2e-4, cosine schedule, assistant-only loss
+ - **Hardware**: T4 16GB GPU (free tier compatible)
+ - **Framework**: TRL v1.3+ SFTTrainer
+
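Rank 16 with alpha 32 fixes the standard LoRA scaling factor at alpha / rank = 2 and adds rank × (d_in + d_out) trainable weights per adapted linear layer. The sketch below works through that arithmetic. The 2048 × 2048 layer shape is an illustrative assumption, not a figure taken from the training script, and the README does not say which modules the adapters target.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 16, 32  # values from the Method list above

print(lora_scaling(RANK, ALPHA))      # 2.0
print(lora_params(2048, 2048, RANK))  # 65536 for a hypothetical 2048 x 2048 projection
```

For that hypothetical layer, the adapter adds 65,536 trainable weights against 4,194,304 frozen ones, about 1.6%, which is what makes fine-tuning feasible on a 16GB T4.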
+ ### Train It Yourself
+ ```bash
+ # Option 1: Google Colab (free T4 GPU)
+ # Open: https://huggingface.co/datasets/ryu34/viralcut-agent-data/blob/main/train_colab.ipynb
+
+ # Option 2: Direct script
+ wget https://huggingface.co/datasets/ryu34/viralcut-agent-data/resolve/main/train.py
+ pip install transformers trl torch datasets accelerate peft bitsandbytes
+ python train.py
+ ```
+
+ ## Architecture
+
+ ```
+ User Request ("Edit my raw footage into a viral TikTok")
+            │
+            ▼
+ ┌─────────────────────────────────┐
+ │  ViralCut Agent (Qwen2.5-3B)    │
+ │  Fine-tuned for tool-calling    │
+ │                                 │
+ │  Thinks → Plans → Calls Tools   │
+ └──────────┬──────────────────────┘
+            │
+     ┌──────┼──────┬──────┬────────┐
+     │      │      │      │        │
+     ▼      ▼      ▼      ▼        ▼
+  FFmpeg   Web   Video  Viral   AI Slop
+   Edit  Search  Anal.  Score   Detect
+     │      │      │      │        │
+     └──────┴──────┴──────┴────────┘
+            │
+            ▼
+ Final edited video + caption + strategy
+ ```
+
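The diagram describes a loop: the model replies, any tool calls in the reply are executed, and the results are appended to the conversation until the model answers without calling a tool. A minimal sketch of that loop follows, assuming `<tool_call>`-tagged JSON output and a `tool` message role, neither of which the README confirms. `run_agent`, `fake_generate`, and the stub `analyze_video` are hypothetical; the scripted generator stands in for the model so the loop runs without a GPU.

```python
import json
import re
from typing import Callable, Dict, List

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def run_agent(
    generate: Callable[[List[dict]], str],
    tools: Dict[str, Callable[..., object]],
    user_request: str,
    max_steps: int = 8,
) -> List[dict]:
    """Alternate model turns and tool results until the model stops calling tools."""
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        calls = [json.loads(raw) for raw in TOOL_CALL_RE.findall(reply)]
        if not calls:
            break  # no tool call means the model produced its final answer
        for call in calls:
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
    return messages


def fake_generate(messages: List[dict]) -> str:
    """Scripted stand-in for the model: one tool call, then a final answer."""
    if messages[-1]["role"] == "user":
        return ('<tool_call>{"name": "analyze_video", "arguments": '
                '{"video_path": "raw.mp4", "analysis_type": "full"}}</tool_call>')
    return "Analysis done; 8 scenes found."


def analyze_video(video_path: str, analysis_type: str) -> dict:
    return {"scenes": 8}  # stub result, not real analysis


history = run_agent(fake_generate, {"analyze_video": analyze_video}, "Make a TikTok from raw.mp4")
print([m["role"] for m in history])
```

The `max_steps` cap keeps a model that never stops calling tools from looping indefinitely.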
+ ## Example Output
+
+ **Input:** "I have 8 minutes of raw ramen footage from Tokyo. Make a TikTok."
+
+ **Agent actions:**
+ 1. 📊 `analyze_video(raw_ramen.mp4, "full")` → Found 8 scenes, best: noodle pull at 0.9 energy
+ 2. 🔍 `web_search("trending TikTok sounds food ASMR 2025")` → Lo-fi city pop trending
+ 3. 🎵 `web_search("royalty free lo-fi Japanese beat")` → Found "Tokyo Nights" CC BY 4.0
+ 4. ✂️ `ffmpeg_cmd(...)` → Extracted hook shot with color boost
+ 5. ✂️ `ffmpeg_cmd(...)` → Speed-ramped broth prep
+ 6. ✂️ `ffmpeg_cmd(...)` → Assembled with fadeblack + slideright transitions
+ 7. 🎵 `ffmpeg_cmd(...)` → Mixed lo-fi music at 70% with ambient
+ 8. 📝 `ffmpeg_cmd(...)` → Added text hook + location overlay
+ 9. 📈 `score_virality(...)` → 82/100
+ 10. 🚫 `detect_ai_slop(...)` → Authentic ✅
+ 11. ✍️ `generate_caption(...)` → "This man has been making ramen by hand for 30 years"
+
+ **Output:** 17s vertical TikTok with professional transitions, trending music, text overlays. Score: 82/100.
+
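The elided `ffmpeg_cmd(...)` calls in steps 4 and 6 could correspond to commands like the ones built below. This is a sketch only: the saturation filter mirrors the example in the Tools section, and `xfade` is FFmpeg's cross-transition filter (both `fadeblack` and `slideright` are valid `transition` values), but the file names, offsets, and the helper functions `saturation_cmd` and `xfade_cmd` are hypothetical, and the actual commands the agent generated are not shown in the README.

```python
import shlex
from typing import List


def saturation_cmd(src: str, dst: str, saturation: float = 1.3) -> List[str]:
    """Color boost, mirroring the eq=saturation example in the Tools section."""
    return ["ffmpeg", "-y", "-i", src, "-vf", f"eq=saturation={saturation}", dst]


def xfade_cmd(a: str, b: str, dst: str, transition: str,
              offset: float, duration: float = 0.5) -> List[str]:
    """Join two clips with an xfade transition starting `offset` seconds into clip a."""
    graph = f"[0:v][1:v]xfade=transition={transition}:duration={duration}:offset={offset}[v]"
    return ["ffmpeg", "-y", "-i", a, "-i", b, "-filter_complex", graph, "-map", "[v]", dst]


cmd = xfade_cmd("hook.mp4", "broth.mp4", "joined.mp4", transition="fadeblack", offset=2.5)
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True), with ffmpeg available on PATH.
```

Building the command as an argument list avoids shell-quoting problems with file names. Note that `xfade` expects both inputs to share resolution and frame rate, and this sketch maps only the video stream, so audio would need to be mixed in a separate step.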
+ ## Limitations
+
+ - The model has 3B parameters; for complex creative decisions, larger models (7B+) would likely perform better
+ - FFmpeg commands may need adjustment for specific file formats
+ - Virality scoring is heuristic-based, not ML-based
+ - Web search requires the `duckduckgo-search` package
+ - No actual video generation: this is an *editing* agent that works with your existing footage
+
+ ## Citation
+
+ ```bibtex
+ @misc{viralcut-agent-2025,
+   title={ViralCut Agent: Autonomous Video Editing for Social Media},
+   author={ryu34},
+   year={2025},
+   url={https://huggingface.co/ryu34/viralcut-agent}
+ }
+ ```