---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- video-editing
- social-media
- agent
- tool-calling
- sft
- trl
- viralcut
datasets:
- ryu34/viralcut-agent-data
- benxh/tiktok-hooks-finetune
- NousResearch/hermes-function-calling-v1
pipeline_tag: text-generation
---
# 🎬 ViralCut Agent
**An autonomous AI agent that transforms raw video footage into professional, viral-worthy social media content.**
ViralCut Agent is a fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) model trained with QLoRA SFT on tool-calling trajectories for video editing, social media optimization, and content strategy.
## What It Does
| Capability | How |
|---|---|
| 🎬 **Video Analysis** | Analyze raw footage, find best moments, detect scenes |
| ✂️ **Professional Editing** | Trim, transitions, effects, text overlays, color grading via FFmpeg |
| 🎵 **Audio Production** | Search & add trending royalty-free music, sound effects, audio mixing |
| **Viral Optimization** | Score content for TikTok/Instagram/YouTube, optimize for algorithms |
| **Trend Research** | Search current trends, hooks, sounds via web search |
| 🚫 **AI Slop Detection** | Filter out AI-generated junk content |
| ✍️ **Caption Generation** | Platform-optimized captions, hashtags, posting strategy |
## Tools
The agent was trained to call these tools autonomously:
```python
# 1. FFmpeg for video processing
ffmpeg_cmd(command="ffmpeg -y -i input.mp4 -vf 'eq=saturation=1.3' output.mp4",
           description="Boost color saturation")
# 2. Web search for assets and trends
web_search(query="trending TikTok sounds food 2025", search_type="trending_content")
web_search(query="royalty free lo-fi beat", search_type="royalty_free_music")
# 3. Video analysis
analyze_video(video_path="raw.mp4", analysis_type="full")
# 4. Virality scoring
score_virality(video_path="edit.mp4", platform="tiktok", niche="food")
# 5. Caption generation
generate_caption(video_description="...", platform="tiktok", tone="casual")
# 6. AI content detection
detect_ai_slop(content_path="broll.mp4", check_type="video")
```
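
One way to connect these tool names to real functions is a small registry that maps each name to a Python callable. The sketch below is an assumption about how dispatch could work, not a copy of the repository's `agent.py`: the `tool` decorator and `dispatch` helper are hypothetical, and the two handlers are stubs that only echo their inputs.

```python
from typing import Any, Callable, Dict

# Registry mapping tool names (as listed above) to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name (hypothetical helper)."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def score_virality(video_path: str, platform: str, niche: str) -> dict:
    # Stub: a real handler would inspect the video; this only echoes inputs.
    return {"video_path": video_path, "platform": platform, "niche": niche}


@tool
def generate_caption(video_description: str, platform: str, tone: str) -> dict:
    # Stub: returns the request so the control flow can be followed.
    return {"platform": platform, "tone": tone, "source": video_description}


def dispatch(call: dict) -> dict:
    """Run one tool call of the form {"name": ..., "arguments": {...}}."""
    name = call.get("name")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**call.get("arguments", {}))}
    except TypeError as exc:  # missing or unexpected arguments
        return {"ok": False, "error": str(exc)}
```

Returning an error dictionary instead of raising lets the failure be fed back to the model as a tool result, so it can retry with corrected arguments.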
## Quick Start
### Install
```bash
pip install transformers torch accelerate peft bitsandbytes duckduckgo-search
```
### Use as Agent (with real tools)
```bash
# Clone the repo
git clone https://huggingface.co/ryu34/viralcut-agent
cd viralcut-agent
# Edit a video
python agent.py --video raw_footage.mp4 --platform tiktok --niche food
# Get a content plan (no video needed)
python agent.py --plan --niche "coffee shop" --platform tiktok
# Check files for AI slop
python agent.py --check-slop clip1.mp4 clip2.mp4
# Interactive mode
python agent.py
```
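
The four invocations above map onto four modes: edit, plan, slop check, and interactive. The repository's `agent.py` is not reproduced here; the following is only a sketch of how a script with the same flags could pick a mode using `argparse`. The function names, the help strings, and the `tiktok` default for `--platform` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="agent.py")
    parser.add_argument("--video", help="path to raw footage to edit")
    parser.add_argument("--platform", default="tiktok", help="target platform (default is assumed)")
    parser.add_argument("--niche", help="content niche, e.g. food")
    parser.add_argument("--plan", action="store_true", help="produce a content plan only")
    parser.add_argument("--check-slop", nargs="+", metavar="FILE", help="files to check")
    return parser


def pick_mode(args: argparse.Namespace) -> str:
    """Choose a mode; no arguments at all falls through to interactive."""
    if args.check_slop:
        return "check-slop"
    if args.plan:
        return "plan"
    if args.video:
        return "edit"
    return "interactive"


if __name__ == "__main__":
    print(pick_mode(build_parser().parse_args()))
```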
### Use as Model (inference only)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("ryu34/viralcut-agent", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("ryu34/viralcut-agent")

messages = [
    {"role": "system", "content": "You are ViralCut Agent..."},
    {"role": "user", "content": "Edit my beach video into a TikTok with trending music and effects"},
]

# Render the chat template, generate, and decode only the newly generated tokens.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
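
In inference-only use, the model still needs to be told which tools exist. Recent `transformers` releases accept a `tools` argument in `apply_chat_template`; assuming this model's chat template honors it, the six tools could be described with JSON-schema style definitions as sketched below. The parameter names come from the Tools section. The descriptions, the `make_schema` helper, and the simplification that every parameter is a required string are assumptions.

```python
import json


def make_schema(name: str, description: str, params: list) -> dict:
    """Build a JSON-schema style function definition with string parameters."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": list(params),
            },
        },
    }


TOOL_SCHEMAS = [
    make_schema("ffmpeg_cmd", "Run an FFmpeg command", ["command", "description"]),
    make_schema("web_search", "Search the web", ["query", "search_type"]),
    make_schema("analyze_video", "Analyze a video file", ["video_path", "analysis_type"]),
    make_schema("score_virality", "Score an edit", ["video_path", "platform", "niche"]),
    make_schema("generate_caption", "Write a caption", ["video_description", "platform", "tone"]),
    make_schema("detect_ai_slop", "Check for AI-generated content", ["content_path", "check_type"]),
]

print(json.dumps(TOOL_SCHEMAS[0], indent=2))
```

The list would then be passed as `tokenizer.apply_chat_template(messages, tools=TOOL_SCHEMAS, tokenize=False, add_generation_prompt=True)`. Whether the fine-tuned model expects exactly this schema layout is not confirmed by this card.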
## Training
### Data
Mixed dataset of ~2,800 examples:
- **10 synthetic video editing trajectories**: multi-turn conversations showing full edit pipelines (analyze → search → edit → score → caption)
- **~1,300 TikTok hooks/captions**: real viral content data from [benxh/tiktok-hooks-finetune](https://huggingface.co/datasets/benxh/tiktok-hooks-finetune)
- **~1,200 general function-calling**: tool-use backbone from [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1)
Full dataset: [ryu34/viralcut-agent-data](https://huggingface.co/datasets/ryu34/viralcut-agent-data)
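
To make the shape of a multi-turn trajectory concrete, here is a hypothetical record that follows the analyze, search, edit, score, caption order. The field names use the common chat-messages convention and are not a confirmed schema of `ryu34/viralcut-agent-data`. The `call`, `result`, and `tool_sequence` helpers are illustrative only, and every tool output is a placeholder string.

```python
def call(name: str, **arguments) -> dict:
    """Assistant turn that requests one tool call."""
    return {"role": "assistant", "content": "",
            "tool_calls": [{"name": name, "arguments": arguments}]}


def result(name: str, content: str) -> dict:
    """Tool turn carrying a placeholder tool output."""
    return {"role": "tool", "name": name, "content": content}


trajectory = {"messages": [
    {"role": "system", "content": "You are ViralCut Agent..."},
    {"role": "user", "content": "Turn raw.mp4 into a TikTok for the food niche."},
    call("analyze_video", video_path="raw.mp4", analysis_type="full"),
    result("analyze_video", "<analysis placeholder>"),
    call("web_search", query="trending TikTok sounds food 2025",
         search_type="trending_content"),
    result("web_search", "<search results placeholder>"),
    call("ffmpeg_cmd",
         command="ffmpeg -y -i raw.mp4 -vf 'eq=saturation=1.3' edit.mp4",
         description="Boost color saturation"),
    result("ffmpeg_cmd", "<ffmpeg log placeholder>"),
    call("score_virality", video_path="edit.mp4", platform="tiktok", niche="food"),
    result("score_virality", "<score placeholder>"),
    call("generate_caption", video_description="...", platform="tiktok", tone="casual"),
    result("generate_caption", "<caption placeholder>"),
    {"role": "assistant", "content": "<final summary placeholder>"},
]}


def tool_sequence(record: dict) -> list:
    """List tool names in the order the assistant called them."""
    return [c["name"] for m in record["messages"] for c in m.get("tool_calls", [])]


print(tool_sequence(trajectory))
```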
### Method
- **Base model**: Qwen/Qwen2.5-3B-Instruct
- **Method**: QLoRA SFT (4-bit quantization, rank 16, alpha 32)
- **Training**: 3 epochs, lr=2e-4, cosine schedule, assistant-only loss
- **Hardware**: T4 16GB GPU (free tier compatible)
- **Framework**: TRL v1.3+ SFTTrainer
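
Two of these hyperparameters can be worked out by hand. The sketch below computes the learning rate under a cosine schedule and the LoRA scaling factor `alpha / rank`. It assumes no warmup and decay to zero, which this card does not specify, and the total step count is a hypothetical value chosen only for illustration.

```python
import math

BASE_LR = 2e-4    # from the list above
LORA_RANK = 16    # from the list above
LORA_ALPHA = 32   # from the list above


def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Cosine decay from base_lr to 0, with no warmup (an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Standard LoRA scales its low-rank update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

total = 1000  # hypothetical number of optimizer steps across the 3 epochs
for step in (0, total // 2, total):
    print(step, f"{cosine_lr(step, total):.2e}")
print("LoRA scale:", lora_scale)
```

With rank 16 and alpha 32 the adapter update is scaled by 2, and under these assumptions the learning rate falls to half its starting value at the midpoint of training.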
### Train It Yourself
```bash
# Option 1: Google Colab (free T4 GPU)
# Open: https://huggingface.co/datasets/ryu34/viralcut-agent-data/blob/main/train_colab.ipynb
# Option 2: Direct script
wget https://huggingface.co/datasets/ryu34/viralcut-agent-data/resolve/main/train.py
pip install transformers trl torch datasets accelerate peft bitsandbytes
python train.py
```
## Architecture
```
User Request ("Edit my raw footage into a viral TikTok")
           │
           ▼
┌─────────────────────────────────┐
│  ViralCut Agent (Qwen2.5-3B)    │
│  Fine-tuned for tool-calling    │
│                                 │
│  Thinks → Plans → Calls Tools   │
└──────────┬──────────────────────┘
           │
    ┌──────┼──────┬──────┬────────┐
    │      │      │      │        │
    ▼      ▼      ▼      ▼        ▼
 FFmpeg   Web   Video  Viral   AI Slop
  Edit   Search Anal.  Score   Detect
    │      │      │      │        │
    └──────┴──────┴──────┴────────┘
           │
           ▼
Final edited video + caption + strategy
```
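
The think, plan, call-tools cycle in the diagram can be sketched as a loop that alternates between generation and tool execution. The sketch assumes the model emits Hermes-style `<tool_call>{...}</tool_call>` blocks, the convention used by the Qwen2.5 chat template; whether this fine-tune emits exactly that format is an assumption. `generate` stands in for any callable that maps the message list to the model's next reply, and `run_agent` and `parse_tool_calls` are hypothetical names.

```python
import json
import re
from typing import Callable, Dict, List

# Matches one JSON object wrapped in <tool_call> tags (assumed output format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[dict]:
    """Extract well-formed tool calls; malformed JSON is skipped."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            calls.append(obj)
    return calls


def run_agent(generate: Callable[[List[dict]], str],
              tools: Dict[str, Callable[..., object]],
              user_request: str,
              max_turns: int = 8) -> List[dict]:
    """Alternate model replies and tool results until no tool is requested."""
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        calls = parse_tool_calls(reply)
        if not calls:
            break  # plain-text reply: treat it as the final answer
        for call in calls:
            name = call.get("name")
            fn = tools.get(name)
            output = fn(**call.get("arguments", {})) if fn else {"error": "unknown tool"}
            messages.append({"role": "tool", "name": name, "content": json.dumps(output)})
    return messages
```

The `max_turns` cap is a simple guard against a model that keeps requesting tools without ever producing a final answer.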
## Example Output
**Input:** "I have 8 minutes of raw ramen footage from Tokyo. Make a TikTok."
**Agent actions:**
1. `analyze_video(raw_ramen.mp4, "full")` → Found 8 scenes, best: noodle pull at 0.9 energy
2. `web_search("trending TikTok sounds food ASMR 2025")` → Lo-fi city pop trending
3. 🎵 `web_search("royalty free lo-fi Japanese beat")` → Found "Tokyo Nights" CC BY 4.0
4. ✂️ `ffmpeg_cmd(...)` → Extracted hook shot with color boost
5. ✂️ `ffmpeg_cmd(...)` → Speed-ramped broth prep
6. ✂️ `ffmpeg_cmd(...)` → Assembled with fadeblack + slideright transitions
7. 🎵 `ffmpeg_cmd(...)` → Mixed lo-fi music at 70% with ambient audio
8. `ffmpeg_cmd(...)` → Added text hook + location overlay
9. `score_virality(...)` → 82/100
10. 🚫 `detect_ai_slop(...)` → Authentic
11. ✍️ `generate_caption(...)` → "This man has been making ramen by hand for 30 years"
**Output:** 17s vertical TikTok with professional transitions, trending music, text overlays. Score: 82/100.
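
The `ffmpeg_cmd(...)` arguments are elided in the steps above. As an illustration only, the helpers below build argument lists for three of the operations (color-boosted trim, transition, music mix) using standard FFmpeg filters: `eq`, `xfade`, `volume`, and `amix`. The function names, file names, and timestamps are hypothetical, and reading "70%" as a `volume=0.7` factor on the music track is an assumption. These are not the commands the agent actually produced.

```python
from typing import List


def trim_with_color_boost(src: str, dst: str, start: float, duration: float,
                          saturation: float = 1.3) -> List[str]:
    """Cut a clip and raise saturation with FFmpeg's eq filter."""
    return ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration), "-i", src,
            "-vf", f"eq=saturation={saturation}", dst]


def crossfade(first: str, second: str, dst: str, transition: str,
              offset: float, duration: float = 0.5) -> List[str]:
    """Join two clips with an xfade transition such as fadeblack or slideright.

    Video only: xfade needs matching resolution and frame rate, and this
    sketch does not carry audio across the join.
    """
    graph = f"[0:v][1:v]xfade=transition={transition}:duration={duration}:offset={offset}[v]"
    return ["ffmpeg", "-y", "-i", first, "-i", second,
            "-filter_complex", graph, "-map", "[v]", dst]


def mix_music(video: str, music: str, dst: str, music_volume: float = 0.7) -> List[str]:
    """Mix a music track with the clip's own audio at the given volume."""
    graph = (f"[1:a]volume={music_volume}[m];"
             "[0:a][m]amix=inputs=2:duration=first[a]")
    return ["ffmpeg", "-y", "-i", video, "-i", music, "-filter_complex", graph,
            "-map", "0:v", "-map", "[a]", "-c:v", "copy", "-shortest", dst]


# Hypothetical timestamps; print the command instead of running it.
print(" ".join(trim_with_color_boost("raw_ramen.mp4", "hook.mp4", 12.0, 3.0)))
```

Building commands as lists rather than one shell string means they can be passed to `subprocess.run` without shell quoting issues.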
## Limitations
- The model has 3B parameters; for complex creative decisions, larger models (7B+) would likely perform better
- FFmpeg commands may need adjustment for specific file formats
- Virality scoring is heuristic-based, not ML-based
- Web search requires the `duckduckgo-search` package
- No actual video generation: this is an *editing* agent that works with your existing footage
## Citation
```bibtex
@misc{viralcut-agent-2025,
title={ViralCut Agent: Autonomous Video Editing for Social Media},
author={ryu34},
year={2025},
url={https://huggingface.co/ryu34/viralcut-agent}
}
```