---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- video-editing
- social-media
- agent
- tool-calling
- sft
- trl
- viralcut
datasets:
- ryu34/viralcut-agent-data
- benxh/tiktok-hooks-finetune
- NousResearch/hermes-function-calling-v1
pipeline_tag: text-generation
---
# 🎬 ViralCut Agent
**An autonomous AI agent that transforms raw video footage into professional, viral-worthy social media content.**
ViralCut Agent is a fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) model trained with QLoRA SFT on tool-calling trajectories for video editing, social media optimization, and content strategy.
## What It Does
| Capability | How |
|---|---|
| 🎬 **Video Analysis** | Analyze raw footage, find best moments, detect scenes |
| ✂️ **Professional Editing** | Trim, transitions, effects, text overlays, color grading via FFmpeg |
| 🎵 **Audio Production** | Search & add trending royalty-free music, sound effects, audio mixing |
| 📊 **Viral Optimization** | Score content for TikTok/Instagram/YouTube, optimize for algorithms |
| 🔍 **Trend Research** | Search current trends, hooks, sounds via web search |
| 🚫 **AI Slop Detection** | Filter out AI-generated junk content |
| ✍️ **Caption Generation** | Platform-optimized captions, hashtags, posting strategy |
## Tools
The agent was trained to call these tools autonomously:
```python
# 1. FFmpeg for video processing
ffmpeg_cmd(
    command="ffmpeg -y -i input.mp4 -vf 'eq=saturation=1.3' output.mp4",
    description="Boost color saturation",
)
# 2. Web search for assets and trends
web_search(query="trending TikTok sounds food 2025", search_type="trending_content")
web_search(query="royalty free lo-fi beat", search_type="royalty_free_music")
# 3. Video analysis
analyze_video(video_path="raw.mp4", analysis_type="full")
# 4. Virality scoring
score_virality(video_path="edit.mp4", platform="tiktok", niche="food")
# 5. Caption generation
generate_caption(video_description="...", platform="tiktok", tone="casual")
# 6. AI content detection
detect_ai_slop(content_path="broll.mp4", check_type="video")
```
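One way a host program could route these tool names to real implementations is a small dispatch table. The sketch below is an assumption about how that wiring might look, not the code shipped in `agent.py`: the `TOOLS` registry, the `dispatch` helper, the stub handlers, and the `"general"` default are hypothetical. Only the tool names and argument names are taken from the examples above.

```python
from typing import Any, Callable, Dict

# Hypothetical stub handlers. A real host would run FFmpeg, query a search API, etc.
def ffmpeg_cmd(command: str, description: str = "") -> Dict[str, Any]:
    return {"tool": "ffmpeg_cmd", "would_run": command, "description": description}

def web_search(query: str, search_type: str = "general") -> Dict[str, Any]:
    return {"tool": "web_search", "query": query, "search_type": search_type}

# Registry mapping tool names (as the model emits them) to Python callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "ffmpeg_cmd": ffmpeg_cmd,
    "web_search": web_search,
}

def dispatch(call: Dict[str, Any]) -> Dict[str, Any]:
    """Route one tool call of the form {"name": ..., "arguments": {...}}."""
    name = call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        # Report the problem as data so the model can see it and recover.
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**call.get("arguments", {}))
    except TypeError as exc:  # wrong or missing argument names
        return {"error": f"bad arguments for {name}: {exc}"}

result = dispatch({
    "name": "web_search",
    "arguments": {"query": "royalty free lo-fi beat", "search_type": "royalty_free_music"},
})
print(result)
```

Returning errors as ordinary results, rather than raising, lets the agent loop feed the failure back to the model as a tool response.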
## Quick Start
### Install
```bash
pip install transformers torch accelerate peft bitsandbytes duckduckgo-search
```
### Use as Agent (with real tools)
```bash
# Clone the repo
git clone https://huggingface.co/ryu34/viralcut-agent
cd viralcut-agent
# Edit a video
python agent.py --video raw_footage.mp4 --platform tiktok --niche food
# Get a content plan (no video needed)
python agent.py --plan --niche "coffee shop" --platform tiktok
# Check files for AI slop
python agent.py --check-slop clip1.mp4 clip2.mp4
# Interactive mode
python agent.py
```
### Use as Model (inference only)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" places the weights on the available GPU(s) or CPU (needs `accelerate`)
model = AutoModelForCausalLM.from_pretrained("ryu34/viralcut-agent", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("ryu34/viralcut-agent")

messages = [
    {"role": "system", "content": "You are ViralCut Agent..."},
    {"role": "user", "content": "Edit my beach video into a TikTok with trending music and effects"},
]

# Render the chat template as text, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:]))
```
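In inference-only mode the model returns text, so the caller has to extract any tool calls. The base Qwen2.5 chat template wraps each call as JSON inside `<tool_call>...</tool_call>` tags. The sketch below assumes the fine-tuned model keeps that format, which this README does not state. The `extract_tool_calls` helper and the sample string are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Matches the JSON payload between <tool_call> tags (assumed Qwen2.5-style output).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than crash the agent loop
        if isinstance(payload, dict) and "name" in payload:
            calls.append({
                "name": payload["name"],
                "arguments": payload.get("arguments", {}),
            })
    return calls

# Hand-written sample, not real model output.
sample = (
    "I'll start by analyzing the footage.\n"
    '<tool_call>\n{"name": "analyze_video", "arguments": '
    '{"video_path": "raw.mp4", "analysis_type": "full"}}\n</tool_call>'
)
calls = extract_tool_calls(sample)
print(calls)
```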
## Training
### Data
Mixed dataset of ~2,800 examples:
- **10 synthetic video editing trajectories**: multi-turn conversations showing full edit pipelines (analyze → search → edit → score → caption)
- **~1,300 TikTok hooks/captions**: real viral content data from [benxh/tiktok-hooks-finetune](https://huggingface.co/datasets/benxh/tiktok-hooks-finetune)
- **~1,200 general function-calling**: tool-use backbone from [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1)
Full dataset: [ryu34/viralcut-agent-data](https://huggingface.co/datasets/ryu34/viralcut-agent-data)
### Method
- **Base model**: Qwen/Qwen2.5-3B-Instruct
- **Method**: QLoRA SFT (4-bit quantization, rank 16, alpha 32)
- **Training**: 3 epochs, lr=2e-4, cosine schedule, assistant-only loss
- **Hardware**: T4 16GB GPU (free tier compatible)
- **Framework**: TRL v1.3+ SFTTrainer
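To make the rank and alpha settings concrete, here is the arithmetic for standard LoRA: the adapter update is scaled by `alpha / rank`, and each adapted weight matrix gains `rank * (d_in + d_out)` trainable parameters. The 2048×2048 layer shape is an illustrative assumption, not a statement about which modules the training script targets.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

RANK, ALPHA = 16, 32          # values from the Method list above
D_IN = D_OUT = 2048           # illustrative layer shape (assumption)

scale = lora_scaling(RANK, ALPHA)
added = lora_trainable_params(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"scaling = {scale}")                                  # scaling = 2.0
print(f"adapter params = {added} of {full} ({added / full:.2%})")
```

With these numbers the adapter trains 65,536 parameters against 4,194,304 in the frozen matrix, about 1.56%, which is part of why the run fits on a 16GB T4.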
### Train It Yourself
```bash
# Option 1: Google Colab (free T4 GPU)
# Open: https://huggingface.co/datasets/ryu34/viralcut-agent-data/blob/main/train_colab.ipynb
# Option 2: Direct script
wget https://huggingface.co/datasets/ryu34/viralcut-agent-data/resolve/main/train.py
pip install transformers trl torch datasets accelerate peft bitsandbytes
python train.py
```
## Architecture
```
User Request ("Edit my raw footage into a viral TikTok")
           │
           ▼
┌─────────────────────────────────┐
│  ViralCut Agent (Qwen2.5-3B)    │
│  Fine-tuned for tool-calling    │
│                                 │
│  Thinks → Plans → Calls Tools   │
└──────────┬──────────────────────┘
           │
    ┌──────┼──────┬──────┬────────┐
    │      │      │      │        │
    ▼      ▼      ▼      ▼        ▼
 FFmpeg   Web   Video  Viral   AI Slop
   Edit Search  Anal.  Score   Detect
    │      │      │      │        │
    └──────┴──────┼──────┴────────┘
                  │
                  ▼
Final edited video + caption + strategy
```
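The think, plan, call-tools cycle in the diagram can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not the loop in `agent.py`: `run_agent`, the `model_step` callback, the message shapes, and the scripted stand-in model are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]

def run_agent(
    user_request: str,
    model_step: Callable[[List[Message]], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 12,
) -> str:
    """Alternate between model turns and tool executions until the model stops calling tools."""
    messages: List[Message] = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        step = model_step(messages)
        messages.append({"role": "assistant", "content": step.get("content", "")})
        call = step.get("tool_call")
        if call is None:
            return step.get("content", "")  # no tool requested: this is the final answer
        handler = tools.get(call["name"])
        result = handler(**call["arguments"]) if handler else {"error": "unknown tool"}
        # Feed the tool result back so the next model turn can use it.
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return "stopped: step limit reached"

def scripted_model(messages: List[Message]) -> Dict[str, Any]:
    """Stand-in for the LLM: request one analysis, then finish."""
    if not any(m["role"] == "tool" for m in messages):
        return {
            "content": "Analyzing footage.",
            "tool_call": {
                "name": "analyze_video",
                "arguments": {"video_path": "raw.mp4", "analysis_type": "full"},
            },
        }
    return {"content": "Done: edit plan ready."}

def analyze_video(video_path: str, analysis_type: str) -> Dict[str, Any]:
    return {"scenes": 3}  # placeholder result, not real analysis

final = run_agent(
    "Edit my raw footage into a viral TikTok",
    scripted_model,
    {"analyze_video": analyze_video},
)
print(final)
```

The `max_steps` cap is a design choice that keeps a model stuck in a tool-calling loop from running indefinitely.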
## Example Output
**Input:** "I have 8 minutes of raw ramen footage from Tokyo. Make a TikTok."
**Agent actions:**
1. 📊 `analyze_video(raw_ramen.mp4, "full")` → Found 8 scenes, best: noodle pull at 0.9 energy
2. 🔍 `web_search("trending TikTok sounds food ASMR 2025")` → Lo-fi city pop trending
3. 🎵 `web_search("royalty free lo-fi Japanese beat")` → Found "Tokyo Nights" CC BY 4.0
4. ✂️ `ffmpeg_cmd(...)` → Extracted hook shot with color boost
5. ✂️ `ffmpeg_cmd(...)` → Speed-ramped broth prep
6. ✂️ `ffmpeg_cmd(...)` → Assembled with fadeblack + slideright transitions
7. 🎵 `ffmpeg_cmd(...)` → Mixed lo-fi music at 70% with ambient
8. 📝 `ffmpeg_cmd(...)` → Added text hook + location overlay
9. 📈 `score_virality(...)` → 82/100
10. 🚫 `detect_ai_slop(...)` → Authentic ✅
11. ✍️ `generate_caption(...)` → "This man has been making ramen by hand for 30 years"
**Output:** 17s vertical TikTok with professional transitions, trending music, text overlays. Score: 82/100.
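The `ffmpeg_cmd(...)` arguments are elided in the trace above. As an illustration of what step 4 (hook shot with color boost) might expand to, the sketch below builds an FFmpeg argument list that trims a clip and applies the `eq=saturation` filter shown in the Tools section. The `build_hook_clip_cmd` helper, the file names, and the timestamps are hypothetical and are not taken from the agent's actual output.

```python
import shlex
from typing import List

def build_hook_clip_cmd(
    src: str,
    dst: str,
    start: float,
    duration: float,
    saturation: float = 1.3,
) -> List[str]:
    """Build an FFmpeg command that cuts [start, start + duration) and boosts saturation."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-ss", f"{start:.2f}",       # seek to the start of the hook shot
        "-t", f"{duration:.2f}",     # read only this many seconds of input
        "-i", src,
        "-vf", f"eq=saturation={saturation}",
        dst,
    ]

# Hypothetical timings for a 3-second hook starting at 0:42.
cmd = build_hook_clip_cmd("raw_ramen.mp4", "hook.mp4", start=42.0, duration=3.0)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` instead of a shell string avoids quoting problems with file names that contain spaces.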
## Limitations
- The model has 3B parameters; larger models (7B+) would likely perform better on complex creative decisions
- FFmpeg commands may need adjustment for specific file formats
- Virality scoring is heuristic-based, not ML-based
- Web search requires `duckduckgo-search` package
- No actual video generation: this is an *editing* agent that works with your existing footage
## Citation
```bibtex
@misc{viralcut-agent-2025,
title={ViralCut Agent: Autonomous Video Editing for Social Media},
author={ryu34},
year={2025},
url={https://huggingface.co/ryu34/viralcut-agent}
}
```