---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- video-editing
- social-media
- agent
- tool-calling
- sft
- trl
- viralcut
datasets:
- ryu34/viralcut-agent-data
- benxh/tiktok-hooks-finetune
- NousResearch/hermes-function-calling-v1
pipeline_tag: text-generation
---

# 🎬 ViralCut Agent

**An autonomous AI agent that transforms raw video footage into professional, viral-worthy social media content.**

ViralCut Agent is a fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) model trained with QLoRA SFT on tool-calling trajectories for video editing, social media optimization, and content strategy.

## What It Does

| Capability | How |
|---|---|
| 🎬 **Video Analysis** | Analyze raw footage, find best moments, detect scenes |
| ✂️ **Professional Editing** | Trim, transitions, effects, text overlays, color grading via FFmpeg |
| 🎵 **Audio Production** | Search & add trending royalty-free music, sound effects, audio mixing |
| 📊 **Viral Optimization** | Score content for TikTok/Instagram/YouTube, optimize for algorithms |
| 🔍 **Trend Research** | Search current trends, hooks, sounds via web search |
| 🚫 **AI Slop Detection** | Filter out AI-generated junk content |
| ✍️ **Caption Generation** | Platform-optimized captions, hashtags, posting strategy |

## Tools

The agent was trained to call these tools autonomously:

```python
# 1. FFmpeg for video processing
ffmpeg_cmd(command="ffmpeg -y -i input.mp4 -vf 'eq=saturation=1.3' output.mp4", description="Boost color saturation")

# 2. Web search for assets and trends
web_search(query="trending TikTok sounds food 2025", search_type="trending_content")
web_search(query="royalty free lo-fi beat", search_type="royalty_free_music")

# 3. Video analysis
analyze_video(video_path="raw.mp4", analysis_type="full")

# 4. Virality scoring
score_virality(video_path="edit.mp4", platform="tiktok", niche="food")

# 5. Caption generation
generate_caption(video_description="...", platform="tiktok", tone="casual")

# 6. AI content detection
detect_ai_slop(content_path="broll.mp4", check_type="video")
```

## Quick Start

### Install

```bash
pip install transformers torch peft bitsandbytes duckduckgo-search
```

### Use as Agent (with real tools)

```bash
# Clone the repo
git clone https://huggingface.co/ryu34/viralcut-agent
cd viralcut-agent

# Edit a video
python agent.py --video raw_footage.mp4 --platform tiktok --niche food

# Get a content plan (no video needed)
python agent.py --plan --niche "coffee shop" --platform tiktok

# Check files for AI slop
python agent.py --check-slop clip1.mp4 clip2.mp4

# Interactive mode
python agent.py
```

### Use as Model (inference only)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ryu34/viralcut-agent", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("ryu34/viralcut-agent")

messages = [
    {"role": "system", "content": "You are ViralCut Agent..."},
    {"role": "user", "content": "Edit my beach video into a TikTok with trending music and effects"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:]))
```

## Training

### Data

Mixed dataset of ~2,800 examples:

- **10 synthetic video editing trajectories**: multi-turn conversations showing full edit pipelines (analyze → search → edit → score → caption)
- **~1,300 TikTok hooks/captions**: real viral content data from [benxh/tiktok-hooks-finetune](https://huggingface.co/datasets/benxh/tiktok-hooks-finetune)
- **~1,200 general function-calling**: tool-use backbone from [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1)

Full dataset: [ryu34/viralcut-agent-data](https://huggingface.co/datasets/ryu34/viralcut-agent-data)

### Method

- **Base model**: Qwen/Qwen2.5-3B-Instruct
- **Method**: QLoRA SFT (4-bit quantization, rank 16, alpha 32)
- **Training**: 3 epochs, lr=2e-4, cosine schedule, assistant-only loss
- **Hardware**: T4 16GB GPU (free tier compatible)
- **Framework**: TRL v1.3+ SFTTrainer

### Train It Yourself

```bash
# Option 1: Google Colab (free T4 GPU)
# Open: https://huggingface.co/datasets/ryu34/viralcut-agent-data/blob/main/train_colab.ipynb

# Option 2: Direct script
wget https://huggingface.co/datasets/ryu34/viralcut-agent-data/resolve/main/train.py
pip install transformers trl torch datasets accelerate peft bitsandbytes
python train.py
```

## Architecture

```
User Request ("Edit my raw footage into a viral TikTok")
           │
           ▼
┌─────────────────────────────────┐
│  ViralCut Agent (Qwen2.5-3B)    │
│  Fine-tuned for tool-calling    │
│                                 │
│  Thinks → Plans → Calls Tools   │
└──────────┬──────────────────────┘
           │
    ┌──────┼──────────────────────┐
    │      │      │      │        │
    ▼      ▼      ▼      ▼        ▼
  FFmpeg  Web   Video  Viral   AI Slop
   Edit  Search Anal.  Score   Detect
    │      │      │      │        │
    └──────┴──────┴──────┴────────┘
           │
           ▼
  Final edited video + caption + strategy
```

## Example Output

**Input:** "I have 8 minutes of raw ramen footage from Tokyo. Make a TikTok."

**Agent actions:**

1. 📊 `analyze_video(raw_ramen.mp4, "full")` → Found 8 scenes, best: noodle pull at 0.9 energy
2. 🔍 `web_search("trending TikTok sounds food ASMR 2025")` → Lo-fi city pop trending
3. 🎵 `web_search("royalty free lo-fi Japanese beat")` → Found "Tokyo Nights" CC BY 4.0
4. ✂️ `ffmpeg_cmd(...)` → Extracted hook shot with color boost
5. ✂️ `ffmpeg_cmd(...)` → Speed-ramped broth prep
6. ✂️ `ffmpeg_cmd(...)` → Assembled with fadeblack + slideright transitions
7. 🎵 `ffmpeg_cmd(...)` → Mixed lo-fi music at 70% with ambient
8. 📝 `ffmpeg_cmd(...)` → Added text hook + location overlay
9. 📈 `score_virality(...)` → 82/100
10. 🚫 `detect_ai_slop(...)` → Authentic ✅
11. ✍️ `generate_caption(...)` → "This man has been making ramen by hand for 30 years"

**Output:** 17s vertical TikTok with professional transitions, trending music, text overlays. Score: 82/100.

## Limitations

- Model is 3B parameters; for complex creative decisions, larger models (7B+) would perform better
- FFmpeg commands may need adjustment for specific file formats
- Virality scoring is heuristic-based, not ML-based
- Web search requires `duckduckgo-search` package
- No actual video generation: this is an *editing* agent that works with your existing footage

## Citation

```bibtex
@misc{viralcut-agent-2025,
  title={ViralCut Agent: Autonomous Video Editing for Social Media},
  author={ryu34},
  year={2025},
  url={https://huggingface.co/ryu34/viralcut-agent}
}
```
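
## Tool Dispatch Sketch

When the model is used for inference only, something still has to turn its generated text into real tool executions. The following is a minimal sketch of that step, not the repository's `agent.py`. It assumes the model emits Hermes-style `<tool_call>{...}</tool_call>` JSON blocks, as Qwen2.5's chat template does; the helper names `parse_tool_calls`, `dispatch`, and `TOOLS`, and the `dry_run` parameter, are hypothetical. Only `ffmpeg_cmd` is wired up here, and by default it validates and tokenizes the command without running it.

```python
import json
import re
import shlex
import subprocess
from typing import Any, Callable, Dict, List

# Assumption: tool calls arrive as <tool_call>{...}</tool_call> blocks.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract well-formed tool calls from generated text, skipping bad JSON."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed block: ignore rather than crash the agent loop
        if isinstance(payload, dict) and "name" in payload:
            calls.append({
                "name": payload["name"],
                "arguments": payload.get("arguments", {}),
            })
    return calls


def ffmpeg_cmd(command: str, description: str = "", dry_run: bool = True) -> Dict[str, Any]:
    """Tokenize an FFmpeg command and optionally run it without a shell."""
    argv = shlex.split(command)
    if not argv or argv[0] != "ffmpeg":
        return {"ok": False, "error": "command must start with ffmpeg"}
    if dry_run:
        return {"ok": True, "argv": argv, "description": description}
    # Passing a list (no shell=True) avoids shell injection from model output.
    result = subprocess.run(argv, capture_output=True, text=True)
    return {"ok": result.returncode == 0, "stderr": result.stderr[-500:]}


# Hypothetical registry; the other tools from the card would be added here.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"ffmpeg_cmd": ffmpeg_cmd}


def dispatch(text: str) -> List[Dict[str, Any]]:
    """Run every recognized tool call found in the model's output."""
    results = []
    for call in parse_tool_calls(text):
        handler = TOOLS.get(call["name"])
        if handler is None:
            results.append({"ok": False, "error": f"unknown tool: {call['name']}"})
            continue
        try:
            results.append(handler(**call["arguments"]))
        except TypeError as exc:  # wrong or missing arguments
            results.append({"ok": False, "error": str(exc)})
    return results


if __name__ == "__main__":
    sample = (
        "Boosting saturation first.\n<tool_call>\n"
        '{"name": "ffmpeg_cmd", "arguments": {"command": '
        "\"ffmpeg -y -i input.mp4 -vf 'eq=saturation=1.3' output.mp4\", "
        '"description": "Boost color saturation"}}\n</tool_call>'
    )
    print(dispatch(sample))
```

In a full agent loop, each result would be appended to the conversation as a tool message and the model called again until it stops requesting tools. Keeping `dry_run=True` while testing lets you inspect the generated FFmpeg arguments before anything touches your footage.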