| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen2.5-3B-Instruct |
| tags: |
| - video-editing |
| - social-media |
| - agent |
| - tool-calling |
| - sft |
| - trl |
| - viralcut |
| datasets: |
| - ryu34/viralcut-agent-data |
| - benxh/tiktok-hooks-finetune |
| - NousResearch/hermes-function-calling-v1 |
| pipeline_tag: text-generation |
| --- |
| |
| # 🎬 ViralCut Agent |
|
|
| **An autonomous AI agent that transforms raw video footage into professional, viral-worthy social media content.** |
|
|
| ViralCut Agent is a fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) model trained with QLoRA SFT on tool-calling trajectories for video editing, social media optimization, and content strategy. |
|
|
| ## What It Does |
|
|
| | Capability | How | |
| |---|---| |
| | 🎬 **Video Analysis** | Analyze raw footage, find best moments, detect scenes | |
| | ✂️ **Professional Editing** | Trim, transitions, effects, text overlays, color grading via FFmpeg | |
| | 🎵 **Audio Production** | Search & add trending royalty-free music, sound effects, audio mixing | |
| | **Viral Optimization** | Score content for TikTok/Instagram/YouTube, optimize for algorithms | |
| | **Trend Research** | Search current trends, hooks, sounds via web search | |
| | 🚫 **AI Slop Detection** | Filter out AI-generated junk content | |
| | ✍️ **Caption Generation** | Platform-optimized captions, hashtags, posting strategy | |
|
|
| ## Tools |
|
|
| The agent was trained to call these tools autonomously: |
|
|
| ```python |
| # 1. FFmpeg for video processing |
| ffmpeg_cmd(command="ffmpeg -y -i input.mp4 -vf 'eq=saturation=1.3' output.mp4", |
| description="Boost color saturation") |
| |
| # 2. Web search for assets and trends |
| web_search(query="trending TikTok sounds food 2025", search_type="trending_content") |
| web_search(query="royalty free lo-fi beat", search_type="royalty_free_music") |
| |
| # 3. Video analysis |
| analyze_video(video_path="raw.mp4", analysis_type="full") |
| |
| # 4. Virality scoring |
| score_virality(video_path="edit.mp4", platform="tiktok", niche="food") |
| |
| # 5. Caption generation |
| generate_caption(video_description="...", platform="tiktok", tone="casual") |
| |
| # 6. AI content detection |
| detect_ai_slop(content_path="broll.mp4", check_type="video") |
| ``` |
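The card does not say how these tools are described to the model. As a minimal sketch, assuming function-style JSON schemas, the six tools could be declared as below. The parameter names come from the calls above. The `make_tool` helper, the descriptions, and the choice to treat every parameter as a required string are assumptions for illustration.

```python
# Hypothetical schema builder: only the tool and parameter names come from the
# model card; types, descriptions, and "required" lists are assumptions.
def make_tool(name, description, params):
    """Build one function-style tool schema with all parameters as required strings."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": list(params),
            },
        },
    }


TOOLS = [
    make_tool("ffmpeg_cmd", "Run an FFmpeg command", ["command", "description"]),
    make_tool("web_search", "Search the web for assets and trends", ["query", "search_type"]),
    make_tool("analyze_video", "Analyze raw footage", ["video_path", "analysis_type"]),
    make_tool("score_virality", "Score an edit for a platform", ["video_path", "platform", "niche"]),
    make_tool("generate_caption", "Write a platform caption", ["video_description", "platform", "tone"]),
    make_tool("detect_ai_slop", "Check content for AI-generated material", ["content_path", "check_type"]),
]

# List the declared tool names
print([tool["function"]["name"] for tool in TOOLS])
```

A list in this shape can be passed as `tools=TOOLS` to `tokenizer.apply_chat_template` in recent `transformers` releases. Whether the fine-tuned model expects exactly this schema is not confirmed by the card.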
|
|
| ## Quick Start |
|
|
| ### Install |
| ```bash |
| pip install transformers torch peft bitsandbytes duckduckgo-search |
| ``` |
|
|
| ### Use as Agent (with real tools) |
| ```bash |
| # Clone the repo |
| git clone https://huggingface.co/ryu34/viralcut-agent |
| cd viralcut-agent |
| |
| # Edit a video |
| python agent.py --video raw_footage.mp4 --platform tiktok --niche food |
| |
| # Get a content plan (no video needed) |
| python agent.py --plan --niche "coffee shop" --platform tiktok |
| |
| # Check files for AI slop |
| python agent.py --check-slop clip1.mp4 clip2.mp4 |
| |
| # Interactive mode |
| python agent.py |
| ``` |
|
|
| ### Use as Model (inference only) |
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model = AutoModelForCausalLM.from_pretrained("ryu34/viralcut-agent", device_map="auto") |
| tokenizer = AutoTokenizer.from_pretrained("ryu34/viralcut-agent") |
| |
| messages = [ |
| {"role": "system", "content": "You are ViralCut Agent..."}, |
| {"role": "user", "content": "Edit my beach video into a TikTok with trending music and effects"} |
| ] |
| |
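| # Render the conversation with the model's chat template, then generate a reply |
| text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
| inputs = tokenizer(text, return_tensors="pt").to(model.device) |
| outputs = model.generate(**inputs, max_new_tokens=1024) |
| # Decode only the newly generated tokens, dropping the prompt and special tokens |
| print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)) |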
| ``` |
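In inference-only mode the model emits tool calls as text, and nothing executes them. The sketch below pulls those calls out of a generated string. It assumes the Hermes/Qwen2.5 convention of a JSON object inside `<tool_call>` tags; the card does not confirm that this fine-tune uses the same format, and `extract_tool_calls` is a hypothetical helper, not part of the repository.

```python
import json
import re

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of (tool_name, arguments) pairs found in generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than crash the agent
        if isinstance(payload, dict) and "name" in payload:
            calls.append((payload["name"], payload.get("arguments", {})))
    return calls


# Made-up model output, used only to exercise the parser
sample = (
    "Let me look at the footage first.\n"
    "<tool_call>\n"
    '{"name": "analyze_video", "arguments": {"video_path": "raw.mp4", "analysis_type": "full"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
```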
|
|
| ## Training |
|
|
| ### Data |
| Mixed dataset of ~2,800 examples: |
| - **10 synthetic video editing trajectories**: multi-turn conversations showing full edit pipelines (analyze → search → edit → score → caption) |
| - **~1,300 TikTok hooks/captions**: real viral content data from [benxh/tiktok-hooks-finetune](https://huggingface.co/datasets/benxh/tiktok-hooks-finetune) |
| - **~1,200 general function-calling**: tool-use backbone from [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) |
|
|
| Full dataset: [ryu34/viralcut-agent-data](https://huggingface.co/datasets/ryu34/viralcut-agent-data) |
|
|
| ### Method |
| - **Base model**: Qwen/Qwen2.5-3B-Instruct |
| - **Method**: QLoRA SFT (4-bit quantization, rank 16, alpha 32) |
| - **Training**: 3 epochs, lr=2e-4, cosine schedule, assistant-only loss |
| - **Hardware**: T4 16GB GPU (free tier compatible) |
| - **Framework**: TRL v1.3+ SFTTrainer |
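To make the listed hyperparameters concrete, the sketch below works out what rank 16, alpha 32 and a cosine schedule from 2e-4 imply. It assumes the standard LoRA scaling `alpha / rank` and a plain cosine decay to zero with no warmup. The card does not state the warmup setting or the adapted modules, so the 2048x2048 layer is purely illustrative.

```python
import math

LORA_RANK = 16    # from the model card
LORA_ALPHA = 32   # from the model card
PEAK_LR = 2e-4    # from the model card


def lora_scaling(alpha, rank):
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters added to one d_in x d_out weight: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0 (assumes no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


print(lora_scaling(LORA_ALPHA, LORA_RANK))
# Illustrative layer shape, not taken from the card
print(lora_extra_params(2048, 2048, LORA_RANK))
print(cosine_lr(50, 100))
```

With alpha set to twice the rank, the adapter update is scaled by 2, and each adapted square layer of width 2048 would add 65,536 trainable parameters.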
|
|
| ### Train It Yourself |
| ```bash |
| # Option 1: Google Colab (free T4 GPU) |
| # Open: https://huggingface.co/datasets/ryu34/viralcut-agent-data/blob/main/train_colab.ipynb |
| |
| # Option 2: Direct script |
| wget https://huggingface.co/datasets/ryu34/viralcut-agent-data/resolve/main/train.py |
| pip install transformers trl torch datasets accelerate peft bitsandbytes |
| python train.py |
| ``` |
|
|
| ## Architecture |
|
|
| ``` |
| User Request ("Edit my raw footage into a viral TikTok") |
|                  │ |
|                  ▼ |
| ┌─────────────────────────────────┐ |
| │  ViralCut Agent (Qwen2.5-3B)    │ |
| │  Fine-tuned for tool-calling    │ |
| │                                 │ |
| │  Thinks → Plans → Calls Tools   │ |
| └────────────────┬────────────────┘ |
|                  │ |
|    ┌──────┬──────┼──────┬──────┐ |
|    │      │      │      │      │ |
|    ▼      ▼      ▼      ▼      ▼ |
|  FFmpeg  Web   Video  Viral AI Slop |
|   Edit Search  Anal.  Score Detect |
|    │      │      │      │      │ |
|    └──────┴──────┼──────┴──────┘ |
|                  │ |
|                  ▼ |
| Final edited video + caption + strategy |
| ``` |
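The think, plan and call-tools cycle in the diagram can be sketched as a small loop. This is not the code in `agent.py`. `run_agent`, the `generate` callback and the scripted stand-in model are hypothetical, and the loop assumes tool results are fed back as `tool` messages until the model stops requesting tools.

```python
def run_agent(generate, tools, user_request, max_steps=10):
    """Drive a tool-calling loop.

    generate(messages) must return (text, calls), where calls is a list of
    (tool_name, arguments) pairs; an empty list means the model is done.
    """
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        text, calls = generate(messages)
        messages.append({"role": "assistant", "content": text})
        if not calls:
            return text, messages
        for name, args in calls:
            if name in tools:
                result = tools[name](**args)
            else:
                result = f"error: unknown tool {name}"
            # Feed each tool result back so the next step can use it
            messages.append({"role": "tool", "name": name, "content": str(result)})
    return "stopped: step limit reached", messages


def scripted_model(messages):
    """Stand-in for the real model: one tool call, then a final answer."""
    tool_turns = sum(1 for m in messages if m["role"] == "tool")
    if tool_turns == 0:
        return "Analyzing footage.", [
            ("analyze_video", {"video_path": "raw.mp4", "analysis_type": "full"})
        ]
    return "Edit complete.", []


# Stub tool; a real implementation would inspect the file
stub_tools = {
    "analyze_video": lambda video_path, analysis_type: f"analyzed {video_path} ({analysis_type})"
}

final_text, history = run_agent(scripted_model, stub_tools, "Edit my raw footage into a viral TikTok")
print(final_text)  # prints "Edit complete."
```

The step limit is a design choice for the sketch: a small model can loop on tool calls, so capping the number of iterations keeps a run bounded.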
|
|
| ## Example Output |
|
|
| **Input:** "I have 8 minutes of raw ramen footage from Tokyo. Make a TikTok." |
|
|
| **Agent actions:** |
| 1. `analyze_video(raw_ramen.mp4, "full")` → Found 8 scenes, best: noodle pull at 0.9 energy |
| 2. `web_search("trending TikTok sounds food ASMR 2025")` → Lo-fi city pop trending |
| 3. 🎵 `web_search("royalty free lo-fi Japanese beat")` → Found "Tokyo Nights" CC BY 4.0 |
| 4. ✂️ `ffmpeg_cmd(...)` → Extracted hook shot with color boost |
| 5. ✂️ `ffmpeg_cmd(...)` → Speed-ramped broth prep |
| 6. ✂️ `ffmpeg_cmd(...)` → Assembled with fadeblack + slideright transitions |
| 7. 🎵 `ffmpeg_cmd(...)` → Mixed lo-fi music at 70% with ambient |
| 8. `ffmpeg_cmd(...)` → Added text hook + location overlay |
| 9. `score_virality(...)` → 82/100 |
| 10. 🚫 `detect_ai_slop(...)` → Authentic ✅ |
| 11. ✍️ `generate_caption(...)` → "This man has been making ramen by hand for 30 years" |
|
|
| **Output:** 17s vertical TikTok with professional transitions, trending music, text overlays. Score: 82/100. |
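The `ffmpeg_cmd(...)` arguments are elided in the example, so the exact commands are unknown. As an illustration of what a hook-extraction step might look like, the sketch below builds an FFmpeg argument list that trims a clip, crops it to 9:16, scales it to 1080x1920 and applies the saturation boost from the Tools section. The function name, timestamps and file names are assumptions, and the crop expression assumes landscape source footage.

```python
def build_hook_clip_cmd(src, dst, start_s, duration_s, saturation=1.3):
    """Build an FFmpeg argument list for a vertical, color-boosted clip."""
    # Center-crop to a 9:16 window, scale to 1080x1920, then boost saturation
    video_filter = f"crop=ih*9/16:ih,scale=1080:1920,eq=saturation={saturation}"
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),    # seek to the start of the hook shot
        "-t", str(duration_s),  # keep only this many seconds
        "-i", src,
        "-vf", video_filter,
        dst,
    ]


# Hypothetical timestamps and file names
cmd = build_hook_clip_cmd("raw_ramen.mp4", "hook.mp4", start_s=42, duration_s=3)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without shell quoting issues, provided `ffmpeg` is on the `PATH`.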
|
|
| ## Limitations |
|
|
| - The model has 3B parameters; for complex creative decisions, larger models (7B+) would likely perform better |
| - FFmpeg commands may need adjustment for specific file formats |
| - Virality scoring is heuristic-based, not ML-based |
| - Web search requires the `duckduckgo-search` package |
| - No actual video generation: this is an *editing* agent that works with your existing footage |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{viralcut-agent-2025, |
| title={ViralCut Agent: Autonomous Video Editing for Social Media}, |
| author={ryu34}, |
| year={2025}, |
| url={https://huggingface.co/ryu34/viralcut-agent} |
| } |
| ``` |
|
|