title: SmolVLM2 Video Highlights
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860

SmolVLM2 HuggingFace Segment-Based Video Highlights API

Generate intelligent video highlights using HuggingFace's segment-based approach

This FastAPI service uses HuggingFace's segment-based classification method with SmolVLM2-256M-Video-Instruct: the video is classified one fixed-length segment at a time, which keeps highlight generation consistent.
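To make the segment-based idea concrete, here is a minimal sketch of how a video's duration could be cut into fixed-length segments before each one is classified. The function name `split_into_segments` is hypothetical and is not taken from this project's code; the 5-second default mirrors the default segment length described in this README.

```python
from typing import List, Tuple


def split_into_segments(duration: float, segment_length: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover the video in fixed steps.

    Hypothetical helper for illustration; the last segment is shortened
    when the duration is not a multiple of segment_length.
    """
    if duration < 0 or segment_length <= 0:
        raise ValueError("duration must be >= 0 and segment_length must be > 0")
    segments = []
    index = 0
    while index * segment_length < duration:
        start = index * segment_length
        end = min(start + segment_length, duration)
        segments.append((start, end))
        index += 1
    return segments


print(split_into_segments(12.0))
# → [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```

Computing each start time from the segment index, rather than adding `segment_length` repeatedly, avoids accumulating floating-point error on long videos.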

🚀 Features

- Segment-Based Analysis: Processes videos in fixed 5-second segments for consistent AI classification
- Dual Criteria Generation: Creates two different highlight criteria sets and selects the most selective one
- SmolVLM2-256M-Video-Instruct: Faster processing with specialized video understanding
- Visual Effects: Optional fade transitions between segments for professional-quality output
- REST API: Upload videos and get generated video description + analysis file path

🔗 API Endpoints

- POST /upload-video - Upload video and receive analysis response
- GET /health - Health check

📱 Usage

Via API

Upload video with optional parameters:

    curl -X POST \
      -F "video=@your_video.mp4" \
      -F "segment_length=5.0" \
      -F "model_name=HuggingFaceTB/SmolVLM2-256M-Video-Instruct" \
      -F "with_effects=true" \
      https://your-space-url.hf.space/upload-video

Example response:

    {
      "success": true,
      "message": "Video description generated successfully",
      "video_description": "A concise description of the uploaded video...",
      "analysis_file": "/tmp/outputs/_analysis.json"
    }

Via Android App

Use the provided Android client code to integrate with your mobile app.
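For scripted access from Python, the curl upload shown under Via API could be reproduced roughly as follows. This is a sketch using only the standard library, not code from this project. The helper names `encode_multipart` and `upload_video` are hypothetical, the form field names are copied from the curl example, and the `video/mp4` content type assumes an MP4 upload.

```python
import json
import uuid
import urllib.request
from pathlib import Path
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes,
                     boundary: str) -> Tuple[bytes, str]:
    """Build a multipart/form-data body: text fields plus one 'video' file part."""
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="video"; filename="{filename}"'.encode(),
        b"Content-Type: video/mp4",  # assumption: the upload is an MP4 file
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_video(base_url: str, video_path: str, segment_length: float = 5.0,
                 model_name: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct",
                 with_effects: bool = True) -> dict:
    """POST a video to /upload-video and return the decoded JSON response."""
    fields = {
        "segment_length": str(segment_length),
        "model_name": model_name,
        "with_effects": "true" if with_effects else "false",
    }
    path = Path(video_path)
    body, content_type = encode_multipart(
        fields, path.name, path.read_bytes(), uuid.uuid4().hex
    )
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/upload-video",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `upload_video("https://your-space-url.hf.space", "your_video.mp4")` should return a dictionary shaped like the example response, so `result["video_description"]` would hold the generated description.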

⚙️ Configuration

Default settings:

- Segment Length: 5 seconds (fixed segments for consistent classification)
- Model: SmolVLM2-256M-Video-Instruct (faster processing)
- Effects: Enabled (fade transitions between segments)
- Dual Criteria: Two prompt variations for robust selection

🛠️ Technology Stack

- SmolVLM2-256M-Video-Instruct: Efficient vision-language model optimized for video understanding
- HuggingFace Transformers: Latest transformer models and inference
- FastAPI: Modern web framework for APIs
- FFmpeg: Video processing with advanced filter support
- PyTorch: Deep learning framework with device optimization

🎯 Perfect For

- Social media content creators
- Educational video processing
- Meeting/lecture summarization
- Sports highlight generation
- Entertainment content curation

License

Apache 2.0 - Free for commercial and personal use
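The dual-criteria step listed under Features and Configuration can be illustrated with a small sketch. It assumes that "most selective" means the criteria set that keeps the fewest segments while still keeping at least one; that reading, the function name `pick_most_selective`, and the per-segment boolean flags are assumptions made for illustration, not the service's confirmed logic.

```python
from typing import Dict, List


def pick_most_selective(results: Dict[str, List[bool]]) -> str:
    """Return the name of the criteria set that keeps the fewest segments.

    `results` maps a criteria-set name to one keep/skip flag per segment.
    Sets that keep no segment at all are ignored (assumption), and ties
    go to the set that appears first.
    """
    non_empty = {name: flags for name, flags in results.items() if any(flags)}
    if not non_empty:
        raise ValueError("no criteria set selected any segment")
    return min(non_empty, key=lambda name: sum(non_empty[name]))


# Hypothetical classification results for a four-segment video.
results = {
    "criteria_a": [True, True, False, True],
    "criteria_b": [False, True, False, False],
}
print(pick_most_selective(results))
# → criteria_b
```

Ignoring sets that keep nothing prevents an overly strict prompt from producing an empty highlight reel.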

🤝 Contributing

Built with ❤️ using Hugging Face Transformers and open-source AI models.