---
title: SmolVLM2 Video Highlights
emoji: "🎬"
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---
# SmolVLM2 HuggingFace Segment-Based Video Highlights API
Generate video highlights using Hugging Face's segment-based approach.

This FastAPI service classifies fixed-length video segments with SmolVLM2-256M-Video-Instruct, following Hugging Face's segment-based classification method, so that highlight generation stays consistent from run to run.
## 🚀 Features

- Segment-Based Analysis: Processes videos in fixed 5-second segments for consistent AI classification
- Dual Criteria Generation: Creates two different highlight criteria sets and selects the most selective one
- SmolVLM2-256M-Video-Instruct: Small 256M-parameter model for faster processing with specialized video understanding
- Visual Effects: Optional fade transitions between segments
- REST API: Upload a video and receive a generated video description plus the path to an analysis file
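
The fixed-length segmentation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the service's actual code: the function name `segment_bounds` is hypothetical, and it assumes the final segment is simply shortened when the video duration is not a multiple of the segment length.

```python
from typing import List, Tuple


def segment_bounds(duration: float, segment_length: float = 5.0) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive fixed-length windows.

    Hypothetical helper: the last window is shortened if the duration
    is not an exact multiple of segment_length (an assumption).
    """
    if duration <= 0 or segment_length <= 0:
        raise ValueError("duration and segment_length must be positive")

    bounds = []
    index = 0
    # Derive each start from the index to avoid accumulating float error.
    while index * segment_length < duration:
        start = index * segment_length
        end = min(start + segment_length, duration)
        bounds.append((start, end))
        index += 1
    return bounds
```

For a 12-second video with the default 5-second segment length, this yields three windows, the last of which is 2 seconds long.
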
## 🔗 API Endpoints

- `POST /upload-video` - Upload a video and receive the analysis response
- `GET /health` - Health check
## 📱 Usage

### Via API

    # Upload video with optional parameters
    curl -X POST \
      -F "video=@your_video.mp4" \
      -F "segment_length=5.0" \
      -F "model_name=HuggingFaceTB/SmolVLM2-256M-Video-Instruct" \
      -F "with_effects=true" \
      https://your-space-url.hf.space/upload-video
Example response:

    {
      "success": true,
      "message": "Video description generated successfully",
      "video_description": "A concise description of the uploaded video...",
      "analysis_file": "/tmp/outputs/<uuid>_analysis.json"
    }
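
On the client side, a response of this shape can be parsed into a small typed record. The sketch below assumes only the four fields shown in the example response. The `UploadResult` class and `parse_upload_response` function are hypothetical names, and the handling of `"success": false` is an assumption, since error responses are not documented here.

```python
import json
from dataclasses import dataclass


@dataclass
class UploadResult:
    """Hypothetical container mirroring the example response fields."""
    success: bool
    message: str
    video_description: str
    analysis_file: str


def parse_upload_response(body: str) -> UploadResult:
    """Parse the JSON body returned by POST /upload-video."""
    data = json.loads(body)
    # Assumption: a failed request still returns JSON with success=false.
    if not data.get("success"):
        raise RuntimeError(data.get("message", "upload failed"))
    return UploadResult(
        success=True,
        message=data["message"],
        video_description=data["video_description"],
        analysis_file=data["analysis_file"],
    )
```

The `analysis_file` value looks like a path on the server's filesystem, so a client would likely need some other mechanism to retrieve the file's contents.
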
### Via Android App
Use the provided Android client code to integrate with your mobile app.
## ⚙️ Configuration

Default settings:

- Segment Length: 5 seconds (fixed segments for consistent classification)
- Model: SmolVLM2-256M-Video-Instruct (faster processing)
- Effects: Enabled (fade transitions between segments)
- Dual Criteria: Two prompt variations for robust selection
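
One way to read "selects the most selective one" is: run both criteria sets over every segment and keep the set that flags the fewest segments, as long as it flags at least one. The sketch below illustrates that interpretation only. The function name `pick_most_selective`, the input shape, and the "at least one segment" rule are all assumptions, not the service's documented behavior.

```python
from typing import Dict, List


def pick_most_selective(decisions: Dict[str, List[bool]]) -> str:
    """Return the name of the criteria set that keeps the fewest segments.

    `decisions` maps a criteria-set name to one keep/skip flag per segment.
    Sets that keep no segments at all are ignored (an assumption), since
    they would produce an empty highlight reel.
    """
    kept_counts = {
        name: sum(flags) for name, flags in decisions.items() if any(flags)
    }
    if not kept_counts:
        raise ValueError("no criteria set selected any segment")
    # Fewest kept segments wins; ties go to the first set in insertion order.
    return min(kept_counts, key=kept_counts.get)
```

For example, if set A keeps three of four segments and set B keeps one, set B is chosen.
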
## 🛠️ Technology Stack

- SmolVLM2-256M-Video-Instruct: Efficient vision-language model optimized for video understanding
- HuggingFace Transformers: Latest transformer models and inference
- FastAPI: Modern web framework for APIs
- FFmpeg: Video processing with advanced filter support
- PyTorch: Deep learning framework with device optimization
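
As an illustration of how fade transitions could be applied with FFmpeg's `fade` video filter, the sketch below builds a filter string and a command line for a single segment. The helper names, the 0.5-second fade length, and the choice to copy the audio stream unchanged are assumptions for the example, not the service's actual settings.

```python
from typing import List


def fade_filter(duration: float, fade: float = 0.5) -> str:
    """Build an FFmpeg -vf string that fades a clip in and out.

    Hypothetical helper: the fade length is capped at half the clip
    duration so the fade-in and fade-out never overlap.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    fade = min(fade, duration / 2)
    out_start = duration - fade
    return (
        f"fade=t=in:st=0:d={fade:g},"
        f"fade=t=out:st={out_start:g}:d={fade:g}"
    )


def build_fade_command(src: str, dst: str, duration: float, fade: float = 0.5) -> List[str]:
    """Return an ffmpeg argument list; audio is copied as-is (assumption)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", fade_filter(duration, fade),
        "-c:a", "copy",
        dst,
    ]
```

The returned list can be passed to `subprocess.run` on a machine where `ffmpeg` is installed.
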
## 🎯 Perfect For

- Social media content creators
- Educational video processing
- Meeting/lecture summarization
- Sports highlight generation
- Entertainment content curation

## License

Apache 2.0 - Free for commercial and personal use
## 🤝 Contributing
Built with ❤️ using Hugging Face Transformers and open-source AI models.