---
title: SmolVLM2 Video Highlights
emoji: "🎬"
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---

# SmolVLM2 HuggingFace Segment-Based Video Highlights API
Generate video highlights using Hugging Face's segment-based approach.

This FastAPI service splits each video into fixed-length segments and classifies every segment with SmolVLM2-256M-Video-Instruct, which keeps highlight generation consistent.

## 🚀 Features
- Segment-Based Analysis: Processes videos in fixed 5-second segments for consistent AI classification
- Dual Criteria Generation: Creates two different sets of highlight criteria and uses the more selective one
- SmolVLM2-256M-Video-Instruct: A small 256M-parameter video model, chosen for faster processing
- Visual Effects: Optional fade transitions between segments
- REST API: Upload a video and receive a generated description plus the path to an analysis file
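
The fixed-segment idea can be illustrated with a minimal sketch. The function name `segment_bounds` is hypothetical, and shortening the final segment when the duration is not a multiple of the segment length is an assumption; the service may handle the tail differently.

```python
from typing import List, Tuple


def segment_bounds(duration: float, segment_length: float = 5.0) -> List[Tuple[float, float]]:
    """Split [0, duration] into consecutive fixed-length (start, end) pairs.

    Assumption: the last segment is shortened when the duration is not a
    multiple of segment_length.
    """
    if duration <= 0 or segment_length <= 0:
        raise ValueError("duration and segment_length must be positive")
    bounds = []
    index = 0
    while index * segment_length < duration:
        start = index * segment_length
        bounds.append((start, min(start + segment_length, duration)))
        index += 1
    return bounds


if __name__ == "__main__":
    # A 12-second clip yields 0-5s, 5-10s and a shorter 10-12s segment.
    for start, end in segment_bounds(12.0):
        print(f"{start:.1f}s - {end:.1f}s")
```

Each `(start, end)` pair would then be classified independently, which is what keeps the per-segment prompts uniform.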

## 🔗 API Endpoints
- `POST /upload-video` - Upload a video and receive the analysis response
- `GET /health` - Health check
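
A client may want to poll `GET /health` before uploading. The sketch below uses only the Python standard library. It assumes the endpoint answers with HTTP 200 once the service is ready; the response body is not documented here, so it is ignored. The names `fetch_status` and `wait_until_healthy` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_healthy(
    base_url: str,
    attempts: int = 5,
    delay: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll <base_url>/health until it returns 200 (assumed to mean ready)."""
    health_url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if fetch(health_url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("https://your-space-url.hf.space")` returns `True` as soon as one poll succeeds and `False` after the last failed attempt.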

## 📱 Usage
### Via API
```bash
# Upload video with optional parameters
curl -X POST \
  -F "video=@your_video.mp4" \
  -F "segment_length=5.0" \
  -F "model_name=HuggingFaceTB/SmolVLM2-256M-Video-Instruct" \
  -F "with_effects=true" \
  https://your-space-url.hf.space/upload-video
```
Example response:

```json
{
  "success": true,
  "message": "Video description generated successfully",
  "video_description": "A concise description of the uploaded video...",
  "analysis_file": "/tmp/outputs/<uuid>_analysis.json"
}
```
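
The same request can be sent from Python without third-party packages. This is a sketch, not the project's client: `build_multipart` and `upload_video` are hypothetical helper names, and the `video/mp4` content type for the file part is an assumption. The form field names and the response keys are taken from the curl call and example response above.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file part named "video" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="video"; filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: video/mp4\r\n\r\n")  # assumed content type
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_video(
    base_url: str,
    path: str,
    segment_length: float = 5.0,
    model_name: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct",
    with_effects: bool = True,
) -> dict:
    """POST a video to /upload-video and return the decoded JSON response."""
    fields = {
        "segment_length": str(segment_length),
        "model_name": model_name,
        "with_effects": "true" if with_effects else "false",
    }
    with open(path, "rb") as handle:
        body, content_type = build_multipart(fields, os.path.basename(path), handle.read())
    request = urllib.request.Request(
        base_url.rstrip("/") + "/upload-video",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would typically check `result["success"]` before reading `result["video_description"]`. Note that `analysis_file` is a path on the server, so it is not directly readable from the client.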

### Via Android App
Use the provided Android client code to integrate with your mobile app.

## ⚙️ Configuration
Default settings:

- Segment Length: 5 seconds (fixed segments for consistent classification)
- Model: SmolVLM2-256M-Video-Instruct (faster processing)
- Effects: Enabled (fade transitions between segments)
- Dual Criteria: Two prompt variations for robust selection
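
How the dual-criteria choice could work is sketched below. This README does not define "most selective", so the sketch assumes it means the criteria set that keeps the fewest segments while still keeping at least one. The function name and the example set names are hypothetical.

```python
from typing import Dict, List


def pick_most_selective(decisions: Dict[str, List[bool]]) -> str:
    """Return the name of the criteria set that keeps the fewest segments.

    Assumption: sets that keep no segment at all are skipped, because they
    would produce an empty result.
    """
    usable = {name: sum(flags) for name, flags in decisions.items() if any(flags)}
    if not usable:
        raise ValueError("no criteria set selected any segment")
    return min(usable, key=usable.get)


if __name__ == "__main__":
    # Per-segment yes/no answers from the model for two prompt variations.
    decisions = {
        "criteria_a": [True, True, False, True],
        "criteria_b": [False, True, False, False],
    }
    print(pick_most_selective(decisions))  # criteria_b keeps 1 segment, criteria_a keeps 3
```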

## 🛠️ Technology Stack
- SmolVLM2-256M-Video-Instruct: Efficient vision-language model optimized for video understanding
- Hugging Face Transformers: Model loading and inference
- FastAPI: Modern web framework for APIs
- FFmpeg: Video processing with filter support (used for fade effects)
- PyTorch: Deep learning framework with device optimization
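
As an illustration of the FFmpeg side, the sketch below builds a command that cuts one segment and applies a fade-in and fade-out with FFmpeg's `fade` video filter. The service's actual filter graph is not shown in this README, so the function name, the 0.5-second fade length, and the choice to leave audio untouched are all assumptions.

```python
from typing import List


def fade_command(src: str, dst: str, start: float, length: float, fade: float = 0.5) -> List[str]:
    """Build an ffmpeg argv that cuts one segment and fades it in and out.

    -ss and -t are placed before -i, so timestamps inside the filter start
    at 0 for the extracted segment.
    """
    fade = min(fade, length / 2)  # keep both fades inside the segment
    video_filter = (
        f"fade=t=in:st=0:d={fade},"
        f"fade=t=out:st={length - fade}:d={fade}"
    )
    return [
        "ffmpeg", "-y",
        "-ss", str(start),
        "-t", str(length),
        "-i", src,
        "-vf", video_filter,
        dst,
    ]


if __name__ == "__main__":
    # Print the command for the 10s-15s segment instead of running it.
    print(" ".join(fade_command("input.mp4", "segment_02.mp4", start=10.0, length=5.0)))
```

The list form can be passed directly to `subprocess.run(..., check=True)` on a machine where `ffmpeg` is installed.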

## 🎯 Perfect For
- Social media content creators
- Educational video processing
- Meeting/lecture summarization
- Sports highlight generation
- Entertainment content curation

## License
Apache 2.0 - Free for commercial and personal use

## 🤝 Contributing
Built with ❤️ using Hugging Face Transformers and open-source AI models.