
title: SmolVLM2 Video Highlights
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860

SmolVLM2 HuggingFace Segment-Based Video Highlights API

Generate intelligent video highlights using HuggingFace's segment-based approach

This FastAPI service applies HuggingFace's segment-based classification method: the video is split into fixed-length segments and SmolVLM2-256M-Video-Instruct classifies each one, which keeps highlight generation consistent across videos.
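To make the segment-based idea concrete, here is a minimal sketch of how a video's duration could be cut into fixed-length windows before each window is classified. The function name `split_into_segments` is hypothetical and not taken from this service's code; the 5-second default mirrors the default segment length documented for this Space.

```python
import math
from typing import List, Tuple


def split_into_segments(
    duration: float, segment_length: float = 5.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times, in seconds, covering the whole video.

    Hypothetical helper for illustration only. The last segment is
    shortened when the duration is not a multiple of segment_length.
    """
    if duration <= 0 or segment_length <= 0:
        raise ValueError("duration and segment_length must be positive")

    count = math.ceil(duration / segment_length)
    segments = []
    for i in range(count):
        # Index-based start times avoid accumulating float error.
        start = i * segment_length
        end = min(start + segment_length, duration)
        segments.append((start, end))
    return segments


if __name__ == "__main__":
    for start, end in split_into_segments(12.0):
        print(f"{start:.1f}s to {end:.1f}s")
```

Each `(start, end)` pair would then be passed to the model as its own clip; how the service actually extracts and scores the clips is not shown here.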

🚀 Features

- Segment-Based Analysis: Processes videos in fixed 5-second segments for consistent AI classification
- Dual Criteria Generation: Creates two different highlight criteria sets and selects the most selective one
- SmolVLM2-256M-Video-Instruct: Faster processing with specialized video understanding
- Visual Effects: Optional fade transitions between segments for professional-quality output
- REST API: Upload videos and get generated video description + analysis file path

🔗 API Endpoints

- POST /upload-video - Upload video and receive analysis response
- GET /health - Health check

📱 Usage

Via API

Upload video with optional parameters

curl -X POST \
  -F "video=@your_video.mp4" \
  -F "segment_length=5.0" \
  -F "model_name=HuggingFaceTB/SmolVLM2-256M-Video-Instruct" \
  -F "with_effects=true" \
  https://your-space-url.hf.space/upload-video

Example response:

{
  "success": true,
  "message": "Video description generated successfully",
  "video_description": "A concise description of the uploaded video...",
  "analysis_file": "/tmp/outputs/_analysis.json"
}

Via Android App

Use the provided Android client code to integrate with your mobile app.
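Via Python (sketch)

For clients other than Android, the same upload can be scripted. The sketch below mirrors the curl form fields and reads the four keys shown in the example response. It assumes the third-party `requests` package is installed; the names `upload_video`, `parse_upload_response`, and `UploadResult`, and the 600-second timeout, are illustrative choices rather than part of this project.

```python
import json
from dataclasses import dataclass


@dataclass
class UploadResult:
    success: bool
    message: str
    video_description: str
    analysis_file: str


def parse_upload_response(body: str) -> UploadResult:
    """Parse the JSON body returned by POST /upload-video."""
    data = json.loads(body)
    return UploadResult(
        success=bool(data.get("success", False)),
        message=data.get("message", ""),
        video_description=data.get("video_description", ""),
        analysis_file=data.get("analysis_file", ""),
    )


def upload_video(
    base_url: str,
    video_path: str,
    segment_length: float = 5.0,
    model_name: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct",
    with_effects: bool = True,
    timeout: float = 600.0,  # assumed value; model inference can be slow
) -> UploadResult:
    # Imported lazily so the parsing helper works without requests installed.
    import requests

    with open(video_path, "rb") as fh:
        resp = requests.post(
            f"{base_url.rstrip('/')}/upload-video",
            files={"video": fh},  # same field name as in the curl example
            data={
                "segment_length": str(segment_length),
                "model_name": model_name,
                "with_effects": str(with_effects).lower(),
            },
            timeout=timeout,
        )
    resp.raise_for_status()
    return parse_upload_response(resp.text)
```

A call would look like `upload_video("https://your-space-url.hf.space", "your_video.mp4")`. Note that `analysis_file` is a path on the server, so the client cannot open it directly.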

⚙️ Configuration

Default settings:

- Segment Length: 5 seconds (fixed segments for consistent classification)
- Model: SmolVLM2-256M-Video-Instruct (faster processing)
- Effects: Enabled (fade transitions between segments)
- Dual Criteria: Two prompt variations for robust selection

🛠️ Technology Stack

- SmolVLM2-256M-Video-Instruct: Efficient vision-language model optimized for video understanding
- HuggingFace Transformers: Latest transformer models and inference
- FastAPI: Modern web framework for APIs
- FFmpeg: Video processing with advanced filter support
- PyTorch: Deep learning framework with device optimization

🎯 Perfect For

- Social media content creators
- Educational video processing
- Meeting/lecture summarization
- Sports highlight generation
- Entertainment content curation

License

Apache 2.0 - Free for commercial and personal use

🤝 Contributing

Built with ❤️ using Hugging Face Transformers and open-source AI models.