
	---
	title: SmolVLM2 Video Highlights
	emoji: "🎬"
	colorFrom: blue
	colorTo: purple
	sdk: docker
	pinned: false
	license: apache-2.0
	app_port: 7860
	---

	# SmolVLM2 HuggingFace Segment-Based Video Highlights API
	Generate video highlights using Hugging Face's segment-based approach.

	This is a FastAPI service that applies Hugging Face's segment-based classification method with SmolVLM2-256M-Video-Instruct. The video is split into fixed-length segments and the model classifies each segment, which makes highlight generation consistent from run to run.

	## 🚀 Features
	- Segment-Based Analysis: Processes videos in fixed 5-second segments for consistent AI classification.
	- Dual Criteria Generation: Creates two different sets of highlight criteria and keeps the more selective one (the set that selects fewer segments).
	- SmolVLM2-256M-Video-Instruct: The smallest SmolVLM2 video model, chosen for faster processing while keeping video understanding.
	- Visual Effects: Optional fade transitions between segments for smoother output.
	- REST API: Upload a video and receive a generated video description plus the path to an analysis file.
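The fixed-length segmentation described above can be sketched in a few lines. This is a minimal illustration, assuming segments are cut back-to-back from time 0 and the last one is shortened to end with the video; `segment_boundaries` is a hypothetical name, not a function from this service.

```python
def segment_boundaries(duration: float, segment_length: float = 5.0) -> list[tuple[float, float]]:
    """Split a video of `duration` seconds into fixed-length (start, end) windows.

    Hypothetical helper: windows are back-to-back from 0 and the final one is
    truncated at the end of the video.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    bounds: list[tuple[float, float]] = []
    i = 0
    # Multiply instead of accumulating so float error does not drift.
    while i * segment_length < duration:
        start = i * segment_length
        bounds.append((start, min(start + segment_length, duration)))
        i += 1
    return bounds


print(segment_boundaries(12.0))  # [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```

Each window is what the model would be asked to classify, so a 60-second clip at the default length produces 12 classification calls.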
	## 🔗 API Endpoints
	- `POST /upload-video`: Upload a video and receive the analysis response
	- `GET /health`: Health check
	## 📱 Usage
	### Via API
	```bash
	# Upload video with optional parameters
	curl -X POST \
	  -F "video=@your_video.mp4" \
	  -F "segment_length=5.0" \
	  -F "model_name=HuggingFaceTB/SmolVLM2-256M-Video-Instruct" \
	  -F "with_effects=true" \
	  https://your-space-url.hf.space/upload-video
	```
	Example response:

	```json
	{
	  "success": true,
	  "message": "Video description generated successfully",
	  "video_description": "A concise description of the uploaded video...",
	  "analysis_file": "/tmp/outputs/<uuid>_analysis.json"
	}
	```
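For Python clients, the same upload can be built with only the standard library. This is a sketch under stated assumptions: `build_upload_request` and `parse_response` are hypothetical helpers, the form field names are copied from the curl example, and the response keys from the example response; the real service may accept or return more than this.

```python
import json
import urllib.request
import uuid


def build_upload_request(
    url: str, filename: str, video_bytes: bytes, fields: dict[str, str]
) -> urllib.request.Request:
    """Build a multipart/form-data POST mirroring the curl call (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = bytearray()
    # Plain text fields such as segment_length, model_name, with_effects.
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += value.encode() + crlf
    # The video itself, sent under the form field name "video".
    body += f"--{boundary}\r\n".encode()
    body += (
        f'Content-Disposition: form-data; name="video"; filename="{filename}"\r\n'
        "Content-Type: video/mp4\r\n\r\n"
    ).encode()
    body += video_bytes + crlf
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def parse_response(raw: bytes) -> tuple[str, str]:
    """Return (video_description, analysis_file); raise if success is false."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(payload.get("message", "upload failed"))
    return payload["video_description"], payload["analysis_file"]


# Example (needs a running Space; not executed here):
# from pathlib import Path
# req = build_upload_request(
#     "https://your-space-url.hf.space/upload-video",
#     "your_video.mp4",
#     Path("your_video.mp4").read_bytes(),
#     {"segment_length": "5.0", "with_effects": "true"},
# )
# with urllib.request.urlopen(req, timeout=600) as resp:
#     description, analysis_file = parse_response(resp.read())
```

Note that `analysis_file` is a path on the server's filesystem, so the client only receives the path, not the file contents.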
	### Via Android App
	Use the provided Android client code to integrate with your mobile app.

	## ⚙️ Configuration
	Default settings:

	- Segment Length: 5 seconds (fixed segments for consistent classification)
	- Model: SmolVLM2-256M-Video-Instruct (faster processing)
	- Effects: Enabled (fade transitions between segments)
	- Dual Criteria: Two prompt variations for robust selection
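The dual-criteria selection can be pictured as follows, assuming the model has already given a yes/no verdict for every segment under each of the two criteria sets, and that "more selective" means "keeps fewer segments". `select_highlights` is a hypothetical helper; how the service breaks ties or handles an empty selection is not documented here.

```python
def select_highlights(flags_a: list[bool], flags_b: list[bool]) -> list[int]:
    """Return the indices of segments kept by the more selective criteria set.

    flags_a[i] / flags_b[i] is the model's yes/no verdict for segment i under
    criteria set A / B. "More selective" is assumed to mean "keeps fewer
    segments"; on a tie, set A wins (an arbitrary choice for this sketch).
    """
    if len(flags_a) != len(flags_b):
        raise ValueError("both criteria sets must classify the same segments")
    kept_a = [i for i, keep in enumerate(flags_a) if keep]
    kept_b = [i for i, keep in enumerate(flags_b) if keep]
    return kept_a if len(kept_a) <= len(kept_b) else kept_b


print(select_highlights([True, True, False, True], [False, True, False, True]))  # [1, 3]
```

With the default 5-second segments, a kept index `i` corresponds to the time range `[5 * i, 5 * (i + 1))` seconds, clipped to the video's end.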
	## 🛠️ Technology Stack
	- SmolVLM2-256M-Video-Instruct: Efficient vision-language model optimized for video understanding
	- HuggingFace Transformers: Model loading and inference
	- FastAPI: Modern web framework for APIs
	- FFmpeg: Video processing, including the filters used for fade transitions
	- PyTorch: Deep learning backend that runs the model on the available device (CPU or GPU)
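One way FFmpeg can produce the optional fade transitions is to re-encode each kept segment with the `fade` (video) and `afade` (audio) filters and then join the clips with the concat demuxer. The sketch below only builds the commands; the function names and the 0.5-second fade length are assumptions rather than this service's actual settings, and it assumes the source video has an audio track.

```python
def cut_with_fades(
    src: str, start: float, end: float, out: str, fade: float = 0.5
) -> list[str]:
    """ffmpeg argv that extracts [start, end) from src with fade-in and fade-out.

    -ss is placed before -i, so output timestamps start at 0 and the fade
    start times are relative to the beginning of the clip.
    """
    dur = end - start
    fade = min(fade, dur / 2)  # keep fade-in and fade-out from overlapping
    out_at = dur - fade
    vf = f"fade=t=in:st=0:d={fade},fade=t=out:st={out_at}:d={fade}"
    af = f"afade=t=in:st=0:d={fade},afade=t=out:st={out_at}:d={fade}"
    return [
        "ffmpeg", "-y", "-ss", str(start), "-i", src, "-t", str(dur),
        "-vf", vf, "-af", af, out,
    ]


def concat_list(paths: list[str]) -> str:
    """Body of a concat-demuxer list file, one `file '...'` line per clip."""
    lines = []
    for p in paths:
        # A single quote inside a path is written as '\'' in the list file.
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'\n")
    return "".join(lines)
```

Each argument list can be run with `subprocess.run(cmd, check=True)`. Writing the `concat_list(...)` output to `list.txt` and running `ffmpeg -f concat -safe 0 -i list.txt -c copy highlights.mp4` then joins the clips; stream copy is safe here because every clip was encoded with the same settings.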
	## 🎯 Perfect For
	- Social media content creators
	- Educational video processing
	- Meeting/lecture summarization
	- Sports highlight generation
	- Entertainment content curation
	## License
	Apache 2.0: free for commercial and personal use.

	## 🤝 Contributing
	Built with ❤️ using Hugging Face Transformers and open-source AI models.