---
title: Video Analyzer
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
suggested_hardware: zero-a10g
suggested_storage: small
hf_oauth: true
hf_oauth_scopes:
  - inference-api
hf_oauth_expiration_minutes: 480
tags:
  - video
  - youtube
  - transcription
  - whisper
  - rag
  - chatbot
models:
  - openai/whisper-base
  - Salesforce/blip-image-captioning-base
  - Qwen/Qwen2.5-72B-Instruct
short_description: Download, transcribe, and chat with YouTube videos using AI
---
# Video Analyzer

A conversational AI assistant that analyzes YouTube videos and answers questions about their content.

## Features

### Core Capabilities
- YouTube Video Download: Supports videos, playlists, and shorts via yt-dlp
- Speech-to-Text: Automatic transcription using OpenAI Whisper (whisper-base)
- Visual Analysis: Key frame extraction and captioning with BLIP
- Knowledge Base: Per-session vector storage with ChromaDB for semantic search
- RAG Chatbot: Ask questions about your videos using Qwen2.5-72B-Instruct
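Before a transcript can be searched semantically, it is typically split into overlapping chunks for embedding. A minimal sketch of that step (the chunk size, overlap, and function name here are illustrative assumptions, not the values used in app.py):

```python
def chunk_transcript(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split a transcript into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than a full chunk so adjacent chunks share context
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded (here with all-MiniLM-L6-v2) and added to the session-scoped ChromaDB collection, so retrieval can pull the most relevant transcript passages into the chatbot's prompt.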
### Voice Interaction
- Voice Input: Speak your questions using the microphone (transcribed with Whisper)
- Voice Output: Hear responses read aloud with natural TTS
- Dual TTS Engines:
  - Edge-TTS (default): fast, free Microsoft voices, no GPU required
  - Parler-TTS (optional): higher-quality speech when a GPU is available, served from a HuggingFace model
### User Experience
- Unified Chat Interface: Single chatbot handles both video analysis and Q&A
- Auto URL Detection: Just paste a YouTube URL and the assistant analyzes it
- Conversational Flow: The assistant guides you through the process
- Per-Session Storage: Your analyzed videos are private to your session
- Persistent Sessions: Your knowledge base persists across page reloads (tied to your HuggingFace profile)
### Technical Features

- ZeroGPU Support: GPU-accelerated processing via HuggingFace ZeroGPU
- Model Fallback: Automatic fallback chain (Qwen2.5-72B → Llama-3.1-70B) for reliability
- HuggingFace OAuth: Secure authentication via HuggingFace login
- Gradio 6: Modern UI with the Soft theme
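The fallback chain above can be sketched generically: each model is tried in order and the first successful response wins. The model IDs come from this README; the helper name and error handling are simplified assumptions, not the exact code in app.py:

```python
from collections.abc import Callable

MODEL_CHAIN = ["Qwen/Qwen2.5-72B-Instruct", "meta-llama/Llama-3.1-70B-Instruct"]

def generate_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: list[str] = MODEL_CHAIN,
) -> str:
    """Try each model in order; return the first successful completion."""
    last_error: Exception | None = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as err:  # e.g. rate limit or model temporarily unavailable
            last_error = err
    raise RuntimeError("All models in the fallback chain failed") from last_error
```

In the app, `call_model` would wrap a HuggingFace Inference API chat-completion call, so a transient failure on the primary model degrades gracefully instead of erroring out.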
## How to Use

1. Sign in with your HuggingFace account using the button in the top right
2. Paste a YouTube URL directly in the chat (e.g., https://youtube.com/watch?v=...)
3. Wait for processing - the assistant will transcribe audio and analyze key frames
4. Ask questions about the video content in natural language
## Example Interactions

**You:** https://youtube.com/watch?v=dQw4w9WgXcQ

**Bot:** I'll analyze that video for you. This may take a few minutes...

**Bot:** Done! I've analyzed "Never Gonna Give You Up" and added it to my knowledge base.

**You:** What is this video about?

**Bot:** Based on the transcript, this video is a music video for Rick Astley's 1987 hit song...

**You:** What visual elements were shown?

**Bot:** The video shows a man dancing in various locations...
## Tech Stack
| Component | Technology |
|---|---|
| Web Framework | Gradio 6 with OAuth |
| Speech Recognition | OpenAI Whisper (whisper-base) |
| Image Captioning | Salesforce BLIP |
| Vector Database | ChromaDB (in-memory, per-session) |
| Text Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
| Language Model | HuggingFace Inference API (Qwen2.5-72B-Instruct) |
| Video Download | yt-dlp |
| GPU Acceleration | HuggingFace ZeroGPU (A10G) |
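ZeroGPU Spaces typically mark GPU-bound functions with the `spaces.GPU` decorator. A common pattern (a sketch under the assumption that app.py uses it; the `transcribe` placeholder below is hypothetical) keeps the app runnable locally, where the `spaces` package is absent:

```python
try:
    import spaces
    gpu = spaces.GPU  # on ZeroGPU hardware, requests a GPU slice per call
except ImportError:  # local development without the spaces package installed
    def gpu(func=None, **kwargs):
        """No-op stand-in so decorated functions run unchanged locally."""
        if func is None:          # handles @gpu(duration=...) usage
            return lambda f: f
        return func               # handles bare @gpu usage

@gpu
def transcribe(audio_path: str) -> str:
    # Placeholder for the Whisper call; returns a dummy value here
    return f"transcript of {audio_path}"
```

On the Space, the decorator queues the call onto shared GPU hardware; locally it does nothing, so the same code path works in both environments.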
## Limitations
- Works best with videos under 10 minutes
- Requires HuggingFace login for authentication
- Knowledge base is session-based (stored in memory, not persistent across Space restarts)
- Audio extraction requires FFmpeg (pre-installed on HuggingFace Spaces)
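Since audio extraction depends on FFmpeg being on PATH, a quick preflight check can fail fast with a clear message when running outside Spaces (a hypothetical helper, not from app.py):

```python
import shutil

def check_ffmpeg() -> bool:
    """Return True if an ffmpeg binary is available on PATH."""
    return shutil.which("ffmpeg") is not None
```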
## Development

### Prerequisites
- Python 3.11+
- uv package manager
- FFmpeg
### Setup

```bash
# Install dependencies
uv sync

# Install dev dependencies
uv sync --extra dev

# Run the app locally
uv run python app.py
```
### Testing

```bash
# Run unit tests
uv run --extra dev pytest tests/test_app.py -v

# Run E2E tests (requires Playwright browsers)
uv run --extra dev playwright install
uv run --extra dev pytest tests/test_e2e.py -v
```
## License
MIT