---
title: Video Analyzer
emoji: "🎬"
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "6.2.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
suggested_hardware: zero-a10g
suggested_storage: small
hf_oauth: true
hf_oauth_scopes:
  - inference-api
hf_oauth_expiration_minutes: 480
tags:
  - video
  - youtube
  - transcription
  - whisper
  - rag
  - chatbot
models:
  - openai/whisper-base
  - Salesforce/blip-image-captioning-base
  - Qwen/Qwen2.5-72B-Instruct
short_description: Download, transcribe, and chat with YouTube videos using AI
---
# Video Analyzer

A conversational AI assistant that analyzes YouTube videos and answers questions about their content.
## Features

### Core Capabilities

- **YouTube Video Download**: Supports videos, playlists, and Shorts via yt-dlp
- **Speech-to-Text**: Automatic transcription using OpenAI Whisper (whisper-base)
- **Visual Analysis**: Key frame extraction and captioning with BLIP
- **Knowledge Base**: Per-session vector storage with ChromaDB for semantic search
- **RAG Chatbot**: Ask questions about your videos using Qwen2.5-72B-Instruct
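
The knowledge-base and RAG items above rest on semantic search: each transcript or caption chunk is stored as an embedding vector, and a question retrieves the chunks whose vectors lie closest to it. The following is a minimal pure-Python sketch of that ranking step only, assuming cosine similarity and hand-made three-dimensional vectors. The app itself delegates storage and search to ChromaDB, and the names `cosine_similarity` and `top_k_chunks` are hypothetical, not taken from `app.py`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data (assumption): in a real pipeline the vectors come from an embedding model.
chunks = [
    ("The singer introduces the song.", [1.0, 0.0, 0.0]),
    ("A man dances in front of a wall.", [0.0, 1.0, 0.0]),
    ("The chorus repeats the title line.", [0.9, 0.1, 0.0]),
]
context = top_k_chunks([1.0, 0.0, 0.0], chunks, k=2)
```

The retrieved `context` would then be placed in the prompt sent to the language model, so that answers are grounded in the video's transcript and captions.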
### Voice Interaction

- **Voice Input**: Speak your questions into the microphone (transcribed with Whisper)
- **Voice Output**: Hear responses read aloud with natural-sounding text-to-speech (TTS)
- **Dual TTS Engines**:
  - **Edge-TTS** (default): Fast, free Microsoft voices; no GPU needed
  - **Parler-TTS** (optional): Higher-quality speech from a HuggingFace model; requires a GPU
### User Experience

- **Unified Chat Interface**: A single chatbot handles both video analysis and Q&A
- **Auto URL Detection**: Paste a YouTube URL and the assistant analyzes it
- **Conversational Flow**: The assistant guides you through the process
- **Per-Session Storage**: Your analyzed videos are private to your session
- **Persistent Sessions**: Your knowledge base persists across page reloads (tied to your HuggingFace profile), but not across Space restarts
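
Auto URL detection amounts to scanning each chat message for a YouTube link before deciding whether to analyze a video or answer a question. Here is a minimal sketch of one way to do that with a regular expression. The pattern and the function name `find_youtube_urls` are assumptions for illustration and may differ from what `app.py` does; the pattern covers watch, Shorts, playlist, and `youtu.be` links only.

```python
import re

# Hypothetical pattern: common YouTube URL shapes, not an exhaustive list.
YOUTUBE_URL_RE = re.compile(
    r"""
    https?://
    (?:www\.|m\.)?
    (?:
        youtube\.com/watch\?\S*?v=[\w-]{11}      # standard watch URL
      | youtube\.com/shorts/[\w-]{11}            # Shorts
      | youtube\.com/playlist\?\S*?list=[\w-]+   # playlist
      | youtu\.be/[\w-]{11}                      # short link
    )
    (?:[?&\#]\S*)?                               # optional trailing query or fragment
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_youtube_urls(message: str) -> list[str]:
    """Return every YouTube URL found in a chat message, in order of appearance."""
    return [match.group(0) for match in YOUTUBE_URL_RE.finditer(message)]
```

A message with at least one match could be routed to the analysis pipeline, and a message with none could be treated as a question about videos already analyzed.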
### Technical Features

- **ZeroGPU Support**: Uses HuggingFace ZeroGPU for GPU-accelerated processing
- **Model Fallback**: Automatic fallback chain (Qwen2.5-72B → Llama-3.1-70B) for reliability
- **HuggingFace OAuth**: Secure authentication via HuggingFace login
- **Gradio 6**: Modern UI with the Soft theme
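
The model fallback item describes trying a primary model and moving to a second one when the first fails. A minimal sketch of that pattern follows, with assumptions stated plainly: `generate_with_fallback` and `call_model` are hypothetical names, the inference call is passed in as a plain callable so the example runs without network access, and the second model ID is an assumed full repository name for "Llama-3.1-70B".

```python
from typing import Callable, Sequence

# The first ID appears in this README; the second is an assumed full ID for "Llama-3.1-70B".
FALLBACK_MODELS = (
    "Qwen/Qwen2.5-72B-Instruct",
    "meta-llama/Llama-3.1-70B-Instruct",
)


def generate_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: Sequence[str] = FALLBACK_MODELS,
) -> tuple[str, str]:
    """Try each model in order; return (model_id, reply) from the first that succeeds."""
    errors = []
    for model_id in models:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as exc:  # broad on purpose: any failure moves on to the next model
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))
```

In the app, `call_model` would wrap the actual request to the HuggingFace Inference API. Returning the model ID alongside the reply makes it possible to tell the user which model answered.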
## How to Use

1. **Sign in** with your HuggingFace account using the button in the top right
2. **Paste** a YouTube URL directly into the chat (e.g., `https://youtube.com/watch?v=...`)
3. **Wait** for processing: the assistant transcribes the audio and analyzes key frames
4. **Ask questions** about the video content in natural language

### Example Interactions

```
You: https://youtube.com/watch?v=dQw4w9WgXcQ
Bot: I'll analyze that video for you. This may take a few minutes...
Bot: Done! I've analyzed "Never Gonna Give You Up" and added it to my knowledge base.
You: What is this video about?
Bot: Based on the transcript, this video is a music video for Rick Astley's 1987 hit song...
You: What visual elements were shown?
Bot: The video shows a man dancing in various locations...
```
## Tech Stack

| Component | Technology |
|-----------|------------|
| Web Framework | Gradio 6 with OAuth |
| Speech Recognition | OpenAI Whisper (whisper-base) |
| Image Captioning | Salesforce BLIP |
| Vector Database | ChromaDB (in-memory, per-session) |
| Text Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
| Language Model | HuggingFace Inference API (Qwen2.5-72B-Instruct) |
| Video Download | yt-dlp |
| GPU Acceleration | HuggingFace ZeroGPU (A10G) |
## Limitations

- Works best with videos under 10 minutes
- Requires a HuggingFace login
- The knowledge base is held in memory per session, so it does not survive Space restarts
- Audio extraction requires FFmpeg (pre-installed on HuggingFace Spaces)
## Development

### Prerequisites

- Python 3.11+
- uv package manager
- FFmpeg
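
Before running the app locally, it can help to confirm that the prerequisites above are actually available. The following is a small optional sketch, not part of the project: the function names are hypothetical, and it assumes the tools are installed under their usual executable names `uv` and `ffmpeg`.

```python
import shutil
import sys


def missing_tools(tools: list[str]) -> list[str]:
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_supported(minimum: tuple[int, int] = (3, 11)) -> bool:
    """Check that the running interpreter meets the minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


def preflight_problems() -> list[str]:
    """Collect human-readable descriptions of any unmet prerequisites."""
    problems = [f"{tool} not found on PATH" for tool in missing_tools(["uv", "ffmpeg"])]
    if not python_is_supported():
        problems.append("Python 3.11 or newer is required")
    return problems


print(preflight_problems() or "All prerequisites found")
```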
### Setup

```bash
# Install dependencies
uv sync

# Install dev dependencies
uv sync --extra dev

# Run the app locally
uv run python app.py
```
### Testing

```bash
# Run unit tests
uv run --extra dev pytest tests/test_app.py -v

# Run E2E tests (requires Playwright browsers)
uv run --extra dev playwright install
uv run --extra dev pytest tests/test_e2e.py -v
```
## License

MIT

<!-- Build: 2025-12-28 -->