---
title: Video Analyzer
emoji: "🎬"
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "6.2.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
suggested_hardware: zero-a10g
suggested_storage: small
hf_oauth: true
hf_oauth_scopes:
  - inference-api
hf_oauth_expiration_minutes: 480
tags:
  - video
  - youtube
  - transcription
  - whisper
  - rag
  - chatbot
models:
  - openai/whisper-base
  - Salesforce/blip-image-captioning-base
  - Qwen/Qwen2.5-72B-Instruct
short_description: Download, transcribe, and chat with YouTube videos using AI
---

# Video Analyzer

A conversational AI assistant that analyzes YouTube videos and answers questions about their content.

## Features

### Core Capabilities

- **YouTube Video Download**: Supports videos, playlists, and shorts via yt-dlp
- **Speech-to-Text**: Automatic transcription using OpenAI Whisper (whisper-base)
- **Visual Analysis**: Key frame extraction and captioning with BLIP
- **Knowledge Base**: Per-session vector storage with ChromaDB for semantic search
- **RAG Chatbot**: Ask questions about your videos using Qwen2.5-72B-Instruct

### Voice Interaction

- **Voice Input**: Speak your questions using the microphone (transcribed with Whisper)
- **Voice Output**: Hear responses read aloud with natural TTS
- **Dual TTS Engines**:
  - **Edge-TTS** (default): Fast, free Microsoft voices, no GPU needed
  - **Parler-TTS** (optional): SOTA quality with GPU, HuggingFace model

### User Experience

- **Unified Chat Interface**: Single chatbot handles both video analysis and Q&A
- **Auto URL Detection**: Just paste a YouTube URL and the assistant analyzes it
- **Conversational Flow**: The assistant guides you through the process
- **Per-Session Storage**: Your analyzed videos are private to your session
- **Persistent Sessions**: Your knowledge base persists across page reloads (tied to your HuggingFace profile)

### Technical Features

- **ZeroGPU Support**: Leverages HuggingFace ZeroGPU for faster GPU-accelerated processing
- **Model Fallback**: Automatic fallback chain (Qwen2.5-72B → Llama-3.1-70B) for reliability
- **HuggingFace OAuth**: Secure authentication via HuggingFace login
- **Gradio 6**: Modern UI with the Soft theme

## How to Use

1. **Sign in** with your HuggingFace account using the button in the top right
2. **Paste** a YouTube URL directly in the chat (e.g., `https://youtube.com/watch?v=...`)
3. **Wait** for processing - the assistant will transcribe audio and analyze key frames
4. **Ask questions** about the video content in natural language

### Example Interactions

```
You: https://youtube.com/watch?v=dQw4w9WgXcQ
Bot: I'll analyze that video for you. This may take a few minutes...
Bot: Done! I've analyzed "Never Gonna Give You Up" and added it to my knowledge base.

You: What is this video about?
Bot: Based on the transcript, this video is a music video for Rick Astley's 1987 hit song...

You: What visual elements were shown?
Bot: The video shows a man dancing in various locations...
```

## Tech Stack

| Component | Technology |
|-----------|------------|
| Web Framework | Gradio 6 with OAuth |
| Speech Recognition | OpenAI Whisper (whisper-base) |
| Image Captioning | Salesforce BLIP |
| Vector Database | ChromaDB (in-memory, per-session) |
| Text Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
| Language Model | HuggingFace Inference API (Qwen2.5-72B-Instruct) |
| Video Download | yt-dlp |
| GPU Acceleration | HuggingFace ZeroGPU (A10G) |

## Limitations

- Works best with videos under 10 minutes
- Requires HuggingFace login for authentication
- Knowledge base is session-based (stored in memory, not persistent across Space restarts)
- Audio extraction requires FFmpeg (pre-installed on HuggingFace Spaces)

## Development

### Prerequisites

- Python 3.11+
- uv package manager
- FFmpeg

### Setup

```bash
# Install dependencies
uv sync

# Install dev dependencies
uv sync --extra dev

# Run the app locally
uv run python app.py
```

### Testing

```bash
# Run unit tests
uv run --extra dev pytest tests/test_app.py -v

# Run E2E tests (requires playwright browsers)
uv run --extra dev playwright install
uv run --extra dev pytest tests/test_e2e.py -v
```

## License

MIT
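
## Appendix: Model Fallback Sketch

The fallback chain listed under Technical Features (Qwen2.5-72B → Llama-3.1-70B) can be illustrated with a short sketch. This is not the app's actual code. The function name `call_with_fallback`, the injected `generate` callable, and the full repo id used for the Llama model are all assumptions made for the example; the README itself only names the two models. Injecting `generate` keeps the sketch runnable without network access or credentials.

```python
from typing import Callable, Sequence

# The first id appears in this README. The second is an ASSUMED full repo id
# for the "Llama-3.1-70B" fallback that the README mentions by short name.
FALLBACK_CHAIN = (
    "Qwen/Qwen2.5-72B-Instruct",
    "meta-llama/Llama-3.1-70B-Instruct",
)


def call_with_fallback(
    prompt: str,
    generate: Callable[[str, str], str],
    models: Sequence[str] = FALLBACK_CHAIN,
) -> tuple[str, str]:
    """Try each model in order and return (model_id, reply) from the first success.

    `generate(model_id, prompt)` is a hypothetical callable that wraps
    whatever inference backend is in use and raises on failure.
    """
    errors = []
    for model_id in models:
        try:
            return model_id, generate(model_id, prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stand-in backend for demonstration: the primary model always fails.
    def fake_generate(model_id: str, prompt: str) -> str:
        if model_id.startswith("Qwen/"):
            raise TimeoutError("primary model unavailable")
        return f"echo: {prompt}"

    used, reply = call_with_fallback("What is this video about?", fake_generate)
    print(used, "->", reply)
```

In a real deployment, `generate` would wrap the inference client call, and the `except` clause would likely be narrowed to the specific timeout and rate-limit errors that justify switching models.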