---
title: Video Analyzer
emoji: "🎬"
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "6.2.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
suggested_hardware: zero-a10g
suggested_storage: small
hf_oauth: true
hf_oauth_scopes:
- inference-api
hf_oauth_expiration_minutes: 480
tags:
- video
- youtube
- transcription
- whisper
- rag
- chatbot
models:
- openai/whisper-base
- Salesforce/blip-image-captioning-base
- Qwen/Qwen2.5-72B-Instruct
short_description: Download, transcribe, and chat with YouTube videos using AI
---
# Video Analyzer
A conversational AI assistant that analyzes YouTube videos and answers questions about their content.
## Features
### Core Capabilities
- **YouTube Video Download**: Supports videos, playlists, and shorts via yt-dlp
- **Speech-to-Text**: Automatic transcription using OpenAI Whisper (whisper-base)
- **Visual Analysis**: Key frame extraction and captioning with BLIP
- **Knowledge Base**: Per-session vector storage with ChromaDB for semantic search
- **RAG Chatbot**: Ask questions about your videos using Qwen2.5-72B-Instruct
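Before a transcript can be searched semantically, it is typically split into overlapping chunks for embedding. The sketch below shows one common chunking approach; the function name and parameters are illustrative, not the app's actual API:

```python
def chunk_transcript(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split a transcript into overlapping character chunks for embedding.

    The overlap preserves context that would otherwise be cut off
    at chunk boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded and stored in the per-session ChromaDB collection.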
### Voice Interaction
- **Voice Input**: Speak your questions using the microphone (transcribed with Whisper)
- **Voice Output**: Hear responses read aloud with natural TTS
- **Dual TTS Engines**:
- **Edge-TTS** (default): Fast, free Microsoft voices, no GPU needed
  - **Parler-TTS** (optional): Higher-quality neural voices; requires a GPU and loads a HuggingFace model
### User Experience
- **Unified Chat Interface**: Single chatbot handles both video analysis and Q&A
- **Auto URL Detection**: Just paste a YouTube URL and the assistant analyzes it
- **Conversational Flow**: The assistant guides you through the process
- **Per-Session Storage**: Your analyzed videos are private to your session
- **Persistent Sessions**: Your knowledge base persists across page reloads (tied to your HuggingFace profile)
### Technical Features
- **ZeroGPU Support**: Uses HuggingFace ZeroGPU for on-demand GPU acceleration
- **Model Fallback**: Automatic fallback chain (Qwen2.5-72B → Llama-3.1-70B) for reliability
- **HuggingFace OAuth**: Secure authentication via HuggingFace login
- **Gradio 6**: Modern UI with the Soft theme
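The model fallback chain can be sketched as a simple loop that tries each model ID in order. The `call_model` parameter stands in for whatever wrapper the app uses around the HF Inference API; the names here are assumptions for illustration:

```python
FALLBACK_CHAIN = [
    "Qwen/Qwen2.5-72B-Instruct",
    "meta-llama/Llama-3.1-70B-Instruct",
]

def query_with_fallback(prompt, models, call_model):
    """Try each model in order; return (model_id, response) on first success.

    `call_model(model_id, prompt)` is caller-supplied, e.g. a thin wrapper
    around the HuggingFace Inference API.
    """
    last_error = None
    for model_id in models:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as err:  # a real app would catch specific API errors
            last_error = err
    raise RuntimeError(f"All models failed: {last_error}")
```

This keeps the chat responsive when the primary model is rate-limited or unavailable.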
## How to Use
1. **Sign in** with your HuggingFace account using the button in the top right
2. **Paste** a YouTube URL directly in the chat (e.g., `https://youtube.com/watch?v=...`)
3. **Wait** for processing: the assistant transcribes the audio and analyzes key frames
4. **Ask questions** about the video content in natural language
### Example Interactions
```
You: https://youtube.com/watch?v=dQw4w9WgXcQ
Bot: I'll analyze that video for you. This may take a few minutes...
Bot: Done! I've analyzed "Never Gonna Give You Up" and added it to my knowledge base.
You: What is this video about?
Bot: Based on the transcript, this video is a music video for Rick Astley's 1987 hit song...
You: What visual elements were shown?
Bot: The video shows a man dancing in various locations...
```
## Tech Stack
| Component | Technology |
|-----------|------------|
| Web Framework | Gradio 6 with OAuth |
| Speech Recognition | OpenAI Whisper (whisper-base) |
| Image Captioning | Salesforce BLIP |
| Vector Database | ChromaDB (in-memory, per-session) |
| Text Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
| Language Model | HuggingFace Inference API (Qwen2.5-72B-Instruct) |
| Video Download | yt-dlp |
| GPU Acceleration | HuggingFace ZeroGPU (A10G) |
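Under the hood, retrieval ranks stored chunks by embedding similarity. The sketch below shows the cosine-similarity ranking step with plain Python lists standing in for MiniLM vectors; in the app itself, ChromaDB performs this search:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunks most similar to the query vector."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

The top-ranked chunks are injected into the LLM prompt as context, which is the "retrieval" half of RAG.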
## Limitations
- Works best with videos under 10 minutes
- Requires HuggingFace login for authentication
- Knowledge base lives in memory: it persists across page reloads within a session, but not across Space restarts
- Audio extraction requires FFmpeg (pre-installed on HuggingFace Spaces)
## Development
### Prerequisites
- Python 3.11+
- uv package manager
- FFmpeg
### Setup
```bash
# Install dependencies
uv sync
# Install dev dependencies
uv sync --extra dev
# Run the app locally
uv run python app.py
```
### Testing
```bash
# Run unit tests
uv run --extra dev pytest tests/test_app.py -v
# Run E2E tests (requires playwright browsers)
uv run --extra dev playwright install
uv run --extra dev pytest tests/test_e2e.py -v
```
## License
MIT