
	---
	title: Video Analyzer
	emoji: "🎬"
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: "6.2.0"
	python_version: "3.11"
	app_file: app.py
	pinned: false
	license: mit
	suggested_hardware: zero-a10g
	suggested_storage: small
	hf_oauth: true
	hf_oauth_scopes:
	- inference-api
	hf_oauth_expiration_minutes: 480
	tags:
	- video
	- youtube
	- transcription
	- whisper
	- rag
	- chatbot
	models:
	- openai/whisper-base
	- Salesforce/blip-image-captioning-base
	- Qwen/Qwen2.5-72B-Instruct
	short_description: Download, transcribe, and chat with YouTube videos using AI
	---

	# Video Analyzer

	A conversational AI assistant that analyzes YouTube videos and answers questions about their content.

	## Features

	### Core Capabilities
	- YouTube Video Download: Supports videos, playlists, and shorts via yt-dlp
	- Speech-to-Text: Automatic transcription using OpenAI Whisper (whisper-base)
	- Visual Analysis: Key frame extraction and captioning with BLIP
	- Knowledge Base: Per-session vector storage with ChromaDB for semantic search
	- RAG Chatbot: Ask questions about your videos using Qwen2.5-72B-Instruct
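
The retrieval step behind the RAG chatbot can be illustrated with a toy example. This is a minimal sketch, not the app's code: it swaps ChromaDB and the Sentence Transformers embeddings for bag-of-words vectors and cosine similarity from the standard library, and the names `embed`, `cosine`, and `top_chunks` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical transcript and caption chunks for one analyzed video.
chunks = [
    "The singer dances in front of a brick wall.",
    "The chorus repeats the promise never to give you up.",
    "A bartender does a backflip near the end.",
]
question = "Who does a backflip?"
context = top_chunks(question, chunks, k=1)

# The retrieved chunks are placed in the prompt that goes to the language model.
prompt = "Answer using this context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
```

In the real stack, the embedding model and vector store take the place of `embed` and the sort, but the flow is the same: embed the question, fetch the closest chunks, and pass them to the language model as context.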

	### Voice Interaction
	- Voice Input: Speak your questions using the microphone (transcribed with Whisper)
	- Voice Output: Hear responses read aloud with text-to-speech (TTS)
	- Dual TTS Engines:
	  - Edge-TTS (default): Fast, free Microsoft voices; no GPU needed
	  - Parler-TTS (optional): Higher-quality speech from a HuggingFace model; requires a GPU

	### User Experience
	- Unified Chat Interface: Single chatbot handles both video analysis and Q&A
	- Auto URL Detection: Just paste a YouTube URL and the assistant analyzes it
	- Conversational Flow: The assistant guides you through the process
	- Per-Session Storage: Your analyzed videos are private to your session
	- Persistent Sessions: Your knowledge base survives page reloads (tied to your HuggingFace profile) but is lost when the Space restarts
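
Auto URL detection could be done with a regular expression over each chat message. The sketch below is one possible approach, not the app's implementation: the pattern `YOUTUBE_URL` and the helper `find_youtube_urls` are hypothetical, and the pattern only covers common watch, shorts, playlist, and `youtu.be` link shapes.

```python
import re

# Hypothetical pattern covering watch, shorts, playlist, and youtu.be links.
YOUTUBE_URL = re.compile(
    r"https?://(?:www\.|m\.)?"
    r"(?:youtube\.com/(?:watch\?\S*?v=|shorts/|playlist\?\S*?list=)|youtu\.be/)"
    r"[\w-]+\S*"
)


def find_youtube_urls(message: str) -> list[str]:
    """Return YouTube URLs found in a chat message, without trailing punctuation."""
    return [m.group(0).rstrip(".,!?)") for m in YOUTUBE_URL.finditer(message)]


def route(message: str) -> str:
    """Decide whether a message starts video analysis or is a question."""
    return "analyze" if find_youtube_urls(message) else "answer"
```

A message containing a link is routed to video analysis; anything else is treated as a question for the knowledge base.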

	### Technical Features
	- ZeroGPU Support: Leverages HuggingFace ZeroGPU for faster GPU-accelerated processing
	- Model Fallback: Automatic fallback chain (Qwen2.5-72B → Llama-3.1-70B) for reliability
	- HuggingFace OAuth: Secure authentication via HuggingFace login
	- Gradio 6: Modern UI with the Soft theme
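
The model fallback chain can be sketched as a loop that tries each model in order and returns the first successful reply. This is an illustration under assumptions: `generate_with_fallback` and `call_model` are hypothetical names, the backend is injected so the example runs without network access, and the short model names are copied from the list above (a real call would need full repository IDs).

```python
from typing import Callable, Sequence

# Short names as written in this README; real calls would use full model IDs.
MODELS = ["Qwen2.5-72B", "Llama-3.1-70B"]


def generate_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first that works."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def fake_backend(model: str, prompt: str) -> str:
    """Stand-in for an inference call; simulates the primary model being down."""
    if model == "Qwen2.5-72B":
        raise TimeoutError("simulated outage")
    return f"[{model}] reply to: {prompt}"


used, reply = generate_with_fallback("Summarize the video", MODELS, fake_backend)
```

Passing the backend in as a function keeps the fallback logic testable on its own, separate from whichever inference client the app uses.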

	## How to Use

	1. Sign in with your HuggingFace account using the button in the top right
	2. Paste a YouTube URL directly in the chat (e.g., `https://youtube.com/watch?v=...`)
	3. Wait for processing to finish: the assistant transcribes the audio and analyzes key frames
	4. Ask questions about the video content in natural language

	### Example Interactions

	```
	You: https://youtube.com/watch?v=dQw4w9WgXcQ
	Bot: I'll analyze that video for you. This may take a few minutes...
	Bot: Done! I've analyzed "Never Gonna Give You Up" and added it to my knowledge base.

	You: What is this video about?
	Bot: Based on the transcript, this video is a music video for Rick Astley's 1987 hit song...

	You: What visual elements were shown?
	Bot: The video shows a man dancing in various locations...
	```

	## Tech Stack

	| Component | Technology |
	|-----------|------------|
	| Web Framework | Gradio 6 with OAuth |
	| Speech Recognition | OpenAI Whisper (whisper-base) |
	| Image Captioning | Salesforce BLIP |
	| Vector Database | ChromaDB (in-memory, per-session) |
	| Text Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
	| Language Model | HuggingFace Inference API (Qwen2.5-72B-Instruct) |
	| Video Download | yt-dlp |
	| GPU Acceleration | HuggingFace ZeroGPU (A10G) |

	## Limitations

	- Works best with videos under 10 minutes
	- Requires HuggingFace login for authentication
	- Knowledge base is session-based (stored in memory, not persistent across Space restarts)
	- Audio extraction requires FFmpeg (pre-installed on HuggingFace Spaces)
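
When running outside HuggingFace Spaces, two of these limitations can be checked before processing starts. The sketch below is a hypothetical preflight helper, not part of the app: the function names and the warning messages are assumptions, and the 10-minute threshold comes from the list above.

```python
import shutil

# Threshold taken from the "under 10 minutes" guidance in this README.
MAX_RECOMMENDED_SECONDS = 10 * 60


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def preflight_warnings(duration_seconds: float) -> list[str]:
    """Collect human-readable warnings before a video is processed."""
    warnings = []
    if not ffmpeg_available():
        warnings.append("FFmpeg was not found on PATH; audio extraction needs it.")
    if duration_seconds > MAX_RECOMMENDED_SECONDS:
        warnings.append("Video is longer than 10 minutes; the app works best with shorter videos.")
    return warnings
```

For example, `preflight_warnings(1800)` includes the length warning for a 30-minute video, plus the FFmpeg warning if the binary is missing on the local machine.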

	## Development

	### Prerequisites
	- Python 3.11+
	- uv package manager
	- FFmpeg

	### Setup
	```bash
	# Install dependencies
	uv sync

	# Install dev dependencies
	uv sync --extra dev

	# Run the app locally
	uv run python app.py
	```

	### Testing
	```bash
	# Run unit tests
	uv run --extra dev pytest tests/test_app.py -v

	# Run E2E tests (requires playwright browsers)
	uv run --extra dev playwright install
	uv run --extra dev pytest tests/test_e2e.py -v
	```

	## License

	MIT

	<!-- Build: 2025-12-28 -->