---
title: Video Analyzer
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
suggested_hardware: zero-a10g
suggested_storage: small
hf_oauth: true
hf_oauth_scopes:
  - inference-api
hf_oauth_expiration_minutes: 480
tags:
  - video
  - youtube
  - transcription
  - whisper
  - rag
  - chatbot
models:
  - openai/whisper-base
  - Salesforce/blip-image-captioning-base
  - Qwen/Qwen2.5-72B-Instruct
short_description: Download, transcribe, and chat with YouTube videos using AI
---
# Video Analyzer

A conversational AI assistant that analyzes YouTube videos and answers questions about their content.

## Features

### Core Capabilities
- YouTube Video Download: Supports videos, playlists, and shorts via yt-dlp
- Speech-to-Text: Automatic transcription using OpenAI Whisper (whisper-base)
- Visual Analysis: Key frame extraction and captioning with BLIP
- Knowledge Base: Per-session vector storage with ChromaDB for semantic search
- RAG Chatbot: Ask questions about your videos using Qwen2.5-72B-Instruct
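Before a transcript can be searched semantically, it is typically split into overlapping chunks for embedding. A minimal sketch of that step (the chunk size, overlap, and function name here are illustrative assumptions, not the values used in app.py):

```python
def chunk_transcript(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split a transcript into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than a full chunk so adjacent chunks share context
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded (here with all-MiniLM-L6-v2) and added to the session-scoped ChromaDB collection, so retrieval can pull the most relevant transcript passages into the chatbot's prompt.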
### Voice Interaction
- Voice Input: Speak your questions using the microphone (transcribed with Whisper)
- Voice Output: Hear responses read aloud with natural TTS
- Dual TTS Engines:
  - Edge-TTS (default): fast, free Microsoft voices, no GPU required
  - Parler-TTS (optional): higher-quality speech when a GPU is available, served from a HuggingFace model
### User Experience
- Unified Chat Interface: Single chatbot handles both video analysis and Q&A
- Auto URL Detection: Just paste a YouTube URL and the assistant analyzes it
- Conversational Flow: The assistant guides you through the process
- Per-Session Storage: Your analyzed videos are private to your session
- Persistent Sessions: Your knowledge base persists across page reloads (tied to your HuggingFace profile)
### Technical Features

- ZeroGPU Support: GPU-accelerated processing via HuggingFace ZeroGPU
- Model Fallback: Automatic fallback chain (Qwen2.5-72B → Llama-3.1-70B) for reliability
- HuggingFace OAuth: Secure authentication via HuggingFace login
- Gradio 6: Modern UI with the Soft theme
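The fallback chain above can be sketched generically: each model is tried in order and the first successful response wins. The model IDs come from this README; the helper name and error handling are simplified assumptions, not the exact code in app.py:

```python
from collections.abc import Callable

MODEL_CHAIN = ["Qwen/Qwen2.5-72B-Instruct", "meta-llama/Llama-3.1-70B-Instruct"]

def generate_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: list[str] = MODEL_CHAIN,
) -> str:
    """Try each model in order; return the first successful completion."""
    last_error: Exception | None = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as err:  # e.g. rate limit or model temporarily unavailable
            last_error = err
    raise RuntimeError("All models in the fallback chain failed") from last_error
```

In the app, `call_model` would wrap a HuggingFace Inference API chat-completion call, so a transient failure on the primary model degrades gracefully instead of erroring out.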
## How to Use

1. Sign in with your HuggingFace account using the button in the top right
2. Paste a YouTube URL directly in the chat (e.g., https://youtube.com/watch?v=...)
3. Wait for processing - the assistant will transcribe audio and analyze key frames
4. Ask questions about the video content in natural language
## Example Interactions

**You:** https://youtube.com/watch?v=dQw4w9WgXcQ

**Bot:** I'll analyze that video for you. This may take a few minutes...

**Bot:** Done! I've analyzed "Never Gonna Give You Up" and added it to my knowledge base.

**You:** What is this video about?

**Bot:** Based on the transcript, this video is a music video for Rick Astley's 1987 hit song...

**You:** What visual elements were shown?

**Bot:** The video shows a man dancing in various locations...
## Tech Stack
| Component | Technology |
|---|---|
| Web Framework | Gradio 6 with OAuth |
| Speech Recognition | OpenAI Whisper (whisper-base) |
| Image Captioning | Salesforce BLIP |
| Vector Database | ChromaDB (in-memory, per-session) |
| Text Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
| Language Model | HuggingFace Inference API (Qwen2.5-72B-Instruct) |
| Video Download | yt-dlp |
| GPU Acceleration | HuggingFace ZeroGPU (A10G) |
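ZeroGPU Spaces typically mark GPU-bound functions with the `spaces.GPU` decorator. A common pattern (a sketch under the assumption that app.py uses it; the `transcribe` placeholder below is hypothetical) keeps the app runnable locally, where the `spaces` package is absent:

```python
try:
    import spaces
    gpu = spaces.GPU  # on ZeroGPU hardware, requests a GPU slice per call
except ImportError:  # local development without the spaces package installed
    def gpu(func=None, **kwargs):
        """No-op stand-in so decorated functions run unchanged locally."""
        if func is None:          # handles @gpu(duration=...) usage
            return lambda f: f
        return func               # handles bare @gpu usage

@gpu
def transcribe(audio_path: str) -> str:
    # Placeholder for the Whisper call; returns a dummy value here
    return f"transcript of {audio_path}"
```

On the Space, the decorator queues the call onto shared GPU hardware; locally it does nothing, so the same code path works in both environments.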
## Limitations
- Works best with videos under 10 minutes
- Requires HuggingFace login for authentication
- Knowledge base is session-based (stored in memory, not persistent across Space restarts)
- Audio extraction requires FFmpeg (pre-installed on HuggingFace Spaces)
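Since audio extraction depends on FFmpeg being on PATH, a quick preflight check can fail fast with a clear message when running outside Spaces (a hypothetical helper, not from app.py):

```python
import shutil

def check_ffmpeg() -> bool:
    """Return True if an ffmpeg binary is available on PATH."""
    return shutil.which("ffmpeg") is not None
```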
## Development

### Prerequisites
- Python 3.11+
- uv package manager
- FFmpeg
### Setup

```bash
# Install dependencies
uv sync

# Install dev dependencies
uv sync --extra dev

# Run the app locally
uv run python app.py
```
### Testing

```bash
# Run unit tests
uv run --extra dev pytest tests/test_app.py -v

# Run E2E tests (requires Playwright browsers)
uv run --extra dev playwright install
uv run --extra dev pytest tests/test_e2e.py -v
```
## License
MIT