---
title: YouTube Creator MetaData Extractor
emoji: 🎬
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
---
# 🎬 YouTube Creator MetaData Extractor
An AI-powered tool that helps content creators analyze YouTube videos and generate metadata such as transcripts and timecodes, using Google's Gemini models.
## 🚀 Features
- **🔍 Video Search**: Search YouTube videos by keywords with advanced filters
- **📊 Video Analysis**: Extract comprehensive video metadata (views, likes, duration, etc.)
- **📝 Transcript Extraction**: Get video transcripts in multiple languages
- **⏱️ Smart Timecodes**: AI-generated timecodes for better video navigation
- **🤖 Gemini AI Integration**: Advanced timecode generation using Google's Gemini 2.0
- **🌐 Multi-language Support**: Works with videos in Ukrainian, Russian, English, and more
- **📱 URL Flexibility**: Supports all YouTube URL formats (regular, shorts, embed links)
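As an illustration of the URL flexibility listed above, here is a minimal sketch of how a video ID could be pulled out of the common URL shapes. The function name `extract_video_id` and the list of accepted hosts are assumptions made for this example; the project's own parsing code may differ.

```python
import re
from typing import Optional
from urllib.parse import urlparse, parse_qs

# YouTube video IDs are currently 11 characters from this alphabet.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")

# Hosts accepted by this sketch (an assumption, not an exhaustive list).
_YOUTUBE_HOSTS = ("youtube.com", "music.youtube.com", "youtube-nocookie.com")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a bare ID or a common YouTube URL, else None."""
    value = value.strip()
    if _VIDEO_ID.fullmatch(value):
        return value

    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    parts = [p for p in parsed.path.split("/") if p]

    candidate = None
    if host == "youtu.be" and parts:
        # Short links: https://youtu.be/<id>
        candidate = parts[0]
    elif host in _YOUTUBE_HOSTS:
        if parts and parts[0] == "watch":
            # Regular links: /watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live", "v"):
            # Shorts and embed links: /shorts/<id>, /embed/<id>
            candidate = parts[1]

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```

Returning `None` for unrecognized input lets the caller decide whether to show an error or treat the text as a search query.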
## ⚠️ Cloud Platform Limitations
**YouTube blocks transcript access from cloud IPs** (Hugging Face Spaces, AWS, etc.)
**What works on HF Spaces:**
- ✅ Video Search
- ✅ Video Metadata
- ❌ Transcripts (limited)
- ❌ AI Timecodes (limited)
**For full functionality**, download and run locally:
```bash
git clone https://huggingface.co/spaces/dzianisBY/YouTube_Creator_MetaData
cd YouTube_Creator_MetaData
pip install -r requirements.txt
# Add your API keys to .env file
python main.py
```
## 🛠️ Setup
### Required API Keys
To use this tool, you need two API keys:
1. **YouTube Data API v3 Key**
   - Go to [Google Cloud Console](https://console.developers.google.com/)
   - Create a new project or select an existing one
   - Enable "YouTube Data API v3"
   - Create credentials (API Key)
2. **Gemini API Key** (for AI features)
   - Visit [Google AI Studio](https://ai.google.dev/)
   - Get your free API key for Gemini
### Environment Variables
Set these in your Hugging Face Space settings (or in a `.env` file when running locally):
```
YOUTUBE_API_KEY=your_youtube_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
```
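A minimal sketch of how the two variables above could be read and validated at startup, so that a missing key fails early with a clear message. The helper name `load_api_keys` is hypothetical, and treating both keys as mandatory is an assumption based on the "two API keys" requirement above.

```python
import os

# Variable names taken from the README; both are treated as required here.
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "GEMINI_API_KEY")


def load_api_keys(env=None) -> dict:
    """Return the required keys, raising a clear error if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_KEYS}
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.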
## 📖 How to Use
### 1. Video Search
- Enter keywords to find YouTube videos
- Filter by upload date, view count, duration
- Get detailed metadata for any video
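For orientation, the sketch below builds a request URL for the `search.list` endpoint of YouTube Data API v3, which supports keyword, ordering, upload-date and duration parameters. The helper `build_search_url` is hypothetical and is not necessarily how this project issues its searches. Note that `search.list` can sort by view count (`order="viewCount"`) but has no view-count filter, so any such filtering would have to happen after the results are fetched.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, max_results=10, order="relevance",
                     published_after=None, duration=None):
    """Build a YouTube Data API v3 search URL for videos matching `query`."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        # The API accepts at most 50 results per page.
        "maxResults": max(1, min(int(max_results), 50)),
        "order": order,
        "key": api_key,
    }
    if published_after:
        # RFC 3339 timestamp, e.g. "2024-01-01T00:00:00Z"
        params["publishedAfter"] = published_after
    if duration:
        # One of "any", "short", "medium", "long"
        params["videoDuration"] = duration
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

Keeping URL construction separate from the network call makes the parameters easy to inspect and log (with the key redacted) before any quota is spent.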
### 2. Transcript Analysis
- Extract transcripts from videos that have subtitles
- Works with both manually created and auto-generated captions
- Detects and supports multiple caption languages
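One way to choose among several caption tracks is sketched below: prefer the requested languages in order, and within a language prefer manual captions over auto-generated ones. The track dictionaries and their field names (`language_code`, `is_generated`) are a hypothetical shape chosen for this example, and the ranking rule is an assumption rather than the project's documented behavior.

```python
def pick_transcript(available, preferred=("uk", "ru", "en")):
    """Pick the best caption track, or None if no tracks are available."""
    def rank(track):
        lang = track["language_code"]
        lang_rank = preferred.index(lang) if lang in preferred else len(preferred)
        # False sorts before True, so manual captions win within a language.
        return (lang_rank, track["is_generated"])

    return min(available, key=rank) if available else None
```

If none of the preferred languages is present, the sketch still returns a track, again favoring manual captions, so the caller always gets something to work with when any captions exist.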
### 3. Timecode Generation
**Basic Timecodes**: Algorithmic segmentation based on transcript timing
**AI Timecodes**: Intelligent topic-based segmentation using Gemini AI
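To make "algorithmic segmentation based on transcript timing" concrete, here is one simple possibility: emit a timecode at the first transcript entry in each fixed-length window. This is a sketch of the general idea under assumed inputs (entries with `start` in seconds and `text`), not the project's actual algorithm; the function name and the 120-second default are illustrative.

```python
def basic_timecodes(entries, interval=120.0):
    """Return (start_seconds, label) pairs, at most one per `interval` window."""
    timecodes = []
    next_boundary = 0.0
    for entry in sorted(entries, key=lambda e: e["start"]):
        if entry["start"] >= next_boundary:
            timecodes.append((int(entry["start"]), entry["text"].strip()))
            # Move the boundary to the start of the window after this entry.
            next_boundary = (entry["start"] // interval + 1) * interval
    return timecodes
```

A fixed window knows nothing about topics, which is the gap the Gemini-based mode is meant to fill.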
**Supported Formats**:
- **YouTube**: Ready for video descriptions (e.g., `05:30 Topic description`)
- **Markdown**: Clickable links with timestamps (e.g., `- [05:30](link) Topic`)
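The two output formats above can be sketched as follows. The function names are hypothetical, and the Markdown link uses the standard `watch?v=...&t=...s` URL form as an assumption about what `link` stands for.

```python
def format_timestamp(seconds) -> str:
    """Format seconds as MM:SS, or H:MM:SS for videos longer than an hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def format_timecodes(timecodes, video_id, style="youtube"):
    """Render (seconds, title) pairs in the 'youtube' or 'markdown' format."""
    lines = []
    for seconds, title in timecodes:
        stamp = format_timestamp(seconds)
        if style == "youtube":
            lines.append(f"{stamp} {title}")
        elif style == "markdown":
            link = f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"
            lines.append(f"- [{stamp}]({link}) {title}")
        else:
            raise ValueError(f"Unknown style: {style!r}")
    return "\n".join(lines)
```

For example, `format_timecodes([(330, "Topic description")], "VIDEO_ID")` yields `05:30 Topic description`. When pasting into a video description, keep in mind that YouTube only creates chapters if the list starts at `00:00`.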
**Language Codes**:
- `uk` - Ukrainian
- `ru` - Russian
- `en` - English
- And many others (ISO 639-1 standard)
## 🔧 API Reference
This application provides both a web interface and REST API endpoints:
### Search Videos
```http
POST /api/search
Content-Type: application/json

{
  "query": "your search query",
  "max_results": 10,
  "order": "relevance"
}
```
### Get Video Info
```http
POST /api/video_info
Content-Type: application/json

{
  "video_id": "video_id_or_full_url"
}
```
### Extract Transcript
```http
POST /api/transcript
Content-Type: application/json

{
  "video_id": "video_id_or_full_url",
  "language_code": "uk"
}
```
### Generate AI Timecodes
```http
POST /api/gemini_timecodes
Content-Type: application/json

{
  "video_id": "video_id_or_full_url",
  "language_code": "uk",
  "format": "youtube",
  "model": "gemini-2.0-flash-001"
}
```
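The endpoints above can be called from Python with the standard library alone. The sketch below is a minimal client; `BASE_URL` is an assumption (adjust it to wherever the API server is listening), the helper names are hypothetical, and the response is returned as parsed JSON without assuming any particular fields, since the response schema is not documented here.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the API server runs locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_request(path, payload, base_url=BASE_URL):
    """Create a JSON POST request for one of the /api/... endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(path, payload, base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_request(path, payload, base_url)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_api("/api/transcript", {"video_id": "video_id_or_full_url", "language_code": "uk"})` sends the transcript request shown above.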
## 🏗️ Architecture
- **Frontend**: Gradio web interface with responsive design
- **Backend**: FastAPI server with async processing
- **AI Integration**: Google Gemini 2.0 for intelligent content analysis
- **APIs**: YouTube Data API v3 for video metadata
- **Transcript**: YouTube Transcript API for subtitle extraction
## 📁 Project Structure
```
├── main.py                              # Unified launcher (API/UI/both modes)
├── run_telegram_bot.py                  # Telegram bot launcher
├── api_server.py                        # FastAPI backend server
├── telegram_bot.py                      # Telegram bot implementation
├── mcp_handlers.py                      # Model Context Protocol handlers
├── gemini_helper.py                     # Gemini AI integration
├── utils.py                             # Utility functions
├── models.py                            # Data models
├── app.py                               # Gradio app (HF Spaces entry point)
├── gradio_app.py                        # Extended Gradio interface
├── requirements.txt                     # Python dependencies
├── telegram_requirements.txt            # Telegram bot dependencies
├── cloudflare-config.yml                # Cloudflare tunnel configuration
├── TUNNEL_SOLUTIONS.md                  # Tunnel troubleshooting guide
├── youtube-content-metagen-agent.ipynb  # Kaggle reference notebook
└── README.md                            # This file
```
## 🔬 Technology Stack
- **Python 3.13+**
- **Gradio** - Web interface framework
- **FastAPI** - High-performance API framework
- **Google Gemini 2.0** - Advanced language model for content analysis
- **YouTube APIs** - Official Google APIs for video data
- **AsyncIO** - Asynchronous processing for better performance
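As a small illustration of why asynchronous processing helps with I/O-bound work such as API calls, the sketch below processes several videos concurrently while capping the number of requests in flight. `process_video` is a stand-in that only sleeps; in a real application it would perform the metadata or transcript request. All names here are hypothetical.

```python
import asyncio


async def process_video(video_id: str) -> dict:
    """Placeholder for an I/O-bound step (e.g. an API call)."""
    await asyncio.sleep(0.01)
    return {"video_id": video_id, "status": "done"}


async def process_many(video_ids, limit=3):
    """Process videos concurrently, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(video_id):
        async with semaphore:
            return await process_video(video_id)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(v) for v in video_ids))


if __name__ == "__main__":
    print(asyncio.run(process_many(["a", "b", "c", "d"])))
```

The semaphore keeps concurrency bounded, which matters when the upstream API enforces quotas or rate limits.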
## 🌟 Use Cases
- **Content Creators**: Generate professional timecodes for YouTube videos
- **Educators**: Extract and analyze educational content structure
- **Researchers**: Analyze video metadata and transcripts at scale
- **Marketers**: Research competitor content and trends
- **Accessibility**: Create better navigation for long-form content
## 📄 License
MIT License - feel free to use in your projects!
## 🀝 Contributing
Contributions welcome! This project is designed to help content creators worldwide.
---
**Made with ❤️ for the YouTube creator community**