
metadata
title: YouTube Creator MetaData Extractor
emoji: 🎬
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit

🎬 YouTube Creator MetaData Extractor

AI-powered tool for content creators to analyze YouTube videos and generate professional metadata using advanced language models.

🚀 Features

  • 🔍 Video Search: Search YouTube videos by keywords with advanced filters
  • 📊 Video Analysis: Extract comprehensive video metadata (views, likes, duration, etc.)
  • 📝 Transcript Extraction: Get video transcripts in multiple languages
  • ⏱️ Smart Timecodes: AI-generated timecodes for better video navigation
  • 🤖 Gemini AI Integration: Advanced timecode generation using Google's Gemini 2.0
  • 🌐 Multi-language Support: Works with videos in Ukrainian, Russian, English, and more
  • 📱 URL Flexibility: Accepts regular, Shorts, and embed YouTube links as well as bare video IDs

⚠️ Cloud Platform Limitations

YouTube blocks transcript access from cloud IP addresses (Hugging Face Spaces, AWS, etc.), so transcript-based features are limited when the app is hosted there.

What works on HF Spaces:

  • ✅ Video Search
  • ✅ Video Metadata
  • ❌ Transcripts (limited)
  • ❌ AI Timecodes (limited)

For full functionality, download and run locally:

git clone https://huggingface.co/spaces/dzianisBY/YouTube_Creator_MetaData
cd YouTube_Creator_MetaData
pip install -r requirements.txt
# Add your API keys to a .env file
python main.py

🛠️ Setup

Required API Keys

To use this tool, you need two API keys:

  1. YouTube Data API v3 Key

    • Go to Google Cloud Console
    • Create a new project or select an existing one
    • Enable "YouTube Data API v3"
    • Create credentials (API Key)
  2. Gemini API Key (for AI features)

    • Create an API key in Google AI Studio

Environment Variables

On Hugging Face Spaces, set these as secrets in the Space settings; when running locally, put them in a .env file:

YOUTUBE_API_KEY=your_youtube_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
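Both keys have to be present before the app can do anything useful. A minimal sketch of reading them and failing early with a clear message, assuming only the variable names above; `load_api_keys` is a hypothetical helper, not necessarily how `main.py` or `app.py` handles configuration. When running locally, a loader such as python-dotenv's `load_dotenv()` would copy the `.env` entries into `os.environ` first.

```python
import os
from typing import Mapping

# Variable names taken from this README.
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "GEMINI_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required API keys, raising if any are missing or blank.

    Hypothetical helper for illustration; whitespace around values is stripped.
    """
    keys = {name: env.get(name, "").strip() for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return keys
```

Passing the mapping explicitly (instead of always reading `os.environ`) keeps the check easy to test.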

📖 How to Use

1. Video Search

  • Enter keywords to find YouTube videos
  • Filter by upload date, view count, duration
  • Get detailed metadata for any video

2. Transcript Analysis

  • Extract transcripts from videos with subtitles
  • Support for auto-generated and manual captions
  • Multiple language detection and support

3. Timecode Generation

  • Basic Timecodes: Algorithmic segmentation based on transcript timing
  • AI Timecodes: Intelligent topic-based segmentation using Gemini AI
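To make "segmentation based on transcript timing" concrete, here is one simple version: transcript segments are grouped into windows of roughly fixed length, and each window becomes a timecode labeled with its first few words. The segment shape (`start` in seconds plus `text`), the 60-second window, and `segment_by_time` are assumptions for illustration, not the app's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    text: str


def segment_by_time(
    segments: list[Segment], window: float = 60.0, label_words: int = 6
) -> list[tuple[float, str]]:
    """Group transcript segments into chunks of at least `window` seconds.

    Returns (start_seconds, label) pairs, where the label is the first
    `label_words` words of the chunk. Segments must be sorted by start time.
    """
    chapters: list[tuple[float, str]] = []
    chunk_start: float | None = None
    words: list[str] = []
    for seg in segments:
        # Start a new chunk once the current one has lasted `window` seconds.
        if chunk_start is None or seg.start - chunk_start >= window:
            if chunk_start is not None:
                chapters.append((chunk_start, " ".join(words[:label_words])))
            chunk_start, words = seg.start, []
        words.extend(seg.text.split())
    if chunk_start is not None:  # flush the final chunk
        chapters.append((chunk_start, " ".join(words[:label_words])))
    return chapters
```

Fixed windows ignore topic boundaries entirely, which is the gap the Gemini-based mode is meant to fill.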

Supported Formats:

  • YouTube: Ready for video descriptions (e.g., 05:30 Topic description)
  • Markdown: Clickable links with timestamps (e.g., - [05:30](link) Topic)
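A sketch of rendering (start, title) pairs into the two formats above. The README only shows `05:30`, so switching to `H:MM:SS` for videos an hour or longer is an assumption, as is building the Markdown link with YouTube's `&t=<seconds>s` start-offset parameter; `format_timestamp` and `render_timecodes` are hypothetical names.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS once the video passes an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def render_timecodes(
    chapters: list[tuple[float, str]], video_id: str, fmt: str = "youtube"
) -> str:
    """Render chapters as plain YouTube description lines or Markdown links."""
    lines = []
    for start, title in chapters:
        stamp = format_timestamp(start)
        if fmt == "youtube":
            lines.append(f"{stamp} {title}")
        elif fmt == "markdown":
            link = f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s"
            lines.append(f"- [{stamp}]({link}) {title}")
        else:
            raise ValueError(f"unknown format: {fmt!r}")
    return "\n".join(lines)
```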

Language Codes:

  • uk - Ukrainian
  • ru - Russian
  • en - English
  • And many others (ISO 639-1 standard)

🔧 API Reference

This application provides both a web interface and REST API endpoints:

Search Videos

POST /api/search
{
  "query": "your search query",
  "max_results": 10,
  "order": "relevance"
}
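A minimal standard-library client for these JSON endpoints. The base URL is an assumption (the README does not say which host or port `api_server.py` listens on), and `build_api_request` and `call_api` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to where the API runs


def build_api_request(
    endpoint: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare a JSON POST request for one of the /api/* endpoints."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(
    endpoint: str, payload: dict, base_url: str = BASE_URL, timeout: float = 30.0
):
    """Send the request and return the decoded JSON response body."""
    request = build_api_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `call_api("search", {"query": "python tutorial", "max_results": 10, "order": "relevance"})` sends the request shown above. The same helper fits `video_info`, `transcript`, and `gemini_timecodes`, since each takes a JSON body; the response shape is not documented here, so the sketch returns the decoded JSON unchanged.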

Get Video Info

POST /api/video_info
{
  "video_id": "video_id_or_full_url"
}
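Because `video_id` accepts either a bare ID or a full URL, the input has to be normalized first. A sketch of one way to do that for watch, `youtu.be`, Shorts, and embed links; `extract_video_id` is hypothetical, and the 11-character `[A-Za-z0-9_-]` ID pattern is a widely observed convention rather than a documented guarantee.

```python
import re
from urllib.parse import parse_qs, urlparse

_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
_YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtube-nocookie.com"}
_PATH_PREFIXES = {"shorts", "embed", "live", "v"}


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a common YouTube URL form."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value
    # Allow scheme-less input such as "youtube.com/watch?v=...".
    parsed = urlparse(value if "://" in value else f"https://{value}")
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in _PATH_PREFIXES:
                candidate = parts[1]
    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"not a recognizable YouTube video URL or ID: {value!r}")
```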

Extract Transcript

POST /api/transcript
{
  "video_id": "video_id_or_full_url",
  "language_code": "uk"
}

Generate AI Timecodes

POST /api/gemini_timecodes
{
  "video_id": "video_id_or_full_url",
  "language_code": "uk",
  "format": "youtube",
  "model": "gemini-2.0-flash-001"
}

🏗️ Architecture

  • Frontend: Gradio web interface with responsive design
  • Backend: FastAPI server with async processing
  • AI Integration: Google Gemini 2.0 for intelligent content analysis
  • APIs: YouTube Data API v3 for video metadata
  • Transcript: YouTube Transcript API for subtitle extraction
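"Async processing" lets the server overlap slow network calls instead of waiting on each one in turn. A minimal standard-library sketch of that pattern, assuming blocking client functions: `fetch_metadata` and `fetch_transcript` are hypothetical stand-ins (simulated with `time.sleep`), and `asyncio.to_thread` runs each in a worker thread so `asyncio.gather` can await both together. This illustrates the general technique, not the actual code in `api_server.py`.

```python
import asyncio
import time


def fetch_metadata(video_id: str) -> dict:
    """Hypothetical blocking call standing in for a YouTube Data API request."""
    time.sleep(0.2)
    return {"video_id": video_id, "title": "placeholder"}


def fetch_transcript(video_id: str, language_code: str) -> list[str]:
    """Hypothetical blocking call standing in for a transcript download."""
    time.sleep(0.2)
    return [f"[{language_code}] transcript line for {video_id}"]


async def analyze_video(video_id: str, language_code: str = "en") -> dict:
    # Both blocking calls run in worker threads and are awaited together,
    # so the total wait is close to the slower call, not the sum of both.
    metadata, transcript = await asyncio.gather(
        asyncio.to_thread(fetch_metadata, video_id),
        asyncio.to_thread(fetch_transcript, video_id, language_code),
    )
    return {"metadata": metadata, "transcript": transcript}
```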

📁 Project Structure

├── main.py                    # Unified launcher (API/UI/both modes)
├── run_telegram_bot.py        # Telegram bot launcher
├── api_server.py              # FastAPI backend server
├── telegram_bot.py            # Telegram bot implementation
├── mcp_handlers.py            # Model Context Protocol handlers
├── gemini_helper.py           # Gemini AI integration
├── utils.py                   # Utility functions
├── models.py                  # Data models
├── app.py                     # Gradio app (HF Spaces entry point)
├── gradio_app.py              # Extended Gradio interface
├── requirements.txt           # Python dependencies
├── telegram_requirements.txt  # Telegram bot dependencies
├── cloudflare-config.yml      # Cloudflare tunnel configuration
├── TUNNEL_SOLUTIONS.md        # Tunnel troubleshooting guide
├── youtube-content-metagen-agent.ipynb  # Kaggle reference notebook
└── README.md                  # This file

🔬 Technology Stack

  • Python 3.13+
  • Gradio - Web interface framework
  • FastAPI - High-performance API framework
  • Google Gemini 2.0 - Advanced language model for content analysis
  • YouTube APIs - Official Google APIs for video data
  • AsyncIO - Asynchronous processing for better performance

🌟 Use Cases

  • Content Creators: Generate professional timecodes for YouTube videos
  • Educators: Extract and analyze educational content structure
  • Researchers: Analyze video metadata and transcripts at scale
  • Marketers: Research competitor content and trends
  • Accessibility: Create better navigation for long-form content

πŸ“„ License

MIT License - feel free to use in your projects!

🤝 Contributing

Contributions welcome! This project is designed to help content creators worldwide.


Made with ❤️ for the YouTube creator community