---
title: YouTube Creator MetaData Extractor
emoji: 🎬
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
---

# 🎬 YouTube Creator MetaData Extractor

AI-powered tool for content creators to analyze YouTube videos and generate professional metadata using advanced language models.

## 🚀 Features

- **🔍 Video Search**: Search YouTube videos by keywords with advanced filters
- **📊 Video Analysis**: Extract comprehensive video metadata (views, likes, duration, etc.)
- **📝 Transcript Extraction**: Get video transcripts in multiple languages
- **⏱️ Smart Timecodes**: AI-generated timecodes for better video navigation
- **🤖 Gemini AI Integration**: Advanced timecode generation using Google's Gemini 2.0
- **🌐 Multi-language Support**: Works with videos in Ukrainian, Russian, English, and more
- **📱 URL Flexibility**: Supports all YouTube URL formats (regular, shorts, embed links)
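
As an illustration of the URL flexibility listed above, here is a minimal sketch of how a video ID could be pulled out of the common URL shapes. The function name `extract_video_id` and the exact set of accepted hosts are assumptions made for this example; the project's own parsing in `utils.py` may differ.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters from this alphabet.
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")

# Hypothetical allow-list of hosts; adjust to what the app really accepts.
_YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a bare ID or a YouTube URL, else None."""
    value = value.strip()
    if _ID_RE.match(value):
        return value  # already a bare ID

    # Tolerate URLs pasted without a scheme, e.g. "youtu.be/<id>".
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        parts = [p for p in parsed.path.split("/") if p]
        if parsed.path == "/watch":
            # Regular links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            # Shorts and embed links: /shorts/<id>, /embed/<id>
            candidate = parts[1]

    if candidate and _ID_RE.match(candidate):
        return candidate
    return None
```

Returning `None` for anything unrecognized lets the caller decide how to report a bad input instead of guessing at an ID.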

## ⚠️ Cloud Platform Limitations

**YouTube blocks transcript access from cloud IPs** (Hugging Face Spaces, AWS, etc.)

**What works on HF Spaces:**
- ✅ Video Search
- ✅ Video Metadata
- ❌ Transcripts (limited)
- ❌ AI Timecodes (limited)

**For full functionality**, download and run locally:
```bash
git clone https://huggingface.co/spaces/dzianisBY/YouTube_Creator_MetaData
cd YouTube_Creator_MetaData
pip install -r requirements.txt
# Add your API keys to .env file
python main.py
```

## 🛠️ Setup

### Required API Keys

To use this tool, you need two API keys:

1. **YouTube Data API v3 Key**
   - Go to [Google Cloud Console](https://console.developers.google.com/)
   - Create a new project or select existing
   - Enable "YouTube Data API v3"
   - Create credentials (API Key)

2. **Gemini API Key** (for AI features)
   - Visit [Google AI Studio](https://ai.google.dev/)
   - Get your free API key for Gemini

### Environment Variables

Set these in your Hugging Face Space settings, or in a `.env` file when running locally:

```
YOUTUBE_API_KEY=your_youtube_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
```
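
A minimal sketch of reading these two variables at startup and failing early when one is missing. The helper name `load_api_keys` is hypothetical, and the sketch reads only from the process environment; loading a `.env` file would need an extra step that is not shown here.

```python
import os
from typing import Dict, Mapping, Optional

# The two variable names come from this README.
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "GEMINI_API_KEY")


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required API keys, raising if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.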

## 📖 How to Use

### 1. Video Search
- Enter keywords to find YouTube videos
- Filter by upload date, view count, duration
- Get detailed metadata for any video

### 2. Transcript Analysis
- Extract transcripts from videos with subtitles
- Support for auto-generated and manual captions
- Multiple language detection and support

### 3. Timecode Generation

- **Basic Timecodes**: Algorithmic segmentation based on transcript timing
- **AI Timecodes**: Intelligent topic-based segmentation using Gemini AI

**Supported Formats**:
- **YouTube**: Ready for video descriptions (e.g., `05:30 Topic description`)
- **Markdown**: Clickable links with timestamps (e.g., `- [05:30](link) Topic`)
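
The two output formats can be sketched as follows. The function names are made up for this example, and the `&t=<seconds>s` link form is an assumption about what `link` stands for in the Markdown format; the app may build its links differently.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS once the video passes one hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def youtube_line(seconds: float, title: str) -> str:
    """One line for a video description, e.g. '05:30 Topic description'."""
    return f"{format_timestamp(seconds)} {title}"


def markdown_line(video_id: str, seconds: float, title: str) -> str:
    """One Markdown list item whose timestamp links into the video."""
    # Assumed link shape: the standard watch URL with a start offset.
    url = f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"
    return f"- [{format_timestamp(seconds)}]({url}) {title}"
```

For instance, `youtube_line(330, "Topic description")` yields `05:30 Topic description`, matching the YouTube format shown above.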

**Language Codes**:
- `uk` - Ukrainian
- `ru` - Russian  
- `en` - English
- And many others (ISO 639-1 standard)

## 🔧 API Reference

This application provides both a web interface and REST API endpoints:

### Search Videos
```http
POST /api/search
{
  "query": "your search query",
  "max_results": 10,
  "order": "relevance"
}
```

### Get Video Info
```http
POST /api/video_info
{
  "video_id": "video_id_or_full_url"
}
```

### Extract Transcript
```http
POST /api/transcript
{
  "video_id": "video_id_or_full_url",
  "language_code": "uk"
}
```

### Generate AI Timecodes
```http
POST /api/gemini_timecodes
{
  "video_id": "video_id_or_full_url",
  "language_code": "uk",
  "format": "youtube",
  "model": "gemini-2.0-flash-001"
}
```
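
A hedged sketch of calling these endpoints from Python with only the standard library. The base URL `http://localhost:8000` is an assumption (use whatever host and port your server actually listens on), the helper names are hypothetical, and the response schema is not documented here, so the result is returned as parsed JSON without further interpretation.

```python
import json
from typing import Any, Dict
from urllib import request

# Assumption: the API server runs locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Dict[str, Any],
                  base_url: str = BASE_URL) -> request.Request:
    """Build a JSON POST request for one of the endpoints listed above."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(path: str, payload: Dict[str, Any],
             base_url: str = BASE_URL, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON response."""
    req = build_request(path, payload, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (requires a running server):
# result = call_api("/api/search", {"query": "gradio tutorial", "max_results": 10, "order": "relevance"})
```

Separating `build_request` from `call_api` keeps the request construction testable without network access.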

## 🏗️ Architecture

- **Frontend**: Gradio web interface with responsive design
- **Backend**: FastAPI server with async processing
- **AI Integration**: Google Gemini 2.0 for intelligent content analysis
- **APIs**: YouTube Data API v3 for video metadata
- **Transcript**: YouTube Transcript API for subtitle extraction

## 📁 Project Structure

```
├── main.py                     # Unified launcher (API/UI/both modes)
├── run_telegram_bot.py         # Telegram bot launcher
├── api_server.py               # FastAPI backend server
├── telegram_bot.py             # Telegram bot implementation
├── mcp_handlers.py             # Model Context Protocol handlers
├── gemini_helper.py            # Gemini AI integration
├── utils.py                    # Utility functions
├── models.py                   # Data models
├── app.py                      # Gradio app (HF Spaces entry point)
├── gradio_app.py               # Extended Gradio interface
├── requirements.txt            # Python dependencies
├── telegram_requirements.txt   # Telegram bot dependencies
├── cloudflare-config.yml       # Cloudflare tunnel configuration
├── TUNNEL_SOLUTIONS.md         # Tunnel troubleshooting guide
├── youtube-content-metagen-agent.ipynb  # Kaggle reference notebook
└── README.md                   # This file
```

## 🔬 Technology Stack

- **Python 3.13+**
- **Gradio** - Web interface framework
- **FastAPI** - High-performance API framework
- **Google Gemini 2.0** - Advanced language model for content analysis
- **YouTube APIs** - Official Google APIs for video data
- **AsyncIO** - Asynchronous processing for better performance

## 🌟 Use Cases

- **Content Creators**: Generate professional timecodes for YouTube videos
- **Educators**: Extract and analyze educational content structure
- **Researchers**: Analyze video metadata and transcripts at scale
- **Marketers**: Research competitor content and trends
- **Accessibility**: Create better navigation for long-form content

## 📄 License

MIT License - feel free to use in your projects!

## 🤝 Contributing

Contributions welcome! This project is designed to help content creators worldwide.

---

**Made with ❤️ for the YouTube creator community**