junaidbashir392 committed on
Commit 15b5142 · 1 Parent(s): 92a3ed9

YouTubeCompetitorAnalysis

Files changed (3)
  1. README.md +62 -13
  2. app.py +1330 -0
  3. requirements.txt +6 -0
README.md CHANGED
@@ -1,13 +1,62 @@
- ---
- title: YouTubeCompetitorAnalysis
- emoji: ๐Ÿข
- colorFrom: green
- colorTo: purple
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
- license: apache-2.0
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # YouTube Competitor Analysis (Global)
+
+ This application analyzes competitor YouTube channels and uses Gemini (Google Generative AI) and the YouTube Data API to detect global trending clusters of people mentioned in videos.
+
+ This README explains how to run the app and where to enter your API keys in the UI.
+
+ ## Key points
+ - API keys are entered in the Gradio UI (Data Update tab); they are not read automatically from environment variables.
+ - Keys are stored in memory for the running session only (not persisted to disk).
+
+ ## Requirements
+ Dependencies are listed in `requirements.txt`. From the project root, run the following in Windows PowerShell:
+
+ ```powershell
+ python -m pip install -r .\requirements.txt
+ ```
+
+ If you are not already using a virtual environment, consider creating one first:
+
+ ```powershell
+ python -m venv venv
+ .\venv\Scripts\Activate.ps1
+ python -m pip install -r .\requirements.txt
+ ```
+
+ ## Run the app
+
+ From the repository root, run:
+
+ ```powershell
+ python .\app.py
+ ```
+
+ Gradio prints a local URL (and possibly a public share URL). Open the local URL in your browser.
+
+ ## Where to enter API keys
+ 1. Open the Gradio UI and go to the "🔄 Data Update" tab.
+ 2. Enter your keys in the two password-style text boxes:
+    - `YouTube API Key`: for the YouTube Data API (`developerKey`)
+    - `Gemini API Key`: for the Google Generative AI client
+ 3. Click `Apply API Keys`. The status box shows whether configuration succeeded.
+
+ After applying the keys you can use:
+ - `Start Global Data Update` to fetch video data and run the person-detection logic.
+ - Channel management (Add/Delete/Update). Adding a channel calls the YouTube API and needs the YouTube key; deleting and renaming only touch the local database.
+
+ ## Behavior & notes
+ - Keys are kept in memory for the running Gradio session. If you stop the app, you must re-enter them.
+ - If you see messages such as `YouTube API key is not set.` or `AI features will not work`, confirm that the keys were applied successfully.
+ - The app does not currently persist keys to disk; secure persistence (an encrypted local store or the OS keyring) is listed under Next steps.
+
+ ## Troubleshooting
+ - Missing dependencies: reinstall with `pip install -r requirements.txt`.
+ - If the YouTube API returns quota errors, check your API quota in the Google Cloud Console.
+ - If Gemini calls fail, make sure the Gemini key is valid and that the `google-generativeai` package version listed in `requirements.txt` is compatible.
+
+ ## Next steps (optional)
+ - Persist keys securely between sessions (encrypted file or OS keyring).
+ - Pre-fill UI inputs from environment variables for development convenience.
+ - Add unit tests that mock the Gemini/YouTube clients and verify the wiring.
+
+ None of these are implemented yet.
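The README says the app detects "global trending clusters" of people. In app.py below, `detect_trends` treats a person as trending when that person is detected in at least two videos from at least two distinct channels published within the last two days. The following is a minimal, self-contained sketch of that rule using Python's built-in `sqlite3` with an in-memory table. It assumes only the four `videos` columns the rule needs. The helper name `find_trending_people` and the sample rows are hypothetical. The sketch builds the cutoff in UTC using the same `YYYY-MM-DDTHH:MM:SSZ` form as YouTube's `publishedAt`, so the plain string comparison in SQL stays consistent.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def find_trending_people(conn, now, window_days=2, min_videos=2, min_channels=2):
    """Hypothetical helper: people seen in enough videos across enough channels."""
    # published_at stores YouTube's UTC string (e.g. '2024-05-10T08:00:00Z'),
    # so the cutoff uses the same format and plain string comparison works.
    cutoff = (now - timedelta(days=window_days)).strftime('%Y-%m-%dT%H:%M:%SZ')
    rows = conn.execute(
        '''
        SELECT detected_person,
               COUNT(*) AS video_count,
               COUNT(DISTINCT channel_id) AS channel_count
        FROM videos
        WHERE detected_person IS NOT NULL
          AND detected_person != ''
          AND published_at > ?
        GROUP BY detected_person
        HAVING COUNT(*) >= ? AND COUNT(DISTINCT channel_id) >= ?
        ORDER BY video_count DESC, detected_person
        ''',
        (cutoff, min_videos, min_channels),
    ).fetchall()
    return [
        {'person_name': name, 'video_count': count, 'unique_channels': channels}
        for name, count, channels in rows
    ]


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE videos (video_id TEXT, channel_id TEXT, '
    'detected_person TEXT, published_at TEXT)'
)
conn.executemany(
    'INSERT INTO videos VALUES (?, ?, ?, ?)',
    [
        ('v1', 'chA', 'Steve Jobs', '2024-05-10T08:00:00Z'),
        ('v2', 'chB', 'Steve Jobs', '2024-05-09T20:00:00Z'),
        ('v3', 'chA', 'Confucius', '2024-05-10T09:00:00Z'),  # two videos, one channel
        ('v4', 'chA', 'Confucius', '2024-05-09T09:00:00Z'),
        ('v5', 'chC', 'Einstein', '2024-05-01T00:00:00Z'),   # outside the 2-day window
        ('v6', 'chD', 'Einstein', '2024-05-10T01:00:00Z'),
        ('v7', 'chB', None, '2024-05-10T10:00:00Z'),         # nobody detected
    ],
)
now = datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc)
result = find_trending_people(conn, now)
print(result)
# [{'person_name': 'Steve Jobs', 'video_count': 2, 'unique_channels': 2}]
```

The name tie-breaker in `ORDER BY` was added to make the output deterministic. The app's own query orders by `video_count` only, and it also concatenates the matching video IDs and detection sources, which this sketch omits.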
app.py ADDED
@@ -0,0 +1,1330 @@
+ import gradio as gr
+ import sqlite3
+ import json
+ import requests
+ from datetime import datetime, timedelta, timezone
+ from typing import List, Dict, Any, Optional
+ import google.generativeai as genai
+ from googleapiclient.discovery import build
+ import pandas as pd
+ import re
+ from collections import defaultdict
+ import base64
+
+ # Do NOT auto-load API keys from the environment; keys should be provided by the user via the UI.
+ # Keep variables here so other functions can reference them after the user provides keys.
+ YOUTUBE_API_KEY: Optional[str] = None
+ GEMINI_API_KEY: Optional[str] = None
+ model = None
+ youtube = None
+
+
+ def set_api_keys(youtube_key: Optional[str], gemini_key: Optional[str]) -> tuple[str, str, str]:
+     """Apply API keys provided by the user at runtime.
+
+     This configures the Gemini client and the YouTube Data API client
+     so the rest of the app uses the provided keys instead of environment variables.
+     """
+     global YOUTUBE_API_KEY, GEMINI_API_KEY, model, youtube
+     messages = []
+
+     # Configure Gemini (Generative AI)
+     if gemini_key:
+         try:
+             genai.configure(api_key=gemini_key)
+             model = genai.GenerativeModel('gemini-2.5-flash')
+             GEMINI_API_KEY = gemini_key
+             messages.append("Gemini API key applied successfully.")
+         except Exception as e:
+             messages.append(f"Failed to apply Gemini API key: {e}")
+
+     # Configure YouTube Data API
+     if youtube_key:
+         try:
+             youtube = build('youtube', 'v3', developerKey=youtube_key)
+             YOUTUBE_API_KEY = youtube_key
+             messages.append("YouTube API key applied successfully.")
+         except Exception as e:
+             messages.append(f"Failed to apply YouTube API key: {e}")
+
+     if not messages:
+         # Return status and empty keys
+         return "No API keys provided.", "", ""
+
+     # Return status plus the applied keys so the UI can store them in state
+     return "\n".join(messages), YOUTUBE_API_KEY or "", GEMINI_API_KEY or ""
+
+
+ class YouTubeCompetitorAnalyzer:
+     def __init__(self):
+         self.init_database()
+
+     def init_database(self):
+         """Initialize the database"""
+         conn = sqlite3.connect('competitor_data.db')
+         cursor = conn.cursor()
+
+         # Channel table (added last_updated_at column)
+         cursor.execute('''
+             CREATE TABLE IF NOT EXISTS channels (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 channel_id TEXT UNIQUE,
+                 channel_name TEXT,
+                 channel_icon_url TEXT,
+                 subscriber_count INTEGER,
+                 added_date TEXT,
+                 last_updated_at TEXT
+             )
+         ''')
+
+         # Video data table (added description and tags)
+         cursor.execute('''
+             CREATE TABLE IF NOT EXISTS videos (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 video_id TEXT UNIQUE,
+                 channel_id TEXT,
+                 title TEXT,
+                 description TEXT,
+                 tags TEXT,
+                 published_at TEXT,
+                 view_count INTEGER,
+                 thumbnail_url TEXT,
+                 detected_person TEXT,
+                 detection_source TEXT,
+                 importance_level TEXT,
+                 created_at TEXT
+             )
+         ''')
+
+         # Trend clusters table
+         cursor.execute('''
+             CREATE TABLE IF NOT EXISTS trends (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 person_name TEXT,
+                 video_ids TEXT,
+                 trend_date TEXT,
+                 is_active BOOLEAN
+             )
+         ''')
+
+         conn.commit()
+         conn.close()
+
+     def extract_person_from_title_with_gemini(self, title: str) -> Optional[str]:
+         """Extract a person's name from the title using Gemini (global, highest priority)"""
+         if not model:
+             return None
+         try:
+             prompt = f"""
+ Please extract a single famous person's name (historical or contemporary) from this YouTube title.
+
+ Title: "{title}"
+
+ Target:
+ Globally well-known individuals (no restriction on nationality, era, or field)
+ - People from any country or region worldwide
+ - From ancient to modern times
+ - Any field: politics, business, philosophy, literature, science, arts, religion, sports, etc.
+ - Real historical or contemporary figures
+
+ Criteria:
+ - Widely known at a general-knowledge level
+ - Frequently mentioned in books, education, or media
+ - Identifiable as a specific real person by proper name
+
+ Response format:
+ - If a matching person exists: return the person's name only (in Japanese)
+ - If none: return "なし"
+ - If multiple apply: return the single most relevant person
+
+ Note: Do not restrict by nationality, era, or field. Consider notable people worldwide.
+
+ Examples:
+ "The secret of innovation by Steve Jobs" -> Steve Jobs
+ "Learning leadership from Confucius" -> Confucius
+ "Introduction to Einstein's theory of relativity" -> Einstein
+ "Konosuke Matsushita on business philosophy" -> Konosuke Matsushita
+ "General success tips" -> none
+ """
+
+             response = model.generate_content(prompt)
+             result = response.text.strip()
+
+             # If the result is "なし" ("none") or empty, return None
+             if not result or result.lower() in ['なし', 'none', '該当なし', '不明']:
+                 return None
+
+             # Remove brackets, line breaks and other extra characters to isolate the person's name
+             clean_result = re.sub(r'[「」『』【】｜\n\r\t]', '', result).strip()
+
+             # Check global name pattern (2-15 chars: supports Japanese, English, Chinese, etc.)
+             global_name_pattern = r'^[\u4E00-\u9FAF\u3040-\u309F\u30A0-\u30FF\uAC00-\uD7AF\u0041-\u005A\u0061-\u007A\u00C0-\u017F\u0100-\u024F\s\u30FB\u00B7\u2022]{2,15}$'
+             if re.match(global_name_pattern, clean_result):
+                 return clean_result
+             else:
+                 return None
+
+         except Exception as e:
+             print(f"Gemini global title parsing error: {e}")
+             return None
+
+     def extract_person_from_description_with_gemini(self, description: str) -> Optional[str]:
+         """Extract a person's name from the description using Gemini (global, priority 2)"""
+         if not description or len(description.strip()) < 10 or not model:
+             return None
+
+         try:
+             # If the description is too long, limit to the first 500 characters
+             desc_excerpt = description[:500] if len(description) > 500 else description
+
+             prompt = f"""
+ Please extract a single famous person's name (historical or contemporary) from this YouTube video's description.
+
+ Description excerpt: "{desc_excerpt}"
+
+ Target:
+ Globally well-known individuals (no restriction on nationality, era, or field)
+
+ Criteria:
+ - Widely known at a general-knowledge level
+ - Frequently mentioned in books, education, or media
+ - Identifiable as a specific real person by proper name
+
+ Response format:
+ - If a matching person exists: return the person's name only (in Japanese)
+ - If none: return "なし"
+ - If multiple apply: return the single most relevant person
+ - Hashtags (e.g., #SteveJobs, #Confucius) should also be considered
+
+ Note: Do not restrict by nationality, era, or field. Consider notable people worldwide.
+ """
+
+             response = model.generate_content(prompt)
+             result = response.text.strip()
+
+             if not result or result.lower() in ['なし', 'none', '該当なし', '不明']:
+                 return None
+
+             clean_result = re.sub(r'[「」『』【】｜#\n\r\t]', '', result).strip()
+
+             # Check global name pattern
+             global_name_pattern = r'^[\u4E00-\u9FAF\u3040-\u309F\u30A0-\u30FF\uAC00-\uD7AF\u0041-\u005A\u0061-\u007A\u00C0-\u017F\u0100-\u024F\s\u30FB\u00B7\u2022]{2,15}$'
+             if re.match(global_name_pattern, clean_result):
+                 return clean_result
+             else:
+                 return None
+
+         except Exception as e:
+             print(f"Gemini global description parsing error: {e}")
+             return None
+
+     def extract_person_from_tags(self, tags: List[str]) -> Optional[str]:
+         """Extract a person's name from tags (global, priority 3)"""
+         if not tags:
+             return None
+
+         # Define global name pattern (supports Japanese, English, Chinese, Korean, etc.)
+         global_name_pattern = r'^[\u4E00-\u9FAF\u3040-\u309F\u30A0-\u30FF\uAC00-\uD7AF\u0041-\u005A\u0061-\u007A\u00C0-\u017F\u0100-\u024F\s\u30FB\u00B7\u2022]{2,15}$'
+
+         # Look for tags that appear to be a person's name
+         for tag in tags:
+             if re.match(global_name_pattern, tag):
+                 # Exclude overly generic words (global support)
+                 exclude_words = [
+                     '動画', '投稿', '更新', '配信', '人生', '経営', '仕事', '成功', '失敗',  # video, post, update, stream, life, management, work, success, failure
+                     'video', 'life', 'business', 'success', 'leadership', 'philosophy',
+                     'motivation', 'inspiration', 'education', 'training', 'coach'
+                 ]
+                 if tag not in exclude_words and tag.lower() not in [word.lower() for word in exclude_words]:
+                     return tag
+
+         return None
+
+     def analyze_thumbnail_ocr(self, thumbnail_url: str) -> Optional[str]:
+         """Thumbnail OCR analysis (priority 4)"""
+         if not model:
+             return None
+         try:
+             response = requests.get(thumbnail_url, timeout=10)
+             image_data = base64.b64encode(response.content).decode()
+
+             prompt = """
+ Extract text from this YouTube thumbnail image.
+ Pay special attention to names of famous individuals (worldwide, historical or modern).
+
+ Reply in JSON using the following format:
+ {
+ "detected_text": "All text read by OCR",
+ "person_names": ["List of extracted person names"]
+ }
+ """
+
+             image_part = {
+                 "mime_type": "image/jpeg",
+                 "data": image_data
+             }
+
+             response = model.generate_content([prompt, image_part])
+             result_text = response.text
+
+             # Extract JSON
+             json_match = re.search(r'```json\n(.*?)\n```', result_text, re.DOTALL)
+             if json_match:
+                 result_text = json_match.group(1)
+
+             try:
+                 result = json.loads(result_text)
+                 person_names = result.get('person_names', [])
+                 return person_names[0] if person_names else None
+             except json.JSONDecodeError:
+                 return None
+
+         except Exception as e:
+             print(f"Thumbnail OCR analysis error: {e}")
+             return None
+
+     def analyze_thumbnail_face_recognition(self, thumbnail_url: str) -> Optional[str]:
+         """Thumbnail face recognition (priority 5)"""
+         if not model:
+             return None
+         try:
+             response = requests.get(thumbnail_url, timeout=10)
+             image_data = base64.b64encode(response.content).decode()
+
+             prompt = """
+ Identify the person shown in this image.
+ Consider famous people worldwide, including historical figures, philosophers, business leaders, writers, and scientists.
+
+ Only return a person's name if you can identify them with confidence.
+ If unknown, return null.
+
+ Respond in JSON:
+ {
+ "person_name": "Identified person name or null"
+ }
+ """
+
+             image_part = {
+                 "mime_type": "image/jpeg",
+                 "data": image_data
+             }
+
+             response = model.generate_content([prompt, image_part])
+             result_text = response.text
+
+             json_match = re.search(r'```json\n(.*?)\n```', result_text, re.DOTALL)
+             if json_match:
+                 result_text = json_match.group(1)
+
+             try:
+                 result = json.loads(result_text)
+                 return result.get('person_name')
+             except json.JSONDecodeError:
+                 return None
+
+         except Exception as e:
+             print(f"Face recognition analysis error: {e}")
+             return None
+
+     def extract_person_comprehensive(self, video_data: Dict) -> tuple[Optional[str], str]:
+         """Comprehensive person extraction (Gemini prioritized, global system)"""
+         title = video_data.get('title', '')
+         description = video_data.get('description', '')
+         tags = video_data.get('tags', [])
+         thumbnail_url = video_data.get('thumbnail_url', '')
+
+         # Priority 1: Gemini title analysis (global, highest priority)
+         person = self.extract_person_from_title_with_gemini(title)
+         if person:
+             return person, "Gemini-GlobalTitle"
+
+         # Priority 2: Gemini description analysis (global)
+         person = self.extract_person_from_description_with_gemini(description)
+         if person:
+             return person, "Gemini-GlobalDescription"
+
+         # Priority 3: Tag analysis (global)
+         person = self.extract_person_from_tags(tags)
+         if person:
+             return person, "GlobalTag"
+
+         # Priority 4: Thumbnail OCR
+         person = self.analyze_thumbnail_ocr(thumbnail_url)
+         if person:
+             return person, "ThumbnailOCR"
+
+         # Priority 5: Face recognition
+         person = self.analyze_thumbnail_face_recognition(thumbnail_url)
+         if person:
+             return person, "FaceRecognition"
+
+         return None, "Not detected"
+
+     def add_channel(self, channel_id: str) -> str:
+         """Add a channel"""
+         if not youtube:
+             return "YouTube API key is not set."
+         try:
+             # Retrieve channel info
+             response = youtube.channels().list(
+                 part='snippet,statistics',
+                 id=channel_id
+             ).execute()
+
+             if not response.get('items'):  # 'items' can be absent when no channel matches
+                 return f"ID: {channel_id} - Channel not found"
+
+             channel_info = response['items'][0]
+             channel_name = channel_info['snippet']['title']
+             channel_icon = channel_info['snippet']['thumbnails']['default']['url']
+             subscriber_count = int(channel_info['statistics'].get('subscriberCount', 0))
+
+             conn = sqlite3.connect('competitor_data.db')
+             cursor = conn.cursor()
+
+             cursor.execute('''
+                 INSERT OR REPLACE INTO channels
+                 (channel_id, channel_name, channel_icon_url, subscriber_count, added_date)
+                 VALUES (?, ?, ?, ?, ?)
+             ''', (channel_id, channel_name, channel_icon, subscriber_count, datetime.now().isoformat()))
+
+             conn.commit()
+             conn.close()
+
+             return f"Channel '{channel_name}' added"
+
+         except Exception as e:
+             return f"ID: {channel_id} - Error: {str(e)}"
+
+     def delete_channel(self, channel_id: str) -> str:
+         """Delete a channel"""
+         try:
+             conn = sqlite3.connect('competitor_data.db')
+             cursor = conn.cursor()
+
+             # Retrieve channel name
+             cursor.execute('SELECT channel_name FROM channels WHERE channel_id = ?', (channel_id,))
+             result = cursor.fetchone()
+
+             if not result:
+                 conn.close()
+                 return "Channel not found"
+
+             channel_name = result[0]
+
+             # Delete channel and related video data
+             cursor.execute('DELETE FROM videos WHERE channel_id = ?', (channel_id,))
+             cursor.execute('DELETE FROM channels WHERE channel_id = ?', (channel_id,))
+
+             conn.commit()
+             conn.close()
+
+             return f"Channel '{channel_name}' deleted"
+
+         except Exception as e:
+             return f"Deletion error: {str(e)}"
+
+     def update_channel_name(self, channel_id: str, new_name: str) -> str:
+         """Update channel name"""
+         try:
+             conn = sqlite3.connect('competitor_data.db')
+             cursor = conn.cursor()
+
+             cursor.execute('''
+                 UPDATE channels
+                 SET channel_name = ?
+                 WHERE channel_id = ?
+             ''', (new_name, channel_id))
+
+             if cursor.rowcount == 0:
+                 conn.close()
+                 return "Channel not found"
+
+             conn.commit()
+             conn.close()
+
+             return f"Channel name updated to '{new_name}'"
+
+         except Exception as e:
+             return f"Update error: {str(e)}"
+
+     def get_channels(self) -> List[Dict]:
+         """Get list of registered channels"""
+         conn = sqlite3.connect('competitor_data.db')
+         cursor = conn.cursor()
+
+         cursor.execute('''
+             SELECT channel_id, channel_name, channel_icon_url, subscriber_count, added_date, last_updated_at
+             FROM channels
+             ORDER BY added_date DESC
+         ''')
+
+         channels = []
+         for row in cursor.fetchall():
+             channels.append({
+                 'id': row[0],
+                 'name': row[1],
+                 'icon_url': row[2],
+                 'subscriber_count': row[3],
+                 'added_date': row[4],
+                 'last_updated_at': row[5]
+             })
+
+         conn.close()
+         return channels
+
+     def fetch_videos_from_channel(self, channel_id: str, since_date: Optional[str] = None) -> List[Dict]:
+         """Fetch videos from a channel since the specified date"""
+         if not youtube:
+             return []
+         try:
+             # If since_date is not given, default to the past 7 days
+             if not since_date:
+                 since_date_dt = datetime.now(timezone.utc) - timedelta(days=7)
+             else:
+                 since_date_dt = datetime.fromisoformat(since_date)
+
+             # Convert to the RFC 3339 format the YouTube API expects
+             published_after = since_date_dt.isoformat().replace('+00:00', 'Z')
+
+             response = youtube.search().list(
+                 part='snippet',
+                 channelId=channel_id,
+                 maxResults=50,
+                 order='date',
+                 publishedAfter=published_after,
+                 type='video'
+             ).execute()
+
+             videos = []
+             video_ids = [item['id']['videoId'] for item in response['items']]
+
+             # Fetch video details (view count, description, tags, etc.)
+             if video_ids:
+                 video_details = youtube.videos().list(
+                     part='statistics,snippet',
+                     id=','.join(video_ids)
+                 ).execute()
+
+                 for item in video_details['items']:
+                     videos.append({
+                         'video_id': item['id'],
+                         'title': item['snippet']['title'],
+                         'description': item['snippet'].get('description', ''),
+                         'tags': item['snippet'].get('tags', []),
+                         'published_at': item['snippet']['publishedAt'],
+                         'view_count': int(item['statistics'].get('viewCount', 0)),
+                         'thumbnail_url': item['snippet']['thumbnails']['high']['url']
+                     })
+
+             return videos
+
+         except Exception as e:
+             print(f"Video fetch error: {e}")
+             return []
+
+     def determine_importance(self, video_data: Dict) -> str:
+         """Classify as Critical (>= 10,000 views within 24 h), Important (>= 10,000 views within 48 h) or Normal."""
+         published_at_str = video_data['published_at']
+         # Add 'Z' when timezone information is missing
+         if 'Z' not in published_at_str and '+' not in published_at_str:
+             published_at_str += 'Z'
+
+         published_at = datetime.fromisoformat(published_at_str.replace('Z', '+00:00'))
+
+         now_utc = datetime.now(published_at.tzinfo)
+         hours_since_published = (now_utc - published_at).total_seconds() / 3600
+         view_count = video_data['view_count']
+
+         if hours_since_published <= 24 and view_count >= 10000:
+             return "Critical"
+         elif hours_since_published <= 48 and view_count >= 10000:
+             return "Important"
+         else:
+             return "Normal"
+
+     def detect_trends(self) -> List[Dict]:
+         """Detect trending clusters"""
+         conn = sqlite3.connect('competitor_data.db')
+         cursor = conn.cursor()
+
+         # Videos from the past 2 days with a detected person (UTC cutoff, same format as published_at)
+         two_days_ago = (datetime.now(timezone.utc) - timedelta(days=2)).strftime('%Y-%m-%dT%H:%M:%SZ')
+
+         cursor.execute('''
+             SELECT detected_person, COUNT(*) as video_count,
+                    GROUP_CONCAT(video_id) as video_ids,
+                    GROUP_CONCAT(DISTINCT channel_id) as channels,
+                    GROUP_CONCAT(detection_source) as sources
+             FROM videos
+             WHERE detected_person IS NOT NULL
+               AND detected_person != ''
+               AND published_at > ?
+             GROUP BY detected_person
+             HAVING COUNT(*) >= 2
+                AND COUNT(DISTINCT channel_id) >= 2
+             ORDER BY video_count DESC
+         ''', (two_days_ago,))
+
+         trends = []
+         for row in cursor.fetchall():
+             person_name, count, video_ids, channels, sources = row
+             unique_channels = len(set(channels.split(',')))
+
+             trends.append({
+                 'person_name': person_name,
+                 'video_count': count,
+                 'unique_channels': unique_channels,
+                 'video_ids': video_ids.split(','),
+                 'detection_sources': sources.split(',')
+             })
+
+         conn.close()
+         return trends
+
+     def update_all_data(self) -> str:
+         """Update data for all channels"""
+         channels = self.get_channels()
+         total_new_videos = 0
+
+         conn = sqlite3.connect('competitor_data.db')
+         cursor = conn.cursor()
+
+         for channel in channels:
+             channel_id = channel['id']
+             last_update = channel['last_updated_at']
+
+             # Only fetch videos published after the channel's last_updated_at
+             videos = self.fetch_videos_from_channel(channel_id, since_date=last_update)
+
+             for video in videos:
+                 # Comprehensive person-name extraction
+                 detected_person, detection_source = self.extract_person_comprehensive(video)
+
+                 # Determine the importance level
+                 importance = self.determine_importance(video)
+
+                 # Save to the database
+                 cursor.execute('''
+                     INSERT OR IGNORE INTO videos
+                     (video_id, channel_id, title, description, tags, published_at, view_count,
+                      thumbnail_url, detected_person, detection_source, importance_level, created_at)
+                     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+                 ''', (
+                     video['video_id'],
+                     channel_id,
+                     video['title'],
+                     video['description'],
+                     ','.join(video['tags']) if video['tags'] else '',
+                     video['published_at'],
+                     video['view_count'],
+                     video['thumbnail_url'],
+                     detected_person,
+                     detection_source,
+                     importance,
+                     datetime.now(timezone.utc).isoformat()
+                 ))
+
+                 if cursor.rowcount > 0:
+                     total_new_videos += 1
+
+             # Update this channel's last_updated_at to now
+             cursor.execute('''
+                 UPDATE channels
+                 SET last_updated_at = ?
+                 WHERE channel_id = ?
+             ''', (datetime.now(timezone.utc).isoformat(), channel_id))
+
+         conn.commit()
+         conn.close()
+
+         return f"Update complete: added {total_new_videos} new videos"
+
+     def get_recent_videos_by_timerange(self, hours: int, limit: int = 50) -> List[Dict]:
+         """Get videos within the specified time range sorted by view count (JST-based)"""
+         conn = sqlite3.connect('competitor_data.db')
+         cursor = conn.cursor()
+
+         # Compute the datetime `hours` hours ago in Japan time (JST = UTC+9)
+         jst = timezone(timedelta(hours=9))
+         cutoff_time_jst = datetime.now(jst) - timedelta(hours=hours)
+         cutoff_time_utc = cutoff_time_jst.astimezone(timezone.utc)
+
+         cursor.execute('''
+             SELECT v.video_id, v.title, v.published_at, v.view_count, v.thumbnail_url,
+                    v.detected_person, v.detection_source, v.importance_level,
+                    c.channel_name, c.channel_icon_url, v.channel_id
+             FROM videos v
+             JOIN channels c ON v.channel_id = c.channel_id
+             WHERE v.published_at > ?
+             ORDER BY v.view_count DESC
+             LIMIT ?
+         ''', (cutoff_time_utc.isoformat(), limit))
+
+         videos = []
+         for row in cursor.fetchall():
+             video_id, title, published_at, view_count, thumbnail_url, detected_person, \
+                 detection_source, importance_level, channel_name, channel_icon_url, channel_id = row
+
+             # Convert the UTC timestamp to JST
+             published_at_utc = datetime.fromisoformat(published_at.replace('Z', '+00:00'))
+             published_at_jst = published_at_utc.astimezone(jst)
+
+             videos.append({
+                 'video_id': video_id,
+                 'title': title,
+                 'published_at': published_at,
+                 'published_at_jst': published_at_jst,
+                 'view_count': view_count,
+                 'thumbnail_url': thumbnail_url,
+                 'detected_person': detected_person or 'Not detected',
+                 'detection_source': detection_source or '-',
+                 'importance_level': importance_level or 'Normal',
+                 'channel_name': channel_name,
+                 'channel_icon_url': channel_icon_url,
+                 'channel_id': channel_id
+             })
+
+         conn.close()
+         return videos
+
+     def generate_recent_videos_html(self, hours: int, limit: int = 50) -> str:
+         """Generate HTML for the recent videos list"""
+         videos = self.get_recent_videos_by_timerange(hours, limit)
+
+         # Build a description for the time range
+         time_range_text = f"Past {hours} hours"
+
+         html = f"""
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Recent Videos - {time_range_text}</title>
+ <style>
+ body {{ font-family: 'Helvetica Neue', Arial, sans-serif; margin: 0; padding: 20px; background-color: #f5f5f5; }}
+ .container {{ max-width: 1200px; margin: 0 auto; }}
+ h1 {{ color: #333; text-align: center; margin-bottom: 30px; }}
+ .stats-info {{
+ background: #e3f2fd;
+ padding: 15px;
+ border-radius: 8px;
+ margin-bottom: 20px;
+ text-align: center;
+ font-size: 16px;
+ color: #1976d2;
+ }}
+ .video-item {{
+ display: flex;
+ align-items: flex-start;
+ background: white;
+ margin-bottom: 15px;
+ padding: 15px;
+ border-radius: 8px;
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ border-left: 4px solid #ccc;
+ }}
+ .video-item.Critical {{ border-left-color: #f44336; background-color: #ffebee; }}
+ .video-item.Important {{ border-left-color: #ff9800; background-color: #fff3e0; }}
+ .thumbnail {{
+ width: 160px;
+ height: 90px;
+ object-fit: cover;
+ margin-right: 15px;
+ border-radius: 4px;
+ flex-shrink: 0;
+ }}
+ .video-info {{ flex: 1; }}
+ .video-title {{
+ font-weight: bold;
+ margin-bottom: 8px;
+ font-size: 16px;
+ line-height: 1.4;
+ }}
+ .video-meta {{
+ color: #666;
+ font-size: 14px;
+ line-height: 1.6;
+ margin-bottom: 5px;
+ }}
+ .channel-info {{
+ display: flex;
+ align-items: center;
+ margin-bottom: 8px;
+ }}
+ .channel-icon {{
+ width: 24px;
+ height: 24px;
+ border-radius: 50%;
+ margin-right: 8px;
+ }}
+ .stats {{ color: #2196f3; font-weight: bold; }}
+ .importance-badge {{
+ display: inline-block;
+ padding: 2px 8px;
+ border-radius: 12px;
+ font-size: 12px;
+ font-weight: bold;
+ margin-left: 10px;
+ }}
+ .importance-badge.Critical {{ background: #f44336; color: white; }}
+ .importance-badge.Important {{ background: #ff9800; color: white; }}
+ .importance-badge.Normal {{ background: #4caf50; color: white; }}
+ .detection-badge {{
+ display: inline-block;
+ padding: 2px 6px;
+ border-radius: 8px;
+ font-size: 11px;
+ background: #2196f3;
+ color: white;
+ margin-left: 5px;
+ }}
+ .video-links {{ margin-top: 8px; }}
+ .video-links a {{
+ display: inline-block;
+ margin-right: 15px;
+ color: #1976d2;
+ text-decoration: none;
+ font-size: 14px;
+ padding: 4px 8px;
+ border: 1px solid #1976d2;
+ border-radius: 4px;
+ transition: background-color 0.3s;
+ }}
+ .video-links a:hover {{
+ background-color: #e3f2fd;
+ }}
+ .no-videos {{
+ text-align: center;
+ padding: 40px;
+ color: #666;
+ font-size: 16px;
+ }}
+ </style>
805
+ </head>
806
+ <body>
807
+ <div class="container">
808
+ <h1>📺 Recent Videos ({time_range_text} · Sorted by views)</h1>
809
+
810
+ <div class="stats-info">
811
+ 📊 Showing {len(videos)} videos (up to {limit}) | 🕐 Times shown in JST | 📈 Sorted by views
812
+ </div>
813
+ """
814
+
815
+ if videos:
816
+ for video in videos:
817
+ # Format the publish time in Japan Standard Time (JST)
818
+ published_jst = video['published_at_jst'].strftime('%m/%d %H:%M')
819
+
820
+ # Construct video and channel URLs
821
+ video_url = f"https://www.youtube.com/watch?v={video['video_id']}"
822
+ channel_url = f"https://www.youtube.com/channel/{video['channel_id']}"
823
+
824
+ html += f"""
825
+ <div class="video-item {video['importance_level']}">
826
+ <img src="{video['thumbnail_url']}" alt="thumbnail" class="thumbnail">
827
+ <div class="video-info">
828
+ <div class="video-title">{video['title']}</div>
829
+ <div class="channel-info">
830
+ <img src="{video['channel_icon_url']}" alt="channel" class="channel-icon">
831
+ <span>{video['channel_name']}</span>
832
+ </div>
833
+ <div class="video-meta">
834
+ 📅 {published_jst} (JST) |
835
+ <span class="stats">👀 {video['view_count']:,} views</span> |
836
+ 👤 {video['detected_person']}
837
+ <span class="detection-badge">{video['detection_source']}</span>
838
+ <span class="importance-badge {video['importance_level']}">{video['importance_level']}</span>
839
+ </div>
840
+ <div class="video-links">
841
+ <a href="{video_url}" target="_blank">🎬 Watch video</a>
842
+ <a href="{channel_url}" target="_blank">📺 Channel</a>
843
+ </div>
844
+ </div>
845
+ </div>
846
+ """
847
+ else:
848
+ html += f"""
849
+ <div class="no-videos">
850
+ 📭 No videos were posted in the past {hours} hours.<br>
851
+ Try updating the data or selecting a longer time range.
852
+ </div>
853
+ """
854
+
855
+ html += """
856
+ </div>
857
+ </body>
858
+ </html>
859
+ """
860
+
861
+ return html
862
+
863
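Both HTML generators interpolate video titles, channel names, and detected-person strings directly into the markup, so a title containing `<`, `&`, or a quote can break the page or inject markup. A minimal sketch of escaping those fields with the standard library's `html.escape` before interpolation follows. `render_video_item` is a hypothetical helper, and importing `escape` by name avoids clashing with the local `html` string variable these methods use.

```python
from html import escape  # imported by name so it cannot clash with a local `html` string


def render_video_item(video):
    """Render a minimal video card with every text field HTML-escaped (hypothetical helper)."""
    level = escape(video["importance_level"])
    thumb = escape(video["thumbnail_url"])  # quote=True by default, so safe inside src="..."
    title = escape(video["title"])
    channel = escape(video["channel_name"])
    return (
        f'<div class="video-item {level}">'
        f'<img src="{thumb}" alt="thumbnail" class="thumbnail">'
        f'<div class="video-title">{title}</div>'
        f"<span>{channel}</span>"
        "</div>"
    )


card = render_video_item({
    "title": 'Q&A: "Best <script> tricks"',
    "channel_name": "Tom's Channel",
    "thumbnail_url": "https://i.ytimg.com/vi/abc/hqdefault.jpg",
    "importance_level": "Critical",
})
print(card)
```

The same `escape(...)` call can wrap each `{video['...']}` placeholder in the existing f-strings without changing the surrounding CSS classes.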
+ def generate_dashboard(self) -> str:
864
+ """Generate the HTML dashboard"""
865
+ trends = self.detect_trends()
866
+
867
+ conn = sqlite3.connect('competitor_data.db')
868
+ cursor = conn.cursor()
869
+
870
+ # Fetch the past 7 days of videos, ordered by importance level and then by views
871
+ cursor.execute('''
872
+ SELECT v.title, v.published_at, v.view_count, v.detected_person,
873
+ v.importance_level, c.channel_name, v.thumbnail_url, v.video_id,
874
+ v.channel_id, v.detection_source
875
+ FROM videos v
876
+ JOIN channels c ON v.channel_id = c.channel_id
877
+ WHERE v.published_at > ?
878
+ ORDER BY
879
+ CASE v.importance_level
880
+ WHEN 'Critical' THEN 1
881
+ WHEN 'Important' THEN 2
882
+ ELSE 3
883
+ END,
884
+ v.view_count DESC
885
+ ''', ((datetime.now() - timedelta(days=7)).isoformat(),))
886
+
887
+ videos = cursor.fetchall()
888
+ conn.close()
889
+
890
+ # Generate HTML
891
+ html = """
892
+ <!DOCTYPE html>
893
+ <html lang="en">
894
+ <head>
895
+ <meta charset="UTF-8">
896
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
897
+ <title>YouTube Competitor Analysis Dashboard (Global)</title>
898
+ <style>
899
+ body { font-family: 'Helvetica Neue', Arial, sans-serif; margin: 0; padding: 20px; background-color: #f5f5f5; }
900
+ .container { max-width: 1200px; margin: 0 auto; }
901
+ h1 { color: #333; text-align: center; margin-bottom: 30px; }
902
+ h2 { border-bottom: 2px solid #eee; padding-bottom: 10px; }
903
+ .section { background: white; margin-bottom: 30px; padding: 20px; border-radius: 8px; box-shadow: 0 2px 4px rgba(0,0,0,0.1); }
904
+ .trend-item { border: 1px solid #ddd; margin-bottom: 15px; padding: 15px; border-radius: 5px; background: #fafafa; }
905
+ .trend-title { font-size: 18px; font-weight: bold; color: #d32f2f; margin-bottom: 10px; }
906
+ .trend-meta { font-size: 14px; color: #666; margin-bottom: 5px; }
907
+ .video-item { display: flex; align-items: flex-start; margin-bottom: 15px; padding: 10px; border-left: 4px solid #ccc; }
908
+ .video-item.Critical { border-left-color: #f44336; background-color: #ffebee; }
909
+ .video-item.Important { border-left-color: #ff9800; background-color: #fff3e0; }
910
+ .thumbnail { width: 160px; height: 90px; object-fit: cover; margin-right: 15px; border-radius: 4px; flex-shrink: 0; }
911
+ .video-info { flex: 1; }
912
+ .video-title { font-weight: bold; margin-bottom: 5px; font-size: 16px; }
913
+ .video-meta { color: #666; font-size: 14px; line-height: 1.6; }
914
+ .importance-badge {
915
+ display: inline-block;
916
+ padding: 2px 8px;
917
+ border-radius: 12px;
918
+ font-size: 12px;
919
+ font-weight: bold;
920
+ margin-left: 10px;
921
+ }
922
+ .importance-badge.Critical { background: #f44336; color: white; }
923
+ .importance-badge.Important { background: #ff9800; color: white; }
924
+ .importance-badge.Normal { background: #4caf50; color: white; }
925
+ .detection-badge {
926
+ display: inline-block;
927
+ padding: 2px 6px;
928
+ border-radius: 8px;
929
+ font-size: 11px;
930
+ background: #2196f3;
931
+ color: white;
932
+ margin-left: 5px;
933
+ }
934
+ .stats { color: #2196f3; font-weight: bold; }
935
+ .video-links { margin-top: 8px; }
936
+ .video-links a {
937
+ display: inline-block;
938
+ margin-right: 15px;
939
+ color: #1976d2;
940
+ text-decoration: none;
941
+ font-size: 14px;
942
+ padding: 4px 8px;
943
+ border: 1px solid #1976d2;
944
+ border-radius: 4px;
945
+ transition: background-color 0.3s;
946
+ }
947
+ .video-links a:hover {
948
+ background-color: #e3f2fd;
949
+ }
950
+ </style>
951
+ </head>
952
+ <body>
953
+ <div class="container">
954
+ <h1>🌍 YouTube Competitor Analysis Dashboard (Global)</h1>
955
+
956
+ <div class="section">
957
+ <h2>🔥 Currently Trending Clusters</h2>
958
+ """
959
+
960
+ if trends:
961
+ for trend in trends:
962
+ detection_sources = sorted(set(trend['detection_sources']))
963
+ sources_text = ', '.join(detection_sources)
964
+
965
+ html += f"""
966
+ <div class="trend-item">
967
+ <div class="trend-title">👤 {trend['person_name']}</div>
968
+ <div class="trend-meta">📺 <strong>{trend['video_count']}</strong> videos posted across <strong>{trend['unique_channels']}</strong> channels</div>
969
+ <div class="trend-meta">🔍 Detection methods: {sources_text}</div>
970
+ </div>
971
+ """
972
+ else:
973
+ html += "<p>There are currently no trending clusters.</p>"
974
+
975
+ html += """
976
+ </div>
977
+
978
+ <div class="section">
979
+ <h2>📺 Recent Videos (Global person detection)</h2>
980
+ """
981
+
982
+ for video in videos[:20]: # Show the top 20
983
+ (title, published_at, view_count, detected_person, importance,
984
+ channel_name, thumbnail_url, video_id, channel_id, detection_source) = video
985
+
986
+ published_date = datetime.fromisoformat(published_at.replace('Z', '+00:00')).strftime('%m/%d %H:%M')
987
+ person_display = detected_person if detected_person else "Unknown"
988
+
989
+ # Build the video and channel URLs
990
+ video_url = f"https://www.youtube.com/watch?v={video_id}"
991
+ channel_url = f"https://www.youtube.com/channel/{channel_id}"
992
+
993
+ html += f"""
994
+ <div class="video-item {importance}">
995
+ <img src="{thumbnail_url}" alt="thumbnail" class="thumbnail">
996
+ <div class="video-info">
997
+ <div class="video-title">{title}</div>
998
+ <div class="video-meta">
999
+ 📺 {channel_name} | 📅 {published_date} |
1000
+ <span class="stats">👀 {view_count:,} views</span> |
1001
+ 👤 {person_display}
1002
+ <span class="detection-badge">{detection_source}</span>
1003
+ <span class="importance-badge {importance}">{importance}</span>
1004
+ </div>
1005
+ <div class="video-links">
1006
+ <a href="{video_url}" target="_blank">🎬 Watch video</a>
1007
+ <a href="{channel_url}" target="_blank">📺 Channel</a>
1008
+ </div>
1009
+ </div>
1010
+ </div>
1011
+ """
1012
+
1013
+ html += """
1014
+ </div>
1015
+ </div>
1016
+ </body>
1017
+ </html>
1018
+ """
1019
+
1020
+ return html
1021
+
1022
+ def generate_channel_management_html(self) -> str:
1023
+ """Generate HTML for channel management"""
1024
+ channels = self.get_channels()
1025
+
1026
+ html = """
1027
+ <!DOCTYPE html>
1028
+ <html lang="en">
1029
+ <head>
1030
+ <meta charset="UTF-8">
1031
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
1032
+ <title>Channel Management (Global)</title>
1033
+ <style>
1034
+ body { font-family: 'Helvetica Neue', Arial, sans-serif; margin: 0; padding: 20px; background-color: #f5f5f5; }
1035
+ .container { max-width: 800px; margin: 0 auto; }
1036
+ h2 { border-bottom: 2px solid #eee; padding-bottom: 10px; }
1037
+ .channel-item {
1038
+ display: flex;
1039
+ align-items: center;
1040
+ background: white;
1041
+ margin-bottom: 15px;
1042
+ padding: 15px;
1043
+ border-radius: 8px;
1044
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
1045
+ }
1046
+ .channel-icon {
1047
+ width: 48px;
1048
+ height: 48px;
1049
+ border-radius: 50%;
1050
+ margin-right: 15px;
1051
+ object-fit: cover;
1052
+ }
1053
+ .channel-info { flex: 1; }
1054
+ .channel-name { font-weight: bold; font-size: 16px; margin-bottom: 5px; }
1055
+ .channel-meta { color: #666; font-size: 14px; }
1056
+ .channel-actions { display: flex; gap: 10px; }
1057
+ .btn {
1058
+ padding: 6px 12px;
1059
+ border: none;
1060
+ border-radius: 4px;
1061
+ cursor: pointer;
1062
+ font-size: 12px;
1063
+ text-decoration: none;
1064
+ display: inline-block;
1065
+ transition: opacity 0.3s;
1066
+ }
1067
+ .btn-edit { background: #2196f3; color: white; }
1068
+ .btn-delete { background: #f44336; color: white; }
1069
+ .btn:hover { opacity: 0.8; }
1070
+ </style>
1071
+ </head>
1072
+ <body>
1073
+ <div class="container">
1074
+ <h2>🌍 Registered Channels (Global person detection)</h2>
1075
+ """
1076
+
1077
+ if channels:
1078
+ for channel in channels:
1079
+ added_date = datetime.fromisoformat(channel['added_date']).strftime('%Y/%m/%d')
1080
+ subscriber_text = f"{channel['subscriber_count']:,} subscribers" if channel['subscriber_count'] else "Private"
1081
+
1082
+ html += f"""
1083
+ <div class="channel-item">
1084
+ <img src="{channel['icon_url']}" alt="icon" class="channel-icon">
1085
+ <div class="channel-info">
1086
+ <div class="channel-name">{channel['name']}</div>
1087
+ <div class="channel-meta">
1088
+ 👥 Subscribers: {subscriber_text} | 📅 Added: {added_date}
1089
+ </div>
1090
+ <div class="channel-meta">
1091
+ 🆔 {channel['id']}
1092
+ </div>
1093
+ </div>
1094
+ <div class="channel-actions">
1095
+ <button class="btn btn-edit" onclick="editChannel('{channel['id']}', '{channel['name']}')">✏️ Edit</button>
1096
+ <button class="btn btn-delete" onclick="deleteChannel('{channel['id']}')">🗑️ Delete</button>
1097
+ </div>
1098
+ </div>
1099
+ """
1100
+ else:
1101
+ html += "<p>No channels registered.</p>"
1102
+
1103
+ html += """
1104
+ <script>
1105
+ function editChannel(channelId, currentName) {
1106
+ const newName = prompt('Enter new channel name:', currentName);
1107
+ if (newName && newName !== currentName) {
1108
+ alert('Channel name updates must be performed from the Gradio interface.');
1109
+ }
1110
+ }
1111
+
1112
+ function deleteChannel(channelId) {
1113
+ if (confirm('Are you sure you want to delete this channel?\\nRelated video data will also be removed.')) {
1114
+ alert('Channel deletion must be performed from the Gradio interface.\\nChannel ID: ' + channelId);
1115
+ }
1116
+ }
1117
+ </script>
1118
+ </div>
1119
+ </body>
1120
+ </html>
1121
+ """
1122
+
1123
+ return html
1124
+
1125
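`generate_channel_management_html` builds its `onclick` handlers by wrapping the channel name in single quotes. A name such as `Tom's Channel` therefore ends the JavaScript string early, and a double quote in a name would end the HTML attribute. One common approach, offered here as an assumption and not as the app's code, is to encode each argument as a JSON string literal with `json.dumps` (JavaScript parses it the same way). The whole handler is then HTML-escaped for the attribute. `onclick_attr` is a hypothetical helper.

```python
import json
from html import escape


def onclick_attr(function_name, *args):
    """Build an onclick="..." attribute that calls a JS function with string arguments.

    Hypothetical helper: json.dumps turns each argument into a double-quoted string
    literal that JavaScript parses identically, and escape() then encodes quotes and
    ampersands so nothing can terminate the HTML attribute early.
    """
    js_args = ", ".join(json.dumps(arg) for arg in args)
    handler = f"{function_name}({js_args})"
    return f'onclick="{escape(handler)}"'


attr = onclick_attr("editChannel", "UC_example_id", 'Tom\'s "Best" Channel')
print(f'<button class="btn btn-edit" {attr}>Edit</button>')
```

The browser decodes the entities before running the handler, so `editChannel` receives the original name unchanged.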
+ # Create the app instance
1126
+ analyzer = YouTubeCompetitorAnalyzer()
1127
+
1128
+ # Gradio interface functions
1129
+ def add_channel_interface(channel_ids_text):
1130
+ """Interface function that supports adding multiple channel IDs"""
1131
+ if not channel_ids_text:
1132
+ return "Please enter at least one channel ID"
1133
+
1134
+ # Split on newlines, strip surrounding whitespace, and skip blank lines
1135
+ channel_ids = [cid.strip() for cid in channel_ids_text.split('\n') if cid.strip()]
1136
+
1137
+ if not channel_ids:
1138
+ return "No valid channel IDs provided."
1139
+
1140
+ results = []
1141
+ # Process each channel ID in order
1142
+ for channel_id in channel_ids:
1143
+ result = analyzer.add_channel(channel_id)
1144
+ results.append(result)
1145
+
1146
+ # Join the per-channel results with newlines
1147
+ return "\n".join(results)
1148
+
1149
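`add_channel_interface` forwards every non-blank line to `analyzer.add_channel`, including repeated IDs and obvious typos such as a pasted `@handle`. The following is a minimal sketch of stricter parsing that also drops duplicates and separates well-formed IDs from the rest. The format check is a heuristic assumption (channel IDs usually look like `UC` followed by 22 URL-safe base64 characters), not an official rule, and `parse_channel_ids` is hypothetical.

```python
import re

# Heuristic, not an official rule: channel IDs are typically "UC" followed by
# 22 URL-safe base64 characters (24 characters in total).
CHANNEL_ID_RE = re.compile(r"UC[A-Za-z0-9_-]{22}")


def parse_channel_ids(text):
    """Split one-ID-per-line input into (valid, rejected) lists.

    Blank lines are skipped and repeated IDs are kept once, in first-seen order.
    """
    valid, rejected, seen = [], [], set()
    for line in text.splitlines():  # splits on \n, \r\n and \r alike
        cid = line.strip()
        if not cid or cid in seen:
            continue
        seen.add(cid)
        if CHANNEL_ID_RE.fullmatch(cid):
            valid.append(cid)
        else:
            rejected.append(cid)
    return valid, rejected


sample_id = "UC" + "a" * 22
valid, rejected = parse_channel_ids(f"{sample_id}\r\n\n  {sample_id}  \n@somehandle\n")
print(valid, rejected)
```

With this split, the interface could call `analyzer.add_channel` only for `valid` and report `rejected` back to the user without spending any YouTube API quota on them.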
+ def delete_channel_interface(channel_id):
1150
+ if not channel_id:
1151
+ return "Please enter a channel ID to delete"
1152
+ return analyzer.delete_channel(channel_id.strip())
1153
+
1154
+ def update_channel_name_interface(channel_id, new_name):
1155
+ if not channel_id or not new_name:
1156
+ return "Please enter channel ID and new name"
1157
+ return analyzer.update_channel_name(channel_id.strip(), new_name.strip())
1158
+
1159
+ def update_data_interface():
1160
+ return analyzer.update_all_data()
1161
+
1162
+ def show_dashboard():
1163
+ return analyzer.generate_dashboard()
1164
+
1165
+ def show_channel_management():
1166
+ return analyzer.generate_channel_management_html()
1167
+
1168
+ def show_recent_videos_interface(hours_selection, limit_selection):
1169
+ """Interface function for recent videos list"""
1170
+ hours_map = {
1171
+ "6 hours": 6,
1172
+ "12 hours": 12,
1173
+ "24 hours": 24,
1174
+ "48 hours": 48
1175
+ }
1176
+
1177
+ limit_map = {
1178
+ "20 items": 20,
1179
+ "50 items": 50,
1180
+ "100 items": 100,
1181
+ "200 items": 200
1182
+ }
1183
+
1184
+ hours = hours_map.get(hours_selection, 24)
1185
+ limit = limit_map.get(limit_selection, 50)
1186
+
1187
+ return analyzer.generate_recent_videos_html(hours, limit)
1188
+
1189
+ # Build the Gradio app
1190
+ with gr.Blocks(title="YouTube Competitor Analysis (Global)", theme=gr.themes.Soft()) as app:
1191
+ gr.Markdown("# 🌍 YouTube Competitor Analysis App (Global)")
1192
+ gr.Markdown("Analyze competitor channel uploads and detect trending clusters of people worldwide using **Gemini 2.5 Flash**.")
1193
+
1194
+ with gr.Tab("📊 Dashboard"):
1195
+ gr.Markdown("## Global Analysis Dashboard")
1196
+
1197
+ refresh_btn = gr.Button("📈 Refresh Dashboard", variant="secondary")
1198
+ dashboard_html = gr.HTML()
1199
+
1200
+ refresh_btn.click(show_dashboard, inputs=[], outputs=[dashboard_html])
1201
+
1202
+ # initial load
1203
+ app.load(show_dashboard, inputs=[], outputs=[dashboard_html])
1204
+
1205
+ with gr.Tab("📺 Recent Videos"):
1206
+ gr.Markdown("## 📺 Recent Videos (select time range and max items)")
1207
+ gr.Markdown("""
1208
+ ### Features
1209
+ - ⏰ Time range selection: choose from 6 to 48 hours
1210
+ - 📊 Sorted by view count: the most-viewed videos appear first
1211
+ - 🕐 Times displayed in JST (UTC+9)
1212
+ - 🔢 Max items: limit the list to 20, 50, 100, or 200 videos
1213
+ - 🌍 Global detection: detect notable people worldwide
1214
+ """)
1215
+
1216
+ with gr.Row():
1217
+ hours_dropdown = gr.Dropdown(
1218
+ choices=["6 hours", "12 hours", "24 hours", "48 hours"],
1219
+ value="24 hours",
1220
+ label="⏰ Time Range"
1221
+ )
1222
+ limit_dropdown = gr.Dropdown(
1223
+ choices=["20 items", "50 items", "100 items", "200 items"],
1224
+ value="50 items",
1225
+ label="🔢 Max items"
1226
+ )
1227
+
1228
+ update_recent_btn = gr.Button("🔄 Update Recent Videos", variant="primary", size="lg")
1229
+ recent_videos_html = gr.HTML()
1230
+
1231
+ update_recent_btn.click(
1232
+ show_recent_videos_interface,
1233
+ inputs=[hours_dropdown, limit_dropdown],
1234
+ outputs=[recent_videos_html]
1235
+ )
1236
+
1237
+ # initial load (24 hours, 50 items)
1238
+ app.load(
1239
+ show_recent_videos_interface,
1240
+ inputs=[gr.State("24 hours"), gr.State("50 items")],
1241
+ outputs=[recent_videos_html]
1242
+ )
1243
+
1244
+ with gr.Tab("🔄 Data Update"):
1245
+ gr.Markdown("## 🌍 Global AI High-Precision Data Update System")
1246
+ gr.Markdown("""
1247
+ ### 🎯 Person extraction using Gemini 2.5 Flash (priority order)
1248
+ 1. **🤖 Gemini title analysis** - high-precision extraction of notable people from titles
1249
+ 2. **🤖 Gemini description analysis** - detect people from hashtags and the description
1250
+ 3. **🏷️ Tag analysis** - identify person names in any language from tags
1251
+ 4. **🖼️ Thumbnail OCR** - read text from thumbnails
1252
+ 5. **👤 Face recognition** - identify people via face recognition
1253
+
1254
+ **🌍 Global support**: detects people regardless of nationality, era, or field
1255
+ """)
1256
+
1257
+ # API key inputs (allow user to provide keys at runtime instead of env vars)
1258
+ with gr.Row():
1259
+ youtube_key_input = gr.Textbox(label="YouTube API Key", placeholder="Enter YouTube API key", type="password")
1260
+ gemini_key_input = gr.Textbox(label="Gemini API Key", placeholder="Enter Gemini API key", type="password")
1261
+
1262
+ apply_keys_btn = gr.Button("Apply API Keys", variant="secondary")
1263
+ api_keys_result = gr.Textbox(label="API Key Status", interactive=False)
1264
+
1265
+ # Keep applied keys in hidden Gradio state so other callbacks can reference them if needed
1266
+ youtube_key_state = gr.State("")
1267
+ gemini_key_state = gr.State("")
1268
+
1269
+ # set_api_keys returns (status, youtube_key, gemini_key)
1270
+ apply_keys_btn.click(
1271
+ set_api_keys,
1272
+ inputs=[youtube_key_input, gemini_key_input],
1273
+ outputs=[api_keys_result, youtube_key_state, gemini_key_state]
1274
+ )
1275
+
1276
+ update_btn = gr.Button("🌍 Start Global Data Update", variant="primary", size="lg")
1277
+ update_result = gr.Textbox(label="Update Result", interactive=False)
1278
+
1279
+ update_btn.click(update_data_interface, inputs=[], outputs=[update_result])
1280
+
1281
+ with gr.Tab("📺 Channel Management"):
1282
+ with gr.Row():
1283
+ with gr.Column(scale=2):
1284
+ gr.Markdown("## Add Channels")
1285
+ # Multiline TextArea so several channel IDs can be entered at once
1286
+ channel_input = gr.TextArea(
1287
+ label="Channel ID (one per line)",
1288
+ placeholder="UCxxxxxxxxxxxxxxxxxxxxxxxx\nUCyyyyyyyyyyyyyyyyyyyyyyyy",
1289
+ info="Enter multiple YouTube channel IDs separated by newlines"
1290
+ )
1291
+ add_btn = gr.Button("Add Channels", variant="primary")
1292
+ add_result = gr.Textbox(label="Result", interactive=False)
1293
+
1294
+ add_btn.click(add_channel_interface, inputs=[channel_input], outputs=[add_result])
1295
+
1296
+ with gr.Column(scale=2):
1297
+ gr.Markdown("## Delete Channel")
1298
+ delete_channel_input = gr.Textbox(
1299
+ label="Channel ID to delete",
1300
+ placeholder="UCxxxxxxxxxxxxxxxxxxxxxxxx"
1301
+ )
1302
+ delete_btn = gr.Button("Delete Channel", variant="stop")
1303
+ delete_result = gr.Textbox(label="Delete Result", interactive=False)
1304
+
1305
+ delete_btn.click(delete_channel_interface, inputs=[delete_channel_input], outputs=[delete_result])
1306
+
1307
+ with gr.Row():
1308
+ with gr.Column():
1309
+ gr.Markdown("## Edit Channel Name")
1310
+ edit_channel_id = gr.Textbox(label="Channel ID to edit", placeholder="UCxxxxxxxxxxxxxxxxxxxxxxxx")
1311
+ new_channel_name = gr.Textbox(label="New channel name", placeholder="Enter new name")
1312
+ update_name_btn = gr.Button("Update Name", variant="secondary")
1313
+ update_name_result = gr.Textbox(label="Update Result", interactive=False)
1314
+
1315
+ update_name_btn.click(
1316
+ update_channel_name_interface,
1317
+ inputs=[edit_channel_id, new_channel_name],
1318
+ outputs=[update_name_result]
1319
+ )
1320
+
1321
+ gr.Markdown("## Registered Channels")
1322
+ channel_list_html = gr.HTML()
1323
+ refresh_channels_btn = gr.Button("Refresh list", variant="secondary")
1324
+ refresh_channels_btn.click(show_channel_management, inputs=[], outputs=[channel_list_html])
1325
+
1326
+ # initial load
1327
+ app.load(show_channel_management, inputs=[], outputs=[channel_list_html])
1328
+
1329
+ if __name__ == "__main__":
1330
+ app.launch()
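The Data Update tab wires `set_api_keys` to three outputs: the status textbox and two hidden `gr.State` values. Its definition lies outside this excerpt. The following is a minimal sketch of a function with that `(status, youtube_key, gemini_key)` return shape. It assumes the function only trims and validates its inputs; the app's real version may also configure the YouTube and Gemini clients.

```python
def set_api_keys(youtube_key, gemini_key):
    """Hypothetical sketch matching the (status, youtube_key, gemini_key) return shape."""
    youtube_key = (youtube_key or "").strip()
    gemini_key = (gemini_key or "").strip()
    missing = [name for name, key in (("YouTube", youtube_key), ("Gemini", gemini_key)) if not key]
    if missing:
        status = "Missing API key(s): " + ", ".join(missing)
    else:
        # Nothing is written to disk; the values live only in the returned gr.State outputs.
        status = "API keys applied for this session (kept in memory only)."
    return status, youtube_key, gemini_key


print(set_api_keys("  yt-key ", None))
```

Because the function is plain Python, it can be unit-tested without launching Gradio. Gradio assigns the returned tuple to `outputs=[api_keys_result, youtube_key_state, gemini_key_state]` in order.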
requirements.txt ADDED
@@ -0,0 +1,6 @@
1
+ gradio==4.44.0
2
+ google-generativeai==0.8.3
3
+ google-api-python-client==2.149.0
4
+ requests==2.32.3
5
+ pandas==2.2.2
6
+ huggingface_hub==0.19.4