Spaces: rottg/telegram-analytics


rottg committed on Feb 5

Commit a99d4dc

1 Parent(s): c703ef2

Upload folder using huggingface_hub


Files changed (21)

.gitattributes +1 -0
Dockerfile +31 -0
README.md +1037 -11
ai_search.py +776 -0
algorithms.py +819 -0
dashboard.py +2086 -0
data_structures.py +773 -0
indexer.py +817 -0
requirements.txt +4 -0
schema.sql +200 -0
search.py +564 -0
semantic_search.py +411 -0
static/css/style.css +859 -0
static/js/dashboard.js +622 -0
templates/chat.html +831 -0
templates/index.html +223 -0
templates/moderation.html +459 -0
templates/search.html +359 -0
templates/settings.html +444 -0
templates/user_profile.html +721 -0
templates/users.html +344 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+telegram.db filter=lfs diff=lfs merge=lfs -text

Dockerfile ADDED

@@ -0,0 +1,31 @@

+FROM python:3.12-slim
+WORKDIR /app
+# Install dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY dashboard.py .
+COPY ai_search.py .
+COPY algorithms.py .
+COPY data_structures.py .
+COPY indexer.py .
+COPY search.py .
+COPY semantic_search.py .
+COPY schema.sql .
+COPY static/ static/
+COPY templates/ templates/
+# Copy database
+COPY telegram.db .
+# HF Spaces uses port 7860
+ENV PORT=7860
+ENV HOST=0.0.0.0
+ENV DB_PATH=telegram.db
+EXPOSE 7860
+CMD ["gunicorn", "dashboard:app", "--bind", "0.0.0.0:7860", "--workers", "2", "--timeout", "120"]

README.md CHANGED

@@ -1,11 +1,1037 @@
----
-title: Telegram Analytics
-emoji: 💻
-colorFrom: purple
-colorTo: green
-sdk: docker
-pinned: false
-short_description: telegram-analytics
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Telegram Analytics Dashboard
+emoji: 📊
+colorFrom: blue
+colorTo: indigo
+sdk: docker
+app_port: 7860
+---
+# Telegram JSON Indexer & Analyzer
+A high-performance system for indexing, searching, and analyzing Telegram chat exports using SQLite FTS5 and advanced algorithms from a Data Structures course. Includes a full-featured **Web Dashboard** with **AI-powered search**.
+```
+╔══════════════════════════════════════════════════════════════════════════════╗
+║                         TELEGRAM CHAT ANALYZER                                ║
+║                                                                               ║
+║  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────────────────────┐    ║
+║  │  JSON   │───▶│ INDEXER │───▶│ SQLite  │───▶│     WEB DASHBOARD       │    ║
+║  │ Export  │    │ Bloom   │    │ + FTS5  │    │  ┌─────┬─────┬─────┐   │    ║
+║  │         │    │ Filter  │    │         │    │  │Stats│Users│Chat │   │    ║
+║  └─────────┘    └─────────┘    └─────────┘    │  ├─────┼─────┼─────┤   │    ║
+║                                               │  │Search│ AI  │Mod  │   │    ║
+║                                               │  └─────┴─────┴─────┘   │    ║
+║                                               └─────────────────────────┘    ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+```
+## Features
+### Core Features
+- **Full-Text Search** - Fast search with Hebrew support using SQLite FTS5
+- **Fuzzy Search** - Find messages even with typos using trigram similarity
+- **Similar Message Detection** - LCS algorithm finds duplicates/reposts
+- **Conversation Threads** - DFS/BFS traversal reconstructs reply chains
+- **User Rankings** - O(log n) rank queries using AVL Rank Tree
+- **Time Analytics** - Bucket Sort for efficient histograms
+- **Top-K Queries** - Heap-based O(n log k) instead of O(n log n)
+- **Percentiles** - O(n) median/percentiles using Selection algorithm
+### Web Dashboard
+- **Interactive Overview** - Charts, stats, activity graphs
+- **User Leaderboard** - Rankings with detailed user profiles
+- **Telegram-like Chat View** - Browse all messages like in Telegram
+- **Advanced Search** - Full-text + fuzzy search with filters
+- **AI-Powered Search** - Natural language queries (Hebrew/English)
+- **Moderation Analytics** - Links, mentions, domains analysis
+- **Database Updates** - Upload new JSON files via web UI
+### AI Search (Free Providers)
+- **Ollama** - Local LLM (recommended, 100% free)
+- **Groq** - Free API tier available
+- **Google Gemini** - Free API tier available
+---
+## Table of Contents
+1. [Installation](#installation)
+2. [Quick Start](#quick-start)
+3. [Web Dashboard](#web-dashboard)
+4. [AI Search](#ai-search)
+5. [Database Updates](#database-updates)
+6. [Architecture](#architecture)
+7. [Usage Guide](#usage-guide)
+8. [Algorithms](#algorithms)
+9. [API Reference](#api-reference)
+10. [Examples](#examples)
+---
+## Installation
+### Requirements
+- Python 3.10 or higher
+- No external packages required for core functionality
+### Setup
+```bash
+# Clone or download the project
+cd telegram
+# Verify Python version
+python --version  # Should be 3.10+
+# Test the system
+python algorithms.py  # Should print "ALL TESTS PASSED!"
+```
+### Optional: Semantic Search
+For AI-powered semantic similarity search:
+```bash
+pip install numpy faiss-cpu sentence-transformers
+```
+---
+## Quick Start
+### Step 1: Export from Telegram
+1. Open Telegram Desktop
+2. Go to any chat/group
+3. Click ⋮ → Export Chat History
+4. Select JSON format
+5. Save as `result.json`
+### Step 2: Index Your Data
+```bash
+python indexer.py result.json --db telegram.db
+```
+### Step 3: Launch Web Dashboard
+```bash
+# Start the dashboard (recommended)
+python dashboard.py
+# Open in browser: http://localhost:5000
+```
+### Step 4: Search & Analyze (CLI)
+```bash
+# Search messages
+python search.py "שלום"
+# View statistics
+python analyzer.py --stats
+# Find similar messages
+python analyzer.py --similar
+```
+---
+## Web Dashboard
+The web dashboard provides a complete visual interface for analyzing your Telegram data.
+### Starting the Dashboard
+```bash
+python dashboard.py
+# Or with custom port:
+python dashboard.py --port 8080
+```
+### Dashboard Pages
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                           WEB DASHBOARD                                  │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                          │
+│  📈 Overview      │  Main statistics, charts, activity graphs           │
+│                   │  - Total messages, users, links, media              │
+│                   │  - Daily/hourly activity charts                     │
+│                   │  - Top users leaderboard                            │
+│                                                                          │
+│  👥 Users         │  User leaderboard with detailed profiles            │
+│                   │  - Ranking by message count                         │
+│                   │  - User details modal (hourly activity)             │
+│                   │  - Export users to CSV                              │
+│                                                                          │
+│  💬 Chat          │  Telegram-like message view                         │
+│                   │  - Browse all messages chronologically              │
+│                   │  - Filter by user, date, media type                 │
+│                   │  - Click message to view full thread                │
+│                   │  - AI search with natural language                  │
+│                                                                          │
+│  🔍 Search        │  Advanced search interface                          │
+│                   │  - Full-text search (Hebrew supported)              │
+│                   │  - AI-powered natural language search               │
+│                   │  - Boolean operators (AND, OR, NOT)                 │
+│                   │  - Export search results                            │
+│                                                                          │
+│  🛡️ Moderation    │  Content analytics                                  │
+│                   │  - Top shared domains                               │
+│                   │  - Most mentioned users                             │
+│                   │  - Link sharers leaderboard                         │
+│                   │  - Word frequency analysis                          │
+│                                                                          │
+│  ⚙️ Settings      │  Database management                                │
+│                   │  - View database statistics                         │
+│                   │  - Upload new JSON files                            │
+│                   │  - Automatic duplicate detection                    │
+│                                                                          │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+### Dashboard Features
+- **Dark Theme** - Modern dark UI, easy on the eyes
+- **RTL Support** - Full Hebrew/Arabic text support
+- **Responsive** - Works on mobile and desktop
+- **Real-time Charts** - Interactive Chart.js visualizations
+- **Export** - Download data as CSV/JSON
+---
+## AI Search
+Ask questions about your chat data in natural language (Hebrew or English).
+### Setup AI Provider (Free Options)
+#### Option 1: Ollama (Recommended - 100% Local & Free)
+```bash
+# Install Ollama (https://ollama.ai)
+curl -fsSL https://ollama.ai/install.sh | sh
+# Pull a model
+ollama pull llama3.2
+# Start Ollama server
+ollama serve
+```
+#### Option 2: Groq (Free API Tier)
+```bash
+# Get free API key from https://console.groq.com
+export GROQ_API_KEY="your_api_key"
+```
+#### Option 3: Google Gemini (Free API Tier)
+```bash
+# Get free API key from https://makersuite.google.com/app/apikey
+export GEMINI_API_KEY="your_api_key"
+```
+### AI Search Examples
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│  🤖 AI Search - Natural Language Queries                                │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                          │
+│  Query: "מי שלח הכי הרבה הודעות?"                                       │
+│  Answer: המשתמש הפעיל ביותר הוא דני עם 5,432 הודעות                     │
+│                                                                          │
+│  Query: "מתי היו הכי הרבה הודעות?"                                      │
+│  Answer: היום הפעיל ביותר היה 15.03.2024 עם 342 הודעות                  │
+│                                                                          │
+│  Query: "Who mentioned @admin the most?"                                 │
+│  Answer: User "Mike" mentioned @admin 47 times                           │
+│                                                                          │
+│  Query: "הראה הודעות עם קישורים מהשבוע האחרון"                          │
+│  Answer: נמצאו 23 הודעות עם קישורים...                                  │
+│                                                                          │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+### AI Search API
+```python
+from ai_search import AISearchEngine
+# Initialize with Ollama (local)
+ai = AISearchEngine('telegram.db', provider='ollama')
+# Or with Groq
+ai = AISearchEngine('telegram.db', provider='groq', api_key='your_key')
+# Search
+result = ai.search("מי הכי פעיל בלילה?")
+print(result['answer'])  # Natural language answer
+print(result['sql'])     # Generated SQL query
+print(result['results']) # Raw data
+```
+---
+## Database Updates
+Update your database with new JSON exports without losing existing data.
+### Via Web UI
+1. Go to **Settings** page in the dashboard
+2. Drag & drop your new `result.json` file
+3. Wait for processing (duplicate detection is automatic)
+4. See summary of new messages added
+### Via CLI
+```bash
+# Update existing database with new JSON
+python indexer.py new_export.json --db telegram.db --update
+# What happens:
+# 1. Loads existing message IDs into Bloom filter (O(n))
+# 2. For each message in JSON:
+#    - Check if exists using Bloom filter (O(1))
+#    - Only insert if new
+# 3. Re-index FTS if needed
+# 4. Report: X new messages, Y duplicates skipped
+```
+### Incremental Update Process
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                    INCREMENTAL UPDATE PROCESS                            │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                          │
+│  Existing DB                    New JSON                                 │
+│  ┌─────────────┐               ┌─────────────┐                          │
+│  │ msg_1 ✓     │               │ msg_1       │ → Skip (duplicate)       │
+│  │ msg_2 ✓     │               │ msg_2       │ → Skip (duplicate)       │
+│  │ msg_3 ✓     │               │ msg_5  NEW  │ → Insert                 │
+│  │ msg_4 ✓     │               │ msg_6  NEW  │ → Insert                 │
+│  └─────────────┘               └─────────────┘                          │
+│         │                             │                                  │
+│         │      Bloom Filter           │                                  │
+│         │      ┌───────────┐          │                                  │
+│         └─────▶│ O(1) test │◀─────────┘                                  │
+│                └───────────┘                                             │
+│                                                                          │
+│  Result: Only msg_5 and msg_6 added (fast!)                             │
+│                                                                          │
+└─────────────────────────────────────────────────────────────────────────┘
+```
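The duplicate check in the diagram above can be sketched as follows. This is a minimal illustration, not the `BloomFilter` from `data_structures.py`: the class layout, the SHA-256 based hashing, the default sizes, and the helper `split_new_messages` are all assumptions made for the example. Because a Bloom filter can report false positives (but never false negatives), the sketch confirms every "maybe present" answer against a plain set that stands in for a database lookup.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter sketch: k hash positions over a fixed bit array."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k positions by salting one SHA-256 hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def split_new_messages(existing_ids, incoming_ids):
    """Return (new, skipped), routing messages like the diagram above."""
    seen = BloomFilter()
    for msg_id in existing_ids:
        seen.add(str(msg_id))
    known = set(existing_ids)  # hypothetical stand-in for a confirming DB lookup
    new, skipped = [], []
    for msg_id in incoming_ids:
        # "Maybe present" is confirmed; "definitely absent" needs no lookup.
        if str(msg_id) in seen and msg_id in known:
            skipped.append(msg_id)
        else:
            new.append(msg_id)
            seen.add(str(msg_id))
            known.add(msg_id)
    return new, skipped


new, skipped = split_new_messages([1, 2, 3, 4], [1, 2, 5, 6])
print(new, skipped)  # [5, 6] [1, 2]
```

Most lookups for new messages stop at the filter, so the confirming lookup only runs for IDs that are probably duplicates.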
+---
+## Architecture
+### System Overview
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         INPUT                                    │
+│  ┌─────────────────────────────────────────────────────────┐    │
+│  │  Telegram JSON Export (result.json)                      │    │
+│  │  ├── messages[]                                          │    │
+│  │  │   ├── id, date, from, text                           │    │
+│  │  │   ├── reply_to_message_id                            │    │
+│  │  │   └── text_entities[] (links, mentions)              │    │
+│  │  └── ...                                                 │    │
+│  └─────────────────────────────────────────────────────────┘    │
+└─────────────────────────┬───────────────────────────────────────┘
+                          │
+                          ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      INDEXER (indexer.py)                        │
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
+│  │   Batch     │  │   Bloom     │  │   Reply     │              │
+│  │  Processing │  │   Filter    │  │   Graph     │              │
+│  │  (1000/tx)  │  │ (Dedup O(1))│  │  Builder    │              │
+│  └─────────────┘  └─────────────┘  └─────────────┘              │
+└─────────────────────────┬───────────────────────────────────────┘
+                          │
+                          ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                    SQLite DATABASE                               │
+│  ┌─────────────────────────────────────────────────────────┐    │
+│  │  messages          │  FTS5 Index      │  reply_graph    │    │
+│  │  ├── id (PK)       │  ├── text_plain  │  ├── parent_id  │    │
+│  │  ├── text_plain    │  └── from_name   │  └── child_id   │    │
+│  │  ├── from_id       │                  │                 │    │
+│  │  ├── date_unixtime │  entities        │  threads        │    │
+│  │  └── ...           │  ├── links       │  └── messages   │    │
+│  │                    │  └── mentions    │                 │    │
+│  └─────────────────────────────────────────────────────────┘    │
+└─────────────────────────┬───────────────────────────────────────┘
+                          │
+          ┌───────────────┼───────────────┐
+          ▼               ▼               ▼
+┌─────────────┐  ┌─────────────┐  ┌─────────────┐
+│   SEARCH    │  │  ANALYZER   │  │   VECTOR    │
+│ (search.py) │  │(analyzer.py)│  │  (optional) │
+│             │  │             │  │             │
+│ • FTS5+BM25 │  │ • Top-K     │  │ • FAISS     │
+│ • Fuzzy     │  │ • LCS       │  │ • Semantic  │
+│ • Threads   │  │ • Rank Tree │  │ • Clustering│
+│ • LRU Cache │  │ • Percentile│  │             │
+└─────────────┘  └─────────────┘  └─────────────┘
+```
+### Data Flow
+```
+JSON Message                 Database Tables              Search/Analytics
+───────────                  ───────────────              ────────────────
+{                           ┌─────────────┐
+  "id": 548795,       ───▶  │  messages   │  ───▶  Full-text search
+  "text": "שלום",           └─────────────┘        User filtering
+  "from": "User1",                                 Date range queries
+  "from_id": "user123", ─▶  ┌─────────────┐
+  "date_unixtime": ...,     │   users     │  ───▶  Top users (Heap)
+                            └─────────────┘        User rank (Rank Tree)
+  "text_entities": [
+    {"type": "link", ────▶  ┌─────────────┐
+     "text": "url"}         │  entities   │  ───▶  Link analysis
+  ],                        └─────────────┘        Mention network
+  "reply_to_message_id" ─▶  ┌─────────────┐
+                            │ reply_graph │  ───▶  Thread DFS/BFS
+}                           └─────────────┘        Conversation view
+```
+### File Structure
+```
+telegram/
+│
+├── dashboard.py        # 🌐 Web Dashboard (Flask)
+│   └── Routes: /, /users, /chat, /search, /moderation, /settings
+│   └── API: /api/overview, /api/users, /api/search, /api/update, etc.
+│
+├── ai_search.py        # 🤖 AI-Powered Search
+│   └── AISearchEngine class
+│       ├── Natural language to SQL
+│       ├── Ollama/Groq/Gemini providers
+│       └── Hebrew/English support
+│
+├── indexer.py          # JSON → SQLite indexer
+│   ├── OptimizedIndexer class
+│   │   ├── Batch processing (100x faster)
+│   │   ├── Bloom filter (duplicate detection)
+│   │   └── Graph builder (reply threads)
+│   └── IncrementalIndexer class
+│       ├── Update existing database
+│       ├── Bloom filter duplicate check
+│       └── Only insert new messages
+│
+├── search.py           # Search interface
+│   └── TelegramSearch class
+│       ├── FTS5 full-text search
+│       ├── Fuzzy trigram search
+│       ├── LRU query cache
+│       └── DFS/BFS thread traversal
+│
+├── analyzer.py         # Analytics & statistics
+│   └── TelegramAnalyzer class
+│       ├── LCS similar messages
+│       ├── Heap-based Top-K
+│       ├── Selection percentiles
+│       ├── Rank Tree queries
+│       └── Bucket Sort histograms
+│
+├── data_structures.py  # Core data structures
+│   ├── BloomFilter     # O(1) membership test
+│   ├── Trie            # O(k) prefix search
+│   ├── LRUCache        # O(1) caching
+│   ├── ReplyGraph      # DFS/BFS traversal
+│   └── TrigramIndex    # Fuzzy matching
+│
+├── algorithms.py       # Course algorithms
+│   ├── LCS             # Similar message detection
+│   ├── TopK (Heap)     # Efficient ranking
+│   ├── Selection       # O(n) percentiles
+│   ├── RankTree        # O(log n) rank queries
+│   └── BucketSort      # Time histograms
+│
+├── templates/          # 🎨 HTML Templates
+│   ├── index.html      # Overview dashboard
+│   ├── users.html      # User leaderboard
+│   ├── chat.html       # Telegram-like chat view
+│   ├── search.html     # Search interface
+│   ├── moderation.html # Content analytics
+│   └── settings.html   # Settings & DB update
+│
+├── static/             # 📁 Static assets
+│   ├── css/style.css   # Dashboard styles
+│   └── js/dashboard.js # Dashboard scripts
+│
+├── vector_search.py    # Optional: Semantic search
+│   └── VectorSearch class (requires FAISS)
+│
+├── schema.sql          # Database schema
+└── telegram.db         # SQLite database (created)
+```
+---
+## Usage Guide
+### Web Dashboard (Recommended)
+```bash
+# Start the dashboard
+python dashboard.py
+# Custom port
+python dashboard.py --port 8080
+# Custom database
+python dashboard.py --db my_chat.db
+```
+### Indexing
+```bash
+# Basic indexing
+python indexer.py result.json
+# Custom database name
+python indexer.py result.json --db my_chat.db
+# With trigram index (for fuzzy search)
+python indexer.py result.json --build-trigrams
+# Larger batch size (faster for big files)
+python indexer.py result.json --batch-size 5000
+# Update existing database with new JSON (incremental)
+python indexer.py new_export.json --db telegram.db --update
+```
+### Searching
+```bash
+# Basic search (Hebrew supported)
+python search.py "שלום"
+# Search with filters
+python search.py "מילה" --user user123456 --limit 50
+# Date range
+python search.py "חדשות" --from-date 2024-01-01 --to-date 2024-12-31
+# Fuzzy search (finds typos)
+python search.py "שלמ" --fuzzy --threshold 0.3
+# View conversation thread
+python search.py --thread 548795
+# List all links
+python search.py --list-links
+# List all mentions
+python search.py --list-mentions
+```
+### Analytics
+```bash
+# General statistics
+python analyzer.py --stats
+# Top users (Heap-based O(n log k))
+python analyzer.py --top-users --limit 10
+# Hourly activity
+python analyzer.py --hourly
+# Daily activity
+python analyzer.py --daily
+# Top words
+python analyzer.py --words --limit 30
+# Top domains
+python analyzer.py --domains
+# Find similar messages (LCS algorithm)
+python analyzer.py --similar --threshold 0.7
+# Find reposts
+python analyzer.py --reposts
+# Message length percentiles (Selection algorithm)
+python analyzer.py --percentiles
+# Response time percentiles
+python analyzer.py --response-times
+# User rank (Rank Tree O(log n))
+python analyzer.py --user-rank user123456
+# Get user at rank #5
+python analyzer.py --rank 5
+# Activity histogram (Bucket Sort)
+python analyzer.py --histogram --bucket-size 86400
+# Export as JSON
+python analyzer.py --stats --json > stats.json
+```
+---
+## Algorithms
+### Algorithm Complexity Comparison
+```
+┌────────────────────┬─────────────────┬─────────────────┬─────────────┐
+│     Operation      │  Naive Method   │  Our Algorithm  │ Improvement │
+├────────────────────┼─────────────────┼─────────────────┼─────────────┤
+│ Top-K users        │ O(n log n) sort │ O(n log k) heap │   ~10x      │
+│ Find median        │ O(n log n) sort │ O(n) selection  │   ~5x       │
+│ User rank query    │ O(n) scan       │ O(log n) tree   │   ~100x     │
+│ Duplicate check    │ O(n) lookup     │ O(1) bloom      │   ~1000x    │
+│ Similar messages   │ O(n²·2^m) naive │ O(n²m²) LCS+DP  │   ~10x      │
+│ Time histogram     │ O(n log n) sort │ O(n+k) bucket   │   ~5x       │
+│ Thread traversal   │ O(n) repeated   │ O(V+E) DFS/BFS  │   ~10x      │
+└────────────────────┴─────────────────┴─────────────────┴─────────────┘
+```
+### 1. LCS (Longest Common Subsequence)
+**Purpose:** Find similar/duplicate messages
+```
+String 1: "שלום לכולם מה קורה"
+String 2: "שלום לכולם מה נשמע"
+                          ↓
+LCS:      "שלום לכולם מה "
+Similarity: 77.78%
+```
+**Algorithm:**
+```
+┌───┬───┬───┬───┬───┬───┐
+│   │ ∅ │ A │ B │ C │ D │   DP Table
+├───┼───┼───┼───┼───┼───┤
+│ ∅ │ 0 │ 0 │ 0 │ 0 │ 0 │   dp[i][j] = length of LCS
+│ A │ 0 │ 1 │ 1 │ 1 │ 1 │   for first i and j chars
+│ C │ 0 │ 1 │ 1 │ 2 │ 2 │
+│ B │ 0 │ 1 │ 2 │ 2 │ 2 │   Time:  O(m × n)
+│ D │ 0 │ 1 │ 2 │ 2 │ 3 │   Space: O(min(m,n))
+└───┴───┴───┴───┴───┴───┘
+```
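A minimal sketch of the DP above, keeping only two rows so the extra space is O(min(m, n)). The function names and the normalisation (LCS length divided by the longer string's length) are assumptions for illustration; `algorithms.py` may normalise differently.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, keeping only two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # shorter string along the row: O(min(m, n)) space
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalisation: LCS length over the longer string's length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


print(lcs_length("ACBD", "ABCD"))  # 3, the bottom-right cell of the table
print(round(lcs_similarity("שלום לכולם מה קורה", "שלום לכולם מה נשמע"), 4))
```

For the two Hebrew strings in the example, both are 18 characters long and share a 14-character subsequence, which is where a figure of 14/18 would come from under this normalisation.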
+### 2. Heap-based Top-K
+**Purpose:** Find top K items without sorting everything
+```
+Finding Top 3 from [5,2,8,1,9,3,7,4,6]
+Min-Heap (size K=3):
+Step 1: [5]           Add 5
+Step 2: [2,5]         Add 2
+Step 3: [2,5,8]       Add 8 (heap full)
+Step 4: [2,5,8]       Skip 1 (< min)
+Step 5: [5,9,8]       Replace 2 with 9
+Step 6: [5,9,8]       Skip 3 (< min)
+Step 7: [7,9,8]       Replace 5 with 7
+...
+Result: [7,8,9]       Top 3!
+Time: O(n log k) vs O(n log n) for full sort
+```
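The walkthrough above maps directly onto Python's standard `heapq` module. The function name `top_k` is hypothetical and chosen for this sketch; it is not necessarily the interface exposed by `algorithms.py`.

```python
import heapq


def top_k(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []
    for value in values:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # heap[0] is the smallest of the current top k: replace it.
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


print(top_k([5, 2, 8, 1, 9, 3, 7, 4, 6], 3))  # [9, 8, 7]
```

Each of the n values costs at most one O(log k) heap operation, which gives the O(n log k) bound quoted above.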
+### 3. Selection Algorithm (Median of Medians)
+**Purpose:** Find k-th element or percentiles in O(n)
+```
+Find median of [3,1,4,1,5,9,2,6,5,3,5]
+┌─────────────────────────────────────────┐
+│  Divide into groups of 5:               │
+│  [3,1,4,1,5] [9,2,6,5,3] [5]           │
+│       ↓           ↓        ↓            │
+│  Medians: 3       5        5            │
+│       ↓                                 │
+│  Median of medians: 5 (pivot)           │
+│       ↓                                 │
+│  Partition around 5                     │
+│  [3,1,4,1,2,3] [5,5,5] [9,6]           │
+│       6 elements  3     2               │
+│       ↓                                 │
+│  Index 5 (0-based) is in the left part  │
+│  → recurse on those 6; median = 4       │
+└─────────────────────────────────────────┘
+Time: O(n) guaranteed (not just average!)
+```
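A compact sketch of median-of-medians selection, assuming a 0-based `k` and a hypothetical function name `select_kth`. It copies sublists for readability rather than partitioning in place, so it illustrates the control flow more than the constant factors.

```python
def select_kth(values, k):
    """Return the k-th smallest element (0-based) via median of medians."""
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Median of each group of five, then the median of those medians.
        groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
        medians = [g[len(g) // 2] for g in groups]
        pivot = select_kth(medians, len(medians) // 2)
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(lower):
            items = lower  # answer lies left of the pivot
        elif k < len(lower) + len(equal):
            return pivot  # answer is the pivot itself
        else:
            k -= len(lower) + len(equal)
            items = [x for x in items if x > pivot]


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(select_kth(data, len(data) // 2))  # 4, same as sorted(data)[5]
```

Percentiles follow from the same call by choosing `k` as a fraction of `len(data)`; the exact rounding rule is a design choice this sketch leaves open.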
+### 4. Rank Tree (Order Statistics Tree)
+**Purpose:** O(log n) rank queries
+```
+AVL Tree with size augmentation:
+           ┌───────────────┐
+           │  150 (size=5) │
+           └───────┬───────┘
+          ┌────────┴────────┐
+    ┌─────┴─────┐     ┌─────┴─────┐
+    │ 100 (s=2) │     │ 250 (s=2) │
+    └─────┬─────┘     └─────┬─────┘
+    ┌─────┴              ┌──┴
+┌───┴───┐            ┌───┴───┐
+│50 (1) │            │300 (1)│
+└───────┘            └───────┘
+select(3) → 150  (3rd smallest)
+rank(150) → 3    (rank of 150)
+Time: O(log n) for both operations
+```
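The two queries can be sketched on a size-augmented binary search tree. To stay short, this example omits the AVL rotations entirely, so it only shows how the `size` field drives `select` and `rank`; the O(log n) bound additionally requires the balancing that a real AVL tree performs. All names here (`Node`, `insert`, `tree_select`, `tree_rank`) are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in this subtree


def size(node):
    return node.size if node else 0


def insert(node, key):
    """Plain BST insert that maintains subtree sizes (no AVL rotations)."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # duplicate keys are ignored in this sketch
    node.size = 1 + size(node.left) + size(node.right)
    return node


def tree_select(node, k):
    """Return the k-th smallest key (1-based), or None if k is out of range."""
    while node:
        left = size(node.left)
        if k == left + 1:
            return node.key
        if k <= left:
            node = node.left
        else:
            k -= left + 1
            node = node.right
    return None


def tree_rank(node, key):
    """Return the 1-based rank of key, or None if the key is absent."""
    smaller = 0
    while node:
        if key < node.key:
            node = node.left
        elif key > node.key:
            smaller += size(node.left) + 1
            node = node.right
        else:
            return smaller + size(node.left) + 1
    return None


root = None
for key in [150, 100, 250, 50, 300]:
    root = insert(root, key)
print(tree_select(root, 3), tree_rank(root, 150))  # 150 3
```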
+### 5. Bucket Sort (Time Histograms)
+**Purpose:** O(n+k) time-based grouping
+```
+Messages with timestamps:
+[1000, 1500, 2500, 1200, 3000]
+Bucket size: 1000 seconds
+┌─────────┬─────────┬─────────┬─────────┐
+│ 0-1000  │1000-2000│2000-3000│3000-4000│
+├─────────┼─────────┼─────────┼─────────┤
+│         │ 1000    │  2500   │  3000   │
+│         │ 1500    │         │         │
+│         │ 1200    │         │         │
+├─────────┼─────────┼─────────┼─────────┤
+│ Count:0 │ Count:3 │ Count:1 │ Count:1 │
+└─────────┴─────────┴─────────┴─────────┘
+Time: O(n + k) where k = number of buckets
+```
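A minimal sketch of the bucketing step, assuming integer Unix timestamps and a hypothetical function name `time_histogram`. Unlike the table above, it starts at the first non-empty bucket rather than at zero, which keeps k small for real timestamps.

```python
def time_histogram(timestamps, bucket_size):
    """Count timestamps per fixed-width bucket; returns {bucket_start: count}."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if not timestamps:
        return {}
    first = min(timestamps) // bucket_size
    last = max(timestamps) // bucket_size
    counts = [0] * (last - first + 1)  # k buckets, allocated once
    for ts in timestamps:
        counts[ts // bucket_size - first] += 1  # O(1) per message
    return {(first + i) * bucket_size: c for i, c in enumerate(counts)}


print(time_histogram([1000, 1500, 2500, 1200, 3000], 1000))
# {1000: 3, 2000: 1, 3000: 1}
```

With `bucket_size=86400` the same function would produce one count per day, matching the `--bucket-size 86400` usage shown earlier.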
+### 6. DFS/BFS Thread Traversal
+**Purpose:** Reconstruct conversation threads
+```
+Reply Graph:
+    [1] Original message
+     │
+     ├──[2] Reply to 1
+     │   │
+     │   ├──[4] Reply to 2
+     │   │
+     │   └──[5] Reply to 2
+     │
+     └──[3] Reply to 1
+DFS order: [1, 2, 4, 5, 3]  (deep first)
+BFS order: [1, 2, 3, 4, 5]  (level by level)
+With depth info:
+  [1] depth=0
+    [2] depth=1
+      [4] depth=2
+      [5] depth=2
+    [3] depth=1
+Time: O(V + E)
+```
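The traversal orders above can be reproduced with a small sketch. The adjacency-list dictionary `REPLIES` and the function names are assumptions for illustration; `ReplyGraph` in `data_structures.py` may store edges differently. The sketch also assumes the reply graph is a tree, which holds when each message replies to at most one parent.

```python
from collections import deque

# Adjacency list for the reply graph drawn above: parent id -> reply ids.
REPLIES = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}


def dfs_with_depth(graph, root):
    """Pre-order DFS returning (message_id, depth) pairs."""
    order, stack = [], [(root, 0)]
    while stack:
        node, depth = stack.pop()
        order.append((node, depth))
        # Push replies in reverse so the first reply is visited first.
        for child in reversed(graph.get(node, [])):
            stack.append((child, depth + 1))
    return order


def bfs(graph, root):
    """Level-by-level traversal returning message ids."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


print(dfs_with_depth(REPLIES, 1))  # [(1, 0), (2, 1), (4, 2), (5, 2), (3, 1)]
print(bfs(REPLIES, 1))             # [1, 2, 3, 4, 5]
```

Both functions visit every message once and follow every reply edge once, which is the O(V + E) cost stated above.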
+---
+## API Reference
+### Dashboard REST API
+The web dashboard exposes a REST API for all operations:
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                         REST API ENDPOINTS                               │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                          │
+│  GET  /api/overview           Overview statistics                        │
+│       ?timeframe=month        (today|yesterday|week|month|year|all)      │
+│                                                                          │
+│  GET  /api/users              User leaderboard                           │
+│       ?timeframe=month        Timeframe filter                           │
+│       &limit=100              Max users                                  │
+│                                                                          │
+│  GET  /api/user/<user_id>     User details                              │
+│       ?timeframe=month        Includes hourly activity                   │
+│                                                                          │
+│  GET  /api/search             Full-text search                           │
+│       ?q=search_term          Search query                               │
+│       &timeframe=all          Timeframe filter                           │
+│       &limit=20&offset=0      Pagination                                 │
+│                                                                          │
+│  POST /api/ai/search          AI-powered search                          │
+│       {"query": "..."}        Natural language query                     │
+│                                                                          │
+│  GET  /api/chat/messages      Chat messages                              │
+│       ?limit=50&offset=0      Pagination                                 │
+│       &user_id=...            Filter by user                             │
+│       &from_date=...          Date range                                 │
+│                                                                          │
+│  GET  /api/chat/thread/<id>   Get conversation thread                    │
+│                               Returns full thread with DFS               │
+│                                                                          │
+│  GET  /api/top/domains        Top shared domains                         │
+│  GET  /api/top/mentions       Top mentioned users                        │
+│  GET  /api/top/words          Most frequent words                        │
+│                                                                          │
+│  POST /api/update             Update database with JSON                  │
+│       (multipart form)        File upload                                │
+│                                                                          │
+│  GET  /api/db/stats           Database statistics                        │
+│                               Size, counts, date range                   │
+│                                                                          │
+│  GET  /api/export/users       Export users as CSV                        │
+│  GET  /api/export/messages    Export messages as CSV                     │
+│                                                                          │
+├─────────────────────────────────────────────────────────────────────────┤
+│                    ALGORITHM-POWERED ENDPOINTS                           │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                          │
+│  GET  /api/similar/<id>       Find similar messages (LCS algorithm)      │
+│       ?threshold=0.7          Similarity threshold                       │
+│       &limit=10               Max results                                │
+│       Complexity: O(n*m²)     n=sample, m=avg length                     │
+│                                                                          │
+│  GET  /api/analytics/similar  Find all similar pairs in DB               │
+│       ?threshold=0.8          Similarity threshold                       │
+│       Algorithm: LCS          O(n² * m²) with early termination          │
+│                                                                          │
+│  GET  /api/user/rank/<id>     Get user rank (RankTree)                   │
+│       Complexity: O(log n)    vs O(n) SQL scan                           │
+│                                                                          │
+│  GET  /api/user/by-rank/<k>   Get k-th ranked user (RankTree)            │
+│       Algorithm: select(k)    O(log n)                                   │
+│                                                                          │
+│  GET  /api/analytics/histogram Activity histogram (Bucket Sort)          │
+│       ?bucket=86400           Bucket size in seconds                     │
+│       Complexity: O(n + k)    k=number of buckets                        │
+│                                                                          │
+│  GET  /api/analytics/percentiles Message length stats (Selection)        │
+│       Algorithm: Selection    O(n) worst case (median of medians)        │
+│       Returns: min,max,median,p25,p75,p90,p95,p99                        │
+│                                                                          │
+└─────────────────────────────────────────────────────────────────────────┘
+```
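
The algorithm-powered endpoints above take their tuning values as query-string parameters. A minimal sketch of building such request URLs with the standard library follows. It assumes the dashboard is reachable at `http://localhost:7860` (the port comes from the Dockerfile; the host and the helper name `build_url` are illustrative and not part of the project).

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:7860"  # assumption: local run on the HF Spaces port


def build_url(path: str, **params) -> str:
    """Join BASE_URL, a path, and optional query parameters (None values are dropped)."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}{quote(path)}"
    return f"{url}?{query}" if query else url


# Similar messages for message 548795, using the documented parameters
similar_url = build_url("/api/similar/548795", threshold=0.7, limit=10)
# Daily activity histogram (86400 seconds per bucket)
histogram_url = build_url("/api/analytics/histogram", bucket=86400)
print(similar_url)
print(histogram_url)
```

The resulting strings can be passed to any HTTP client; the response format is not specified here, so it is left out of the sketch.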
+### TelegramSearch
+```python
+from search import TelegramSearch
+with TelegramSearch('telegram.db') as search:
+    # Full-text search ("שלום" means "hello")
+    results = search.search("שלום", limit=50)
+    # With filters ("מילה" means "word")
+    results = search.search(
+        "מילה",
+        user_id="user123",
+        from_date=1704067200,  # Unix timestamp (2024-01-01 UTC)
+        to_date=1735689600,    # Unix timestamp (2025-01-01 UTC)
+        has_links=True
+    )
+    # Fuzzy search ("שלמ" is an inexact spelling of "שלום")
+    results = search.fuzzy_search("שלמ", threshold=0.3)
+    # Get thread (DFS)
+    thread = search.get_thread_dfs(message_id=548795)
+    # Get thread with depth
+    thread = search.get_thread_with_depth(message_id=548795)
+    # Returns: [(message_dict, depth), ...]
+    # Autocomplete usernames
+    suggestions = search.autocomplete_user("@user")
+```
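
`get_thread_with_depth` returns `(message, depth)` pairs. The sketch below shows one way such a traversal can work on an in-memory reply map. It is an illustration only, not the code in `search.py`; the `replies` dictionary and the function name `thread_with_depth` are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def thread_with_depth(root_id: int, replies: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Iterative DFS over a reply tree; returns (message_id, depth) in pre-order."""
    result = []
    visited = set()
    stack = [(root_id, 0)]
    while stack:
        msg_id, depth = stack.pop()
        if msg_id in visited:  # guard against cycles in malformed data
            continue
        visited.add(msg_id)
        result.append((msg_id, depth))
        # Push children in reverse so the first reply is visited first
        for child in reversed(replies.get(msg_id, [])):
            stack.append((child, depth + 1))
    return result


# Hypothetical thread: message 1 has replies 2 and 3; message 2 has reply 4
replies = {1: [2, 3], 2: [4], 3: []}
thread = thread_with_depth(1, replies)
for msg_id, depth in thread:
    print("  " * depth + f"message {msg_id}")
```

An explicit stack avoids Python's recursion limit on very deep reply chains.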
+### TelegramAnalyzer
+```python
+from analyzer import TelegramAnalyzer
+with TelegramAnalyzer('telegram.db') as analyzer:
+    # Statistics
+    stats = analyzer.get_stats()
+    # Top users (Heap-based)
+    top_users = analyzer.get_top_users(limit=10)
+    # Similar messages (LCS)
+    similar = analyzer.find_similar_messages(threshold=0.7)
+    # Percentiles (Selection algorithm)
+    percentiles = analyzer.get_message_length_stats()
+    # Returns: {min, max, median, p25, p75, p90, p95, p99}
+    # User rank (Rank Tree)
+    rank_info = analyzer.get_user_rank("user123")
+    # Returns: {rank, total_users, percentile}
+    # Get user by rank
+    user = analyzer.get_user_by_rank(5)
+    # Histogram (Bucket Sort)
+    hist = analyzer.get_activity_histogram(bucket_size=86400)
+```
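
The "Top users (Heap-based)" call above refers to selecting the k largest counts without sorting every user. A minimal sketch of that idea with the standard library, assuming a plain iterable of sender IDs as input (the function name `top_k_users` and the sample data are hypothetical, not the analyzer's actual API):

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def top_k_users(sender_ids: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent senders as (user_id, count), highest first."""
    counts = Counter(sender_ids)
    # heapq.nlargest keeps a heap of size k: O(u log k) for u distinct users,
    # instead of O(u log u) for a full sort
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])


senders = ["u1", "u2", "u1", "u3", "u1", "u2"]
top_two = top_k_users(senders, 2)
print(top_two)
```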
+---
+## Examples
+### Example 1: Find Most Active Hours
+```python
+from analyzer import TelegramAnalyzer
+with TelegramAnalyzer('telegram.db') as analyzer:
+    hourly = analyzer.get_hourly_activity()
+    # Find peak hour
+    peak_hour = max(hourly, key=hourly.get)
+    print(f"Most active hour: {peak_hour}:00 ({hourly[peak_hour]} messages)")
+```
+### Example 2: Detect Spam/Reposts
+```python
+from analyzer import TelegramAnalyzer
+with TelegramAnalyzer('telegram.db') as analyzer:
+    reposts = analyzer.find_reposts(threshold=0.9)
+    for r in reposts[:10]:
+        print(f"Similarity: {r['similarity']:.0%}")
+        print(f"  User 1: {r['user_1']}")
+        print(f"  User 2: {r['user_2']}")
+        print(f"  Text: {r['text_preview'][:50]}...")
+```
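
Repost detection above relies on a similarity score derived from the longest common subsequence (LCS). The sketch below computes the LCS length with a rolling row and turns it into a ratio. Dividing by the longer string's length is an assumed normalization, and `lcs_similarity` is a hypothetical helper; the project's own scoring may differ.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence: O(m*n) time, O(min(m, n)) space."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1  # keep the DP row as short as possible
    prev = [0] * (len(s1) + 1)
    for ch2 in s2:
        curr = [0]
        for i, ch1 in enumerate(s1, start=1):
            if ch1 == ch2:
                curr.append(prev[i - 1] + 1)
            else:
                curr.append(max(prev[i], curr[i - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(s1: str, s2: str) -> float:
    """LCS length divided by the longer string's length (assumed normalization)."""
    if not s1 or not s2:
        return 0.0
    return lcs_length(s1, s2) / max(len(s1), len(s2))


print(lcs_similarity("abcd", "abxd"))
```

With this normalization, a `threshold=0.9` would mean that at least 90% of the longer message's characters appear, in order, in the other message.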
+### Example 3: Conversation Thread Analysis
+```python
+from search import TelegramSearch
+with TelegramSearch('telegram.db') as search:
+    # Get full thread
+    thread = search.get_thread_with_depth(548795)
+    print("Conversation thread:")
+    for msg, depth in thread:
+        indent = "  " * depth
+        print(f"{indent}[{msg['from_name']}]: {msg['text_plain'][:50]}")
+```
+### Example 4: User Ranking
+```python
+from analyzer import TelegramAnalyzer
+with TelegramAnalyzer('telegram.db') as analyzer:
+    # Get rank of specific user
+    rank = analyzer.get_user_rank("user123456")
+    print(f"Rank: #{rank['rank']} of {rank['total_users']}")
+    print(f"Top {rank['percentile']:.1f}%")
+    # Get top 3 users
+    for i in range(1, 4):
+        user = analyzer.get_user_by_rank(i)
+        print(f"#{i}: {user['name']} ({user['count']} messages)")
+```
+---
+## Performance
+Tested on 100,000 messages:
+| Operation | Time |
+|-----------|------|
+| Indexing | ~10 seconds |
+| Full-text search | <10ms |
+| Fuzzy search | ~100ms |
+| Top-K (k=20) | ~50ms |
+| User rank query | <1ms |
+| Thread traversal | <5ms |
+| Similar messages (1000 sample) | ~2 seconds |
+---
+## License
+MIT License - Free for personal and commercial use.
+---
+## Contributing
+1. Fork the repository
+2. Create a feature branch
+3. Commit your changes
+4. Push and open a pull request
+---
+## Troubleshooting
+### "Module not found" error
+```bash
+# Make sure you're in the telegram directory
+cd /path/to/telegram
+python indexer.py result.json
+```
+### "Database is locked" error
+```bash
+# Close any other programs using the database
+# Or use a different database name
+python indexer.py result.json --db telegram2.db
+```
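
When the lock is only held briefly by another process, letting SQLite wait longer before giving up can also help. A minimal sketch, assuming you open the connection yourself with Python's `sqlite3` module (the helper name `connect_with_timeout` and the 30-second value are illustrative choices, not project settings):

```python
import sqlite3


def connect_with_timeout(db_path: str, seconds: float = 30.0) -> sqlite3.Connection:
    """Open a connection that waits up to `seconds` for a lock before raising."""
    return sqlite3.connect(db_path, timeout=seconds)


# An in-memory database keeps the demonstration self-contained
conn = connect_with_timeout(":memory:")
# PRAGMA busy_timeout reports the configured wait in milliseconds
busy_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
print(busy_ms)
conn.close()
```

A longer timeout does not help if another program keeps the database locked indefinitely; in that case closing it remains the fix.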
+### Hebrew text not displaying correctly
+```bash
+# Ensure your terminal supports UTF-8
+export LANG=en_US.UTF-8
+```
+---
+## Credits
+Algorithms implemented from the "Data Structures and Introduction to Algorithms" course:
+- LCS (Longest Common Subsequence)
+- Heap-based Top-K
+- Selection Algorithm (Median of Medians)
+- Rank Tree (Order Statistics Tree)
+- Bucket Sort
+- DFS/BFS Graph Traversal
+- Bloom Filter
+- Trie (Prefix Tree)
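
As an illustration of the bucket-based approach listed above, the following sketch builds an activity histogram in O(n + k) time for n timestamps and k buckets. It assumes Unix timestamps in seconds as input; the function name `activity_histogram` and its return shape are hypothetical and may not match the analyzer's `get_activity_histogram`.

```python
from typing import Iterable, List, Tuple


def activity_histogram(timestamps: Iterable[int], bucket_size: int = 86400) -> List[Tuple[int, int]]:
    """Return (bucket_start, count) pairs in chronological order, empty buckets included."""
    timestamps = list(timestamps)
    if not timestamps:
        return []
    first = min(timestamps) // bucket_size
    last = max(timestamps) // bucket_size
    counts = [0] * (last - first + 1)  # k buckets
    for ts in timestamps:  # single pass over n items
        counts[ts // bucket_size - first] += 1
    return [((first + i) * bucket_size, c) for i, c in enumerate(counts)]


hist = activity_histogram([0, 10, 86400, 86401, 259200])
print(hist)
```

Because bucket indices are computed directly from each timestamp, no comparison sort is needed and the output is already ordered by time.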

ai_search.py ADDED Viewed

	@@ -0,0 +1,776 @@

+"""
+AI-Powered Search for Telegram Analytics
+Supports: Ollama (local), Groq (free API), Google Gemini (free API)
+"""
+import sqlite3
+import json
+import re
+from datetime import datetime
+from typing import List, Dict, Any, Optional
+import os
+# Try to import AI libraries
+try:
+    import requests
+    HAS_REQUESTS = True
+except ImportError:
+    HAS_REQUESTS = False
+try:
+    from groq import Groq
+    HAS_GROQ = True
+except ImportError:
+    HAS_GROQ = False
+try:
+    import google.generativeai as genai
+    HAS_GEMINI = True
+except ImportError:
+    HAS_GEMINI = False
+class AISearchEngine:
+    """AI-powered natural language search for Telegram messages."""
+    def __init__(self, db_path: str, provider: str = "ollama", api_key: Optional[str] = None):
+        """
+        Initialize AI search engine.
+        Args:
+            db_path: Path to SQLite database
+            provider: "ollama", "groq", or "gemini"
+            api_key: API key for Groq or Gemini (not needed for Ollama)
+        """
+        self.db_path = db_path
+        self.provider = provider
+        self.api_key = api_key or os.getenv(f"{provider.upper()}_API_KEY")
+        # Initialize provider
+        if provider == "groq" and HAS_GROQ:
+            self.client = Groq(api_key=self.api_key)
+            self.model = "llama-3.1-70b-versatile"
+        elif provider == "gemini" and HAS_GEMINI:
+            genai.configure(api_key=self.api_key)
+            # Using 2.5 Flash - free tier, fast, good for SQL
+            self.client = genai.GenerativeModel("gemini-2.5-flash")
+        elif provider == "ollama":
+            self.ollama_url = os.getenv("OLLAMA_URL", "http://localhost:11434")
+            self.model = os.getenv("OLLAMA_MODEL", "llama3.1")
+        else:
+            raise ValueError(f"Provider {provider} not available. Install required packages.")
+    def _get_db_schema(self) -> str:
+        """Dynamically read schema from the actual database to stay in sync."""
+        conn = sqlite3.connect(self.db_path)
+        cursor = conn.cursor()
+        # Get all tables and their columns
+        cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name")
+        tables = [row[0] for row in cursor.fetchall()]
+        schema_parts = ["Database Schema:"]
+        for table in tables:
+            cursor.execute(f"PRAGMA table_info({table})")
+            cols = cursor.fetchall()
+            col_names = [f"{c[1]} ({c[2]})" for c in cols]
+            schema_parts.append(f"  - {table}: {', '.join(col_names)}")
+        # Note virtual tables (FTS5) separately
+        cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND sql LIKE '%fts5%'")
+        fts_tables = [row[0] for row in cursor.fetchall()]
+        if fts_tables:
+            schema_parts.append(f"\n  FTS5 tables (use MATCH for search): {', '.join(fts_tables)}")
+        conn.close()
+        schema_parts.append("""
+        Key notes:
+        - date_unixtime: Unix timestamp (INTEGER), use for date comparisons
+        - date: ISO format string (TEXT), use for display
+        - text_plain: Message text content
+        - text_length: Character count of the message
+        - has_links: 1 if message contains URL, 0 otherwise (note: plural)
+        - has_media: 1 if message has any media attachment
+        - has_photo: 1 if message has a photo specifically
+        - from_id: TEXT user ID (e.g., 'user356173100')
+        - For text search: SELECT * FROM messages WHERE id IN (SELECT rowid FROM messages_fts WHERE messages_fts MATCH 'term')
+        """)
+        return '\n'.join(schema_parts)
+    def _get_sample_data(self) -> str:
+        """Get sample data for context."""
+        conn = sqlite3.connect(self.db_path)
+        cursor = conn.cursor()
+        # Get user list
+        cursor.execute("""
+            SELECT from_name, COUNT(*) as cnt
+            FROM messages
+            WHERE from_name IS NOT NULL
+            GROUP BY from_name
+            ORDER BY cnt DESC
+            LIMIT 10
+        """)
+        users = cursor.fetchall()
+        # Get date range
+        cursor.execute("SELECT MIN(date), MAX(date) FROM messages")
+        date_range = cursor.fetchone()
+        conn.close()
+        return f"""
+        Top users: {', '.join([u[0] for u in users])}
+        Date range: {date_range[0]} to {date_range[1]}
+        """
+    def _build_prompt(self, user_query: str) -> str:
+        """Build prompt for AI model."""
+        schema = self._get_db_schema()
+        sample = self._get_sample_data()
+        return f"""You are a SQL query generator for a Telegram chat database.
+Your task is to convert natural language questions into SQLite queries.
+{schema}
+{sample}
+IMPORTANT RULES:
+1. Return ONLY a valid SQLite query, no explanations
+2. For text search, use: SELECT * FROM messages WHERE id IN (SELECT rowid FROM messages_fts WHERE messages_fts MATCH 'search_term')
+3. FTS5 handles Hebrew text correctly
+4. Always include relevant columns like date, from_name, text_plain
+5. Limit results to 50 unless specified
+6. For "who" questions, GROUP BY from_name and COUNT(*)
+7. For "when" questions, include date in SELECT
+8. For threads/replies, JOIN messages m2 ON m1.reply_to_message_id = m2.id
+User question: {user_query}
+SQLite query:"""
+    def _call_ollama(self, prompt: str) -> str:
+        """Call Ollama API."""
+        if not HAS_REQUESTS:
+            raise ImportError("requests library required for Ollama")
+        response = requests.post(
+            f"{self.ollama_url}/api/generate",
+            json={
+                "model": self.model,
+                "prompt": prompt,
+                "stream": False,
+                "options": {
+                    "temperature": 0.1,
+                    "num_predict": 500
+                }
+            },
+            timeout=60
+        )
+        response.raise_for_status()
+        return response.json()["response"]
+    def _call_groq(self, prompt: str) -> str:
+        """Call Groq API."""
+        response = self.client.chat.completions.create(
+            model=self.model,
+            messages=[{"role": "user", "content": prompt}],
+            temperature=0.1,
+            max_tokens=500
+        )
+        return response.choices[0].message.content
+    def _call_gemini(self, prompt: str) -> str:
+        """Call Google Gemini API."""
+        response = self.client.generate_content(prompt)
+        return response.text
+    def _generate_sql(self, user_query: str) -> str:
+        """Generate SQL from natural language query."""
+        prompt = self._build_prompt(user_query)
+        if self.provider == "ollama":
+            response = self._call_ollama(prompt)
+        elif self.provider == "groq":
+            response = self._call_groq(prompt)
+        elif self.provider == "gemini":
+            response = self._call_gemini(prompt)
+        else:
+            raise ValueError(f"Unknown provider: {self.provider}")
+        # Extract SQL from response
+        sql = response.strip()
+        # Clean up common issues - handle various code block formats
+        sql = re.sub(r'^```\w*\s*', '', sql)  # Remove opening ```sql or ```
+        sql = re.sub(r'\s*```$', '', sql)      # Remove closing ```
+        sql = re.sub(r'^```', '', sql, flags=re.MULTILINE)  # Remove any remaining ```
+        sql = sql.strip()
+        # Try to extract SELECT statement if there's text before it
+        select_match = re.search(r'(SELECT\s+.+?)(?:;|$)', sql, re.IGNORECASE | re.DOTALL)
+        if select_match:
+            sql = select_match.group(1).strip()
+        # Ensure it's a SELECT query for safety
+        if not sql.upper().startswith("SELECT"):
+            raise ValueError(f"AI generated non-SELECT query: {sql[:100]}")
+        return sql
+    def _execute_sql(self, sql: str) -> List[Dict[str, Any]]:
+        """Execute SQL and return results as list of dicts."""
+        conn = sqlite3.connect(self.db_path)
+        conn.row_factory = sqlite3.Row
+        cursor = conn.cursor()
+        try:
+            cursor.execute(sql)
+            rows = cursor.fetchall()
+            results = [dict(row) for row in rows]
+        except sqlite3.Error as e:
+            results = [{"error": str(e), "sql": sql}]
+        finally:
+            conn.close()
+        return results
+    def _generate_answer(self, user_query: str, results: List[Dict], sql: str) -> str:
+        """Generate natural language answer from results."""
+        if not results:
+            return "לא נמצאו תוצאות."  # "No results found."
+        if "error" in results[0]:
+            return f"שגיאה בשאילתה: {results[0]['error']}"  # "Query error: ..."
+        # Build answer prompt
+        results_str = json.dumps(results[:20], ensure_ascii=False, indent=2)
+        answer_prompt = f"""Based on the following query results, provide a concise answer in Hebrew.
+User question: {user_query}
+Query results (JSON):
+{results_str}
+Total results: {len(results)}
+Provide a helpful, concise answer in Hebrew. Include specific names, dates, and numbers from the results.
+If showing a list, format it nicely. Keep it brief but informative."""
+        if self.provider == "ollama":
+            answer = self._call_ollama(answer_prompt)
+        elif self.provider == "groq":
+            answer = self._call_groq(answer_prompt)
+        elif self.provider == "gemini":
+            answer = self._call_gemini(answer_prompt)
+        return answer
+    def context_search(self, query: str, user_name: str = None) -> Dict[str, Any]:
+        """
+        Hybrid context-aware search - combines FTS5 keyword search with AI reasoning.
+        1. AI extracts user name and relevant keywords from query
+        2. FTS5 finds messages matching keywords (fast, searches ALL messages)
+        3. AI reads relevant messages and reasons to find the answer
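+        Example: "באיזה בית חולים האחות עובדת?" ("Which hospital does the nurse work at?")
+        - Extracts: user="האחות" ("the nurse"), keywords=["בית חולים", "עבודה", "מחלקה", "סורוקה", ...]
+          (hospital, work, department, Soroka, ...)
+        - FTS5 finds messages from האחות containing these keywords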
+        - AI reads and infers the answer
+        """
+        try:
+            conn = sqlite3.connect(self.db_path)
+            conn.row_factory = sqlite3.Row
+            # Step 1: AI extracts user name AND relevant keywords
+            extract_prompt = f"""Analyze this question and extract:
+1. USER_NAME: The specific person being asked about (or NONE if not about a specific person)
+2. KEYWORDS: Hebrew keywords to search for in their messages (related to the question topic)
+Question: {query}
+Return in this exact format (one per line):
+USER_NAME: <name or NONE>
+KEYWORDS: <comma-separated keywords in Hebrew>
+Example for "באיזה בית חולים האחות עובדת?":
+USER_NAME: האחות
+KEYWORDS: בית חולים, עבודה, מחלקה, סורוקה, רמבם, איכילוב, שיבא, הדסה, טיפול נמרץ, אחות
+Extract:"""
+            if self.provider == "gemini":
+                extraction = self._call_gemini(extract_prompt).strip()
+            elif self.provider == "groq":
+                extraction = self._call_groq(extract_prompt).strip()
+            else:
+                extraction = self._call_ollama(extract_prompt).strip()
+            # Parse extraction; a caller-supplied user_name is kept unless the model extracts one
+            keywords = []
+            for line in extraction.split('\n'):
+                if line.startswith('USER_NAME:'):
+                    name = line.replace('USER_NAME:', '').strip()
+                    if name.upper() != 'NONE' and len(name) < 50:
+                        user_name = name
+                elif line.startswith('KEYWORDS:'):
+                    kw_str = line.replace('KEYWORDS:', '').strip()
+                    keywords = [k.strip() for k in kw_str.split(',') if k.strip()]
+            messages = []
+            # Step 2: Hybrid retrieval - FTS5 keyword search + recent messages
+            if user_name and keywords:
+                # Build FTS5 query for keywords
+                fts_query = ' OR '.join(keywords[:10])  # Limit to 10 keywords
+                # Search for messages from user containing keywords
+                cursor = conn.execute("""
+                    SELECT date, from_name, text
+                    FROM messages
+                    WHERE from_name LIKE ?
+                    AND id IN (SELECT id FROM messages_fts WHERE messages_fts MATCH ?)
+                    ORDER BY date DESC
+                    LIMIT 100
+                """, (f"%{user_name}%", fts_query))
+                messages = [dict(row) for row in cursor.fetchall()]
+                # Also add some recent messages for context (might contain relevant info without keywords)
+                cursor = conn.execute("""
+                    SELECT date, from_name, text
+                    FROM messages
+                    WHERE from_name LIKE ?
+                    ORDER BY date DESC
+                    LIMIT 50
+                """, (f"%{user_name}%",))
+                recent = [dict(row) for row in cursor.fetchall()]
+                # Combine and deduplicate
+                seen_texts = {m['text'] for m in messages if m['text']}
+                for m in recent:
+                    if m['text'] and m['text'] not in seen_texts:
+                        messages.append(m)
+                        seen_texts.add(m['text'])
+            elif user_name:
+                # No keywords, just get user's messages
+                cursor = conn.execute("""
+                    SELECT date, from_name, text
+                    FROM messages
+                    WHERE from_name LIKE ?
+                    ORDER BY date DESC
+                    LIMIT 200
+                """, (f"%{user_name}%",))
+                messages = [dict(row) for row in cursor.fetchall()]
+            elif keywords:
+                # No user, search all messages for keywords
+                fts_query = ' OR '.join(keywords[:10])
+                cursor = conn.execute("""
+                    SELECT date, from_name, text
+                    FROM messages
+                    WHERE id IN (SELECT id FROM messages_fts WHERE messages_fts MATCH ?)
+                    ORDER BY date DESC
+                    LIMIT 100
+                """, (fts_query,))
+                messages = [dict(row) for row in cursor.fetchall()]
+            else:
+                # Fallback: recent messages
+                cursor = conn.execute("""
+                    SELECT date, from_name, text
+                    FROM messages
+                    WHERE text IS NOT NULL AND text != ''
+                    ORDER BY date DESC
+                    LIMIT 100
+                """)
+                messages = [dict(row) for row in cursor.fetchall()]
+            conn.close()
+            if not messages:
+                return {
+                    "query": query,
+                    "answer": "לא נמצאו הודעות רלוונטיות",  # "No relevant messages found"
+                    "context_messages": 0,
+                    "keywords_used": keywords,
+                    "mode": "context_search"
+                }
+            # Step 3: AI reasons over the retrieved messages
+            context_text = "\n".join([
+                f"[{m['date']}] {m['from_name']}: {m['text'][:500]}"
+                for m in messages if m['text']
+            ])
+            reason_prompt = f"""You are analyzing a Telegram chat history to answer a question.
+Read the messages carefully and infer the answer from context clues.
+The user may not have stated things directly - look for hints, mentions, and implications.
+Question: {query}
+Chat messages (sorted by relevance and date):
+{context_text}
+Based on these messages, answer the question in Hebrew.
+If you can infer information (like workplace, location, profession) from context clues, do so.
+Cite specific messages when possible.
+If you truly cannot find any relevant information, say so.
+Answer:"""
+            if self.provider == "gemini":
+                answer = self._call_gemini(reason_prompt)
+            elif self.provider == "groq":
+                answer = self._call_groq(reason_prompt)
+            else:
+                answer = self._call_ollama(reason_prompt)
+            return {
+                "query": query,
+                "answer": answer,
+                "context_user": user_name,
+                "context_messages": len(messages),
+                "keywords_used": keywords,
+                "mode": "context_search"
+            }
+        except Exception as e:
+            return {
+                "query": query,
+                "error": f"Context search error: {str(e)}",
+                "mode": "context_search"
+            }
+    def search(self, query: str, generate_answer: bool = True) -> Dict[str, Any]:
+        """
+        Perform AI-powered search.
+        Args:
+            query: Natural language question in Hebrew or English
+            generate_answer: Whether to generate natural language answer
+        Returns:
+            Dict with sql, results, and optionally answer
+        """
+        try:
+            # Generate SQL
+            sql = self._generate_sql(query)
+            # Execute query
+            results = self._execute_sql(sql)
+            response = {
+                "query": query,
+                "sql": sql,
+                "results": results,
+                "count": len(results)
+            }
+            # Generate natural language answer
+            if generate_answer and results and "error" not in results[0]:
+                response["answer"] = self._generate_answer(query, results, sql)
+            return response
+        except Exception as e:
+            return {
+                "query": query,
+                "error": str(e),
+                "results": []
+            }
+    def get_thread(self, message_id: int) -> List[Dict[str, Any]]:
+        """Get full conversation thread for a message."""
+        conn = sqlite3.connect(self.db_path)
+        conn.row_factory = sqlite3.Row
+        cursor = conn.cursor()
+        thread = []
+        visited = set()
+        def get_parent(msg_id):
+            """Recursively get parent messages."""
+            if msg_id in visited:
+                return
+            visited.add(msg_id)
+            cursor.execute("""
+                SELECT message_id, date, from_name, text, reply_to_message_id
+                FROM messages WHERE message_id = ?
+            """, (msg_id,))
+            row = cursor.fetchone()
+            if row:
+                if row['reply_to_message_id']:
+                    get_parent(row['reply_to_message_id'])
+                thread.append(dict(row))
+        def get_children(msg_id):
+            """Get all replies to a message."""
+            cursor.execute("""
+                SELECT message_id, date, from_name, text, reply_to_message_id
+                FROM messages WHERE reply_to_message_id = ?
+                ORDER BY date
+            """, (msg_id,))
+            for row in cursor.fetchall():
+                if row['message_id'] not in visited:
+                    visited.add(row['message_id'])
+                    thread.append(dict(row))
+                    get_children(row['message_id'])
+        # Get the original message and its parents
+        get_parent(message_id)
+        # Get all replies
+        get_children(message_id)
+        conn.close()
+        # Sort by date
+        thread.sort(key=lambda x: x['date'])
+        return thread
+    def find_similar_messages(self, message_id: int, limit: int = 10) -> List[Dict[str, Any]]:
+        """Find messages similar to the given message using an FTS5 OR-query over its first five words."""
+        conn = sqlite3.connect(self.db_path)
+        conn.row_factory = sqlite3.Row
+        cursor = conn.cursor()
+        # Get the original message
+        cursor.execute("SELECT text FROM messages WHERE message_id = ?", (message_id,))
+        row = cursor.fetchone()
+        if not row or not row['text']:
+            conn.close()  # release the connection on the early-return path
+            return []
+        # Use FTS5 to find similar messages
+        words = row['text'].split()[:5]  # Use first 5 words
+        search_term = ' OR '.join(words)
+        cursor.execute("""
+            SELECT m.message_id, m.date, m.from_name, m.text
+            FROM messages m
+            WHERE m.id IN (
+                SELECT id FROM messages_fts
+                WHERE messages_fts MATCH ?
+            )
+            AND m.message_id != ?
+            LIMIT ?
+        """, (search_term, message_id, limit))
+        results = [dict(row) for row in cursor.fetchall()]
+        conn.close()
+        return results
+class ChatViewer:
+    """View chat messages like Telegram."""
+    def __init__(self, db_path: str):
+        self.db_path = db_path
+    def get_messages(self,
+                     offset: int = 0,
+                     limit: int = 50,
+                     user_id: str = None,
+                     search: str = None,
+                     date_from: str = None,
+                     date_to: str = None,
+                     has_media: bool = None,
+                     has_link: bool = None) -> Dict[str, Any]:
+        """
+        Get messages with Telegram-like pagination.
+        Returns messages in reverse chronological order (newest first).
+        """
+        conn = sqlite3.connect(self.db_path)
+        conn.row_factory = sqlite3.Row
+        cursor = conn.cursor()
+        # Build query
+        conditions = []
+        params = []
+        if user_id:
+            conditions.append("from_id = ?")
+            params.append(user_id)
+        if date_from:
+            conditions.append("date >= ?")
+            params.append(date_from)
+        if date_to:
+            conditions.append("date <= ?")
+            params.append(date_to)
+        if has_media is not None:
+            if has_media:
+                conditions.append("media_type IS NOT NULL")
+            else:
+                conditions.append("media_type IS NULL")
+        if has_link is not None:
+            conditions.append("has_link = ?")
+            params.append(1 if has_link else 0)
+        # Handle search
+        if search:
+            conditions.append("""id IN (
+                SELECT id FROM messages_fts WHERE messages_fts MATCH ?
+            )""")
+            params.append(search)
+        where_clause = " AND ".join(conditions) if conditions else "1=1"
+        # Get total count
+        cursor.execute(f"SELECT COUNT(*) FROM messages WHERE {where_clause}", params)
+        total = cursor.fetchone()[0]
+        # Get messages
+        query = f"""
+            SELECT
+                m.message_id,
+                m.date,
+                m.from_id,
+                m.from_name,
+                m.text,
+                m.reply_to_message_id,
+                m.forwarded_from,
+                m.media_type,
+                m.has_link,
+                m.char_count,
+                r.from_name as reply_to_name,
+                r.text as reply_to_text
+            FROM messages m
+            LEFT JOIN messages r ON m.reply_to_message_id = r.message_id
+            WHERE {where_clause}
+            ORDER BY m.date DESC
+            LIMIT ? OFFSET ?
+        """
+        params.extend([limit, offset])
+        cursor.execute(query, params)
+        messages = [dict(row) for row in cursor.fetchall()]
+        conn.close()
+        return {
+            "messages": messages,
+            "total": total,
+            "offset": offset,
+            "limit": limit,
+            "has_more": offset + limit < total
+        }
+    def get_message_context(self, message_id: int, before: int = 10, after: int = 10) -> Dict[str, Any]:
+        """Get messages around a specific message (for context view)."""
+        conn = sqlite3.connect(self.db_path)
+        conn.row_factory = sqlite3.Row
+        cursor = conn.cursor()
+        # Get the target message date
+        cursor.execute("SELECT date FROM messages WHERE message_id = ?", (message_id,))
+        row = cursor.fetchone()
+        if not row:
+            return {"messages": [], "target_id": message_id}
+        target_date = row['date']
+        # Get messages before
+        cursor.execute("""
+            SELECT message_id, date, from_id, from_name, text,
+                   reply_to_message_id, media_type, has_link
+            FROM messages
+            WHERE date < ?
+            ORDER BY date DESC
+            LIMIT ?
+        """, (target_date, before))
+        before_msgs = list(reversed([dict(row) for row in cursor.fetchall()]))
+        # Get target message
+        cursor.execute("""
+            SELECT message_id, date, from_id, from_name, text,
+                   reply_to_message_id, media_type, has_link
+            FROM messages
+            WHERE message_id = ?
+        """, (message_id,))
+        target_msg = dict(cursor.fetchone())
+        # Get messages after
+        cursor.execute("""
+            SELECT message_id, date, from_id, from_name, text,
+                   reply_to_message_id, media_type, has_link
+            FROM messages
+            WHERE date > ?
+            ORDER BY date ASC
+            LIMIT ?
+        """, (target_date, after))
+        after_msgs = [dict(row) for row in cursor.fetchall()]
+        conn.close()
+        return {
+            "messages": before_msgs + [target_msg] + after_msgs,
+            "target_id": message_id
+        }
+    def get_user_conversation(self, user1_id: str, user2_id: str, limit: int = 100) -> List[Dict]:
+        """Get conversation between two users (their replies to each other)."""
+        conn = sqlite3.connect(self.db_path)
+        conn.row_factory = sqlite3.Row
+        cursor = conn.cursor()
+        cursor.execute("""
+            SELECT m1.message_id, m1.date, m1.from_id, m1.from_name, m1.text,
+                   m1.reply_to_message_id, m2.from_name as reply_to_name
+            FROM messages m1
+            LEFT JOIN messages m2 ON m1.reply_to_message_id = m2.message_id
+            WHERE (m1.from_id = ? AND m2.from_id = ?)
+               OR (m1.from_id = ? AND m2.from_id = ?)
+            ORDER BY m1.date DESC
+            LIMIT ?
+        """, (user1_id, user2_id, user2_id, user1_id, limit))
+        results = [dict(row) for row in cursor.fetchall()]
+        conn.close()
+        return results
+# CLI for testing
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser(description="AI-powered Telegram search")
+    parser.add_argument("--db", required=True, help="Database path")
+    parser.add_argument("--provider", default="ollama", choices=["ollama", "groq", "gemini"])
+    parser.add_argument("--query", help="Search query")
+    parser.add_argument("--api-key", help="API key for cloud providers")
+    args = parser.parse_args()
+    if args.query:
+        engine = AISearchEngine(args.db, args.provider, args.api_key)
+        result = engine.search(args.query)
+        print(f"\nQuery: {result['query']}")
+        print(f"SQL: {result.get('sql', 'N/A')}")
+        print(f"Results: {result.get('count', 0)}")
+        if 'answer' in result:
+            print(f"\nAnswer:\n{result['answer']}")
+        if result.get('results'):
+            print(f"\nFirst 3 results:")
+            for r in result['results'][:3]:
+                print(json.dumps(r, ensure_ascii=False, indent=2))

algorithms.py ADDED Viewed

	@@ -0,0 +1,819 @@

+#!/usr/bin/env python3
+"""
+Advanced Algorithms Module for Telegram Chat Analysis
+Implements algorithms from Data Structures course:
+- LCS (Longest Common Subsequence) - Similar message detection
+- Heap-based Top-K - Efficient ranking without full sort
+- Selection Algorithm (Median of Medians) - O(n) percentiles
+- Rank Tree (Order Statistics Tree) - O(log n) rank queries
+- Bucket Sort - O(n) time-based histograms
+All algorithms are optimized for the chat indexing use case.
+"""
+import heapq
+from typing import Any, Callable, Generator, Optional
+from collections import defaultdict
+from dataclasses import dataclass
+# ============================================
+# LCS - LONGEST COMMON SUBSEQUENCE
+# ============================================
+def lcs_length(s1: str, s2: str) -> int:
+    """
+    Calculate length of Longest Common Subsequence.
+    Time: O(m * n)
+    Space: O(min(m, n)) - optimized to use less space
+    Use case: Measure similarity between two messages.
+    """
+    # Ensure s1 is the shorter string for space optimization
+    if len(s1) > len(s2):
+        s1, s2 = s2, s1
+    m, n = len(s1), len(s2)
+    # Use two rows instead of full matrix
+    prev = [0] * (m + 1)
+    curr = [0] * (m + 1)
+    for j in range(1, n + 1):
+        for i in range(1, m + 1):
+            if s1[i-1] == s2[j-1]:
+                curr[i] = prev[i-1] + 1
+            else:
+                curr[i] = max(prev[i], curr[i-1])
+        prev, curr = curr, prev
+    return prev[m]
+def lcs_string(s1: str, s2: str) -> str:
+    """
+    Find the actual Longest Common Subsequence string.
+    Time: O(m * n)
+    Space: O(m * n)
+    Use case: Find common content between messages.
+    """
+    m, n = len(s1), len(s2)
+    # Build full DP table
+    dp = [[0] * (n + 1) for _ in range(m + 1)]
+    for i in range(1, m + 1):
+        for j in range(1, n + 1):
+            if s1[i-1] == s2[j-1]:
+                dp[i][j] = dp[i-1][j-1] + 1
+            else:
+                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
+    # Backtrack to find the actual subsequence
+    result = []
+    i, j = m, n
+    while i > 0 and j > 0:
+        if s1[i-1] == s2[j-1]:
+            result.append(s1[i-1])
+            i -= 1
+            j -= 1
+        elif dp[i-1][j] > dp[i][j-1]:
+            i -= 1
+        else:
+            j -= 1
+    return ''.join(reversed(result))
+def lcs_similarity(s1: str, s2: str) -> float:
+    """
+    Calculate LCS-based similarity ratio between two strings.
+    Returns value between 0 (no similarity) and 1 (identical).
+    Use case: Detect near-duplicate messages, reposts.
+    """
+    if not s1 or not s2:
+        return 0.0
+    lcs_len = lcs_length(s1, s2)
+    max_len = max(len(s1), len(s2))
+    return lcs_len / max_len
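As a rough illustration of the ratio computed above, the sketch below recomputes the LCS length with a compact two-row DP and divides by the longer length. The helper names `lcs_len` and `similarity` and the sample strings are assumptions made for this example; they are not part of the module.

```python
def lcs_len(a: str, b: str) -> int:
    """Two-row dynamic programme for the LCS length (illustrative helper)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """LCS length divided by the longer string's length; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return lcs_len(a, b) / max(len(a), len(b))


# "ABCBDAB" and "BDCABA" share a subsequence of length 4 (for example "BCBA").
print(lcs_len("ABCBDAB", "BDCABA"))     # 4
print(similarity("ABCBDAB", "BDCABA"))  # 4 / 7
```

Dividing by the longer length means a short message fully contained in a long one still scores low, which suits repost detection better than substring matching would.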
+def find_similar_messages(
+    messages: list[tuple[int, str]],
+    threshold: float = 0.7,
+    min_length: int = 20
+) -> list[tuple[int, int, float]]:
+    """
+    Find pairs of similar messages using LCS.
+    Args:
+        messages: List of (id, text) tuples
+        threshold: Minimum similarity to report (0-1)
+        min_length: Minimum message length to consider
+    Returns:
+        List of (id1, id2, similarity) tuples
+    Time: O(n² * m) where n=messages, m=avg length
+    """
+    # Filter by length
+    filtered = [(id_, text) for id_, text in messages if len(text) >= min_length]
+    similar_pairs = []
+    n = len(filtered)
+    for i in range(n):
+        for j in range(i + 1, n):
+            id1, text1 = filtered[i]
+            id2, text2 = filtered[j]
+            # Quick length check - if lengths differ too much, skip
+            len_ratio = min(len(text1), len(text2)) / max(len(text1), len(text2))
+            if len_ratio < threshold:
+                continue
+            sim = lcs_similarity(text1, text2)
+            if sim >= threshold:
+                similar_pairs.append((id1, id2, sim))
+    return sorted(similar_pairs, key=lambda x: x[2], reverse=True)
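The quick length check in `find_similar_messages` is safe because an LCS can never be longer than the shorter string, so the similarity is bounded above by `min(len) / max(len)`. A small sketch of that bound follows; `upper_bound` and the sample texts are illustrative names chosen here.

```python
def upper_bound(a: str, b: str) -> float:
    """Largest LCS similarity two strings could reach, given only their lengths."""
    return min(len(a), len(b)) / max(len(a), len(b))


threshold = 0.7
short, long_ = "hello", "hello everyone, how are you?"
bound = upper_bound(short, long_)

# 5 / 28 is far below 0.7, so the O(m * n) LCS computation can be skipped.
print(bound < threshold)  # True
```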
+# ============================================
+# HEAP-BASED TOP-K
+# ============================================
+class TopK:
+    """
+    Efficient Top-K tracker using min-heap.
+    Maintains the K largest elements seen so far.
+    Time: O(n log k) for n insertions
+    Space: O(k)
+    Use case: Top users, top words, top domains without sorting all data.
+    """
+    def __init__(self, k: int, key: Optional[Callable[[Any], float]] = None):
+        """
+        Args:
+            k: Number of top elements to track
+            key: Function to extract comparison value (default: identity)
+        """
+        self.k = k
+        self.key = key or (lambda x: x)
+        self.heap: list[tuple[float, int, Any]] = []  # (key_value, counter, item)
+        self.counter = 0  # For stable sorting
+    def push(self, item: Any) -> None:
+        """Add an item. O(log k)."""
+        if self.k <= 0:
+            return  # Nothing to track; avoids indexing an empty heap below
+        key_val = self.key(item)
+        if len(self.heap) < self.k:
+            heapq.heappush(self.heap, (key_val, self.counter, item))
+        elif key_val > self.heap[0][0]:
+            heapq.heapreplace(self.heap, (key_val, self.counter, item))
+        self.counter += 1
+    def get_top(self) -> list[Any]:
+        """Get top K items sorted by key descending. O(k log k)."""
+        return [item for _, _, item in sorted(self.heap, reverse=True)]
+    def __len__(self) -> int:
+        return len(self.heap)
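For one-shot queries where all items are already in hand, the standard library's `heapq.nlargest` gives the same bounded-heap behaviour as `TopK`. The sketch below uses made-up `(name, score)` pairs; `TopK` remains the better fit when items arrive incrementally.

```python
import heapq

# Hypothetical (user, message_count) pairs, used only for this illustration.
scores = [("alice", 120), ("bob", 45), ("carol", 300), ("dave", 87)]

# heapq.nlargest keeps at most k candidates internally and
# returns them ordered from largest to smallest key.
top_two = heapq.nlargest(2, scores, key=lambda pair: pair[1])
print(top_two)  # [('carol', 300), ('alice', 120)]
```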
+def top_k_frequent(items: list[Any], k: int) -> list[tuple[Any, int]]:
+    """
+    Find top K most frequent items.
+    Time: O(n + m log k) where n=items, m=unique items
+    Space: O(m)
+    Use case: Top words, top users, top mentioned usernames.
+    """
+    # Count frequencies
+    freq = defaultdict(int)
+    for item in items:
+        freq[item] += 1
+    # Use heap to find top K
+    top = TopK(k, key=lambda x: x[1])
+    for item, count in freq.items():
+        top.push((item, count))
+    return top.get_top()
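As a cross-check for `top_k_frequent`, the same count-then-select pattern is available through `collections.Counter.most_common`. This is a sketch with sample data chosen so that no two counts tie; tie ordering may differ between the two approaches.

```python
from collections import Counter

# Sample items invented for the example: apple x3, banana x2, cherry x1, date x1.
items = ["apple", "banana", "apple", "cherry", "banana", "apple", "date"]

# Counter tallies frequencies in O(n); most_common(k) then selects the k largest.
top_two = Counter(items).most_common(2)
print(top_two)  # [('apple', 3), ('banana', 2)]
```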
+def top_k_by_field(
+    records: list[dict],
+    field: str,
+    k: int,
+    reverse: bool = True
+) -> list[dict]:
+    """
+    Get top K records by a specific field value.
+    Time: O(n log k)
+    Use case: Top messages by length, top users by message count.
+    """
+    if reverse:
+        # Max K - use min heap
+        top = TopK(k, key=lambda x: x.get(field, 0))
+    else:
+        # Min K - negate the key
+        top = TopK(k, key=lambda x: -x.get(field, 0))
+    for record in records:
+        top.push(record)
+    return top.get_top()
+# ============================================
+# SELECTION ALGORITHM (MEDIAN OF MEDIANS)
+# ============================================
+def partition(arr: list, left: int, right: int, pivot_idx: int) -> int:
+    """
+    Partition array around pivot (Lomuto scheme).
+    Returns final position of pivot.
+    """
+    pivot_val = arr[pivot_idx]
+    # Move pivot to end
+    arr[pivot_idx], arr[right] = arr[right], arr[pivot_idx]
+    store_idx = left
+    for i in range(left, right):
+        if arr[i] < pivot_val:
+            arr[store_idx], arr[i] = arr[i], arr[store_idx]
+            store_idx += 1
+    # Move pivot to final position
+    arr[store_idx], arr[right] = arr[right], arr[store_idx]
+    return store_idx
+def median_of_five(arr: list, left: int, right: int) -> int:
+    """Find median of up to 5 elements, return its index."""
+    sub = [(arr[i], i) for i in range(left, right + 1)]
+    sub.sort()
+    return sub[len(sub) // 2][1]
+def median_of_medians(arr: list, left: int, right: int) -> int:
+    """
+    Find approximate median using median-of-medians algorithm.
+    Returns index of the pivot.
+    """
+    n = right - left + 1
+    if n <= 5:
+        return median_of_five(arr, left, right)
+    # Divide into groups of 5 and find medians
+    medians = []
+    for i in range(left, right + 1, 5):
+        group_right = min(i + 4, right)
+        median_idx = median_of_five(arr, i, group_right)
+        medians.append(arr[median_idx])
+    # Pick the median of the group medians. The textbook algorithm recurses
+    # here; sorting the ~n/5 medians is simpler but costs O(n log n).
+    medians.sort()
+    pivot_val = medians[len(medians) // 2]
+    # Find index of this value in original array
+    for i in range(left, right + 1):
+        if arr[i] == pivot_val:
+            return i
+    return left  # Fallback
+def quickselect(arr: list, k: int) -> Any:
+    """
+    Find the k-th smallest element (0-indexed).
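+    Time: O(n) on typical input. The pivot step sorts the group medians instead
+    of recursing, so the strict O(n) worst-case bound of median-of-medians does
+    not apply, and inputs dominated by equal values partition poorly.
+    Space: O(n) - works on a copy of the input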
+    Use case: Find median, percentiles without sorting.
+    """
+    arr = arr.copy()  # Don't modify original
+    left, right = 0, len(arr) - 1
+    while left < right:
+        # Use median of medians for pivot selection
+        pivot_idx = median_of_medians(arr, left, right)
+        pivot_idx = partition(arr, left, right, pivot_idx)
+        if k == pivot_idx:
+            return arr[k]
+        elif k < pivot_idx:
+            right = pivot_idx - 1
+        else:
+            left = pivot_idx + 1
+    return arr[left]
+def find_median(arr: list) -> float:
+    """
+    Find median in O(n) time.
+    Use case: Median message length, median activity time.
+    """
+    n = len(arr)
+    if n == 0:
+        return 0.0
+    if n % 2 == 1:
+        return float(quickselect(arr, n // 2))
+    else:
+        return (quickselect(arr, n // 2 - 1) + quickselect(arr, n // 2)) / 2
+def find_percentile(arr: list, p: float) -> float:
+    """
+    Find the p-th percentile (0-100) in O(n) time.
+    Use case: 90th percentile response time, activity distribution.
+    """
+    if not arr:
+        return 0.0
+    k = int((p / 100) * (len(arr) - 1))
+    return float(quickselect(arr, k))
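The index rule in `find_percentile` is `k = int(p / 100 * (n - 1))`, which rounds down and does not interpolate between neighbours. A sort-based reference with the same rule, handy for checking the selection code on small inputs, might look like this (`percentile_by_sorting` is a name invented for the sketch):

```python
def percentile_by_sorting(values: list, p: float) -> float:
    """Reference version: sort, then index with the same floor rule."""
    if not values:
        return 0.0
    ordered = sorted(values)
    k = int((p / 100) * (len(ordered) - 1))
    return float(ordered[k])


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
# Sorted: [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]
print(percentile_by_sorting(data, 50))  # 4.0 (index 5)
print(percentile_by_sorting(data, 90))  # 6.0 (index 9)
```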
+# ============================================
+# RANK TREE (ORDER STATISTICS TREE)
+# ============================================
+@dataclass
+class RankTreeNode:
+    """Node in an Order Statistics Tree (augmented BST)."""
+    key: Any
+    value: Any = None
+    left: Optional['RankTreeNode'] = None
+    right: Optional['RankTreeNode'] = None
+    size: int = 1  # Size of subtree (for rank queries)
+    height: int = 1  # For AVL balancing
+class RankTree:
+    """
+    Order Statistics Tree with AVL balancing.
+    Supports:
+    - O(log n) insert, delete, search
+    - O(log n) select(k) - find k-th smallest
+    - O(log n) rank(x) - find rank of element x
+    Use case: "What rank is this user?", "Who is the 100th most active?"
+    """
+    def __init__(self, key_func: Callable[[Any], Any] = None):
+        self.root: Optional[RankTreeNode] = None
+        self.key_func = key_func or (lambda x: x)
+    def _get_size(self, node: RankTreeNode) -> int:
+        return node.size if node else 0
+    def _get_height(self, node: RankTreeNode) -> int:
+        return node.height if node else 0
+    def _get_balance(self, node: RankTreeNode) -> int:
+        return self._get_height(node.left) - self._get_height(node.right) if node else 0
+    def _update(self, node: RankTreeNode) -> None:
+        """Update size and height of a node."""
+        if node:
+            node.size = 1 + self._get_size(node.left) + self._get_size(node.right)
+            node.height = 1 + max(self._get_height(node.left), self._get_height(node.right))
+    def _rotate_right(self, y: RankTreeNode) -> RankTreeNode:
+        """Right rotation for AVL balance."""
+        x = y.left
+        T2 = x.right
+        x.right = y
+        y.left = T2
+        self._update(y)
+        self._update(x)
+        return x
+    def _rotate_left(self, x: RankTreeNode) -> RankTreeNode:
+        """Left rotation for AVL balance."""
+        y = x.right
+        T2 = y.left
+        y.left = x
+        x.right = T2
+        self._update(x)
+        self._update(y)
+        return y
+    def _balance(self, node: RankTreeNode) -> RankTreeNode:
+        """Balance the node if needed (AVL)."""
+        self._update(node)
+        balance = self._get_balance(node)
+        # Left heavy
+        if balance > 1:
+            if self._get_balance(node.left) < 0:
+                node.left = self._rotate_left(node.left)
+            return self._rotate_right(node)
+        # Right heavy
+        if balance < -1:
+            if self._get_balance(node.right) > 0:
+                node.right = self._rotate_right(node.right)
+            return self._rotate_left(node)
+        return node
+    def insert(self, key: Any, value: Any = None) -> None:
+        """Insert a key-value pair. O(log n)."""
+        self.root = self._insert(self.root, key, value)
+    def _insert(self, node: RankTreeNode, key: Any, value: Any) -> RankTreeNode:
+        if not node:
+            return RankTreeNode(key=key, value=value)
+        if key < node.key:
+            node.left = self._insert(node.left, key, value)
+        elif key > node.key:
+            node.right = self._insert(node.right, key, value)
+        else:
+            node.value = value  # Update existing
+            return node
+        return self._balance(node)
+    def select(self, k: int) -> Optional[Any]:
+        """
+        Find the k-th smallest element (1-indexed).
+        O(log n)
+        Use case: "Who is the 10th most active user?"
+        """
+        return self._select(self.root, k)
+    def _select(self, node: RankTreeNode, k: int) -> Optional[Any]:
+        if not node:
+            return None
+        left_size = self._get_size(node.left)
+        if k == left_size + 1:
+            return node.value
+        elif k <= left_size:
+            return self._select(node.left, k)
+        else:
+            return self._select(node.right, k - left_size - 1)
+    def rank(self, key: Any) -> int:
+        """
+        Find the rank of an element (1-indexed).
+        O(log n)
+        Use case: "What rank is user X?"
+        """
+        return self._rank(self.root, key)
+    def _rank(self, node: RankTreeNode, key: Any) -> int:
+        if not node:
+            return 0
+        if key < node.key:
+            return self._rank(node.left, key)
+        elif key > node.key:
+            return 1 + self._get_size(node.left) + self._rank(node.right, key)
+        else:
+            return self._get_size(node.left) + 1
+    def __len__(self) -> int:
+        return self._get_size(self.root)
+    def inorder(self) -> Generator[tuple[Any, Any], None, None]:
+        """Iterate in sorted order."""
+        def _inorder(node):
+            if node:
+                yield from _inorder(node.left)
+                yield (node.key, node.value)
+                yield from _inorder(node.right)
+        yield from _inorder(self.root)
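The meaning of `select` and `rank` can be pinned down with a simple reference model: a sorted list of keys plus `bisect`. The sketch below is not the tree itself, only a stand-in that should agree with it for keys that are present; the sample scores and the function names `select_ref` and `rank_ref` are assumptions for illustration.

```python
import bisect

# Hypothetical score -> name mapping used only for this example.
scores = {100: "Alice", 250: "Bob", 50: "Charlie", 300: "Diana", 150: "Eve"}
keys = sorted(scores)  # [50, 100, 150, 250, 300]


def select_ref(k: int):
    """Value stored under the k-th smallest key (1-indexed), or None."""
    return scores[keys[k - 1]] if 1 <= k <= len(keys) else None


def rank_ref(key) -> int:
    """1-indexed position of a key that is present in the collection."""
    return bisect.bisect_left(keys, key) + 1


print(select_ref(3))  # Eve (key 150 is the third smallest)
print(rank_ref(150))  # 3
```

The list model needs O(n) per insertion, which is the cost the balanced tree avoids.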
+# ============================================
+# BUCKET SORT FOR TIME-BASED DATA
+# ============================================
+def bucket_sort_by_time(
+    records: list[dict],
+    time_field: str,
+    bucket_size: int = 3600,  # Default: 1 hour
+    start_time: int = None,
+    end_time: int = None
+) -> list[list[dict]]:
+    """
+    Sort records into time-based buckets.
+    Time: O(n + k) where k = number of buckets
+    Space: O(n + k)
+    Use case: Group messages by hour, day, week for histograms.
+    Args:
+        records: List of dicts with timestamp field
+        time_field: Name of the timestamp field
+        bucket_size: Size of each bucket in seconds
+        start_time: Start of range (default: min timestamp)
+        end_time: End of range (default: max timestamp)
+    Returns:
+        List of buckets, each containing records in that time range
+    """
+    if not records:
+        return []
+    # Extract timestamps
+    timestamps = [r.get(time_field, 0) for r in records]
+    if start_time is None:
+        start_time = min(timestamps)
+    if end_time is None:
+        end_time = max(timestamps)
+    # Calculate number of buckets
+    n_buckets = max(1, (end_time - start_time) // bucket_size + 1)
+    # Initialize buckets
+    buckets: list[list[dict]] = [[] for _ in range(n_buckets)]
+    # Distribute records into buckets
+    for record in records:
+        ts = record.get(time_field, 0)
+        if ts < start_time or ts > end_time:
+            continue
+        bucket_idx = min((ts - start_time) // bucket_size, n_buckets - 1)
+        buckets[bucket_idx].append(record)
+    return buckets
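The bucket arithmetic can be followed by hand on a tiny input. This sketch reuses the index formula `(ts - start) // bucket_size` from `bucket_sort_by_time` on five made-up timestamps and counts how many land in each bucket.

```python
timestamps = [1000, 1500, 2500, 1200, 3000]  # sample values for illustration
bucket_size = 1000

start = min(timestamps)
n_buckets = (max(timestamps) - start) // bucket_size + 1  # (3000 - 1000) // 1000 + 1 = 3

counts = [0] * n_buckets
for ts in timestamps:
    counts[(ts - start) // bucket_size] += 1

# 1000, 1500, 1200 -> bucket 0; 2500 -> bucket 1; 3000 -> bucket 2
print(counts)  # [3, 1, 1]
```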
+def time_histogram(
+    records: list[dict],
+    time_field: str,
+    bucket_size: int = 3600
+) -> list[tuple[int, int]]:
+    """
+    Create a histogram of record counts over time.
+    Returns list of (bucket_start_time, count) tuples.
+    Use case: Activity over time visualization.
+    """
+    if not records:
+        return []
+    timestamps = [r.get(time_field, 0) for r in records]
+    start_time = min(timestamps)
+    end_time = max(timestamps)
+    buckets = bucket_sort_by_time(records, time_field, bucket_size, start_time, end_time)
+    result = []
+    for i, bucket in enumerate(buckets):
+        bucket_time = start_time + i * bucket_size
+        result.append((bucket_time, len(bucket)))
+    return result
+def hourly_distribution(
+    records: list[dict],
+    time_field: str
+) -> dict[int, int]:
+    """
+    Get distribution of records by hour of day (0-23).
+    Time: O(n)
+    Use case: When are users most active?
+    """
+    from datetime import datetime
+    dist = defaultdict(int)
+    for record in records:
+        ts = record.get(time_field, 0)
+        if ts:
+            hour = datetime.fromtimestamp(ts).hour
+            dist[hour] += 1
+    return dict(dist)
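Note that `datetime.fromtimestamp(ts)` without a `tz` argument converts to the server's local time zone, so the hours depend on where the code runs. If a fixed zone is preferred, one possible variant (a sketch; `hourly_distribution_utc` is a hypothetical name) pins the conversion to UTC:

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_distribution_utc(timestamps: list) -> dict:
    """Count timestamps per hour of day (0-23), always interpreted as UTC."""
    hours = (
        datetime.fromtimestamp(ts, tz=timezone.utc).hour
        for ts in timestamps
        if ts  # skip missing or zero timestamps, as the original does
    )
    return dict(Counter(hours))


# 3600 and 3700 fall in hour 1, 7200 in hour 2, 90000 is 01:00 on the next day.
print(hourly_distribution_utc([3600, 3700, 7200, 90000]))  # {1: 3, 2: 1}
```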
+# ============================================
+# COMBINED DATA STRUCTURE: RANKED TIME INDEX
+# ============================================
+class RankedTimeIndex:
+    """
+    Combined data structure for efficient time-based and rank queries.
+    Combines:
+    - Bucket sort for O(1) time range access
+    - Rank tree for O(log n) rank queries
+    - Top-K heap for efficient top queries
+    Use case: "Top 10 users in the last hour", "Rank of user X this week"
+    """
+    def __init__(self, bucket_size: int = 3600, time_field: str = 'date_unixtime'):
+        self.bucket_size = bucket_size
+        self.time_field = time_field  # Timestamp field shared by add() and get_time_range()
+        self.buckets: dict[int, list[dict]] = defaultdict(list)  # bucket_id -> records
+        self.rank_tree = RankTree()  # For rank queries
+        self.total_count = 0
+        self.min_time = float('inf')
+        self.max_time = 0
+    def add(self, record: dict, time_field: Optional[str] = None, rank_field: str = None) -> None:
+        """Add a record to the index. O(log n)."""
+        if time_field is not None:
+            # Remember the field so get_time_range() filters on the same one
+            self.time_field = time_field
+        ts = record.get(self.time_field, 0)
+        # Update time bounds
+        self.min_time = min(self.min_time, ts)
+        self.max_time = max(self.max_time, ts)
+        # Add to time bucket
+        bucket_id = ts // self.bucket_size
+        self.buckets[bucket_id].append(record)
+        # Add to rank tree if rank field specified
+        if rank_field and rank_field in record:
+            self.rank_tree.insert(record[rank_field], record)
+        self.total_count += 1
+    def get_time_range(self, start_time: int, end_time: int) -> list[dict]:
+        """
+        Get all records in time range.
+        O(b + m) where b = buckets spanned and m = records in those buckets.
+        """
+        start_bucket = start_time // self.bucket_size
+        end_bucket = end_time // self.bucket_size
+        results = []
+        for bucket_id in range(start_bucket, end_bucket + 1):
+            for record in self.buckets.get(bucket_id, []):
+                # Filter on the same field that add() used for bucketing
+                ts = record.get(self.time_field, 0)
+                if start_time <= ts <= end_time:
+                    results.append(record)
+        return results
+    def top_k_in_range(
+        self,
+        start_time: int,
+        end_time: int,
+        k: int,
+        score_field: str
+    ) -> list[dict]:
+        """
+        Get top K records by score in time range.
+        O(m log k) where m = records in range
+        """
+        records = self.get_time_range(start_time, end_time)
+        return top_k_by_field(records, score_field, k)
+    def get_rank(self, key: Any) -> int:
+        """Get rank of element. O(log n)."""
+        return self.rank_tree.rank(key)
+    def get_by_rank(self, k: int) -> Optional[dict]:
+        """Get element by rank. O(log n)."""
+        return self.rank_tree.select(k)
+# ============================================
+# TESTS AND DEMOS
+# ============================================
+def run_tests():
+    """Run tests for all algorithms."""
+    print("=" * 60)
+    print("ALGORITHM TESTS")
+    print("=" * 60)
+    # Test LCS
+    print("\n--- LCS (Longest Common Subsequence) ---")
+    s1 = "שלום לכולם מה קורה"  # Hebrew: "Hello everyone, what's up"
+    s2 = "שלום לכולם מה נשמע"  # Hebrew: "Hello everyone, how are things"
+    lcs = lcs_string(s1, s2)
+    sim = lcs_similarity(s1, s2)
+    print(f"String 1: {s1}")
+    print(f"String 2: {s2}")
+    print(f"LCS: '{lcs}'")
+    print(f"Similarity: {sim:.2%}")
+    # Test similar message detection
+    messages = [
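+        (1, "היי מה קורה איך אתה"),  # "Hey, what's up, how are you" (masculine singular)
+        (2, "היי מה קורה איך את"),  # same greeting, feminine singular
+        (3, "שלום לכולם"),  # "Hello everyone"
+        (4, "היי מה קורה איך אתם"),  # same greeting, plural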
+    ]
+    similar = find_similar_messages(messages, threshold=0.7, min_length=5)
+    print(f"\nSimilar message pairs (threshold 0.7):")
+    for id1, id2, sim in similar:
+        print(f"  Messages {id1} & {id2}: {sim:.2%}")
+    # Test Top-K
+    print("\n--- Heap-based Top-K ---")
+    items = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple', 'date', 'banana']
+    top = top_k_frequent(items, k=2)
+    print(f"Items: {items}")
+    print(f"Top 2 frequent: {top}")
+    # Test Selection (Median)
+    print("\n--- Selection Algorithm (Median) ---")
+    arr = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
+    median = find_median(arr)
+    p90 = find_percentile(arr, 90)
+    print(f"Array: {arr}")
+    print(f"Median: {median}")
+    print(f"90th percentile: {p90}")
+    # Test Rank Tree
+    print("\n--- Rank Tree (Order Statistics) ---")
+    tree = RankTree()
+    users = [
+        (100, "Alice"),
+        (250, "Bob"),
+        (50, "Charlie"),
+        (300, "Diana"),
+        (150, "Eve"),
+    ]
+    for score, name in users:
+        tree.insert(score, name)
+    print(f"Users by score: {users}")
+    print(f"3rd ranked (by score): {tree.select(3)}")
+    print(f"Rank of score 150: {tree.rank(150)}")
+    print(f"All in order: {list(tree.inorder())}")
+    # Test Bucket Sort
+    print("\n--- Bucket Sort (Time-based) ---")
+    records = [
+        {'id': 1, 'ts': 1000},
+        {'id': 2, 'ts': 1500},
+        {'id': 3, 'ts': 2500},
+        {'id': 4, 'ts': 1200},
+        {'id': 5, 'ts': 3000},
+    ]
+    hist = time_histogram(records, 'ts', bucket_size=1000)
+    print(f"Records: {records}")
+    print(f"Histogram (bucket=1000): {hist}")
+    # Test Combined Structure
+    print("\n--- Combined RankedTimeIndex ---")
+    index = RankedTimeIndex(bucket_size=1000)
+    for r in records:
+        index.add(r, time_field='ts', rank_field='id')
+    range_result = index.get_time_range(1000, 2000)
+    print(f"Records in time range 1000-2000: {[r['id'] for r in range_result]}")
+    print("\n" + "=" * 60)
+    print("ALL DEMOS COMPLETED (no assertions are checked)")
+    print("=" * 60)
+if __name__ == '__main__':
+    run_tests()

dashboard.py ADDED Viewed

	@@ -0,0 +1,2086 @@

+#!/usr/bin/env python3
+"""
+Telegram Analytics Dashboard - Web Server
+A Flask-based web dashboard for visualizing Telegram chat analytics.
+Inspired by Combot and other Telegram statistics bots.
+Usage:
+    python dashboard.py --db telegram.db --port 5000
+    Then open http://localhost:5000 in your browser
+Requirements:
+    pip install flask
+"""
+import sqlite3
+import json
+import csv
+import io
+import os
+from datetime import datetime, timedelta
+from flask import Flask, render_template, jsonify, request, Response
+from typing import Optional
+from collections import defaultdict
+# ==========================================
+# AI CONFIGURATION
+# Set via environment variables (e.g. in .env or hosting platform settings)
+# ==========================================
+if not os.environ.get('AI_PROVIDER'):
+    os.environ['AI_PROVIDER'] = 'gemini'
+# GEMINI_API_KEY should be set as an environment variable, not hardcoded
+# Import our algorithms
+from algorithms import (
+    TopK, find_median, find_percentile, top_k_frequent,
+    RankTree, lcs_similarity, find_similar_messages,
+    bucket_sort_by_time, time_histogram, RankedTimeIndex
+)
+# Import semantic search (uses pre-computed embeddings)
+try:
+    from semantic_search import get_semantic_search
+    HAS_SEMANTIC_SEARCH = True
+except ImportError:
+    HAS_SEMANTIC_SEARCH = False
+    get_semantic_search = None
+app = Flask(__name__)
+DB_PATH = os.environ.get('DB_PATH', 'telegram.db')  # Honors ENV DB_PATH set in the Dockerfile
+def get_db():
+    """Get database connection."""
+    conn = sqlite3.connect(DB_PATH)
+    conn.row_factory = sqlite3.Row
+    return conn
+def parse_timeframe(timeframe: str) -> tuple[int, int]:
+    """Parse timeframe string to Unix timestamps."""
+    now = datetime.now()
+    today_start = datetime(now.year, now.month, now.day)
+    if timeframe == 'today':
+        start = today_start
+        end = now
+    elif timeframe == 'yesterday':
+        start = today_start - timedelta(days=1)
+        end = today_start
+    elif timeframe == 'week':
+        start = today_start - timedelta(days=7)
+        end = now
+    elif timeframe == 'month':
+        start = today_start - timedelta(days=30)
+        end = now
+    elif timeframe == 'year':
+        start = today_start - timedelta(days=365)
+        end = now
+    elif timeframe == 'all':
+        return 0, int(now.timestamp())
+    else:
+        # Custom range: "start,end" as Unix timestamps
+        try:
+            parts = timeframe.split(',')
+            return int(parts[0]), int(parts[1])
+        except (ValueError, IndexError):
+            return 0, int(now.timestamp())
+    return int(start.timestamp()), int(end.timestamp())
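The custom-range branch of `parse_timeframe` accepts `"start,end"` as Unix timestamps and falls back to the full range on malformed input. A standalone sketch of that branch with explicit exception handling is shown below; `parse_custom_range` and its `now_ts` parameter are hypothetical names, and unlike the original this version also rejects strings with more than two parts.

```python
def parse_custom_range(timeframe: str, now_ts: int) -> tuple[int, int]:
    """Parse 'start,end' Unix timestamps; fall back to (0, now_ts) if malformed."""
    try:
        # Unpacking raises ValueError unless there are exactly two parts,
        # and int() raises ValueError for non-numeric text.
        start_text, end_text = timeframe.split(",")
        return int(start_text), int(end_text)
    except ValueError:
        return 0, now_ts


print(parse_custom_range("1700000000,1700086400", 1800000000))  # (1700000000, 1700086400)
print(parse_custom_range("last-tuesday", 1800000000))           # (0, 1800000000)
```

Catching only `ValueError` keeps `KeyboardInterrupt` and genuine programming errors visible, which a bare `except:` would hide.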
+# ==========================================
+# CACHE INVALIDATION SYSTEM
+# ==========================================
+_cache_version = 0  # Incremented on DB updates to invalidate all caches
+def invalidate_caches():
+    """Invalidate all cached data. Call after DB updates (sync, import, etc.)."""
+    global _cache_version, _user_rank_tree, _user_rank_tree_timeframe
+    _cache_version += 1
+    _user_rank_tree = None
+    _user_rank_tree_timeframe = None
+# ==========================================
+# GLOBAL ALGORITHM CACHES
+# ==========================================
+# RankTree for O(log n) user ranking - rebuilt on demand
+_user_rank_tree = None
+_user_rank_tree_timeframe = None
+_user_rank_tree_version = -1
+def get_user_rank_tree(timeframe: str):
+    """
+    Get or rebuild the user rank tree for efficient O(log n) rank queries.
+    Tree is cached and rebuilt only when timeframe or DB version changes.
+    """
+    global _user_rank_tree, _user_rank_tree_timeframe, _user_rank_tree_version
+    if (_user_rank_tree is not None
+            and _user_rank_tree_timeframe == timeframe
+            and _user_rank_tree_version == _cache_version):
+        return _user_rank_tree
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT from_id, from_name, COUNT(*) as message_count
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND from_id IS NOT NULL AND from_id != ''
+        GROUP BY from_id
+        ORDER BY message_count DESC
+    ''', (start_ts, end_ts))
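+    _user_rank_tree = RankTree()
+    # Keys are negated counts so that rank 1 is the most active user.
+    # NOTE: RankTree.insert replaces the value when a key already exists, so users
+    # with identical message counts end up sharing a single node.
+    for row in cursor.fetchall():
+        _user_rank_tree.insert(
+            -row['message_count'],
+            {'user_id': row['from_id'], 'name': row['from_name'], 'messages': row['message_count']}
+        )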
+    conn.close()
+    _user_rank_tree_timeframe = timeframe
+    _user_rank_tree_version = _cache_version
+    return _user_rank_tree
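Because the tree is keyed on `-message_count` alone and `RankTree.insert` overwrites the value on an equal key, two users with the same count collapse into one entry. One possible remedy, sketched here with a plain dict and a sorted list standing in for the tree, is a composite key of `(-count, user_id)`. The sample rows are invented, and any caller that looks up ranks by count alone would need to change as well.

```python
import bisect

# Hypothetical (user_id, name, message_count) rows; u1 and u2 tie on count.
rows = [("u1", "Alice", 40), ("u2", "Bob", 40), ("u3", "Carol", 25)]

# Keyed on the negated count alone: the second tied user replaces the first.
by_count = {}
for user_id, name, count in rows:
    by_count[-count] = name
print(len(by_count))  # 2 entries for 3 users

# Keyed on (negated count, user_id): every user keeps a distinct position.
composite = sorted((-count, user_id) for user_id, _, count in rows)
rank_of_u2 = bisect.bisect_left(composite, (-40, "u2")) + 1
print(rank_of_u2)  # 2
```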
+# ==========================================
+# PAGE ROUTES
+# ==========================================
+@app.route('/')
+def index():
+    """Main dashboard page."""
+    return render_template('index.html')
+@app.route('/users')
+def users_page():
+    """User leaderboard page."""
+    return render_template('users.html')
+@app.route('/moderation')
+def moderation_page():
+    """Moderation analytics page."""
+    return render_template('moderation.html')
+@app.route('/search')
+def search_page():
+    """Search page."""
+    return render_template('search.html')
+@app.route('/chat')
+def chat_page():
+    """Chat view page - Telegram-like interface."""
+    return render_template('chat.html')
+@app.route('/user/<user_id>')
+def user_profile_page(user_id):
+    """User profile page with comprehensive statistics."""
+    return render_template('user_profile.html', user_id=user_id)
+@app.route('/settings')
+def settings_page():
+    """Settings and data update page."""
+    return render_template('settings.html')
+# ==========================================
+# API ENDPOINTS - OVERVIEW STATS
+# ==========================================
+@app.route('/api/overview')
+def api_overview():
+    """Get overview statistics."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    # Total messages
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+    ''', (start_ts, end_ts))
+    total_messages = cursor.fetchone()[0]
+    # Active users
+    cursor = conn.execute('''
+        SELECT COUNT(DISTINCT from_id) FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+    ''', (start_ts, end_ts))
+    active_users = cursor.fetchone()[0]
+    # Total users (all time)
+    cursor = conn.execute('SELECT COUNT(*) FROM users')
+    total_users = cursor.fetchone()[0]
+    # Date range
+    cursor = conn.execute('''
+        SELECT MIN(date_unixtime), MAX(date_unixtime) FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+    ''', (start_ts, end_ts))
+    row = cursor.fetchone()
+    first_msg = row[0] or start_ts
+    last_msg = row[1] or end_ts
+    # Calculate days
+    days = max(1, (last_msg - first_msg) // 86400)
+    # Messages per day
+    messages_per_day = total_messages / days
+    # Users per day (average unique users)
+    cursor = conn.execute('''
+        SELECT COUNT(DISTINCT from_id) as users,
+               date(datetime(date_unixtime, 'unixepoch')) as day
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        GROUP BY day
+    ''', (start_ts, end_ts))
+    daily_users = [r[0] for r in cursor.fetchall()]
+    users_per_day = sum(daily_users) / len(daily_users) if daily_users else 0
+    # Messages with media/links
+    cursor = conn.execute('''
+        SELECT
+            SUM(has_media) as media,
+            SUM(has_links) as links,
+            SUM(has_mentions) as mentions
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+    ''', (start_ts, end_ts))
+    row = cursor.fetchone()
+    media_count = row[0] or 0
+    links_count = row[1] or 0
+    mentions_count = row[2] or 0
+    # Replies
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND reply_to_message_id IS NOT NULL
+    ''', (start_ts, end_ts))
+    replies_count = cursor.fetchone()[0]
+    # Forwards
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND forwarded_from IS NOT NULL
+    ''', (start_ts, end_ts))
+    forwards_count = cursor.fetchone()[0]
+    conn.close()
+    return jsonify({
+        'total_messages': total_messages,
+        'active_users': active_users,
+        'total_users': total_users,
+        'messages_per_day': round(messages_per_day, 1),
+        'users_per_day': round(users_per_day, 1),
+        'messages_per_user': round(total_messages / active_users, 1) if active_users else 0,
+        'media_count': media_count,
+        'links_count': links_count,
+        'mentions_count': mentions_count,
+        'replies_count': replies_count,
+        'forwards_count': forwards_count,
+        'days_span': days,
+        'first_message': first_msg,
+        'last_message': last_msg
+    })
+# ==========================================
+# API ENDPOINTS - CHARTS
+# ==========================================
+@app.route('/api/chart/messages')
+def api_chart_messages():
+    """Get message volume over time."""
+    timeframe = request.args.get('timeframe', 'month')
+    granularity = request.args.get('granularity', 'day')  # hour, day, week
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    if granularity == 'hour':
+        format_str = '%Y-%m-%d %H:00'
+    elif granularity == 'week':
+        format_str = '%Y-W%W'
+    else:  # day
+        format_str = '%Y-%m-%d'
+    cursor = conn.execute(f'''
+        SELECT
+            strftime('{format_str}', datetime(date_unixtime, 'unixepoch')) as period,
+            COUNT(*) as count
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        GROUP BY period
+        ORDER BY period
+    ''', (start_ts, end_ts))
+    data = [{'label': row[0], 'value': row[1]} for row in cursor.fetchall()]
+    conn.close()
+    return jsonify(data)
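Review note: the period grouping in `api_chart_messages` can be tried in isolation. The sketch below is a minimal illustration, not the project's code: the in-memory table, the `FORMATS` whitelist, and the `volume_by_period` helper are assumptions made for the example. It binds the format string as a query parameter rather than interpolating it into the SQL text, which is one possible way to keep the query free of string formatting.

```python
import sqlite3

# Hypothetical in-memory table; only the date_unixtime column mirrors the diff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, date_unixtime INTEGER)")

BASE = 1704067200  # 2024-01-01 00:00:00 UTC
conn.executemany(
    "INSERT INTO messages (date_unixtime) VALUES (?)",
    [(BASE,), (BASE + 3600,), (BASE + 86400,)],
)

# Whitelist of granularities (same format strings as in the endpoint).
FORMATS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "week": "%Y-W%W",
}


def volume_by_period(conn, granularity):
    """Count messages per period; unknown granularities fall back to 'day'."""
    format_str = FORMATS.get(granularity, FORMATS["day"])
    cursor = conn.execute(
        """
        SELECT strftime(?, datetime(date_unixtime, 'unixepoch')) AS period,
               COUNT(*) AS count
        FROM messages
        GROUP BY period
        ORDER BY period
        """,
        (format_str,),
    )
    return [(row[0], row[1]) for row in cursor.fetchall()]


by_day = volume_by_period(conn, "day")
by_hour = volume_by_period(conn, "hour")
```

Because `datetime(..., 'unixepoch')` converts in UTC, the period labels are UTC dates and hours regardless of the server's local time zone.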
+@app.route('/api/chart/users')
+def api_chart_users():
+    """Get active users over time."""
+    timeframe = request.args.get('timeframe', 'month')
+    granularity = request.args.get('granularity', 'day')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    if granularity == 'hour':
+        format_str = '%Y-%m-%d %H:00'
+    elif granularity == 'week':
+        format_str = '%Y-W%W'
+    else:
+        format_str = '%Y-%m-%d'
+    cursor = conn.execute(f'''
+        SELECT
+            strftime('{format_str}', datetime(date_unixtime, 'unixepoch')) as period,
+            COUNT(DISTINCT from_id) as count
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        GROUP BY period
+        ORDER BY period
+    ''', (start_ts, end_ts))
+    data = [{'label': row[0], 'value': row[1]} for row in cursor.fetchall()]
+    conn.close()
+    return jsonify(data)
+@app.route('/api/chart/heatmap')
+def api_chart_heatmap():
+    """Get activity heatmap (hour of day vs day of week)."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT
+            CAST(strftime('%w', datetime(date_unixtime, 'unixepoch')) AS INTEGER) as dow,
+            CAST(strftime('%H', datetime(date_unixtime, 'unixepoch')) AS INTEGER) as hour,
+            COUNT(*) as count
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        GROUP BY dow, hour
+    ''', (start_ts, end_ts))
+    # Initialize grid
+    heatmap = [[0 for _ in range(24)] for _ in range(7)]
+    for row in cursor.fetchall():
+        dow, hour, count = row
+        heatmap[dow][hour] = count
+    conn.close()
+    return jsonify({
+        'data': heatmap,
+        'days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'],
+        'hours': list(range(24))
+    })
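Review note: the heatmap relies on SQLite's `%w` convention, where Sunday is 0, which is why the `days` list starts with Sunday. The sketch below rebuilds the same 7x24 grid in plain Python to make that convention explicit; `heatmap_from_timestamps` is a hypothetical helper, not part of the module.

```python
from datetime import datetime, timezone


def heatmap_from_timestamps(timestamps):
    """Build a 7x24 grid indexed as grid[day_of_week][hour], with Sunday = 0."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        # Python's weekday() uses Monday = 0; shift by one to get Sunday = 0 like %w.
        dow = (dt.weekday() + 1) % 7
        grid[dow][dt.hour] += 1
    return grid


BASE = 1704067200  # 2024-01-01 00:00:00 UTC, a Monday
grid = heatmap_from_timestamps([
    BASE,                          # Monday 00:00
    BASE + 60,                     # Monday 00:01
    BASE + 6 * 86400 + 13 * 3600,  # Sunday 13:00
])
```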
+@app.route('/api/chart/daily')
+def api_chart_daily():
+    """Get activity by day of week."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
+    cursor = conn.execute('''
+        SELECT
+            CAST(strftime('%w', datetime(date_unixtime, 'unixepoch')) AS INTEGER) as dow,
+            COUNT(*) as count
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        GROUP BY dow
+        ORDER BY dow
+    ''', (start_ts, end_ts))
+    data = {days[row[0]]: row[1] for row in cursor.fetchall()}
+    conn.close()
+    return jsonify([{'label': day, 'value': data.get(day, 0)} for day in days])
+@app.route('/api/chart/hourly')
+def api_chart_hourly():
+    """Get activity by hour of day."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT
+            CAST(strftime('%H', datetime(date_unixtime, 'unixepoch')) AS INTEGER) as hour,
+            COUNT(*) as count
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        GROUP BY hour
+        ORDER BY hour
+    ''', (start_ts, end_ts))
+    data = {row[0]: row[1] for row in cursor.fetchall()}
+    conn.close()
+    return jsonify([{'label': f'{h:02d}:00', 'value': data.get(h, 0)} for h in range(24)])
+# ==========================================
+# API ENDPOINTS - USERS
+# ==========================================
+@app.route('/api/users')
+def api_users():
+    """Get the user leaderboard for a timeframe, optionally including participants with no messages in it."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
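+    # type=int falls back to the default on non-numeric input instead of raising ValueError
+    limit = max(1, request.args.get('limit', 50, type=int))
+    offset = max(0, request.args.get('offset', 0, type=int))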
+    include_inactive = request.args.get('include_inactive', '1') == '1'
+    conn = get_db()
+    # Get total messages for percentage calculation
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+    ''', (start_ts, end_ts))
+    total_messages = cursor.fetchone()[0]
+    # Get user stats from messages
+    cursor = conn.execute('''
+        SELECT
+            from_id,
+            from_name,
+            COUNT(*) as message_count,
+            SUM(LENGTH(text_plain)) as char_count,
+            SUM(has_links) as links,
+            SUM(has_media) as media,
+            MIN(date_unixtime) as first_seen,
+            MAX(date_unixtime) as last_seen,
+            COUNT(DISTINCT date(datetime(date_unixtime, 'unixepoch'))) as active_days
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND from_id IS NOT NULL AND from_id != ''
+        GROUP BY from_id
+        ORDER BY message_count DESC
+    ''', (start_ts, end_ts))
+    active_users = []
+    active_user_ids = set()
+    for row in cursor.fetchall():
+        active_user_ids.add(row['from_id'])
+        active_users.append({
+            'user_id': row['from_id'],
+            'name': row['from_name'] or 'Unknown',
+            'messages': row['message_count'],
+            'characters': row['char_count'] or 0,
+            'percentage': round(100 * row['message_count'] / total_messages, 2) if total_messages else 0,
+            'links': row['links'] or 0,
+            'media': row['media'] or 0,
+            'first_seen': row['first_seen'],
+            'last_seen': row['last_seen'],
+            'active_days': row['active_days'],
+            'daily_average': round(row['message_count'] / max(1, row['active_days']), 1),
+            'is_participant': False,
+            'role': None,
+        })
+    # Try to enrich with participant data and add inactive participants
+    has_participants = False
+    try:
+        cursor = conn.execute('SELECT COUNT(*) FROM participants')
+        has_participants = cursor.fetchone()[0] > 0
+    except Exception:
+        pass
+    if has_participants:
+        # Enrich active users with participant data
+        participant_map = {}
+        cursor = conn.execute('SELECT * FROM participants')
+        for row in cursor.fetchall():
+            participant_map[row['user_id']] = dict(row)
+        for user in active_users:
+            p = participant_map.get(user['user_id'])
+            if p:
+                user['is_participant'] = True
+                user['username'] = p.get('username', '')
+                if p.get('is_creator'):
+                    user['role'] = 'creator'
+                elif p.get('is_admin'):
+                    user['role'] = 'admin'
+                elif p.get('is_bot'):
+                    user['role'] = 'bot'
+        # Add inactive participants (no messages in the selected timeframe)
+        if include_inactive:
+            for uid, p in participant_map.items():
+                if uid not in active_user_ids:
+                    # `or ''` also covers NULL columns, which .get() returns as None
+                    name = f"{p.get('first_name') or ''} {p.get('last_name') or ''}".strip()
+                    role = None
+                    if p.get('is_creator'):
+                        role = 'creator'
+                    elif p.get('is_admin'):
+                        role = 'admin'
+                    elif p.get('is_bot'):
+                        role = 'bot'
+                    active_users.append({
+                        'user_id': uid,
+                        'name': name or 'Unknown',
+                        'messages': 0,
+                        'characters': 0,
+                        'percentage': 0,
+                        'links': 0,
+                        'media': 0,
+                        'first_seen': None,
+                        'last_seen': None,
+                        'active_days': 0,
+                        'daily_average': 0,
+                        'is_participant': True,
+                        'username': p.get('username', ''),
+                        'role': role,
+                    })
+    # Assign ranks (active users first, then inactive)
+    users_with_rank = []
+    for i, user in enumerate(active_users):
+        user['rank'] = i + 1 if user['messages'] > 0 else None
+        users_with_rank.append(user)
+    total_users = len(users_with_rank)
+    total_active = len(active_user_ids)
+    # Apply pagination
+    page_users = users_with_rank[offset:offset + limit]
+    conn.close()
+    return jsonify({
+        'users': page_users,
+        'total': total_users,
+        'total_active': total_active,
+        # Counts only the inactive participants appended above, not every participant
+        'total_participants': total_users - total_active if has_participants else 0,
+        'limit': limit,
+        'offset': offset
+    })
+@app.route('/api/user/<user_id>')
+def api_user_detail(user_id):
+    """Get detailed stats for a specific user."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    # Basic stats
+    cursor = conn.execute('''
+        SELECT
+            from_name,
+            COUNT(*) as messages,
+            SUM(LENGTH(text_plain)) as characters,
+            SUM(has_links) as links,
+            SUM(has_media) as media,
+            SUM(has_mentions) as mentions,
+            MIN(date_unixtime) as first_seen,
+            MAX(date_unixtime) as last_seen,
+            COUNT(DISTINCT date(datetime(date_unixtime, 'unixepoch'))) as active_days
+        FROM messages
+        WHERE from_id = ?
+        AND date_unixtime BETWEEN ? AND ?
+    ''', (user_id, start_ts, end_ts))
+    row = cursor.fetchone()
+    if not row or not row['messages']:
+        conn.close()
+        return jsonify({'error': 'User not found'}), 404
+    # Replies sent
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages
+        WHERE from_id = ? AND reply_to_message_id IS NOT NULL
+        AND date_unixtime BETWEEN ? AND ?
+    ''', (user_id, start_ts, end_ts))
+    replies_sent = cursor.fetchone()[0]
+    # Replies received
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages m1
+        JOIN messages m2 ON m1.reply_to_message_id = m2.id
+        WHERE m2.from_id = ?
+        AND m1.date_unixtime BETWEEN ? AND ?
+    ''', (user_id, start_ts, end_ts))
+    replies_received = cursor.fetchone()[0]
+    # Activity by hour
+    cursor = conn.execute('''
+        SELECT
+            CAST(strftime('%H', datetime(date_unixtime, 'unixepoch')) AS INTEGER) as hour,
+            COUNT(*) as count
+        FROM messages
+        WHERE from_id = ?
+        AND date_unixtime BETWEEN ? AND ?
+        GROUP BY hour
+    ''', (user_id, start_ts, end_ts))
+    hourly = {row[0]: row[1] for row in cursor.fetchall()}
+    # Activity over time
+    cursor = conn.execute('''
+        SELECT
+            date(datetime(date_unixtime, 'unixepoch')) as day,
+            COUNT(*) as count
+        FROM messages
+        WHERE from_id = ?
+        AND date_unixtime BETWEEN ? AND ?
+        GROUP BY day
+        ORDER BY day DESC
+        LIMIT 30
+    ''', (user_id, start_ts, end_ts))
+    daily = [{'date': r[0], 'count': r[1]} for r in cursor.fetchall()]
+    # Rank
+    cursor = conn.execute('''
+        SELECT COUNT(*) + 1 FROM (
+            SELECT from_id, COUNT(*) as cnt FROM messages
+            WHERE date_unixtime BETWEEN ? AND ?
+            GROUP BY from_id
+        ) WHERE cnt > ?
+    ''', (start_ts, end_ts, row['messages']))
+    rank = cursor.fetchone()[0]
+    conn.close()
+    return jsonify({
+        'user_id': user_id,
+        'name': row['from_name'] or 'Unknown',
+        'messages': row['messages'],
+        'characters': row['characters'] or 0,
+        'links': row['links'] or 0,
+        'media': row['media'] or 0,
+        'mentions': row['mentions'] or 0,
+        'first_seen': row['first_seen'],
+        'last_seen': row['last_seen'],
+        'active_days': row['active_days'],
+        'daily_average': round(row['messages'] / max(1, row['active_days']), 1),
+        'replies_sent': replies_sent,
+        'replies_received': replies_received,
+        'rank': rank,
+        'hourly_activity': [hourly.get(h, 0) for h in range(24)],
+        'daily_activity': daily
+    })
+@app.route('/api/user/<user_id>/profile')
+def api_user_profile(user_id):
+    """Get comprehensive user profile with all available statistics."""
+    conn = get_db()
+    # ---- Participant info (from Telethon sync) ----
+    participant = None
+    try:
+        cursor = conn.execute('SELECT * FROM participants WHERE user_id = ?', (user_id,))
+        row = cursor.fetchone()
+        if row:
+            participant = dict(row)
+    except Exception:
+        pass  # Table might not exist yet
+    # ---- Basic message stats ----
+    cursor = conn.execute('''
+        SELECT
+            from_name,
+            COUNT(*) as total_messages,
+            SUM(text_length) as total_chars,
+            AVG(text_length) as avg_length,
+            MAX(text_length) as max_length,
+            SUM(has_links) as links_shared,
+            SUM(has_media) as media_sent,
+            SUM(has_photo) as photos_sent,
+            SUM(has_mentions) as mentions_made,
+            SUM(is_edited) as edits,
+            MIN(date_unixtime) as first_message,
+            MAX(date_unixtime) as last_message,
+            COUNT(DISTINCT date(datetime(date_unixtime, 'unixepoch'))) as active_days
+        FROM messages WHERE from_id = ?
+    ''', (user_id,))
+    stats = cursor.fetchone()
+    if not stats or not stats['total_messages']:
+        # User might be a participant who never sent a message
+        if participant:
+            conn.close()
+            return jsonify({
+                'user_id': user_id,
+                'participant': participant,
+                'has_messages': False,
+                'name': f"{participant.get('first_name') or ''} {participant.get('last_name') or ''}".strip()
+            })
+        conn.close()
+        return jsonify({'error': 'User not found'}), 404
+    stats = dict(stats)
+    # ---- Replies sent (who does this user reply to most) ----
+    cursor = conn.execute('''
+        SELECT r.from_name, r.from_id, COUNT(*) as cnt
+        FROM messages m
+        JOIN messages r ON m.reply_to_message_id = r.id
+        WHERE m.from_id = ? AND r.from_id != ?
+        GROUP BY r.from_id
+        ORDER BY cnt DESC
+        LIMIT 10
+    ''', (user_id, user_id))
+    replies_to = [{'name': r[0], 'user_id': r[1], 'count': r[2]} for r in cursor.fetchall()]
+    # ---- Replies received (who replies to this user most) ----
+    cursor = conn.execute('''
+        SELECT m.from_name, m.from_id, COUNT(*) as cnt
+        FROM messages m
+        JOIN messages r ON m.reply_to_message_id = r.id
+        WHERE r.from_id = ? AND m.from_id != ?
+        GROUP BY m.from_id
+        ORDER BY cnt DESC
+        LIMIT 10
+    ''', (user_id, user_id))
+    replies_from = [{'name': r[0], 'user_id': r[1], 'count': r[2]} for r in cursor.fetchall()]
+    # ---- Total replies sent/received ----
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages
+        WHERE from_id = ? AND reply_to_message_id IS NOT NULL
+    ''', (user_id,))
+    total_replies_sent = cursor.fetchone()[0]
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages m
+        JOIN messages r ON m.reply_to_message_id = r.id
+        WHERE r.from_id = ? AND m.from_id != ?
+    ''', (user_id, user_id))
+    total_replies_received = cursor.fetchone()[0]
+    # ---- Forwarded messages ----
+    cursor = conn.execute('''
+        SELECT COUNT(*) FROM messages
+        WHERE from_id = ? AND forwarded_from IS NOT NULL
+    ''', (user_id,))
+    forwards_sent = cursor.fetchone()[0]
+    # ---- Top forwarded sources ----
+    cursor = conn.execute('''
+        SELECT forwarded_from, COUNT(*) as cnt
+        FROM messages
+        WHERE from_id = ? AND forwarded_from IS NOT NULL
+        GROUP BY forwarded_from
+        ORDER BY cnt DESC
+        LIMIT 5
+    ''', (user_id,))
+    top_forward_sources = [{'name': r[0], 'count': r[1]} for r in cursor.fetchall()]
+    # ---- Activity by hour ----
+    cursor = conn.execute('''
+        SELECT
+            CAST(strftime('%H', datetime(date_unixtime, 'unixepoch')) AS INTEGER) as hour,
+            COUNT(*) as count
+        FROM messages WHERE from_id = ?
+        GROUP BY hour
+    ''', (user_id,))
+    hourly = {r[0]: r[1] for r in cursor.fetchall()}
+    # ---- Activity by weekday ----
+    cursor = conn.execute('''
+        SELECT
+            CAST(strftime('%w', datetime(date_unixtime, 'unixepoch')) AS INTEGER) as weekday,
+            COUNT(*) as count
+        FROM messages WHERE from_id = ?
+        GROUP BY weekday
+    ''', (user_id,))
+    weekday_names = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
+    weekday_data = {r[0]: r[1] for r in cursor.fetchall()}
+    weekday_activity = [{'day': weekday_names[d], 'count': weekday_data.get(d, 0)} for d in range(7)]
+    # ---- Activity trend (most recent 90 days that have messages) ----
+    cursor = conn.execute('''
+        SELECT
+            date(datetime(date_unixtime, 'unixepoch')) as day,
+            COUNT(*) as count
+        FROM messages WHERE from_id = ?
+        GROUP BY day
+        ORDER BY day DESC
+        LIMIT 90
+    ''', (user_id,))
+    daily_activity = [{'date': r[0], 'count': r[1]} for r in cursor.fetchall()]
+    # ---- Monthly trend ----
+    cursor = conn.execute('''
+        SELECT
+            strftime('%Y-%m', datetime(date_unixtime, 'unixepoch')) as month,
+            COUNT(*) as count
+        FROM messages WHERE from_id = ?
+        GROUP BY month
+        ORDER BY month
+    ''', (user_id,))
+    monthly_activity = [{'month': r[0], 'count': r[1]} for r in cursor.fetchall()]
+    # ---- Top links shared ----
+    cursor = conn.execute('''
+        SELECT e.value, COUNT(*) as cnt
+        FROM entities e
+        JOIN messages m ON e.message_id = m.id
+        WHERE m.from_id = ? AND e.type = 'link'
+        GROUP BY e.value
+        ORDER BY cnt DESC
+        LIMIT 10
+    ''', (user_id,))
+    top_links = [{'url': r[0], 'count': r[1]} for r in cursor.fetchall()]
+    # ---- Rank among all users ----
+    cursor = conn.execute('''
+        SELECT COUNT(*) + 1 FROM (
+            SELECT from_id, COUNT(*) as cnt FROM messages GROUP BY from_id
+        ) WHERE cnt > ?
+    ''', (stats['total_messages'],))
+    rank = cursor.fetchone()[0]
+    cursor = conn.execute('SELECT COUNT(DISTINCT from_id) FROM messages')
+    total_users = cursor.fetchone()[0]
+    # ---- Average reply time (when replying to someone) ----
+    cursor = conn.execute('''
+        SELECT AVG(m.date_unixtime - r.date_unixtime)
+        FROM messages m
+        JOIN messages r ON m.reply_to_message_id = r.id
+        WHERE m.from_id = ?
+        AND m.date_unixtime - r.date_unixtime > 0
+        AND m.date_unixtime - r.date_unixtime < 86400
+    ''', (user_id,))
+    avg_reply_time = cursor.fetchone()[0]
+    conn.close()
+    # ---- Build response ----
+    total_msgs = stats['total_messages']
+    active_days = stats['active_days'] or 1
+    first_msg = stats['first_message']
+    last_msg = stats['last_message']
+    span_days = max(1, (last_msg - first_msg) / 86400) if first_msg and last_msg else 1
+    return jsonify({
+        'user_id': user_id,
+        'name': stats['from_name'] or 'Unknown',
+        'has_messages': True,
+        'participant': participant,
+        # Core stats
+        'total_messages': total_msgs,
+        'total_characters': stats['total_chars'] or 0,
+        'avg_message_length': round(stats['avg_length'] or 0, 1),
+        'max_message_length': stats['max_length'] or 0,
+        'links_shared': stats['links_shared'] or 0,
+        'media_sent': stats['media_sent'] or 0,
+        'photos_sent': stats['photos_sent'] or 0,
+        'mentions_made': stats['mentions_made'] or 0,
+        'edits': stats['edits'] or 0,
+        'forwards_sent': forwards_sent,
+        # Time stats
+        'first_message': first_msg,
+        'last_message': last_msg,
+        'active_days': active_days,
+        'daily_average': round(total_msgs / active_days, 1),
+        'messages_per_calendar_day': round(total_msgs / span_days, 1),
+        # Reply stats
+        'total_replies_sent': total_replies_sent,
+        'total_replies_received': total_replies_received,
+        'reply_ratio': round(total_replies_sent / max(1, total_msgs) * 100, 1),
+        'avg_reply_time_seconds': round(avg_reply_time) if avg_reply_time else None,
+        'replies_to': replies_to,
+        'replies_from': replies_from,
+        # Forward stats
+        'top_forward_sources': top_forward_sources,
+        # Ranking
+        'rank': rank,
+        'total_active_users': total_users,
+        # Activity patterns
+        'hourly_activity': [hourly.get(h, 0) for h in range(24)],
+        'weekday_activity': weekday_activity,
+        'daily_activity': daily_activity,
+        'monthly_activity': monthly_activity,
+        # Content
+        'top_links': top_links,
+    })
+# ==========================================
+# API ENDPOINTS - CONTENT ANALYTICS
+# ==========================================
+@app.route('/api/top/words')
+def api_top_words():
+    """Get top words."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    limit = int(request.args.get('limit', 30))
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT text_plain FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND text_plain IS NOT NULL
+    ''', (start_ts, end_ts))
+    import re
+    word_pattern = re.compile(r'[\u0590-\u05FFa-zA-Z]{3,}')
+    words = []
+    for row in cursor.fetchall():
+        words.extend(word_pattern.findall(row[0].lower()))
+    conn.close()
+    top_words = top_k_frequent(words, limit)
+    return jsonify([{'word': w, 'count': c} for w, c in top_words])
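Review note: `top_k_frequent` is imported from elsewhere in the project and its implementation is not shown in this diff. The sketch below is one plausible shape for such a helper, using `collections.Counter` and `heapq.nlargest`; the name `top_k_frequent_sketch` is hypothetical, and the word pattern is copied from the endpoint (Hebrew and Latin letters, at least three characters).

```python
import heapq
import re
from collections import Counter

# Same pattern as in api_top_words: Hebrew or Latin letters, 3+ characters.
WORD_PATTERN = re.compile(r'[\u0590-\u05FFa-zA-Z]{3,}')


def top_k_frequent_sketch(items, k):
    """Return the k most common items as (item, count) pairs, most frequent first."""
    counts = Counter(items)
    # nlargest maintains a heap of size k, so this is O(n log k) over distinct items.
    return heapq.nlargest(k, counts.items(), key=lambda pair: pair[1])


texts = ["Hello world hello", "world of code", "hello again"]
words = []
for text in texts:
    words.extend(WORD_PATTERN.findall(text.lower()))

top = top_k_frequent_sketch(words, 2)
```

Note that "of" is dropped by the three-character minimum, so it never reaches the counter.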
+@app.route('/api/top/domains')
+def api_top_domains():
+    """Get top shared domains."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    limit = int(request.args.get('limit', 20))
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT e.value FROM entities e
+        JOIN messages m ON e.message_id = m.id
+        WHERE e.type = 'link'
+        AND m.date_unixtime BETWEEN ? AND ?
+    ''', (start_ts, end_ts))
+    import re
+    domain_pattern = re.compile(r'https?://(?:www\.)?([^/]+)')
+    domains = []
+    for row in cursor.fetchall():
+        match = domain_pattern.match(row[0] or '')  # tolerate NULL entity values
+        if match:
+            domains.append(match.group(1))
+    conn.close()
+    top_domains = top_k_frequent(domains, limit)
+    return jsonify([{'domain': d, 'count': c} for d, c in top_domains])
+@app.route('/api/top/mentions')
+def api_top_mentions():
+    """Get top mentioned users."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    limit = int(request.args.get('limit', 20))
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT e.value, COUNT(*) as count FROM entities e
+        JOIN messages m ON e.message_id = m.id
+        WHERE e.type = 'mention'
+        AND m.date_unixtime BETWEEN ? AND ?
+        GROUP BY e.value
+        ORDER BY count DESC
+        LIMIT ?
+    ''', (start_ts, end_ts, limit))
+    data = [{'mention': row[0], 'count': row[1]} for row in cursor.fetchall()]
+    conn.close()
+    return jsonify(data)
+# ==========================================
+# API ENDPOINTS - ADVANCED ANALYTICS (Course Algorithms)
+# ==========================================
+@app.route('/api/similar/<int:message_id>')
+def api_similar_messages(message_id):
+    """
+    Find messages similar to a given message using LCS algorithm.
+    Algorithm: LCS (Longest Common Subsequence)
+    Time: O(n * m²) where n = sample size, m = avg message length
+    (assuming lcs_similarity uses the standard O(m²) dynamic-programming table)
+    Use case: Detect reposts, spam, similar content
+    """
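+    # type= falls back to the default on malformed input; a negative LIMIT would mean "no limit" in SQLite
+    threshold = request.args.get('threshold', 0.7, type=float)
+    limit = max(1, request.args.get('limit', 10, type=int))
+    sample_size = max(1, request.args.get('sample', 1000, type=int))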
+    conn = get_db()
+    # Get the target message
+    cursor = conn.execute('''
+        SELECT text_plain, from_name, date FROM messages WHERE id = ?
+    ''', (message_id,))
+    target = cursor.fetchone()
+    if not target or not target['text_plain']:
+        conn.close()
+        return jsonify({'error': 'Message not found or empty'}), 404
+    target_text = target['text_plain']
+    # Get sample of messages to compare (excluding the target)
+    cursor = conn.execute('''
+        SELECT id, text_plain, from_name, date FROM messages
+        WHERE id != ? AND text_plain IS NOT NULL AND LENGTH(text_plain) > 20
+        ORDER BY RANDOM()
+        LIMIT ?
+    ''', (message_id, sample_size))
+    messages = [(row['id'], row['text_plain']) for row in cursor.fetchall()]
+    conn.close()
+    # Find similar messages using LCS
+    similar = []
+    for msg_id, text in messages:
+        sim = lcs_similarity(target_text, text)
+        if sim >= threshold:
+            similar.append({
+                'id': msg_id,
+                'similarity': round(sim * 100, 1),
+                'text': text[:200] + '...' if len(text) > 200 else text
+            })
+    # Sort by similarity descending and limit
+    similar.sort(key=lambda x: x['similarity'], reverse=True)
+    similar = similar[:limit]
+    return jsonify({
+        'target': {
+            'id': message_id,
+            'text': target_text[:200] + '...' if len(target_text) > 200 else target_text,
+            'from': target['from_name'],
+            'date': target['date']
+        },
+        'similar': similar,
+        'algorithm': 'LCS (Longest Common Subsequence)',
+        'threshold': threshold
+    })
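Review note: `lcs_similarity` lives in another module and is not visible here. The sketch below shows a textbook dynamic-programming LCS with a rolling row, plus one possible normalization (LCS length divided by the longer string's length). Both function names and the normalization are assumptions; the project's version may normalize differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, O(len(a) * len(b)) time."""
    # Keep the shorter string on the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity_sketch(a, b):
    """Similarity in [0, 1]: LCS length over the longer length (assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


classic = lcs_length("ABCBDAB", "BDCABA")
identical = lcs_similarity_sketch("buy cheap now", "buy cheap now")
```

Each comparison fills a table proportional to the product of the two lengths, so scanning a sample of n messages costs roughly n times that product.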
+@app.route('/api/analytics/similar')
+def api_find_all_similar():
+    """
+    Find all similar message pairs in the database.
+    Algorithm: LCS with early termination
+    Time: O(n² * m²) where n = sample size, m = avg message length
+    (assuming a standard O(m²) dynamic-programming LCS per pair, before any early termination)
+    Use case: Detect spam campaigns, repeated content
+    """
+    timeframe = request.args.get('timeframe', 'all')
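+    threshold = request.args.get('threshold', 0.8, type=float)
+    # Pairwise comparison is quadratic in the sample; a negative LIMIT would mean "no limit" in SQLite
+    sample_size = max(1, request.args.get('sample', 500, type=int))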
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT id, text_plain, from_name, from_id FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND text_plain IS NOT NULL AND LENGTH(text_plain) > 30
+        ORDER BY RANDOM()
+        LIMIT ?
+    ''', (start_ts, end_ts, sample_size))
+    messages = [(row['id'], row['text_plain'], row['from_name'], row['from_id'])
+                for row in cursor.fetchall()]
+    conn.close()
+    # Use our LCS algorithm to find similar pairs
+    message_pairs = [(id_, text) for id_, text, _, _ in messages]
+    similar_pairs = find_similar_messages(message_pairs, threshold=threshold, min_length=30)
+    # Build result with user info
+    id_to_info = {id_: (name, uid) for id_, _, name, uid in messages}
+    id_to_text = {id_: text for id_, text, _, _ in messages}
+    results = []
+    for id1, id2, sim in similar_pairs[:50]:  # Limit to top 50
+        results.append({
+            'message1': {
+                'id': id1,
+                'text': id_to_text[id1][:150],
+                'from': id_to_info[id1][0]
+            },
+            'message2': {
+                'id': id2,
+                'text': id_to_text[id2][:150],
+                'from': id_to_info[id2][0]
+            },
+            'similarity': round(sim * 100, 1)
+        })
+    return jsonify({
+        'pairs': results,
+        'total_found': len(similar_pairs),
+        'algorithm': 'LCS (Longest Common Subsequence)',
+        'threshold': threshold,
+        'sample_size': sample_size
+    })
+@app.route('/api/user/rank/<user_id>')
+def api_user_rank_efficient(user_id):
+    """
+    Get user rank using RankTree for O(log n) lookup.
+    Algorithm: Order Statistics Tree (AVL-based Rank Tree)
+    Time: O(log n) for the rank lookup; the user's message count is still fetched with a SQL query
+    Use case: Real-time user ranking queries
+    """
+    timeframe = request.args.get('timeframe', 'all')
+    tree = get_user_rank_tree(timeframe)
+    # The tree is keyed by message count, not user_id, so the user's count is
+    # fetched with SQL first; only the rank lookup below is O(log n).
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT COUNT(*) as count FROM messages
+        WHERE from_id = ? AND date_unixtime BETWEEN ? AND ?
+    ''', (user_id, start_ts, end_ts))
+    user_count = cursor.fetchone()['count']
+    if user_count == 0:
+        conn.close()
+        return jsonify({'error': 'User not found'}), 404
+    # Use rank tree to find rank (O(log n))
+    rank = tree.rank(-user_count)  # Negative because tree uses negative counts
+    # Get total users
+    total = len(tree)
+    conn.close()
+    return jsonify({
+        'user_id': user_id,
+        'messages': user_count,
+        'rank': rank,
+        'total_users': total,
+        'percentile': round(100 * (total - rank + 1) / total, 1) if total > 0 else 0,
+        'algorithm': 'RankTree (Order Statistics Tree)',
+        'complexity': 'O(log n)'
+    })
+@app.route('/api/user/by-rank/<int:rank>')
+def api_user_by_rank(rank):
+    """
+    Get user at specific rank using RankTree.
+    Algorithm: Order Statistics Tree select(k)
+    Time: O(log n)
+    Use case: "Who is the 10th most active user?"
+    """
+    timeframe = request.args.get('timeframe', 'all')
+    tree = get_user_rank_tree(timeframe)
+    if rank < 1 or rank > len(tree):
+        return jsonify({'error': f'Rank must be between 1 and {len(tree)}'}), 400
+    user = tree.select(rank)
+    if not user:
+        return jsonify({'error': 'User not found'}), 404
+    return jsonify({
+        'rank': rank,
+        'user': user,
+        'total_users': len(tree),
+        'algorithm': 'RankTree select(k)',
+        'complexity': 'O(log n)'
+    })
+@app.route('/api/analytics/histogram')
+def api_activity_histogram():
+    """
+    Get activity histogram using Bucket Sort.
+    Algorithm: Bucket Sort
+    Time: O(n + k) where k = number of buckets
+    Use case: Efficient time-based grouping without SQL GROUP BY
+    """
+    timeframe = request.args.get('timeframe', 'month')
+    bucket_seconds = max(1, int(request.args.get('bucket', 86400)))  # Default: 1 day; clamped to >= 1 to rule out a zero-width bucket
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT date_unixtime FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+    ''', (start_ts, end_ts))
+    records = [{'date_unixtime': row[0]} for row in cursor.fetchall()]
+    conn.close()
+    # Use bucket sort algorithm
+    histogram = time_histogram(records, 'date_unixtime', bucket_size=bucket_seconds)
+    # Format for frontend
+    from datetime import datetime
+    result = []
+    for bucket_time, count in histogram:
+        result.append({
+            'timestamp': bucket_time,
+            'date': datetime.fromtimestamp(bucket_time).strftime('%Y-%m-%d %H:%M'),
+            'count': count
+        })
+    return jsonify({
+        'histogram': result,
+        'bucket_size_seconds': bucket_seconds,
+        'total_records': len(records),
+        'algorithm': 'Bucket Sort',
+        'complexity': 'O(n + k)'
+    })
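Review note: `time_histogram` lives in `algorithms.py` and is not visible here. The sketch below shows one way such a function could group timestamps into fixed-width buckets and return `(bucket_start, count)` pairs, which is the shape the loop above consumes. The name `time_histogram_sketch` is hypothetical, and the final sort adds O(k log k) on top of the O(n) counting pass.

```python
from collections import Counter

def time_histogram_sketch(records, field, bucket_size=86400):
    """Count records per fixed-width time bucket.

    Returns a list of (bucket_start_timestamp, count) sorted by time.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    counts = Counter(
        (rec[field] // bucket_size) * bucket_size for rec in records
    )
    return sorted(counts.items())

# Hypothetical records: three on day 0, one on day 1, one on day 2.
records = [{'date_unixtime': t} for t in (0, 10, 86399, 86400, 200000)]
histogram = time_histogram_sketch(records, 'date_unixtime')
print(histogram)
```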
+@app.route('/api/analytics/percentiles')
+def api_message_percentiles():
+    """
+    Get message length percentiles using Selection Algorithm.
+    Algorithm: Quickselect with Median of Medians
+    Time: O(n) guaranteed
+    Use case: Analyze message length distribution without sorting
+    """
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT LENGTH(text_plain) as length FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND text_plain IS NOT NULL
+    ''', (start_ts, end_ts))
+    lengths = [row[0] for row in cursor.fetchall() if row[0]]
+    conn.close()
+    if not lengths:
+        return jsonify({'error': 'No messages found'}), 404
+    # Use our O(n) selection algorithm
+    result = {
+        'count': len(lengths),
+        'min': min(lengths),
+        'max': max(lengths),
+        'median': find_median(lengths),
+        'p25': find_percentile(lengths, 25),
+        'p75': find_percentile(lengths, 75),
+        'p90': find_percentile(lengths, 90),
+        'p95': find_percentile(lengths, 95),
+        'p99': find_percentile(lengths, 99),
+        'algorithm': 'Quickselect with Median of Medians',
+        'complexity': 'O(n) guaranteed'
+    }
+    return jsonify(result)
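Review note: `find_median` and `find_percentile` are defined elsewhere, so this is only an illustrative sketch of selection without a full sort. It uses a randomized pivot (expected O(n)), not the median-of-medians pivot named in the docstring, and it assumes a nearest-rank percentile definition that may differ from the real helper. Both function names are hypothetical.

```python
import math
import random

def quickselect(values, k):
    """Return the k-th smallest value (0-based) without fully sorting."""
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)
        lows = [v for v in items if v < pivot]
        n_equal = sum(1 for v in items if v == pivot)
        if k < len(lows):
            items = lows                      # answer is among the smaller values
        elif k < len(lows) + n_equal:
            return pivot                      # answer is the pivot itself
        else:
            k -= len(lows) + n_equal          # skip everything <= pivot
            items = [v for v in items if v > pivot]

def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data <= it."""
    n = len(values)
    k = max(0, math.ceil(p * n / 100) - 1)
    return quickselect(values, k)

lengths = [5, 1, 9, 3, 7]
print(percentile_nearest_rank(lengths, 50), percentile_nearest_rank(lengths, 90))
```

Computing six percentiles this way makes six passes over the data; when many quantiles are needed at once, a single sort may be simpler and just as fast in practice.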
+# ==========================================
+# API ENDPOINTS - SEARCH
+# ==========================================
+@app.route('/api/search')
+def api_search():
+    """Search messages."""
+    query = request.args.get('q', '')
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    limit = int(request.args.get('limit', 50))
+    offset = int(request.args.get('offset', 0))
+    if not query:
+        return jsonify({'results': [], 'total': 0})
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT
+            m.id,
+            m.date,
+            m.from_name,
+            m.from_id,
+            m.text_plain,
+            m.has_links,
+            m.has_media
+        FROM messages_fts
+        JOIN messages m ON messages_fts.rowid = m.id
+        WHERE messages_fts MATCH ?
+        AND m.date_unixtime BETWEEN ? AND ?
+        ORDER BY m.date_unixtime DESC
+        LIMIT ? OFFSET ?
+    ''', (query, start_ts, end_ts, limit, offset))
+    results = [{
+        'id': row['id'],
+        'date': row['date'],
+        'from_name': row['from_name'],
+        'from_id': row['from_id'],
+        'text': row['text_plain'][:300] if row['text_plain'] else '',
+        'has_links': bool(row['has_links']),
+        'has_media': bool(row['has_media'])
+    } for row in cursor.fetchall()]
+    conn.close()
+    return jsonify({
+        'results': results,
+        'query': query,
+        'limit': limit,
+        'offset': offset
+    })
+# ==========================================
+# API ENDPOINTS - CHAT VIEW
+# ==========================================
+@app.route('/api/chat/messages')
+def api_chat_messages():
+    """Get messages for chat view with filters."""
+    offset = int(request.args.get('offset', 0))
+    limit = int(request.args.get('limit', 50))
+    user_id = request.args.get('user_id')
+    search = request.args.get('search')
+    date_from = request.args.get('date_from')
+    date_to = request.args.get('date_to')
+    has_media = request.args.get('has_media')
+    has_link = request.args.get('has_link')
+    conn = get_db()
+    # Build query
+    conditions = ["1=1"]
+    params = []
+    if user_id:
+        conditions.append("m.from_id = ?")
+        params.append(user_id)
+    if date_from:
+        conditions.append("m.date >= ?")
+        params.append(date_from)
+    if date_to:
+        conditions.append("m.date <= ?")
+        params.append(date_to)
+    if has_media == '1':
+        conditions.append("m.has_media = 1")
+    elif has_media == '0':
+        conditions.append("m.has_media = 0")
+    if has_link == '1':
+        conditions.append("m.has_links = 1")
+    # Handle FTS search
+    if search:
+        conditions.append("""m.id IN (
+            SELECT rowid FROM messages_fts WHERE messages_fts MATCH ?
+        )""")
+        params.append(search)
+    where_clause = " AND ".join(conditions)
+    # Get total count
+    cursor = conn.execute(f"SELECT COUNT(*) FROM messages m WHERE {where_clause}", params)
+    total = cursor.fetchone()[0]
+    # Get messages with reply info
+    query = f"""
+        SELECT
+            m.id,
+            m.id as message_id,
+            m.date,
+            m.from_id,
+            m.from_name,
+            m.text_plain as text,
+            m.reply_to_message_id,
+            m.forwarded_from,
+            m.forwarded_from_id,
+            m.has_media,
+            m.has_photo,
+            m.has_links as has_link,
+            m.has_mentions,
+            m.is_edited,
+            r.from_name as reply_to_name,
+            substr(r.text_plain, 1, 100) as reply_to_text
+        FROM messages m
+        LEFT JOIN messages r ON m.reply_to_message_id = r.id
+        WHERE {where_clause}
+        ORDER BY m.date ASC
+        LIMIT ? OFFSET ?
+    """
+    params.extend([limit, offset])
+    cursor = conn.execute(query, params)
+    messages = [dict(row) for row in cursor.fetchall()]
+    # Fetch entities (links, mentions) for these messages
+    if messages:
+        msg_ids = [m['id'] for m in messages]
+        placeholders = ','.join('?' * len(msg_ids))
+        ent_cursor = conn.execute(f"""
+            SELECT message_id, type, value
+            FROM entities
+            WHERE message_id IN ({placeholders})
+        """, msg_ids)
+        # Group entities by message_id
+        entities_map = {}
+        for row in ent_cursor.fetchall():
+            mid = row[0]
+            if mid not in entities_map:
+                entities_map[mid] = []
+            entities_map[mid].append({'type': row[1], 'value': row[2]})
+        # Attach entities to messages
+        for msg in messages:
+            msg['entities'] = entities_map.get(msg['id'], [])
+    conn.close()
+    return jsonify({
+        'messages': messages,
+        'total': total,
+        'offset': offset,
+        'limit': limit,
+        'has_more': offset + limit < total
+    })
+@app.route('/api/chat/thread/<int:message_id>')
+def api_chat_thread(message_id):
+    """Get conversation thread for a message."""
+    conn = get_db()
+    thread = []
+    visited = set()
+    def get_parent(msg_id):
+        """Recursively get parent messages."""
+        if msg_id in visited:
+            return
+        visited.add(msg_id)
+        cursor = conn.execute("""
+            SELECT id as message_id, date, from_name, text_plain as text, reply_to_message_id
+            FROM messages WHERE id = ?
+        """, (msg_id,))
+        row = cursor.fetchone()
+        if row:
+            if row['reply_to_message_id']:
+                get_parent(row['reply_to_message_id'])
+            thread.append(dict(row))
+    def get_children(msg_id):
+        """Get all replies to a message."""
+        cursor = conn.execute("""
+            SELECT id as message_id, date, from_name, text_plain as text, reply_to_message_id
+            FROM messages WHERE reply_to_message_id = ?
+            ORDER BY date
+        """, (msg_id,))
+        for row in cursor.fetchall():
+            if row['message_id'] not in visited:
+                visited.add(row['message_id'])
+                thread.append(dict(row))
+                get_children(row['message_id'])
+    # Get the original message and its parents
+    get_parent(message_id)
+    # Get all replies
+    get_children(message_id)
+    conn.close()
+    # Sort by date
+    thread.sort(key=lambda x: x['date'])
+    return jsonify(thread)
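Review note: the recursive helpers above issue one query per message in the thread. As a possible alternative, SQLite's recursive common table expressions can walk the reply chain in a single statement. The sketch below assumes only the `messages` columns used in this endpoint; the in-memory table, the sample rows, and the `fetch_thread` name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
conn.execute('''
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, date TEXT, from_name TEXT,
        text_plain TEXT, reply_to_message_id INTEGER
    )
''')
conn.executemany('INSERT INTO messages VALUES (?, ?, ?, ?, ?)', [
    (1, '2024-01-01T10:00:00', 'A', 'root', None),
    (2, '2024-01-01T10:01:00', 'B', 'reply to 1', 1),
    (3, '2024-01-01T10:02:00', 'A', 'reply to 2', 2),
    (4, '2024-01-01T10:03:00', 'C', 'unrelated', None),
])

def fetch_thread(conn, message_id):
    """Return ancestors, the message itself, and all descendants, by date."""
    rows = conn.execute('''
        WITH RECURSIVE
        up(id, parent_id) AS (
            SELECT id, reply_to_message_id FROM messages WHERE id = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT m.id, m.reply_to_message_id
            FROM messages m JOIN up ON m.id = up.parent_id
        ),
        down(id) AS (
            SELECT id FROM messages WHERE id = ?
            UNION
            SELECT m.id FROM messages m JOIN down ON m.reply_to_message_id = down.id
        )
        SELECT m.id AS message_id, m.date, m.from_name,
               m.text_plain AS text, m.reply_to_message_id
        FROM messages m
        WHERE m.id IN (SELECT id FROM up UNION SELECT id FROM down)
        ORDER BY m.date
    ''', (message_id, message_id)).fetchall()
    return [dict(r) for r in rows]

thread_ids = [m['message_id'] for m in fetch_thread(conn, 2)]
print(thread_ids)
```

This also sidesteps Python's recursion limit for very long reply chains.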
+@app.route('/api/chat/context/<int:message_id>')
+def api_chat_context(message_id):
+    """Get messages around a specific message."""
+    before = int(request.args.get('before', 20))
+    after = int(request.args.get('after', 20))
+    conn = get_db()
+    # Get target message date
+    cursor = conn.execute("SELECT date FROM messages WHERE id = ?", (message_id,))
+    row = cursor.fetchone()
+    if not row:
+        conn.close()
+        return jsonify({'messages': [], 'target_id': message_id})
+    target_date = row['date']
+    # Get messages before
+    cursor = conn.execute("""
+        SELECT id as message_id, date, from_id, from_name, text_plain as text,
+               reply_to_message_id, has_media, has_links as has_link
+        FROM messages
+        WHERE date < ?
+        ORDER BY date DESC
+        LIMIT ?
+    """, (target_date, before))
+    before_msgs = list(reversed([dict(row) for row in cursor.fetchall()]))
+    # Get target message
+    cursor = conn.execute("""
+        SELECT id as message_id, date, from_id, from_name, text_plain as text,
+               reply_to_message_id, has_media, has_links as has_link
+        FROM messages
+        WHERE id = ?
+    """, (message_id,))
+    target_msg = dict(cursor.fetchone())
+    # Get messages after
+    cursor = conn.execute("""
+        SELECT id as message_id, date, from_id, from_name, text_plain as text,
+               reply_to_message_id, has_media, has_links as has_link
+        FROM messages
+        WHERE date > ?
+        ORDER BY date ASC
+        LIMIT ?
+    """, (target_date, after))
+    after_msgs = [dict(row) for row in cursor.fetchall()]
+    conn.close()
+    return jsonify({
+        'messages': before_msgs + [target_msg] + after_msgs,
+        'target_id': message_id
+    })
+# ==========================================
+# API ENDPOINTS - AI SEARCH
+# ==========================================
+# Global AI engine (lazy loaded)
+_ai_engine = None
+_ai_engine_init_attempted = False
+def get_ai_engine():
+    """Get or create AI search engine."""
+    global _ai_engine, _ai_engine_init_attempted
+    if _ai_engine is not None:
+        return _ai_engine
+    if _ai_engine_init_attempted:
+        return None  # Already tried and failed
+    _ai_engine_init_attempted = True
+    try:
+        from ai_search import AISearchEngine
+        import os
+        provider = os.getenv('AI_PROVIDER', 'ollama')
+        # Get API key - check both generic and provider-specific env vars
+        api_key = os.getenv('AI_API_KEY') or os.getenv(f'{provider.upper()}_API_KEY')
+        print(f"Initializing AI engine with provider: {provider}")
+        _ai_engine = AISearchEngine(DB_PATH, provider, api_key)
+        print("AI engine initialized successfully")
+        return _ai_engine
+    except Exception as e:
+        print(f"AI Search not available: {e}")
+        import traceback
+        traceback.print_exc()
+        return None
+@app.route('/api/ai/status')
+def api_ai_status():
+    """Get AI engine status for debugging."""
+    provider = os.getenv('AI_PROVIDER', 'ollama')
+    api_key = os.getenv('AI_API_KEY') or os.getenv(f'{provider.upper()}_API_KEY')
+    status = {
+        'provider': provider,
+        'api_key_set': bool(api_key),
+        'api_key_preview': f"{api_key[:8]}..." if api_key and len(api_key) > 8 else None,
+        'ai_engine_initialized': _ai_engine is not None,
+        'init_attempted': _ai_engine_init_attempted,
+        'semantic_search_available': HAS_SEMANTIC_SEARCH,
+    }
+    # Check if we can initialize now
+    if _ai_engine is None and not _ai_engine_init_attempted:
+        engine = get_ai_engine()
+        status['ai_engine_initialized'] = engine is not None
+    # Check for embeddings
+    if HAS_SEMANTIC_SEARCH:
+        try:
+            ss = get_semantic_search()
+            status['embeddings_available'] = ss.is_available()
+            status['embeddings_stats'] = ss.stats()
+        except Exception as e:
+            status['embeddings_error'] = str(e)
+    return jsonify(status)
+@app.route('/api/ai/reset')
+def api_ai_reset():
+    """Reset AI engine to allow re-initialization."""
+    global _ai_engine, _ai_engine_init_attempted
+    _ai_engine = None
+    _ai_engine_init_attempted = False
+    return jsonify({'status': 'reset', 'message': 'AI engine will be reinitialized on next request'})
+@app.route('/api/cache/invalidate')
+def api_cache_invalidate():
+    """Invalidate all caches. Call after DB updates (daily sync, import, etc.)."""
+    invalidate_caches()
+    return jsonify({'status': 'invalidated', 'new_version': _cache_version})
+@app.route('/api/embeddings/reload')
+def api_embeddings_reload():
+    """Reload embeddings from DB (call after daily sync adds new embeddings)."""
+    if not HAS_SEMANTIC_SEARCH:
+        return jsonify({'error': 'Semantic search not available'})
+    try:
+        ss = get_semantic_search()
+        old_count = len(ss.message_ids) if ss.embeddings_loaded else 0
+        ss.reload_embeddings()
+        new_count = len(ss.message_ids)
+        return jsonify({
+            'status': 'reloaded',
+            'previous_count': old_count,
+            'new_count': new_count,
+            'added': new_count - old_count
+        })
+    except Exception as e:
+        return jsonify({'error': str(e)})
+@app.route('/api/ai/search', methods=['POST'])
+def api_ai_search():
+    """AI-powered natural language search."""
+    data = request.get_json(silent=True) or {}  # Tolerate a missing or non-JSON body
+    query = data.get('query', '')
+    mode = data.get('mode', 'auto')  # 'auto', 'sql', 'context', or 'semantic'
+    if not query:
+        return jsonify({'error': 'Query required'})
+    # Semantic mode: Use pre-computed embeddings + AI reasoning
+    if mode == 'semantic':
+        if not HAS_SEMANTIC_SEARCH:
+            return jsonify({'error': 'Semantic search not available. Install sentence-transformers.'})
+        try:
+            ss = get_semantic_search()
+            if not ss.is_available():
+                return jsonify({'error': 'embeddings.db not found. Run the Colab notebook first.'})
+            # Get AI engine for reasoning
+            ai_engine = get_ai_engine()
+            if ai_engine:
+                # Semantic search + AI reasoning
+                result = ss.search_with_ai_answer(query, ai_engine, limit=30)
+                return jsonify(result)
+            else:
+                # Just semantic search without AI reasoning
+                results = ss.search_with_full_text(query, limit=30)
+                provider = os.getenv('AI_PROVIDER', 'ollama')
+                api_key_set = bool(os.getenv('AI_API_KEY') or os.getenv(f'{provider.upper()}_API_KEY'))
+                return jsonify({
+                    'query': query,
+                    'mode': 'semantic',
+                    'results': results,
+                    'count': len(results),
+                    'answer': f"Found {len(results)} messages semantically similar to the query.\n\n⚠️ AI is not available - check that the API key is set (provider: {provider}, key set: {api_key_set})"
+                })
+        except Exception as e:
+            return jsonify({'error': f'Semantic search error: {str(e)}'})
+    engine = get_ai_engine()
+    if engine is None:
+        # Fallback: Use basic SQL search
+        return fallback_ai_search(query)
+    try:
+        # Context mode: AI reads messages and reasons over them
+        if mode == 'context':
+            result = engine.context_search(query)
+        # SQL mode: Generate SQL and execute
+        elif mode == 'sql':
+            result = engine.search(query, generate_answer=True)
+        # Auto mode: Try SQL first, fall back to context if no results
+        else:
+            result = engine.search(query, generate_answer=True)
+            # If no results or error, try context search
+            if result.get('count', 0) == 0 or 'error' in result:
+                result = engine.context_search(query)
+        return jsonify(result)
+    except Exception as e:
+        return jsonify({'error': str(e), 'query': query})
+def fallback_ai_search(query: str):
+    """Fallback search when AI is not available."""
+    conn = get_db()
+    # Simple keyword extraction and search
+    keywords = [w for w in query.split() if len(w) > 2]
+    if not keywords:
+        return jsonify({'error': 'No valid keywords', 'query': query})
+    # Build FTS query
+    fts_query = ' OR '.join(keywords)
+    try:
+        cursor = conn.execute('''
+            SELECT
+                m.id as message_id, m.date, m.from_name, m.text_plain as text
+            FROM messages_fts
+            JOIN messages m ON messages_fts.rowid = m.id
+            WHERE messages_fts MATCH ?
+            ORDER BY m.date DESC
+            LIMIT 20
+        ''', (fts_query,))
+        results = [dict(row) for row in cursor.fetchall()]
+        conn.close()
+        # Generate simple answer
+        if results:
+            answer = f"Found {len(results)} messages containing the words: {', '.join(keywords)}"
+        else:
+            answer = f"No messages found containing the words: {', '.join(keywords)}"
+        return jsonify({
+            'query': query,
+            'sql': f"FTS MATCH: {fts_query}",
+            'results': results,
+            'count': len(results),
+            'answer': answer,
+            'fallback': True
+        })
+    except Exception as e:
+        conn.close()
+        return jsonify({'error': str(e), 'query': query})
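Review note: joining raw words with `OR` means a user word such as `AND`, `NOT`, or anything with punctuation can be parsed as full-text query syntax and raise an error, which the `except` above then surfaces to the client. A possible mitigation, sketched below, is to wrap each keyword in double quotes so it is treated as a literal string. This assumes `messages_fts` is an FTS5 table (the schema is not shown here), where an embedded quote is escaped by doubling it. `build_fts_query` is a hypothetical helper name.

```python
def build_fts_query(text, min_len=3):
    """Turn free text into an OR query of quoted literal terms.

    Keeps the same length filter as the fallback above (words longer
    than 2 characters) and escapes embedded double quotes by doubling.
    """
    terms = [w for w in text.split() if len(w) >= min_len]
    quoted = ['"' + t.replace('"', '""') + '"' for t in terms]
    return ' OR '.join(quoted)

print(build_fts_query('what did AND say about c++'))
```

An empty return value means no usable keywords were found, which maps onto the existing "No valid keywords" branch.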
+@app.route('/api/ai/thread/<int:message_id>')
+def api_ai_thread(message_id):
+    """Get full thread using AI-powered analysis."""
+    engine = get_ai_engine()
+    if engine is None:
+        # Use basic thread retrieval
+        return api_chat_thread(message_id)
+    try:
+        thread = engine.get_thread(message_id)
+        return jsonify(thread)
+    except Exception as e:
+        return jsonify({'error': str(e)})
+@app.route('/api/ai/similar/<int:message_id>')
+def api_ai_similar(message_id):
+    """Find similar messages."""
+    limit = int(request.args.get('limit', 10))
+    engine = get_ai_engine()
+    if engine is None:
+        return jsonify({'error': 'AI not available'})
+    try:
+        similar = engine.find_similar_messages(message_id, limit)
+        return jsonify(similar)
+    except Exception as e:
+        return jsonify({'error': str(e)})
+# ==========================================
+# API ENDPOINTS - DATABASE UPDATE
+# ==========================================
+@app.route('/api/update', methods=['POST'])
+def api_update_database():
+    """
+    Update database with new JSON data.
+    Disabled in production - updates are done locally via daily_sync.py.
+    """
+    return jsonify({'error': 'Database updates are disabled on this server. Run daily_sync.py locally.'}), 403
+    # NOTE: Everything below is unreachable while the early return above is in place.
+    try:
+        # Check if file was uploaded
+        if 'file' in request.files:
+            file = request.files['file']
+            if file.filename == '':
+                return jsonify({'error': 'No file selected'}), 400
+            # Read and parse JSON
+            try:
+                json_data = json.loads(file.read().decode('utf-8'))
+            except json.JSONDecodeError as e:
+                return jsonify({'error': f'Invalid JSON: {str(e)}'}), 400
+        else:
+            # Try to get JSON from request body
+            json_data = request.get_json()
+            if not json_data:
+                return jsonify({'error': 'No JSON data provided'}), 400
+        # Import and use IncrementalIndexer
+        from indexer import IncrementalIndexer
+        indexer = IncrementalIndexer(DB_PATH)
+        try:
+            stats = indexer.update_from_json_data(json_data, show_progress=False)
+        finally:
+            indexer.close()
+        return jsonify({
+            'success': True,
+            'stats': {
+                'total_in_file': stats['total_in_file'],
+                'new_messages': stats['new_messages'],
+                'duplicates': stats['duplicates'],
+                'entities': stats['entities'],
+                'elapsed_seconds': round(stats['elapsed_seconds'], 2)
+            }
+        })
+    except FileNotFoundError as e:
+        return jsonify({'error': str(e)}), 404
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+@app.route('/api/db/stats')
+def api_db_stats():
+    """Get database statistics."""
+    conn = get_db()
+    stats = {}
+    # Total messages
+    cursor = conn.execute('SELECT COUNT(*) FROM messages')
+    stats['total_messages'] = cursor.fetchone()[0]
+    # Total users
+    cursor = conn.execute('SELECT COUNT(DISTINCT from_id) FROM messages WHERE from_id IS NOT NULL')
+    stats['total_users'] = cursor.fetchone()[0]
+    # Date range
+    cursor = conn.execute('SELECT MIN(date), MAX(date) FROM messages')
+    row = cursor.fetchone()
+    stats['first_message'] = row[0]
+    stats['last_message'] = row[1]
+    # Database file size
+    import os
+    if os.path.exists(DB_PATH):
+        stats['db_size_mb'] = round(os.path.getsize(DB_PATH) / (1024 * 1024), 2)
+    conn.close()
+    return jsonify(stats)
+# ==========================================
+# API ENDPOINTS - EXPORT
+# ==========================================
+@app.route('/api/export/users')
+def api_export_users():
+    """Export user data as CSV."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT
+            from_id,
+            from_name,
+            COUNT(*) as message_count,
+            SUM(LENGTH(text_plain)) as char_count,
+            SUM(has_links) as links,
+            SUM(has_media) as media,
+            MIN(date_unixtime) as first_seen,
+            MAX(date_unixtime) as last_seen
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        AND from_id IS NOT NULL
+        GROUP BY from_id
+        ORDER BY message_count DESC
+    ''', (start_ts, end_ts))
+    output = io.StringIO()
+    writer = csv.writer(output)
+    writer.writerow(['User ID', 'Name', 'Messages', 'Characters', 'Links', 'Media', 'First Seen', 'Last Seen'])
+    for row in cursor.fetchall():
+        writer.writerow([
+            row['from_id'],
+            row['from_name'],
+            row['message_count'],
+            row['char_count'] or 0,
+            row['links'] or 0,
+            row['media'] or 0,
+            datetime.fromtimestamp(row['first_seen']).isoformat() if row['first_seen'] else '',
+            datetime.fromtimestamp(row['last_seen']).isoformat() if row['last_seen'] else ''
+        ])
+    conn.close()
+    output.seek(0)
+    return Response(
+        output.getvalue(),
+        mimetype='text/csv',
+        headers={'Content-Disposition': 'attachment; filename=users_export.csv'}
+    )
+@app.route('/api/export/messages')
+def api_export_messages():
+    """Export messages as CSV."""
+    timeframe = request.args.get('timeframe', 'all')
+    start_ts, end_ts = parse_timeframe(timeframe)
+    limit = int(request.args.get('limit', 10000))
+    conn = get_db()
+    cursor = conn.execute('''
+        SELECT
+            id, date, from_id, from_name, text_plain,
+            has_links, has_media, has_mentions,
+            reply_to_message_id
+        FROM messages
+        WHERE date_unixtime BETWEEN ? AND ?
+        ORDER BY date_unixtime DESC
+        LIMIT ?
+    ''', (start_ts, end_ts, limit))
+    output = io.StringIO()
+    writer = csv.writer(output)
+    writer.writerow(['ID', 'Date', 'User ID', 'User Name', 'Text', 'Has Links', 'Has Media', 'Has Mentions', 'Reply To'])
+    for row in cursor.fetchall():
+        writer.writerow([
+            row['id'],
+            row['date'],
+            row['from_id'],
+            row['from_name'],
+            row['text_plain'][:500] if row['text_plain'] else '',
+            row['has_links'],
+            row['has_media'],
+            row['has_mentions'],
+            row['reply_to_message_id']
+        ])
+    conn.close()
+    output.seek(0)
+    return Response(
+        output.getvalue(),
+        mimetype='text/csv',
+        headers={'Content-Disposition': 'attachment; filename=messages_export.csv'}
+    )
+# ==========================================
+# MAIN
+# ==========================================
+def main():
+    import argparse
+    parser = argparse.ArgumentParser(description='Telegram Analytics Dashboard')
+    parser.add_argument('--db', default=os.environ.get('DB_PATH', 'telegram.db'), help='Database path')
+    parser.add_argument('--port', type=int, default=int(os.environ.get('PORT', 5000)), help='Server port')
+    parser.add_argument('--host', default=os.environ.get('HOST', '127.0.0.1'), help='Server host')
+    parser.add_argument('--debug', action='store_true', help='Debug mode')
+    args = parser.parse_args()
+    global DB_PATH
+    DB_PATH = args.db
+    print(f"""
+╔══════════════════════════════════════════════════════════════╗
+║           TELEGRAM ANALYTICS DASHBOARD                        ║
+╠══════════════════════════════════════════════════════════════╣
+║  Database: {args.db:47} ║
+║  Server:   http://{args.host}:{args.port:<37} ║
+╚══════════════════════════════════════════════════════════════╝
+    """)
+    app.run(host=args.host, port=args.port, debug=args.debug)
+if __name__ == '__main__':
+    main()

data_structures.py ADDED

	@@ -0,0 +1,773 @@

+#!/usr/bin/env python3
+"""
+Advanced Data Structures for Efficient Search and Traversal
+Includes:
+- Bloom Filter: O(k) "definitely not in set" checks (k = number of hash functions, a small constant)
+- Trie: O(k) prefix search and autocomplete
+- LRU Cache: O(1) cached query results
+- Graph algorithms: DFS, BFS for thread traversal
+"""
+import hashlib
+import math
+from collections import OrderedDict, defaultdict, deque
+from typing import Any, Callable, Generator, Iterator, Optional
+from functools import wraps
+# ============================================
+# BLOOM FILTER
+# ============================================
+class BloomFilter:
+    """
+    Space-efficient probabilistic data structure for set membership testing.
+    - O(k) insert and lookup where k is number of hash functions
+    - False positives possible, false negatives impossible
+    - Use case: Quick "message ID exists?" check before DB query
+    Example:
+        bf = BloomFilter(expected_items=100000, fp_rate=0.01)
+        bf.add("message_123")
+        if "message_123" in bf:  # O(k) check
+            pass  # Might exist, check DB
+        else:
+            pass  # Definitely doesn't exist, skip DB
+    """
+    def __init__(self, expected_items: int = 100000, fp_rate: float = 0.01):
+        """
+        Initialize Bloom filter.
+        Args:
+            expected_items: Expected number of items to store
+            fp_rate: Desired false positive rate (0.01 = 1%)
+        """
+        # Calculate optimal size and hash count
+        self.size = self._optimal_size(expected_items, fp_rate)
+        self.hash_count = self._optimal_hash_count(self.size, expected_items)
+        self.bit_array = bytearray(math.ceil(self.size / 8))
+        self.count = 0
+    @staticmethod
+    def _optimal_size(n: int, p: float) -> int:
+        """Calculate optimal bit array size: m = -n*ln(p) / (ln2)^2"""
+        return int(-n * math.log(p) / (math.log(2) ** 2))
+    @staticmethod
+    def _optimal_hash_count(m: int, n: int) -> int:
+        """Calculate optimal hash count: k = (m/n) * ln2"""
+        return max(1, int((m / n) * math.log(2)))
+    def _get_hash_values(self, item: str) -> Generator[int, None, None]:
+        """Generate k hash values using double hashing technique."""
+        h1 = int(hashlib.md5(item.encode()).hexdigest(), 16)
+        h2 = int(hashlib.sha1(item.encode()).hexdigest(), 16)
+        for i in range(self.hash_count):
+            yield (h1 + i * h2) % self.size
+    def add(self, item: str) -> None:
+        """Add an item to the filter. O(k) where k is hash count."""
+        for pos in self._get_hash_values(item):
+            byte_idx, bit_idx = divmod(pos, 8)
+            self.bit_array[byte_idx] |= (1 << bit_idx)
+        self.count += 1
+    def __contains__(self, item: str) -> bool:
+        """Check if item might be in the filter. O(k)."""
+        for pos in self._get_hash_values(item):
+            byte_idx, bit_idx = divmod(pos, 8)
+            if not (self.bit_array[byte_idx] & (1 << bit_idx)):
+                return False  # Definitely not in set
+        return True  # Might be in set
+    def __len__(self) -> int:
+        return self.count
+    @property
+    def memory_usage(self) -> int:
+        """Return memory usage in bytes."""
+        return len(self.bit_array)
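Review note: to make the sizing formulas in `_optimal_size` and `_optimal_hash_count` concrete, the sketch below evaluates them for the default arguments and then estimates the resulting false-positive rate with the standard approximation (1 - e^(-kn/m))^k. `bloom_parameters` is a hypothetical standalone helper that mirrors the two static methods, including their truncation to `int`.

```python
import math

def bloom_parameters(n, p):
    """Bit count m and hash count k for n items at target false-positive rate p."""
    m = int(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, int((m / n) * math.log(2)))
    return m, k

n, p = 100_000, 0.01
m, k = bloom_parameters(n, p)

# Approximate false-positive rate once all n items have been added.
expected_fp = (1 - math.exp(-k * n / m)) ** k
bytes_needed = math.ceil(m / 8)

print(m, k, bytes_needed, round(expected_fp, 4))
```

Because `k` is truncated rather than rounded, the achieved rate can land slightly above the requested `fp_rate`.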
+# ============================================
+# TRIE (PREFIX TREE)
+# ============================================
+class TrieNode:
+    """Node in a Trie data structure."""
+    __slots__ = ['children', 'is_end', 'data', 'count']
+    def __init__(self):
+        self.children: dict[str, TrieNode] = {}
+        self.is_end: bool = False
+        self.data: Any = None  # Store associated data (e.g., message IDs)
+        self.count: int = 0  # Frequency count
+class Trie:
+    """
+    Trie (Prefix Tree) for fast prefix-based search and autocomplete.
+    - O(k) insert/search where k is key length
+    - O(p + n) prefix search where p is prefix length, n is results
+    - Use case: Autocomplete usernames, find all messages starting with prefix
+    Example:
+        trie = Trie()
+        trie.insert("@username1", data=[1, 2, 3])
+        trie.insert("@username2", data=[4, 5])
+        results = trie.search_prefix("@user")  # Returns both
+        completions = trie.autocomplete("@user", limit=5)
+    """
+    def __init__(self):
+        self.root = TrieNode()
+        self.size = 0
+    def insert(self, key: str, data: Any = None) -> None:
+        """Insert a key with optional associated data. O(k)."""
+        node = self.root
+        for char in key.lower():
+            if char not in node.children:
+                node.children[char] = TrieNode()
+            node = node.children[char]
+            node.count += 1
+        if not node.is_end:
+            self.size += 1
+        node.is_end = True
+        # Store or append data
+        if data is not None:
+            if node.data is None:
+                node.data = []
+            if isinstance(data, list):
+                node.data.extend(data)
+            else:
+                node.data.append(data)
+    def search(self, key: str) -> Optional[Any]:
+        """Search for exact key. O(k). Returns associated data or None."""
+        node = self._find_node(key.lower())
+        return node.data if node and node.is_end else None
+    def __contains__(self, key: str) -> bool:
+        """Check if key exists. O(k)."""
+        node = self._find_node(key.lower())
+        return node is not None and node.is_end
+    def _find_node(self, prefix: str) -> Optional[TrieNode]:
+        """Find the node for a given prefix."""
+        node = self.root
+        for char in prefix:
+            if char not in node.children:
+                return None
+            node = node.children[char]
+        return node
+    def search_prefix(self, prefix: str) -> list[tuple[str, Any]]:
+        """
+        Find all keys with given prefix. O(p + n).
+        Returns list of (key, data) tuples.
+        """
+        results = []
+        node = self._find_node(prefix.lower())
+        if node:
+            self._collect_all(node, prefix.lower(), results)
+        return results
+    def _collect_all(
+        self,
+        node: TrieNode,
+        prefix: str,
+        results: list[tuple[str, Any]]
+    ) -> None:
+        """Recursively collect all keys under a node."""
+        if node.is_end:
+            results.append((prefix, node.data))
+        for char, child in node.children.items():
+            self._collect_all(child, prefix + char, results)
+    def autocomplete(self, prefix: str, limit: int = 10) -> list[str]:
+        """
+        Get autocomplete suggestions for prefix.
+        Returns up to limit completions, ranked by node count (the number of
+        inserts that passed through the key's final node), highest first.
+        """
+        node = self._find_node(prefix.lower())
+        if not node:
+            return []
+        suggestions = []
+        self._collect_suggestions(node, prefix.lower(), suggestions)
+        # Sort by frequency and return top results
+        suggestions.sort(key=lambda x: x[1], reverse=True)
+        return [s[0] for s in suggestions[:limit]]
+    def _collect_suggestions(
+        self,
+        node: TrieNode,
+        prefix: str,
+        suggestions: list[tuple[str, int]]
+    ) -> None:
+        """Collect suggestions with their frequency counts."""
+        if node.is_end:
+            suggestions.append((prefix, node.count))
+        for char, child in node.children.items():
+            self._collect_suggestions(child, prefix + char, suggestions)
+    def __len__(self) -> int:
+        return self.size
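One property of the counting scheme in `Trie.insert` is easier to see in isolation: `count` is incremented on every node along an inserted key's path, so a key that is itself a prefix of other keys also accumulates their insertions. The following is a minimal standalone sketch of that behaviour; the `Node`, `insert`, and `completions` names are illustrative and are not part of `data_structures.py`.

```python
class Node:
    """Illustrative trie node: children, end-of-key flag, pass-through count."""
    __slots__ = ("children", "is_end", "count")

    def __init__(self):
        self.children = {}
        self.is_end = False
        self.count = 0


def insert(root, key):
    """Insert key, bumping the count of every node on its path."""
    node = root
    for ch in key.lower():
        node = node.children.setdefault(ch, Node())
        node.count += 1
    node.is_end = True


def completions(root, prefix):
    """Return (key, count) pairs under prefix, highest count first."""
    node = root
    for ch in prefix.lower():
        node = node.children.get(ch)
        if node is None:
            return []
    found = []
    stack = [(node, prefix.lower())]  # explicit stack instead of recursion
    while stack:
        current, text = stack.pop()
        if current.is_end:
            found.append((text, current.count))
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    # Sort by count descending, then alphabetically for a stable result
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))


root = Node()
for name in ["@user", "@username1", "@username2"]:
    insert(root, name)

ranked = completions(root, "@u")
# "@user" ranks first with count 3: all three inserts pass through its last node
```

If ranking by how often each exact key was inserted is preferred, a separate per-key counter incremented only at the final node would be one way to get it.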
+# ============================================
+# LRU CACHE
+# ============================================
+class LRUCache:
+    """
+    Least Recently Used (LRU) Cache for query results.
+    - O(1) get/put operations
+    - Automatically evicts least recently used items when full
+    - Use case: Cache expensive query results
+    Example:
+        cache = LRUCache(maxsize=1000)
+        cache.put("query:hello", results)
+        results = cache.get("query:hello")  # O(1)
+    """
+    def __init__(self, maxsize: int = 1000):
+        self.maxsize = maxsize
+        self.cache: OrderedDict[str, Any] = OrderedDict()
+        self.hits = 0
+        self.misses = 0
+    def get(self, key: str) -> Optional[Any]:
+        """Get item from cache. O(1). Returns None if not found."""
+        if key in self.cache:
+            self.cache.move_to_end(key)
+            self.hits += 1
+            return self.cache[key]
+        self.misses += 1
+        return None
+    def put(self, key: str, value: Any) -> None:
+        """Put item in cache. O(1). Evicts LRU item if full."""
+        if key in self.cache:
+            self.cache.move_to_end(key)
+        else:
+            if len(self.cache) >= self.maxsize:
+                self.cache.popitem(last=False)
+        self.cache[key] = value
+    def __contains__(self, key: str) -> bool:
+        return key in self.cache
+    def __len__(self) -> int:
+        return len(self.cache)
+    def clear(self) -> None:
+        """Clear the cache."""
+        self.cache.clear()
+        self.hits = 0
+        self.misses = 0
+    @property
+    def hit_rate(self) -> float:
+        """Return cache hit rate."""
+        total = self.hits + self.misses
+        return self.hits / total if total > 0 else 0.0
+    @property
+    def stats(self) -> dict:
+        """Return cache statistics."""
+        return {
+            'size': len(self.cache),
+            'maxsize': self.maxsize,
+            'hits': self.hits,
+            'misses': self.misses,
+            'hit_rate': self.hit_rate
+        }
+def lru_cached(cache: LRUCache, key_func: Optional[Callable[..., str]] = None):
+    """
+    Decorator to cache function results using LRUCache.
+    Results that are None are never cached, because None also signals a cache miss.
+    Example:
+        cache = LRUCache(1000)
+        @lru_cached(cache, key_func=lambda query, limit=100: f"search:{query}:{limit}")
+        def search(query, limit=100):
+            return expensive_search(query, limit)
+    """
+    def decorator(func: Callable) -> Callable:
+        @wraps(func)
+        def wrapper(*args, **kwargs):
+            if key_func:
+                key = key_func(*args, **kwargs)
+            else:
+                key = f"{func.__name__}:{args}:{kwargs}"
+            result = cache.get(key)
+            if result is not None:
+                return result
+            result = func(*args, **kwargs)
+            cache.put(key, result)
+            return result
+        return wrapper
+    return decorator
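The decorator treats a `None` from `cache.get` as a miss, so functions that legitimately return `None` are recomputed on every call. One possible way around this, sketched below under the assumption that the cache's `get` can accept a default value, is a private sentinel object. `TinyLRU`, `cached`, and `_MISSING` are hypothetical names used only for this illustration; they are not part of the committed code.

```python
from collections import OrderedDict
from functools import wraps

_MISSING = object()  # hypothetical sentinel: "not cached", distinct from a cached None


class TinyLRU:
    """Minimal LRU cache, standing in for the LRUCache class in the diff."""

    def __init__(self, maxsize=2):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.maxsize:
            self.data.popitem(last=False)  # evict the least recently used entry
        self.data[key] = value


def cached(cache):
    """Cache results by positional arguments, including None results."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            key = (func.__name__, args)
            hit = cache.get(key, _MISSING)
            if hit is not _MISSING:
                return hit
            result = func(*args)
            cache.put(key, result)
            return result
        return wrapper
    return decorator


calls = []  # records every real (uncached) invocation


@cached(TinyLRU(maxsize=2))
def lookup(name):
    calls.append(name)
    return None if name == "ghost" else name.upper()


lookup("ghost")
lookup("ghost")  # served from the cache even though the result is None
lookup("a")
lookup("b")      # cache is full, so "ghost" (least recently used) is evicted
lookup("ghost")  # recomputed after eviction
```

The `calls` list makes both behaviours visible: the second `lookup("ghost")` does not reach the function, while the last one does because the entry was evicted.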
+# ============================================
+# GRAPH ALGORITHMS FOR REPLY THREADS
+# ============================================
+class ReplyGraph:
+    """
+    Graph structure for message reply relationships.
+    Supports:
+    - DFS: Depth-first traversal for finding all descendants
+    - BFS: Breadth-first traversal for level-order exploration
+    - Connected components: Find isolated conversation threads
+    - Thread reconstruction: Find a thread's root, ancestors, and full message list
+    Time complexity: O(V + E) for traversals
+    Space complexity: O(V) for visited set
+    """
+    def __init__(self):
+        # Adjacency lists
+        self.children: dict[int, list[int]] = defaultdict(list)  # parent -> [children]
+        self.parents: dict[int, int] = {}  # child -> parent
+        self.nodes: set[int] = set()
+    def add_edge(self, parent_id: int, child_id: int) -> None:
+        """Add a reply relationship. O(1)."""
+        self.children[parent_id].append(child_id)
+        self.parents[child_id] = parent_id
+        self.nodes.add(parent_id)
+        self.nodes.add(child_id)
+    def add_message(self, message_id: int, reply_to: Optional[int] = None) -> None:
+        """Add a message, optionally with its reply relationship."""
+        self.nodes.add(message_id)
+        if reply_to is not None:
+            self.add_edge(reply_to, message_id)
+    def get_children(self, message_id: int) -> list[int]:
+        """Get direct replies to a message. O(1)."""
+        return self.children.get(message_id, [])
+    def get_parent(self, message_id: int) -> Optional[int]:
+        """Get the message this is a reply to. O(1)."""
+        return self.parents.get(message_id)
+    # ==================
+    # DFS - Depth First Search
+    # ==================
+    def dfs_descendants(self, start_id: int) -> list[int]:
+        """
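+        DFS: Get a message and all of its descendants (entire sub-thread).
+        Time: O(V + E)
+        Space: O(V)
+        Returns messages in DFS order (deep before wide), starting with start_id.
+        Recursive, so very deep reply chains can hit Python's recursion limit;
+        dfs_iterative avoids that.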
+        """
+        result = []
+        visited = set()
+        def dfs(node_id: int) -> None:
+            if node_id in visited:
+                return
+            visited.add(node_id)
+            result.append(node_id)
+            for child_id in self.children.get(node_id, []):
+                dfs(child_id)
+        dfs(start_id)
+        return result
+    def dfs_iterative(self, start_id: int) -> Iterator[int]:
+        """
+        Iterative DFS using explicit stack (avoids recursion limit).
+        Yields message IDs in DFS order.
+        """
+        stack = [start_id]
+        visited = set()
+        while stack:
+            node_id = stack.pop()
+            if node_id in visited:
+                continue
+            visited.add(node_id)
+            yield node_id
+            # Add children in reverse order for correct DFS order
+            for child_id in reversed(self.children.get(node_id, [])):
+                if child_id not in visited:
+                    stack.append(child_id)
+    # ==================
+    # BFS - Breadth First Search
+    # ==================
+    def bfs_descendants(self, start_id: int) -> list[int]:
+        """
+        BFS: Get a message and all of its descendants, level by level.
+        Time: O(V + E)
+        Space: O(V)
+        Returns messages in BFS order (level by level).
+        """
+        result = []
+        visited = set()
+        queue = deque([start_id])
+        while queue:
+            node_id = queue.popleft()
+            if node_id in visited:
+                continue
+            visited.add(node_id)
+            result.append(node_id)
+            for child_id in self.children.get(node_id, []):
+                if child_id not in visited:
+                    queue.append(child_id)
+        return result
+    def bfs_with_depth(self, start_id: int) -> list[tuple[int, int]]:
+        """
+        BFS with depth information.
+        Returns list of (message_id, depth) tuples.
+        """
+        result = []
+        visited = set()
+        queue = deque([(start_id, 0)])
+        while queue:
+            node_id, depth = queue.popleft()
+            if node_id in visited:
+                continue
+            visited.add(node_id)
+            result.append((node_id, depth))
+            for child_id in self.children.get(node_id, []):
+                if child_id not in visited:
+                    queue.append((child_id, depth + 1))
+        return result
+    # ==================
+    # THREAD RECONSTRUCTION
+    # ==================
+    def get_thread_root(self, message_id: int) -> int:
+        """
+        Find the root message of a thread. O(d) where d is depth.
+        """
+        current = message_id
+        while current in self.parents:
+            current = self.parents[current]
+        return current
+    def get_full_thread(self, message_id: int) -> list[int]:
+        """
+        Get the complete thread containing a message.
+        1. Find root via parent traversal
+        2. BFS from root to get all descendants
+        """
+        root = self.get_thread_root(message_id)
+        return self.bfs_descendants(root)
+    def get_ancestors(self, message_id: int) -> list[int]:
+        """
+        Get all ancestors (path to root). O(d).
+        Returns in order from message to root.
+        """
+        ancestors = []
+        current = message_id
+        while current in self.parents:
+            parent = self.parents[current]
+            ancestors.append(parent)
+            current = parent
+        return ancestors
+    def get_thread_path(self, message_id: int) -> list[int]:
+        """
+        Get path from root to message. O(d).
+        """
+        path = [message_id]
+        current = message_id
+        while current in self.parents:
+            parent = self.parents[current]
+            path.append(parent)
+            current = parent
+        return list(reversed(path))
+    # ==================
+    # CONNECTED COMPONENTS
+    # ==================
+    def find_connected_components(self) -> list[set[int]]:
+        """
+        Find all isolated conversation threads.
+        Time: O(V + E)
+        Returns list of sets, each set is a connected thread.
+        """
+        visited = set()
+        components = []
+        for node in self.nodes:
+            if node not in visited:
+                component = set()
+                # Use BFS to find all connected nodes
+                queue = deque([node])
+                while queue:
+                    current = queue.popleft()
+                    if current in visited:
+                        continue
+                    visited.add(current)
+                    component.add(current)
+                    # Add parent
+                    if current in self.parents:
+                        parent = self.parents[current]
+                        if parent not in visited:
+                            queue.append(parent)
+                    # Add children
+                    for child in self.children.get(current, []):
+                        if child not in visited:
+                            queue.append(child)
+                components.append(component)
+        return components
+    def get_thread_roots(self) -> list[int]:
+        """Get all thread root messages (messages with no parent)."""
+        return [node for node in self.nodes if node not in self.parents]
+    # ==================
+    # STATISTICS
+    # ==================
+    def get_thread_depth(self, root_id: int) -> int:
+        """Get maximum depth of a thread from root."""
+        max_depth = 0
+        for _, depth in self.bfs_with_depth(root_id):
+            max_depth = max(max_depth, depth)
+        return max_depth
+    def get_subtree_size(self, message_id: int) -> int:
+        """Get number of messages in subtree including root."""
+        return len(self.dfs_descendants(message_id))
+    @property
+    def stats(self) -> dict:
+        """Get graph statistics."""
+        return {
+            'total_nodes': len(self.nodes),
+            'total_edges': sum(len(children) for children in self.children.values()),
+            'root_messages': len(self.get_thread_roots()),
+            'connected_components': len(self.find_connected_components())
+        }
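The thread reconstruction steps used by `ReplyGraph` (walk parents up to the root, then traverse breadth-first from it) can be sketched with plain dictionaries. This is a standalone illustration, not the class itself: the function names are made up, and the `seen` guard against cyclic reply data is an added assumption that the original `get_thread_root` does not have.

```python
from collections import defaultdict, deque


def build_reply_maps(messages):
    """messages: iterable of (message_id, reply_to_id or None) pairs."""
    children = defaultdict(list)  # parent -> [children]
    parents = {}                  # child -> parent
    for message_id, reply_to in messages:
        if reply_to is not None:
            children[reply_to].append(message_id)
            parents[message_id] = reply_to
    return children, parents


def thread_root(parents, message_id):
    """Follow parent links upward until a message without a parent is reached."""
    seen = set()  # assumption: stop instead of looping forever on cyclic data
    while message_id in parents and message_id not in seen:
        seen.add(message_id)
        message_id = parents[message_id]
    return message_id


def thread_with_depth(children, root_id):
    """Breadth-first traversal returning (message_id, depth) pairs."""
    result, visited = [], set()
    queue = deque([(root_id, 0)])
    while queue:
        node_id, depth = queue.popleft()
        if node_id in visited:
            continue
        visited.add(node_id)
        result.append((node_id, depth))
        for child_id in children.get(node_id, []):
            queue.append((child_id, depth + 1))
    return result


# Same shape as the demo at the bottom of the file: 2 and 3 reply to 1, 4 and 5 reply to 2
messages = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2)]
children, parents = build_reply_maps(messages)
root = thread_root(parents, 4)
levels = thread_with_depth(children, root)
```

Starting from message 4, the walk reaches root 1, and the traversal then lists each message with its reply depth.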
+# ============================================
+# TRIGRAM SIMILARITY
+# ============================================
+def generate_trigrams(text: str) -> set[str]:
+    """
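+    Generate trigrams (contiguous 3-character substrings) for fuzzy matching.
+    Text is lowercased and stripped first; text shorter than 3 characters
+    yields itself as the only element.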
+    Example: "hello" -> {"hel", "ell", "llo"}
+    """
+    text = text.lower().strip()
+    if len(text) < 3:
+        return {text} if text else set()
+    return {text[i:i+3] for i in range(len(text) - 2)}
+def trigram_similarity(text1: str, text2: str) -> float:
+    """
+    Calculate Jaccard similarity between trigram sets.
+    Returns value between 0 (no similarity) and 1 (identical).
+    """
+    tri1 = generate_trigrams(text1)
+    tri2 = generate_trigrams(text2)
+    if not tri1 or not tri2:
+        return 0.0
+    intersection = len(tri1 & tri2)
+    union = len(tri1 | tri2)
+    return intersection / union if union > 0 else 0.0
+class TrigramIndex:
+    """
+    Inverted index of trigrams for fuzzy search.
+    Time complexity:
+    - Insert: O(k) where k is text length
+    - Search: O(t * m) where t is trigrams in query, m is avg matches
+    Example:
+        index = TrigramIndex()
+        index.add(1, "שלום עולם")
+        index.add(2, "שלום לכולם")
+        results = index.search("שלום", threshold=0.2)  # scores: 2/7 and 2/8, both below 0.3
+    """
+    def __init__(self):
+        self.index: dict[str, set[int]] = defaultdict(set)
+        self.texts: dict[int, str] = {}
+    def add(self, doc_id: int, text: str) -> None:
+        """Add a document to the index."""
+        self.texts[doc_id] = text
+        for trigram in generate_trigrams(text):
+            self.index[trigram].add(doc_id)
+    def search(self, query: str, threshold: float = 0.3, limit: int = 100) -> list[tuple[int, float]]:
+        """
+        Search for documents similar to query.
+        Returns list of (doc_id, similarity) tuples, sorted by similarity.
+        """
+        query_trigrams = generate_trigrams(query)
+        if not query_trigrams:
+            return []
+        # Find candidate documents
+        candidates: dict[int, int] = defaultdict(int)
+        for trigram in query_trigrams:
+            for doc_id in self.index.get(trigram, []):
+                candidates[doc_id] += 1
+        # Calculate similarity for candidates
+        results = []
+        query_len = len(query_trigrams)
+        for doc_id, match_count in candidates.items():
+            doc_trigrams = generate_trigrams(self.texts[doc_id])
+            doc_len = len(doc_trigrams)
+            # Jaccard similarity: |Q & D| / (|Q| + |D| - |Q & D|), with match_count = |Q & D|
+            similarity = match_count / (query_len + doc_len - match_count)
+            if similarity >= threshold:
+                results.append((doc_id, similarity))
+        # Sort by similarity descending
+        results.sort(key=lambda x: x[1], reverse=True)
+        return results[:limit]
+    def __len__(self) -> int:
+        return len(self.texts)
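Because both the query and each document are reduced to trigram sets, counting shared trigrams through the inverted index gives the same score as computing Jaccard similarity directly on the two sets. The sketch below checks this on three made-up documents; the `docs` contents and the helper names are assumptions for illustration only.

```python
from collections import defaultdict


def trigrams(text):
    """Contiguous 3-character substrings of the lowercased, stripped text."""
    text = text.lower().strip()
    if len(text) < 3:
        return {text} if text else set()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a, b):
    """Direct set-based Jaccard similarity, used here as the reference value."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0


docs = {1: "hello world", 2: "help wanted", 3: "goodbye"}  # hypothetical sample data

# Inverted index: trigram -> ids of the documents that contain it
index = defaultdict(set)
for doc_id, text in docs.items():
    for gram in trigrams(text):
        index[gram].add(doc_id)


def search(query, threshold=0.0):
    """Score only documents that share at least one trigram with the query."""
    query_grams = trigrams(query)
    overlap = defaultdict(int)
    for gram in query_grams:
        for doc_id in index.get(gram, ()):
            overlap[doc_id] += 1
    scored = []
    for doc_id, shared in overlap.items():
        union = len(query_grams) + len(trigrams(docs[doc_id])) - shared
        score = shared / union
        if score >= threshold:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


scores = dict(search("hello"))
# Document 1 shares all 3 query trigrams out of 9 in the union: 3/9
# Document 2 shares only "hel" out of 11 in the union: 1/11
```

Short queries against long documents score low by construction, which is worth keeping in mind when choosing a threshold.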
+# ============================================
+# INVERTED INDEX
+# ============================================
+class InvertedIndex:
+    """
+    Simple inverted index for fast word-to-document lookup.
+    Time complexity:
+    - Insert: O(w) where w is word count
+    - Search: O(1) for single word
+    - AND queries: O(min(n1, n2)) per intersection; OR queries: O(n1 + n2) per union
+    """
+    def __init__(self):
+        self.index: dict[str, set[int]] = defaultdict(set)
+        self.doc_count = 0
+    def add(self, doc_id: int, text: str) -> None:
+        """Add document to index."""
+        words = self._tokenize(text)
+        for word in words:
+            self.index[word].add(doc_id)
+        self.doc_count += 1
+    def _tokenize(self, text: str) -> list[str]:
+        """Simple tokenization."""
+        import re
+        return re.findall(r'[\u0590-\u05FFa-zA-Z]+', text.lower())
+    def search(self, word: str) -> set[int]:
+        """Find all documents containing word."""
+        return self.index.get(word.lower(), set())
+    def search_and(self, words: list[str]) -> set[int]:
+        """Find documents containing ALL words."""
+        if not words:
+            return set()
+        # Copy first: &= updates a set in place, and search() returns the index's own set
+        result = set(self.search(words[0]))
+        for word in words[1:]:
+            result &= self.search(word)
+        return result
+    def search_or(self, words: list[str]) -> set[int]:
+        """Find documents containing ANY word."""
+        result = set()
+        for word in words:
+            result |= self.search(word)
+        return result
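AND queries over posting sets need some care in Python, because `&=` updates the left-hand set in place. If that set is still owned by the index, the query would silently shrink the index. A minimal sketch of one safe pattern follows: copy the smallest posting set, then intersect. The `postings` dictionary and its contents are hypothetical sample data.

```python
# Hypothetical posting lists: word -> ids of documents containing it
postings = {
    "hello": {1, 2, 3, 4},
    "world": {2, 3, 4},
    "again": {3},
}


def search_and(words):
    """Return ids present in every word's posting set, without mutating postings."""
    sets = [postings.get(word.lower(), set()) for word in words]
    if not sets:
        return set()
    sets.sort(key=len)     # start from the rarest word to keep intersections small
    result = set(sets[0])  # copy, so that &= cannot modify the index
    for other in sets[1:]:
        result &= other
        if not result:
            break          # nothing can match any more
    return result


matches = search_and(["hello", "world", "again"])
```

Starting from the smallest set is what makes the O(min(n1, n2)) bound for an intersection achievable in practice.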
+if __name__ == '__main__':
+    # Demo
+    print("=== Bloom Filter Demo ===")
+    bf = BloomFilter(expected_items=1000, fp_rate=0.01)
+    bf.add("message_1")
+    bf.add("message_2")
+    print(f"message_1 in filter: {'message_1' in bf}")
+    print(f"message_999 in filter: {'message_999' in bf}")
+    print(f"Memory usage: {bf.memory_usage} bytes")
+    print("\n=== Trie Demo ===")
+    trie = Trie()
+    trie.insert("@username1", data=1)
+    trie.insert("@username2", data=2)
+    trie.insert("@user_test", data=3)
+    print(f"Autocomplete '@user': {trie.autocomplete('@user')}")
+    print("\n=== Reply Graph Demo ===")
+    graph = ReplyGraph()
+    graph.add_message(1)
+    graph.add_message(2, reply_to=1)
+    graph.add_message(3, reply_to=1)
+    graph.add_message(4, reply_to=2)
+    graph.add_message(5, reply_to=2)
+    print(f"DFS from 1: {graph.dfs_descendants(1)}")
+    print(f"BFS from 1: {graph.bfs_descendants(1)}")
+    print(f"Thread path for 4: {graph.get_thread_path(4)}")
+    print(f"Stats: {graph.stats}")

indexer.py ADDED Viewed

	@@ -0,0 +1,817 @@

+#!/usr/bin/env python3
+"""
+Telegram JSON Chat Indexer (Optimized)
+Features:
+- Batch processing for faster indexing
+- Graph building for reply threads
+- Trigram index for fuzzy search
+- Progress tracking
+- Memory-efficient streaming
+Usage:
+    python indexer.py <json_file> [--db <database_file>]
+    python indexer.py result.json --db telegram.db
+    python indexer.py result.json --batch-size 5000 --build-trigrams
+"""
+import json
+import sqlite3
+import argparse
+try:
+    import ijson
+    HAS_IJSON = True
+except ImportError:
+    HAS_IJSON = False
+import os
+import time
+from pathlib import Path
+from typing import Any, Generator
+from collections import defaultdict
+from data_structures import BloomFilter, ReplyGraph, generate_trigrams
+def flatten_text(text_field: Any) -> str:
+    """
+    Flatten the text field which can be either a string or array of mixed content.
+    """
+    if isinstance(text_field, str):
+        return text_field
+    if isinstance(text_field, list):
+        parts = []
+        for item in text_field:
+            if isinstance(item, str):
+                parts.append(item)
+            elif isinstance(item, dict) and 'text' in item:
+                parts.append(item['text'])
+        return ''.join(parts)
+    return ''
+def extract_entities(text_entities: list) -> list[dict]:
+    """Extract typed entities (links, mentions, etc.) from text_entities array."""
+    entities = []
+    for entity in text_entities or []:
+        if isinstance(entity, dict):
+            entity_type = entity.get('type', 'plain')
+            if entity_type != 'plain':
+                entities.append({
+                    'type': entity_type,
+                    'value': entity.get('text', '')
+                })
+    return entities
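To make the two helpers concrete, here is a small standalone sketch of the input shapes they handle. The sample message is invented for illustration, modelled only on how `flatten_text` and `extract_entities` read their arguments; `flatten` and `typed_entities` are compact stand-ins, not the functions from `indexer.py`.

```python
def flatten(text_field):
    """Join a string-or-list text field into plain text."""
    if isinstance(text_field, str):
        return text_field
    if isinstance(text_field, list):
        parts = []
        for item in text_field:
            if isinstance(item, str):
                parts.append(item)
            elif isinstance(item, dict) and "text" in item:
                parts.append(item["text"])
        return "".join(parts)
    return ""


def typed_entities(text_entities):
    """Keep only entities whose type is something other than 'plain'."""
    return [
        {"type": entity.get("type"), "value": entity.get("text", "")}
        for entity in text_entities or []
        if isinstance(entity, dict) and entity.get("type", "plain") != "plain"
    ]


# Hypothetical message fragment with mixed text content
sample = {
    "text": [
        "Docs: ",
        {"type": "link", "text": "https://example.com"},
        " cc ",
        {"type": "mention", "text": "@alice"},
    ],
    "text_entities": [
        {"type": "plain", "text": "Docs: "},
        {"type": "link", "text": "https://example.com"},
        {"type": "plain", "text": " cc "},
        {"type": "mention", "text": "@alice"},
    ],
}

plain = flatten(sample["text"])
entities = typed_entities(sample["text_entities"])
```

The flattened text feeds the `text_plain` column, while the typed entities drive the `has_links` and `has_mentions` flags in `parse_message`.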
+def parse_message(msg: dict) -> dict | None:
+    """Parse a single message from Telegram JSON format."""
+    if msg.get('type') != 'message':
+        return None
+    text_plain = flatten_text(msg.get('text', ''))
+    entities = extract_entities(msg.get('text_entities', []))
+    has_links = any(e['type'] == 'link' for e in entities)
+    has_mentions = any(e['type'] == 'mention' for e in entities)
+    return {
+        'id': msg.get('id'),
+        'type': msg.get('type', 'message'),
+        'date': msg.get('date'),
+        'date_unixtime': int(msg.get('date_unixtime', 0)) if msg.get('date_unixtime') else 0,
+        'from_name': msg.get('from', ''),
+        'from_id': msg.get('from_id', ''),
+        'reply_to_message_id': msg.get('reply_to_message_id'),
+        'forwarded_from': msg.get('forwarded_from'),
+        'forwarded_from_id': msg.get('forwarded_from_id'),
+        'text_plain': text_plain,
+        'text_length': len(text_plain),
+        'has_media': 1 if msg.get('photo') or msg.get('file') or msg.get('media_type') else 0,
+        'has_photo': 1 if msg.get('photo') else 0,
+        'has_links': 1 if has_links else 0,
+        'has_mentions': 1 if has_mentions else 0,
+        'is_edited': 1 if msg.get('edited') else 0,
+        'edited_unixtime': int(msg.get('edited_unixtime', 0)) if msg.get('edited_unixtime') else None,
+        'photo_file_size': msg.get('photo_file_size'),
+        'photo_width': msg.get('width'),
+        'photo_height': msg.get('height'),
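+        # default=float: ijson yields Decimal for non-integer numbers, which json cannot serialize
+        'raw_json': json.dumps(msg, ensure_ascii=False, default=float),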
+        'entities': entities
+    }
+def _detect_json_structure(json_path: str) -> str:
+    """Peek at JSON to determine if root is a list or object with 'messages' key."""
+    with open(json_path, 'r', encoding='utf-8') as f:
+        for char in iter(lambda: f.read(1), ''):
+            if char in ' \t\n\r':
+                continue
+            if char == '[':
+                return 'list'
+            return 'object'
+    return 'object'
+def load_json_messages(json_path: str) -> Generator[dict, None, None]:
+    """
+    Load messages from Telegram export JSON file.
+    Uses ijson for streaming (constant memory) if available,
+    otherwise falls back to full json.load().
+    """
+    if HAS_IJSON:
+        structure = _detect_json_structure(json_path)
+        prefix = 'item' if structure == 'list' else 'messages.item'
+        with open(json_path, 'rb') as f:
+            for msg in ijson.items(f, prefix):
+                parsed = parse_message(msg)
+                if parsed:
+                    yield parsed
+    else:
+        with open(json_path, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+        messages = data if isinstance(data, list) else data.get('messages', [])
+        for msg in messages:
+            parsed = parse_message(msg)
+            if parsed:
+                yield parsed
+def count_messages(json_path: str) -> int:
+    """Count messages in JSON file. Uses streaming if ijson available."""
+    if HAS_IJSON:
+        structure = _detect_json_structure(json_path)
+        prefix = 'item' if structure == 'list' else 'messages.item'
+        count = 0
+        with open(json_path, 'rb') as f:
+            for msg in ijson.items(f, prefix):
+                if msg.get('type') == 'message':
+                    count += 1
+        return count
+    else:
+        with open(json_path, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+        messages = data if isinstance(data, list) else data.get('messages', [])
+        return sum(1 for msg in messages if msg.get('type') == 'message')
+def init_database(db_path: str) -> sqlite3.Connection:
+    """Initialize SQLite database with optimized schema."""
+    conn = sqlite3.connect(db_path)
+    conn.row_factory = sqlite3.Row
+    # Read and execute schema
+    schema_path = Path(__file__).parent / 'schema.sql'
+    if schema_path.exists():
+        with open(schema_path, 'r', encoding='utf-8') as f:
+            conn.executescript(f.read())
+    else:
+        raise FileNotFoundError(f"Schema file not found: {schema_path}")
+    return conn
+class OptimizedIndexer:
+    """
+    High-performance indexer with batch processing and graph building.
+    Features:
+    - Batch inserts via executemany in one transaction (typically far faster than row-by-row inserts)
+    - Bloom filter for duplicate detection
+    - Reply graph construction
+    - Trigram index building
+    - Progress tracking
+    """
+    def __init__(
+        self,
+        db_path: str,
+        batch_size: int = 1000,
+        build_trigrams: bool = False,
+        build_graph: bool = True
+    ):
+        self.db_path = db_path
+        self.batch_size = batch_size
+        self.build_trigrams = build_trigrams
+        self.build_graph = build_graph
+        self.conn = init_database(db_path)
+        self.bloom = BloomFilter(expected_items=1000000, fp_rate=0.01)
+        self.graph = ReplyGraph() if build_graph else None
+        # Batch buffers
+        self.message_batch: list[tuple] = []
+        self.entity_batch: list[tuple] = []
+        self.trigram_batch: list[tuple] = []
+        # Stats
+        self.stats = {
+            'messages': 0,
+            'entities': 0,
+            'trigrams': 0,
+            'users': {},
+            'skipped': 0,
+            'duplicates': 0
+        }
+    def index_file(self, json_path: str, show_progress: bool = True) -> dict:
+        """
+        Index a JSON file into the database.
+        Returns statistics dict.
+        """
+        start_time = time.time()
+        # Count total for progress
+        if show_progress:
+            print(f"Counting messages in {json_path}...")
+            total = count_messages(json_path)
+            print(f"Found {total:,} messages to index")
+        else:
+            total = 0
+        # Disable auto-commit for batch processing
+        self.conn.execute('BEGIN TRANSACTION')
+        try:
+            for i, msg in enumerate(load_json_messages(json_path)):
+                self._index_message(msg)
+                # Progress update
+                if show_progress and (i + 1) % 10000 == 0:
+                    elapsed = time.time() - start_time
+                    rate = (i + 1) / elapsed
+                    eta = (total - i - 1) / rate if rate > 0 else 0
+                    print(f"  Indexed {i+1:,}/{total:,} ({100*(i+1)/total:.1f}%) "
+                          f"- {rate:.0f} msg/s - ETA: {eta:.0f}s")
+            # Flush remaining batches
+            self._flush_batches()
+            # Build reply graph in database
+            if self.build_graph:
+                self._build_graph_tables()
+            # Update users table
+            self._update_users()
+            # Commit transaction
+            self.conn.commit()
+            # Optimize FTS index
+            print("Optimizing FTS index...")
+            self.conn.execute("INSERT INTO messages_fts(messages_fts) VALUES('optimize')")
+            self.conn.commit()
+        except Exception:
+            self.conn.rollback()
+            raise
+        elapsed = time.time() - start_time
+        self.stats['elapsed_seconds'] = elapsed
+        self.stats['messages_per_second'] = self.stats['messages'] / elapsed if elapsed > 0 else 0
+        return self.stats
+    def _index_message(self, msg: dict) -> None:
+        """Index a single message into batch buffers."""
+        msg_id = msg['id']
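+        # Duplicate check with Bloom filter. A hit may be a false positive
+        # (probability around fp_rate), in which case a unique message is skipped.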
+        msg_key = f"msg_{msg_id}"
+        if msg_key in self.bloom:
+            self.stats['duplicates'] += 1
+            return
+        self.bloom.add(msg_key)
+        # Add to message batch
+        self.message_batch.append((
+            msg['id'], msg['type'], msg['date'], msg['date_unixtime'],
+            msg['from_name'], msg['from_id'], msg['reply_to_message_id'],
+            msg['forwarded_from'], msg['forwarded_from_id'], msg['text_plain'],
+            msg['text_length'], msg['has_media'], msg['has_photo'],
+            msg['has_links'], msg['has_mentions'], msg['is_edited'],
+            msg['edited_unixtime'], msg['photo_file_size'],
+            msg['photo_width'], msg['photo_height'], msg['raw_json']
+        ))
+        # Add entities to batch
+        for entity in msg['entities']:
+            self.entity_batch.append((msg_id, entity['type'], entity['value']))
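+        # Add trigrams if enabled. generate_trigrams returns a set, so the
+        # "position" stored here is an enumeration index, not a text offset.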
+        if self.build_trigrams and msg['text_plain']:
+            for i, trigram in enumerate(generate_trigrams(msg['text_plain'])):
+                self.trigram_batch.append((trigram, msg_id, i))
+        # Build graph
+        if self.graph:
+            self.graph.add_message(msg_id, msg['reply_to_message_id'])
+        # Track users
+        user_id = msg['from_id']
+        if user_id:
+            if user_id not in self.stats['users']:
+                self.stats['users'][user_id] = {
+                    'display_name': msg['from_name'],
+                    'first_seen': msg['date_unixtime'],
+                    'last_seen': msg['date_unixtime'],
+                    'count': 0
+                }
+            self.stats['users'][user_id]['count'] += 1
+            ts = msg['date_unixtime']
+            if ts and ts < self.stats['users'][user_id]['first_seen']:
+                self.stats['users'][user_id]['first_seen'] = ts
+            if ts and ts > self.stats['users'][user_id]['last_seen']:
+                self.stats['users'][user_id]['last_seen'] = ts
+        self.stats['messages'] += 1
+        # Flush if batch is full
+        if len(self.message_batch) >= self.batch_size:
+            self._flush_batches()
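A Bloom filter answers "definitely not seen" or "possibly seen", so using it as the only duplicate check means some unique messages can be counted as duplicates. The sketch below exaggerates the effect with a deliberately undersized filter (8 bits, one hash function) so that collisions are guaranteed; with the indexer's actual parameters the rate should be far lower. `OneHashBloom` is a hypothetical toy class, not the `BloomFilter` from `data_structures.py`.

```python
import hashlib


class OneHashBloom:
    """Deliberately tiny Bloom filter (one hash function) to make collisions visible."""

    def __init__(self, size_bits=8):
        self.size_bits = size_bits
        self.bits = 0  # bit array stored in a single int

    def _position(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        self.bits |= 1 << self._position(item)

    def __contains__(self, item):
        return bool((self.bits >> self._position(item)) & 1)


bloom = OneHashBloom(size_bits=8)
accepted, skipped = [], []
for msg_id in range(100):  # 100 distinct ids, so there are no real duplicates
    key = f"msg_{msg_id}"
    if key in bloom:
        skipped.append(msg_id)   # false positive: a unique id treated as a duplicate
    else:
        bloom.add(key)
        accepted.append(msg_id)
```

With only 8 bits, at most 8 ids can ever be accepted, so the rest are skipped as false positives. Where losing messages is not acceptable, one option is to confirm every Bloom hit against an exact structure, such as a set of ids or a database lookup, before skipping.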
+    def _flush_batches(self) -> None:
+        """Flush all batch buffers to database."""
+        cursor = self.conn.cursor()
+        # Insert messages
+        if self.message_batch:
+            cursor.executemany('''
+                INSERT OR REPLACE INTO messages (
+                    id, type, date, date_unixtime, from_name, from_id,
+                    reply_to_message_id, forwarded_from, forwarded_from_id,
+                    text_plain, text_length, has_media, has_photo, has_links,
+                    has_mentions, is_edited, edited_unixtime, photo_file_size,
+                    photo_width, photo_height, raw_json
+                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+            ''', self.message_batch)
+            self.message_batch = []
+        # Insert entities
+        if self.entity_batch:
+            cursor.executemany('''
+                INSERT INTO entities (message_id, type, value)
+                VALUES (?, ?, ?)
+            ''', self.entity_batch)
+            self.stats['entities'] += len(self.entity_batch)
+            self.entity_batch = []
+        # Insert trigrams
+        if self.trigram_batch:
+            cursor.executemany('''
+                INSERT OR IGNORE INTO trigrams (trigram, message_id, position)
+                VALUES (?, ?, ?)
+            ''', self.trigram_batch)
+            self.stats['trigrams'] += len(self.trigram_batch)
+            self.trigram_batch = []
+    def _build_graph_tables(self) -> None:
+        """Build reply graph tables from in-memory graph."""
+        if not self.graph:
+            return
+        print("Building reply graph tables...")
+        cursor = self.conn.cursor()
+        # Insert edges into reply_graph
+        edges = []
+        for parent_id, children in self.graph.children.items():
+            for child_id in children:
+                edges.append((parent_id, child_id, 1))
+        if edges:
+            cursor.executemany('''
+                INSERT OR IGNORE INTO reply_graph (parent_id, child_id, depth)
+                VALUES (?, ?, ?)
+            ''', edges)
+        # Find connected components (threads)
+        print("Finding conversation threads...")
+        components = self.graph.find_connected_components()
+        thread_data = []
+        message_thread_data = []
+        for thread_id, component in enumerate(components):
+            if not component:
+                continue
+            # Find root (message with no parent in this component)
+            root_id = None
+            for msg_id in component:
+                if msg_id not in self.graph.parents:
+                    root_id = msg_id
+                    break
+            if root_id is None:
+                root_id = min(component)
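+            # Get thread stats.
+            # NOTE: this binds one placeholder per message, so a very large
+            # component can exceed SQLite's bound-parameter limit.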
+            cursor.execute('''
+                SELECT MIN(date_unixtime), MAX(date_unixtime), COUNT(DISTINCT from_id)
+                FROM messages WHERE id IN ({})
+            '''.format(','.join('?' * len(component))), list(component))
+            row = cursor.fetchone()
+            thread_data.append((
+                root_id,
+                len(component),
+                row[0],  # first_message_time
+                row[1],  # last_message_time
+                row[2]   # participant_count
+            ))
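+            # Map messages to threads with depth.
+            # NOTE: len(thread_data) doubles as the thread_id. It matches the
+            # AUTOINCREMENT id only if the threads table is empty before the
+            # bulk insert below.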
+            for msg_id in component:
+                depth = len(self.graph.get_ancestors(msg_id))
+                message_thread_data.append((msg_id, len(thread_data), depth))
+        # Insert thread data
+        cursor.executemany('''
+            INSERT INTO threads (root_message_id, message_count, first_message_time,
+                                last_message_time, participant_count)
+            VALUES (?, ?, ?, ?, ?)
+        ''', thread_data)
+        cursor.executemany('''
+            INSERT OR REPLACE INTO message_threads (message_id, thread_id, depth)
+            VALUES (?, ?, ?)
+        ''', message_thread_data)
+        print(f"  Created {len(thread_data)} conversation threads")
+    def _update_users(self) -> None:
+        """Update users table from tracked data."""
+        cursor = self.conn.cursor()
+        user_data = [
+            (user_id, data['display_name'], data['first_seen'],
+             data['last_seen'], data['count'])
+            for user_id, data in self.stats['users'].items()
+        ]
+        cursor.executemany('''
+            INSERT OR REPLACE INTO users (user_id, display_name, first_seen, last_seen, message_count)
+            VALUES (?, ?, ?, ?, ?)
+        ''', user_data)
+    def close(self) -> None:
+        """Close database connection."""
+        self.conn.close()
+class IncrementalIndexer:
+    """
+    Incremental indexer for adding new JSON data to existing database.
+    Features:
+    - Loads existing message IDs into Bloom filter
+    - Only processes new messages
+    - Updates FTS index automatically
+    - Fast duplicate detection O(1)
+    """
+    def __init__(self, db_path: str, batch_size: int = 1000):
+        self.db_path = db_path
+        self.batch_size = batch_size
+        if not os.path.exists(db_path):
+            raise FileNotFoundError(f"Database not found: {db_path}. Use OptimizedIndexer for initial import.")
+        self.conn = sqlite3.connect(db_path)
+        self.conn.row_factory = sqlite3.Row
+        # Batch buffers
+        self.message_batch: list[tuple] = []
+        self.entity_batch: list[tuple] = []
+        # Stats (must be initialized before _load_existing_ids)
+        self.stats = {
+            'total_in_file': 0,
+            'new_messages': 0,
+            'duplicates': 0,
+            'entities': 0,
+            'users_updated': 0
+        }
+        # Load existing message IDs into Bloom filter
+        self.bloom = BloomFilter(expected_items=2000000, fp_rate=0.001)
+        self._load_existing_ids()
+    def _load_existing_ids(self) -> None:
+        """Load existing message IDs into Bloom filter for O(1) duplicate detection."""
+        cursor = self.conn.cursor()
+        cursor.execute("SELECT id FROM messages")
+        count = 0
+        for row in cursor:
+            self.bloom.add(f"msg_{row[0]}")
+            count += 1
+        print(f"Loaded {count:,} existing message IDs into Bloom filter")
+        self.stats['existing_count'] = count
+    def update_from_json(self, json_path: str, show_progress: bool = True) -> dict:
+        """
+        Add new messages from JSON file to existing database.
+        Only messages that don't exist in the database will be added.
+        FTS5 index is updated automatically.
+        Uses streaming JSON parser (ijson) when available for constant memory usage.
+        """
+        start_time = time.time()
+        # Count total for progress (streaming-aware)
+        total_hint = 0
+        if show_progress:
+            total_hint = count_messages(json_path)
+            print(f"Processing ~{total_hint:,} messages from {json_path}")
+        self.stats['total_in_file'] = total_hint
+        # Start transaction
+        self.conn.execute('BEGIN TRANSACTION')
+        try:
+            if HAS_IJSON:
+                structure = _detect_json_structure(json_path)
+                prefix = 'item' if structure == 'list' else 'messages.item'
+                with open(json_path, 'rb') as f:
+                    for i, msg in enumerate(ijson.items(f, prefix)):
+                        if msg.get('type') != 'message':
+                            continue
+                        parsed = parse_message(msg)
+                        if parsed:
+                            self._process_message(parsed)
+                        if show_progress and (i + 1) % 10000 == 0:
+                            print(f"  Processed {i+1:,} - "
+                                  f"New: {self.stats['new_messages']:,}, "
+                                  f"Duplicates: {self.stats['duplicates']:,}")
+            else:
+                with open(json_path, 'r', encoding='utf-8') as f:
+                    data = json.load(f)
+                messages = data if isinstance(data, list) else data.get('messages', [])
+                self.stats['total_in_file'] = len(messages)
+                for i, msg in enumerate(messages):
+                    if msg.get('type') != 'message':
+                        continue
+                    parsed = parse_message(msg)
+                    if parsed:
+                        self._process_message(parsed)
+                    if show_progress and (i + 1) % 10000 == 0:
+                        print(f"  Processed {i+1:,}/{len(messages):,} - "
+                              f"New: {self.stats['new_messages']:,}, "
+                              f"Duplicates: {self.stats['duplicates']:,}")
+            # Flush remaining
+            self._flush_batches()
+            # Update user stats
+            self._update_user_stats()
+            # Commit
+            self.conn.commit()
+            # Optimize FTS if we added new data
+            if self.stats['new_messages'] > 0:
+                print("Optimizing FTS index...")
+                self.conn.execute("INSERT INTO messages_fts(messages_fts) VALUES('optimize')")
+                self.conn.commit()
+        except Exception:
+            self.conn.rollback()
+            raise  # bare raise keeps the original traceback
+        elapsed = time.time() - start_time
+        self.stats['elapsed_seconds'] = elapsed
+        return self.stats
+    def update_from_json_data(self, json_data: dict | list, show_progress: bool = False) -> dict:
+        """
+        Add new messages from JSON data (already parsed, not from file).
+        Useful for API uploads.
+        """
+        start_time = time.time()
+        messages = json_data if isinstance(json_data, list) else json_data.get('messages', [])
+        self.stats['total_in_file'] = len(messages)
+        # Start transaction
+        self.conn.execute('BEGIN TRANSACTION')
+        try:
+            for msg in messages:
+                if msg.get('type') != 'message':
+                    continue
+                parsed = parse_message(msg)
+                if parsed:
+                    self._process_message(parsed)
+            # Flush remaining
+            self._flush_batches()
+            # Update user stats
+            self._update_user_stats()
+            # Commit
+            self.conn.commit()
+            # Optimize FTS if we added new data
+            if self.stats['new_messages'] > 0:
+                self.conn.execute("INSERT INTO messages_fts(messages_fts) VALUES('optimize')")
+                self.conn.commit()
+        except Exception:
+            self.conn.rollback()
+            raise  # bare raise keeps the original traceback
+        elapsed = time.time() - start_time
+        self.stats['elapsed_seconds'] = elapsed
+        return self.stats
+    def _process_message(self, msg: dict) -> None:
+        """Process a single message, adding to batch if new."""
+        msg_id = msg['id']
+        msg_key = f"msg_{msg_id}"
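+        # Check if already exists. A Bloom filter hit may be a false positive,
+        # so confirm against the pending batch and the database before skipping.
+        if msg_key in self.bloom:
+            pending = any(row[0] == msg_id for row in self.message_batch)
+            stored = self.conn.execute(
+                'SELECT 1 FROM messages WHERE id = ?', (msg_id,)
+            ).fetchone() is not None
+            if pending or stored:
+                self.stats['duplicates'] += 1
+                return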
+        # Add to Bloom filter
+        self.bloom.add(msg_key)
+        # Add to message batch
+        self.message_batch.append((
+            msg['id'], msg['type'], msg['date'], msg['date_unixtime'],
+            msg['from_name'], msg['from_id'], msg['reply_to_message_id'],
+            msg['forwarded_from'], msg['forwarded_from_id'], msg['text_plain'],
+            msg['text_length'], msg['has_media'], msg['has_photo'],
+            msg['has_links'], msg['has_mentions'], msg['is_edited'],
+            msg['edited_unixtime'], msg['photo_file_size'],
+            msg['photo_width'], msg['photo_height'], msg['raw_json']
+        ))
+        # Add entities to batch
+        for entity in msg['entities']:
+            self.entity_batch.append((msg_id, entity['type'], entity['value']))
+        self.stats['new_messages'] += 1
+        # Flush if batch is full
+        if len(self.message_batch) >= self.batch_size:
+            self._flush_batches()
+    def _flush_batches(self) -> None:
+        """Flush batch buffers to database."""
+        cursor = self.conn.cursor()
+        # Insert messages (FTS5 trigger will update automatically)
+        if self.message_batch:
+            cursor.executemany('''
+                INSERT OR IGNORE INTO messages (
+                    id, type, date, date_unixtime, from_name, from_id,
+                    reply_to_message_id, forwarded_from, forwarded_from_id,
+                    text_plain, text_length, has_media, has_photo, has_links,
+                    has_mentions, is_edited, edited_unixtime, photo_file_size,
+                    photo_width, photo_height, raw_json
+                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+            ''', self.message_batch)
+            self.message_batch = []
+        # Insert entities
+        if self.entity_batch:
+            cursor.executemany('''
+                INSERT OR IGNORE INTO entities (message_id, type, value)
+                VALUES (?, ?, ?)
+            ''', self.entity_batch)
+            self.stats['entities'] += len(self.entity_batch)
+            self.entity_batch = []
+    def _update_user_stats(self) -> None:
+        """Update users table with aggregated stats."""
+        cursor = self.conn.cursor()
+        # Upsert users from messages
+        cursor.execute('''
+            INSERT OR REPLACE INTO users (user_id, display_name, first_seen, last_seen, message_count)
+            SELECT
+                from_id,
+                from_name,
+                MIN(date_unixtime),
+                MAX(date_unixtime),
+                COUNT(*)
+            FROM messages
+            WHERE from_id IS NOT NULL AND from_id != ''
+            GROUP BY from_id
+        ''')
+        self.stats['users_updated'] = cursor.rowcount
+    def close(self) -> None:
+        """Close database connection."""
+        self.conn.close()
+def update_database(db_path: str, json_path: str) -> dict:
+    """
+    Convenience function to update database with new JSON file.
+    Args:
+        db_path: Path to existing SQLite database
+        json_path: Path to new JSON file
+    Returns:
+        Statistics dict
+    """
+    indexer = IncrementalIndexer(db_path)
+    try:
+        stats = indexer.update_from_json(json_path)
+        return stats
+    finally:
+        indexer.close()
+def main():
+    parser = argparse.ArgumentParser(description='Index Telegram JSON export to SQLite (Optimized)')
+    parser.add_argument('json_file', help='Path to Telegram export JSON file')
+    parser.add_argument('--db', default='telegram.db', help='SQLite database path')
+    parser.add_argument('--batch-size', type=int, default=1000, help='Batch size for inserts')
+    parser.add_argument('--build-trigrams', action='store_true', help='Build trigram index for fuzzy search')
+    parser.add_argument('--no-graph', action='store_true', help='Skip building reply graph')
+    parser.add_argument('--quiet', action='store_true', help='Suppress progress output')
+    parser.add_argument('--update', action='store_true',
+                       help='Update existing database (add only new messages)')
+    args = parser.parse_args()
+    if not os.path.exists(args.json_file):
+        print(f"Error: JSON file not found: {args.json_file}")
+        return 1
+    # Update mode: add new messages to existing database
+    if args.update:
+        if not os.path.exists(args.db):
+            print(f"Error: Database not found: {args.db}")
+            print("Use without --update flag for initial import")
+            return 1
+        print(f"{'='*50}")
+        print("INCREMENTAL UPDATE MODE")
+        print(f"{'='*50}")
+        print(f"Database: {args.db}")
+        print(f"New JSON: {args.json_file}")
+        print()
+        indexer = IncrementalIndexer(args.db, args.batch_size)
+        try:
+            stats = indexer.update_from_json(args.json_file, show_progress=not args.quiet)
+        finally:
+            # Close the connection even if the update raises
+            indexer.close()
+        print(f"\n{'='*50}")
+        print("Update complete!")
+        print(f"{'='*50}")
+        print(f"  Messages in file:    {stats['total_in_file']:,}")
+        print(f"  Already existed:     {stats['duplicates']:,}")
+        print(f"  New messages added:  {stats['new_messages']:,}")
+        print(f"  New entities:        {stats['entities']:,}")
+        print(f"  Time elapsed:        {stats['elapsed_seconds']:.1f}s")
+        return 0
+    # Initial import mode
+    print(f"Initializing database: {args.db}")
+    indexer = OptimizedIndexer(
+        db_path=args.db,
+        batch_size=args.batch_size,
+        build_trigrams=args.build_trigrams,
+        build_graph=not args.no_graph
+    )
+    print(f"Indexing: {args.json_file}")
+    try:
+        stats = indexer.index_file(args.json_file, show_progress=not args.quiet)
+    finally:
+        # Close the connection even if indexing raises
+        indexer.close()
+    print(f"\n{'='*50}")
+    print("Indexing complete!")
+    print(f"{'='*50}")
+    print(f"  Messages indexed:    {stats['messages']:,}")
+    print(f"  Entities extracted:  {stats['entities']:,}")
+    print(f"  Unique users:        {len(stats['users']):,}")
+    print(f"  Duplicates skipped:  {stats['duplicates']:,}")
+    if stats.get('trigrams'):
+        print(f"  Trigrams indexed:    {stats['trigrams']:,}")
+    print(f"  Time elapsed:        {stats['elapsed_seconds']:.1f}s")
+    print(f"  Speed:               {stats['messages_per_second']:.0f} msg/s")
+    print(f"\nDatabase saved to: {args.db}")
+    return 0
+if __name__ == '__main__':
+    raise SystemExit(main())
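
Review note on `IncrementalIndexer._process_message`: a Bloom filter never misses a stored key, but it can report a key that was never added. A hit should therefore be confirmed against the database before a message is dropped as a duplicate. The sketch below illustrates that pattern under stated assumptions: `TinyBloom` and `is_duplicate` are hypothetical stand-ins written for this note, not the project's `BloomFilter` from `data_structures.py`, and the filter is made deliberately small so false positives actually occur.

```python
import hashlib
import sqlite3


class TinyBloom:
    """Illustrative Bloom filter (hypothetical; not the project's BloomFilter)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def is_duplicate(conn: sqlite3.Connection, bloom: TinyBloom, msg_id: int) -> bool:
    """Return True only if msg_id is really stored; a Bloom hit alone is not proof."""
    if f"msg_{msg_id}" not in bloom:
        return False  # a Bloom filter has no false negatives
    row = conn.execute("SELECT 1 FROM messages WHERE id = ?", (msg_id,)).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO messages (id) VALUES (?)", [(i,) for i in range(1, 101)])

# Deliberately undersized filter so that false positives are common.
bloom = TinyBloom(num_bits=64, num_hashes=2)
for i in range(1, 101):
    bloom.add(f"msg_{i}")

# IDs 101..200 were never stored: any Bloom hit on them is a false positive.
bloom_hits = sum(f"msg_{i}" in bloom for i in range(101, 201))
confirmed = sum(is_duplicate(conn, bloom, i) for i in range(101, 201))
print(f"Bloom-only hits on unseen IDs: {bloom_hits}, after DB check: {confirmed}")
```

The database lookup runs only on a Bloom hit, so unseen messages still take the fast path, while stored messages cost one primary-key lookup each.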

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+flask>=3.0
+gunicorn>=21.2
+requests>=2.31
+ijson>=3.2

schema.sql ADDED

	@@ -0,0 +1,200 @@

+-- Telegram Chat Indexing Schema (Optimized)
+-- SQLite with FTS5 for full-text search + performance optimizations
+-- ============================================
+-- PRAGMA OPTIMIZATIONS
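+-- ============================================
+-- NOTE: journal_mode = WAL is stored in the database file. The other PRAGMAs
+-- below apply only to the connection that runs this script and must be
+-- reissued on every new connection.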
+PRAGMA journal_mode = WAL;           -- Write-Ahead Logging for better concurrency
+PRAGMA synchronous = NORMAL;         -- Balance between safety and speed
+PRAGMA cache_size = -64000;          -- 64MB cache
+PRAGMA temp_store = MEMORY;          -- Store temp tables in memory
+PRAGMA mmap_size = 268435456;        -- 256MB memory-mapped I/O
+-- ============================================
+-- MAIN TABLES
+-- ============================================
+-- Main messages table
+CREATE TABLE IF NOT EXISTS messages (
+    id INTEGER PRIMARY KEY,
+    type TEXT DEFAULT 'message',
+    date TEXT,
+    date_unixtime INTEGER NOT NULL,
+    from_name TEXT,
+    from_id TEXT NOT NULL,
+    reply_to_message_id INTEGER,
+    forwarded_from TEXT,
+    forwarded_from_id TEXT,
+    text_plain TEXT,
+    text_length INTEGER DEFAULT 0,
+    has_media INTEGER DEFAULT 0,
+    has_photo INTEGER DEFAULT 0,
+    has_links INTEGER DEFAULT 0,
+    has_mentions INTEGER DEFAULT 0,
+    is_edited INTEGER DEFAULT 0,
+    edited_unixtime INTEGER,
+    photo_file_size INTEGER,
+    photo_width INTEGER,
+    photo_height INTEGER,
+    raw_json TEXT
+);
+-- Users table (extracted from messages)
+CREATE TABLE IF NOT EXISTS users (
+    user_id TEXT PRIMARY KEY,
+    display_name TEXT,
+    first_seen INTEGER,
+    last_seen INTEGER,
+    message_count INTEGER DEFAULT 0
+);
+-- Entities table (links, mentions, etc.)
+CREATE TABLE IF NOT EXISTS entities (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    message_id INTEGER NOT NULL,
+    type TEXT NOT NULL,
+    value TEXT NOT NULL,
+    FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
+);
+-- ============================================
+-- GRAPH STRUCTURE FOR REPLY THREADS
+-- ============================================
+-- Pre-computed reply graph edges for fast traversal
+CREATE TABLE IF NOT EXISTS reply_graph (
+    parent_id INTEGER NOT NULL,
+    child_id INTEGER NOT NULL,
+    depth INTEGER DEFAULT 1,
+    PRIMARY KEY (parent_id, child_id)
+);
+-- Conversation threads (connected components)
+CREATE TABLE IF NOT EXISTS threads (
+    thread_id INTEGER PRIMARY KEY AUTOINCREMENT,
+    root_message_id INTEGER UNIQUE,
+    message_count INTEGER DEFAULT 0,
+    first_message_time INTEGER,
+    last_message_time INTEGER,
+    participant_count INTEGER DEFAULT 0
+);
+-- Message to thread mapping
+CREATE TABLE IF NOT EXISTS message_threads (
+    message_id INTEGER PRIMARY KEY,
+    thread_id INTEGER NOT NULL,
+    depth INTEGER DEFAULT 0,
+    FOREIGN KEY (thread_id) REFERENCES threads(thread_id)
+);
+-- ============================================
+-- TRIGRAM INDEX FOR FUZZY SEARCH
+-- ============================================
+-- Trigrams for fuzzy/approximate string matching
+CREATE TABLE IF NOT EXISTS trigrams (
+    trigram TEXT NOT NULL,
+    message_id INTEGER NOT NULL,
+    position INTEGER NOT NULL,
+    PRIMARY KEY (trigram, message_id, position)
+);
+-- ============================================
+-- FTS5 FULL-TEXT SEARCH (OPTIMIZED)
+-- ============================================
+-- Full-text search with prefix index for autocomplete
+CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
+    text_plain,
+    from_name,
+    content='messages',
+    content_rowid='id',
+    tokenize='unicode61 remove_diacritics 2',
+    prefix='2 3 4'  -- Enable prefix queries for autocomplete
+);
+-- Triggers to keep FTS in sync
+CREATE TRIGGER IF NOT EXISTS messages_ai AFTER INSERT ON messages BEGIN
+    INSERT INTO messages_fts(rowid, text_plain, from_name)
+    VALUES (new.id, new.text_plain, new.from_name);
+END;
+CREATE TRIGGER IF NOT EXISTS messages_ad AFTER DELETE ON messages BEGIN
+    INSERT INTO messages_fts(messages_fts, rowid, text_plain, from_name)
+    VALUES ('delete', old.id, old.text_plain, old.from_name);
+END;
+CREATE TRIGGER IF NOT EXISTS messages_au AFTER UPDATE ON messages BEGIN
+    INSERT INTO messages_fts(messages_fts, rowid, text_plain, from_name)
+    VALUES ('delete', old.id, old.text_plain, old.from_name);
+    INSERT INTO messages_fts(rowid, text_plain, from_name)
+    VALUES (new.id, new.text_plain, new.from_name);
+END;
+-- ============================================
+-- OPTIMIZED INDEXES
+-- ============================================
+-- Composite indexes for common query patterns
+CREATE INDEX IF NOT EXISTS idx_messages_date ON messages(date_unixtime);
+CREATE INDEX IF NOT EXISTS idx_messages_from ON messages(from_id);
+CREATE INDEX IF NOT EXISTS idx_messages_from_date ON messages(from_id, date_unixtime);
+CREATE INDEX IF NOT EXISTS idx_messages_reply ON messages(reply_to_message_id) WHERE reply_to_message_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_messages_forwarded ON messages(forwarded_from_id) WHERE forwarded_from_id IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_messages_has_links ON messages(has_links) WHERE has_links = 1;
+CREATE INDEX IF NOT EXISTS idx_messages_has_media ON messages(has_media) WHERE has_media = 1;
+-- Entity indexes
+CREATE INDEX IF NOT EXISTS idx_entities_message ON entities(message_id);
+CREATE INDEX IF NOT EXISTS idx_entities_type_value ON entities(type, value);
+CREATE INDEX IF NOT EXISTS idx_entities_value ON entities(value);
+-- Graph indexes
+CREATE INDEX IF NOT EXISTS idx_reply_graph_child ON reply_graph(child_id);
+CREATE INDEX IF NOT EXISTS idx_message_threads_thread ON message_threads(thread_id);
+-- Trigram index
+CREATE INDEX IF NOT EXISTS idx_trigrams_trigram ON trigrams(trigram);
+-- ============================================
+-- PARTICIPANTS TABLE (from Telethon API)
+-- ============================================
+CREATE TABLE IF NOT EXISTS participants (
+    user_id TEXT PRIMARY KEY,
+    first_name TEXT,
+    last_name TEXT,
+    username TEXT,
+    phone TEXT,
+    is_bot INTEGER DEFAULT 0,
+    is_admin INTEGER DEFAULT 0,
+    is_creator INTEGER DEFAULT 0,
+    is_premium INTEGER DEFAULT 0,
+    join_date INTEGER,
+    last_status TEXT DEFAULT 'unknown',
+    last_online INTEGER,
+    about TEXT,
+    updated_at INTEGER
+);
+-- ============================================
+-- STATISTICS TABLE FOR FAST AGGREGATIONS
+-- ============================================
+CREATE TABLE IF NOT EXISTS stats_cache (
+    key TEXT PRIMARY KEY,
+    value TEXT,
+    updated_at INTEGER
+);
+-- ============================================
+-- VECTOR EMBEDDINGS TABLE (OPTIONAL)
+-- ============================================
+-- For semantic search with FAISS
+CREATE TABLE IF NOT EXISTS embeddings (
+    message_id INTEGER PRIMARY KEY,
+    embedding BLOB,  -- Serialized numpy array
+    model_name TEXT DEFAULT 'default',
+    FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
+);
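
Review note on the FTS5 section: the schema pairs an external-content `messages_fts` table with triggers that mirror writes to `messages`. The sketch below shows how that wiring behaves from Python's `sqlite3` module. It is a reduced, assumed version of the schema (three columns, only the insert trigger, plain `unicode61` tokenizer) with made-up sample rows, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        from_name TEXT,
        text_plain TEXT
    );
    -- External-content FTS5 table: the text itself lives in messages.
    CREATE VIRTUAL TABLE messages_fts USING fts5(
        text_plain,
        from_name,
        content='messages',
        content_rowid='id',
        tokenize='unicode61',
        prefix='2 3 4'
    );
    -- Keep the index in sync on insert (delete/update triggers omitted here).
    CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
        INSERT INTO messages_fts(rowid, text_plain, from_name)
        VALUES (new.id, new.text_plain, new.from_name);
    END;
""")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO messages (id, from_name, text_plain) VALUES (?, ?, ?)",
    [
        (1, "alice", "search search search works"),
        (2, "bob", "the search index covers every message in this long chat export"),
        (3, "carol", "unrelated text"),
        (4, "dave", "lunch at noon"),
        (5, "erin", "see you tomorrow"),
    ],
)

# bm25() returns lower (more negative) values for better matches,
# so a plain ascending ORDER BY puts the best match first.
rows = conn.execute(
    """
    SELECT m.id, bm25(messages_fts, 1.0, 0.5) AS relevance
    FROM messages_fts
    JOIN messages m ON messages_fts.rowid = m.id
    WHERE messages_fts MATCH ?
    ORDER BY relevance
    """,
    ("search",),
).fetchall()

# Prefix query served by the prefix='2 3 4' index; the term is quoted so
# punctuation in user input is not parsed as FTS5 syntax.
prefix_ids = [r[0] for r in conn.execute(
    "SELECT rowid FROM messages_fts WHERE messages_fts MATCH ?", ('"sea"*',)
)]

for msg_id, relevance in rows:
    print(msg_id, relevance)
```

Because the trigger does the indexing, application code only ever writes to `messages`. This is also why the incremental indexer's plain `INSERT` is enough to keep search results current.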

search.py ADDED

	@@ -0,0 +1,564 @@

+#!/usr/bin/env python3
+"""
+Telegram Chat Search Utilities (Optimized)
+Features:
+- Full-text search with BM25 ranking
+- LRU caching for repeated queries
+- Fuzzy search with trigram similarity
+- Thread traversal with DFS/BFS
+- Autocomplete suggestions
+Usage:
+    python search.py <query> [options]
+    python search.py "שלום" --db telegram.db
+    python search.py "link" --user user123 --fuzzy
+"""
+import sqlite3
+import argparse
+from datetime import datetime
+from typing import Optional
+from functools import lru_cache
+from data_structures import LRUCache, Trie, TrigramIndex, ReplyGraph, lru_cached
+class TelegramSearch:
+    """
+    High-performance search interface for indexed Telegram messages.
+    Features:
+    - Full-text search with FTS5 and BM25 ranking
+    - Query result caching (LRU)
+    - Fuzzy/approximate search with trigrams
+    - Thread reconstruction with graph traversal
+    - Autocomplete for usernames and common terms
+    """
+    def __init__(self, db_path: str = 'telegram.db', cache_size: int = 1000):
+        self.db_path = db_path
+        self.conn = sqlite3.connect(db_path)
+        self.conn.row_factory = sqlite3.Row
+        # Initialize caches
+        self.query_cache = LRUCache(maxsize=cache_size)
+        self.user_trie: Optional[Trie] = None
+        self.trigram_index: Optional[TrigramIndex] = None
+        self.reply_graph: Optional[ReplyGraph] = None
+    def close(self):
+        self.conn.close()
+    def __enter__(self):
+        return self
+    def __exit__(self, *args):
+        self.close()
+    # ==========================================
+    # FULL-TEXT SEARCH
+    # ==========================================
+    def search(
+        self,
+        query: str,
+        user_id: Optional[str] = None,
+        from_date: Optional[int] = None,
+        to_date: Optional[int] = None,
+        has_links: Optional[bool] = None,
+        has_mentions: Optional[bool] = None,
+        has_media: Optional[bool] = None,
+        limit: int = 100,
+        offset: int = 0,
+        use_cache: bool = True
+    ) -> list[dict]:
+        """
+        Full-text search with BM25 ranking and optional filters.
+        Args:
+            query: FTS5 query (supports AND, OR, NOT, "phrase", prefix*)
+            user_id: Filter by user ID
+            from_date: Unix timestamp lower bound
+            to_date: Unix timestamp upper bound
+            has_links/has_mentions/has_media: Boolean filters
+            limit: Max results
+            offset: Pagination offset
+            use_cache: Whether to use LRU cache
+        Returns:
+            List of message dicts, best match first. 'relevance' is the FTS5
+            bm25() score, where lower (more negative) values rank higher.
+        """
+        # Build cache key
+        cache_key = f"search:{query}:{user_id}:{from_date}:{to_date}:{has_links}:{has_mentions}:{has_media}:{limit}:{offset}"
+        if use_cache:
+            cached = self.query_cache.get(cache_key)
+            if cached is not None:
+                return cached
+        # Build query conditions
+        conditions = []
+        params = []
+        if user_id:
+            conditions.append("m.from_id = ?")
+            params.append(user_id)
+        if from_date is not None:
+            conditions.append("m.date_unixtime >= ?")
+            params.append(from_date)
+        if to_date is not None:
+            conditions.append("m.date_unixtime <= ?")
+            params.append(to_date)
+        if has_links is not None:
+            conditions.append("m.has_links = ?")
+            params.append(1 if has_links else 0)
+        if has_mentions is not None:
+            conditions.append("m.has_mentions = ?")
+            params.append(1 if has_mentions else 0)
+        if has_media is not None:
+            conditions.append("m.has_media = ?")
+            params.append(1 if has_media else 0)
+        where_clause = " AND ".join(conditions) if conditions else "1=1"
+        sql = f'''
+            SELECT
+                m.id,
+                m.date,
+                m.date_unixtime,
+                m.from_name,
+                m.from_id,
+                m.text_plain,
+                m.reply_to_message_id,
+                m.forwarded_from,
+                m.has_links,
+                m.has_mentions,
+                m.has_media,
+                bm25(messages_fts, 1.0, 0.5) as relevance
+            FROM messages_fts
+            JOIN messages m ON messages_fts.rowid = m.id
+            WHERE messages_fts MATCH ?
+            AND {where_clause}
+            ORDER BY relevance
+            LIMIT ? OFFSET ?
+        '''
+        params = [query] + params + [limit, offset]
+        cursor = self.conn.execute(sql, params)
+        results = [dict(row) for row in cursor.fetchall()]
+        if use_cache:
+            self.query_cache.put(cache_key, results)
+        return results
+    def search_prefix(self, prefix: str, limit: int = 100) -> list[dict]:
+        """
+        Search using prefix matching (autocomplete-style).
+        Uses FTS5 prefix index for fast prefix queries.
+        """
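+        # FTS5 prefix search syntax. Quote the input so characters such as
+        # '-' or ':' are not parsed as FTS5 operators.
+        escaped = prefix.replace('"', '""')
+        query = f'"{escaped}"*'
+        return self.search(query, limit=limit, use_cache=True)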
+    # ==========================================
+    # FUZZY SEARCH
+    # ==========================================
+    def fuzzy_search(
+        self,
+        query: str,
+        threshold: float = 0.3,
+        limit: int = 50
+    ) -> list[dict]:
+        """
+        Fuzzy search using trigram similarity.
+        Finds messages even with typos or slight variations.
+        Args:
+            query: Search query
+            threshold: Minimum similarity (0-1)
+            limit: Max results
+        Returns:
+            List of message dicts, each with an added 'similarity' key
+        """
+        # Build trigram index if not exists
+        if self.trigram_index is None:
+            self._build_trigram_index()
+        # Search trigram index
+        matches = self.trigram_index.search(query, threshold=threshold, limit=limit)
+        # Fetch full messages
+        results = []
+        for msg_id, similarity in matches:
+            cursor = self.conn.execute(
+                'SELECT * FROM messages WHERE id = ?',
+                (msg_id,)
+            )
+            row = cursor.fetchone()
+            if row:
+                msg = dict(row)
+                msg['similarity'] = similarity
+                results.append(msg)
+        return results
+    def _build_trigram_index(self) -> None:
+        """Build in-memory trigram index from database."""
+        print("Building trigram index (first time only)...")
+        self.trigram_index = TrigramIndex()
+        cursor = self.conn.execute(
+            'SELECT id, text_plain FROM messages WHERE text_plain IS NOT NULL'
+        )
+        for row in cursor.fetchall():
+            self.trigram_index.add(row[0], row[1])
+        print(f"Trigram index built: {len(self.trigram_index)} documents")
+    # ==========================================
+    # THREAD TRAVERSAL
+    # ==========================================
+    def get_thread_dfs(self, message_id: int) -> list[dict]:
+        """
+        Get full conversation thread using DFS traversal.
+        Returns messages in depth-first order (follows reply chains deep).
+        """
+        if self.reply_graph is None:
+            self._build_reply_graph()
+        # Find thread root
+        root_id = self.reply_graph.get_thread_root(message_id)
+        # DFS traversal
+        msg_ids = self.reply_graph.dfs_descendants(root_id)
+        # Fetch messages in order
+        return self._fetch_messages_ordered(msg_ids)
+    def get_thread_bfs(self, message_id: int) -> list[dict]:
+        """
+        Get conversation thread using BFS traversal.
+        Returns messages level by level.
+        """
+        if self.reply_graph is None:
+            self._build_reply_graph()
+        root_id = self.reply_graph.get_thread_root(message_id)
+        msg_ids = self.reply_graph.bfs_descendants(root_id)
+        return self._fetch_messages_ordered(msg_ids)
+    def get_thread_with_depth(self, message_id: int) -> list[tuple[dict, int]]:
+        """
+        Get thread with depth information for each message.
+        Returns list of (message, depth) tuples.
+        """
+        if self.reply_graph is None:
+            self._build_reply_graph()
+        root_id = self.reply_graph.get_thread_root(message_id)
+        items = self.reply_graph.bfs_with_depth(root_id)
+        results = []
+        for msg_id, depth in items:
+            cursor = self.conn.execute(
+                'SELECT * FROM messages WHERE id = ?',
+                (msg_id,)
+            )
+            row = cursor.fetchone()
+            if row:
+                results.append((dict(row), depth))
+        return results
+    def get_replies(self, message_id: int) -> list[dict]:
+        """Get all direct replies to a message."""
+        if self.reply_graph is None:
+            self._build_reply_graph()
+        child_ids = self.reply_graph.get_children(message_id)
+        return self._fetch_messages_ordered(child_ids)
+    def get_conversation_path(self, message_id: int) -> list[dict]:
+        """Get the path from thread root to this message."""
+        if self.reply_graph is None:
+            self._build_reply_graph()
+        path_ids = self.reply_graph.get_thread_path(message_id)
+        return self._fetch_messages_ordered(path_ids)
+    def _build_reply_graph(self) -> None:
+        """Build in-memory reply graph from database."""
+        print("Building reply graph (first time only)...")
+        self.reply_graph = ReplyGraph()
+        cursor = self.conn.execute(
+            'SELECT id, reply_to_message_id FROM messages'
+        )
+        for row in cursor.fetchall():
+            self.reply_graph.add_message(row[0], row[1])
+        print(f"Reply graph built: {self.reply_graph.stats}")
+    def _fetch_messages_ordered(self, msg_ids: list[int]) -> list[dict]:
+        """Fetch messages preserving the order of IDs."""
+        if not msg_ids:
+            return []
+        placeholders = ','.join('?' * len(msg_ids))
+        cursor = self.conn.execute(
+            f'SELECT * FROM messages WHERE id IN ({placeholders})',
+            msg_ids
+        )
+        # Create lookup dict
+        msg_map = {row['id']: dict(row) for row in cursor.fetchall()}
+        # Return in original order
+        return [msg_map[mid] for mid in msg_ids if mid in msg_map]
+    # ==========================================
+    # AUTOCOMPLETE
+    # ==========================================
+    def autocomplete_user(self, prefix: str, limit: int = 10) -> list[str]:
+        """
+        Autocomplete username suggestions.
+        Uses Trie for O(p + k) lookup where p=prefix length, k=results.
+        """
+        if self.user_trie is None:
+            self._build_user_trie()
+        return self.user_trie.autocomplete(prefix, limit=limit)
+    def _build_user_trie(self) -> None:
+        """Build Trie index for usernames."""
+        self.user_trie = Trie()
+        cursor = self.conn.execute('SELECT user_id, display_name FROM users')
+        for row in cursor.fetchall():
+            if row['display_name']:
+                self.user_trie.insert(row['display_name'], data=row['user_id'])
+            if row['user_id']:
+                self.user_trie.insert(row['user_id'], data=row['user_id'])
+    # ==========================================
+    # CONVENIENCE METHODS
+    # ==========================================
+    def search_by_user(self, user_id: str, limit: int = 100) -> list[dict]:
+        """Get all messages from a specific user."""
+        sql = '''
+            SELECT * FROM messages
+            WHERE from_id = ?
+            ORDER BY date_unixtime DESC
+            LIMIT ?
+        '''
+        cursor = self.conn.execute(sql, (user_id, limit))
+        return [dict(row) for row in cursor.fetchall()]
+    def search_by_date_range(
+        self,
+        from_date: int,
+        to_date: int,
+        limit: int = 1000
+    ) -> list[dict]:
+        """Get messages within a date range."""
+        sql = '''
+            SELECT * FROM messages
+            WHERE date_unixtime BETWEEN ? AND ?
+            ORDER BY date_unixtime ASC
+            LIMIT ?
+        '''
+        cursor = self.conn.execute(sql, (from_date, to_date, limit))
+        return [dict(row) for row in cursor.fetchall()]
+    def get_links(self, limit: int = 100) -> list[dict]:
+        """Get all extracted links."""
+        sql = '''
+            SELECT e.value as url, e.message_id, m.from_name, m.date
+            FROM entities e
+            JOIN messages m ON e.message_id = m.id
+            WHERE e.type = 'link'
+            ORDER BY m.date_unixtime DESC
+            LIMIT ?
+        '''
+        cursor = self.conn.execute(sql, (limit,))
+        return [dict(row) for row in cursor.fetchall()]
+    def get_mentions(self, username: Optional[str] = None, limit: int = 100) -> list[dict]:
+        """Get mentions, optionally filtered by username."""
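+        if username:
+            # Escape LIKE wildcards so '_' and '%' in a username match literally
+            pattern = (
+                username.replace('\\', '\\\\').replace('%', '\\%').replace('_', '\\_')
+            )
+            sql = '''
+                SELECT e.value as mention, e.message_id, m.from_name, m.text_plain, m.date
+                FROM entities e
+                JOIN messages m ON e.message_id = m.id
+                WHERE e.type = 'mention' AND e.value LIKE ? ESCAPE '\\'
+                ORDER BY m.date_unixtime DESC
+                LIMIT ?
+            '''
+            cursor = self.conn.execute(sql, (f'%{pattern}%', limit))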
+        else:
+            sql = '''
+                SELECT e.value as mention, e.message_id, m.from_name, m.text_plain, m.date
+                FROM entities e
+                JOIN messages m ON e.message_id = m.id
+                WHERE e.type = 'mention'
+                ORDER BY m.date_unixtime DESC
+                LIMIT ?
+            '''
+            cursor = self.conn.execute(sql, (limit,))
+        return [dict(row) for row in cursor.fetchall()]
+    @property
+    def cache_stats(self) -> dict:
+        """Get cache statistics."""
+        return self.query_cache.stats
+def format_result(msg: dict, show_depth: bool = False, depth: int = 0) -> str:
+    """Format a message for display."""
+    # Columns can hold NULL, so fall back on None as well as on missing keys
+    date_str = msg.get('date') or 'Unknown date'
+    from_name = msg.get('from_name') or 'Unknown'
+    full_text = msg.get('text_plain') or ''
+    text = full_text[:200]
+    if len(full_text) > 200:
+        text += '...'
+    flags = []
+    if msg.get('has_links'):
+        flags.append('[link]')
+    if msg.get('has_mentions'):
+        flags.append('[mention]')
+    if msg.get('has_media'):
+        flags.append('[media]')
+    if msg.get('similarity'):
+        flags.append(f'[sim:{msg["similarity"]:.2f}]')
+    if msg.get('relevance'):
+        flags.append(f'[rel:{abs(msg["relevance"]):.2f}]')
+    flags_str = ' '.join(flags)
+    indent = '  ' * depth if show_depth else ''
+    return f"{indent}[{date_str}] {from_name}: {text} {flags_str}"
+def main():
+    parser = argparse.ArgumentParser(description='Search indexed Telegram messages')
+    parser.add_argument('query', nargs='?', help='Search query')
+    parser.add_argument('--db', default='telegram.db', help='Database path')
+    parser.add_argument('--user', help='Filter by user ID')
+    parser.add_argument('--from-date', help='From date (YYYY-MM-DD)')
+    parser.add_argument('--to-date', help='To date (YYYY-MM-DD)')
+    parser.add_argument('--links', action='store_true', help='Show only messages with links')
+    parser.add_argument('--mentions', action='store_true', help='Show only messages with mentions')
+    parser.add_argument('--media', action='store_true', help='Show only messages with media')
+    parser.add_argument('--limit', type=int, default=50, help='Max results')
+    parser.add_argument('--fuzzy', action='store_true', help='Use fuzzy search')
+    parser.add_argument('--threshold', type=float, default=0.3, help='Fuzzy match threshold')
+    parser.add_argument('--thread', type=int, help='Show thread for message ID')
+    parser.add_argument('--list-links', action='store_true', help='List all extracted links')
+    parser.add_argument('--list-mentions', action='store_true', help='List all mentions')
+    parser.add_argument('--autocomplete', help='Autocomplete username')
+    parser.add_argument('--cache-stats', action='store_true', help='Show cache statistics')
+    args = parser.parse_args()
+    with TelegramSearch(args.db) as search:
+        # Show thread
+        if args.thread:
+            print(f"Thread containing message {args.thread}:\n")
+            thread = search.get_thread_with_depth(args.thread)
+            for msg, depth in thread:
+                print(format_result(msg, show_depth=True, depth=depth))
+            return
+        # Autocomplete
+        if args.autocomplete:
+            suggestions = search.autocomplete_user(args.autocomplete)
+            print(f"Suggestions for '{args.autocomplete}':")
+            for s in suggestions:
+                print(f"  {s}")
+            return
+        # List links
+        if args.list_links:
+            links = search.get_links(args.limit)
+            print(f"Found {len(links)} links:\n")
+            for link in links:
+                print(f"  {link['url']}")
+                print(f"    From: {link['from_name']} at {link['date']}")
+            return
+        # List mentions
+        if args.list_mentions:
+            mentions = search.get_mentions(limit=args.limit)
+            print(f"Found {len(mentions)} mentions:\n")
+            for m in mentions:
+                print(f"  {m['mention']} by {m['from_name']}")
+            return
+        # Cache stats
+        if args.cache_stats:
+            print(f"Cache stats: {search.cache_stats}")
+            return
+        if not args.query:
+            parser.print_help()
+            return
+        # Parse dates
+        from_ts = None
+        to_ts = None
+        if args.from_date:
+            from_ts = int(datetime.strptime(args.from_date, '%Y-%m-%d').timestamp())
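+        if args.to_date:
+            # strptime yields midnight at the start of the day; extend to
+            # 23:59:59 so messages sent on the end date itself are included
+            to_ts = int(datetime.strptime(args.to_date, '%Y-%m-%d').timestamp()) + 86399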
+        # Fuzzy or regular search
+        if args.fuzzy:
+            results = search.fuzzy_search(
+                query=args.query,
+                threshold=args.threshold,
+                limit=args.limit
+            )
+            print(f"Found {len(results)} fuzzy matches for '{args.query}':\n")
+        else:
+            results = search.search(
+                query=args.query,
+                user_id=args.user,
+                from_date=from_ts,
+                to_date=to_ts,
+                has_links=True if args.links else None,
+                has_mentions=True if args.mentions else None,
+                has_media=True if args.media else None,
+                limit=args.limit
+            )
+            print(f"Found {len(results)} results for '{args.query}':\n")
+        for msg in results:
+            print(format_result(msg))
+            print()
+        # Show cache stats
+        print(f"\nCache: {search.cache_stats}")
+if __name__ == '__main__':
+    main()
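
Reviewer note on the thread traversal methods in search.py (`get_thread_dfs`, `get_thread_bfs`, `get_thread_with_depth`): `ReplyGraph` is defined in data_structures.py, which is not part of this hunk, so its internals are not visible here. The sketch below only illustrates how the two traversal orders differ over a parent/child map. `TinyReplyGraph` and its method bodies are assumptions made for illustration, not the author's implementation.

```python
from collections import defaultdict, deque


class TinyReplyGraph:
    """Hypothetical stand-in for ReplyGraph, keyed by message ID."""

    def __init__(self):
        self.parent = {}                   # message_id -> reply_to id (or None)
        self.children = defaultdict(list)  # message_id -> list of reply ids

    def add_message(self, msg_id, reply_to=None):
        self.parent[msg_id] = reply_to
        if reply_to is not None:
            self.children[reply_to].append(msg_id)

    def get_thread_root(self, msg_id):
        # Walk up until the parent is missing, unknown, or already visited.
        seen = {msg_id}
        while True:
            parent = self.parent.get(msg_id)
            if parent is None or parent in seen or parent not in self.parent:
                return msg_id
            seen.add(parent)
            msg_id = parent

    def dfs_descendants(self, root_id):
        # Iterative pre-order DFS: follows one reply chain to its end first.
        order, stack, seen = [], [root_id], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            # Reverse so the earliest reply is popped (visited) first.
            stack.extend(reversed(self.children.get(node, [])))
        return order

    def bfs_with_depth(self, root_id):
        # Level-by-level traversal that records each node's depth.
        order, queue, seen = [], deque([(root_id, 0)]), {root_id}
        while queue:
            node, depth = queue.popleft()
            order.append((node, depth))
            for child in self.children.get(node, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
        return order


graph = TinyReplyGraph()
for msg_id, reply_to in [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)]:
    graph.add_message(msg_id, reply_to)

root = graph.get_thread_root(5)
dfs_order = graph.dfs_descendants(root)
bfs_order = graph.bfs_with_depth(root)
```

With this toy data the DFS order keeps the chain 1, 2, 4, 5 together before visiting message 3, while the BFS order lists both direct replies to message 1 before going deeper. That is the difference the `--thread` output indentation depends on.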

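Reviewer note on `_fetch_messages_ordered`: it binds one `?` per ID in a single `IN (...)` clause. SQLite limits the number of bound parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`, 999 by default before SQLite 3.32.0 and 32766 from that version on), so a very large thread could hit that limit on older builds. One possible way around it is to fetch in batches, sketched below. The helper name `fetch_messages_chunked`, the `chunk_size` default, and the two-column demo schema are assumptions for illustration only.

```python
import sqlite3


def fetch_messages_chunked(conn, msg_ids, chunk_size=500):
    """Fetch rows for msg_ids in batches, returned in the order of msg_ids."""
    msg_map = {}
    for start in range(0, len(msg_ids), chunk_size):
        chunk = msg_ids[start:start + chunk_size]
        placeholders = ','.join('?' * len(chunk))
        cursor = conn.execute(
            f'SELECT * FROM messages WHERE id IN ({placeholders})',
            chunk,
        )
        for row in cursor:
            msg_map[row['id']] = dict(row)
    # IDs without a matching row are skipped, as in the original method.
    return [msg_map[mid] for mid in msg_ids if mid in msg_map]


# Demo on an in-memory database with a made-up schema.
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
conn.execute('CREATE TABLE messages (id INTEGER PRIMARY KEY, text_plain TEXT)')
conn.executemany(
    'INSERT INTO messages VALUES (?, ?)',
    [(i, f'message {i}') for i in range(1, 2001)],
)

wanted = list(range(2000, 0, -1)) + [99999]  # reversed order plus a missing ID
rows = fetch_messages_chunked(conn, wanted)
```

The lookup dict is filled across all batches before the final list is built, so the caller's ordering survives even though each batch comes back in whatever order SQLite chooses.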
semantic_search.py ADDED

	@@ -0,0 +1,411 @@

+"""
+Semantic Search using pre-computed embeddings from Colab.
+Lightweight - only needs sentence-transformers for query encoding.
+"""
+import sqlite3
+import numpy as np
+from typing import List, Dict, Any, Optional
+# Try importing sentence-transformers
+try:
+    from sentence_transformers import SentenceTransformer
+    HAS_TRANSFORMERS = True
+except ImportError:
+    HAS_TRANSFORMERS = False
+    SentenceTransformer = None
+class SemanticSearch:
+    """
+    Semantic search using pre-computed embeddings.
+    The embeddings.db file is created by running the Colab notebook.
+    This class just loads and searches them.
+    """
+    def __init__(self, embeddings_db: str = 'embeddings.db', messages_db: str = 'telegram.db'):
+        self.embeddings_db = embeddings_db
+        self.messages_db = messages_db
+        self.model = None
+        self.embeddings_loaded = False
+        self.embeddings = []
+        self.message_ids = []
+        self.from_names = []
+        self.text_previews = []
+    def _load_model(self):
+        """Load the embedding model (same one used in Colab)."""
+        if not HAS_TRANSFORMERS:
+            raise RuntimeError(
+                "sentence-transformers not installed.\n"
+                "Install with: pip install sentence-transformers"
+            )
+        if self.model is None:
+            print("Loading embedding model...")
+            self.model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
+            print("Model loaded!")
+    def reload_embeddings(self):
+        """Force reload embeddings from DB (e.g., after daily sync adds new ones)."""
+        self.embeddings_loaded = False
+        self.embeddings = np.array([]).reshape(0, 0)
+        self.message_ids = []
+        self.from_names = []
+        self.text_previews = []
+        self._load_embeddings()
+    def _load_embeddings(self):
+        """Load all embeddings into memory for fast search."""
+        if self.embeddings_loaded:
+            return
+        import os
+        if not os.path.exists(self.embeddings_db):
+            print(f"Embeddings DB not found: {self.embeddings_db}")
+            self.embeddings_loaded = True
+            self.embeddings = np.array([]).reshape(0, 0)
+            return
+        print(f"Loading embeddings from {self.embeddings_db}...")
+        conn = sqlite3.connect(self.embeddings_db)
+        cursor = conn.execute(
+            "SELECT message_id, from_name, text_preview, embedding FROM embeddings"
+        )
+        emb_list = []
+        for row in cursor:
+            msg_id, name, text, emb_blob = row
+            emb = np.frombuffer(emb_blob, dtype=np.float32)
+            self.message_ids.append(msg_id)
+            self.from_names.append(name or '')
+            self.text_previews.append(text or '')
+            emb_list.append(emb)
+        conn.close()
+        if len(emb_list) == 0:
+            print("No embeddings found in database")
+            self.embeddings = np.array([]).reshape(0, 0)
+            self.embeddings_loaded = True
+            return
+        # Stack into numpy array for fast computation
+        self.embeddings = np.vstack(emb_list)
+        # Normalize embeddings for cosine similarity
+        norms = np.linalg.norm(self.embeddings, axis=1, keepdims=True)
+        norms = np.where(norms == 0, 1, norms)  # Avoid division by zero
+        self.embeddings = self.embeddings / norms
+        self.embeddings_loaded = True
+        print(f"Loaded {len(self.message_ids)} embeddings")
+    def search(self, query: str, limit: int = 50, min_score: float = 0.3) -> List[Dict[str, Any]]:
+        """
+        Search for semantically similar messages.
+        Args:
+            query: The search query
+            limit: Max results to return
+            min_score: Minimum similarity score (0-1)
+        Returns:
+            List of dicts with message_id, from_name, text, score
+        """
+        self._load_model()
+        self._load_embeddings()
+        if len(self.message_ids) == 0:
+            return []
+        # Encode query
+        query_emb = self.model.encode([query], convert_to_numpy=True)[0]
+        # Compute cosine similarity with all embeddings.
+        # Stored embeddings are L2-normalized in _load_embeddings(), so once the
+        # query is normalized too, a dot product is the cosine similarity.
+        query_norm = query_emb / (np.linalg.norm(query_emb) or 1.0)
+        similarities = np.dot(self.embeddings, query_norm)
+        # Rank by descending similarity; only the top `limit` can be returned
+        top_indices = np.argsort(similarities)[::-1][:limit]
+        results = []
+        for idx in top_indices:
+            score = float(similarities[idx])
+            if score < min_score:
+                break  # Sorted descending, so every later score is lower too
+            results.append({
+                'message_id': int(self.message_ids[idx]),
+                'from_name': self.from_names[idx],
+                'text': self.text_previews[idx],
+                'score': score
+            })
+            if len(results) >= limit:
+                break
+        return results
+    def search_with_full_text(self, query: str, limit: int = 20) -> List[Dict[str, Any]]:
+        """
+        Search and return full message text from messages DB.
+        """
+        results = self.search(query, limit=limit)
+        if not results:
+            return []
+        # Get full text from messages DB
+        conn = sqlite3.connect(self.messages_db)
+        conn.row_factory = sqlite3.Row
+        for result in results:
+            cursor = conn.execute(
+                "SELECT date, from_name, text_plain, reply_to_message_id FROM messages WHERE id = ?",
+                (result['message_id'],)
+            )
+            row = cursor.fetchone()
+            if row:
+                result['date'] = row['date']
+                result['from_name'] = row['from_name']
+                result['text'] = row['text_plain']
+                result['reply_to_message_id'] = row['reply_to_message_id']
+        conn.close()
+        return results
+    def _add_thread_context(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+        """
+        Add FULL thread context to search results.
+        For each message, find the entire conversation thread:
+        1. Go up to find the root message
+        2. Get all messages in that thread
+        """
+        if not results:
+            return results
+        conn = sqlite3.connect(self.messages_db)
+        conn.row_factory = sqlite3.Row
+        all_messages = {r['message_id']: r for r in results}
+        thread_roots = set()
+        # Step 1: Find root messages by following reply chains UP
+        for result in results:
+            msg_id = result['message_id']
+            reply_to = result.get('reply_to_message_id')
+            # Follow the chain up to find the root
+            current_id = msg_id
+            current_reply_to = reply_to
+            visited = {current_id}
+            while current_reply_to and current_reply_to not in visited:
+                visited.add(current_reply_to)
+                cursor = conn.execute(
+                    "SELECT id, reply_to_message_id FROM messages WHERE id = ?",
+                    (current_reply_to,)
+                )
+                row = cursor.fetchone()
+                if row:
+                    current_id = row['id']
+                    current_reply_to = row['reply_to_message_id']
+                else:
+                    break
+            # current_id is now the root of this thread
+            thread_roots.add(current_id)
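+        # Step 2: Get the messages in these threads (recursively; bounded by
+        # max_depth levels and LIMIT 200 replies per level, so very large
+        # threads are truncated)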
+        def get_thread_messages(root_ids, depth=0, max_depth=10):
+            """Recursively get all messages in threads."""
+            if not root_ids or depth > max_depth:
+                return []
+            messages = []
+            # Get root messages themselves
+            if root_ids:
+                placeholders = ','.join('?' * len(root_ids))
+                cursor = conn.execute(f"""
+                    SELECT id, date, from_name, text_plain, reply_to_message_id
+                    FROM messages WHERE id IN ({placeholders})
+                """, list(root_ids))
+                for row in cursor:
+                    if row['id'] not in all_messages:
+                        messages.append({
+                            'message_id': row['id'],
+                            'date': row['date'],
+                            'from_name': row['from_name'],
+                            'text': row['text_plain'],
+                            'reply_to_message_id': row['reply_to_message_id'],
+                            'is_thread_context': True
+                        })
+                        all_messages[row['id']] = messages[-1]
+            # Get all replies to these messages
+            all_ids = set(root_ids) | set(all_messages.keys())
+            if all_ids:
+                placeholders = ','.join('?' * len(all_ids))
+                cursor = conn.execute(f"""
+                    SELECT id, date, from_name, text_plain, reply_to_message_id
+                    FROM messages WHERE reply_to_message_id IN ({placeholders})
+                    LIMIT 200
+                """, list(all_ids))
+                new_ids = set()
+                for row in cursor:
+                    if row['id'] not in all_messages:
+                        msg = {
+                            'message_id': row['id'],
+                            'date': row['date'],
+                            'from_name': row['from_name'],
+                            'text': row['text_plain'],
+                            'reply_to_message_id': row['reply_to_message_id'],
+                            'is_thread_context': True
+                        }
+                        messages.append(msg)
+                        all_messages[row['id']] = msg
+                        new_ids.add(row['id'])
+                # Recursively get replies to the new messages
+                if new_ids:
+                    messages.extend(get_thread_messages(new_ids, depth + 1, max_depth))
+            return messages
+        # Get all thread messages
+        get_thread_messages(thread_roots)
+        conn.close()
+        # Sort all messages by date
+        all_list = list(all_messages.values())
+        all_list.sort(key=lambda x: x.get('date', '') or '')
+        return all_list
+    def search_with_ai_answer(self, query: str, ai_engine, limit: int = 30) -> Dict[str, Any]:
+        """
+        Search semantically and send results to AI for reasoning.
+        This combines the power of:
+        1. Semantic search (finds relevant messages by meaning)
+        2. Thread context (includes replies to/from found messages)
+        3. AI reasoning (reads messages and answers the question)
+        """
+        results = self.search_with_full_text(query, limit=limit)
+        if not results:
+            return {
+                'query': query,
+                'answer': 'לא נמצאו הודעות רלוונטיות',  # Hebrew: "No relevant messages found"
+                'mode': 'semantic_ai',
+                'results': [],
+                'count': 0
+            }
+        # Get thread context for each result
+        results_with_threads = self._add_thread_context(results)
+        # Build context from semantic search results + threads
+        context_text = "\n".join([
+            f"[{r.get('date', '')}] {r.get('from_name', 'Unknown')}: {r.get('text', '')[:500]}"
+            for r in results_with_threads if r.get('text')
+        ])
+        # Send to AI for reasoning
+        reason_prompt = f"""You are analyzing a Telegram chat history to answer a question.
+The messages below were found using semantic search, along with their thread context (replies).
+Read them carefully and provide a comprehensive answer.
+Question: {query}
+Relevant messages and their threads:
+{context_text}
+Based on these messages, answer the question in Hebrew.
+If you can find the answer, provide it clearly.
+Pay special attention to reply chains - the answer might be in a reply!
+If you can infer information from context clues, do so.
+Cite specific messages when relevant.
+Answer:"""
+        try:
+            # Call the appropriate AI provider based on engine configuration
+            provider = getattr(ai_engine, 'provider', None)
+            if provider == 'gemini':
+                answer = ai_engine._call_gemini(reason_prompt)
+            elif provider == 'groq':
+                answer = ai_engine._call_groq(reason_prompt)
+            elif provider == 'ollama':
+                answer = ai_engine._call_ollama(reason_prompt)
+            else:
+                answer = "AI engine not available for reasoning"
+        except Exception as e:
+            answer = f"שגיאה ב-AI: {str(e)}"  # Hebrew: "AI error: <details>"
+        return {
+            'query': query,
+            'answer': answer,
+            'mode': 'semantic_ai',
+            'results': results,  # Original results for display
+            'count': len(results),
+            'total_with_threads': len(results_with_threads)
+        }
+    def is_available(self) -> bool:
+        """Check if semantic search is available (DB exists and has embeddings)."""
+        import os
+        if not HAS_TRANSFORMERS or not os.path.exists(self.embeddings_db):
+            return False
+        try:
+            conn = sqlite3.connect(self.embeddings_db)
+            count = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
+            conn.close()
+            return count > 0
+        except Exception:
+            return False
+    def stats(self) -> Dict[str, Any]:
+        """Get statistics about the embeddings."""
+        import os
+        if not os.path.exists(self.embeddings_db):
+            return {'available': False, 'error': 'embeddings.db not found'}
+        conn = sqlite3.connect(self.embeddings_db)
+        cursor = conn.execute("SELECT COUNT(*) FROM embeddings")
+        count = cursor.fetchone()[0]
+        conn.close()
+        size_mb = os.path.getsize(self.embeddings_db) / (1024 * 1024)
+        return {
+            'available': True,
+            'count': count,
+            'size_mb': round(size_mb, 1),
+            'model': 'paraphrase-multilingual-MiniLM-L12-v2'
+        }
+# Singleton instance
+_search_instance = None
+def get_semantic_search() -> SemanticSearch:
+    """Get or create semantic search instance."""
+    global _search_instance
+    if _search_instance is None:
+        _search_instance = SemanticSearch()
+    return _search_instance
+if __name__ == '__main__':
+    # Test
+    ss = SemanticSearch()
+    print("Stats:", ss.stats())
+    if ss.is_available():
+        # Sample query in Hebrew: "Where do you work?"
+        results = ss.search("איפה אתה עובד?", limit=5)
+        print("\nResults for 'איפה אתה עובד?':")
+        for r in results:
+            print(f"  [{r['score']:.3f}] {r['from_name']}: {r['text'][:60]}...")
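
Reviewer note on `_add_thread_context`: the method issues one query per hop while walking up to the root, then one query per level while collecting replies. SQLite supports recursive common table expressions (`WITH RECURSIVE`), so the same two steps could possibly be expressed as one query each. This is a rough sketch under assumptions, not a drop-in replacement: `find_root`, `thread_messages`, the hop cap of 100, and the three-column demo schema are made up here, and the sketch does not reproduce the `LIMIT 200` or the `is_thread_context` flag of the original.

```python
import sqlite3

# Walk UP the reply chain; the row with the most hops is the thread root.
ROOT_SQL = """
WITH RECURSIVE chain(id, reply_to_message_id, hops) AS (
    SELECT id, reply_to_message_id, 0 FROM messages WHERE id = ?
    UNION ALL
    SELECT m.id, m.reply_to_message_id, chain.hops + 1
    FROM messages m JOIN chain ON m.id = chain.reply_to_message_id
    WHERE chain.hops < 100  -- cap guards against reply cycles
)
SELECT id FROM chain ORDER BY hops DESC LIMIT 1
"""

# Walk DOWN from the root, collecting replies up to a maximum depth.
THREAD_SQL = """
WITH RECURSIVE thread(id, depth) AS (
    SELECT id, 0 FROM messages WHERE id = ?
    UNION ALL
    SELECT m.id, thread.depth + 1
    FROM messages m JOIN thread ON m.reply_to_message_id = thread.id
    WHERE thread.depth < ?
)
SELECT m.id, m.text_plain, thread.depth
FROM thread JOIN messages m ON m.id = thread.id
ORDER BY m.id
"""


def find_root(conn, message_id):
    row = conn.execute(ROOT_SQL, (message_id,)).fetchone()
    return row[0] if row else None


def thread_messages(conn, root_id, max_depth=10):
    return conn.execute(THREAD_SQL, (root_id, max_depth)).fetchall()


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE messages (id INTEGER PRIMARY KEY, text_plain TEXT, '
    'reply_to_message_id INTEGER)'
)
conn.executemany(
    'INSERT INTO messages VALUES (?, ?, ?)',
    [
        (1, 'question', None),
        (2, 'first answer', 1),
        (3, 'follow-up', 2),
        (4, 'unrelated', None),
        (5, 'second answer', 1),
    ],
)

root_id = find_root(conn, 3)
thread = thread_messages(conn, root_id)
```

An index on `reply_to_message_id` would matter for the downward walk on a large table; whether schema.sql already defines one is not visible in this part of the diff.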

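Reviewer note on `SemanticSearch.search`: the ranking relies on the fact that the dot product of two unit-length vectors equals their cosine similarity, which is why the embeddings are normalized once at load time. The dependency-free sketch below walks through that step on toy 3-dimensional vectors. The real code does this with NumPy matrix operations; `normalize`, `top_k`, and the sample vectors are invented for illustration.

```python
import math


def normalize(vec):
    """Scale vec to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, embeddings, k=3, min_score=0.3):
    """Return (index, score) pairs sorted by descending cosine similarity."""
    q = normalize(query)
    scored = []
    for idx, emb in enumerate(embeddings):
        e = normalize(emb)
        score = sum(a * b for a, b in zip(q, e))  # dot product of unit vectors
        if score >= min_score:
            scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


embeddings = [
    [1.0, 0.0, 0.0],   # same direction as the query: cosine 1
    [0.0, 1.0, 0.0],   # orthogonal: cosine 0, dropped by min_score
    [1.0, 1.0, 0.0],   # 45 degrees away: cosine about 0.707
    [-1.0, 0.0, 0.0],  # opposite direction: cosine -1, dropped
]
hits = top_k([2.0, 0.0, 0.0], embeddings)
```

The query has length 2 but still scores 1.0 against the first vector, which shows that only direction matters once both sides are normalized.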
static/css/style.css ADDED

	@@ -0,0 +1,859 @@

+/* ==========================================
+   TELEGRAM ANALYTICS DASHBOARD - CSS
+   ========================================== */
+:root {
+    /* Colors */
+    --primary: #0088cc;
+    --primary-dark: #006699;
+    --primary-light: #33a3d9;
+    --secondary: #6c757d;
+    --success: #28a745;
+    --warning: #ffc107;
+    --danger: #dc3545;
+    --info: #17a2b8;
+    /* Dark theme */
+    --bg-dark: #1a1a2e;
+    --bg-card: #16213e;
+    --bg-sidebar: #0f0f23;
+    --text-primary: #ffffff;
+    --text-secondary: #a0aec0;
+    --text-muted: #718096;
+    --border-color: #2d3748;
+    /* Spacing */
+    --sidebar-width: 250px;
+    --header-height: 70px;
+    --spacing-xs: 0.25rem;
+    --spacing-sm: 0.5rem;
+    --spacing-md: 1rem;
+    --spacing-lg: 1.5rem;
+    --spacing-xl: 2rem;
+    /* Border radius */
+    --radius-sm: 4px;
+    --radius-md: 8px;
+    --radius-lg: 12px;
+}
+/* Reset */
+* {
+    margin: 0;
+    padding: 0;
+    box-sizing: border-box;
+}
+body {
+    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, sans-serif;
+    background: var(--bg-dark);
+    color: var(--text-primary);
+    min-height: 100vh;
+    display: flex;
+}
+/* ==========================================
+   SIDEBAR
+   ========================================== */
+.sidebar {
+    width: var(--sidebar-width);
+    background: var(--bg-sidebar);
+    height: 100vh;
+    position: fixed;
+    left: 0;
+    top: 0;
+    display: flex;
+    flex-direction: column;
+    border-right: 1px solid var(--border-color);
+    z-index: 100;
+}
+.logo {
+    padding: var(--spacing-lg);
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-md);
+    border-bottom: 1px solid var(--border-color);
+}
+.logo-icon {
+    font-size: 2rem;
+}
+.logo-text {
+    font-size: 1.25rem;
+    font-weight: 700;
+    color: var(--primary);
+}
+.nav-menu {
+    list-style: none;
+    padding: var(--spacing-md);
+    flex: 1;
+}
+.nav-item {
+    margin-bottom: var(--spacing-xs);
+}
+.nav-link {
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-md);
+    padding: var(--spacing-md);
+    color: var(--text-secondary);
+    text-decoration: none;
+    border-radius: var(--radius-md);
+    transition: all 0.2s ease;
+}
+.nav-link:hover {
+    background: var(--bg-card);
+    color: var(--text-primary);
+}
+.nav-item.active .nav-link {
+    background: var(--primary);
+    color: white;
+}
+.nav-link .icon {
+    font-size: 1.25rem;
+}
+.sidebar-footer {
+    padding: var(--spacing-md);
+    border-top: 1px solid var(--border-color);
+}
+.export-buttons {
+    display: flex;
+    flex-direction: column;
+    gap: var(--spacing-sm);
+}
+/* ==========================================
+   MAIN CONTENT
+   ========================================== */
+.main-content {
+    margin-left: var(--sidebar-width);
+    flex: 1;
+    padding: var(--spacing-lg);
+    max-width: calc(100vw - var(--sidebar-width));
+}
+/* ==========================================
+   HEADER
+   ========================================== */
+.header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    margin-bottom: var(--spacing-xl);
+    padding-bottom: var(--spacing-lg);
+    border-bottom: 1px solid var(--border-color);
+}
+.header h1 {
+    font-size: 1.75rem;
+    font-weight: 600;
+}
+.header-controls {
+    display: flex;
+    gap: var(--spacing-md);
+    align-items: center;
+}
+/* ==========================================
+   BUTTONS & INPUTS
+   ========================================== */
+.btn {
+    padding: var(--spacing-sm) var(--spacing-md);
+    border-radius: var(--radius-md);
+    cursor: pointer;
+    font-size: 0.875rem;
+    font-weight: 500;
+    transition: all 0.2s ease;
+    display: inline-flex;
+    align-items: center;
+    gap: var(--spacing-sm);
+    background: var(--bg-card);
+    color: var(--text-primary);
+    border: 1px solid var(--border-color);
+}
+.btn:hover {
+    background: var(--border-color);
+}
+.btn-primary {
+    background: var(--primary);
+    color: white;
+    border: none;
+}
+.btn-primary:hover {
+    background: var(--primary-dark);
+}
+.btn-sm {
+    padding: var(--spacing-xs) var(--spacing-sm);
+    font-size: 0.75rem;
+}
+.select, .select-sm {
+    padding: var(--spacing-sm) var(--spacing-md);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-md);
+    background: var(--bg-card);
+    color: var(--text-primary);
+    font-size: 0.875rem;
+    cursor: pointer;
+}
+.select-sm {
+    padding: var(--spacing-xs) var(--spacing-sm);
+    font-size: 0.75rem;
+}
+.select:focus, .select-sm:focus {
+    outline: none;
+    border-color: var(--primary);
+}
+input[type="text"], input[type="search"] {
+    padding: var(--spacing-sm) var(--spacing-md);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-md);
+    background: var(--bg-card);
+    color: var(--text-primary);
+    font-size: 0.875rem;
+    width: 100%;
+}
+input:focus {
+    outline: none;
+    border-color: var(--primary);
+}
+/* ==========================================
+   STATS CARDS
+   ========================================== */
+.stats-grid {
+    display: grid;
+    grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
+    gap: var(--spacing-md);
+    margin-bottom: var(--spacing-xl);
+}
+.stat-card {
+    background: var(--bg-card);
+    border-radius: var(--radius-lg);
+    padding: var(--spacing-lg);
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-md);
+    border: 1px solid var(--border-color);
+    transition: transform 0.2s ease, box-shadow 0.2s ease;
+}
+.stat-card:hover {
+    transform: translateY(-2px);
+    box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
+}
+.stat-icon {
+    font-size: 2.5rem;
+    opacity: 0.9;
+}
+.stat-value {
+    font-size: 1.75rem;
+    font-weight: 700;
+    color: var(--text-primary);
+}
+.stat-label {
+    font-size: 0.875rem;
+    color: var(--text-muted);
+    margin-top: var(--spacing-xs);
+}
+/* ==========================================
+   CHARTS
+   ========================================== */
+.charts-row {
+    display: grid;
+    grid-template-columns: repeat(2, 1fr);
+    gap: var(--spacing-lg);
+    margin-bottom: var(--spacing-xl);
+}
+.chart-card {
+    background: var(--bg-card);
+    border-radius: var(--radius-lg);
+    padding: var(--spacing-lg);
+    border: 1px solid var(--border-color);
+}
+.chart-card.large {
+    grid-column: span 1;
+}
+.chart-card.full-width {
+    grid-column: span 2;
+}
+.chart-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    margin-bottom: var(--spacing-md);
+}
+.chart-header h3 {
+    font-size: 1rem;
+    font-weight: 600;
+    color: var(--text-primary);
+}
+.chart-subtitle {
+    font-size: 0.75rem;
+    color: var(--text-muted);
+}
+.chart-container {
+    position: relative;
+    height: 250px;
+}
+/* ==========================================
+   HEATMAP
+   ========================================== */
+.heatmap-container {
+    overflow-x: auto;
+}
+.heatmap-table {
+    width: 100%;
+    border-collapse: collapse;
+    font-size: 0.75rem;
+}
+.heatmap-table th,
+.heatmap-table td {
+    padding: var(--spacing-xs);
+    text-align: center;
+    min-width: 35px;
+}
+.heatmap-table th {
+    color: var(--text-muted);
+    font-weight: 500;
+}
+.heatmap-cell {
+    width: 30px;
+    height: 30px;
+    border-radius: var(--radius-sm);
+    display: inline-block;
+    transition: transform 0.2s ease;
+}
+.heatmap-cell:hover {
+    transform: scale(1.2);
+}
+.day-label {
+    text-align: right;
+    padding-right: var(--spacing-sm) !important;
+    color: var(--text-secondary);
+}
+/* ==========================================
+   LISTS
+   ========================================== */
+.lists-row {
+    display: grid;
+    grid-template-columns: repeat(3, 1fr);
+    gap: var(--spacing-lg);
+    margin-bottom: var(--spacing-xl);
+}
+.list-card {
+    background: var(--bg-card);
+    border-radius: var(--radius-lg);
+    border: 1px solid var(--border-color);
+    overflow: hidden;
+}
+.list-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    padding: var(--spacing-md) var(--spacing-lg);
+    border-bottom: 1px solid var(--border-color);
+}
+.list-header h3 {
+    font-size: 1rem;
+    font-weight: 600;
+}
+.link {
+    color: var(--primary);
+    text-decoration: none;
+    font-size: 0.875rem;
+}
+.link:hover {
+    text-decoration: underline;
+}
+.list-content {
+    max-height: 350px;
+    overflow-y: auto;
+}
+.list-item {
+    display: flex;
+    align-items: center;
+    padding: var(--spacing-sm) var(--spacing-lg);
+    border-bottom: 1px solid var(--border-color);
+    gap: var(--spacing-md);
+}
+.list-item:last-child {
+    border-bottom: none;
+}
+.list-item:hover {
+    background: rgba(255, 255, 255, 0.02);
+}
+.list-rank {
+    font-weight: 700;
+    color: var(--text-muted);
+    min-width: 30px;
+}
+.list-rank.gold { color: #ffd700; }
+.list-rank.silver { color: #c0c0c0; }
+.list-rank.bronze { color: #cd7f32; }
+.list-info {
+    flex: 1;
+    min-width: 0;
+}
+.list-name {
+    font-weight: 500;
+    white-space: nowrap;
+    overflow: hidden;
+    text-overflow: ellipsis;
+}
+.list-subtitle {
+    font-size: 0.75rem;
+    color: var(--text-muted);
+}
+.list-value {
+    font-weight: 600;
+    color: var(--primary);
+}
+/* ==========================================
+   USERS PAGE
+   ========================================== */
+.users-table {
+    width: 100%;
+    border-collapse: collapse;
+}
+.users-table th,
+.users-table td {
+    padding: var(--spacing-md);
+    text-align: left;
+    border-bottom: 1px solid var(--border-color);
+}
+.users-table th {
+    background: var(--bg-sidebar);
+    font-weight: 600;
+    color: var(--text-secondary);
+    font-size: 0.875rem;
+    position: sticky;
+    top: 0;
+}
+.users-table tr:hover {
+    background: rgba(255, 255, 255, 0.02);
+}
+.user-avatar {
+    width: 36px;
+    height: 36px;
+    border-radius: 50%;
+    background: var(--primary);
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    font-weight: 700;
+    font-size: 0.875rem;
+}
+.user-cell {
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-md);
+}
+.progress-bar {
+    height: 6px;
+    background: var(--border-color);
+    border-radius: 3px;
+    overflow: hidden;
+    margin-top: var(--spacing-xs);
+}
+.progress-fill {
+    height: 100%;
+    background: var(--primary);
+    border-radius: 3px;
+}
+/* ==========================================
+   SEARCH PAGE
+   ========================================== */
+.search-box {
+    display: flex;
+    gap: var(--spacing-md);
+    margin-bottom: var(--spacing-xl);
+}
+.search-input {
+    flex: 1;
+}
+.search-results {
+    background: var(--bg-card);
+    border-radius: var(--radius-lg);
+    border: 1px solid var(--border-color);
+}
+.search-result-item {
+    padding: var(--spacing-lg);
+    border-bottom: 1px solid var(--border-color);
+}
+.search-result-item:last-child {
+    border-bottom: none;
+}
+.search-result-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    margin-bottom: var(--spacing-sm);
+}
+.search-result-author {
+    font-weight: 600;
+    color: var(--primary);
+}
+.search-result-date {
+    font-size: 0.75rem;
+    color: var(--text-muted);
+}
+.search-result-text {
+    color: var(--text-secondary);
+    line-height: 1.5;
+}
+.search-highlight {
+    background: rgba(0, 136, 204, 0.3);
+    padding: 0 2px;
+    border-radius: 2px;
+}
+/* ==========================================
+   PAGINATION
+   ========================================== */
+.pagination {
+    display: flex;
+    justify-content: center;
+    gap: var(--spacing-sm);
+    margin-top: var(--spacing-xl);
+}
+.page-btn {
+    padding: var(--spacing-sm) var(--spacing-md);
+    background: var(--bg-card);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-md);
+    color: var(--text-primary);
+    cursor: pointer;
+    transition: all 0.2s ease;
+}
+.page-btn:hover {
+    background: var(--border-color);
+}
+.page-btn.active {
+    background: var(--primary);
+    border-color: var(--primary);
+}
+.page-btn:disabled {
+    opacity: 0.5;
+    cursor: not-allowed;
+}
+/* ==========================================
+   USER MODAL
+   ========================================== */
+.modal-overlay {
+    position: fixed;
+    top: 0;
+    left: 0;
+    right: 0;
+    bottom: 0;
+    background: rgba(0, 0, 0, 0.7);
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    z-index: 1000;
+    opacity: 0;
+    visibility: hidden;
+    transition: all 0.3s ease;
+}
+.modal-overlay.active {
+    opacity: 1;
+    visibility: visible;
+}
+.modal {
+    background: var(--bg-card);
+    border-radius: var(--radius-lg);
+    width: 90%;
+    max-width: 600px;
+    max-height: 80vh;
+    overflow-y: auto;
+    border: 1px solid var(--border-color);
+    transform: translateY(-20px);
+    transition: transform 0.3s ease;
+}
+.modal-overlay.active .modal {
+    transform: translateY(0);
+}
+.modal-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    padding: var(--spacing-lg);
+    border-bottom: 1px solid var(--border-color);
+}
+.modal-header h2 {
+    font-size: 1.25rem;
+}
+.modal-close {
+    background: none;
+    border: none;
+    font-size: 1.5rem;
+    color: var(--text-secondary);
+    cursor: pointer;
+}
+.modal-body {
+    padding: var(--spacing-lg);
+}
+.user-profile {
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-lg);
+    margin-bottom: var(--spacing-xl);
+}
+.user-profile-avatar {
+    width: 80px;
+    height: 80px;
+    border-radius: 50%;
+    background: var(--primary);
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    font-size: 2rem;
+    font-weight: 700;
+}
+.user-profile-info h3 {
+    font-size: 1.5rem;
+    margin-bottom: var(--spacing-xs);
+}
+.user-profile-info p {
+    color: var(--text-muted);
+}
+.user-stats-grid {
+    display: grid;
+    grid-template-columns: repeat(3, 1fr);
+    gap: var(--spacing-md);
+    margin-bottom: var(--spacing-xl);
+}
+.user-stat {
+    text-align: center;
+    padding: var(--spacing-md);
+    background: var(--bg-sidebar);
+    border-radius: var(--radius-md);
+}
+.user-stat-value {
+    font-size: 1.5rem;
+    font-weight: 700;
+    color: var(--primary);
+}
+.user-stat-label {
+    font-size: 0.75rem;
+    color: var(--text-muted);
+    margin-top: var(--spacing-xs);
+}
+/* ==========================================
+   LOADING & EMPTY STATES
+   ========================================== */
+.loading {
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    padding: var(--spacing-xl);
+    color: var(--text-muted);
+}
+.spinner {
+    width: 40px;
+    height: 40px;
+    border: 3px solid var(--border-color);
+    border-top-color: var(--primary);
+    border-radius: 50%;
+    animation: spin 1s linear infinite;
+}
+@keyframes spin {
+    to { transform: rotate(360deg); }
+}
+.empty-state {
+    text-align: center;
+    padding: var(--spacing-xl);
+    color: var(--text-muted);
+}
+.empty-state-icon {
+    font-size: 3rem;
+    margin-bottom: var(--spacing-md);
+    opacity: 0.5;
+}
+/* ==========================================
+   RESPONSIVE
+   ========================================== */
+@media (max-width: 1200px) {
+    .lists-row {
+        grid-template-columns: repeat(2, 1fr);
+    }
+}
+@media (max-width: 992px) {
+    .sidebar {
+        width: 70px;
+    }
+    .logo-text, .nav-link span:not(.icon) {
+        display: none;
+    }
+    .main-content {
+        margin-left: 70px;
+        max-width: calc(100vw - 70px);
+    }
+    .charts-row {
+        grid-template-columns: 1fr;
+    }
+    .chart-card.full-width,
+    .chart-card.large {
+        grid-column: span 1;
+    }
+    .lists-row {
+        grid-template-columns: 1fr;
+    }
+}
+@media (max-width: 768px) {
+    .stats-grid {
+        grid-template-columns: repeat(2, 1fr);
+    }
+    .header {
+        flex-direction: column;
+        gap: var(--spacing-md);
+        align-items: flex-start;
+    }
+    .user-stats-grid {
+        grid-template-columns: repeat(2, 1fr);
+    }
+}
+/* ==========================================
+   SCROLLBAR
+   ========================================== */
+::-webkit-scrollbar {
+    width: 8px;
+    height: 8px;
+}
+::-webkit-scrollbar-track {
+    background: var(--bg-sidebar);
+}
+::-webkit-scrollbar-thumb {
+    background: var(--border-color);
+    border-radius: 4px;
+}
+::-webkit-scrollbar-thumb:hover {
+    background: var(--text-muted);
+}

static/js/dashboard.js ADDED

	@@ -0,0 +1,622 @@

+/**
+ * Telegram Analytics Dashboard - JavaScript
+ *
+ * Handles all interactivity:
+ * - Data fetching from API
+ * - Chart rendering with Chart.js
+ * - Optional auto-refresh (polls the API every 60 seconds)
+ * - User interactions
+ * - Export functionality
+ */
+// ==========================================
+// GLOBAL STATE
+// ==========================================
+const state = {
+    timeframe: 'month',
+    charts: {},
+    autoRefresh: null,
+    currentPage: 1,
+    usersPerPage: 20
+};
+// Chart.js default configuration
+Chart.defaults.color = '#a0aec0';
+Chart.defaults.borderColor = '#2d3748';
+Chart.defaults.font.family = '-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif';
+// ==========================================
+// UTILITY FUNCTIONS
+// ==========================================
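+function formatNumber(num) {
+    // Missing or non-numeric API fields render as a dash instead of throwing
+    if (num === null || num === undefined || Number.isNaN(Number(num))) return '-';
+    if (num >= 1000000) return (num / 1000000).toFixed(1) + 'M';
+    if (num >= 1000) return (num / 1000).toFixed(1) + 'K';
+    return num.toLocaleString();
+}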
+function formatDate(timestamp) {
+    if (!timestamp) return '-';
+    return new Date(timestamp * 1000).toLocaleDateString('en-US', {
+        year: 'numeric',
+        month: 'short',
+        day: 'numeric'
+    });
+}
+function getTimeframe() {
+    const select = document.getElementById('timeframe');
+    return select ? select.value : state.timeframe;
+}
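+async function fetchAPI(endpoint) {
+    try {
+        const timeframe = encodeURIComponent(getTimeframe());
+        const separator = endpoint.includes('?') ? '&' : '?';
+        const response = await fetch(`${endpoint}${separator}timeframe=${timeframe}`);
+        // fetch() also resolves on HTTP errors, so treat non-2xx responses as failures
+        if (!response.ok) {
+            throw new Error(`HTTP ${response.status} for ${endpoint}`);
+        }
+        return await response.json();
+    } catch (error) {
+        console.error('API Error:', error);
+        return null;
+    }
+}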
+function showLoading(elementId) {
+    const element = document.getElementById(elementId);
+    if (element) {
+        element.innerHTML = '<div class="loading"><div class="spinner"></div></div>';
+    }
+}
+function showEmpty(elementId, message = 'No data available') {
+    const element = document.getElementById(elementId);
+    if (element) {
+        element.innerHTML = `
+            <div class="empty-state">
+                <div class="empty-state-icon">📭</div>
+                <p>${escapeHtml(message)}</p>
+            </div>
+        `;
+    }
+}
+// ==========================================
+// DATA LOADING
+// ==========================================
+async function loadAllData() {
+    state.timeframe = getTimeframe();
+    // Load all data in parallel
+    await Promise.all([
+        loadOverviewStats(),
+        loadMessagesChart(),
+        loadUsersChart(),
+        loadHourlyChart(),
+        loadDailyChart(),
+        loadHeatmap(),
+        loadTopUsers(),
+        loadTopWords(),
+        loadTopDomains()
+    ]);
+}
+async function loadOverviewStats() {
+    const data = await fetchAPI('/api/overview');
+    if (!data) return;
+    // Update stat cards; skip any card that is absent from the current page
+    const setStat = (id, value) => {
+        const element = document.getElementById(id);
+        if (element) element.textContent = formatNumber(value);
+    };
+    setStat('total-messages', data.total_messages);
+    setStat('active-users', data.active_users);
+    setStat('messages-per-day', data.messages_per_day);
+    setStat('links-count', data.links_count);
+    setStat('media-count', data.media_count);
+    setStat('replies-count', data.replies_count);
+}
+// ==========================================
+// CHARTS
+// ==========================================
+async function loadMessagesChart() {
+    const granularitySelect = document.getElementById('messages-granularity');
+    const granularity = granularitySelect ? granularitySelect.value : 'day';
+    const data = await fetchAPI(`/api/chart/messages?granularity=${granularity}`);
+    if (!data || data.length === 0) return;
+    const ctx = document.getElementById('messages-chart');
+    if (!ctx) return;
+    // Destroy existing chart
+    if (state.charts.messages) {
+        state.charts.messages.destroy();
+    }
+    state.charts.messages = new Chart(ctx, {
+        type: 'line',
+        data: {
+            labels: data.map(d => d.label),
+            datasets: [{
+                label: 'Messages',
+                data: data.map(d => d.value),
+                borderColor: '#0088cc',
+                backgroundColor: 'rgba(0, 136, 204, 0.1)',
+                fill: true,
+                tension: 0.4,
+                pointRadius: 2,
+                pointHoverRadius: 5
+            }]
+        },
+        options: {
+            responsive: true,
+            maintainAspectRatio: false,
+            plugins: {
+                legend: { display: false }
+            },
+            scales: {
+                x: {
+                    grid: { display: false },
+                    ticks: { maxTicksLimit: 10 }
+                },
+                y: {
+                    beginAtZero: true,
+                    grid: { color: '#2d3748' }
+                }
+            },
+            interaction: {
+                intersect: false,
+                mode: 'index'
+            }
+        }
+    });
+}
+async function loadUsersChart() {
+    const data = await fetchAPI('/api/chart/users?granularity=day');
+    if (!data || data.length === 0) return;
+    const ctx = document.getElementById('users-chart');
+    if (!ctx) return;
+    if (state.charts.users) {
+        state.charts.users.destroy();
+    }
+    state.charts.users = new Chart(ctx, {
+        type: 'line',
+        data: {
+            labels: data.map(d => d.label),
+            datasets: [{
+                label: 'Active Users',
+                data: data.map(d => d.value),
+                borderColor: '#28a745',
+                backgroundColor: 'rgba(40, 167, 69, 0.1)',
+                fill: true,
+                tension: 0.4,
+                pointRadius: 2,
+                pointHoverRadius: 5
+            }]
+        },
+        options: {
+            responsive: true,
+            maintainAspectRatio: false,
+            plugins: {
+                legend: { display: false }
+            },
+            scales: {
+                x: {
+                    grid: { display: false },
+                    ticks: { maxTicksLimit: 10 }
+                },
+                y: {
+                    beginAtZero: true,
+                    grid: { color: '#2d3748' }
+                }
+            }
+        }
+    });
+}
+async function loadHourlyChart() {
+    const data = await fetchAPI('/api/chart/hourly');
+    if (!data || data.length === 0) return;
+    const ctx = document.getElementById('hourly-chart');
+    if (!ctx) return;
+    if (state.charts.hourly) {
+        state.charts.hourly.destroy();
+    }
+    state.charts.hourly = new Chart(ctx, {
+        type: 'bar',
+        data: {
+            labels: data.map(d => d.label),
+            datasets: [{
+                label: 'Messages',
+                data: data.map(d => d.value),
+                backgroundColor: '#0088cc',
+                borderRadius: 4
+            }]
+        },
+        options: {
+            responsive: true,
+            maintainAspectRatio: false,
+            plugins: {
+                legend: { display: false }
+            },
+            scales: {
+                x: {
+                    grid: { display: false },
+                    ticks: { maxTicksLimit: 12 }
+                },
+                y: {
+                    beginAtZero: true,
+                    grid: { color: '#2d3748' }
+                }
+            }
+        }
+    });
+}
+async function loadDailyChart() {
+    const data = await fetchAPI('/api/chart/daily');
+    if (!data || data.length === 0) return;
+    const ctx = document.getElementById('daily-chart');
+    if (!ctx) return;
+    if (state.charts.daily) {
+        state.charts.daily.destroy();
+    }
+    const colors = [
+        '#dc3545', // Sunday - red
+        '#ffc107', // Monday - yellow
+        '#28a745', // Tuesday - green
+        '#17a2b8', // Wednesday - cyan
+        '#0088cc', // Thursday - blue
+        '#6f42c1', // Friday - purple
+        '#fd7e14'  // Saturday - orange
+    ];
+    state.charts.daily = new Chart(ctx, {
+        type: 'bar',
+        data: {
+            labels: data.map(d => d.label.substring(0, 3)),
+            datasets: [{
+                label: 'Messages',
+                data: data.map(d => d.value),
+                backgroundColor: colors,
+                borderRadius: 4
+            }]
+        },
+        options: {
+            responsive: true,
+            maintainAspectRatio: false,
+            plugins: {
+                legend: { display: false }
+            },
+            scales: {
+                x: {
+                    grid: { display: false }
+                },
+                y: {
+                    beginAtZero: true,
+                    grid: { color: '#2d3748' }
+                }
+            }
+        }
+    });
+}
+async function loadHeatmap() {
+    const data = await fetchAPI('/api/chart/heatmap');
+    if (!data || !data.data) return;
+    const container = document.getElementById('heatmap');
+    if (!container) return;
+    // Find max value for color scaling
+    const maxValue = Math.max(...data.data.flat());
+    // Generate color based on intensity
+    function getColor(value) {
+        if (value === 0) return 'rgba(0, 136, 204, 0.1)';
+        const intensity = value / maxValue;
+        return `rgba(0, 136, 204, ${0.2 + intensity * 0.8})`;
+    }
+    let html = '<table class="heatmap-table"><thead><tr><th></th>';
+    // Hour headers
+    for (let h = 0; h < 24; h++) {
+        html += `<th>${h}</th>`;
+    }
+    html += '</tr></thead><tbody>';
+    // Day rows
+    data.days.forEach((day, dayIndex) => {
+        html += `<tr><td class="day-label">${day.substring(0, 3)}</td>`;
+        for (let h = 0; h < 24; h++) {
+            const value = data.data[dayIndex][h];
+            const color = getColor(value);
+            html += `<td><div class="heatmap-cell" style="background: ${color}" title="${day} ${h}:00 - ${value} messages"></div></td>`;
+        }
+        html += '</tr>';
+    });
+    html += '</tbody></table>';
+    container.innerHTML = html;
+}
+// ==========================================
+// TOP LISTS
+// ==========================================
+async function loadTopUsers() {
+    const listElement = document.getElementById('top-users-list');
+    if (!listElement) return;
+    showLoading('top-users-list');
+    const data = await fetchAPI('/api/users?limit=10');
+    if (!data || !data.users || data.users.length === 0) {
+        showEmpty('top-users-list');
+        return;
+    }
+    let html = '';
+    data.users.forEach((user, index) => {
+        const rankClass = index === 0 ? 'gold' : index === 1 ? 'silver' : index === 2 ? 'bronze' : '';
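+        // Fall back to '?' for missing names and escape the initial before it reaches innerHTML
+        const initial = escapeHtml((user.name || '?').charAt(0).toUpperCase());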
+        html += `
+            <div class="list-item" onclick="window.location.href='/user/${user.user_id}'" style="cursor: pointer">
+                <div class="list-rank ${rankClass}">#${user.rank}</div>
+                <div class="user-avatar">${initial}</div>
+                <div class="list-info">
+                    <div class="list-name">${escapeHtml(user.name)}</div>
+                    <div class="list-subtitle">${user.percentage}% of total</div>
+                </div>
+                <div class="list-value">${formatNumber(user.messages)}</div>
+            </div>
+        `;
+    });
+    listElement.innerHTML = html;
+}
+async function loadTopWords() {
+    const listElement = document.getElementById('top-words-list');
+    if (!listElement) return;
+    showLoading('top-words-list');
+    const data = await fetchAPI('/api/top/words?limit=10');
+    if (!data || data.length === 0) {
+        showEmpty('top-words-list');
+        return;
+    }
+    const maxCount = data[0].count;
+    let html = '';
+    data.forEach((item, index) => {
+        const percentage = (item.count / maxCount * 100).toFixed(0);
+        html += `
+            <div class="list-item">
+                <div class="list-rank">#${index + 1}</div>
+                <div class="list-info">
+                    <div class="list-name">${escapeHtml(item.word)}</div>
+                    <div class="progress-bar">
+                        <div class="progress-fill" style="width: ${percentage}%"></div>
+                    </div>
+                </div>
+                <div class="list-value">${formatNumber(item.count)}</div>
+            </div>
+        `;
+    });
+    listElement.innerHTML = html;
+}
+async function loadTopDomains() {
+    const listElement = document.getElementById('top-domains-list');
+    if (!listElement) return;
+    showLoading('top-domains-list');
+    const data = await fetchAPI('/api/top/domains?limit=10');
+    if (!data || data.length === 0) {
+        showEmpty('top-domains-list');
+        return;
+    }
+    const maxCount = data[0].count;
+    let html = '';
+    data.forEach((item, index) => {
+        const percentage = (item.count / maxCount * 100).toFixed(0);
+        html += `
+            <div class="list-item">
+                <div class="list-rank">#${index + 1}</div>
+                <div class="list-info">
+                    <div class="list-name">${escapeHtml(item.domain)}</div>
+                    <div class="progress-bar">
+                        <div class="progress-fill" style="width: ${percentage}%"></div>
+                    </div>
+                </div>
+                <div class="list-value">${formatNumber(item.count)}</div>
+            </div>
+        `;
+    });
+    listElement.innerHTML = html;
+}
+// ==========================================
+// USER MODAL
+// ==========================================
+async function openUserModal(userId) {
+    // Create modal if it doesn't exist
+    let modal = document.getElementById('user-modal');
+    if (!modal) {
+        modal = document.createElement('div');
+        modal.id = 'user-modal';
+        modal.className = 'modal-overlay';
+        modal.innerHTML = `
+            <div class="modal">
+                <div class="modal-header">
+                    <h2>User Details</h2>
+                    <button class="modal-close" onclick="closeUserModal()">&times;</button>
+                </div>
+                <div class="modal-body" id="user-modal-content">
+                    <div class="loading"><div class="spinner"></div></div>
+                </div>
+            </div>
+        `;
+        document.body.appendChild(modal);
+        // Close on backdrop click
+        modal.addEventListener('click', (e) => {
+            if (e.target === modal) closeUserModal();
+        });
+    }
+    modal.classList.add('active');
+    document.getElementById('user-modal-content').innerHTML = '<div class="loading"><div class="spinner"></div></div>';
+    const data = await fetchAPI(`/api/user/${userId}`);
+    if (!data || data.error) {
+        document.getElementById('user-modal-content').innerHTML = '<div class="empty-state"><p>User not found</p></div>';
+        return;
+    }
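+    // Fall back to '?' for missing names and escape the initial before it reaches innerHTML
+    const initial = escapeHtml((data.name || '?').charAt(0).toUpperCase());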
+    document.getElementById('user-modal-content').innerHTML = `
+        <div class="user-profile">
+            <div class="user-profile-avatar">${initial}</div>
+            <div class="user-profile-info">
+                <h3>${escapeHtml(data.name)}</h3>
+                <p>Rank #${data.rank} • Member since ${formatDate(data.first_seen)}</p>
+            </div>
+        </div>
+        <div class="user-stats-grid">
+            <div class="user-stat">
+                <div class="user-stat-value">${formatNumber(data.messages)}</div>
+                <div class="user-stat-label">Messages</div>
+            </div>
+            <div class="user-stat">
+                <div class="user-stat-value">${formatNumber(data.characters)}</div>
+                <div class="user-stat-label">Characters</div>
+            </div>
+            <div class="user-stat">
+                <div class="user-stat-value">${data.daily_average}</div>
+                <div class="user-stat-label">Daily Avg</div>
+            </div>
+            <div class="user-stat">
+                <div class="user-stat-value">${formatNumber(data.links)}</div>
+                <div class="user-stat-label">Links</div>
+            </div>
+            <div class="user-stat">
+                <div class="user-stat-value">${formatNumber(data.media)}</div>
+                <div class="user-stat-label">Media</div>
+            </div>
+            <div class="user-stat">
+                <div class="user-stat-value">${data.active_days}</div>
+                <div class="user-stat-label">Active Days</div>
+            </div>
+        </div>
+        <h4 style="margin-bottom: 1rem;">Activity by Hour</h4>
+        <canvas id="user-hourly-chart" height="150"></canvas>
+    `;
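+    // Render user's hourly chart, destroying the previous instance so reopening the modal does not leak charts
+    const ctx = document.getElementById('user-hourly-chart');
+    if (state.charts.userHourly) {
+        state.charts.userHourly.destroy();
+    }
+    state.charts.userHourly = new Chart(ctx, {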
+        type: 'bar',
+        data: {
+            labels: Array.from({length: 24}, (_, i) => `${i}:00`),
+            datasets: [{
+                data: data.hourly_activity,
+                backgroundColor: '#0088cc',
+                borderRadius: 2
+            }]
+        },
+        options: {
+            responsive: true,
+            maintainAspectRatio: false,
+            plugins: { legend: { display: false } },
+            scales: {
+                x: { grid: { display: false }, ticks: { maxTicksLimit: 12 } },
+                y: { beginAtZero: true, grid: { color: '#2d3748' } }
+            }
+        }
+    });
+}
+function closeUserModal() {
+    const modal = document.getElementById('user-modal');
+    if (modal) modal.classList.remove('active');
+}
+// ==========================================
+// EXPORT FUNCTIONS
+// ==========================================
+function exportUsers() {
+    const timeframe = getTimeframe();
+    window.location.href = `/api/export/users?timeframe=${timeframe}`;
+}
+function exportMessages() {
+    const timeframe = getTimeframe();
+    window.location.href = `/api/export/messages?timeframe=${timeframe}`;
+}
+// ==========================================
+// AUTO REFRESH
+// ==========================================
+function toggleAutoRefresh() {
+    if (state.autoRefresh) {
+        clearInterval(state.autoRefresh);
+        state.autoRefresh = null;
+        console.log('Auto-refresh disabled');
+    } else {
+        state.autoRefresh = setInterval(loadAllData, 60000); // Refresh every minute
+        console.log('Auto-refresh enabled (60s)');
+    }
+}
+// ==========================================
+// UTILITY
+// ==========================================
+function escapeHtml(text) {
+    const div = document.createElement('div');
+    div.textContent = text;
+    return div.innerHTML;
+}
+// Keyboard shortcuts
+document.addEventListener('keydown', (e) => {
+    // Escape to close modal
+    if (e.key === 'Escape') {
+        closeUserModal();
+    }
+    // R to refresh (ignored while a form field has focus)
+    if (e.key === 'r' && !e.ctrlKey && !e.metaKey && !e.altKey) {
+        const activeElement = document.activeElement;
+        const tag = activeElement ? activeElement.tagName : '';
+        if (tag !== 'INPUT' && tag !== 'TEXTAREA' && tag !== 'SELECT') {
+            loadAllData();
+        }
+    }
+});
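
Note on `fetchAPI` in `static/js/dashboard.js`: it appends `timeframe` by picking `?` or `&` by hand. The sketch below shows one possible alternative built on the standard `URL` and `URLSearchParams` APIs, which percent-encode the value and overwrite a `timeframe` that is already present. The helper name `withTimeframe` and the placeholder base URL are assumptions for illustration only; neither is part of this commit.

```javascript
// Hypothetical helper (not in the commit): add or overwrite the `timeframe`
// query parameter on a relative API path.
function withTimeframe(endpoint, timeframe) {
    // A base is required to parse a relative path; it is discarded below.
    const url = new URL(endpoint, 'http://placeholder.invalid');
    // set() replaces an existing value instead of appending a duplicate.
    url.searchParams.set('timeframe', timeframe);
    return url.pathname + url.search;
}

console.log(withTimeframe('/api/overview', 'month')); // /api/overview?timeframe=month
console.log(withTimeframe('/api/users?limit=10', 'all time'));
```

Inside `fetchAPI`, the call would become `fetch(withTimeframe(endpoint, getTimeframe()))`, assuming every endpoint remains a same-origin relative path.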

templates/chat.html ADDED

	@@ -0,0 +1,831 @@

+<!DOCTYPE html>
+<html lang="he" dir="rtl">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Chat View - Telegram Style</title>
+    <style>
+        /* ===== Telegram-like Chat Viewer ===== */
+        :root {
+            --bg-primary: #0e1621;
+            --bg-secondary: #17212b;
+            --bg-message: #182533;
+            --bg-hover: #1e2c3a;
+            --bg-reply: rgba(77, 184, 255, 0.08);
+            --bg-forward: rgba(100, 191, 71, 0.08);
+            --text-primary: #f5f5f5;
+            --text-secondary: #8b9fad;
+            --text-link: #6ab2f2;
+            --accent-blue: #6ab2f2;
+            --accent-green: #6dc264;
+            --border-reply: #6ab2f2;
+            --border-forward: #6dc264;
+            --date-badge: #1b2a38;
+            --nav-bg: #17212b;
+            --nav-border: #0e1621;
+        }
+        * { box-sizing: border-box; margin: 0; padding: 0; }
+        body {
+            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
+            font-size: 14px;
+            line-height: 1.5;
+            background-color: var(--bg-primary);
+            color: var(--text-primary);
+        }
+        /* ===== Navigation ===== */
+        .nav-bar {
+            position: fixed;
+            top: 0; left: 0; right: 0;
+            z-index: 100;
+            background-color: var(--nav-bg);
+            border-bottom: 1px solid var(--nav-border);
+            padding: 0 16px;
+        }
+        .nav-content {
+            max-width: 800px;
+            margin: 0 auto;
+            display: flex;
+            align-items: center;
+            justify-content: space-between;
+            height: 56px;
+        }
+        .nav-title {
+            font-size: 18px;
+            font-weight: 700;
+            color: var(--text-primary);
+        }
+        .nav-links { display: flex; gap: 4px; }
+        .nav-links a {
+            color: var(--accent-blue);
+            text-decoration: none;
+            padding: 8px 14px;
+            border-radius: 8px;
+            font-size: 13px;
+            transition: background 0.15s;
+        }
+        .nav-links a:hover { background-color: var(--bg-hover); }
+        .nav-links a.active {
+            background-color: var(--accent-blue);
+            color: var(--bg-primary);
+        }
+        /* ===== Chat Area ===== */
+        .chat-wrap {
+            padding-top: 56px;
+            min-height: 100vh;
+        }
+        .chat-body {
+            max-width: 680px;
+            margin: 0 auto;
+            padding: 0 12px 80px;
+        }
+        .history { padding: 8px 0; }
+        /* ===== Load More ===== */
+        .load-more {
+            text-align: center;
+            padding: 16px;
+        }
+        .load-more button {
+            padding: 10px 24px;
+            background-color: var(--bg-secondary);
+            color: var(--accent-blue);
+            border: 1px solid rgba(106, 178, 242, 0.3);
+            border-radius: 20px;
+            cursor: pointer;
+            font-size: 14px;
+            transition: all 0.15s;
+        }
+        .load-more button:hover {
+            background-color: var(--bg-hover);
+            border-color: var(--accent-blue);
+        }
+        .load-more button:disabled { opacity: 0.4; cursor: not-allowed; }
+        /* ===== Date Separator ===== */
+        .date-separator {
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            padding: 12px 0;
+            position: sticky;
+            top: 60px;
+            z-index: 10;
+        }
+        .date-badge {
+            padding: 4px 12px;
+            background-color: var(--date-badge);
+            border-radius: 12px;
+            color: var(--text-secondary);
+            font-size: 13px;
+            font-weight: 500;
+            box-shadow: 0 1px 4px rgba(0,0,0,0.2);
+        }
+        /* ===== Message ===== */
+        .msg {
+            display: flex;
+            align-items: flex-start;
+            gap: 10px;
+            padding: 3px 8px;
+            border-radius: 8px;
+            transition: background 0.15s;
+        }
+        .msg:hover { background-color: var(--bg-hover); }
+        .msg.joined { padding-top: 1px; }
+        .msg.joined .avatar-wrap { visibility: hidden; height: 0; }
+        /* ===== Avatar ===== */
+        .avatar-wrap { flex-shrink: 0; padding-top: 2px; }
+        .avatar {
+            width: 40px;
+            height: 40px;
+            border-radius: 50%;
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            font-weight: 600;
+            font-size: 15px;
+            color: #fff;
+            cursor: pointer;
+        }
+        .avatar:hover { filter: brightness(1.15); }
+        /* 8 Telegram avatar colors */
+        .c1 { background: #ff5555; }
+        .c2 { background: #64bf47; }
+        .c3 { background: #ffab00; }
+        .c4 { background: #4f9cd9; }
+        .c5 { background: #9884e8; }
+        .c6 { background: #e671a5; }
+        .c7 { background: #47bcd1; }
+        .c8 { background: #ff8c44; }
+        /* Name colors to match avatars */
+        .name-c1 { color: #ff5555; }
+        .name-c2 { color: #64bf47; }
+        .name-c3 { color: #ffab00; }
+        .name-c4 { color: #4f9cd9; }
+        .name-c5 { color: #9884e8; }
+        .name-c6 { color: #e671a5; }
+        .name-c7 { color: #47bcd1; }
+        .name-c8 { color: #ff8c44; }
+        /* ===== Message Body ===== */
+        .msg-body {
+            flex: 1;
+            min-width: 0;
+        }
+        /* Header: name + time */
+        .msg-header {
+            display: flex;
+            align-items: baseline;
+            gap: 8px;
+            margin-bottom: 2px;
+        }
+        .msg-name {
+            font-weight: 600;
+            font-size: 14px;
+            cursor: pointer;
+        }
+        .msg-name:hover { text-decoration: underline; }
+        .msg-time {
+            color: var(--text-secondary);
+            font-size: 12px;
+            white-space: nowrap;
+        }
+        .msg-edited {
+            color: var(--text-secondary);
+            font-size: 11px;
+            font-style: italic;
+        }
+        /* ===== Reply Block ===== */
+        .reply-block {
+            display: flex;
+            gap: 0;
+            margin: 4px 0 6px;
+            padding: 6px 10px;
+            border-radius: 6px;
+            border-right: 3px solid var(--border-reply);
+            background: var(--bg-reply);
+            cursor: pointer;
+            overflow: hidden;
+            transition: background 0.15s;
+        }
+        .reply-block:hover { background: rgba(106, 178, 242, 0.15); }
+        .reply-content { min-width: 0; }
+        .reply-name {
+            font-weight: 600;
+            font-size: 13px;
+            color: var(--accent-blue);
+        }
+        .reply-text {
+            font-size: 13px;
+            color: var(--text-secondary);
+            white-space: nowrap;
+            overflow: hidden;
+            text-overflow: ellipsis;
+            max-width: 400px;
+        }
+        /* ===== Forward Block ===== */
+        .forward-block {
+            margin: 4px 0 6px;
+            padding: 6px 10px;
+            border-radius: 6px;
+            border-right: 3px solid var(--border-forward);
+            background: var(--bg-forward);
+        }
+        .forward-label {
+            font-size: 12px;
+            color: var(--text-secondary);
+        }
+        .forward-name {
+            font-weight: 600;
+            font-size: 13px;
+            color: var(--accent-green);
+        }
+        /* ===== Message Text ===== */
+        .msg-text {
+            word-wrap: break-word;
+            overflow-wrap: break-word;
+            line-height: 1.55;
+            unicode-bidi: plaintext;
+            text-align: start;
+            white-space: pre-wrap;
+        }
+        .msg-text a {
+            color: var(--text-link);
+            text-decoration: none;
+        }
+        .msg-text a:hover { text-decoration: underline; }
+        /* Mention */
+        .mention {
+            color: var(--accent-blue);
+            font-weight: 500;
+            cursor: pointer;
+        }
+        .mention:hover { text-decoration: underline; }
+        /* Hashtag */
+        .hashtag {
+            color: var(--accent-blue);
+            cursor: pointer;
+        }
+        /* Code */
+        .msg-text code {
+            font-family: 'Consolas', 'Monaco', 'Courier New', monospace;
+            background: rgba(255,255,255,0.06);
+            padding: 1px 5px;
+            border-radius: 4px;
+            font-size: 13px;
+        }
+        .msg-text pre {
+            background: rgba(0,0,0,0.3);
+            padding: 10px 12px;
+            border-radius: 8px;
+            margin: 6px 0;
+            overflow-x: auto;
+            font-family: 'Consolas', 'Monaco', 'Courier New', monospace;
+            font-size: 13px;
+            line-height: 1.4;
+        }
+        /* ===== Entities (links, media) ===== */
+        .entity-links {
+            margin-top: 6px;
+            display: flex;
+            flex-wrap: wrap;
+            gap: 6px;
+        }
+        .entity-link {
+            display: inline-flex;
+            align-items: center;
+            gap: 5px;
+            padding: 4px 10px;
+            background: rgba(106, 178, 242, 0.1);
+            border-radius: 8px;
+            font-size: 13px;
+            color: var(--text-link);
+            text-decoration: none;
+            max-width: 350px;
+            overflow: hidden;
+            text-overflow: ellipsis;
+            white-space: nowrap;
+            transition: background 0.15s;
+        }
+        .entity-link:hover {
+            background: rgba(106, 178, 242, 0.2);
+            text-decoration: none;
+        }
+        .entity-link .link-icon { font-size: 11px; }
+        .entity-link .link-domain {
+            opacity: 0.7;
+            font-size: 12px;
+        }
+        /* ===== Media Badge ===== */
+        .media-badge {
+            display: inline-flex;
+            align-items: center;
+            gap: 6px;
+            padding: 5px 10px;
+            background: var(--bg-secondary);
+            border-radius: 8px;
+            margin-top: 6px;
+            font-size: 13px;
+            color: var(--text-secondary);
+        }
+        .media-badge .media-icon { font-size: 14px; }
+        /* ===== Time for joined messages ===== */
+        .msg-time-inline {
+            color: var(--text-secondary);
+            font-size: 12px;
+            margin-top: 2px;
+            opacity: 0;
+            transition: opacity 0.15s;
+        }
+        .msg:hover .msg-time-inline { opacity: 1; }
+        /* ===== Selected (highlight on go-to) ===== */
+        .msg.selected {
+            background-color: rgba(106, 178, 242, 0.15);
+            transition: background-color 2s ease;
+        }
+        /* ===== Scroll-to-bottom ===== */
+        .scroll-btn {
+            position: fixed;
+            bottom: 24px;
+            left: 50%;
+            transform: translateX(-50%);
+            width: 44px;
+            height: 44px;
+            background: var(--bg-secondary);
+            color: var(--accent-blue);
+            border: 1px solid rgba(106, 178, 242, 0.3);
+            border-radius: 50%;
+            cursor: pointer;
+            font-size: 20px;
+            display: none;
+            align-items: center;
+            justify-content: center;
+            box-shadow: 0 2px 12px rgba(0,0,0,0.4);
+            z-index: 80;
+            transition: all 0.15s;
+        }
+        .scroll-btn.visible { display: flex; }
+        .scroll-btn:hover {
+            background: var(--accent-blue);
+            color: var(--bg-primary);
+        }
+        /* ===== Loading ===== */
+        .loading {
+            text-align: center;
+            padding: 24px;
+            color: var(--text-secondary);
+        }
+        .spinner {
+            display: inline-block;
+            width: 24px; height: 24px;
+            border: 3px solid var(--bg-secondary);
+            border-top-color: var(--accent-blue);
+            border-radius: 50%;
+            animation: spin 1s linear infinite;
+            margin-bottom: 8px;
+        }
+        @keyframes spin { to { transform: rotate(360deg); } }
+        /* ===== Toast ===== */
+        .toast {
+            position: fixed;
+            bottom: 80px;
+            left: 50%;
+            transform: translateX(-50%);
+            background: rgba(0,0,0,0.85);
+            color: #fff;
+            padding: 10px 24px;
+            border-radius: 20px;
+            z-index: 200;
+            opacity: 0;
+            transition: opacity 0.3s;
+            font-size: 13px;
+        }
+        .toast.visible { opacity: 1; }
+        /* ===== Responsive ===== */
+        @media (max-width: 700px) {
+            .nav-links a { padding: 6px 8px; font-size: 12px; }
+            .chat-body { padding: 0 4px 80px; }
+            .reply-text { max-width: 200px; }
+            .entity-link { max-width: 250px; }
+        }
+    </style>
+</head>
+<body>
+    <nav class="nav-bar">
+        <div class="nav-content">
+            <div class="nav-title">Chat View</div>
+            <div class="nav-links">
+                <a href="/">Overview</a>
+                <a href="/users">Users</a>
+                <a href="/chat" class="active">Chat</a>
+                <a href="/search">Search</a>
+                <a href="/moderation">Moderation</a>
+                <a href="/settings">Settings</a>
+            </div>
+        </div>
+    </nav>
+    <div class="chat-wrap">
+        <div class="chat-body">
+            <div class="history" id="history">
+                <div class="load-more" id="load-more-top">
+                    <button onclick="loadOlderMessages()" id="load-older-btn">&#8593; Load earlier messages</button>
+                </div>
+                <div id="messages-container"></div>
+                <div class="loading" id="loading">
+                    <div class="spinner"></div>
+                    <div>Loading messages...</div>
+                </div>
+            </div>
+        </div>
+    </div>
+    <button class="scroll-btn" id="scroll-bottom" onclick="scrollToBottom()">&#8595;</button>
+    <div class="toast" id="toast"></div>
+    <script>
+        // ===== State =====
+        let allMessages = [];
+        let oldestOffset = 0;
+        let totalMessages = 0;
+        let loading = false;
+        let initialLoad = true;
+        const BATCH_SIZE = 100;
+        const userColors = {};
+        // ===== Utilities =====
+        function getUserColor(userId) {
+            if (!userColors[userId]) {
+                let hash = 0;
+                const str = String(userId);
+                for (let i = 0; i < str.length; i++) {
+                    hash = str.charCodeAt(i) + ((hash << 5) - hash);
+                }
+                userColors[userId] = (Math.abs(hash) % 8) + 1;
+            }
+            return userColors[userId];
+        }
+        function getInitials(name) {
+            if (!name) return '?';
+            const parts = name.trim().split(/\s+/);
+            if (parts.length >= 2) return (parts[0][0] + parts[1][0]).toUpperCase();
+            return name.substring(0, 2).toUpperCase();
+        }
+        function formatDate(dateStr) {
+            if (!dateStr) return '';
+            const d = new Date(dateStr);
+            const months = ['January','February','March','April','May','June',
+                           'July','August','September','October','November','December'];
+            return `${months[d.getMonth()]} ${d.getDate()}, ${d.getFullYear()}`;
+        }
+        function formatTime(dateStr) {
+            if (!dateStr) return '';
+            const d = new Date(dateStr);
+            return d.toLocaleTimeString('en-US', { hour: '2-digit', minute: '2-digit', hour12: false });
+        }
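+        function escapeHtml(text) {
+            if (!text) return '';
+            const div = document.createElement('div');
+            div.textContent = text;
+            // innerHTML escapes &, < and > but leaves quotes alone; escape them too
+            // so the result is also safe inside quoted attribute values (e.g. href).
+            return div.innerHTML.replace(/"/g, '&quot;').replace(/'/g, '&apos;');
+        }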
+        function getDomain(url) {
+            try {
+                return new URL(url).hostname.replace(/^www\./, '');
+            } catch {
+                return url.substring(0, 30);
+            }
+        }
+        // ===== Text Formatting =====
+        function formatMessageText(text, entities) {
+            if (!text) return '';
+            const html = escapeHtml(text);
+            // Linkify URLs and highlight @mentions / #hashtags in a single pass, so a
+            // "@name" or "#fragment" inside a URL is never wrapped in a <span> (which
+            // would corrupt the href). Quotes are excluded from the URL match so the
+            // URL cannot break out of the href attribute.
+            const formatted = html.replace(
+                /(https?:\/\/[^\s<"']+)|@(\w{3,})|#(\w{2,})/g,
+                (match, url, mention) => {
+                    if (url) return `<a href="${url}" target="_blank" rel="noopener">${url}</a>`;
+                    if (mention) return `<span class="mention">${match}</span>`;
+                    return `<span class="hashtag">${match}</span>`;
+                }
+            );
+            // Convert newlines to <br>
+            return formatted.replace(/\n/g, '<br>');
+        }
+        // ===== Render Message =====
+        function renderMessage(msg, prevMsg) {
+            const frag = document.createDocumentFragment();
+            // Date separator
+            const msgDate = msg.date ? msg.date.split('T')[0] : '';
+            const prevDate = prevMsg && prevMsg.date ? prevMsg.date.split('T')[0] : '';
+            if (msgDate !== prevDate) {
+                const sep = document.createElement('div');
+                sep.className = 'date-separator';
+                sep.innerHTML = `<div class="date-badge">${formatDate(msg.date)}</div>`;
+                frag.appendChild(sep);
+            }
+            // Joined message? (same user, same day, within 5 minutes)
+            const isJoined = prevMsg &&
+                prevMsg.from_id === msg.from_id &&
+                msgDate === prevDate &&
+                !msg.forwarded_from &&
+                !prevMsg.forwarded_from &&
+                timeDiffMinutes(prevMsg.date, msg.date) < 5;
+            const colorNum = getUserColor(msg.from_id);
+            const el = document.createElement('div');
+            el.className = `msg${isJoined ? ' joined' : ''}`;
+            el.id = `message${msg.message_id || msg.id}`;
+            let html = '';
+            // Avatar
+            html += `<div class="avatar-wrap">
+                <div class="avatar c${colorNum}">${escapeHtml(getInitials(msg.from_name))}</div>
+            </div>`;
+            // Body
+            html += '<div class="msg-body">';
+            // Header (name + time) - only for first message in group
+            if (!isJoined) {
+                html += `<div class="msg-header">
+                    <span class="msg-name name-c${colorNum}">${escapeHtml(msg.from_name || 'Unknown')}</span>
+                    <span class="msg-time">${formatTime(msg.date)}</span>
+                    ${msg.is_edited ? '<span class="msg-edited">edited</span>' : ''}
+                </div>`;
+            }
+            // Forward block
+            if (msg.forwarded_from) {
+                html += `<div class="forward-block">
+                    <div class="forward-label">Forwarded message</div>
+                    <div class="forward-name">${escapeHtml(msg.forwarded_from)}</div>
+                </div>`;
+            }
+            // Reply block
+            if (msg.reply_to_message_id && msg.reply_to_name) {
+                html += `<div class="reply-block" onclick="goToMessage(${Number(msg.reply_to_message_id)})">
+                    <div class="reply-content">
+                        <div class="reply-name">${escapeHtml(msg.reply_to_name)}</div>
+                        <div class="reply-text">${escapeHtml(msg.reply_to_text || '')}</div>
+                    </div>
+                </div>`;
+            }
+            // Message text
+            if (msg.text) {
+                html += `<div class="msg-text">${formatMessageText(msg.text, msg.entities)}</div>`;
+            }
+            // Entity links (extracted from DB)
+            const links = (msg.entities || []).filter(e => e.type === 'link' || e.type === 'text_link');
+            if (links.length > 0) {
+                html += '<div class="entity-links">';
+                const seen = new Set();
+                for (const link of links) {
+                    const url = link.value;
+                    if (!url || seen.has(url)) continue;
+                    seen.add(url);
+                    // Skip if the link is already visible in the text
+                    if (msg.text && msg.text.includes(url)) continue;
+                    const domain = getDomain(url);
+                    html += `<a class="entity-link" href="${escapeHtml(url)}" target="_blank" rel="noopener">
+                        <span class="link-icon">🔗</span>
+                        <span class="link-domain">${escapeHtml(domain)}</span>
+                    </a>`;
+                }
+                html += '</div>';
+            }
+            // Media badge
+            if (msg.has_media) {
+                const icon = msg.has_photo ? '📷' : '📎';
+                const label = msg.has_photo ? 'Photo' : 'Media';
+                html += `<div class="media-badge"><span class="media-icon">${icon}</span> ${label}</div>`;
+            }
+            // Time for joined messages (shown on hover)
+            if (isJoined) {
+                html += `<div class="msg-time-inline">${formatTime(msg.date)}${msg.is_edited ? ' · edited' : ''}</div>`;
+            }
+            html += '</div>'; // close msg-body
+            el.innerHTML = html;
+            frag.appendChild(el);
+            return frag;
+        }
+        function timeDiffMinutes(dateStr1, dateStr2) {
+            if (!dateStr1 || !dateStr2) return 999;
+            return Math.abs(new Date(dateStr2) - new Date(dateStr1)) / 60000;
+        }
+        // ===== Render All =====
+        function renderAllMessages() {
+            const container = document.getElementById('messages-container');
+            container.innerHTML = '';
+            for (let i = 0; i < allMessages.length; i++) {
+                container.appendChild(renderMessage(allMessages[i], i > 0 ? allMessages[i-1] : null));
+            }
+        }
+        // ===== Load Messages =====
+        async function loadInitialMessages() {
+            if (loading) return;
+            loading = true;
+            document.getElementById('loading').style.display = 'block';
+            try {
+                const countRes = await fetch('/api/chat/messages?limit=1&offset=0');
+                const countData = await countRes.json();
+                totalMessages = countData.total || 0;
+                if (totalMessages === 0) {
+                    document.getElementById('loading').style.display = 'none';
+                    document.getElementById('messages-container').innerHTML =
+                        '<div class="date-separator"><div class="date-badge">No messages found</div></div>';
+                    loading = false;
+                    return;
+                }
+                const startOffset = Math.max(0, totalMessages - BATCH_SIZE);
+                oldestOffset = startOffset;
+                const res = await fetch(`/api/chat/messages?limit=${BATCH_SIZE}&offset=${startOffset}`);
+                const data = await res.json();
+                if (data.messages && data.messages.length > 0) {
+                    allMessages = data.messages;
+                    renderAllMessages();
+                    setTimeout(() => { scrollToBottom(); initialLoad = false; }, 100);
+                    if (oldestOffset <= 0) {
+                        document.getElementById('load-more-top').style.display = 'none';
+                    }
+                }
+            } catch (e) {
+                console.error('Error loading messages:', e);
+                showToast('Error loading messages');
+            }
+            loading = false;
+            document.getElementById('loading').style.display = 'none';
+        }
+        async function loadOlderMessages() {
+            if (loading || oldestOffset <= 0) return;
+            loading = true;
+            document.getElementById('load-older-btn').disabled = true;
+            try {
+                const newOffset = Math.max(0, oldestOffset - BATCH_SIZE);
+                const limit = oldestOffset - newOffset;
+                const res = await fetch(`/api/chat/messages?limit=${limit}&offset=${newOffset}`);
+                const data = await res.json();
+                if (data.messages && data.messages.length > 0) {
+                    const container = document.getElementById('messages-container');
+                    const scrollBefore = container.scrollHeight;
+                    allMessages = [...data.messages, ...allMessages];
+                    oldestOffset = newOffset;
+                    renderAllMessages();
+                    const scrollAfter = container.scrollHeight;
+                    window.scrollBy(0, scrollAfter - scrollBefore);
+                    if (oldestOffset <= 0) {
+                        document.getElementById('load-more-top').style.display = 'none';
+                    }
+                }
+            } catch (e) {
+                console.error('Error loading older messages:', e);
+                showToast('Error loading messages');
+            }
+            loading = false;
+            document.getElementById('load-older-btn').disabled = false;
+        }
+        // ===== Navigation =====
+        function goToMessage(messageId) {
+            const el = document.getElementById(`message${messageId}`);
+            if (el) {
+                el.scrollIntoView({ behavior: 'smooth', block: 'center' });
+                el.classList.add('selected');
+                setTimeout(() => el.classList.remove('selected'), 2500);
+            } else {
+                showToast('Message not in current view');
+            }
+        }
+        function scrollToBottom() {
+            window.scrollTo({ top: document.body.scrollHeight, behavior: 'smooth' });
+        }
+        function showToast(message) {
+            const toast = document.getElementById('toast');
+            toast.textContent = message;
+            toast.classList.add('visible');
+            setTimeout(() => toast.classList.remove('visible'), 3000);
+        }
+        // Scroll button visibility
+        window.addEventListener('scroll', () => {
+            const btn = document.getElementById('scroll-bottom');
+            const dist = document.body.scrollHeight - window.scrollY - window.innerHeight;
+            btn.classList.toggle('visible', dist > 500);
+        });
+        // ===== Init =====
+        loadInitialMessages();
+    </script>
+</body>
+</html>
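
Review note on the paging in `loadInitialMessages()` and `loadOlderMessages()`: the page starts at the newest `BATCH_SIZE` messages and walks backwards, shrinking the last request so it never reads before offset 0. The sketch below reproduces only that offset arithmetic as a pure function. It assumes `/api/chat/messages` returns messages oldest-first, which the code above implies but does not state; `pageWindows` is a hypothetical helper that does not appear in the commit.

```javascript
// Hypothetical helper (not in the commit): lists the offset/limit pairs the
// chat view would request, starting with the initial load and then one entry
// per "Load earlier messages" click.
function pageWindows(totalMessages, batchSize) {
    const windows = [];
    if (totalMessages <= 0) return windows;

    // Initial load: the newest batchSize messages (or everything, if fewer).
    let oldestOffset = Math.max(0, totalMessages - batchSize);
    windows.push({ offset: oldestOffset, limit: batchSize });

    // Each older page steps back by up to batchSize, clamped at offset 0.
    while (oldestOffset > 0) {
        const newOffset = Math.max(0, oldestOffset - batchSize);
        windows.push({ offset: newOffset, limit: oldestOffset - newOffset });
        oldestOffset = newOffset;
    }
    return windows;
}

console.log(pageWindows(250, 100));
// prints three windows: offset 150 / limit 100, offset 50 / limit 100, offset 0 / limit 50
```

With 250 messages and a batch size of 100, the final request asks for only 50 rows, so no message is fetched twice.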

templates/index.html ADDED

	@@ -0,0 +1,223 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Telegram Analytics Dashboard</title>
+    <link rel="stylesheet" href="/static/css/style.css">
+    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
+</head>
+<body>
+    <!-- Sidebar -->
+    <nav class="sidebar">
+        <div class="logo">
+            <span class="logo-icon">📊</span>
+            <span class="logo-text">TG Analytics</span>
+        </div>
+        <ul class="nav-menu">
+            <li class="nav-item active">
+                <a href="/" class="nav-link">
+                    <span class="icon">📈</span>
+                    <span>Overview</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/users" class="nav-link">
+                    <span class="icon">👥</span>
+                    <span>Users</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/chat" class="nav-link">
+                    <span class="icon">💬</span>
+                    <span>Chat</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/search" class="nav-link">
+                    <span class="icon">🔍</span>
+                    <span>Search</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/moderation" class="nav-link">
+                    <span class="icon">🛡️</span>
+                    <span>Moderation</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/settings" class="nav-link">
+                    <span class="icon">⚙️</span>
+                    <span>Settings</span>
+                </a>
+            </li>
+        </ul>
+        <div class="sidebar-footer">
+            <div class="export-buttons">
+                <button onclick="exportUsers()" class="btn btn-sm">📥 Export Users</button>
+                <button onclick="exportMessages()" class="btn btn-sm">📥 Export Messages</button>
+            </div>
+        </div>
+    </nav>
+    <!-- Main Content -->
+    <main class="main-content">
+        <!-- Header -->
+        <header class="header">
+            <h1>Dashboard Overview</h1>
+            <div class="header-controls">
+                <select id="timeframe" class="select" onchange="loadAllData()">
+                    <option value="today">Today</option>
+                    <option value="yesterday">Yesterday</option>
+                    <option value="week">This Week</option>
+                    <option value="month" selected>This Month</option>
+                    <option value="year">This Year</option>
+                    <option value="all">All Time</option>
+                </select>
+                <button onclick="loadAllData()" class="btn btn-primary">🔄 Refresh</button>
+            </div>
+        </header>
+        <!-- Stats Cards -->
+        <section class="stats-grid">
+            <div class="stat-card">
+                <div class="stat-icon">💬</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="total-messages">-</div>
+                    <div class="stat-label">Total Messages</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">👤</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="active-users">-</div>
+                    <div class="stat-label">Active Users</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">📅</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="messages-per-day">-</div>
+                    <div class="stat-label">Messages/Day</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">🔗</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="links-count">-</div>
+                    <div class="stat-label">Links Shared</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">🖼️</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="media-count">-</div>
+                    <div class="stat-label">Media Shared</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">↩️</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="replies-count">-</div>
+                    <div class="stat-label">Replies</div>
+                </div>
+            </div>
+        </section>
+        <!-- Charts Row 1 -->
+        <section class="charts-row">
+            <div class="chart-card large">
+                <div class="chart-header">
+                    <h3>Message Volume</h3>
+                    <select id="messages-granularity" class="select-sm" onchange="loadMessagesChart()">
+                        <option value="hour">Hourly</option>
+                        <option value="day" selected>Daily</option>
+                        <option value="week">Weekly</option>
+                    </select>
+                </div>
+                <div class="chart-container">
+                    <canvas id="messages-chart"></canvas>
+                </div>
+            </div>
+            <div class="chart-card">
+                <div class="chart-header">
+                    <h3>Active Users</h3>
+                </div>
+                <div class="chart-container">
+                    <canvas id="users-chart"></canvas>
+                </div>
+            </div>
+        </section>
+        <!-- Charts Row 2 -->
+        <section class="charts-row">
+            <div class="chart-card">
+                <div class="chart-header">
+                    <h3>Activity by Hour</h3>
+                </div>
+                <div class="chart-container">
+                    <canvas id="hourly-chart"></canvas>
+                </div>
+            </div>
+            <div class="chart-card">
+                <div class="chart-header">
+                    <h3>Activity by Day</h3>
+                </div>
+                <div class="chart-container">
+                    <canvas id="daily-chart"></canvas>
+                </div>
+            </div>
+        </section>
+        <!-- Heatmap -->
+        <section class="charts-row">
+            <div class="chart-card full-width">
+                <div class="chart-header">
+                    <h3>Activity Heatmap</h3>
+                    <span class="chart-subtitle">Hour of Day vs Day of Week</span>
+                </div>
+                <div class="heatmap-container" id="heatmap">
+                    <!-- Heatmap will be rendered here -->
+                </div>
+            </div>
+        </section>
+        <!-- Top Lists -->
+        <section class="lists-row">
+            <div class="list-card">
+                <div class="list-header">
+                    <h3>🏆 Top Users</h3>
+                    <a href="/users" class="link">View All →</a>
+                </div>
+                <div class="list-content" id="top-users-list">
+                    <!-- List will be rendered here -->
+                </div>
+            </div>
+            <div class="list-card">
+                <div class="list-header">
+                    <h3>🔤 Top Words</h3>
+                </div>
+                <div class="list-content" id="top-words-list">
+                    <!-- List will be rendered here -->
+                </div>
+            </div>
+            <div class="list-card">
+                <div class="list-header">
+                    <h3>🌐 Top Domains</h3>
+                </div>
+                <div class="list-content" id="top-domains-list">
+                    <!-- List will be rendered here -->
+                </div>
+            </div>
+        </section>
+    </main>
+    <script src="/static/js/dashboard.js"></script>
+    <script>
+        // Initialize
+        document.addEventListener('DOMContentLoaded', () => {
+            loadAllData();
+        });
+    </script>
+</body>
+</html>

templates/moderation.html ADDED

	@@ -0,0 +1,459 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Moderation - Telegram Analytics</title>
+    <link rel="stylesheet" href="/static/css/style.css">
+    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
+</head>
+<body>
+    <!-- Sidebar -->
+    <nav class="sidebar">
+        <div class="logo">
+            <span class="logo-icon">📊</span>
+            <span class="logo-text">TG Analytics</span>
+        </div>
+        <ul class="nav-menu">
+            <li class="nav-item">
+                <a href="/" class="nav-link">
+                    <span class="icon">📈</span>
+                    <span>Overview</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/users" class="nav-link">
+                    <span class="icon">👥</span>
+                    <span>Users</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/chat" class="nav-link">
+                    <span class="icon">💬</span>
+                    <span>Chat</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/search" class="nav-link">
+                    <span class="icon">🔍</span>
+                    <span>Search</span>
+                </a>
+            </li>
+            <li class="nav-item active">
+                <a href="/moderation" class="nav-link">
+                    <span class="icon">🛡️</span>
+                    <span>Moderation</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/settings" class="nav-link">
+                    <span class="icon">⚙️</span>
+                    <span>Settings</span>
+                </a>
+            </li>
+        </ul>
+    </nav>
+    <!-- Main Content -->
+    <main class="main-content">
+        <!-- Header -->
+        <header class="header">
+            <h1>Moderation &amp; Content Analytics</h1>
+            <div class="header-controls">
+                <select id="timeframe" class="select" onchange="loadAllData()">
+                    <option value="today">Today</option>
+                    <option value="yesterday">Yesterday</option>
+                    <option value="week">This Week</option>
+                    <option value="month" selected>This Month</option>
+                    <option value="year">This Year</option>
+                    <option value="all">All Time</option>
+                </select>
+                <button onclick="loadAllData()" class="btn btn-primary">🔄 Refresh</button>
+            </div>
+        </header>
+        <!-- Content Stats -->
+        <section class="stats-grid">
+            <div class="stat-card">
+                <div class="stat-icon">🔗</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="total-links">-</div>
+                    <div class="stat-label">Links Shared</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">🖼️</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="total-media">-</div>
+                    <div class="stat-label">Media Shared</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">@</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="total-mentions">-</div>
+                    <div class="stat-label">Mentions</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">↪️</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="total-forwards">-</div>
+                    <div class="stat-label">Forwards</div>
+                </div>
+            </div>
+        </section>
+        <!-- Charts Row -->
+        <section class="charts-row">
+            <div class="chart-card">
+                <div class="chart-header">
+                    <h3>Top Shared Domains</h3>
+                </div>
+                <div class="chart-container">
+                    <canvas id="domains-chart"></canvas>
+                </div>
+            </div>
+            <div class="chart-card">
+                <div class="chart-header">
+                    <h3>Content Type Distribution</h3>
+                </div>
+                <div class="chart-container">
+                    <canvas id="content-chart"></canvas>
+                </div>
+            </div>
+        </section>
+        <!-- Lists Row -->
+        <section class="lists-row">
+            <!-- Top Domains List -->
+            <div class="list-card">
+                <div class="list-header">
+                    <h3>🌐 Top Domains</h3>
+                </div>
+                <div class="list-content" id="domains-list">
+                    <div class="loading"><div class="spinner"></div></div>
+                </div>
+            </div>
+            <!-- Top Mentions List -->
+            <div class="list-card">
+                <div class="list-header">
+                    <h3>@ Top Mentions</h3>
+                </div>
+                <div class="list-content" id="mentions-list">
+                    <div class="loading"><div class="spinner"></div></div>
+                </div>
+            </div>
+            <!-- Top Words List -->
+            <div class="list-card">
+                <div class="list-header">
+                    <h3>🔤 Top Words</h3>
+                </div>
+                <div class="list-content" id="words-list">
+                    <div class="loading"><div class="spinner"></div></div>
+                </div>
+            </div>
+        </section>
+        <!-- Link Sharers -->
+        <section class="chart-card full-width">
+            <div class="chart-header">
+                <h3>Top Link Sharers</h3>
+            </div>
+            <div style="overflow-x: auto;">
+                <table class="users-table">
+                    <thead>
+                        <tr>
+                            <th style="width: 60px;">Rank</th>
+                            <th>User</th>
+                            <th style="width: 120px;">Links</th>
+                            <th style="width: 120px;">Media</th>
+                            <th style="width: 120px;">Messages</th>
+                            <th style="width: 150px;">Link Rate</th>
+                        </tr>
+                    </thead>
+                    <tbody id="link-sharers-body">
+                        <tr>
+                            <td colspan="6" class="loading">
+                                <div class="spinner"></div>
+                            </td>
+                        </tr>
+                    </tbody>
+                </table>
+            </div>
+        </section>
+    </main>
+    <script>
+        // Chart instances
+        let domainsChart = null;
+        let contentChart = null;
+        // Initialize
+        document.addEventListener('DOMContentLoaded', () => {
+            loadAllData();
+        });
+        async function loadAllData() {
+            await Promise.all([
+                loadOverview(),
+                loadDomains(),
+                loadMentions(),
+                loadWords(),
+                loadLinkSharers()
+            ]);
+        }
+        async function loadOverview() {
+            const timeframe = document.getElementById('timeframe').value;
+            try {
+                const response = await fetch(`/api/overview?timeframe=${timeframe}`);
+                const data = await response.json();
+                document.getElementById('total-links').textContent = formatNumber(data.links_count);
+                document.getElementById('total-media').textContent = formatNumber(data.media_count);
+                document.getElementById('total-mentions').textContent = formatNumber(data.mentions_count);
+                document.getElementById('total-forwards').textContent = formatNumber(data.forwards_count);
+                // Update content distribution chart
+                renderContentChart(data);
+            } catch (error) {
+                console.error('Error loading overview:', error);
+            }
+        }
+        async function loadDomains() {
+            const timeframe = document.getElementById('timeframe').value;
+            const listDiv = document.getElementById('domains-list');
+            try {
+                const response = await fetch(`/api/top/domains?timeframe=${timeframe}&limit=15`);
+                const data = await response.json();
+                if (data.length === 0) {
+                    listDiv.innerHTML = '<div class="empty-state">No domains found</div>';
+                    return;
+                }
+                listDiv.innerHTML = data.map((item, i) => `
+                    <div class="list-item">
+                        <span class="list-rank ${i < 3 ? ['gold', 'silver', 'bronze'][i] : ''}">#${i + 1}</span>
+                        <div class="list-info">
+                            <div class="list-name">${escapeHtml(item.domain)}</div>
+                        </div>
+                        <span class="list-value">${formatNumber(item.count)}</span>
+                    </div>
+                `).join('');
+                // Render domains chart
+                renderDomainsChart(data.slice(0, 8));
+            } catch (error) {
+                listDiv.innerHTML = '<div class="empty-state">Error loading domains</div>';
+            }
+        }
+        async function loadMentions() {
+            const timeframe = document.getElementById('timeframe').value;
+            const listDiv = document.getElementById('mentions-list');
+            try {
+                const response = await fetch(`/api/top/mentions?timeframe=${timeframe}&limit=15`);
+                const data = await response.json();
+                if (data.length === 0) {
+                    listDiv.innerHTML = '<div class="empty-state">No mentions found</div>';
+                    return;
+                }
+                listDiv.innerHTML = data.map((item, i) => `
+                    <div class="list-item">
+                        <span class="list-rank ${i < 3 ? ['gold', 'silver', 'bronze'][i] : ''}">#${i + 1}</span>
+                        <div class="list-info">
+                            <div class="list-name">@${escapeHtml(item.mention)}</div>
+                        </div>
+                        <span class="list-value">${formatNumber(item.count)}</span>
+                    </div>
+                `).join('');
+            } catch (error) {
+                listDiv.innerHTML = '<div class="empty-state">Error loading mentions</div>';
+            }
+        }
+        async function loadWords() {
+            const timeframe = document.getElementById('timeframe').value;
+            const listDiv = document.getElementById('words-list');
+            try {
+                const response = await fetch(`/api/top/words?timeframe=${timeframe}&limit=15`);
+                const data = await response.json();
+                if (data.length === 0) {
+                    listDiv.innerHTML = '<div class="empty-state">No words found</div>';
+                    return;
+                }
+                listDiv.innerHTML = data.map((item, i) => `
+                    <div class="list-item">
+                        <span class="list-rank ${i < 3 ? ['gold', 'silver', 'bronze'][i] : ''}">#${i + 1}</span>
+                        <div class="list-info">
+                            <div class="list-name">${escapeHtml(item.word)}</div>
+                        </div>
+                        <span class="list-value">${formatNumber(item.count)}</span>
+                    </div>
+                `).join('');
+            } catch (error) {
+                listDiv.innerHTML = '<div class="empty-state">Error loading words</div>';
+            }
+        }
+        async function loadLinkSharers() {
+            const timeframe = document.getElementById('timeframe').value;
+            const tbody = document.getElementById('link-sharers-body');
+            try {
+                const response = await fetch(`/api/users?timeframe=${timeframe}&limit=10`);
+                const data = await response.json();
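+                // Re-rank the returned users by link count. Only the users returned by
+                // /api/users (limit=10) are considered, so heavy link sharers outside that
+                // page will not appear here.
+                const users = [...(data.users || [])].sort((a, b) => (b.links || 0) - (a.links || 0)).slice(0, 10);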
+                if (users.length === 0) {
+                    tbody.innerHTML = '<tr><td colspan="6" class="empty-state">No data found</td></tr>';
+                    return;
+                }
+                tbody.innerHTML = users.map((user, i) => {
+                    const linkRate = user.messages > 0 ? ((user.links / user.messages) * 100).toFixed(1) : 0;
+                    const rankClass = i === 0 ? 'gold' : i === 1 ? 'silver' : i === 2 ? 'bronze' : '';
+                    return `
+                        <tr>
+                            <td><span class="list-rank ${rankClass}">#${i + 1}</span></td>
+                            <td>
+                                <div class="user-cell">
+                                    <div class="user-avatar">${escapeHtml((user.name || '?').charAt(0).toUpperCase())}</div>
+                                    <div>
+                                        <div class="list-name">${escapeHtml(user.name)}</div>
+                                    </div>
+                                </div>
+                            </td>
+                            <td><strong>${formatNumber(user.links)}</strong></td>
+                            <td>${formatNumber(user.media)}</td>
+                            <td>${formatNumber(user.messages)}</td>
+                            <td>
+                                ${linkRate}%
+                                <div class="progress-bar">
+                                    <div class="progress-fill" style="width: ${Math.min(linkRate * 2, 100)}%"></div>
+                                </div>
+                            </td>
+                        </tr>
+                    `;
+                }).join('');
+            } catch (error) {
+                tbody.innerHTML = '<tr><td colspan="6" class="empty-state">Error loading data</td></tr>';
+            }
+        }
+        function renderDomainsChart(data) {
+            const ctx = document.getElementById('domains-chart').getContext('2d');
+            if (domainsChart) domainsChart.destroy();
+            domainsChart = new Chart(ctx, {
+                type: 'bar',
+                data: {
+                    labels: data.map(d => d.domain.substring(0, 15)),
+                    datasets: [{
+                        data: data.map(d => d.count),
+                        backgroundColor: [
+                            'rgba(0, 136, 204, 0.8)',
+                            'rgba(40, 167, 69, 0.8)',
+                            'rgba(255, 193, 7, 0.8)',
+                            'rgba(220, 53, 69, 0.8)',
+                            'rgba(23, 162, 184, 0.8)',
+                            'rgba(108, 117, 125, 0.8)',
+                            'rgba(111, 66, 193, 0.8)',
+                            'rgba(253, 126, 20, 0.8)'
+                        ],
+                        borderWidth: 0
+                    }]
+                },
+                options: {
+                    indexAxis: 'y',
+                    responsive: true,
+                    maintainAspectRatio: false,
+                    plugins: { legend: { display: false } },
+                    scales: {
+                        x: {
+                            grid: { color: 'rgba(255, 255, 255, 0.1)' },
+                            ticks: { color: '#a0aec0' }
+                        },
+                        y: {
+                            grid: { display: false },
+                            ticks: { color: '#a0aec0' }
+                        }
+                    }
+                }
+            });
+        }
+        function renderContentChart(data) {
+            const ctx = document.getElementById('content-chart').getContext('2d');
+            if (contentChart) contentChart.destroy();
+            const textOnly = data.total_messages - data.links_count - data.media_count;
+            contentChart = new Chart(ctx, {
+                type: 'doughnut',
+                data: {
+                    labels: ['Text Only', 'With Links', 'With Media', 'Replies', 'Forwards'],
+                    datasets: [{
+                        data: [
+                            Math.max(0, textOnly),
+                            data.links_count,
+                            data.media_count,
+                            data.replies_count,
+                            data.forwards_count
+                        ],
+                        backgroundColor: [
+                            'rgba(0, 136, 204, 0.8)',
+                            'rgba(40, 167, 69, 0.8)',
+                            'rgba(255, 193, 7, 0.8)',
+                            'rgba(23, 162, 184, 0.8)',
+                            'rgba(108, 117, 125, 0.8)'
+                        ],
+                        borderWidth: 0
+                    }]
+                },
+                options: {
+                    responsive: true,
+                    maintainAspectRatio: false,
+                    plugins: {
+                        legend: {
+                            position: 'right',
+                            labels: { color: '#a0aec0' }
+                        }
+                    }
+                }
+            });
+        }
+        // Helper functions
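+        function formatNumber(num) {
+            // Fields missing from the API response fall back to the '-' placeholder used in the markup
+            if (num === null || num === undefined) return '-';
+            if (num >= 1000000) return (num / 1000000).toFixed(1) + 'M';
+            if (num >= 1000) return (num / 1000).toFixed(1) + 'K';
+            return num.toString();
+        }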
+        function escapeHtml(text) {
+            const div = document.createElement('div');
+            div.textContent = text;
+            return div.innerHTML;
+        }
+    </script>
+</body>
+</html>

templates/search.html ADDED

	@@ -0,0 +1,359 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Search - Telegram Analytics</title>
+    <link rel="stylesheet" href="/static/css/style.css">
+</head>
+<body>
+    <!-- Sidebar -->
+    <nav class="sidebar">
+        <div class="logo">
+            <span class="logo-icon">📊</span>
+            <span class="logo-text">TG Analytics</span>
+        </div>
+        <ul class="nav-menu">
+            <li class="nav-item">
+                <a href="/" class="nav-link">
+                    <span class="icon">📈</span>
+                    <span>Overview</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/users" class="nav-link">
+                    <span class="icon">👥</span>
+                    <span>Users</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/chat" class="nav-link">
+                    <span class="icon">💬</span>
+                    <span>Chat</span>
+                </a>
+            </li>
+            <li class="nav-item active">
+                <a href="/search" class="nav-link">
+                    <span class="icon">🔍</span>
+                    <span>Search</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/moderation" class="nav-link">
+                    <span class="icon">🛡️</span>
+                    <span>Moderation</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/settings" class="nav-link">
+                    <span class="icon">⚙️</span>
+                    <span>Settings</span>
+                </a>
+            </li>
+        </ul>
+        <div class="sidebar-footer">
+            <div class="export-buttons">
+                <button onclick="exportMessages()" class="btn btn-sm">📥 Export Messages</button>
+            </div>
+        </div>
+    </nav>
+    <!-- Main Content -->
+    <main class="main-content">
+        <!-- Header -->
+        <header class="header">
+            <h1>Search Messages</h1>
+            <div class="header-controls">
+                <select id="timeframe" class="select">
+                    <option value="today">Today</option>
+                    <option value="yesterday">Yesterday</option>
+                    <option value="week">This Week</option>
+                    <option value="month">This Month</option>
+                    <option value="year">This Year</option>
+                    <option value="all" selected>All Time</option>
+                </select>
+            </div>
+        </header>
+        <!-- Search Box -->
+        <section class="search-box">
+            <input type="search" id="search-input" class="search-input"
+                   placeholder="Search messages... (supports Hebrew and English)"
+                   onkeydown="if(event.key === 'Enter') performSearch()">
+            <button onclick="performSearch()" class="btn btn-primary">🔍 Search</button>
+        </section>
+        <!-- Search Tips -->
+        <section class="chart-card" style="margin-bottom: var(--spacing-xl);">
+            <div class="chart-header">
+                <h3>Search Tips</h3>
+            </div>
+            <div style="padding: var(--spacing-md); color: var(--text-secondary); font-size: 0.875rem;">
+                <ul style="list-style: none; display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 1rem;">
+                    <li><strong>word1 word2</strong> - Messages containing both words</li>
+                    <li><strong>"exact phrase"</strong> - Messages with exact phrase</li>
+                    <li><strong>word1 OR word2</strong> - Messages with either word</li>
+                    <li><strong>word*</strong> - Prefix search (word, words, wording)</li>
+                    <li><strong>NOT word</strong> - Exclude messages with word</li>
+                    <li><strong>Hebrew supported</strong> - Full Hebrew text search</li>
+                </ul>
+            </div>
+        </section>
+        <!-- Search Stats -->
+        <section class="stats-grid" id="search-stats" style="display: none;">
+            <div class="stat-card">
+                <div class="stat-icon">🔍</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="result-count">0</div>
+                    <div class="stat-label">Results Found</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">⚡</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="search-time">0ms</div>
+                    <div class="stat-label">Search Time</div>
+                </div>
+            </div>
+        </section>
+        <!-- Search Results -->
+        <section class="search-results" id="search-results">
+            <div class="empty-state">
+                <div class="empty-state-icon">🔍</div>
+                <p>Enter a search term to find messages</p>
+            </div>
+        </section>
+        <!-- Pagination -->
+        <div class="pagination" id="pagination"></div>
+    </main>
+    <script>
+        // State
+        let currentQuery = '';
+        let currentPage = 1;
+        const pageSize = 20;
+        async function performSearch(page = 1) {
+            const query = document.getElementById('search-input').value.trim();
+            const timeframe = document.getElementById('timeframe').value;
+            if (!query) {
+                document.getElementById('search-results').innerHTML = `
+                    <div class="empty-state">
+                        <div class="empty-state-icon">🔍</div>
+                        <p>Enter a search term to find messages</p>
+                    </div>
+                `;
+                document.getElementById('search-stats').style.display = 'none';
+                document.getElementById('pagination').innerHTML = '';
+                return;
+            }
+            currentQuery = query;
+            currentPage = page;
+            const resultsDiv = document.getElementById('search-results');
+            resultsDiv.innerHTML = '<div class="loading"><div class="spinner"></div></div>';
+            const startTime = performance.now();
+            try {
+                const offset = (page - 1) * pageSize;
+                const response = await fetch(
+                    `/api/search?q=${encodeURIComponent(query)}&timeframe=${timeframe}&limit=${pageSize}&offset=${offset}`
+                );
+                const data = await response.json();
+                const endTime = performance.now();
+                const searchTime = Math.round(endTime - startTime);
+                // Show stats
+                document.getElementById('search-stats').style.display = 'grid';
+                document.getElementById('result-count').textContent = data.results.length +
+                    (data.results.length === pageSize ? '+' : '');
+                document.getElementById('search-time').textContent = searchTime + 'ms';
+                if (data.results.length === 0) {
+                    resultsDiv.innerHTML = `
+                        <div class="empty-state">
+                            <div class="empty-state-icon">😕</div>
+                            <p>No messages found for "${escapeHtml(query)}"</p>
+                        </div>
+                    `;
+                    document.getElementById('pagination').innerHTML = '';
+                    return;
+                }
+                resultsDiv.innerHTML = data.results.map(result => `
+                    <div class="search-result-item">
+                        <div class="search-result-header">
+                            <span class="search-result-author">${escapeHtml(result.from_name || 'Unknown')}</span>
+                            <span class="search-result-date">${escapeHtml(result.date || '')}</span>
+                        </div>
+                        <div class="search-result-text">${highlightQuery(result.text, query)}</div>
+                        <div style="margin-top: 0.5rem; font-size: 0.75rem; color: var(--text-muted);">
+                            ${result.has_links ? '🔗 Link' : ''}
+                            ${result.has_media ? '🖼️ Media' : ''}
+                        </div>
+                    </div>
+                `).join('');
+                // Simple pagination (since we don't have total count from FTS)
+                renderPagination(data.results.length === pageSize);
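+            } catch (error) {
+                console.error('Error performing search:', error);
+                // Hide stats and pagination left over from a previous search
+                document.getElementById('search-stats').style.display = 'none';
+                document.getElementById('pagination').innerHTML = '';
+                resultsDiv.innerHTML = `
+                    <div class="empty-state">
+                        <div class="empty-state-icon">❌</div>
+                        <p>Error performing search. Please try again.</p>
+                    </div>
+                `;
+            }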
+        }
+        function renderPagination(hasMore) {
+            const pagination = document.getElementById('pagination');
+            if (currentPage === 1 && !hasMore) {
+                pagination.innerHTML = '';
+                return;
+            }
+            let html = '';
+            html += `<button class="page-btn" onclick="performSearch(${currentPage - 1})"
+                     ${currentPage === 1 ? 'disabled' : ''}>&laquo; Previous</button>`;
+            html += `<span style="padding: 0 1rem; color: var(--text-muted);">Page ${currentPage}</span>`;
+            html += `<button class="page-btn" onclick="performSearch(${currentPage + 1})"
+                     ${!hasMore ? 'disabled' : ''}>Next &raquo;</button>`;
+            pagination.innerHTML = html;
+        }
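+        function highlightQuery(text, query) {
+            if (!text) return '';
+            // Strip FTS operators so only the actual search words are highlighted
+            const words = query.replace(/["*]/g, '').split(/\s+/).filter(w => w && w !== 'OR' && w !== 'NOT');
+            if (words.length === 0) return escapeHtml(text);
+            // Match all words in one pass over the raw text, then escape each piece.
+            // Replacing word by word on escaped HTML could match inside entities
+            // (e.g. "amp" in "&amp;") or inside the highlight markup added for an earlier word.
+            const regex = new RegExp(`(${words.map(escapeRegex).join('|')})`, 'gi');
+            // split() with one capture group alternates non-match, match, non-match, ...
+            return text.split(regex).map((part, i) => i % 2 === 1
+                ? `<span class="search-highlight">${escapeHtml(part)}</span>`
+                : escapeHtml(part)).join('');
+        }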
+        function escapeHtml(text) {
+            const div = document.createElement('div');
+            div.textContent = text;
+            return div.innerHTML;
+        }
+        function escapeRegex(string) {
+            return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+        }
+        function exportMessages() {
+            const timeframe = document.getElementById('timeframe').value;
+            window.location.href = `/api/export/messages?timeframe=${timeframe}`;
+        }
+        // AI Search
+        async function aiSearch() {
+            const query = document.getElementById('ai-query').value.trim();
+            const mode = document.getElementById('ai-mode').value;
+            if (!query) return;
+            const resultDiv = document.getElementById('ai-result');
+            const answerDiv = document.getElementById('ai-answer');
+            const sqlPre = document.getElementById('ai-sql');
+            resultDiv.style.display = 'block';
+            const loadingMessages = {
+                'context': 'Reading and analyzing messages...',
+                'semantic': 'Searching by meaning + sending to AI...',
+                'sql': 'Searching...',
+                'auto': 'Searching...'
+            };
+            answerDiv.textContent = loadingMessages[mode] || 'Searching...';
+            sqlPre.textContent = '';
+            try {
+                const response = await fetch('/api/ai/search', {
+                    method: 'POST',
+                    headers: { 'Content-Type': 'application/json' },
+                    body: JSON.stringify({ query, mode })
+                });
+                const data = await response.json();
+                if (data.error) {
+                    answerDiv.innerHTML = `<span style="color:#ff6b6b;">Error: ${escapeHtml(data.error)}</span>`;
+                } else {
+                    let html = escapeHtml(data.answer || 'No answer found');
+                    // Show mode info
+                    if (data.mode === 'context_search') {
+                        html += `<br><br><small style="color:var(--text-muted);">🧠 Hybrid Search: read ${data.context_messages} messages`;
+                        if (data.context_user) html += ` from "${escapeHtml(data.context_user)}"`;
+                        if (data.keywords_used && data.keywords_used.length > 0) {
+                            // Keywords are inserted via innerHTML, so escape each one
+                            html += `<br>🔑 Keywords: ${data.keywords_used.slice(0, 5).map(escapeHtml).join(', ')}`;
+                        }
+                        html += `</small>`;
+                    } else if (data.mode === 'semantic_ai' || data.mode === 'semantic') {
+                        html += `<br><br><small style="color:var(--text-muted);">🔮 Semantic + AI: found ${data.count} similar messages`;
+                        if (data.total_with_threads && data.total_with_threads > data.count) {
+                            html += ` + ${data.total_with_threads - data.count} messages from threads`;
+                        }
+                        html += `</small>`;
+                    }
+                    answerDiv.innerHTML = html;
+                    sqlPre.textContent = data.sql || '';
+                    // If results contain messages, optionally populate main search
+                    if (data.results && data.results.length > 0 && data.results[0].text) {
+                        displayAIResults(data.results);
+                    }
+                }
+            } catch (error) {
+                answerDiv.textContent = `Error: ${error.message}`;
+            }
+        }
+        function displayAIResults(results) {
+            const resultsDiv = document.getElementById('search-results');
+            if (results.length === 0) return;
+            document.getElementById('search-stats').style.display = 'grid';
+            document.getElementById('result-count').textContent = results.length;
+            document.getElementById('search-time').textContent = 'AI';
+            resultsDiv.innerHTML = results.slice(0, 20).map(result => `
+                <div class="search-result-item">
+                    <div class="search-result-header">
+                        <span class="search-result-author">${escapeHtml(result.from_name || 'Unknown')}</span>
+                        <span class="search-result-date">${escapeHtml(result.date || '')}${result.score ? ` (${(result.score * 100).toFixed(0)}% similarity)` : ''}</span>
+                    </div>
+                    <div class="search-result-text">${escapeHtml(result.text || '')}</div>
+                </div>
+            `).join('');
+        }
+        // Focus search input on page load
+        document.addEventListener('DOMContentLoaded', () => {
+            document.getElementById('search-input').focus();
+        });
+    </script>
+</body>
+</html>
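
The highlighting step in `templates/search.html` has to do two things at once: escape untrusted message text and wrap the query words in markup. A minimal sketch of one way to do both without re-matching earlier highlights is shown here, written so that it runs outside the browser. The names `escapeHtmlText` and `highlightTerms` are hypothetical, and the string-based escaping stands in for the page's DOM-based `escapeHtml`; it illustrates the technique and is not the template's exact code.

```javascript
// Hypothetical DOM-free stand-in for the page's escapeHtml helper.
function escapeHtmlText(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Escape characters that have a special meaning inside a RegExp.
function escapeRegex(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of a query word in a highlight span.
function highlightTerms(text, query) {
    if (!text) return '';
    // Same operator stripping as the template: quotes, '*', OR and NOT
    const words = query.replace(/["*]/g, '').split(/\s+/)
        .filter(w => w && w !== 'OR' && w !== 'NOT');
    if (words.length === 0) return escapeHtmlText(text);
    // One alternation regex over the raw text: markup added for one word
    // can never be matched again by another word.
    const regex = new RegExp(`(${words.map(escapeRegex).join('|')})`, 'gi');
    // split() with a capture group returns matches at odd indexes.
    return text.split(regex).map((part, i) =>
        i % 2 === 1
            ? `<span class="search-highlight">${escapeHtmlText(part)}</span>`
            : escapeHtmlText(part)
    ).join('');
}

console.log(highlightTerms('a <b> class', 'class OR span'));
```

Escaping each piece after splitting, rather than escaping the whole string first, means a query word such as `amp` cannot match inside an entity like `&amp;`.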

templates/settings.html ADDED

	@@ -0,0 +1,444 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Settings - Telegram Analytics</title>
+    <link rel="stylesheet" href="/static/css/style.css">
+    <style>
+        .upload-zone {
+            border: 2px dashed var(--border-color);
+            border-radius: var(--radius-lg);
+            padding: var(--spacing-xl);
+            text-align: center;
+            transition: all 0.3s ease;
+            cursor: pointer;
+            margin-bottom: var(--spacing-xl);
+        }
+        .upload-zone:hover,
+        .upload-zone.dragover {
+            border-color: var(--primary);
+            background: rgba(0, 136, 204, 0.1);
+        }
+        .upload-zone-icon {
+            font-size: 3rem;
+            margin-bottom: var(--spacing-md);
+        }
+        .upload-zone-text {
+            color: var(--text-secondary);
+            margin-bottom: var(--spacing-sm);
+        }
+        .upload-zone-hint {
+            font-size: 0.75rem;
+            color: var(--text-muted);
+        }
+        .upload-progress {
+            display: none;
+            margin-top: var(--spacing-lg);
+        }
+        .upload-progress.active {
+            display: block;
+        }
+        .progress-bar-container {
+            background: var(--bg-sidebar);
+            border-radius: var(--radius-md);
+            height: 20px;
+            overflow: hidden;
+        }
+        .progress-bar-fill {
+            height: 100%;
+            background: var(--primary);
+            transition: width 0.3s ease;
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            color: white;
+            font-size: 0.75rem;
+        }
+        .upload-result {
+            display: none;
+            margin-top: var(--spacing-lg);
+            padding: var(--spacing-lg);
+            border-radius: var(--radius-lg);
+        }
+        .upload-result.success {
+            display: block;
+            background: rgba(40, 167, 69, 0.2);
+            border: 1px solid var(--success);
+        }
+        .upload-result.error {
+            display: block;
+            background: rgba(220, 53, 69, 0.2);
+            border: 1px solid var(--danger);
+        }
+        .result-title {
+            font-weight: 600;
+            margin-bottom: var(--spacing-md);
+            display: flex;
+            align-items: center;
+            gap: var(--spacing-sm);
+        }
+        .result-stats {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
+            gap: var(--spacing-md);
+        }
+        .result-stat {
+            text-align: center;
+            padding: var(--spacing-md);
+            background: var(--bg-sidebar);
+            border-radius: var(--radius-md);
+        }
+        .result-stat-value {
+            font-size: 1.5rem;
+            font-weight: 700;
+            color: var(--primary);
+        }
+        .result-stat-label {
+            font-size: 0.75rem;
+            color: var(--text-muted);
+        }
+        .db-stats {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
+            gap: var(--spacing-md);
+            margin-bottom: var(--spacing-xl);
+        }
+        .db-stat {
+            background: var(--bg-card);
+            border-radius: var(--radius-lg);
+            padding: var(--spacing-lg);
+            border: 1px solid var(--border-color);
+        }
+        .db-stat-value {
+            font-size: 1.75rem;
+            font-weight: 700;
+            color: var(--primary);
+        }
+        .db-stat-label {
+            font-size: 0.875rem;
+            color: var(--text-muted);
+            margin-top: var(--spacing-xs);
+        }
+        .instructions {
+            background: var(--bg-card);
+            border-radius: var(--radius-lg);
+            padding: var(--spacing-lg);
+            border: 1px solid var(--border-color);
+        }
+        .instructions h3 {
+            margin-bottom: var(--spacing-md);
+        }
+        .instructions ol {
+            padding-right: var(--spacing-lg);
+            color: var(--text-secondary);
+            line-height: 1.8;
+        }
+        .instructions code {
+            background: var(--bg-sidebar);
+            padding: 2px 6px;
+            border-radius: var(--radius-sm);
+            font-family: monospace;
+        }
+    </style>
+</head>
+<body>
+    <!-- Sidebar -->
+    <nav class="sidebar">
+        <div class="logo">
+            <span class="logo-icon">📊</span>
+            <span class="logo-text">TG Analytics</span>
+        </div>
+        <ul class="nav-menu">
+            <li class="nav-item">
+                <a href="/" class="nav-link">
+                    <span class="icon">📈</span>
+                    <span>Overview</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/users" class="nav-link">
+                    <span class="icon">👥</span>
+                    <span>Users</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/chat" class="nav-link">
+                    <span class="icon">💬</span>
+                    <span>Chat</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/search" class="nav-link">
+                    <span class="icon">🔍</span>
+                    <span>Search</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/moderation" class="nav-link">
+                    <span class="icon">🛡️</span>
+                    <span>Moderation</span>
+                </a>
+            </li>
+            <li class="nav-item active">
+                <a href="/settings" class="nav-link">
+                    <span class="icon">⚙️</span>
+                    <span>Settings</span>
+                </a>
+            </li>
+        </ul>
+    </nav>
+    <!-- Main Content -->
+    <main class="main-content">
+        <!-- Header -->
+        <header class="header">
+            <h1>⚙️ Settings & Update Data</h1>
+        </header>
+        <!-- Database Stats -->
+        <section>
+            <h2 style="margin-bottom: var(--spacing-md);">📊 Database Status</h2>
+            <div class="db-stats" id="db-stats">
+                <div class="db-stat">
+                    <div class="db-stat-value" id="stat-messages">-</div>
+                    <div class="db-stat-label">Total Messages</div>
+                </div>
+                <div class="db-stat">
+                    <div class="db-stat-value" id="stat-users">-</div>
+                    <div class="db-stat-label">Total Users</div>
+                </div>
+                <div class="db-stat">
+                    <div class="db-stat-value" id="stat-first">-</div>
+                    <div class="db-stat-label">First Message</div>
+                </div>
+                <div class="db-stat">
+                    <div class="db-stat-value" id="stat-last">-</div>
+                    <div class="db-stat-label">Last Message</div>
+                </div>
+                <div class="db-stat">
+                    <div class="db-stat-value" id="stat-size">-</div>
+                    <div class="db-stat-label">Database Size</div>
+                </div>
+            </div>
+        </section>
+        <!-- Upload Section (disabled - updates done locally) -->
+        <section class="chart-card" style="margin-bottom: var(--spacing-xl); opacity: 0.6;">
+            <div class="chart-header">
+                <h3>📤 Update Database</h3>
+            </div>
+            <div style="padding: var(--spacing-lg); text-align: center; color: var(--text-muted);">
+                <p>Database updates are performed locally using daily_sync.py</p>
+            </div>
+        </section>
+        <!-- Instructions -->
+        <section class="instructions">
+            <h3>📖 How to export data from Telegram</h3>
+            <ol>
+                <li>Open <strong>Telegram Desktop</strong> (export is not available in the mobile app)</li>
+                <li>Go to <strong>Settings → Advanced → Export Telegram data</strong></li>
+                <li>Select the group or chat you want to export</li>
+                <li>Choose <strong>JSON</strong> as the export format</li>
+                <li>Click <strong>Export</strong> and wait for it to finish</li>
+                <li>Import the <code>result.json</code> file with <code>indexer.py</code> from the command line</li>
+            </ol>
+            <div style="margin-top: var(--spacing-lg); padding: var(--spacing-md); background: var(--bg-sidebar); border-radius: var(--radius-md);">
+                <strong>💡 Tip:</strong> The system automatically detects duplicate messages and adds only new ones,
+                so importing the same file twice is harmless.
+            </div>
+        </section>
+        <!-- CLI Instructions -->
+        <section class="instructions" style="margin-top: var(--spacing-xl);">
+            <h3>💻 Updating from the command line</h3>
+            <p style="color: var(--text-secondary); margin-bottom: var(--spacing-md);">
+                For large files, the command line is recommended:
+            </p>
+            <pre style="background: var(--bg-sidebar); padding: var(--spacing-md); border-radius: var(--radius-md); overflow-x: auto; direction: ltr; text-align: left;">
+# Update an existing database with a new JSON export
+python indexer.py new_export.json --db telegram.db --update
+# Create a new database
+python indexer.py result.json --db telegram.db
+            </pre>
+        </section>
+    </main>
+    <script>
+        // Load database stats on page load
+        document.addEventListener('DOMContentLoaded', loadDbStats);
+        async function loadDbStats() {
+            try {
+                const response = await fetch('/api/db/stats');
+                const stats = await response.json();
+                document.getElementById('stat-messages').textContent =
+                    stats.total_messages?.toLocaleString() || '-';
+                document.getElementById('stat-users').textContent =
+                    stats.total_users?.toLocaleString() || '-';
+                document.getElementById('stat-first').textContent =
+                    stats.first_message ? new Date(stats.first_message).toLocaleDateString('he-IL') : '-';
+                document.getElementById('stat-last').textContent =
+                    stats.last_message ? new Date(stats.last_message).toLocaleDateString('he-IL') : '-';
+                document.getElementById('stat-size').textContent =
+                    stats.db_size_mb ? `${stats.db_size_mb} MB` : '-';
+            } catch (error) {
+                console.error('Error loading db stats:', error);
+            }
+        }
+        // Drag and drop handlers. The upload section is disabled, so the
+        // element may be missing from the page; skip the wiring in that case.
+        const uploadZone = document.getElementById('upload-zone');
+        if (uploadZone) {
+            uploadZone.addEventListener('dragover', (e) => {
+                e.preventDefault();
+                uploadZone.classList.add('dragover');
+            });
+            uploadZone.addEventListener('dragleave', () => {
+                uploadZone.classList.remove('dragover');
+            });
+            uploadZone.addEventListener('drop', (e) => {
+                e.preventDefault();
+                uploadZone.classList.remove('dragover');
+                const files = e.dataTransfer.files;
+                if (files.length > 0) {
+                    uploadFile(files[0]);
+                }
+            });
+        }
+        function handleFileSelect(event) {
+            const file = event.target.files[0];
+            if (file) {
+                uploadFile(file);
+            }
+        }
+        async function uploadFile(file) {
+            if (!file.name.toLowerCase().endsWith('.json')) {
+                showError('Please select a JSON file');
+                return;
+            }
+            const progressDiv = document.getElementById('upload-progress');
+            const progressFill = document.getElementById('progress-fill');
+            const progressText = document.getElementById('progress-text');
+            const resultDiv = document.getElementById('upload-result');
+            // Reset and show progress
+            progressDiv.classList.add('active');
+            resultDiv.className = 'upload-result';
+            progressFill.style.width = '0%';
+            progressFill.textContent = '0%';
+            progressText.textContent = `Uploading ${file.name}...`;
+            try {
+                // Read file
+                progressFill.style.width = '20%';
+                progressFill.textContent = '20%';
+                progressText.textContent = 'Reading file...';
+                const formData = new FormData();
+                formData.append('file', file);
+                // Upload
+                progressFill.style.width = '50%';
+                progressFill.textContent = '50%';
+                progressText.textContent = 'Processing data...';
+                const response = await fetch('/api/update', {
+                    method: 'POST',
+                    body: formData
+                });
+                const result = await response.json();
+                progressFill.style.width = '100%';
+                progressFill.textContent = '100%';
+                if (result.success) {
+                    showSuccess(result.stats);
+                    loadDbStats(); // Refresh stats
+                } else {
+                    showError(result.error || 'Unknown error');
+                }
+            } catch (error) {
+                showError(error.message);
+            }
+            // Hide progress after a delay
+            setTimeout(() => {
+                progressDiv.classList.remove('active');
+            }, 1000);
+        }
+        function showSuccess(stats) {
+            const resultDiv = document.getElementById('upload-result');
+            const resultTitle = document.getElementById('result-title');
+            const resultStats = document.getElementById('result-stats');
+            resultDiv.className = 'upload-result success';
+            resultTitle.textContent = '✅ Update completed successfully!';
+            resultStats.innerHTML = `
+                <div class="result-stat">
+                    <div class="result-stat-value">${stats.total_in_file?.toLocaleString() || 0}</div>
+                    <div class="result-stat-label">Messages in file</div>
+                </div>
+                <div class="result-stat">
+                    <div class="result-stat-value">${stats.new_messages?.toLocaleString() || 0}</div>
+                    <div class="result-stat-label">New messages added</div>
+                </div>
+                <div class="result-stat">
+                    <div class="result-stat-value">${stats.duplicates?.toLocaleString() || 0}</div>
+                    <div class="result-stat-label">Duplicates (skipped)</div>
+                </div>
+                <div class="result-stat">
+                    <div class="result-stat-value">${stats.elapsed_seconds?.toFixed(1) || 0}s</div>
+                    <div class="result-stat-label">Processing time</div>
+                </div>
+            `;
+        }
+        function showError(message) {
+            const resultDiv = document.getElementById('upload-result');
+            const resultTitle = document.getElementById('result-title');
+            const resultStats = document.getElementById('result-stats');
+            resultDiv.className = 'upload-result error';
+            // Use textContent: the message may come from the server or the file name
+            resultTitle.textContent = `❌ Error: ${message}`;
+            resultStats.innerHTML = '';
+        }
+    </script>
+</body>
+</html>
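
The tip in `templates/settings.html` says that re-importing an export only adds new messages, and `showSuccess` reads the fields `total_in_file`, `new_messages`, `duplicates` and `elapsed_seconds` from the response. The real deduplication lives in the Python indexer, which is not shown in this part of the diff. The sketch below only illustrates how such counters could be produced, assuming each exported message has a unique `id`; `mergeExport` and the use of an in-memory `Set` are hypothetical.

```javascript
// Hypothetical sketch: merge an export into a set of already-indexed IDs.
// Assumes every message object carries a unique `id` field.
function mergeExport(existingIds, exportedMessages) {
    const started = Date.now();
    let newMessages = 0;
    let duplicates = 0;
    for (const message of exportedMessages) {
        if (existingIds.has(message.id)) {
            duplicates += 1;              // already indexed, skip it
        } else {
            existingIds.add(message.id);  // remember it so repeats are caught
            newMessages += 1;
        }
    }
    // Same field names that showSuccess() reads from result.stats
    return {
        total_in_file: exportedMessages.length,
        new_messages: newMessages,
        duplicates: duplicates,
        elapsed_seconds: (Date.now() - started) / 1000
    };
}

const known = new Set([1, 2, 3]);
const stats = mergeExport(known, [{ id: 2 }, { id: 3 }, { id: 4 }, { id: 4 }]);
console.log(stats.total_in_file, stats.new_messages, stats.duplicates);
```

Running the same export a second time against `known` would report every message as a duplicate, which is the behaviour the tip describes.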

templates/user_profile.html ADDED

	@@ -0,0 +1,721 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>User Profile - Telegram Analytics</title>
+    <link rel="stylesheet" href="/static/css/style.css">
+    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
+    <style>
+        /* Profile-specific styles */
+        .profile-header {
+            display: flex;
+            align-items: center;
+            gap: 2rem;
+            margin-bottom: 2rem;
+            padding: 2rem;
+            background: var(--bg-card);
+            border-radius: var(--radius-lg);
+            border: 1px solid var(--border-color);
+        }
+        .profile-avatar {
+            width: 100px;
+            height: 100px;
+            border-radius: 50%;
+            background: var(--primary);
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            font-size: 2.5rem;
+            font-weight: 700;
+            flex-shrink: 0;
+        }
+        .profile-info { flex: 1; }
+        .profile-name {
+            font-size: 1.75rem;
+            font-weight: 700;
+            margin-bottom: 0.25rem;
+        }
+        .profile-meta {
+            color: var(--text-muted);
+            font-size: 0.875rem;
+            display: flex;
+            gap: 1rem;
+            flex-wrap: wrap;
+            margin-top: 0.5rem;
+        }
+        .profile-meta span {
+            display: inline-flex;
+            align-items: center;
+            gap: 0.25rem;
+        }
+        .badge {
+            display: inline-block;
+            padding: 0.15rem 0.5rem;
+            border-radius: 4px;
+            font-size: 0.75rem;
+            font-weight: 600;
+        }
+        .badge-creator { background: #ffd700; color: #1a1a2e; }
+        .badge-admin { background: #28a745; color: white; }
+        .badge-bot { background: #6c757d; color: white; }
+        .badge-premium { background: #9b59b6; color: white; }
+        .badge-online { background: #28a745; color: white; }
+        .badge-recently { background: #17a2b8; color: white; }
+        .badge-offline { background: var(--border-color); color: var(--text-muted); }
+        .profile-stats {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(140px, 1fr));
+            gap: 1rem;
+            margin-bottom: 2rem;
+        }
+        .profile-stat-card {
+            background: var(--bg-card);
+            border: 1px solid var(--border-color);
+            border-radius: var(--radius-md);
+            padding: 1rem;
+            text-align: center;
+        }
+        .profile-stat-value {
+            font-size: 1.5rem;
+            font-weight: 700;
+            color: var(--primary);
+        }
+        .profile-stat-label {
+            font-size: 0.75rem;
+            color: var(--text-muted);
+            margin-top: 0.25rem;
+        }
+        .profile-grid {
+            display: grid;
+            grid-template-columns: repeat(2, 1fr);
+            gap: 1.5rem;
+            margin-bottom: 1.5rem;
+        }
+        .profile-card {
+            background: var(--bg-card);
+            border: 1px solid var(--border-color);
+            border-radius: var(--radius-lg);
+            padding: 1.5rem;
+        }
+        .profile-card h3 {
+            font-size: 1rem;
+            margin-bottom: 1rem;
+            color: var(--text-primary);
+            display: flex;
+            align-items: center;
+            gap: 0.5rem;
+        }
+        .profile-card.full-width {
+            grid-column: span 2;
+        }
+        .reply-network-list {
+            list-style: none;
+        }
+        .reply-network-item {
+            display: flex;
+            justify-content: space-between;
+            align-items: center;
+            padding: 0.5rem 0;
+            border-bottom: 1px solid var(--border-color);
+        }
+        .reply-network-item:last-child {
+            border-bottom: none;
+        }
+        .reply-network-name {
+            display: flex;
+            align-items: center;
+            gap: 0.5rem;
+        }
+        .reply-network-name a {
+            color: var(--primary);
+            text-decoration: none;
+        }
+        .reply-network-name a:hover {
+            text-decoration: underline;
+        }
+        .reply-network-count {
+            font-weight: 600;
+            color: var(--text-secondary);
+        }
+        .reply-bar {
+            height: 4px;
+            background: var(--border-color);
+            border-radius: 2px;
+            margin-top: 4px;
+        }
+        .reply-bar-fill {
+            height: 100%;
+            background: var(--primary);
+            border-radius: 2px;
+        }
+        .links-list {
+            list-style: none;
+        }
+        .links-list li {
+            padding: 0.5rem 0;
+            border-bottom: 1px solid var(--border-color);
+            display: flex;
+            justify-content: space-between;
+            align-items: center;
+        }
+        .links-list li:last-child { border-bottom: none; }
+        .links-list a {
+            color: var(--primary);
+            text-decoration: none;
+            word-break: break-all;
+            font-size: 0.875rem;
+        }
+        .links-list a:hover { text-decoration: underline; }
+        .links-list .count {
+            font-weight: 600;
+            color: var(--text-muted);
+            flex-shrink: 0;
+            margin-left: 1rem;
+        }
+        .no-messages {
+            text-align: center;
+            padding: 3rem;
+            background: var(--bg-card);
+            border-radius: var(--radius-lg);
+            border: 1px solid var(--border-color);
+        }
+        .no-messages h2 {
+            margin-bottom: 0.5rem;
+            color: var(--text-muted);
+        }
+        .forward-source {
+            display: flex;
+            justify-content: space-between;
+            align-items: center;
+            padding: 0.5rem 0;
+            border-bottom: 1px solid var(--border-color);
+        }
+        .forward-source:last-child { border-bottom: none; }
+        .time-info {
+            font-size: 0.875rem;
+            color: var(--text-secondary);
+            padding: 0.5rem 0;
+            display: flex;
+            justify-content: space-between;
+        }
+        @media (max-width: 992px) {
+            .profile-grid {
+                grid-template-columns: 1fr;
+            }
+            .profile-card.full-width {
+                grid-column: span 1;
+            }
+            .profile-header {
+                flex-direction: column;
+                text-align: center;
+            }
+            .profile-meta {
+                justify-content: center;
+            }
+        }
+    </style>
+</head>
+<body>
+    <!-- Sidebar -->
+    <nav class="sidebar">
+        <div class="logo">
+            <span class="logo-icon">📊</span>
+            <span class="logo-text">TG Analytics</span>
+        </div>
+        <ul class="nav-menu">
+            <li class="nav-item">
+                <a href="/" class="nav-link">
+                    <span class="icon">📈</span>
+                    <span>Overview</span>
+                </a>
+            </li>
+            <li class="nav-item active">
+                <a href="/users" class="nav-link">
+                    <span class="icon">👥</span>
+                    <span>Users</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/chat" class="nav-link">
+                    <span class="icon">💬</span>
+                    <span>Chat</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/search" class="nav-link">
+                    <span class="icon">🔍</span>
+                    <span>Search</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/moderation" class="nav-link">
+                    <span class="icon">🛡️</span>
+                    <span>Moderation</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/settings" class="nav-link">
+                    <span class="icon">⚙️</span>
+                    <span>Settings</span>
+                </a>
+            </li>
+        </ul>
+    </nav>
+    <!-- Main Content -->
+    <main class="main-content">
+        <header class="header">
+            <h1><a href="/users" style="color: var(--text-muted); text-decoration: none;">&larr; Users</a></h1>
+        </header>
+        <div id="profile-content">
+            <div class="loading"><div class="spinner"></div></div>
+        </div>
+    </main>
+    <script>
+        const USER_ID = '{{ user_id }}';
+        const COLORS = ['#e17076','#7bc862','#e5ca77','#65aadd','#a695e7','#ee7aae','#6ec9cb','#faa774'];
+        function getAvatarColor(name) {
+            let hash = 0;
+            for (let i = 0; i < name.length; i++) hash = name.charCodeAt(i) + ((hash << 5) - hash);
+            return COLORS[Math.abs(hash) % COLORS.length];
+        }
+        function formatNumber(num) {
+            if (num === null || num === undefined) return '-';
+            if (num >= 1000000) return (num / 1000000).toFixed(1) + 'M';
+            if (num >= 1000) return (num / 1000).toFixed(1) + 'K';
+            return num.toLocaleString();
+        }
+        function formatDate(ts) {
+            if (!ts) return '-';
+            const d = new Date(ts * 1000);
+            return d.toLocaleDateString('he-IL', { year: 'numeric', month: 'short', day: 'numeric' });
+        }
+        function formatDuration(seconds) {
+            if (!seconds) return '-';
+            if (seconds < 60) return Math.round(seconds) + 's';
+            if (seconds < 3600) return Math.round(seconds / 60) + 'm';
+            return (seconds / 3600).toFixed(1) + 'h';
+        }
+        function escapeHtml(text) {
+            const div = document.createElement('div');
+            div.textContent = text;
+            return div.innerHTML;
+        }
+        document.addEventListener('DOMContentLoaded', loadProfile);
+        async function loadProfile() {
+            const container = document.getElementById('profile-content');
+            try {
+                const resp = await fetch(`/api/user/${encodeURIComponent(USER_ID)}/profile`);
+                const data = await resp.json();
+                if (data.error) {
+                    container.innerHTML = `<div class="empty-state"><h2>User not found</h2><p>${escapeHtml(data.error)}</p></div>`;
+                    return;
+                }
+                if (!data.has_messages && data.participant) {
+                    renderInactiveProfile(container, data);
+                    return;
+                }
+                renderFullProfile(container, data);
+            } catch (err) {
+                container.innerHTML = `<div class="empty-state">Error loading profile: ${escapeHtml(err.message)}</div>`;
+            }
+        }
+        function renderInactiveProfile(container, data) {
+            const p = data.participant;
+            const name = data.name || 'Unknown';
+            const color = getAvatarColor(name);
+            const initial = name.charAt(0).toUpperCase();
+            let badges = '';
+            if (p.is_creator) badges += ' <span class="badge badge-creator">Creator</span>';
+            if (p.is_admin && !p.is_creator) badges += ' <span class="badge badge-admin">Admin</span>';
+            if (p.is_bot) badges += ' <span class="badge badge-bot">Bot</span>';
+            if (p.is_premium) badges += ' <span class="badge badge-premium">Premium</span>';
+            container.innerHTML = `
+                <div class="profile-header">
+                    <div class="profile-avatar" style="background: ${color}">${initial}</div>
+                    <div class="profile-info">
+                        <div class="profile-name">${escapeHtml(name)}${badges}</div>
+                        ${p.username ? `<div style="color: var(--primary);">@${escapeHtml(p.username)}</div>` : ''}
+                        <div class="profile-meta">
+                            ${p.join_date ? `<span>Joined: ${formatDate(p.join_date)}</span>` : ''}
+                            <span>Status: <span class="badge badge-${p.last_status === 'online' ? 'online' : p.last_status === 'recently' ? 'recently' : 'offline'}">${p.last_status}</span></span>
+                        </div>
+                    </div>
+                </div>
+                <div class="no-messages">
+                    <h2>No Messages</h2>
+                    <p style="color: var(--text-muted);">This participant hasn't sent any messages in the group.</p>
+                </div>
+            `;
+        }
+        function renderFullProfile(container, data) {
+            const name = data.name || 'Unknown';
+            const color = getAvatarColor(name);
+            const initial = name.charAt(0).toUpperCase();
+            const p = data.participant;
+            // Badges
+            let badges = '';
+            if (p) {
+                if (p.is_creator) badges += ' <span class="badge badge-creator">Creator</span>';
+                if (p.is_admin && !p.is_creator) badges += ' <span class="badge badge-admin">Admin</span>';
+                if (p.is_bot) badges += ' <span class="badge badge-bot">Bot</span>';
+                if (p.is_premium) badges += ' <span class="badge badge-premium">Premium</span>';
+            }
+            // Header
+            let html = `
+                <div class="profile-header">
+                    <div class="profile-avatar" style="background: ${color}">${initial}</div>
+                    <div class="profile-info">
+                        <div class="profile-name">${escapeHtml(name)}${badges}</div>
+                        ${p && p.username ? `<div style="color: var(--primary);">@${escapeHtml(p.username)}</div>` : ''}
+                        <div class="profile-meta">
+                            <span>#${data.rank} of ${data.total_active_users}</span>
+                            <span>ID: ${data.user_id}</span>
+                            ${p && p.join_date ? `<span>Joined: ${formatDate(p.join_date)}</span>` : ''}
+                            ${p ? `<span>Status: <span class="badge badge-${p.last_status === 'online' ? 'online' : p.last_status === 'recently' ? 'recently' : 'offline'}">${p.last_status}</span></span>` : ''}
+                        </div>
+                    </div>
+                </div>
+            `;
+            // Stats grid
+            html += `
+                <div class="profile-stats">
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatNumber(data.total_messages)}</div>
+                        <div class="profile-stat-label">Messages</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatNumber(data.total_characters)}</div>
+                        <div class="profile-stat-label">Characters</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${data.avg_message_length}</div>
+                        <div class="profile-stat-label">Avg Length</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${data.active_days}</div>
+                        <div class="profile-stat-label">Active Days</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${data.daily_average}</div>
+                        <div class="profile-stat-label">Daily Avg</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatNumber(data.total_replies_sent)}</div>
+                        <div class="profile-stat-label">Replies Sent</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatNumber(data.total_replies_received)}</div>
+                        <div class="profile-stat-label">Replies Received</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${data.reply_ratio}%</div>
+                        <div class="profile-stat-label">Reply Rate</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatDuration(data.avg_reply_time_seconds)}</div>
+                        <div class="profile-stat-label">Avg Reply Time</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatNumber(data.links_shared)}</div>
+                        <div class="profile-stat-label">Links</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatNumber(data.media_sent)}</div>
+                        <div class="profile-stat-label">Media</div>
+                    </div>
+                    <div class="profile-stat-card">
+                        <div class="profile-stat-value">${formatNumber(data.forwards_sent)}</div>
+                        <div class="profile-stat-label">Forwards</div>
+                    </div>
+                </div>
+            `;
+            // Time info
+            html += `
+                <div class="profile-card full-width" style="margin-bottom: 1.5rem;">
+                    <h3>Timeline</h3>
+                    <div class="time-info">
+                        <span>First message: ${formatDate(data.first_message)}</span>
+                        <span>Last message: ${formatDate(data.last_message)}</span>
+                    </div>
+                    <div class="time-info">
+                        <span>Edits: ${formatNumber(data.edits)}</span>
+                        <span>Mentions: ${formatNumber(data.mentions_made)}</span>
+                    </div>
+                </div>
+            `;
+            // Charts + Reply network
+            html += `<div class="profile-grid">`;
+            // Hourly chart
+            html += `
+                <div class="profile-card">
+                    <h3>Activity by Hour</h3>
+                    <div style="height: 200px;"><canvas id="hourly-chart"></canvas></div>
+                </div>
+            `;
+            // Weekday chart
+            html += `
+                <div class="profile-card">
+                    <h3>Activity by Day of Week</h3>
+                    <div style="height: 200px;"><canvas id="weekday-chart"></canvas></div>
+                </div>
+            `;
+            // Monthly trend
+            html += `
+                <div class="profile-card full-width">
+                    <h3>Monthly Trend</h3>
+                    <div style="height: 200px;"><canvas id="monthly-chart"></canvas></div>
+                </div>
+            `;
+            // Daily activity (last 90 days)
+            html += `
+                <div class="profile-card full-width">
+                    <h3>Daily Activity (Last 90 Days)</h3>
+                    <div style="height: 200px;"><canvas id="daily-chart"></canvas></div>
+                </div>
+            `;
+            // Replies to (top 10)
+            const maxReplyTo = data.replies_to.length > 0 ? data.replies_to[0].count : 1;
+            html += `
+                <div class="profile-card">
+                    <h3>Most Replies To</h3>
+                    ${data.replies_to.length === 0 ? '<p style="color: var(--text-muted);">No reply data</p>' : ''}
+                    <ul class="reply-network-list">
+                        ${data.replies_to.map(r => `
+                            <li class="reply-network-item">
+                                <div class="reply-network-name">
+                                    <a href="/user/${r.user_id}">${escapeHtml(r.name)}</a>
+                                </div>
+                                <span class="reply-network-count">${r.count}</span>
+                            </li>
+                            <div class="reply-bar"><div class="reply-bar-fill" style="width: ${(r.count / maxReplyTo * 100).toFixed(1)}%"></div></div>
+                        `).join('')}
+                    </ul>
+                </div>
+            `;
+            // Replies from (top 10)
+            const maxReplyFrom = data.replies_from.length > 0 ? data.replies_from[0].count : 1;
+            html += `
+                <div class="profile-card">
+                    <h3>Most Replies From</h3>
+                    ${data.replies_from.length === 0 ? '<p style="color: var(--text-muted);">No reply data</p>' : ''}
+                    <ul class="reply-network-list">
+                        ${data.replies_from.map(r => `
+                            <li class="reply-network-item">
+                                <div class="reply-network-name">
+                                    <a href="/user/${r.user_id}">${escapeHtml(r.name)}</a>
+                                </div>
+                                <span class="reply-network-count">${r.count}</span>
+                            </li>
+                            <div class="reply-bar"><div class="reply-bar-fill" style="width: ${(r.count / maxReplyFrom * 100).toFixed(1)}%; background: #28a745;"></div></div>
+                        `).join('')}
+                    </ul>
+                </div>
+            `;
+            // Top forward sources
+            if (data.top_forward_sources && data.top_forward_sources.length > 0) {
+                html += `
+                    <div class="profile-card">
+                        <h3>Top Forward Sources</h3>
+                        ${data.top_forward_sources.map(f => `
+                            <div class="forward-source">
+                                <span>${escapeHtml(f.name)}</span>
+                                <span class="reply-network-count">${f.count}</span>
+                            </div>
+                        `).join('')}
+                    </div>
+                `;
+            }
+            // Top links
+            if (data.top_links && data.top_links.length > 0) {
+                html += `
+                    <div class="profile-card">
+                        <h3>Top Links Shared</h3>
+                        <ul class="links-list">
+                            ${data.top_links.map(l => `
+                                <li>
+                                    <a href="${escapeHtml(l.url)}" target="_blank" rel="noopener">${escapeHtml(l.url.length > 50 ? l.url.substring(0, 50) + '...' : l.url)}</a>
+                                    <span class="count">${l.count}x</span>
+                                </li>
+                            `).join('')}
+                        </ul>
+                    </div>
+                `;
+            }
+            html += `</div>`; // close profile-grid
+            container.innerHTML = html;
+            // Render charts
+            renderHourlyChart(data.hourly_activity);
+            renderWeekdayChart(data.weekday_activity);
+            renderMonthlyChart(data.monthly_activity);
+            renderDailyChart(data.daily_activity);
+        }
+        function chartDefaults() {
+            return {
+                responsive: true,
+                maintainAspectRatio: false,
+                plugins: { legend: { display: false } },
+                scales: {
+                    y: {
+                        beginAtZero: true,
+                        grid: { color: 'rgba(255,255,255,0.05)' },
+                        ticks: { color: '#718096' }
+                    },
+                    x: {
+                        grid: { display: false },
+                        ticks: { color: '#718096', maxRotation: 0, autoSkip: true, maxTicksLimit: 12 }
+                    }
+                }
+            };
+        }
+        function renderHourlyChart(hourly) {
+            const ctx = document.getElementById('hourly-chart');
+            if (!ctx) return;
+            new Chart(ctx.getContext('2d'), {
+                type: 'bar',
+                data: {
+                    labels: Array.from({length: 24}, (_, i) => `${i}:00`),
+                    datasets: [{
+                        data: hourly,
+                        backgroundColor: 'rgba(0, 136, 204, 0.6)',
+                        borderColor: 'rgba(0, 136, 204, 1)',
+                        borderWidth: 1
+                    }]
+                },
+                options: chartDefaults()
+            });
+        }
+        function renderWeekdayChart(weekday) {
+            const ctx = document.getElementById('weekday-chart');
+            if (!ctx) return;
+            new Chart(ctx.getContext('2d'), {
+                type: 'bar',
+                data: {
+                    labels: weekday.map(w => w.day.substring(0, 3)),
+                    datasets: [{
+                        data: weekday.map(w => w.count),
+                        // Highlight weekend bars; assumes the API returns days Monday-first (indices 5 and 6)
+                        backgroundColor: weekday.map((w, i) => i === 5 || i === 6
+                            ? 'rgba(40, 167, 69, 0.6)'
+                            : 'rgba(0, 136, 204, 0.6)'),
+                        borderWidth: 1
+                    }]
+                },
+                options: chartDefaults()
+            });
+        }
+        function renderMonthlyChart(monthly) {
+            const ctx = document.getElementById('monthly-chart');
+            if (!ctx) return;
+            new Chart(ctx.getContext('2d'), {
+                type: 'line',
+                data: {
+                    labels: monthly.map(m => m.month),
+                    datasets: [{
+                        data: monthly.map(m => m.count),
+                        borderColor: '#0088cc',
+                        backgroundColor: 'rgba(0, 136, 204, 0.1)',
+                        fill: true,
+                        tension: 0.3,
+                        pointRadius: 3,
+                        pointHoverRadius: 6
+                    }]
+                },
+                options: chartDefaults()
+            });
+        }
+        function renderDailyChart(daily) {
+            const ctx = document.getElementById('daily-chart');
+            if (!ctx) return;
+            // Reverse to chronological order
+            const sorted = [...daily].reverse();
+            new Chart(ctx.getContext('2d'), {
+                type: 'bar',
+                data: {
+                    labels: sorted.map(d => d.date.substring(5)),  // MM-DD
+                    datasets: [{
+                        data: sorted.map(d => d.count),
+                        backgroundColor: 'rgba(0, 136, 204, 0.4)',
+                        borderColor: 'rgba(0, 136, 204, 0.8)',
+                        borderWidth: 1
+                    }]
+                },
+                options: chartDefaults()
+            });
+        }
+    </script>
+</body>
+</html>
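
The "Top Links Shared" list in the template above passes each URL through `escapeHtml` before placing it in `href`. That prevents markup injection, but it does not stop a `javascript:` URL from being clickable. A minimal sketch of one way to restrict links to http(s), assuming a hypothetical helper named `safeHref` (not part of this commit) and the standard `URL` constructor:

```javascript
// Hypothetical helper (not in the commit): return the normalized URL only
// when it parses as an absolute http(s) URL; otherwise return '#'.
function safeHref(rawUrl) {
    try {
        const parsed = new URL(rawUrl);
        const allowed = parsed.protocol === 'http:' || parsed.protocol === 'https:';
        return allowed ? parsed.href : '#';
    } catch (_err) {
        // new URL() throws a TypeError for strings that are not absolute URLs
        return '#';
    }
}
```

In the template this could be used as `href="${escapeHtml(safeHref(l.url))}"`, keeping the existing escaping in place.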

templates/users.html ADDED

	@@ -0,0 +1,344 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Users - Telegram Analytics</title>
+    <link rel="stylesheet" href="/static/css/style.css">
+    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
+</head>
+<body>
+    <!-- Sidebar -->
+    <nav class="sidebar">
+        <div class="logo">
+            <span class="logo-icon">📊</span>
+            <span class="logo-text">TG Analytics</span>
+        </div>
+        <ul class="nav-menu">
+            <li class="nav-item">
+                <a href="/" class="nav-link">
+                    <span class="icon">📈</span>
+                    <span>Overview</span>
+                </a>
+            </li>
+            <li class="nav-item active">
+                <a href="/users" class="nav-link">
+                    <span class="icon">👥</span>
+                    <span>Users</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/chat" class="nav-link">
+                    <span class="icon">💬</span>
+                    <span>Chat</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/search" class="nav-link">
+                    <span class="icon">🔍</span>
+                    <span>Search</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/moderation" class="nav-link">
+                    <span class="icon">🛡️</span>
+                    <span>Moderation</span>
+                </a>
+            </li>
+            <li class="nav-item">
+                <a href="/settings" class="nav-link">
+                    <span class="icon">⚙️</span>
+                    <span>Settings</span>
+                </a>
+            </li>
+        </ul>
+        <div class="sidebar-footer">
+            <div class="export-buttons">
+                <button onclick="exportUsers()" class="btn btn-sm">📥 Export Users</button>
+            </div>
+        </div>
+    </nav>
+    <!-- Main Content -->
+    <main class="main-content">
+        <!-- Header -->
+        <header class="header">
+            <h1>User Leaderboard</h1>
+            <div class="header-controls">
+                <select id="timeframe" class="select" onchange="loadUsers()">
+                    <option value="today">Today</option>
+                    <option value="yesterday">Yesterday</option>
+                    <option value="week">This Week</option>
+                    <option value="month" selected>This Month</option>
+                    <option value="year">This Year</option>
+                    <option value="all">All Time</option>
+                </select>
+                <button onclick="loadUsers()" class="btn btn-primary">🔄 Refresh</button>
+            </div>
+        </header>
+        <!-- User Stats Summary -->
+        <section class="stats-grid">
+            <div class="stat-card">
+                <div class="stat-icon">👥</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="total-users">-</div>
+                    <div class="stat-label">Total Members</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">💬</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="total-active">-</div>
+                    <div class="stat-label">Active Users</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">🏆</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="top-user">-</div>
+                    <div class="stat-label">Top User</div>
+                </div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-icon">📊</div>
+                <div class="stat-content">
+                    <div class="stat-value" id="avg-messages">-</div>
+                    <div class="stat-label">Avg Messages/User</div>
+                </div>
+            </div>
+        </section>
+        <!-- Users Table -->
+        <section class="chart-card full-width">
+            <div class="chart-header">
+                <h3>All Users</h3>
+                <div style="display: flex; gap: 1rem; align-items: center;">
+                    <input type="search" id="user-search" placeholder="Search users..."
+                           style="width: 200px;" onkeyup="filterUsers()">
+                    <span id="showing-count" style="color: var(--text-muted); font-size: 0.875rem;"></span>
+                </div>
+            </div>
+            <div style="overflow-x: auto;">
+                <table class="users-table">
+                    <thead>
+                        <tr>
+                            <th style="width: 60px;">Rank</th>
+                            <th>User</th>
+                            <th style="width: 80px;">Role</th>
+                            <th style="width: 120px;">Messages</th>
+                            <th style="width: 100px;">Share</th>
+                            <th style="width: 100px;">Links</th>
+                            <th style="width: 100px;">Media</th>
+                            <th style="width: 100px;">Active Days</th>
+                            <th style="width: 100px;">Daily Avg</th>
+                        </tr>
+                    </thead>
+                    <tbody id="users-table-body">
+                        <tr>
+                            <td colspan="9" class="loading">
+                                <div class="spinner"></div>
+                            </td>
+                        </tr>
+                    </tbody>
+                </table>
+            </div>
+            <!-- Pagination -->
+            <div class="pagination" id="pagination"></div>
+        </section>
+    </main>
+    <script src="/static/js/dashboard.js"></script>
+    <script>
+        // State
+        let allUsers = [];
+        let currentPage = 1;
+        const pageSize = 20;
+        // Initialize
+        document.addEventListener('DOMContentLoaded', () => {
+            loadUsers();
+        });
+        async function loadUsers() {
+            const timeframe = document.getElementById('timeframe').value;
+            const tbody = document.getElementById('users-table-body');
+            tbody.innerHTML = '<tr><td colspan="9" class="loading"><div class="spinner"></div></td></tr>';
+            try {
+                const response = await fetch(`/api/users?timeframe=${timeframe}&limit=500&include_inactive=1`);
+                const data = await response.json();
+                allUsers = data.users;
+                // Update summary stats
+                document.getElementById('total-users').textContent = formatNumber(data.total);
+                document.getElementById('total-active').textContent = formatNumber(data.total_active);
+                // Reset first so values from a previous timeframe do not linger
+                document.getElementById('top-user').textContent = '-';
+                document.getElementById('avg-messages').textContent = '-';
+                const activeUsers = allUsers.filter(u => u.messages > 0);
+                if (activeUsers.length > 0) {
+                    document.getElementById('top-user').textContent = activeUsers[0].name;
+                    const totalMessages = activeUsers.reduce((sum, u) => sum + u.messages, 0);
+                    document.getElementById('avg-messages').textContent =
+                        formatNumber(Math.round(totalMessages / activeUsers.length));
+                }
+                currentPage = 1;
+                renderUsers();
+            } catch (error) {
+                tbody.innerHTML = '<tr><td colspan="9" class="empty-state">Error loading users</td></tr>';
+            }
+        }
+        function filterUsers() {
+            currentPage = 1;
+            renderUsers();
+        }
+        function renderUsers() {
+            const search = document.getElementById('user-search').value.toLowerCase();
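+            // Coerce to strings: user_id may arrive as a number, which has no toLowerCase()
+            const filtered = allUsers.filter(u =>
+                String(u.name || '').toLowerCase().includes(search) ||
+                String(u.user_id).toLowerCase().includes(search)
+            );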
+            const start = (currentPage - 1) * pageSize;
+            const end = start + pageSize;
+            const pageUsers = filtered.slice(start, end);
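+            document.getElementById('showing-count').textContent = filtered.length === 0
+                ? 'Showing 0 of 0'
+                : `Showing ${start + 1}-${Math.min(end, filtered.length)} of ${filtered.length}`;
+            const tbody = document.getElementById('users-table-body');
+            if (pageUsers.length === 0) {
+                tbody.innerHTML = '<tr><td colspan="9" class="empty-state">No users found</td></tr>';
+                // Clear page buttons left over from a previous, non-empty result
+                document.getElementById('pagination').innerHTML = '';
+                return;
+            }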
+            tbody.innerHTML = pageUsers.map((user, i) => {
+                const rank = user.rank || '-';
+                const rankClass = rank === 1 ? 'gold' : rank === 2 ? 'silver' : rank === 3 ? 'bronze' : '';
+                const initial = user.name.charAt(0).toUpperCase();
+                const isInactive = user.messages === 0;
+                const rowStyle = isInactive ? 'opacity: 0.6;' : '';
+                let roleBadge = '';
+                if (user.role === 'creator') roleBadge = '<span style="background:#ffd700;color:#1a1a2e;padding:2px 6px;border-radius:4px;font-size:0.7rem;font-weight:600;">Creator</span>';
+                else if (user.role === 'admin') roleBadge = '<span style="background:#28a745;color:white;padding:2px 6px;border-radius:4px;font-size:0.7rem;font-weight:600;">Admin</span>';
+                else if (user.role === 'bot') roleBadge = '<span style="background:#6c757d;color:white;padding:2px 6px;border-radius:4px;font-size:0.7rem;font-weight:600;">Bot</span>';
+                const subtitle = user.username
+                    ? `@${escapeHtml(user.username)}`
+                    : `ID: ${user.user_id}`;
+                return `
+                    <tr onclick="window.location.href='/user/${user.user_id}'" style="cursor: pointer; ${rowStyle}">
+                        <td><span class="list-rank ${rankClass}">${rank !== '-' ? '#' + rank : '-'}</span></td>
+                        <td>
+                            <div class="user-cell">
+                                <div class="user-avatar">${initial}</div>
+                                <div>
+                                    <div class="list-name">${escapeHtml(user.name)}</div>
+                                    <div class="list-subtitle">${subtitle}</div>
+                                </div>
+                            </div>
+                        </td>
+                        <td>${roleBadge}</td>
+                        <td>
+                            ${isInactive ? '<span style="color: var(--text-muted);">-</span>' : `
+                            <strong>${formatNumber(user.messages)}</strong>
+                            <div class="progress-bar">
+                                <div class="progress-fill" style="width: ${user.percentage}%"></div>
+                            </div>`}
+                        </td>
+                        <td>${isInactive ? '-' : user.percentage + '%'}</td>
+                        <td>${isInactive ? '-' : formatNumber(user.links)}</td>
+                        <td>${isInactive ? '-' : formatNumber(user.media)}</td>
+                        <td>${isInactive ? '-' : user.active_days}</td>
+                        <td>${isInactive ? '-' : user.daily_average}</td>
+                    </tr>
+                `;
+            }).join('');
+            // Render pagination
+            const totalPages = Math.ceil(filtered.length / pageSize);
+            renderPagination(totalPages);
+        }
+        function renderPagination(totalPages) {
+            const pagination = document.getElementById('pagination');
+            if (totalPages <= 1) {
+                pagination.innerHTML = '';
+                return;
+            }
+            let html = '';
+            // Previous button
+            html += `<button class="page-btn" onclick="goToPage(${currentPage - 1})"
+                     ${currentPage === 1 ? 'disabled' : ''}>&laquo;</button>`;
+            // Page numbers
+            const maxVisible = 5;
+            let startPage = Math.max(1, currentPage - Math.floor(maxVisible / 2));
+            let endPage = Math.min(totalPages, startPage + maxVisible - 1);
+            if (endPage - startPage < maxVisible - 1) {
+                startPage = Math.max(1, endPage - maxVisible + 1);
+            }
+            if (startPage > 1) {
+                html += `<button class="page-btn" onclick="goToPage(1)">1</button>`;
+                if (startPage > 2) html += `<span style="padding: 0 0.5rem;">...</span>`;
+            }
+            for (let i = startPage; i <= endPage; i++) {
+                html += `<button class="page-btn ${i === currentPage ? 'active' : ''}"
+                         onclick="goToPage(${i})">${i}</button>`;
+            }
+            if (endPage < totalPages) {
+                if (endPage < totalPages - 1) html += `<span style="padding: 0 0.5rem;">...</span>`;
+                html += `<button class="page-btn" onclick="goToPage(${totalPages})">${totalPages}</button>`;
+            }
+            // Next button
+            html += `<button class="page-btn" onclick="goToPage(${currentPage + 1})"
+                     ${currentPage === totalPages ? 'disabled' : ''}>&raquo;</button>`;
+            pagination.innerHTML = html;
+        }
+        function goToPage(page) {
+            currentPage = page;
+            renderUsers();
+            window.scrollTo({ top: 0, behavior: 'smooth' });
+        }
+        function openUserProfile(userId) {
+            window.location.href = `/user/${userId}`;
+        }
+        // Export function
+        function exportUsers() {
+            const timeframe = document.getElementById('timeframe').value;
+            window.location.href = `/api/export/users?timeframe=${timeframe}`;
+        }
+        // Helper functions
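+        function formatNumber(num) {
+            // Guard against fields missing from the API response
+            if (num === null || num === undefined) return '-';
+            if (num >= 1000000) return (num / 1000000).toFixed(1) + 'M';
+            if (num >= 1000) return (num / 1000).toFixed(1) + 'K';
+            return num.toString();
+        }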
+        function escapeHtml(text) {
+            const div = document.createElement('div');
+            div.textContent = text;
+            return div.innerHTML;
+        }
+    </script>
+</body>
+</html>
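
The sliding page window computed inside `renderPagination` in `templates/users.html` is easier to check when pulled out of the DOM code. A minimal sketch, assuming a hypothetical pure function named `pageWindow` (not part of this commit) that reproduces the same start/end arithmetic and returns the page numbers to display:

```javascript
// Hypothetical extraction (not in the commit) of the window arithmetic used
// by renderPagination: returns the list of page numbers shown as buttons.
function pageWindow(currentPage, totalPages, maxVisible = 5) {
    // Center the window on the current page where possible
    let startPage = Math.max(1, currentPage - Math.floor(maxVisible / 2));
    const endPage = Math.min(totalPages, startPage + maxVisible - 1);
    // Near the last page, shift the window left so it stays full
    if (endPage - startPage < maxVisible - 1) {
        startPage = Math.max(1, endPage - maxVisible + 1);
    }
    const pages = [];
    for (let i = startPage; i <= endPage; i++) {
        pages.push(i);
    }
    return pages;
}
```

With ten pages, page 1 yields pages 1 through 5, page 5 yields 3 through 7, and page 10 yields 6 through 10. The leading and trailing "1 ... N" buttons are added separately by `renderPagination`.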