---
title: Telegram Analytics Dashboard
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
---

# Telegram JSON Indexer & Analyzer

A high-performance system for indexing, searching, and analyzing Telegram chat exports using SQLite FTS5 and advanced algorithms from Data Structures course. Includes a full-featured **Web Dashboard** with **AI-powered search**.

```
╔══════════════════════════════════════════════════════════════════════════════╗
║                         TELEGRAM CHAT ANALYZER                                ║
║                                                                               ║
║  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────────────────────┐    ║
║  │  JSON   │───▶│ INDEXER │───▶│ SQLite  │───▶│     WEB DASHBOARD       │    ║
║  │ Export  │    │ Bloom   │    │ + FTS5  │    │  ┌─────┬─────┬─────┐   │    ║
║  │         │    │ Filter  │    │         │    │  │Stats│Users│Chat │   │    ║
║  └─────────┘    └─────────┘    └─────────┘    │  ├─────┼─────┼─────┤   │    ║
║                                               │  │Search│ AI  │Mod  │   │    ║
║                                               │  └─────┴─────┴─────┘   │    ║
║                                               └─────────────────────────┘    ║
╚══════════════════════════════════════════════════════════════════════════════╝
```

## Features

### Core Features
- **Full-Text Search** - Fast search with Hebrew support using SQLite FTS5
- **Fuzzy Search** - Find messages even with typos using trigram similarity
- **Similar Message Detection** - LCS algorithm finds duplicates/reposts
- **Conversation Threads** - DFS/BFS traversal reconstructs reply chains
- **User Rankings** - O(log n) rank queries using AVL Rank Tree
- **Time Analytics** - Bucket Sort for efficient histograms
- **Top-K Queries** - Heap-based O(n log k) instead of O(n log n)
- **Percentiles** - O(n) median/percentiles using Selection algorithm

### Web Dashboard
- **Interactive Overview** - Charts, stats, activity graphs
- **User Leaderboard** - Rankings with detailed user profiles
- **Telegram-like Chat View** - Browse all messages like in Telegram
- **Advanced Search** - Full-text + fuzzy search with filters
- **AI-Powered Search** - Natural language queries (Hebrew/English)
- **Moderation Analytics** - Links, mentions, domains analysis
- **Database Updates** - Upload new JSON files via web UI

### AI Search (Free Providers)
- **Ollama** - Local LLM (recommended, 100% free)
- **Groq** - Free API tier available
- **Google Gemini** - Free API tier available

---

## Table of Contents

1. [Installation](#installation)
2. [Quick Start](#quick-start)
3. [Web Dashboard](#web-dashboard)
4. [AI Search](#ai-search)
5. [Database Updates](#database-updates)
6. [Architecture](#architecture)
7. [Usage Guide](#usage-guide)
8. [Algorithms](#algorithms)
9. [API Reference](#api-reference)
10. [Examples](#examples)

---

## Installation

### Requirements

- Python 3.10 or higher
- No external packages required for core functionality

### Setup

```bash
# Clone or download the project
cd telegram

# Verify Python version
python --version  # Should be 3.10+

# Test the system
python algorithms.py  # Should print "ALL TESTS PASSED!"
```

### Optional: Semantic Search

For AI-powered semantic similarity search:

```bash
pip install numpy faiss-cpu sentence-transformers
```

---

## Quick Start

### Step 1: Export from Telegram

1. Open Telegram Desktop
2. Go to any chat/group
3. Click ⋮ → Export Chat History
4. Select JSON format
5. Save as `result.json`

### Step 2: Index Your Data

```bash
python indexer.py result.json --db telegram.db
```

### Step 3: Launch Web Dashboard

```bash
# Start the dashboard (recommended)
python dashboard.py

# Open in browser: http://localhost:5000
```

### Step 4: Search & Analyze (CLI)

```bash
# Search messages
python search.py "שלום"

# View statistics
python analyzer.py --stats

# Find similar messages
python analyzer.py --similar
```

---

## Web Dashboard

The web dashboard provides a complete visual interface for analyzing your Telegram data.

### Starting the Dashboard

```bash
python dashboard.py
# Or with custom port:
python dashboard.py --port 8080
```

### Dashboard Pages

```
┌─────────────────────────────────────────────────────────────────────────┐
│                           WEB DASHBOARD                                  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  📈 Overview      │  Main statistics, charts, activity graphs           │
│                   │  - Total messages, users, links, media              │
│                   │  - Daily/hourly activity charts                     │
│                   │  - Top users leaderboard                            │
│                                                                          │
│  👥 Users         │  User leaderboard with detailed profiles            │
│                   │  - Ranking by message count                         │
│                   │  - User details modal (hourly activity)             │
│                   │  - Export users to CSV                              │
│                                                                          │
│  💬 Chat          │  Telegram-like message view                         │
│                   │  - Browse all messages chronologically              │
│                   │  - Filter by user, date, media type                 │
│                   │  - Click message to view full thread                │
│                   │  - AI search with natural language                  │
│                                                                          │
│  🔍 Search        │  Advanced search interface                          │
│                   │  - Full-text search (Hebrew supported)              │
│                   │  - AI-powered natural language search               │
│                   │  - Boolean operators (AND, OR, NOT)                 │
│                   │  - Export search results                            │
│                                                                          │
│  🛡️ Moderation    │  Content analytics                                  │
│                   │  - Top shared domains                               │
│                   │  - Most mentioned users                             │
│                   │  - Link sharers leaderboard                         │
│                   │  - Word frequency analysis                          │
│                                                                          │
│  ⚙️ Settings      │  Database management                                │
│                   │  - View database statistics                         │
│                   │  - Upload new JSON files                            │
│                   │  - Automatic duplicate detection                    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```

### Dashboard Features

- **Dark Theme** - Modern dark UI, easy on the eyes
- **RTL Support** - Full Hebrew/Arabic text support
- **Responsive** - Works on mobile and desktop
- **Real-time Charts** - Interactive Chart.js visualizations
- **Export** - Download data as CSV/JSON

---

## AI Search

Ask questions about your chat data in natural language (Hebrew or English).

### Setup AI Provider (Free Options)

#### Option 1: Ollama (Recommended - 100% Local & Free)

```bash
# Install Ollama (https://ollama.ai)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.2

# Start Ollama server
ollama serve
```

#### Option 2: Groq (Free API Tier)

```bash
# Get free API key from https://console.groq.com
export GROQ_API_KEY="your_api_key"
```

#### Option 3: Google Gemini (Free API Tier)

```bash
# Get free API key from https://makersuite.google.com/app/apikey
export GEMINI_API_KEY="your_api_key"
```

### AI Search Examples

```
┌─────────────────────────────────────────────────────────────────────────┐
│  🤖 AI Search - Natural Language Queries                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Query: "מי שלח הכי הרבה הודעות?"                                       │
│  Answer: המשתמש הפעיל ביותר הוא דני עם 5,432 הודעות                     │
│                                                                          │
│  Query: "מתי היו הכי הרבה הודעות?"                                      │
│  Answer: היום הפעיל ביותר היה 15.03.2024 עם 342 הודעות                  │
│                                                                          │
│  Query: "Who mentioned @admin the most?"                                 │
│  Answer: User "Mike" mentioned @admin 47 times                           │
│                                                                          │
│  Query: "הראה הודעות עם קישורים מהשבוע האחרון"                          │
│  Answer: נמצאו 23 הודעות עם קישורים...                                  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```

### AI Search API

```python
from ai_search import AISearchEngine

# Initialize with Ollama (local)
ai = AISearchEngine('telegram.db', provider='ollama')

# Or with Groq
ai = AISearchEngine('telegram.db', provider='groq', api_key='your_key')

# Search
result = ai.search("מי הכי פעיל בלילה?")
print(result['answer'])  # Natural language answer
print(result['sql'])     # Generated SQL query
print(result['results']) # Raw data
```

---

## Database Updates

Update your database with new JSON exports without losing existing data.

### Via Web UI

1. Go to **Settings** page in the dashboard
2. Drag & drop your new `result.json` file
3. Wait for processing (duplicate detection automatic)
4. See summary of new messages added

### Via CLI

```bash
# Update existing database with new JSON
python indexer.py new_export.json --db telegram.db --update

# What happens:
# 1. Loads existing message IDs into Bloom filter (O(n))
# 2. For each message in JSON:
#    - Check if exists using Bloom filter (O(1))
#    - Only insert if new
# 3. Re-index FTS if needed
# 4. Report: X new messages, Y duplicates skipped
```

### Incremental Update Process

```
┌─────────────────────────────────────────────────────────────────────────┐
│                    INCREMENTAL UPDATE PROCESS                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Existing DB                    New JSON                                 │
│  ┌─────────────┐               ┌─────────────┐                          │
│  │ msg_1 ✓     │               │ msg_1       │ → Skip (duplicate)       │
│  │ msg_2 ✓     │               │ msg_2       │ → Skip (duplicate)       │
│  │ msg_3 ✓     │               │ msg_5  NEW  │ → Insert                 │
│  │ msg_4 ✓     │               │ msg_6  NEW  │ → Insert                 │
│  └─────────────┘               └─────────────┘                          │
│         │                             │                                  │
│         │      Bloom Filter           │                                  │
│         │      ┌───────────┐          │                                  │
│         └─────▶│ O(1) test │◀─────────┘                                  │
│                └───────────┘                                             │
│                                                                          │
│  Result: Only msg_5 and msg_6 added (fast!)                             │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                         INPUT                                    │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Telegram JSON Export (result.json)                      │    │
│  │  ├── messages[]                                          │    │
│  │  │   ├── id, date, from, text                           │    │
│  │  │   ├── reply_to_message_id                            │    │
│  │  │   └── text_entities[] (links, mentions)              │    │
│  │  └── ...                                                 │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                      INDEXER (indexer.py)                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Batch     │  │   Bloom     │  │   Reply     │              │
│  │  Processing │  │   Filter    │  │   Graph     │              │
│  │  (1000/tx)  │  │ (Dedup O(1))│  │  Builder    │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                    SQLite DATABASE                               │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  messages          │  FTS5 Index      │  reply_graph    │    │
│  │  ├── id (PK)       │  ├── text_plain  │  ├── parent_id  │    │
│  │  ├── text_plain    │  └── from_name   │  └── child_id   │    │
│  │  ├── from_id       │                  │                 │    │
│  │  ├── date_unixtime │  entities        │  threads        │    │
│  │  └── ...           │  ├── links       │  └── messages   │    │
│  │                    │  └── mentions    │                 │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────┬───────────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   SEARCH    │  │  ANALYZER   │  │   VECTOR    │
│ (search.py) │  │(analyzer.py)│  │  (optional) │
│             │  │             │  │             │
│ • FTS5+BM25 │  │ • Top-K     │  │ • FAISS     │
│ • Fuzzy     │  │ • LCS       │  │ • Semantic  │
│ • Threads   │  │ • Rank Tree │  │ • Clustering│
│ • LRU Cache │  │ • Percentile│  │             │
└─────────────┘  └─────────────┘  └─────────────┘
```

### Data Flow

```
JSON Message                 Database Tables              Search/Analytics
───────────                  ───────────────              ────────────────

{                           ┌─────────────┐
  "id": 548795,       ───▶  │  messages   │  ───▶  Full-text search
  "text": "שלום",           └─────────────┘        User filtering
  "from": "User1",                                 Date range queries
  "from_id": "user123", ─▶  ┌─────────────┐
  "date_unixtime": ...,     │   users     │  ───▶  Top users (Heap)
                            └─────────────┘        User rank (Rank Tree)
  "text_entities": [
    {"type": "link", ────▶  ┌─────────────┐
     "text": "url"}         │  entities   │  ───▶  Link analysis
  ],                        └─────────────┘        Mention network

  "reply_to_message_id" ─▶  ┌─────────────┐
                            │ reply_graph │  ───▶  Thread DFS/BFS
}                           └─────────────┘        Conversation view
```

### File Structure

```
telegram/
│
├── dashboard.py        # 🌐 Web Dashboard (Flask)
│   └── Routes: /, /users, /chat, /search, /moderation, /settings
│   └── API: /api/overview, /api/users, /api/search, /api/update, etc.
│
├── ai_search.py        # 🤖 AI-Powered Search
│   └── AISearchEngine class
│       ├── Natural language to SQL
│       ├── Ollama/Groq/Gemini providers
│       └── Hebrew/English support
│
├── indexer.py          # JSON → SQLite indexer
│   ├── OptimizedIndexer class
│   │   ├── Batch processing (100x faster)
│   │   ├── Bloom filter (duplicate detection)
│   │   └── Graph builder (reply threads)
│   └── IncrementalIndexer class
│       ├── Update existing database
│       ├── Bloom filter duplicate check
│       └── Only insert new messages
│
├── search.py           # Search interface
│   └── TelegramSearch class
│       ├── FTS5 full-text search
│       ├── Fuzzy trigram search
│       ├── LRU query cache
│       └── DFS/BFS thread traversal
│
├── analyzer.py         # Analytics & statistics
│   └── TelegramAnalyzer class
│       ├── LCS similar messages
│       ├── Heap-based Top-K
│       ├── Selection percentiles
│       ├── Rank Tree queries
│       └── Bucket Sort histograms
│
├── data_structures.py  # Core data structures
│   ├── BloomFilter     # O(1) membership test
│   ├── Trie            # O(k) prefix search
│   ├── LRUCache        # O(1) caching
│   ├── ReplyGraph      # DFS/BFS traversal
│   └── TrigramIndex    # Fuzzy matching
│
├── algorithms.py       # Course algorithms
│   ├── LCS             # Similar message detection
│   ├── TopK (Heap)     # Efficient ranking
│   ├── Selection       # O(n) percentiles
│   ├── RankTree        # O(log n) rank queries
│   └── BucketSort      # Time histograms
│
├── templates/          # 🎨 HTML Templates
│   ├── index.html      # Overview dashboard
│   ├── users.html      # User leaderboard
│   ├── chat.html       # Telegram-like chat view
│   ├── search.html     # Search interface
│   ├── moderation.html # Content analytics
│   └── settings.html   # Settings & DB update
│
├── static/             # 📁 Static assets
│   ├── css/style.css   # Dashboard styles
│   └── js/dashboard.js # Dashboard scripts
│
├── vector_search.py    # Optional: Semantic search
│   └── VectorSearch class (requires FAISS)
│
├── schema.sql          # Database schema
└── telegram.db         # SQLite database (created)
```

---

## Usage Guide

### Web Dashboard (Recommended)

```bash
# Start the dashboard
python dashboard.py

# Custom port
python dashboard.py --port 8080

# Custom database
python dashboard.py --db my_chat.db
```

### Indexing

```bash
# Basic indexing
python indexer.py result.json

# Custom database name
python indexer.py result.json --db my_chat.db

# With trigram index (for fuzzy search)
python indexer.py result.json --build-trigrams

# Larger batch size (faster for big files)
python indexer.py result.json --batch-size 5000

# Update existing database with new JSON (incremental)
python indexer.py new_export.json --db telegram.db --update
```

### Searching

```bash
# Basic search (Hebrew supported)
python search.py "שלום"

# Search with filters
python search.py "מילה" --user user123456 --limit 50

# Date range
python search.py "חדשות" --from-date 2024-01-01 --to-date 2024-12-31

# Fuzzy search (finds typos)
python search.py "שלמ" --fuzzy --threshold 0.3

# View conversation thread
python search.py --thread 548795

# List all links
python search.py --list-links

# List all mentions
python search.py --list-mentions
```

### Analytics

```bash
# General statistics
python analyzer.py --stats

# Top users (Heap-based O(n log k))
python analyzer.py --top-users --limit 10

# Hourly activity
python analyzer.py --hourly

# Daily activity
python analyzer.py --daily

# Top words
python analyzer.py --words --limit 30

# Top domains
python analyzer.py --domains

# Find similar messages (LCS algorithm)
python analyzer.py --similar --threshold 0.7

# Find reposts
python analyzer.py --reposts

# Message length percentiles (Selection algorithm)
python analyzer.py --percentiles

# Response time percentiles
python analyzer.py --response-times

# User rank (Rank Tree O(log n))
python analyzer.py --user-rank user123456

# Get user at rank #5
python analyzer.py --rank 5

# Activity histogram (Bucket Sort)
python analyzer.py --histogram --bucket-size 86400

# Export as JSON
python analyzer.py --stats --json > stats.json
```

---

## Algorithms

### Algorithm Complexity Comparison

```
┌────────────────────┬─────────────────┬─────────────────┬─────────────┐
│     Operation      │  Naive Method   │  Our Algorithm  │ Improvement │
├────────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Top-K users        │ O(n log n) sort │ O(n log k) heap │   ~10x      │
│ Find median        │ O(n log n) sort │ O(n) selection  │   ~5x       │
│ User rank query    │ O(n) scan       │ O(log n) tree   │   ~100x     │
│ Duplicate check    │ O(n) lookup     │ O(1) bloom      │   ~1000x    │
│ Similar messages   │ O(n²m²) naive   │ O(n²m) LCS+DP   │   ~10x      │
│ Time histogram     │ O(n log n) sort │ O(n+k) bucket   │   ~5x       │
│ Thread traversal   │ O(n) repeated   │ O(V+E) DFS/BFS  │   ~10x      │
└────────────────────┴─────────────────┴─────────────────┴─────────────┘
```

### 1. LCS (Longest Common Subsequence)

**Purpose:** Find similar/duplicate messages

```
String 1: "שלום לכולם מה קורה"
String 2: "שלום לכולם מה נשמע"
                          ↓
LCS:      "שלום לכולם מה "
Similarity: 77.78%
```

**Algorithm:**
```
┌───┬───┬───┬───┬───┬───┐
│   │ ∅ │ A │ B │ C │ D │   DP Table
├───┼───┼───┼───┼───┼───┤
│ ∅ │ 0 │ 0 │ 0 │ 0 │ 0 │   dp[i][j] = length of LCS
│ A │ 0 │ 1 │ 1 │ 1 │ 1 │   for first i and j chars
│ C │ 0 │ 1 │ 1 │ 2 │ 2 │
│ B │ 0 │ 1 │ 2 │ 2 │ 2 │   Time:  O(m × n)
│ D │ 0 │ 1 │ 2 │ 2 │ 3 │   Space: O(min(m,n))
└───┴───┴───┴───┴───┴───┘
```

### 2. Heap-based Top-K

**Purpose:** Find top K items without sorting everything

```
Finding Top 3 from [5,2,8,1,9,3,7,4,6]

Min-Heap (size K=3):

Step 1: [5]           Add 5
Step 2: [2,5]         Add 2
Step 3: [2,5,8]       Add 8 (heap full)
Step 4: [2,5,8]       Skip 1 (< min)
Step 5: [5,9,8]       Replace 2 with 9
Step 6: [5,9,8]       Skip 3 (< min)
Step 7: [7,9,8]       Replace 5 with 7
...
Result: [7,8,9]       Top 3!

Time: O(n log k) vs O(n log n) for full sort
```

### 3. Selection Algorithm (Median of Medians)

**Purpose:** Find k-th element or percentiles in O(n)

```
Find median of [3,1,4,1,5,9,2,6,5,3,5]

┌─────────────────────────────────────────┐
│  Divide into groups of 5:               │
│  [3,1,4,1,5] [9,2,6,5,3] [5]           │
│       ↓           ↓        ↓            │
│  Medians: 3       5        5            │
│       ↓                                 │
│  Median of medians: 5 (pivot)           │
│       ↓                                 │
│  Partition around 5                     │
│  [3,1,4,1,2,3] [5,5,5] [9,6]           │
│       6 elements  3     2               │
│       ↓                                 │
│  Median is at position 5 → found!       │
└─────────────────────────────────────────┘

Time: O(n) guaranteed (not just average!)
```

### 4. Rank Tree (Order Statistics Tree)

**Purpose:** O(log n) rank queries

```
AVL Tree with size augmentation:

           ┌───────────────┐
           │  150 (size=5) │
           └───────┬───────┘
          ┌────────┴────────┐
    ┌─────┴─────┐     ┌─────┴─────┐
    │ 100 (s=2) │     │ 250 (s=2) │
    └─────┬─────┘     └─────┬─────┘
    ┌─────┴              ┌──┴
┌───┴───┐            ┌───┴───┐
│50 (1) │            │300 (1)│
└───────┘            └───────┘

select(3) → 150  (3rd smallest)
rank(150) → 3    (rank of 150)

Time: O(log n) for both operations
```

### 5. Bucket Sort (Time Histograms)

**Purpose:** O(n+k) time-based grouping

```
Messages with timestamps:
[1000, 1500, 2500, 1200, 3000]

Bucket size: 1000 seconds

┌─────────┬─────────┬─────────┬─────────┐
│ 0-1000  │1000-2000│2000-3000│3000-4000│
├─────────┼─────────┼─────────┼─────────┤
│         │ 1000    │  2500   │  3000   │
│         │ 1500    │         │         │
│         │ 1200    │         │         │
├─────────┼─────────┼─────────┼─────────┤
│ Count:0 │ Count:3 │ Count:1 │ Count:1 │
└─────────┴─────────┴─────────┴─────────┘

Time: O(n + k) where k = number of buckets
```

### 6. DFS/BFS Thread Traversal

**Purpose:** Reconstruct conversation threads

```
Reply Graph:

    [1] Original message
     │
     ├──[2] Reply to 1
     │   │
     │   ├──[4] Reply to 2
     │   │
     │   └──[5] Reply to 2
     │
     └──[3] Reply to 1

DFS order: [1, 2, 4, 5, 3]  (deep first)
BFS order: [1, 2, 3, 4, 5]  (level by level)

With depth info:
  [1] depth=0
    [2] depth=1
      [4] depth=2
      [5] depth=2
    [3] depth=1

Time: O(V + E)
```

---

## API Reference

### Dashboard REST API

The web dashboard exposes a REST API for all operations:

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         REST API ENDPOINTS                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  GET  /api/overview           Overview statistics                        │
│       ?timeframe=month        (today|yesterday|week|month|year|all)      │
│                                                                          │
│  GET  /api/users              User leaderboard                           │
│       ?timeframe=month        Timeframe filter                           │
│       &limit=100              Max users                                  │
│                                                                          │
│  GET  /api/user/<user_id>     User details                              │
│       ?timeframe=month        Includes hourly activity                   │
│                                                                          │
│  GET  /api/search             Full-text search                           │
│       ?q=search_term          Search query                               │
│       &timeframe=all          Timeframe filter                           │
│       &limit=20&offset=0      Pagination                                 │
│                                                                          │
│  POST /api/ai/search          AI-powered search                          │
│       {"query": "..."}        Natural language query                     │
│                                                                          │
│  GET  /api/chat/messages      Chat messages                              │
│       ?limit=50&offset=0      Pagination                                 │
│       &user_id=...            Filter by user                             │
│       &from_date=...          Date range                                 │
│                                                                          │
│  GET  /api/chat/thread/<id>   Get conversation thread                    │
│                               Returns full thread with DFS               │
│                                                                          │
│  GET  /api/top/domains        Top shared domains                         │
│  GET  /api/top/mentions       Top mentioned users                        │
│  GET  /api/top/words          Most frequent words                        │
│                                                                          │
│  POST /api/update             Update database with JSON                  │
│       (multipart form)        File upload                                │
│                                                                          │
│  GET  /api/db/stats           Database statistics                        │
│                               Size, counts, date range                   │
│                                                                          │
│  GET  /api/export/users       Export users as CSV                        │
│  GET  /api/export/messages    Export messages as CSV                     │
│                                                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                    ALGORITHM-POWERED ENDPOINTS                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  GET  /api/similar/<id>       Find similar messages (LCS algorithm)      │
│       ?threshold=0.7          Similarity threshold                       │
│       ?limit=10               Max results                                │
│       Complexity: O(n*m)      n=sample, m=avg length                     │
│                                                                          │
│  GET  /api/analytics/similar  Find all similar pairs in DB               │
│       ?threshold=0.8          Similarity threshold                       │
│       Algorithm: LCS          O(n² * m) with early termination           │
│                                                                          │
│  GET  /api/user/rank/<id>     Get user rank (RankTree)                   │
│       Complexity: O(log n)    vs O(n) SQL scan                           │
│                                                                          │
│  GET  /api/user/by-rank/<k>   Get k-th ranked user (RankTree)            │
│       Algorithm: select(k)    O(log n)                                   │
│                                                                          │
│  GET  /api/analytics/histogram Activity histogram (Bucket Sort)          │
│       ?bucket=86400           Bucket size in seconds                     │
│       Complexity: O(n + k)    k=number of buckets                        │
│                                                                          │
│  GET  /api/analytics/percentiles Message length stats (Selection)        │
│       Algorithm: Quickselect  O(n) guaranteed                            │
│       Returns: min,max,median,p25,p75,p90,p95,p99                        │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```

### TelegramSearch

```python
from search import TelegramSearch

with TelegramSearch('telegram.db') as search:
    # Full-text search
    results = search.search("שלום", limit=50)

    # With filters
    results = search.search(
        "מילה",
        user_id="user123",
        from_date=1704067200,  # Unix timestamp
        to_date=1735689600,
        has_links=True
    )

    # Fuzzy search
    results = search.fuzzy_search("שלמ", threshold=0.3)

    # Get thread (DFS)
    thread = search.get_thread_dfs(message_id=548795)

    # Get thread with depth
    thread = search.get_thread_with_depth(message_id=548795)
    # Returns: [(message_dict, depth), ...]

    # Autocomplete usernames
    suggestions = search.autocomplete_user("@user")
```

### TelegramAnalyzer

```python
from analyzer import TelegramAnalyzer

with TelegramAnalyzer('telegram.db') as analyzer:
    # Statistics
    stats = analyzer.get_stats()

    # Top users (Heap-based)
    top_users = analyzer.get_top_users(limit=10)

    # Similar messages (LCS)
    similar = analyzer.find_similar_messages(threshold=0.7)

    # Percentiles (Selection algorithm)
    percentiles = analyzer.get_message_length_stats()
    # Returns: {min, max, median, p25, p75, p90, p95, p99}

    # User rank (Rank Tree)
    rank_info = analyzer.get_user_rank("user123")
    # Returns: {rank, total_users, percentile}

    # Get user by rank
    user = analyzer.get_user_by_rank(5)

    # Histogram (Bucket Sort)
    hist = analyzer.get_activity_histogram(bucket_size=86400)
```

---

## Examples

### Example 1: Find Most Active Hours

```python
from analyzer import TelegramAnalyzer

with TelegramAnalyzer('telegram.db') as analyzer:
    hourly = analyzer.get_hourly_activity()

    # Find peak hour
    peak_hour = max(hourly, key=hourly.get)
    print(f"Most active hour: {peak_hour}:00 ({hourly[peak_hour]} messages)")
```

### Example 2: Detect Spam/Reposts

```python
from analyzer import TelegramAnalyzer

with TelegramAnalyzer('telegram.db') as analyzer:
    reposts = analyzer.find_reposts(threshold=0.9)

    for r in reposts[:10]:
        print(f"Similarity: {r['similarity']:.0%}")
        print(f"  User 1: {r['user_1']}")
        print(f"  User 2: {r['user_2']}")
        print(f"  Text: {r['text_preview'][:50]}...")
```

### Example 3: Conversation Thread Analysis

```python
from search import TelegramSearch

with TelegramSearch('telegram.db') as search:
    # Get full thread
    thread = search.get_thread_with_depth(548795)

    print("Conversation thread:")
    for msg, depth in thread:
        indent = "  " * depth
        print(f"{indent}[{msg['from_name']}]: {msg['text_plain'][:50]}")
```

### Example 4: User Ranking

```python
from analyzer import TelegramAnalyzer

with TelegramAnalyzer('telegram.db') as analyzer:
    # Get rank of specific user
    rank = analyzer.get_user_rank("user123456")
    print(f"Rank: #{rank['rank']} of {rank['total_users']}")
    print(f"Top {rank['percentile']:.1f}%")

    # Get top 3 users
    for i in range(1, 4):
        user = analyzer.get_user_by_rank(i)
        print(f"#{i}: {user['name']} ({user['count']} messages)")
```

---

## Performance

Tested on 100,000 messages:

| Operation | Time |
|-----------|------|
| Indexing | ~10 seconds |
| Full-text search | <10ms |
| Fuzzy search | ~100ms |
| Top-K (k=20) | ~50ms |
| User rank query | <1ms |
| Thread traversal | <5ms |
| Similar messages (1000 sample) | ~2 seconds |

---

## License

MIT License - Free for personal and commercial use.

---

## Contributing

1. Fork the repository
2. Create feature branch
3. Commit changes
4. Push and create PR

---

## Troubleshooting

### "Module not found" error
```bash
# Make sure you're in the telegram directory
cd /path/to/telegram
python indexer.py result.json
```

### "Database is locked" error
```bash
# Close any other programs using the database
# Or use a different database name
python indexer.py result.json --db telegram2.db
```

### Hebrew text not displaying correctly
```bash
# Ensure your terminal supports UTF-8
export LANG=en_US.UTF-8
```

---

## Credits

Algorithms implemented from "Data Structures and Introduction to Algorithms" course:
- LCS (Longest Common Subsequence)
- Heap-based Top-K
- Selection Algorithm (Median of Medians)
- Rank Tree (Order Statistics Tree)
- Bucket Sort
- DFS/BFS Graph Traversal
- Bloom Filter
- Trie (Prefix Tree)