Commit 76955d2 by fikri0o0 · verified · 1 Parent(s): a4d407b

Deploy: auto-ingest, hybrid RAG, streaming, UMAP viz, 16 LLMs

Files changed (9)
  1. .env.example +3 -0
  2. .gitignore +9 -0
  3. MODEL_COMPARISON.md +121 -0
  4. README.md +122 -5
  5. app.py +522 -0
  6. config.py +89 -0
  7. ingest.py +157 -0
  8. rag_chain.py +353 -0
  9. requirements.txt +17 -0
.env.example ADDED
@@ -0,0 +1,3 @@
1
+ GOOGLE_API_KEY=your_google_ai_studio_key_here
2
+ GROQ_API_KEY=your_groq_key_here
3
+ OPENROUTER_API_KEY=your_openrouter_key_here
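Reviewer note: the three variables above map one-to-one onto the providers used elsewhere in this commit (the README additionally lists `HF_TOKEN`, which is absent from `.env.example`). A minimal sketch of how a startup check could report which providers are usable; `configured_providers` and `PROVIDER_ENV_VARS` are hypothetical names for illustration and are not part of the commit.

```python
import os

# Provider name -> environment variable, as listed in .env.example.
PROVIDER_ENV_VARS = {
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key is set to a non-placeholder value.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    ready = []
    for provider, var in PROVIDER_ENV_VARS.items():
        value = env.get(var, "").strip()
        # The .env.example placeholders all start with "your_".
        if value and not value.startswith("your_"):
            ready.append(provider)
    return ready


if __name__ == "__main__":
    print("Configured providers:", configured_providers() or "none")
```

Treating the `your_..._here` placeholders as unset avoids sending a copied template value to a provider and getting an opaque authentication error back.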
.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ .env
2
+ __pycache__/
3
+ *.pyc
4
+ *.pyo
5
+ .DS_Store
6
+ data/
7
+ vectorstore/
8
+ *.log
9
+ .pytest_cache/
MODEL_COMPARISON.md ADDED
@@ -0,0 +1,121 @@
1
+ # LLM Model Comparison
2
+
3
+ Benchmarked on: *"What is Nietzsche's view on nihilism and the will to power?"*
4
+ Setup: RTX 3060, EmbeddingGemma-300M on CUDA, ChromaDB (~5,700 chunks), `RETRIEVAL_K=5`.
5
+
6
+ Rate limits for Google verified directly from **aistudio.google.com/rate-limit** (May 2026).
7
+ Rate limits for Groq verified from **live API response headers** + console.groq.com/docs/rate-limits.
8
+ Rate limits for OpenRouter verified from **openrouter.ai/docs/guides/routing/model-variants/free**.
9
+
10
+ ---
11
+
12
+ ## Full Comparison Table
13
+
14
+ | Model | Provider | Latency | RPM | TPM | RPD | Notes |
15
+ |---|---|---|---|---|---|---|
16
+ | **Gemma 4 MoE 26B** | Google | ~65 s | 15 | ∞ | **1,500** | Best limits of any Google model; slow but deep |
17
+ | **Gemma 4 Dense 31B** | Google | ~25 s | 15 | ∞ | **1,500** | Same limits as MoE; faster, slightly less depth |
18
+ | **Gemini 3.1 Flash Lite** | Google | ~0.6 s | 15 | 250K | **500** | Newest Gemini, highest RPD among Flash models |
19
+ | **Gemini 3.5 Flash** | Google | ~0.8 s | 5 | 250K | 20 | Latest Gemini series; crisp reasoning |
20
+ | **Gemini 3 Flash** | Google | ~9 s | 5 | 250K | 20 | Solid baseline Gemini 3 |
21
+ | **Gemini 2.5 Flash** | Google | ~7 s | 5 | 250K | 20 | Previous generation; well-rounded |
22
+ | **Gemini 2.5 Flash Lite** | Google | ~2 s | 10 | 250K | 20 | Fastest 2.5; same 20 RPD as 2.5 Flash |
23
+ | **Llama 3.1 8B** | Groq | ~2 s | **14,400** | 6K | **14,400** | Highest throughput by far; limited depth |
24
+ | **Llama 4 Scout 17B** | Groq | ~1.5 s | 1,000 | 30K | 1,000 | Fastest Groq model in this benchmark |
25
+ | **Llama 3.3 70B** | Groq | ~4.5 s | 1,000 | 12K | 1,000 | Best Groq quality; lower token quota |
26
+ | **Qwen3 32B** | Groq | ~5 s | 1,000 | 6K | 1,000 | Chain-of-thought; deepest Groq reasoning |
27
+ | **Nvidia Nemotron 120B** | OpenRouter | ~75 s | 20 | — | 50* | Exceptional philosophical depth; slow |
28
+ | **OpenAI OSS 120B** | OpenRouter | ~22 s | 20 | — | 50* | Strong quality; best free OR option |
29
+ | **DeepSeek V4 Flash** | OpenRouter | ~5 s† | 20 | — | 50* | 1M context window; fast when available |
30
+ | **Llama 3.3 70B** | OpenRouter | ~4 s† | 20 | — | 50* | Same weights as Groq; use Groq instead |
31
+ | **Qwen3 Next 80B** | OpenRouter | ~8 s† | 20 | — | 50* | Strong reasoning; frequently throttled |
32
+ | **Gemma 4 MoE 26B** | OpenRouter | ~5 s† | 20 | — | 50* | Same weights as Google version |
33
+
34
+ *50 RPD without account credits; 1,000 RPD with $10+ credit purchase
35
+ †Latency when not throttled; free-tier provider-side 429s are common during peak hours
36
+ — OpenRouter does not enforce token-based limits on free models
37
+
38
+ ---
39
+
40
+ ## Rate Limit Deep-Dive
41
+
42
+ ### Google AI Studio — verified from aistudio.google.com/rate-limit
43
+
44
+ | Model | API Model ID | RPM | TPM | RPD |
45
+ |---|---|---|---|---|
46
+ | Gemma 4 MoE 26B | `gemma-4-26b-a4b-it` | 15 | **Unlimited** | **1,500** |
47
+ | Gemma 4 Dense 31B | `gemma-4-31b-it` | 15 | **Unlimited** | **1,500** |
48
+ | Gemini 3.1 Flash Lite | `gemini-3.1-flash-lite` | 15 | 250,000 | **500** |
49
+ | Gemini 3.5 Flash | `gemini-3.5-flash` | 5 | 250,000 | 20 |
50
+ | Gemini 3 Flash | `gemini-3-flash-preview` | 5 | 250,000 | 20 |
51
+ | Gemini 2.5 Flash | `gemini-2.5-flash` | 5 | 250,000 | 20 |
52
+ | Gemini 2.5 Flash Lite | `gemini-2.5-flash-lite` | 10 | 250,000 | 20 |
53
+ | ~~Gemini 2.5 Pro~~ | ~~`gemini-2.5-pro`~~ | 0 | 0 | 0 |
54
+ | ~~Gemini 2.0 Flash~~ | ~~`gemini-2.0-flash`~~ | 0 | 0 | 0 |
55
+
56
+ > **Key insight:** Gemma 4 models have *significantly better* limits than Gemini models — unlimited TPM and 1,500 RPD vs just 20 RPD for most Gemini Flash variants. Gemini 2.5 Pro and 2.0 Flash are completely locked (0/0/0) on this account's free tier.
57
+
58
+ ---
59
+
60
+ ### Groq — verified from live API headers + docs
61
+
62
+ | Model | API Model ID | RPM | TPM | RPD | TPD |
63
+ |---|---|---|---|---|---|
64
+ | Llama 3.1 8B instant | `llama-3.1-8b-instant` | **14,400** | 6,000 | **14,400** | 500,000 |
65
+ | Llama 3.3 70B versatile | `llama-3.3-70b-versatile` | 1,000 | 12,000 | 1,000 | 100,000 |
66
+ | Llama 4 Scout 17B | `meta-llama/llama-4-scout-17b-16e-instruct` | 1,000 | 30,000 | 1,000 | 500,000 |
67
+ | Qwen3 32B | `qwen/qwen3-32b` | 1,000 | 6,000 | 1,000 | 500,000 |
68
+
69
+ > **Key insight:** By request quota, Groq is the most generous free tier for RAG use. Llama 3.1 8B allows 14,400 RPD, which suits high-volume scenarios. TPM limits (6K–30K) can become the bottleneck when the retrieved context is large; Llama 4 Scout has the highest TPM at 30K.
70
+
71
+ ---
72
+
73
+ ### OpenRouter — all `:free` models share identical limits
74
+
75
+ | Metric | Without credits | With $10+ credits |
76
+ |---|---|---|
77
+ | RPM | 20 | 20 |
78
+ | RPD | **50** | 1,000 |
79
+ | TPM / TPD | Unlimited | Unlimited |
80
+
81
+ > **Key insight:** A 50 RPD quota is exhausted very quickly, which explains the frequent 429 errors during testing. The OpenRouter free tier is best suited to occasional access to very large models (120B+) that are not available elsewhere, not to regular daily use. Throttling by upstream providers (NVIDIA, DeepSeek, etc.) causes further 429s on top of OpenRouter's own quota.
82
+
83
+ ---
84
+
85
+ ## Provider Verdict
86
+
87
+ | Provider | Best for | Main bottleneck |
88
+ |---|---|---|
89
+ | **Google (Gemma 4)** | Best free tier overall — high RPD + unlimited tokens | Slow inference (~25–65 s) |
90
+ | **Google (Gemini 3.1 Flash Lite)** | Best speed + reasonable daily quota | 500 RPD, 250K TPM |
91
+ | **Groq** | Fastest inference, high-volume use | TPM cap (6K–30K) limits long RAG contexts |
92
+ | **OpenRouter** | Accessing 120B+ models for free | 50 RPD hard cap, frequent provider throttling |
93
+
94
+ ---
95
+
96
+ ## Recommendations
97
+
98
+ | Use case | Best choice |
99
+ |---|---|
100
+ | Best overall (default) | **Gemma 4 MoE 26B [Google]** — best limits + quality |
101
+ | Fastest response on Groq | **Llama 4 Scout 17B [Groq]**, ~1.5 s |
102
+ | Fastest + high daily quota | **Gemini 3.1 Flash Lite [Google]** — 500 RPD, ~0.6 s |
103
+ | Deepest philosophical reasoning | **Qwen3 32B [Groq]** or **Llama 3.3 70B [Groq]** |
104
+ | Maximum context window | **DeepSeek V4 Flash [OR]** — 1M tokens |
105
+ | Highest model quality | **Nvidia Nemotron 120B [OR]** or **OpenAI OSS 120B [OR]** |
106
+ | High-volume / many requests/day | **Llama 3.1 8B [Groq]** — 14,400 RPD |
107
+
108
+ ---
109
+
110
+ ## Running the benchmark
111
+
112
+ ```bash
113
+ python test_models.py
114
+ ```
115
+
116
+ Requires `.env` with at least one key:
117
+ ```
118
+ GOOGLE_API_KEY=... # aistudio.google.com
119
+ GROQ_API_KEY=... # console.groq.com
120
+ OPENROUTER_API_KEY=... # openrouter.ai
121
+ ```
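Reviewer note: the Groq deep-dive above observes that TPM caps can bottleneck RAG requests with large contexts. The arithmetic can be sketched as follows. The limits are copied from the tables in this file; the 2,500 tokens per request is an assumed prompt-plus-answer size rather than a measured one, and `effective_rpm` is a hypothetical helper.

```python
import math


def effective_rpm(rpm, tpm, tokens_per_request):
    """Requests per minute that fit under both the RPM and TPM caps.

    Pass tpm=None to model an unlimited token quota.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    if tpm is None:
        return rpm
    return min(rpm, math.floor(tpm / tokens_per_request))


# Assumed average size of one RAG request (retrieved context + answer).
ASSUMED_TOKENS = 2500

for name, rpm, tpm in [
    ("Llama 3.1 8B [Groq]", 14_400, 6_000),
    ("Llama 4 Scout 17B [Groq]", 1_000, 30_000),
    ("Gemma 4 MoE 26B [Google]", 15, None),
]:
    print(f"{name}: {effective_rpm(rpm, tpm, ASSUMED_TOKENS)} requests/min")
```

Under that assumption the token cap, not the request cap, decides the usable rate on Groq, while the Gemma models on Google are bounded only by their 15 RPM.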
README.md CHANGED
@@ -1,13 +1,130 @@
1
  ---
2
  title: Philosopher Chat
3
- emoji: 📈
4
- colorFrom: pink
5
- colorTo: yellow
6
  sdk: gradio
7
  sdk_version: 6.15.1
8
- python_version: '3.13'
9
  app_file: app.py
10
  pinned: false
 
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
  title: Philosopher Chat
3
+ emoji: 🏛️
4
+ colorFrom: purple
5
+ colorTo: indigo
6
  sdk: gradio
7
  sdk_version: 6.15.1
 
8
  app_file: app.py
9
  pinned: false
10
+ license: mit
11
  ---
12
 
13
+ # Philosopher Chat
14
+
15
+ A RAG (Retrieval-Augmented Generation) chatbot grounded in Western philosophical primary texts.
16
+ Ask questions about nihilism, existentialism, epistemology, ethics, and more; each answer cites
17
+ its sources from 12 primary texts (~5,700 chunks).
18
+
19
+ **Live demo:** [fikri0o0/philosopher-chat on HuggingFace Spaces](https://huggingface.co/spaces/fikri0o0/philosopher-chat)
20
+
21
+ ---
22
+
23
+ ## Features
24
+
25
+ | Feature | Detail |
26
+ |---|---|
27
+ | **Hybrid RAG** | BM25 + semantic cosine similarity ensemble |
28
+ | **Streaming** | Token-by-token via Google / Groq / OpenRouter |
29
+ | **16 LLMs** | Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron — all free tier |
30
+ | **Think blocks** | Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought |
31
+ | **UMAP viz** | 2D projection of all 5,700+ embeddings coloured by philosopher |
32
+ | **Model comparison** | Side-by-side latency + quality comparison across any two models |
33
+ | **Extendable KB** | Upload your own PDF/TXT to add new philosophers |
34
+
35
+ ---
36
+
37
+ ## Knowledge Base
38
+
39
+ | Philosopher | Works |
40
+ |---|---|
41
+ | Nietzsche | *Thus Spoke Zarathustra*, *Beyond Good and Evil*, *On the Genealogy of Morality*, *The Birth of Tragedy* |
42
+ | Schopenhauer | *Essays of Arthur Schopenhauer* |
43
+ | Hume | *An Enquiry Concerning Human Understanding* |
44
+ | Russell | *The Problems of Philosophy* |
45
+ | Marcus Aurelius | *Meditations* |
46
+ | Plato | *The Republic* |
47
+ | Mill | *Utilitarianism* |
48
+ | Epictetus | *The Enchiridion* |
49
+ | Kant | *Fundamental Principles of the Metaphysic of Morals* |
50
+
51
+ All texts are public domain, sourced from [Project Gutenberg](https://www.gutenberg.org).
52
+
53
+ ---
54
+
55
+ ## Tech Stack
56
+
57
+ | Layer | Tool |
58
+ |---|---|
59
+ | LLM routing | 16 models via Google AI Studio, Groq, OpenRouter (all free tier) |
60
+ | Embeddings | `google/embeddinggemma-300m` (HuggingFace, 768-dim) |
61
+ | Retrieval | Hybrid BM25 + ChromaDB semantic search |
62
+ | RAG Framework | LangChain LCEL (direct runnable composition, no legacy chain classes) |
63
+ | UI | Gradio 6 |
64
+ | Deployment | HuggingFace Spaces |
65
+
66
+ ---
67
+
68
+ ## Local Setup
69
+
70
+ ### 1. Clone and install
71
+
72
+ ```bash
73
+ git clone https://github.com/Fikri645/philosopher-chat
74
+ cd philosopher-chat
75
+ pip install -r requirements.txt
76
+ ```
77
+
78
+ ### 2. Set up API keys
79
+
80
+ ```bash
81
+ # Create .env with your keys:
82
+ GOOGLE_API_KEY=... # https://ai.google.dev (free)
83
+ GROQ_API_KEY=... # https://console.groq.com (free)
84
+ OPENROUTER_API_KEY=... # https://openrouter.ai (free)
85
+ HF_TOKEN=... # https://huggingface.co/settings/tokens (for gated EmbeddingGemma)
86
+ ```
87
+
88
+ ### 3. Build the vectorstore (run once)
89
+
90
+ ```bash
91
+ python ingest.py
92
+ ```
93
+
94
+ This downloads 12 texts from Project Gutenberg, chunks them, embeds them with EmbeddingGemma-300M,
95
+ and persists the index to `vectorstore/`. The first run takes ~5–10 min (model download + embedding).
96
+
97
+ ### 4. Run the app
98
+
99
+ ```bash
100
+ python app.py
101
+ ```
102
+
103
+ Open http://localhost:7860 in your browser.
104
+
105
+ ---
106
+
107
+ ## Deploying to HuggingFace Spaces
108
+
109
+ 1. Fork or push to a new Space (SDK: **Gradio**)
110
+ 2. In **Space Settings → Variables and Secrets**, add:
111
+ - `GOOGLE_API_KEY`
112
+ - `GROQ_API_KEY`
113
+ - `OPENROUTER_API_KEY`
114
+ - `HF_TOKEN` (your HF token — needed to download the gated EmbeddingGemma model)
115
+ 3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.
116
+
117
+ ---
118
+
119
+ ## Project Structure
120
+
121
+ ```
122
+ philosopher-chat/
123
+ ├── app.py ← Gradio UI + event handlers
124
+ ├── rag_chain.py ← LangChain RAG pipeline (retrieval + LLM routing)
125
+ ├── ingest.py ← Data ingestion from Project Gutenberg
126
+ ├── config.py ← LLM options, embedding model, RAG parameters
127
+ ├── requirements.txt
128
+ ├── .gitignore
129
+ └── README.md
130
+ ```
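Reviewer note: the README describes retrieval as a "BM25 + semantic cosine similarity ensemble", but `rag_chain.py` is not shown in this part of the diff, so how the two result lists are merged is unknown here. One common way to merge two rankings is weighted reciprocal rank fusion (RRF); the sketch below illustrates that technique under that assumption and is not taken from the commit. The function name and the constant `k=60` are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(weight / (k + rank)) over the lists that
    contain it, with ranks starting at 1. Ties are broken by ID so the
    output is deterministic.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("one weight per ranking is required")

    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (k + rank)
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrievers.
bm25_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([bm25_hits, semantic_hits]))
```

Rank-based fusion avoids having to put BM25 scores and cosine similarities on a common scale, which is why it is a frequent choice for hybrid retrieval.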
app.py ADDED
@@ -0,0 +1,522 @@
1
+ import re
2
+ import time
3
+
4
+ import gradio as gr
5
+ import plotly.express as px
6
+ import pandas as pd
7
+
8
+ from rag_chain import (
9
+ retrieve_docs, stream_llm, query, add_to_kb, vectorstore_exists,
10
+ get_all_philosophers, get_kb_stats, get_umap_data,
11
+ )
12
+ from config import LLM_OPTIONS, DEFAULT_LLM, EMBEDDING_OPTIONS, DEFAULT_EMBEDDING
13
+
14
+ # ---------------------------------------------------------------------------
15
+ # Display helpers
16
+ # ---------------------------------------------------------------------------
17
+
18
+ _PROVIDER_COLOR = {
19
+ "Google": "#4285F4",
20
+ "Groq": "#FF4B36",
21
+ "OpenRouter": "#6366F1",
22
+ }
23
+
24
+ _COMPARE_DEFAULT_B = "Llama 4 Scout 17B [Groq]"
25
+
26
+ _THINK_STYLE = (
27
+ "color:var(--body-text-color-subdued);font-size:0.88em;"
28
+ "border-left:3px solid var(--border-color-primary);padding-left:12px;margin:6px 0"
29
+ )
30
+ _SUMMARY_STYLE = (
31
+ "cursor:pointer;color:var(--body-text-color-subdued);"
32
+ "font-style:italic;user-select:none"
33
+ )
34
+
35
+
36
+ def _format_think_blocks(text: str) -> str:
37
+     """Render <think>…</think> as collapsible, muted sections.
38
+
39
+     Mid-stream (</think> not yet seen): open <details> showing live reasoning.
40
+     Complete block: closed <details> with 'click to expand' label.
41
+     """
42
+     if "<think>" not in text:
43
+         return text
44
+
45
+     if "</think>" not in text:
46
+         # Partial: the think block is still streaming
47
+         idx = text.index("<think>")
48
+         pre, thinking = text[:idx], text[idx + len("<think>"):]
49
+         return (
50
+             pre
51
+             + f'<details open><summary style="{_SUMMARY_STYLE}">🤔 Thinking…</summary>'
52
+             + f'<div style="{_THINK_STYLE}">{thinking}</div></details>'
53
+         )
54
+
55
+     def _wrap(m: re.Match) -> str:
56
+         content = m.group(1).strip()
57
+         return (
58
+             f'<details><summary style="{_SUMMARY_STYLE}">'
59
+             "🤔 Chain of thought (click to expand)</summary>"
60
+             f'<div style="{_THINK_STYLE}">{content}</div></details>\n\n'
61
+         )
62
+
63
+     return re.sub(r"<think>(.*?)</think>", _wrap, text, flags=re.DOTALL)
64
+
65
+
66
+ def _score_bar(score: float, width: int = 10) -> str:
67
+ filled = max(0, min(width, round(score * width)))
68
+ return "█" * filled + "░" * (width - filled)
69
+
70
+
71
+ def _format_sources(docs: list, scores: list[float]) -> str:
72
+ if not docs:
73
+ return ""
74
+ seen: set = set()
75
+ lines: list[str] = []
76
+ for doc, score in zip(docs, scores):
77
+ key = doc.metadata.get("source", "Unknown source")
78
+ if key not in seen:
79
+ seen.add(key)
80
+ tag = f"`{score:.2f}` " if score >= 0 else "`BM25` "
81
+ lines.append(f"- {tag}{key}")
82
+ return "\n\n---\n**Sources:**\n" + "\n".join(lines)
83
+
84
+
85
+ def _format_retrieved_chunks(docs: list, scores: list[float]) -> str:
86
+ if not docs:
87
+ return "_No chunks retrieved._"
88
+
89
+ semantic_scores = [s for s in scores if s >= 0]
90
+ avg = sum(semantic_scores) / len(semantic_scores) if semantic_scores else 0.0
91
+ has_bm25 = any(s < 0 for s in scores)
92
+ method = "Hybrid BM25 + Semantic" if has_bm25 else "Semantic"
93
+
94
+ lines = [
95
+ f"**{len(docs)} chunks** &nbsp;·&nbsp; {method}"
96
+ f" &nbsp;·&nbsp; avg similarity: **{avg:.3f}**\n"
97
+ ]
98
+ for i, (doc, score) in enumerate(zip(docs, scores), 1):
99
+ phil = doc.metadata.get("philosopher", "?")
100
+ title = doc.metadata.get("title", "?")
101
+ if score >= 0:
102
+ tag = f"`{score:.3f}` {_score_bar(score)}"
103
+ else:
104
+ tag = "`BM25 ` ──────────"
105
+ text = doc.page_content[:200].replace("\n", " ").strip()
106
+ lines.append(
107
+ f"**{i}.** {tag} &nbsp; *{phil}* · {title} \n"
108
+ f"&nbsp;&nbsp;&nbsp;&nbsp;*\"{text}...\"*\n"
109
+ )
110
+ return "\n".join(lines)
111
+
112
+
113
+ def _format_metrics(
114
+ retrieve_s: float, llm_s: float, n_docs: int, n_sources: int
115
+ ) -> str:
116
+ return (
117
+ f"⏱ &nbsp;Retrieval **{retrieve_s:.2f}s** &nbsp;·&nbsp; "
118
+ f"LLM **{llm_s:.2f}s** &nbsp;·&nbsp; "
119
+ f"Total **{retrieve_s + llm_s:.2f}s** &nbsp;·&nbsp; "
120
+ f"**{n_docs}** chunks from **{n_sources}** source(s)"
121
+ )
122
+
123
+
124
+ def _kb_markdown() -> str:
125
+ stats = get_kb_stats()
126
+ if not stats["total"]:
127
+ return "_Knowledge base is empty._"
128
+ lines = []
129
+ for phil in sorted(stats["sources"]):
130
+ lines.append(f"**{phil}**")
131
+ for title in sorted(stats["sources"][phil]):
132
+ lines.append(f"&nbsp;&nbsp;- *{title}*")
133
+ lines.append(f"\n_{stats['total']:,} total chunks_")
134
+ return "\n\n".join(lines)
135
+
136
+
137
+ # ---------------------------------------------------------------------------
138
+ # Event handlers
139
+ # ---------------------------------------------------------------------------
140
+
141
+ def respond_stream(message: str, history: list, philosopher: str, llm_label: str):
142
+ if not message.strip():
143
+ yield history, "", gr.update(), gr.update()
144
+ return
145
+
146
+ if not vectorstore_exists():
147
+ err = "Knowledge base not found. Run `python ingest.py` first."
148
+ yield history + [{"role": "assistant", "content": err}], "", gr.update(), gr.update()
149
+ return
150
+
151
+ # — Retrieval (fast, happens before streaming) —
152
+ t0 = time.perf_counter()
153
+ docs, scores = retrieve_docs(message, philosopher)
154
+ retrieve_time = time.perf_counter() - t0
155
+ context_str = "\n\n".join(d.page_content for d in docs)
156
+
157
+ chunks_md = _format_retrieved_chunks(docs, scores)
158
+
159
+ history = history + [
160
+ {"role": "user", "content": message},
161
+ {
162
+ "role": "assistant",
163
+ "content": (
164
+ "<em style='color:var(--body-text-color-subdued)'>"
165
+ "⏳ Retrieving context and generating response…"
166
+ "</em>"
167
+ ),
168
+ },
169
+ ]
170
+ # Show user bubble + loading message immediately
171
+ yield history, "", gr.update(value=chunks_md), gr.update()
172
+
173
+ provider, model_id = LLM_OPTIONS.get(llm_label, LLM_OPTIONS[DEFAULT_LLM])
174
+ t1 = time.perf_counter()
175
+ full_response = ""
176
+ try:
177
+ for text_chunk in stream_llm(provider, model_id, context_str, message):
178
+ full_response += text_chunk
179
+ history[-1]["content"] = _format_think_blocks(full_response)
180
+ yield history, "", gr.update(value=chunks_md), gr.update()
181
+
182
+ llm_time = time.perf_counter() - t1
183
+ unique_sources = len({d.metadata.get("source") for d in docs})
184
+ metrics_md = _format_metrics(retrieve_time, llm_time, len(docs), unique_sources)
185
+
186
+ history[-1]["content"] = (
187
+ _format_think_blocks(full_response) + _format_sources(docs, scores)
188
+ )
189
+ yield history, "", gr.update(value=chunks_md), gr.update(value=metrics_md)
190
+
191
+ except Exception as exc:
192
+ history[-1]["content"] = f"⚠️ **Error:** {exc}"
193
+ yield history, "", gr.update(value=chunks_md), gr.update()
194
+
195
+
196
+ def compare_respond(message: str, philosopher: str, llm_a: str, llm_b: str):
197
+ if not message.strip():
198
+ return "Enter a question above.", "", "Enter a question above.", ""
199
+ if not vectorstore_exists():
200
+ msg = "Knowledge base not found."
201
+ return msg, "", msg, ""
202
+
203
+ def _run(llm_label: str) -> tuple[str, str]:
204
+ t0 = time.perf_counter()
205
+ result = query(message, philosopher, llm_label)
206
+ elapsed = time.perf_counter() - t0
207
+ n_src = len({d.metadata.get("source") for d in result["context"]})
208
+ sem_scores = [s for s in result["scores"] if s >= 0]
209
+ avg = sum(sem_scores) / len(sem_scores) if sem_scores else 0.0
210
+ metrics = (
211
+ f"⏱ **{elapsed:.2f}s** &nbsp;·&nbsp; "
212
+ f"**{len(result['context'])}** chunks from **{n_src}** source(s)"
213
+ f" &nbsp;·&nbsp; avg similarity **{avg:.3f}**"
214
+ )
215
+ return result["answer"], metrics
216
+
217
+ ans_a, met_a = _run(llm_a)
218
+ ans_b, met_b = _run(llm_b)
219
+ return ans_a, met_a, ans_b, met_b
220
+
221
+
222
+ def upload_source(file, author: str, title: str):
223
+ if file is None:
224
+ return gr.update(value="Please upload a file first."), gr.update()
225
+ if not author.strip() or not title.strip():
226
+ return gr.update(value="Please fill in both Author and Title."), gr.update()
227
+ try:
228
+ n = add_to_kb(file, author.strip(), title.strip())
229
+ msg = f"Added {n:,} chunks from *{title}* by {author}."
230
+ except Exception as e:
231
+ msg = f"Error: {e}"
232
+ return (
233
+ gr.update(value=msg),
234
+ gr.update(choices=get_all_philosophers(), value="All"),
235
+ )
236
+
237
+
238
+ def refresh_kb():
239
+ return gr.update(value=_kb_markdown())
240
+
241
+
242
+ def build_umap_plot():
243
+ data = get_umap_data()
244
+ if data is None:
245
+ return None
246
+ df = pd.DataFrame(data)
247
+ fig = px.scatter(
248
+ df, x="x", y="y",
249
+ color="philosopher",
250
+ hover_data={"title": True, "preview": True, "x": False, "y": False},
251
+ title="Knowledge Base — Semantic Embedding Space (UMAP 2D)",
252
+ labels={"x": "UMAP-1", "y": "UMAP-2"},
253
+ opacity=0.75,
254
+ template="plotly_dark",
255
+ color_discrete_sequence=px.colors.qualitative.Bold,
256
+ )
257
+ fig.update_traces(marker=dict(size=5))
258
+ fig.update_layout(
259
+ height=540,
260
+ plot_bgcolor="rgba(0,0,0,0)",
261
+ paper_bgcolor="rgba(0,0,0,0)",
262
+ title_font=dict(size=14),
263
+ font=dict(color="rgba(220,220,220,0.9)"),
264
+ legend=dict(
265
+ title_text="",
266
+ yanchor="top", y=0.99, xanchor="left", x=0.01,
267
+ bgcolor="rgba(20,20,20,0.5)",
268
+ bordercolor="rgba(255,255,255,0.12)",
269
+ borderwidth=1,
270
+ ),
271
+ xaxis=dict(gridcolor="rgba(255,255,255,0.07)", zeroline=False),
272
+ yaxis=dict(gridcolor="rgba(255,255,255,0.07)", zeroline=False),
273
+ margin=dict(l=40, r=20, t=48, b=36),
274
+ )
275
+ return fig
276
+
277
+
278
+ # ---------------------------------------------------------------------------
279
+ # UI
280
+ # ---------------------------------------------------------------------------
281
+
282
+ EXAMPLE_QUESTIONS = [
283
+ "What is Nietzsche's view on nihilism and the death of God?",
284
+ "How does Schopenhauer view suffering and the will to live?",
285
+ "What does Hume say about causality and the limits of reason?",
286
+ "Can we have certain knowledge of the external world?",
287
+ "Is morality objective or invented?",
288
+ "Explain the concept of Eternal Return",
289
+ "How does Marcus Aurelius advise dealing with suffering?",
290
+ "What is Plato's ideal society in The Republic?",
291
+ "Compare Schopenhauer and Nietzsche on the will",
292
+ "What is Kant's categorical imperative?",
293
+ "How does Mill justify utilitarianism?",
294
+ "What does Epictetus say about what is in our control?",
295
+ ]
296
+
297
+ CSS = """
298
+ footer { display: none !important; }
299
+ .section-label {
300
+ font-size: 0.78rem; font-weight: 700; letter-spacing: 0.07em;
301
+ text-transform: uppercase; color: var(--body-text-color-subdued);
302
+ margin-bottom: 2px;
303
+ }
304
+ .metric-bar p { font-size: 0.82rem; color: var(--body-text-color-subdued); margin: 4px 0; }
305
+ .status-box textarea { font-size: 0.82rem !important; }
306
+
307
+ /* Fix double scrollbar: prevent inner message wrappers from scrolling */
308
+ .chatbot .overflow-y-auto { scrollbar-width: thin; scrollbar-color: var(--border-color-primary) transparent; }
309
+ .chatbot .message-wrap { overflow: visible !important; }
310
+ .chatbot .message-wrap > div { overflow: visible !important; max-height: none !important; }
311
+ /* Prevent long markdown lines from adding a horizontal inner scroll */
312
+ .chatbot .prose { overflow-x: hidden !important; overflow-wrap: break-word; word-break: break-word; }
313
+ """
314
+
315
+ with gr.Blocks(title="Philosopher Chat") as demo:
316
+
317
+ gr.Markdown(
318
+ """
319
+ # 📚 Philosopher Chat
320
+ **RAG chatbot grounded in Western philosophical primary texts**
321
+
322
+ Hybrid BM25 + Semantic retrieval &nbsp;·&nbsp; Real-time streaming
323
+ &nbsp;·&nbsp; Multi-provider LLM routing &nbsp;·&nbsp; 12 primary texts · ~5,700 chunks
324
+ """
325
+ )
326
+
327
+ with gr.Tabs():
328
+
329
+ # ── Tab 1 ─ Chat ─────────────────────────────────────────────────
330
+ with gr.Tab("💬 Chat"):
331
+ with gr.Row(equal_height=False):
332
+
333
+ # Left: chat area
334
+ with gr.Column(scale=3):
335
+ chatbot_ui = gr.Chatbot(
336
+ height=480,
337
+ show_label=False,
338
+ placeholder="*Ask a philosophical question to get started...*",
339
+ )
340
+ msg_input = gr.Textbox(
341
+ placeholder="Ask a philosophical question…",
342
+ show_label=False,
343
+ autofocus=True,
344
+ submit_btn=True,
345
+ )
346
+ metrics_display = gr.Markdown(
347
+ value="", elem_classes="metric-bar"
348
+ )
349
+ with gr.Accordion("📄 Retrieved Chunks & Scores", open=False):
350
+ retrieved_display = gr.Markdown(
351
+ value="_Submit a question to see retrieved context._"
352
+ )
353
+ with gr.Accordion("💡 Example Questions", open=False):
354
+ gr.Examples(
355
+ examples=[[q] for q in EXAMPLE_QUESTIONS],
356
+ inputs=[msg_input],
357
+ label=None,
358
+ )
359
+
360
+ # Right: settings sidebar
361
+ with gr.Column(scale=1, min_width=240):
362
+ with gr.Group():
363
+ gr.Markdown("**⚙️ Chat Settings**", elem_classes="section-label")
364
+ llm_dropdown = gr.Dropdown(
365
+ choices=list(LLM_OPTIONS.keys()),
366
+ value=DEFAULT_LLM,
367
+ label="LLM Model",
368
+ )
369
+ embedding_display = gr.Dropdown(
370
+ choices=list(EMBEDDING_OPTIONS.keys()),
371
+ value=DEFAULT_EMBEDDING,
372
+ label="Embedding Model",
373
+ info="Change requires rebuilding index (ingest.py)",
374
+ interactive=False,
375
+ )
376
+ philosopher_filter = gr.Dropdown(
377
+ choices=get_all_philosophers(),
378
+ value="All",
379
+ label="Filter by Philosopher",
380
+ )
381
+
382
+ with gr.Group():
383
+ gr.Markdown("**ℹ️ Stack**", elem_classes="section-label")
384
+ gr.Markdown(
385
+ "- Retrieval: **Hybrid BM25 + Semantic**\n"
386
+ "- Embeddings: **EmbeddingGemma-300M**\n"
387
+ "- Vector DB: **ChromaDB**\n"
388
+ "- Framework: **LangChain LCEL**\n"
389
+ "- UI: **Gradio 6**"
390
+ )
391
+
392
+ # ── Tab 2 ─ Compare Models ────────────────────────────────────────
393
+ with gr.Tab("⚖️ Compare Models"):
394
+ gr.Markdown(
395
+ "Run the same question through two models and compare quality, "
396
+ "latency, and retrieval coverage side by side."
397
+ )
398
+ with gr.Row():
399
+ compare_input = gr.Textbox(
400
+ label="Question",
401
+ placeholder="Ask a philosophical question…",
402
+ scale=4,
403
+ )
404
+ compare_philosopher = gr.Dropdown(
405
+ choices=get_all_philosophers(),
406
+ value="All",
407
+ label="Philosopher Filter",
408
+ scale=1,
409
+ )
410
+ compare_btn = gr.Button("▶ Compare", variant="primary")
411
+
412
+ with gr.Row():
413
+ with gr.Column():
414
+ model_a = gr.Dropdown(
415
+ choices=list(LLM_OPTIONS.keys()),
416
+ value=DEFAULT_LLM,
417
+ label="Model A",
418
+ )
419
+ response_a = gr.Markdown(label="Response A")
420
+ metrics_a = gr.Markdown(elem_classes="metric-bar")
421
+
422
+ with gr.Column():
423
+ model_b = gr.Dropdown(
424
+ choices=list(LLM_OPTIONS.keys()),
425
+ value=_COMPARE_DEFAULT_B,
426
+ label="Model B",
427
+ )
428
+ response_b = gr.Markdown(label="Response B")
429
+ metrics_b = gr.Markdown(elem_classes="metric-bar")
430
+
431
+ # ── Tab 3 ─ Knowledge Base ───────────────────────────────────────
432
+ with gr.Tab("🗺️ Knowledge Base"):
433
+ with gr.Row(equal_height=False):
434
+
435
+ # Left: UMAP visualization
436
+ with gr.Column(scale=2):
437
+ gr.Markdown(
438
+ "**Semantic Embedding Space** \n"
439
+ "Each point is one text chunk. Clusters indicate semantic similarity — "
440
+ "nearby chunks share philosophical themes regardless of source."
441
+ )
442
+ umap_plot = gr.Plot()
443
+ umap_btn = gr.Button(
444
+ "Generate Embedding Visualization", variant="secondary"
445
+ )
446
+ gr.Markdown(
447
+ "_UMAP projects ~5,700 × 768-dim embeddings to 2D. "
448
+ "First run takes ~1–2 min on CPU._"
449
+ )
450
+
451
+ # Right: stats + upload
452
+ with gr.Column(scale=1, min_width=280):
453
+ with gr.Group():
454
+ with gr.Row():
455
+ gr.Markdown(
456
+ "**📚 Knowledge Base**", elem_classes="section-label"
457
+ )
458
+ refresh_kb_btn = gr.Button("↻", size="sm", min_width=32)
459
+ kb_display = gr.Markdown(_kb_markdown())
460
+
461
+ with gr.Group():
462
+ gr.Markdown(
463
+ "**📤 Add Source**", elem_classes="section-label"
464
+ )
465
+ file_upload = gr.File(
466
+ label="Upload PDF or TXT",
467
+ file_types=[".pdf", ".txt"],
468
+ )
469
+ with gr.Row():
470
+ author_input = gr.Textbox(label="Author", scale=1)
471
+ title_input = gr.Textbox(label="Title", scale=1)
472
+ upload_btn = gr.Button(
473
+ "Add to Knowledge Base", variant="secondary", size="sm"
474
+ )
475
+ upload_status = gr.Textbox(
476
+ show_label=False,
477
+ interactive=False,
478
+ placeholder="Upload status will appear here…",
479
+ elem_classes="status-box",
480
+ )
481
+
482
+ # ── Event wiring ─────────────────────────────────────────────────────
483
+
484
+ msg_input.submit(
485
+ respond_stream,
486
+ inputs=[msg_input, chatbot_ui, philosopher_filter, llm_dropdown],
487
+ outputs=[chatbot_ui, msg_input, retrieved_display, metrics_display],
488
+ )
489
+
490
+ compare_btn.click(
491
+ compare_respond,
492
+ inputs=[compare_input, compare_philosopher, model_a, model_b],
493
+ outputs=[response_a, metrics_a, response_b, metrics_b],
494
+ )
495
+
496
+ umap_btn.click(build_umap_plot, outputs=umap_plot)
497
+
498
+ refresh_kb_btn.click(refresh_kb, outputs=kb_display)
499
+
500
+ upload_btn.click(
501
+ upload_source,
502
+ inputs=[file_upload, author_input, title_input],
503
+ outputs=[upload_status, philosopher_filter],
504
+ ).then(refresh_kb, outputs=kb_display)
505
+
506
+
507
+ def _auto_ingest() -> None:
508
+ """Build the vectorstore automatically on first Spaces run."""
509
+ if not vectorstore_exists():
510
+ print("[startup] Vectorstore missing — running initial ingest (this takes ~10 min)…")
511
+ try:
512
+ import ingest
513
+ ingest.main()
514
+ print("[startup] Ingest complete.")
515
+ except Exception as exc:
516
+ print(f"[startup] Ingest failed: {exc}")
517
+
518
+
519
+ _auto_ingest()
520
+
521
+ if __name__ == "__main__":
522
+ demo.launch(css=CSS)
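Reviewer note: `respond_stream` above streams by appending each chunk to a running string and re-yielding the whole chat history. The pattern can be sketched without Gradio or any provider SDK; `fake_token_stream` and `stream_into_history` are hypothetical stand-ins for illustration, not functions from this commit.

```python
def fake_token_stream(answer):
    """Stand-in for a provider stream: yields the answer word by word."""
    for word in answer.split(" "):
        yield word + " "


def stream_into_history(history, message, token_stream):
    """Yield the chat history once per received chunk.

    The caller's list is not modified: concatenation builds a new list
    holding the user turn and an assistant turn that grows in place.
    """
    history = history + [
        {"role": "user", "content": message},
        {"role": "assistant", "content": ""},
    ]
    full_response = ""
    for chunk in token_stream:
        full_response += chunk
        history[-1]["content"] = full_response
        yield history


original = []
snapshots = [
    h[-1]["content"]
    for h in stream_into_history(
        original, "What is virtue?", fake_token_stream("Know thyself")
    )
]
print(snapshots)
```

Every yield hands back the same list object with an updated last message, so a consumer that wants to keep intermediate states has to copy the content at each step, as the list comprehension does here.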
config.py ADDED
@@ -0,0 +1,89 @@
+ import os
+ import torch
+ from pathlib import Path
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ PROJECT_ROOT = Path(__file__).parent
+ DATA_DIR = PROJECT_ROOT / "data" / "texts"
+ VECTORSTORE_DIR = PROJECT_ROOT / "vectorstore"
+
+ GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "")
+ GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
+ OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY", "")
+
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # ---------------------------------------------------------------------------
+ # LLM options — (provider, model_id)
+ # Providers: "google" | "groq" | "openrouter"
+ # ---------------------------------------------------------------------------
+ LLM_OPTIONS: dict[str, tuple[str, str]] = {
+     # ── Google AI Studio (free tier) ──────────────────────────────────────
+     # Limits verified from aistudio.google.com/rate-limit (2026-05)
+     "Gemma 4 MoE 26B [Google]": ("google", "gemma-4-26b-a4b-it"),  # 15 RPM | ∞ TPM | 1500 RPD
+     "Gemma 4 Dense 31B [Google]": ("google", "gemma-4-31b-it"),  # 15 RPM | ∞ TPM | 1500 RPD
+     "Gemini 3.1 Flash Lite [Google]": ("google", "gemini-3.1-flash-lite"),  # 15 RPM | 250K TPM | 500 RPD
+     "Gemini 3.5 Flash [Google]": ("google", "gemini-3.5-flash"),  # 5 RPM | 250K TPM | 20 RPD
+     "Gemini 3 Flash [Google]": ("google", "gemini-3-flash-preview"),  # 5 RPM | 250K TPM | 20 RPD
+     "Gemini 2.5 Flash [Google]": ("google", "gemini-2.5-flash"),  # 5 RPM | 250K TPM | 20 RPD
+     "Gemini 2.5 Flash Lite [Google]": ("google", "gemini-2.5-flash-lite"),  # 10 RPM | 250K TPM | 20 RPD
+     # ── Groq (free tier, very fast LPU inference) ─────────────────────────
+     "Llama 3.3 70B [Groq]": ("groq", "llama-3.3-70b-versatile"),
+     "Llama 4 Scout 17B [Groq]": ("groq", "meta-llama/llama-4-scout-17b-16e-instruct"),
+     "Qwen3 32B [Groq]": ("groq", "qwen/qwen3-32b"),
+     "Llama 3.1 8B [Groq]": ("groq", "llama-3.1-8b-instant"),
+     # ── OpenRouter free models (:free = no cost, rate-limited) ────────────
+     "Nvidia Nemotron 120B [OpenRouter]": ("openrouter", "nvidia/nemotron-3-super-120b-a12b:free"),
+     "OpenAI OSS 120B [OpenRouter]": ("openrouter", "openai/gpt-oss-120b:free"),
+     "DeepSeek V4 Flash [OpenRouter]": ("openrouter", "deepseek/deepseek-v4-flash:free"),
+     "Llama 3.3 70B [OpenRouter]": ("openrouter", "meta-llama/llama-3.3-70b-instruct:free"),
+     "Qwen3 Next 80B [OpenRouter]": ("openrouter", "qwen/qwen3-next-80b-a3b-instruct:free"),
+     "Gemma 4 MoE 26B [OpenRouter]": ("openrouter", "google/gemma-4-26b-a4b-it:free"),
+ }
+
+ DEFAULT_LLM = "Gemma 4 MoE 26B [Google]"
+
+ PROVIDER_KEYS = {
+     "google": ("GOOGLE_API_KEY", "ai.google.dev"),
+     "groq": ("GROQ_API_KEY", "console.groq.com"),
+     "openrouter": ("OPENROUTER_API_KEY", "openrouter.ai"),
+ }
+
+ # ---------------------------------------------------------------------------
+ # Embedding
+ # ---------------------------------------------------------------------------
+ EMBEDDING_OPTIONS = {
+     "EmbeddingGemma 300M (active)": "google/embeddinggemma-300m",
+     "BGE Large EN v1.5": "BAAI/bge-large-en-v1.5",
+     "Multilingual E5 Large": "intfloat/multilingual-e5-large",
+ }
+ DEFAULT_EMBEDDING = "EmbeddingGemma 300M (active)"
+ EMBEDDING_MODEL = EMBEDDING_OPTIONS[DEFAULT_EMBEDDING]
+
+ # ---------------------------------------------------------------------------
+ # RAG
+ # ---------------------------------------------------------------------------
+ CHUNK_SIZE = 1000
+ CHUNK_OVERLAP = 150
+ RETRIEVAL_K = 6  # slightly more to absorb BM25 extras
+ USE_HYBRID_SEARCH = True  # BM25 + semantic ensemble
+
+ # ---------------------------------------------------------------------------
+ # Knowledge base sources (Project Gutenberg)
+ # ---------------------------------------------------------------------------
+ SOURCES = [
+     {"philosopher": "Nietzsche", "title": "Thus Spoke Zarathustra", "gutenberg_id": 1998},
+     {"philosopher": "Nietzsche", "title": "Beyond Good and Evil", "gutenberg_id": 4363},
+     {"philosopher": "Nietzsche", "title": "On the Genealogy of Morality", "gutenberg_id": 52319},
+     {"philosopher": "Nietzsche", "title": "The Birth of Tragedy", "gutenberg_id": 51356},
+     {"philosopher": "Schopenhauer", "title": "Essays of Arthur Schopenhauer", "gutenberg_id": 11945},
+     {"philosopher": "Hume", "title": "An Enquiry Concerning Human Understanding", "gutenberg_id": 9662},
+     {"philosopher": "Russell", "title": "The Problems of Philosophy", "gutenberg_id": 5827},
+     {"philosopher": "Marcus Aurelius", "title": "Meditations", "gutenberg_id": 2680},
+     {"philosopher": "Plato", "title": "The Republic", "gutenberg_id": 1497},
+     {"philosopher": "Mill", "title": "Utilitarianism", "gutenberg_id": 11224},
+     {"philosopher": "Epictetus", "title": "The Enchiridion", "gutenberg_id": 45109},
+     {"philosopher": "Kant", "title": "Fundamental Principles of the Metaphysic of Morals", "gutenberg_id": 5682},
+ ]
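Review note: `LLM_OPTIONS` maps a display label to `(provider, model_id)`, and `PROVIDER_KEYS` maps a provider to `(env_var, signup_site)`. A minimal sketch of how the two tables could be joined to list only the models whose API key is configured follows. The tables here are trimmed copies for illustration, and `usable_models` is a hypothetical helper that does not appear in the commit.

```python
# Trimmed, illustrative copies of the tables defined in config.py.
LLM_OPTIONS = {
    "Llama 3.1 8B [Groq]": ("groq", "llama-3.1-8b-instant"),
    "Gemma 4 MoE 26B [Google]": ("google", "gemma-4-26b-a4b-it"),
}
PROVIDER_KEYS = {
    "google": ("GOOGLE_API_KEY", "ai.google.dev"),
    "groq": ("GROQ_API_KEY", "console.groq.com"),
}


def usable_models(env):
    """Return the labels whose provider has a non-empty key in `env`.

    `env` is any mapping of variable name to value, e.g. os.environ.
    """
    labels = []
    for label, (provider, _model_id) in LLM_OPTIONS.items():
        env_var, _site = PROVIDER_KEYS[provider]
        if env.get(env_var, "").strip():
            labels.append(label)
    return labels


# Only the Groq key is set, so only the Groq model is listed.
print(usable_models({"GROQ_API_KEY": "example-key"}))
```

Passing the environment as an argument instead of reading `os.environ` directly keeps the helper easy to test.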
ingest.py ADDED
@@ -0,0 +1,157 @@
+ """
+ Build or update the ChromaDB vectorstore from philosophical texts.
+
+     python ingest.py            # incremental: skips already-indexed sources
+     python ingest.py --rebuild  # wipes and rebuilds from scratch
+ """
+
+ import sys
+ import time
+ import requests
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from langchain_core.documents import Document
+ from langchain_huggingface import HuggingFaceEmbeddings
+ from langchain_chroma import Chroma
+ from config import (
+     DATA_DIR, VECTORSTORE_DIR,
+     EMBEDDING_MODEL, CHUNK_SIZE, CHUNK_OVERLAP, SOURCES, DEVICE
+ )
+
+ GUTENBERG_URL = "https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt"
+ BATCH_SIZE = 50
+ SLEEP_BETWEEN_BATCHES = 2
+
+
+ def download_gutenberg(gutenberg_id: int, title: str) -> str:
+     url = GUTENBERG_URL.format(id=gutenberg_id)
+     print(f" Downloading {url}")
+     try:
+         resp = requests.get(url, timeout=30)
+         resp.raise_for_status()
+         return resp.text
+     except Exception as e:
+         print(f" ERROR: {e}")
+         return ""
+
+
+ def strip_gutenberg_boilerplate(text: str) -> str:
+     start_markers = [
+         "*** START OF THE PROJECT GUTENBERG",
+         "***START OF THE PROJECT GUTENBERG",
+         "*** START OF THIS PROJECT GUTENBERG",
+     ]
+     end_markers = [
+         "*** END OF THE PROJECT GUTENBERG",
+         "***END OF THE PROJECT GUTENBERG",
+         "*** END OF THIS PROJECT GUTENBERG",
+     ]
+     start_idx = 0
+     for marker in start_markers:
+         idx = text.find(marker)
+         if idx != -1:
+             start_idx = text.find("\n", idx) + 1
+             break
+     end_idx = len(text)
+     for marker in end_markers:
+         idx = text.find(marker)
+         if idx != -1:
+             end_idx = idx
+             break
+     return text[start_idx:end_idx].strip()
+
+
+ def get_embeddings() -> HuggingFaceEmbeddings:
+     print(f"Loading embedding model on {DEVICE}...")
+     return HuggingFaceEmbeddings(
+         model_name=EMBEDDING_MODEL,
+         model_kwargs={"device": DEVICE},
+         encode_kwargs={"prompt_name": "document", "normalize_embeddings": True},
+         query_encode_kwargs={"prompt_name": "query", "normalize_embeddings": True},
+     )
+
+
+ def get_indexed_titles(vectorstore: Chroma) -> set[str]:
+     result = vectorstore.get(include=["metadatas"])
+     return {m.get("title", "") for m in result["metadatas"]}
+
+
+ def ingest_source(source: dict, vectorstore: Chroma, splitter: RecursiveCharacterTextSplitter) -> int:
+     raw = download_gutenberg(source["gutenberg_id"], source["title"])
+     if not raw:
+         return 0
+
+     cleaned = strip_gutenberg_boilerplate(raw)
+
+     # Cache locally
+     DATA_DIR.mkdir(parents=True, exist_ok=True)
+     safe_name = f"{source['philosopher']}_{source['title'][:40].replace(' ', '_')}.txt"
+     (DATA_DIR / safe_name).write_text(cleaned, encoding="utf-8")
+
+     chunks = splitter.split_text(cleaned)
+     docs = [
+         Document(
+             page_content=chunk,
+             metadata={
+                 "philosopher": source["philosopher"],
+                 "title": source["title"],
+                 "source": f"{source['philosopher']} — *{source['title']}*",
+             },
+         )
+         for chunk in chunks
+     ]
+
+     for i in range(0, len(docs), BATCH_SIZE):
+         vectorstore.add_documents(docs[i : i + BATCH_SIZE])
+         if i + BATCH_SIZE < len(docs):
+             time.sleep(SLEEP_BETWEEN_BATCHES)
+
+     return len(docs)
+
+
+ def main() -> None:
+     rebuild = "--rebuild" in sys.argv
+
+     VECTORSTORE_DIR.mkdir(parents=True, exist_ok=True)
+
+     embeddings = get_embeddings()
+     splitter = RecursiveCharacterTextSplitter(
+         chunk_size=CHUNK_SIZE,
+         chunk_overlap=CHUNK_OVERLAP,
+         separators=["\n\n", "\n", ". ", " ", ""],
+     )
+
+     if rebuild and VECTORSTORE_DIR.exists():
+         import shutil
+         shutil.rmtree(VECTORSTORE_DIR)
+         VECTORSTORE_DIR.mkdir()
+         print("Vectorstore wiped for rebuild.")
+
+     vectorstore = Chroma(
+         collection_name="philosophers",
+         embedding_function=embeddings,
+         persist_directory=str(VECTORSTORE_DIR),
+     )
+
+     already_indexed = get_indexed_titles(vectorstore) if not rebuild else set()
+     total_new = 0
+
+     for source in SOURCES:
+         print(f"\n[{source['philosopher']}] {source['title']}")
+         if source["title"] in already_indexed:
+             print(" SKIPPED (already indexed)")
+             continue
+
+         n = ingest_source(source, vectorstore, splitter)
+         if n:
+             print(f" -> {n} chunks added")
+             total_new += n
+         time.sleep(1)
+
+     if total_new:
+         print(f"\nDone. {total_new} new chunks added to vectorstore.")
+     else:
+         print("\nNothing new to index.")
+
+
+ if __name__ == "__main__":
+     main()
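Review note: the marker-based trimming in `strip_gutenberg_boilerplate` can be tried on a small in-memory sample. The sketch below is a reduced variant under stated assumptions, not the function from this commit: it checks a single start and end marker, `strip_boilerplate` and `sample` are hypothetical names, and it handles a start marker with no trailing newline explicitly, where the original's `text.find("\n", idx) + 1` would fall back to index 0.

```python
START = "*** START OF THE PROJECT GUTENBERG"
END = "*** END OF THE PROJECT GUTENBERG"


def strip_boilerplate(text):
    """Keep only the text between the START and END marker lines."""
    start = 0
    idx = text.find(START)
    if idx != -1:
        # Skip to the end of the marker line itself.
        newline = text.find("\n", idx)
        start = newline + 1 if newline != -1 else len(text)
    end = text.find(END, start)
    if end == -1:
        end = len(text)  # no end marker: keep everything after start
    return text[start:end].strip()


sample = (
    "License header\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK MEDITATIONS ***\n"
    "Book one.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK MEDITATIONS ***\n"
    "License footer\n"
)
print(strip_boilerplate(sample))
```

Text without any markers passes through unchanged apart from whitespace trimming, which matches how the original treats uploads that are not Gutenberg files.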
rag_chain.py ADDED
@@ -0,0 +1,353 @@
+ from functools import lru_cache
+ from pathlib import Path
+ from typing import Generator
+
+ from google import genai
+ from google.genai import types
+ from langchain_huggingface import HuggingFaceEmbeddings
+ from langchain_chroma import Chroma
+ from langchain_core.documents import Document
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from config import (
+     GOOGLE_API_KEY, GROQ_API_KEY, OPENROUTER_API_KEY,
+     LLM_OPTIONS, DEFAULT_LLM,
+     EMBEDDING_MODEL, VECTORSTORE_DIR, RETRIEVAL_K,
+     CHUNK_SIZE, CHUNK_OVERLAP, DEVICE, PROVIDER_KEYS,
+     USE_HYBRID_SEARCH,
+ )
+
+ SYSTEM_PROMPT = (
+     "You are a philosophical assistant with deep knowledge of Western philosophy, "
+     "particularly nihilism, absurdism, pessimism, existentialism, and epistemology. "
+     "Your answers are grounded in the primary texts provided as context.\n\n"
+     "Rules:\n"
+     "- Draw directly from the retrieved context passages.\n"
+     "- Always cite the philosopher and work "
+     "(e.g., 'As Nietzsche writes in *Thus Spoke Zarathustra*...').\n"
+     "- Be intellectually rigorous but accessible.\n"
+     "- If the context is insufficient, say so clearly.\n"
+     "- Present the philosophers' views faithfully without moralizing."
+ )
+
+
+ # ---------------------------------------------------------------------------
+ # Cached singletons
+ # ---------------------------------------------------------------------------
+
+ @lru_cache(maxsize=1)
+ def _get_genai_client() -> genai.Client:
+     return genai.Client(api_key=GOOGLE_API_KEY)
+
+
+ @lru_cache(maxsize=1)
+ def _get_embeddings() -> HuggingFaceEmbeddings:
+     return HuggingFaceEmbeddings(
+         model_name=EMBEDDING_MODEL,
+         model_kwargs={"device": DEVICE},
+         encode_kwargs={"prompt_name": "document", "normalize_embeddings": True},
+         query_encode_kwargs={"prompt_name": "query", "normalize_embeddings": True},
+     )
+
+
+ @lru_cache(maxsize=1)
+ def _get_vectorstore() -> Chroma:
+     return Chroma(
+         collection_name="philosophers",
+         embedding_function=_get_embeddings(),
+         persist_directory=str(VECTORSTORE_DIR),
+     )
+
+
+ @lru_cache(maxsize=1)
+ def _get_bm25_retriever():
+     """Build BM25 index over the full KB (cached after first call)."""
+     from langchain_community.retrievers import BM25Retriever  # requires rank-bm25
+     result = _get_vectorstore().get(include=["documents", "metadatas"])
+     docs = [
+         Document(page_content=d, metadata=m)
+         for d, m in zip(result["documents"], result["metadatas"])
+         if d.strip()
+     ]
+     retriever = BM25Retriever.from_documents(docs)
+     retriever.k = RETRIEVAL_K
+     return retriever
+
+
+ # ---------------------------------------------------------------------------
+ # Public helpers
+ # ---------------------------------------------------------------------------
+
+ def vectorstore_exists() -> bool:
+     return (VECTORSTORE_DIR / "chroma.sqlite3").exists()
+
+
+ def get_all_philosophers() -> list[str]:
+     if not vectorstore_exists():
+         return ["All"]
+     result = _get_vectorstore().get(include=["metadatas"])
+     names = sorted({m["philosopher"] for m in result["metadatas"] if "philosopher" in m})
+     return ["All"] + names
+
+
+ def get_kb_stats() -> dict:
+     if not vectorstore_exists():
+         return {"total": 0, "sources": {}}
+     result = _get_vectorstore().get(include=["metadatas"])
+     sources: dict[str, set] = {}
+     for m in result["metadatas"]:
+         phil = m.get("philosopher", "Unknown")
+         title = m.get("title", "Unknown")
+         sources.setdefault(phil, set()).add(title)
+     return {"total": len(result["ids"]), "sources": sources}
+
+
+ # ---------------------------------------------------------------------------
+ # Retrieval
+ # ---------------------------------------------------------------------------
+
+ def retrieve_docs(
+     input_text: str, philosopher: str = "All"
+ ) -> tuple[list[Document], list[float]]:
+     """Hybrid BM25 + semantic retrieval.
+
+     Returns (docs, scores) where scores are cosine relevance ∈ [0, 1].
+     BM25-only results are tagged with score -1.0 (no embedding similarity).
+     """
+     vectorstore = _get_vectorstore()
+     search_kwargs: dict = {"k": RETRIEVAL_K}
+     if philosopher != "All":
+         search_kwargs["filter"] = {"philosopher": philosopher}
+
+     pairs = vectorstore.similarity_search_with_relevance_scores(input_text, **search_kwargs)
+
+     if USE_HYBRID_SEARCH and philosopher == "All":
+         try:
+             bm25_docs = _get_bm25_retriever().invoke(input_text)
+             seen = {doc.page_content for doc, _ in pairs}
+             for doc in bm25_docs[:2]:
+                 if doc.page_content not in seen:
+                     pairs.append((doc, -1.0))
+                     seen.add(doc.page_content)
+         except Exception:
+             # BM25 is best-effort: fall back to semantic-only results.
+             pass
+
+     # Sort: semantic scores descending, BM25 appended at end
+     semantic = sorted([(d, s) for d, s in pairs if s >= 0], key=lambda x: x[1], reverse=True)
+     bm25_only = [(d, s) for d, s in pairs if s < 0]
+     pairs = (semantic + bm25_only)[: RETRIEVAL_K + 2]
+
+     return [d for d, _ in pairs], [s for _, s in pairs]
+
+
+ # ---------------------------------------------------------------------------
+ # LLM calls — non-streaming
+ # ---------------------------------------------------------------------------
+
+ def _call_llm(provider: str, model_id: str, context_str: str, input_text: str) -> str:
+     user_content = (
+         f"Context from philosophical texts:\n{context_str}\n\nQuestion: {input_text}"
+     )
+
+     if provider == "google":
+         if not GOOGLE_API_KEY:
+             env_var, site = PROVIDER_KEYS["google"]
+             raise ValueError(f"{env_var} not set. Get a free key at {site}")
+         response = _get_genai_client().models.generate_content(
+             model=model_id,
+             contents=user_content,
+             config=types.GenerateContentConfig(
+                 system_instruction=SYSTEM_PROMPT, temperature=0.3
+             ),
+         )
+         return response.text
+
+     elif provider == "groq":
+         if not GROQ_API_KEY:
+             env_var, site = PROVIDER_KEYS["groq"]
+             raise ValueError(f"{env_var} not set. Get a free key at {site}")
+         from openai import OpenAI
+         client = OpenAI(api_key=GROQ_API_KEY, base_url="https://api.groq.com/openai/v1")
+
+     elif provider == "openrouter":
+         if not OPENROUTER_API_KEY:
+             env_var, site = PROVIDER_KEYS["openrouter"]
+             raise ValueError(f"{env_var} not set. Get a free key at {site}")
+         from openai import OpenAI
+         client = OpenAI(
+             api_key=OPENROUTER_API_KEY,
+             base_url="https://openrouter.ai/api/v1",
+             default_headers={"HTTP-Referer": "https://github.com/Fikri645/philosopher-chat"},
+         )
+     else:
+         raise ValueError(f"Unknown provider: {provider!r}")
+
+     resp = client.chat.completions.create(
+         model=model_id,
+         messages=[
+             {"role": "system", "content": SYSTEM_PROMPT},
+             {"role": "user", "content": user_content},
+         ],
+         temperature=0.3,
+     )
+     return resp.choices[0].message.content
+
+
+ # ---------------------------------------------------------------------------
+ # LLM calls — streaming
+ # ---------------------------------------------------------------------------
+
+ def stream_llm(
+     provider: str, model_id: str, context_str: str, input_text: str
+ ) -> Generator[str, None, None]:
+     """Yield text chunks for real-time streaming."""
+     user_content = (
+         f"Context from philosophical texts:\n{context_str}\n\nQuestion: {input_text}"
+     )
+
+     if provider == "google":
+         if not GOOGLE_API_KEY:
+             env_var, site = PROVIDER_KEYS["google"]
+             raise ValueError(f"{env_var} not set. Get a free key at {site}")
+         for chunk in _get_genai_client().models.generate_content_stream(
+             model=model_id,
+             contents=user_content,
+             config=types.GenerateContentConfig(
+                 system_instruction=SYSTEM_PROMPT, temperature=0.3
+             ),
+         ):
+             if chunk.text:
+                 yield chunk.text
+
+     elif provider in ("groq", "openrouter"):
+         if provider == "groq":
+             if not GROQ_API_KEY:
+                 env_var, site = PROVIDER_KEYS["groq"]
+                 raise ValueError(f"{env_var} not set. Get a free key at {site}")
+             from openai import OpenAI
+             client = OpenAI(
+                 api_key=GROQ_API_KEY, base_url="https://api.groq.com/openai/v1"
+             )
+         else:
+             if not OPENROUTER_API_KEY:
+                 env_var, site = PROVIDER_KEYS["openrouter"]
+                 raise ValueError(f"{env_var} not set. Get a free key at {site}")
+             from openai import OpenAI
+             client = OpenAI(
+                 api_key=OPENROUTER_API_KEY,
+                 base_url="https://openrouter.ai/api/v1",
+                 default_headers={
+                     "HTTP-Referer": "https://github.com/Fikri645/philosopher-chat"
+                 },
+             )
+         stream = client.chat.completions.create(
+             model=model_id,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": user_content},
+             ],
+             temperature=0.3,
+             stream=True,
+         )
+         for chunk in stream:
+             content = chunk.choices[0].delta.content
+             if content:
+                 yield content
+
+     else:
+         raise ValueError(f"Unknown provider: {provider!r}")
+
+
+ # ---------------------------------------------------------------------------
+ # Public query interface
+ # ---------------------------------------------------------------------------
+
+ def query(
+     input_text: str, philosopher: str = "All", llm_label: str = DEFAULT_LLM
+ ) -> dict:
+     """Non-streaming query. Returns answer + context + scores."""
+     provider, model_id = LLM_OPTIONS.get(llm_label, LLM_OPTIONS[DEFAULT_LLM])
+     docs, scores = retrieve_docs(input_text, philosopher)
+     context_str = "\n\n".join(d.page_content for d in docs)
+     answer = _call_llm(provider, model_id, context_str, input_text)
+     return {"answer": answer, "context": docs, "scores": scores}
+
+
+ # ---------------------------------------------------------------------------
+ # UMAP embedding visualization
+ # ---------------------------------------------------------------------------
+
+ def get_umap_data() -> dict | None:
+     """Compute 2D UMAP projection of all KB embeddings.
+
+     Returns dict ready for plotly, or None if unavailable.
+     """
+     import numpy as np
+
+     try:
+         import umap as umap_module  # type: ignore
+     except ImportError:
+         return None
+
+     if not vectorstore_exists():
+         return None
+
+     result = _get_vectorstore().get(include=["embeddings", "metadatas", "documents"])
+     embeddings_raw = result.get("embeddings")
+     if embeddings_raw is None or len(embeddings_raw) == 0:
+         return None
+
+     embeddings = np.array(embeddings_raw)
+     reducer = umap_module.UMAP(
+         n_components=2, random_state=42, n_neighbors=15, min_dist=0.1
+     )
+     coords = reducer.fit_transform(embeddings)
+
+     return {
+         "x": coords[:, 0].tolist(),
+         "y": coords[:, 1].tolist(),
+         "philosopher": [m.get("philosopher", "Unknown") for m in result["metadatas"]],
+         "title": [m.get("title", "Unknown") for m in result["metadatas"]],
+         "preview": [d[:120].replace("\n", " ") + "…" for d in result["documents"]],
+     }
+
+
+ # ---------------------------------------------------------------------------
+ # KB management
+ # ---------------------------------------------------------------------------
+
+ def add_to_kb(file_path: str | Path, author: str, title: str) -> int:
+     """Chunk, embed, and add a file to the vectorstore. Returns chunk count."""
+     file_path = Path(file_path)
+
+     if file_path.suffix.lower() == ".pdf":
+         from pypdf import PdfReader
+         reader = PdfReader(str(file_path))
+         # Extract each page once and skip pages with no extractable text.
+         page_texts = (page.extract_text() for page in reader.pages)
+         text = "\n\n".join(t for t in page_texts if t)
+     else:
+         text = file_path.read_text(encoding="utf-8", errors="replace")
+
+     if not text.strip():
+         raise ValueError("Could not extract text from the uploaded file.")
+
+     splitter = RecursiveCharacterTextSplitter(
+         chunk_size=CHUNK_SIZE,
+         chunk_overlap=CHUNK_OVERLAP,
+         separators=["\n\n", "\n", ". ", " ", ""],
+     )
+     docs = [
+         Document(
+             page_content=chunk,
+             metadata={
+                 "philosopher": author.strip(),
+                 "title": title.strip(),
+                 "source": f"{author.strip()} — *{title.strip()}*",
+             },
+         )
+         for chunk in splitter.split_text(text)
+     ]
+
+     _get_vectorstore().add_documents(docs)
+     _get_bm25_retriever.cache_clear()  # invalidate BM25 index after KB change
+     return len(docs)
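Review note: the merge step in `retrieve_docs` (semantic hits first, then up to two unseen BM25 hits tagged with `-1.0`) can be illustrated without Chroma or `rank-bm25`. The sketch below uses plain strings in place of `Document` objects. `merge_hybrid`, its parameters, and the sample data are hypothetical names chosen for illustration; only the ordering and de-duplication rules are taken from the diff.

```python
def merge_hybrid(semantic, bm25, k, extra=2):
    """Combine semantic (text, score) pairs with lexical BM25 hits.

    semantic: list of (text, score) with score >= 0
    bm25:     list of text, best match first
    Returns at most k + extra pairs: semantic hits by descending score,
    then BM25-only hits marked with the sentinel score -1.0.
    """
    seen = {text for text, _ in semantic}
    pairs = list(semantic)
    for text in bm25[:extra]:
        if text not in seen:  # skip passages the semantic search already found
            pairs.append((text, -1.0))
            seen.add(text)
    ranked = sorted((p for p in pairs if p[1] >= 0), key=lambda p: p[1], reverse=True)
    lexical = [p for p in pairs if p[1] < 0]
    return (ranked + lexical)[: k + extra]


semantic_hits = [("will to power", 0.62), ("eternal recurrence", 0.81)]
bm25_hits = ["eternal recurrence", "nihilism", "amor fati"]
print(merge_hybrid(semantic_hits, bm25_hits, k=2))
```

Because `-1.0` doubles as the "BM25-only" marker, this scheme relies on every semantic score being non-negative, which is what the `retrieve_docs` docstring states.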
requirements.txt ADDED
@@ -0,0 +1,17 @@
+ google-genai>=1.0.0
+ langchain>=0.3.0
+ langchain-google-genai>=2.0.0
+ langchain-huggingface>=0.1.0
+ langchain-community>=0.3.0
+ langchain-chroma>=0.1.4
+ langchain-text-splitters>=0.3.0
+ chromadb>=0.5.0
+ sentence-transformers>=3.0.0
+ gradio>=4.44.0
+ python-dotenv>=1.0.0
+ requests>=2.31.0
+ pypdf>=4.0.0
+ openai>=1.0.0
+ rank-bm25>=0.2.2
+ umap-learn>=0.5.0
+ plotly>=5.0.0