+# LLM Model Comparison
+Benchmarked on: *"What is Nietzsche's view on nihilism and the will to power?"*
+Setup: RTX 3060, EmbeddingGemma-300M on CUDA, ChromaDB (~5,700 chunks), `RETRIEVAL_K=5`.
+Rate limits for Google verified directly from **aistudio.google.com/rate-limit** (May 2026).
+Rate limits for Groq verified from **live API response headers** + console.groq.com/docs/rate-limits.
+Rate limits for OpenRouter verified from **openrouter.ai/docs/guides/routing/model-variants/free**.
+---
+## Full Comparison Table
+| Model | Provider | Latency | RPM | TPM | RPD | Notes |
+|---|---|---|---|---|---|---|
+| **Gemma 4 MoE 26B** | Google | ~65 s | 15 | ∞ | **1,500** | Best limits of any Google model; slow but deep |
+| **Gemma 4 Dense 31B** | Google | ~25 s | 15 | ∞ | **1,500** | Same limits as MoE; faster, slightly less depth |
+| **Gemini 3.1 Flash Lite** | Google | ~0.6 s | 15 | 250K | **500** | Lowest latency in this benchmark; highest RPD among Gemini Flash models |
+| **Gemini 3.5 Flash** | Google | ~0.8 s | 5 | 250K | 20 | Latest Gemini series; crisp reasoning |
+| **Gemini 3 Flash** | Google | ~9 s | 5 | 250K | 20 | Solid baseline Gemini 3 |
+| **Gemini 2.5 Flash** | Google | ~7 s | 5 | 250K | 20 | Previous generation; well-rounded |
+| **Gemini 2.5 Flash Lite** | Google | ~2 s | 10 | 250K | 20 | Fastest 2.5; same 20 RPD as 2.5 Flash |
+| **Llama 3.1 8B** | Groq | ~2 s | **14,400** | 6K | **14,400** | Highest throughput by far; limited depth |
+| **Llama 4 Scout 17B** | Groq | ~1.5 s | 1,000 | 30K | 1,000 | Fastest Groq model; highest Groq TPM |
+| **Llama 3.3 70B** | Groq | ~4.5 s | 1,000 | 12K | 1,000 | Best Groq quality; lower token quota |
+| **Qwen3 32B** | Groq | ~5 s | 1,000 | 6K | 1,000 | Chain-of-thought; deepest Groq reasoning |
+| **Nvidia Nemotron 120B** | OpenRouter | ~75 s | 20 | — | 50* | Exceptional philosophical depth; slow |
+| **OpenAI OSS 120B** | OpenRouter | ~22 s | 20 | — | 50* | Strong quality; best free OR option |
+| **DeepSeek V4 Flash** | OpenRouter | ~5 s† | 20 | — | 50* | 1M context window; fast when available |
+| **Llama 3.3 70B** | OpenRouter | ~4 s† | 20 | — | 50* | Same weights as Groq; use Groq instead |
+| **Qwen3 Next 80B** | OpenRouter | ~8 s† | 20 | — | 50* | Strong reasoning; frequently throttled |
+| **Gemma 4 MoE 26B** | OpenRouter | ~5 s† | 20 | — | 50* | Same weights as Google version |
+*50 RPD without account credits; 1,000 RPD with $10+ credit purchase
+†Latency when not throttled; free-tier provider-side 429s are common during peak hours
+— OpenRouter does not enforce token-based limits on free models
+---
+## Rate Limit Deep-Dive
+### Google AI Studio — verified from aistudio.google.com/rate-limit
+| Model | API Model ID | RPM | TPM | RPD |
+|---|---|---|---|---|
+| Gemma 4 MoE 26B | `gemma-4-26b-a4b-it` | 15 | **Unlimited** | **1,500** |
+| Gemma 4 Dense 31B | `gemma-4-31b-it` | 15 | **Unlimited** | **1,500** |
+| Gemini 3.1 Flash Lite | `gemini-3.1-flash-lite` | 15 | 250,000 | **500** |
+| Gemini 3.5 Flash | `gemini-3.5-flash` | 5 | 250,000 | 20 |
+| Gemini 3 Flash | `gemini-3-flash-preview` | 5 | 250,000 | 20 |
+| Gemini 2.5 Flash | `gemini-2.5-flash` | 5 | 250,000 | 20 |
+| Gemini 2.5 Flash Lite | `gemini-2.5-flash-lite` | 10 | 250,000 | 20 |
+| ~~Gemini 2.5 Pro~~ | ~~`gemini-2.5-pro`~~ | 0 | 0 | 0 |
+| ~~Gemini 2.0 Flash~~ | ~~`gemini-2.0-flash`~~ | 0 | 0 | 0 |
+> **Key insight:** Gemma 4 models have *significantly better* limits than Gemini models — unlimited TPM and 1,500 RPD vs just 20 RPD for most Gemini Flash variants. Gemini 2.5 Pro and 2.0 Flash are completely locked (0/0/0) on this account's free tier.
+---
+### Groq — verified from live API headers + docs
+| Model | API Model ID | RPM | TPM | RPD | TPD |
+|---|---|---|---|---|---|
+| Llama 3.1 8B instant | `llama-3.1-8b-instant` | **14,400** | 6,000 | **14,400** | 500,000 |
+| Llama 3.3 70B versatile | `llama-3.3-70b-versatile` | 1,000 | 12,000 | 1,000 | 100,000 |
+| Llama 4 Scout 17B | `meta-llama/llama-4-scout-17b-16e-instruct` | 1,000 | 30,000 | 1,000 | 500,000 |
+| Qwen3 32B | `qwen/qwen3-32b` | 1,000 | 6,000 | 1,000 | 500,000 |
+> **Key insight:** Groq is the most generous free tier for RAG use. Llama 3.1 8B has 14,400 RPD — useful for high-volume scenarios. Note that TPM limits (6K–30K) can be a bottleneck when RAG context is large; Llama 4 Scout has the most generous TPM at 30K.
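
To make the TPM bottleneck concrete, the sketch below estimates how many requests per minute fit under both caps. The request size (five retrieved chunks of roughly 400 tokens plus about 500 tokens of prompt and answer) is an illustrative assumption, not a value measured in this benchmark, and `effective_rpm` is a hypothetical helper.

```python
def effective_rpm(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> int:
    """Requests per minute that fit under both the RPM and the TPM cap."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(rpm_limit, tpm_limit // tokens_per_request)


# Assumed request size: 5 chunks x ~400 tokens + ~500 tokens of prompt and answer.
TOKENS_PER_REQUEST = 5 * 400 + 500  # 2,500 tokens (illustrative)

# RPM and TPM values copied from the Groq table above.
for name, rpm, tpm in [
    ("llama-3.1-8b-instant", 14_400, 6_000),
    ("meta-llama/llama-4-scout-17b-16e-instruct", 1_000, 30_000),
]:
    print(name, effective_rpm(rpm, tpm, TOKENS_PER_REQUEST))
```

Under these assumptions a 6K-TPM model sustains about two requests per minute, while Llama 4 Scout sustains about twelve, so the token cap binds long before the request cap does.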
+---
+### OpenRouter — all `:free` models share identical limits
+| Metric | Without credits | With $10+ credits |
+|---|---|---|
+| RPM | 20 | 20 |
+| RPD | **50** | 1,000 |
+| TPM / TPD | Unlimited | Unlimited |
+> **Key insight:** 50 RPD is exhausted extremely quickly — this explains the frequent 429 errors during testing. OpenRouter free tier is best for occasional access to very large models (120B+) not available elsewhere, not for regular daily use. Provider-side throttling from upstream (NVIDIA, DeepSeek, etc.) adds additional 429s beyond OpenRouter's own quota.
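
Transient provider-side 429s can often be absorbed by retrying with exponential backoff. The following is a minimal sketch, not code from this repository: `RateLimitError` is a hypothetical stand-in for whatever exception the client library raises on HTTP 429, and `call_with_backoff` is an illustrative name.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Base delays of 1 s, 2 s, 4 s, ... plus up to 25% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise AssertionError("unreachable")
```

Backoff only helps with short-lived throttling. Once the 50 RPD daily quota is spent, retries keep failing until the quota resets.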
+---
+## Provider Verdict
+| Provider | Best for | Main bottleneck |
+|---|---|---|
+| **Google (Gemma 4)** | Best free tier overall — high RPD + unlimited tokens | Slow inference (~25–65 s) |
+| **Google (Gemini 3.1 Flash Lite)** | Best speed + reasonable daily quota | 500 RPD, 250K TPM |
+| **Groq** | Fastest inference, high-volume use | TPM cap (6K–30K) limits long RAG contexts |
+| **OpenRouter** | Accessing 120B+ models for free | 50 RPD hard cap, frequent provider throttling |
+---
+## Recommendations
+| Use case | Best choice |
+|---|---|
+| Best overall (default) | **Gemma 4 MoE 26B [Google]** — best limits + quality |
+| Fastest response on Groq | **Llama 4 Scout 17B [Groq]** — ~1.5 s |
+| Fastest + high daily quota | **Gemini 3.1 Flash Lite [Google]** — 500 RPD, ~0.6 s |
+| Deepest philosophical reasoning | **Qwen3 32B [Groq]** or **Llama 3.3 70B [Groq]** |
+| Maximum context window | **DeepSeek V4 Flash [OR]** — 1M tokens |
+| Highest model quality | **Nvidia Nemotron 120B [OR]** or **OpenAI OSS 120B [OR]** |
+| High-volume / many requests/day | **Llama 3.1 8B [Groq]** — 14,400 RPD |
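
The trade-off between latency and daily quota in these recommendations can be expressed as a small selection helper. This is a sketch only: the rows are copied from the comparison table, answer quality is not modelled, and `ModelStats` and `fastest_with_quota` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelStats:
    label: str
    latency_s: float  # approximate latency from the comparison table
    rpd: int          # free-tier requests per day


# A few rows copied from the comparison table above.
MODELS = [
    ModelStats("Gemma 4 MoE 26B [Google]", 65.0, 1_500),
    ModelStats("Gemini 3.1 Flash Lite [Google]", 0.6, 500),
    ModelStats("Llama 4 Scout 17B [Groq]", 1.5, 1_000),
    ModelStats("Llama 3.1 8B [Groq]", 2.0, 14_400),
    ModelStats("Nvidia Nemotron 120B [OR]", 75.0, 50),
]


def fastest_with_quota(models: list[ModelStats], min_rpd: int) -> Optional[ModelStats]:
    """Return the lowest-latency model offering at least min_rpd requests per day."""
    eligible = [m for m in models if m.rpd >= min_rpd]
    return min(eligible, key=lambda m: m.latency_s, default=None)


choice = fastest_with_quota(MODELS, min_rpd=1_000)
print(choice.label if choice else "no model meets the quota")
```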
+---
+## Running the benchmark
+```bash
+python test_models.py
+```
+Requires `.env` with at least one key:
+```
+GOOGLE_API_KEY=...      # aistudio.google.com
+GROQ_API_KEY=...        # console.groq.com
+OPENROUTER_API_KEY=...  # openrouter.ai
+```
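
Before running the benchmark it can help to check which providers are usable. The sketch below assumes the variables are already present in the process environment (the project's `config.py` loads `.env` via `load_dotenv()`); `available_providers` is a hypothetical helper, not part of `test_models.py`.

```python
import os

# Environment variable per provider, as listed in the .env example above.
PROVIDER_ENV = {
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def available_providers(env=None) -> list[str]:
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [p for p, var in PROVIDER_ENV.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = available_providers()
    print("Usable providers:", ", ".join(found) or "none (set at least one key)")
```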

README.md CHANGED

@@ -1,13 +1,130 @@
 ---
 title: Philosopher Chat
-emoji: 📈
-colorFrom: pink
-colorTo: yellow
 sdk: gradio
 sdk_version: 6.15.1
-python_version: '3.13'
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: Philosopher Chat
+emoji: 🏛️
+colorFrom: purple
+colorTo: indigo
 sdk: gradio
 sdk_version: 6.15.1
 app_file: app.py
 pinned: false
+license: mit
 ---
+# Philosopher Chat
+A RAG (Retrieval-Augmented Generation) chatbot grounded in Western philosophical primary texts.
+Ask questions about nihilism, existentialism, epistemology, ethics, and more — answers are
+cited directly from 12 primary texts (~5,700 chunks).
+**Live demo:** [fikri0o0/philosopher-chat on HuggingFace Spaces](https://huggingface.co/spaces/fikri0o0/philosopher-chat)
+---
+## Features
+| Feature | Detail |
+|---|---|
+| **Hybrid RAG** | BM25 + semantic cosine similarity ensemble |
+| **Streaming** | Token-by-token via Google / Groq / OpenRouter |
+| **17 LLMs** | Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron — all free tier |
+| **Think blocks** | Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought |
+| **UMAP viz** | 2D projection of all 5,700+ embeddings coloured by philosopher |
+| **Model comparison** | Side-by-side latency + quality comparison across any two models |
+| **Extendable KB** | Upload your own PDF/TXT to add new philosophers |
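
The hybrid retrieval row combines a BM25 ranking with a semantic ranking, but this diff does not show how the two are merged. One common approach is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration, so the actual `rag_chain.py` may work differently, and `rrf_fuse` and the chunk ids are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); ties are broken by id for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


# Hypothetical chunk ids returned by each retriever, best match first.
bm25_hits = ["zarathustra-12", "genealogy-3", "republic-7"]
semantic_hits = ["genealogy-3", "beyond-good-9", "zarathustra-12"]
print(rrf_fuse([bm25_hits, semantic_hits], top_n=3))
```

A chunk ranked highly by both retrievers rises to the top, while a chunk found by only one still survives with a lower fused score.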
+---
+## Knowledge Base
+| Philosopher | Works |
+|---|---|
+| Nietzsche | *Thus Spoke Zarathustra*, *Beyond Good and Evil*, *On the Genealogy of Morality*, *The Birth of Tragedy* |
+| Schopenhauer | *Essays of Arthur Schopenhauer* |
+| Hume | *An Enquiry Concerning Human Understanding* |
+| Russell | *The Problems of Philosophy* |
+| Marcus Aurelius | *Meditations* |
+| Plato | *The Republic* |
+| Mill | *Utilitarianism* |
+| Epictetus | *The Enchiridion* |
+| Kant | *Fundamental Principles of the Metaphysic of Morals* |
+All texts are public domain, sourced from [Project Gutenberg](https://www.gutenberg.org).
+---
+## Tech Stack
+| Layer | Tool |
+|---|---|
+| LLM routing | 17 models via Google AI Studio, Groq, OpenRouter (all free tier) |
+| Embeddings | `google/embeddinggemma-300m` (HuggingFace, 768-dim) |
+| Retrieval | Hybrid BM25 + ChromaDB semantic search |
+| RAG Framework | LangChain LCEL (direct composition, no legacy chain classes) |
+| UI | Gradio 6 |
+| Deployment | HuggingFace Spaces |
+---
+## Local Setup
+### 1. Clone and install
+```bash
+git clone https://github.com/Fikri645/philosopher-chat
+cd philosopher-chat
+pip install -r requirements.txt
+```
+### 2. Set up API keys
+```bash
+# Create .env with your keys:
+GOOGLE_API_KEY=...       # https://ai.google.dev  (free)
+GROQ_API_KEY=...         # https://console.groq.com  (free)
+OPENROUTER_API_KEY=...   # https://openrouter.ai  (free)
+HF_TOKEN=...             # https://huggingface.co/settings/tokens  (for gated EmbeddingGemma)
+```
+### 3. Build the vectorstore (run once)
+```bash
+python ingest.py
+```
+Downloads 12 texts from Project Gutenberg, chunks them, embeds with EmbeddingGemma-300M,
+and persists to `vectorstore/`. Takes ~5–10 min on first run (model download + embedding).
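
The chunking step can be pictured as a sliding character window with overlap. This is a minimal sketch under assumed parameters: `ingest.py` is not part of this diff, so its real splitter, chunk size, and overlap may differ, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "God is dead. God remains dead. And we have killed him."
for piece in chunk_text(sample, chunk_size=24, overlap=8):
    print(repr(piece))
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of storing some text twice.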
+### 4. Run the app
+```bash
+python app.py
+```
+Open http://localhost:7860 in your browser.
+---
+## Deploying to HuggingFace Spaces
+1. Fork or push to a new Space (SDK: **Gradio**)
+2. In **Space Settings → Variables and Secrets**, add:
+   - `GOOGLE_API_KEY`
+   - `GROQ_API_KEY`
+   - `OPENROUTER_API_KEY`
+   - `HF_TOKEN` (your HF token — needed to download the gated EmbeddingGemma model)
+3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.
+---
+## Project Structure
+```
+philosopher-chat/
+├── app.py              ← Gradio UI + event handlers
+├── rag_chain.py        ← LangChain RAG pipeline (retrieval + LLM routing)
+├── ingest.py           ← Data ingestion from Project Gutenberg
+├── config.py           ← LLM options, embedding model, RAG parameters
+├── requirements.txt
+├── .gitignore
+└── README.md
+```

app.py ADDED

	@@ -0,0 +1,522 @@

+import re
+import time
+import gradio as gr
+import plotly.express as px
+import pandas as pd
+from rag_chain import (
+    retrieve_docs, stream_llm, query, add_to_kb, vectorstore_exists,
+    get_all_philosophers, get_kb_stats, get_umap_data,
+)
+from config import LLM_OPTIONS, DEFAULT_LLM, EMBEDDING_OPTIONS, DEFAULT_EMBEDDING
+# ---------------------------------------------------------------------------
+# Display helpers
+# ---------------------------------------------------------------------------
+_PROVIDER_COLOR = {
+    "Google": "#4285F4",
+    "Groq":   "#FF4B36",
+    "OpenRouter": "#6366F1",
+}
+_COMPARE_DEFAULT_B = "Llama 4 Scout 17B  [Groq]"
+_THINK_STYLE = (
+    "color:var(--body-text-color-subdued);font-size:0.88em;"
+    "border-left:3px solid var(--border-color-primary);padding-left:12px;margin:6px 0"
+)
+_SUMMARY_STYLE = (
+    "cursor:pointer;color:var(--body-text-color-subdued);"
+    "font-style:italic;user-select:none"
+)
+def _format_think_blocks(text: str) -> str:
+    """Render <think>…</think> as collapsible, muted sections.
+    Mid-stream (</think> not yet seen): open <details> showing live reasoning.
+    Complete block: closed <details> with 'click to expand' label.
+    """
+    if "<think>" not in text:
+        return text
+    if "</think>" not in text:
+        # Partial — think block still streaming
+        idx = text.index("<think>")
+        pre, thinking = text[:idx], text[idx + len("<think>"):]
+        return (
+            pre
+            + f'<details open><summary style="{_SUMMARY_STYLE}">🤔 Thinking…</summary>'
+            + f'<div style="{_THINK_STYLE}">{thinking}</div></details>'
+        )
+    def _wrap(m: re.Match) -> str:
+        content = m.group(1).strip()
+        return (
+            f'<details><summary style="{_SUMMARY_STYLE}">'
+            "🤔 Chain of thought (click to expand)</summary>"
+            f'<div style="{_THINK_STYLE}">{content}</div></details>\n\n'
+        )
+    return re.sub(r"<think>(.*?)</think>", _wrap, text, flags=re.DOTALL)
+def _score_bar(score: float, width: int = 10) -> str:
+    """Render a 0-1 similarity score as a fixed-width bar of filled and empty blocks."""
+    filled = max(0, min(width, round(score * width)))
+    return "█" * filled + "░" * (width - filled)
+def _format_sources(docs: list, scores: list[float]) -> str:
+    """Build a deduplicated Markdown source list; a negative score marks a BM25-only hit."""
+    if not docs:
+        return ""
+    seen: set = set()
+    lines: list[str] = []
+    for doc, score in zip(docs, scores):
+        key = doc.metadata.get("source", "Unknown source")
+        if key not in seen:
+            seen.add(key)
+            tag = f"`{score:.2f}` " if score >= 0 else "`BM25` "
+            lines.append(f"- {tag}{key}")
+    return "\n\n---\n**Sources:**\n" + "\n".join(lines)
+def _format_retrieved_chunks(docs: list, scores: list[float]) -> str:
+    if not docs:
+        return "_No chunks retrieved._"
+    semantic_scores = [s for s in scores if s >= 0]
+    avg = sum(semantic_scores) / len(semantic_scores) if semantic_scores else 0.0
+    has_bm25 = any(s < 0 for s in scores)
+    method = "Hybrid BM25 + Semantic" if has_bm25 else "Semantic"
+    lines = [
+        f"**{len(docs)} chunks** &nbsp;·&nbsp; {method}"
+        f" &nbsp;·&nbsp; avg similarity: **{avg:.3f}**\n"
+    ]
+    for i, (doc, score) in enumerate(zip(docs, scores), 1):
+        phil  = doc.metadata.get("philosopher", "?")
+        title = doc.metadata.get("title", "?")
+        if score >= 0:
+            tag = f"`{score:.3f}` {_score_bar(score)}"
+        else:
+            tag = "`BM25 ` ──────────"
+        text = doc.page_content[:200].replace("\n", " ").strip()
+        lines.append(
+            f"**{i}.** {tag} &nbsp; *{phil}* · {title}  \n"
+            f"&nbsp;&nbsp;&nbsp;&nbsp;*\"{text}...\"*\n"
+        )
+    return "\n".join(lines)
+def _format_metrics(
+    retrieve_s: float, llm_s: float, n_docs: int, n_sources: int
+) -> str:
+    return (
+        f"⏱ &nbsp;Retrieval **{retrieve_s:.2f}s** &nbsp;·&nbsp; "
+        f"LLM **{llm_s:.2f}s** &nbsp;·&nbsp; "
+        f"Total **{retrieve_s + llm_s:.2f}s** &nbsp;·&nbsp; "
+        f"**{n_docs}** chunks from **{n_sources}** source(s)"
+    )
+def _kb_markdown() -> str:
+    stats = get_kb_stats()
+    if not stats["total"]:
+        return "_Knowledge base is empty._"
+    lines = []
+    for phil in sorted(stats["sources"]):
+        lines.append(f"**{phil}**")
+        for title in sorted(stats["sources"][phil]):
+            lines.append(f"&nbsp;&nbsp;- *{title}*")
+    lines.append(f"\n_{stats['total']:,} total chunks_")
+    return "\n\n".join(lines)
+# ---------------------------------------------------------------------------
+# Event handlers
+# ---------------------------------------------------------------------------
+def respond_stream(message: str, history: list, philosopher: str, llm_label: str):
+    if not message.strip():
+        yield history, "", gr.update(), gr.update()
+        return
+    if not vectorstore_exists():
+        err = "Knowledge base not found. Run `python ingest.py` first."
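+        # Keep the user's message visible alongside the error.
+        yield history + [
+            {"role": "user", "content": message},
+            {"role": "assistant", "content": err},
+        ], "", gr.update(), gr.update()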
+        return
+    # — Retrieval (fast, happens before streaming) —
+    t0 = time.perf_counter()
+    docs, scores = retrieve_docs(message, philosopher)
+    retrieve_time = time.perf_counter() - t0
+    context_str = "\n\n".join(d.page_content for d in docs)
+    chunks_md = _format_retrieved_chunks(docs, scores)
+    history = history + [
+        {"role": "user", "content": message},
+        {
+            "role": "assistant",
+            "content": (
+                "<em style='color:var(--body-text-color-subdued)'>"
+                "⏳ Retrieving context and generating response…"
+                "</em>"
+            ),
+        },
+    ]
+    # Show user bubble + loading message immediately
+    yield history, "", gr.update(value=chunks_md), gr.update()
+    provider, model_id = LLM_OPTIONS.get(llm_label, LLM_OPTIONS[DEFAULT_LLM])
+    t1 = time.perf_counter()
+    full_response = ""
+    try:
+        for text_chunk in stream_llm(provider, model_id, context_str, message):
+            full_response += text_chunk
+            history[-1]["content"] = _format_think_blocks(full_response)
+            yield history, "", gr.update(value=chunks_md), gr.update()
+        llm_time = time.perf_counter() - t1
+        unique_sources = len({d.metadata.get("source") for d in docs})
+        metrics_md = _format_metrics(retrieve_time, llm_time, len(docs), unique_sources)
+        history[-1]["content"] = (
+            _format_think_blocks(full_response) + _format_sources(docs, scores)
+        )
+        yield history, "", gr.update(value=chunks_md), gr.update(value=metrics_md)
+    except Exception as exc:
+        history[-1]["content"] = f"⚠️ **Error:** {exc}"
+        yield history, "", gr.update(value=chunks_md), gr.update()
+def compare_respond(message: str, philosopher: str, llm_a: str, llm_b: str):
+    if not message.strip():
+        return "Enter a question above.", "", "Enter a question above.", ""
+    if not vectorstore_exists():
+        msg = "Knowledge base not found."
+        return msg, "", msg, ""
+    def _run(llm_label: str) -> tuple[str, str]:
+        t0 = time.perf_counter()
+        result = query(message, philosopher, llm_label)
+        elapsed = time.perf_counter() - t0
+        n_src = len({d.metadata.get("source") for d in result["context"]})
+        sem_scores = [s for s in result["scores"] if s >= 0]
+        avg = sum(sem_scores) / len(sem_scores) if sem_scores else 0.0
+        metrics = (
+            f"⏱ **{elapsed:.2f}s** &nbsp;·&nbsp; "
+            f"**{len(result['context'])}** chunks from **{n_src}** source(s)"
+            f" &nbsp;·&nbsp; avg similarity **{avg:.3f}**"
+        )
+        return result["answer"], metrics
+    ans_a, met_a = _run(llm_a)
+    ans_b, met_b = _run(llm_b)
+    return ans_a, met_a, ans_b, met_b
+def upload_source(file, author: str, title: str):
+    if file is None:
+        return gr.update(value="Please upload a file first."), gr.update()
+    if not author.strip() or not title.strip():
+        return gr.update(value="Please fill in both Author and Title."), gr.update()
+    try:
+        n = add_to_kb(file, author.strip(), title.strip())
+        msg = f"Added {n:,} chunks from *{title}* by {author}."
+    except Exception as e:
+        msg = f"Error: {e}"
+    return (
+        gr.update(value=msg),
+        gr.update(choices=get_all_philosophers(), value="All"),
+    )
+def refresh_kb():
+    return gr.update(value=_kb_markdown())
+def build_umap_plot():
+    data = get_umap_data()
+    if data is None:
+        return None
+    df = pd.DataFrame(data)
+    fig = px.scatter(
+        df, x="x", y="y",
+        color="philosopher",
+        hover_data={"title": True, "preview": True, "x": False, "y": False},
+        title="Knowledge Base — Semantic Embedding Space (UMAP 2D)",
+        labels={"x": "UMAP-1", "y": "UMAP-2"},
+        opacity=0.75,
+        template="plotly_dark",
+        color_discrete_sequence=px.colors.qualitative.Bold,
+    )
+    fig.update_traces(marker=dict(size=5))
+    fig.update_layout(
+        height=540,
+        plot_bgcolor="rgba(0,0,0,0)",
+        paper_bgcolor="rgba(0,0,0,0)",
+        title_font=dict(size=14),
+        font=dict(color="rgba(220,220,220,0.9)"),
+        legend=dict(
+            title_text="",
+            yanchor="top", y=0.99, xanchor="left", x=0.01,
+            bgcolor="rgba(20,20,20,0.5)",
+            bordercolor="rgba(255,255,255,0.12)",
+            borderwidth=1,
+        ),
+        xaxis=dict(gridcolor="rgba(255,255,255,0.07)", zeroline=False),
+        yaxis=dict(gridcolor="rgba(255,255,255,0.07)", zeroline=False),
+        margin=dict(l=40, r=20, t=48, b=36),
+    )
+    return fig
+# ---------------------------------------------------------------------------
+# UI
+# ---------------------------------------------------------------------------
+EXAMPLE_QUESTIONS = [
+    "What is Nietzsche's view on nihilism and the death of God?",
+    "How does Schopenhauer view suffering and the will to live?",
+    "What does Hume say about causality and the limits of reason?",
+    "Can we have certain knowledge of the external world?",
+    "Is morality objective or invented?",
+    "Explain the concept of Eternal Return",
+    "How does Marcus Aurelius advise dealing with suffering?",
+    "What is Plato's ideal society in The Republic?",
+    "Compare Schopenhauer and Nietzsche on the will",
+    "What is Kant's categorical imperative?",
+    "How does Mill justify utilitarianism?",
+    "What does Epictetus say about what is in our control?",
+]
+CSS = """
+footer { display: none !important; }
+.section-label {
+    font-size: 0.78rem; font-weight: 700; letter-spacing: 0.07em;
+    text-transform: uppercase; color: var(--body-text-color-subdued);
+    margin-bottom: 2px;
+}
+.metric-bar p { font-size: 0.82rem; color: var(--body-text-color-subdued); margin: 4px 0; }
+.status-box textarea { font-size: 0.82rem !important; }
+/* Fix double scrollbar: prevent inner message wrappers from scrolling */
+.chatbot .overflow-y-auto { scrollbar-width: thin; scrollbar-color: var(--border-color-primary) transparent; }
+.chatbot .message-wrap { overflow: visible !important; }
+.chatbot .message-wrap > div { overflow: visible !important; max-height: none !important; }
+/* Prevent long markdown lines from adding a horizontal inner scroll */
+.chatbot .prose { overflow-x: hidden !important; overflow-wrap: break-word; word-break: break-word; }
+"""
+with gr.Blocks(title="Philosopher Chat") as demo:
+    gr.Markdown(
+        """
+# 📚 Philosopher Chat
+**RAG chatbot grounded in Western philosophical primary texts**
+Hybrid BM25 + Semantic retrieval &nbsp;·&nbsp; Real-time streaming
+&nbsp;·&nbsp; Multi-provider LLM routing &nbsp;·&nbsp; 12 primary texts · ~5,700 chunks
+        """
+    )
+    with gr.Tabs():
+        # ── Tab 1 ─ Chat ─────────────────────────────────────────────────
+        with gr.Tab("💬 Chat"):
+            with gr.Row(equal_height=False):
+                # Left: chat area
+                with gr.Column(scale=3):
+                    chatbot_ui = gr.Chatbot(
+                        height=480,
+                        show_label=False,
+                        placeholder="*Ask a philosophical question to get started...*",
+                    )
+                    msg_input = gr.Textbox(
+                        placeholder="Ask a philosophical question…",
+                        show_label=False,
+                        autofocus=True,
+                        submit_btn=True,
+                    )
+                    metrics_display = gr.Markdown(
+                        value="", elem_classes="metric-bar"
+                    )
+                    with gr.Accordion("📄 Retrieved Chunks & Scores", open=False):
+                        retrieved_display = gr.Markdown(
+                            value="_Submit a question to see retrieved context._"
+                        )
+                    with gr.Accordion("💡 Example Questions", open=False):
+                        gr.Examples(
+                            examples=[[q] for q in EXAMPLE_QUESTIONS],
+                            inputs=[msg_input],
+                            label=None,
+                        )
+                # Right: settings sidebar
+                with gr.Column(scale=1, min_width=240):
+                    with gr.Group():
+                        gr.Markdown("**⚙️ Chat Settings**", elem_classes="section-label")
+                        llm_dropdown = gr.Dropdown(
+                            choices=list(LLM_OPTIONS.keys()),
+                            value=DEFAULT_LLM,
+                            label="LLM Model",
+                        )
+                        embedding_display = gr.Dropdown(
+                            choices=list(EMBEDDING_OPTIONS.keys()),
+                            value=DEFAULT_EMBEDDING,
+                            label="Embedding Model",
+                            info="Changing this requires rebuilding the index (ingest.py)",
+                            interactive=False,
+                        )
+                        philosopher_filter = gr.Dropdown(
+                            choices=get_all_philosophers(),
+                            value="All",
+                            label="Filter by Philosopher",
+                        )
+                    with gr.Group():
+                        gr.Markdown("**ℹ️ Stack**", elem_classes="section-label")
+                        gr.Markdown(
+                            "- Retrieval: **Hybrid BM25 + Semantic**\n"
+                            "- Embeddings: **EmbeddingGemma-300M**\n"
+                            "- Vector DB: **ChromaDB**\n"
+                            "- Framework: **LangChain LCEL**\n"
+                            "- UI: **Gradio 6**"
+                        )
+        # ── Tab 2 ─ Compare Models ───────────────────────────────────────
+        with gr.Tab("⚖️ Compare Models"):
+            gr.Markdown(
+                "Run the same question through two models and compare quality, "
+                "latency, and retrieval coverage side by side."
+            )
+            with gr.Row():
+                compare_input = gr.Textbox(
+                    label="Question",
+                    placeholder="Ask a philosophical question…",
+                    scale=4,
+                )
+                compare_philosopher = gr.Dropdown(
+                    choices=get_all_philosophers(),
+                    value="All",
+                    label="Philosopher Filter",
+                    scale=1,
+                )
+            compare_btn = gr.Button("▶ Compare", variant="primary")
+            with gr.Row():
+                with gr.Column():
+                    model_a = gr.Dropdown(
+                        choices=list(LLM_OPTIONS.keys()),
+                        value=DEFAULT_LLM,
+                        label="Model A",
+                    )
+                    response_a = gr.Markdown(label="Response A")
+                    metrics_a  = gr.Markdown(elem_classes="metric-bar")
+                with gr.Column():
+                    model_b = gr.Dropdown(
+                        choices=list(LLM_OPTIONS.keys()),
+                        value=_COMPARE_DEFAULT_B,
+                        label="Model B",
+                    )
+                    response_b = gr.Markdown(label="Response B")
+                    metrics_b  = gr.Markdown(elem_classes="metric-bar")
+        # ── Tab 3 ─ Knowledge Base ───────────────────────────────────────
+        with gr.Tab("🗺️ Knowledge Base"):
+            with gr.Row(equal_height=False):
+                # Left: UMAP visualization
+                with gr.Column(scale=2):
+                    gr.Markdown(
+                        "**Semantic Embedding Space**  \n"
+                        "Each point is one text chunk. Clusters indicate semantic similarity — "
+                        "nearby chunks share philosophical themes regardless of source."
+                    )
+                    umap_plot = gr.Plot()
+                    umap_btn  = gr.Button(
+                        "Generate Embedding Visualization", variant="secondary"
+                    )
+                    gr.Markdown(
+                        "_UMAP projects ~5,700 × 768-dim embeddings to 2D. "
+                        "First run takes ~1–2 min on CPU._"
+                    )
+                # Right: stats + upload
+                with gr.Column(scale=1, min_width=280):
+                    with gr.Group():
+                        with gr.Row():
+                            gr.Markdown(
+                                "**📚 Knowledge Base**", elem_classes="section-label"
+                            )
+                            refresh_kb_btn = gr.Button("↻", size="sm", min_width=32)
+                        kb_display = gr.Markdown(_kb_markdown())
+                    with gr.Group():
+                        gr.Markdown(
+                            "**📤 Add Source**", elem_classes="section-label"
+                        )
+                        file_upload = gr.File(
+                            label="Upload PDF or TXT",
+                            file_types=[".pdf", ".txt"],
+                        )
+                        with gr.Row():
+                            author_input = gr.Textbox(label="Author", scale=1)
+                            title_input  = gr.Textbox(label="Title",  scale=1)
+                        upload_btn = gr.Button(
+                            "Add to Knowledge Base", variant="secondary", size="sm"
+                        )
+                        upload_status = gr.Textbox(
+                            show_label=False,
+                            interactive=False,
+                            placeholder="Upload status will appear here…",
+                            elem_classes="status-box",
+                        )
+    # ── Event wiring ─────────────────────────────────────────────────────
+    msg_input.submit(
+        respond_stream,
+        inputs=[msg_input, chatbot_ui, philosopher_filter, llm_dropdown],
+        outputs=[chatbot_ui, msg_input, retrieved_display, metrics_display],
+    )
+    compare_btn.click(
+        compare_respond,
+        inputs=[compare_input, compare_philosopher, model_a, model_b],
+        outputs=[response_a, metrics_a, response_b, metrics_b],
+    )
+    umap_btn.click(build_umap_plot, outputs=umap_plot)
+    refresh_kb_btn.click(refresh_kb, outputs=kb_display)
+    upload_btn.click(
+        upload_source,
+        inputs=[file_upload, author_input, title_input],
+        outputs=[upload_status, philosopher_filter],
+    ).then(refresh_kb, outputs=kb_display)
+def _auto_ingest() -> None:
+    """Build the vectorstore automatically on first Spaces run."""
+    if not vectorstore_exists():
+        print("[startup] Vectorstore missing — running initial ingest (this takes ~10 min)…")
+        try:
+            import ingest
+            ingest.main()
+            print("[startup] Ingest complete.")
+        except Exception as exc:
+            print(f"[startup] Ingest failed: {exc}")
+_auto_ingest()
+if __name__ == "__main__":
+    demo.launch(css=CSS)
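
The think-block handling in `_format_think_blocks` hinges on telling a finished `<think>…</think>` block apart from one that is still streaming. The sketch below isolates that decision without the HTML rendering. `split_thinking` is a hypothetical helper written for illustration, not a function from `app.py`.

```python
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_thinking(text: str) -> tuple[str, str, bool]:
    """Return (reasoning, answer, still_thinking) for a possibly partial stream."""
    if "<think>" not in text:
        return "", text, False
    if "</think>" not in text:
        # Closing tag not streamed yet: everything after <think> is reasoning.
        pre, _, thinking = text.partition("<think>")
        return thinking, pre, True
    reasoning = "\n".join(m.strip() for m in _THINK_RE.findall(text))
    answer = _THINK_RE.sub("", text).strip()
    return reasoning, answer, False


# Simulate a stream arriving in three pieces.
buffer = ""
for piece in ["<think>Nietzsche distinguishes", " two nihilisms.</think>", "He sees it as..."]:
    buffer += piece
    print(split_thinking(buffer))
```

Re-parsing the whole buffer on every chunk mirrors how `respond_stream` re-renders the accumulated response each time a new piece arrives.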

config.py ADDED

	@@ -0,0 +1,89 @@

+import os
+import torch
+from pathlib import Path
+from dotenv import load_dotenv
+load_dotenv()
+PROJECT_ROOT = Path(__file__).parent
+DATA_DIR = PROJECT_ROOT / "data" / "texts"
+VECTORSTORE_DIR = PROJECT_ROOT / "vectorstore"
+GOOGLE_API_KEY      = os.getenv("GOOGLE_API_KEY", "")
+GROQ_API_KEY        = os.getenv("GROQ_API_KEY", "")
+OPENROUTER_API_KEY  = os.getenv("OPENROUTER_API_KEY", "")
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+# ---------------------------------------------------------------------------
+# LLM options — (provider, model_id)
+# Providers: "google" | "groq" | "openrouter"
+# ---------------------------------------------------------------------------
+LLM_OPTIONS: dict[str, tuple[str, str]] = {
+    # ── Google AI Studio (free tier) ──────────────────────────────────────
+    # Limits verified from aistudio.google.com/rate-limit (2026-05)
+    "Gemma 4 MoE 26B  [Google]":         ("google",      "gemma-4-26b-a4b-it"),    # 15 RPM | ∞ TPM | 1500 RPD
+    "Gemma 4 Dense 31B  [Google]":       ("google",      "gemma-4-31b-it"),         # 15 RPM | ∞ TPM | 1500 RPD
+    "Gemini 3.1 Flash Lite  [Google]":   ("google",      "gemini-3.1-flash-lite"),  # 15 RPM | 250K TPM | 500 RPD
+    "Gemini 3.5 Flash  [Google]":        ("google",      "gemini-3.5-flash"),       #  5 RPM | 250K TPM |  20 RPD
+    "Gemini 3 Flash  [Google]":          ("google",      "gemini-3-flash-preview"), #  5 RPM | 250K TPM |  20 RPD
+    "Gemini 2.5 Flash  [Google]":        ("google",      "gemini-2.5-flash"),       #  5 RPM | 250K TPM |  20 RPD
+    "Gemini 2.5 Flash Lite  [Google]":   ("google",      "gemini-2.5-flash-lite"),  # 10 RPM | 250K TPM |  20 RPD
+    # ── Groq (free tier, very fast LPU inference) ─────────────────────────
+    "Llama 3.3 70B  [Groq]":             ("groq",        "llama-3.3-70b-versatile"),
+    "Llama 4 Scout 17B  [Groq]":         ("groq",        "meta-llama/llama-4-scout-17b-16e-instruct"),
+    "Qwen3 32B  [Groq]":                 ("groq",        "qwen/qwen3-32b"),
+    "Llama 3.1 8B  [Groq]":              ("groq",        "llama-3.1-8b-instant"),
+    # ── OpenRouter free models (:free = no cost, rate-limited) ────────────
+    "Nvidia Nemotron 120B  [OpenRouter]":("openrouter",  "nvidia/nemotron-3-super-120b-a12b:free"),
+    "OpenAI OSS 120B  [OpenRouter]":     ("openrouter",  "openai/gpt-oss-120b:free"),
+    "DeepSeek V4 Flash  [OpenRouter]":   ("openrouter",  "deepseek/deepseek-v4-flash:free"),
+    "Llama 3.3 70B  [OpenRouter]":       ("openrouter",  "meta-llama/llama-3.3-70b-instruct:free"),
+    "Qwen3 Next 80B  [OpenRouter]":      ("openrouter",  "qwen/qwen3-next-80b-a3b-instruct:free"),
+    "Gemma 4 MoE 26B  [OpenRouter]":     ("openrouter",  "google/gemma-4-26b-a4b-it:free"),
+}
+DEFAULT_LLM = "Gemma 4 MoE 26B  [Google]"
+PROVIDER_KEYS = {
+    "google":     ("GOOGLE_API_KEY",     "ai.google.dev"),
+    "groq":       ("GROQ_API_KEY",       "console.groq.com"),
+    "openrouter": ("OPENROUTER_API_KEY", "openrouter.ai"),
+}
+# ---------------------------------------------------------------------------
+# Embedding
+# ---------------------------------------------------------------------------
+EMBEDDING_OPTIONS = {
+    "EmbeddingGemma 300M (active)": "google/embeddinggemma-300m",
+    "BGE Large EN v1.5":            "BAAI/bge-large-en-v1.5",
+    "Multilingual E5 Large":        "intfloat/multilingual-e5-large",
+}
+DEFAULT_EMBEDDING = "EmbeddingGemma 300M (active)"
+EMBEDDING_MODEL   = EMBEDDING_OPTIONS[DEFAULT_EMBEDDING]
+# ---------------------------------------------------------------------------
+# RAG
+# ---------------------------------------------------------------------------
+CHUNK_SIZE        = 1000
+CHUNK_OVERLAP     = 150
+RETRIEVAL_K       = 6       # slightly more to absorb BM25 extras
+USE_HYBRID_SEARCH = True    # BM25 + semantic ensemble
+# ---------------------------------------------------------------------------
+# Knowledge base sources (Project Gutenberg)
+# ---------------------------------------------------------------------------
+SOURCES = [
+    {"philosopher": "Nietzsche",       "title": "Thus Spoke Zarathustra",                              "gutenberg_id": 1998},
+    {"philosopher": "Nietzsche",       "title": "Beyond Good and Evil",                                "gutenberg_id": 4363},
+    {"philosopher": "Nietzsche",       "title": "On the Genealogy of Morality",                        "gutenberg_id": 52319},
+    {"philosopher": "Nietzsche",       "title": "The Birth of Tragedy",                                "gutenberg_id": 51356},
+    {"philosopher": "Schopenhauer",    "title": "Essays of Arthur Schopenhauer",                       "gutenberg_id": 11945},
+    {"philosopher": "Hume",            "title": "An Enquiry Concerning Human Understanding",           "gutenberg_id": 9662},
+    {"philosopher": "Russell",         "title": "The Problems of Philosophy",                          "gutenberg_id": 5827},
+    {"philosopher": "Marcus Aurelius", "title": "Meditations",                                         "gutenberg_id": 2680},
+    {"philosopher": "Plato",           "title": "The Republic",                                        "gutenberg_id": 1497},
+    {"philosopher": "Mill",            "title": "Utilitarianism",                                      "gutenberg_id": 11224},
+    {"philosopher": "Epictetus",       "title": "The Enchiridion",                                     "gutenberg_id": 45109},
+    {"philosopher": "Kant",            "title": "Fundamental Principles of the Metaphysic of Morals", "gutenberg_id": 5682},
+]
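
Every entry in `LLM_OPTIONS` depends on one of the three provider keys, and an unset key only surfaces as a `ValueError` at call time. The sketch below shows one way to check up front which labels are usable. `missing_provider_keys` and `usable_labels` are hypothetical helpers that do not exist in the repository, and `PROVIDER_KEYS` is copied here only to keep the example self-contained.

```python
import os
from typing import Mapping

# Copy of the PROVIDER_KEYS mapping from config.py, repeated so the sketch runs alone.
PROVIDER_KEYS = {
    "google": ("GOOGLE_API_KEY", "ai.google.dev"),
    "groq": ("GROQ_API_KEY", "console.groq.com"),
    "openrouter": ("OPENROUTER_API_KEY", "openrouter.ai"),
}


def missing_provider_keys(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: map each provider with an unset or blank key to its signup site."""
    return {
        provider: site
        for provider, (env_var, site) in PROVIDER_KEYS.items()
        if not env.get(env_var, "").strip()
    }


def usable_labels(
    llm_options: Mapping[str, tuple[str, str]],
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Hypothetical helper: keep only the labels whose provider has a key configured."""
    missing = missing_provider_keys(env)
    return [
        label
        for label, (provider, _model_id) in llm_options.items()
        if provider not in missing
    ]
```

A UI could use the result to hide or grey out models whose provider key is absent, instead of failing on the first request.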

ingest.py ADDED

	@@ -0,0 +1,157 @@

+"""
+Build or update the ChromaDB vectorstore from philosophical texts.
+    python ingest.py           # incremental: skips already-indexed sources
+    python ingest.py --rebuild # wipes and rebuilds from scratch
+"""
+import sys
+import time
+import requests
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+from langchain_core.documents import Document
+from langchain_huggingface import HuggingFaceEmbeddings
+from langchain_chroma import Chroma
+from config import (
+    DATA_DIR, VECTORSTORE_DIR,
+    EMBEDDING_MODEL, CHUNK_SIZE, CHUNK_OVERLAP, SOURCES, DEVICE
+)
+GUTENBERG_URL = "https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt"
+BATCH_SIZE = 50
+SLEEP_BETWEEN_BATCHES = 2
+def download_gutenberg(gutenberg_id: int, title: str) -> str:
+    url = GUTENBERG_URL.format(id=gutenberg_id)
+    print(f"  Downloading {url}")
+    try:
+        resp = requests.get(url, timeout=30)
+        resp.raise_for_status()
+        return resp.text
+        except requests.RequestException as e:
+        print(f"  ERROR: {e}")
+        return ""
+def strip_gutenberg_boilerplate(text: str) -> str:
+    start_markers = [
+        "*** START OF THE PROJECT GUTENBERG",
+        "***START OF THE PROJECT GUTENBERG",
+        "*** START OF THIS PROJECT GUTENBERG",
+    ]
+    end_markers = [
+        "*** END OF THE PROJECT GUTENBERG",
+        "***END OF THE PROJECT GUTENBERG",
+        "*** END OF THIS PROJECT GUTENBERG",
+    ]
+    start_idx = 0
+    for marker in start_markers:
+        idx = text.find(marker)
+        if idx != -1:
+            start_idx = text.find("\n", idx) + 1
+            break
+    end_idx = len(text)
+    for marker in end_markers:
+        idx = text.find(marker)
+        if idx != -1:
+            end_idx = idx
+            break
+    return text[start_idx:end_idx].strip()
+def get_embeddings() -> HuggingFaceEmbeddings:
+    print(f"Loading embedding model on {DEVICE}...")
+    return HuggingFaceEmbeddings(
+        model_name=EMBEDDING_MODEL,
+        model_kwargs={"device": DEVICE},
+        encode_kwargs={"prompt_name": "document", "normalize_embeddings": True},
+        query_encode_kwargs={"prompt_name": "query", "normalize_embeddings": True},
+    )
+def get_indexed_titles(vectorstore: Chroma) -> set[str]:
+    result = vectorstore.get(include=["metadatas"])
+    return {m.get("title", "") for m in result["metadatas"]}
+def ingest_source(source: dict, vectorstore: Chroma, splitter: RecursiveCharacterTextSplitter) -> int:
+    raw = download_gutenberg(source["gutenberg_id"], source["title"])
+    if not raw:
+        return 0
+    cleaned = strip_gutenberg_boilerplate(raw)
+    # Cache locally
+    DATA_DIR.mkdir(parents=True, exist_ok=True)
+    safe_name = f"{source['philosopher']}_{source['title'][:40].replace(' ', '_')}.txt"
+    (DATA_DIR / safe_name).write_text(cleaned, encoding="utf-8")
+    chunks = splitter.split_text(cleaned)
+    docs = [
+        Document(
+            page_content=chunk,
+            metadata={
+                "philosopher": source["philosopher"],
+                "title": source["title"],
+                "source": f"{source['philosopher']} — *{source['title']}*",
+            },
+        )
+        for chunk in chunks
+    ]
+    for i in range(0, len(docs), BATCH_SIZE):
+        vectorstore.add_documents(docs[i : i + BATCH_SIZE])
+        if i + BATCH_SIZE < len(docs):
+            time.sleep(SLEEP_BETWEEN_BATCHES)
+    return len(docs)
+def main() -> None:
+    rebuild = "--rebuild" in sys.argv
+    VECTORSTORE_DIR.mkdir(parents=True, exist_ok=True)
+    embeddings = get_embeddings()
+    splitter = RecursiveCharacterTextSplitter(
+        chunk_size=CHUNK_SIZE,
+        chunk_overlap=CHUNK_OVERLAP,
+        separators=["\n\n", "\n", ". ", " ", ""],
+    )
+    if rebuild and VECTORSTORE_DIR.exists():
+        import shutil
+        shutil.rmtree(VECTORSTORE_DIR)
+        VECTORSTORE_DIR.mkdir()
+        print("Vectorstore wiped for rebuild.")
+    vectorstore = Chroma(
+        collection_name="philosophers",
+        embedding_function=embeddings,
+        persist_directory=str(VECTORSTORE_DIR),
+    )
+    already_indexed = get_indexed_titles(vectorstore) if not rebuild else set()
+    total_new = 0
+    for source in SOURCES:
+        print(f"\n[{source['philosopher']}] {source['title']}")
+        if source["title"] in already_indexed:
+            print("  SKIPPED (already indexed)")
+            continue
+        n = ingest_source(source, vectorstore, splitter)
+        if n:
+            print(f"  -> {n} chunks added")
+            total_new += n
+        time.sleep(1)
+    if total_new:
+        print(f"\nDone. {total_new} new chunks added to vectorstore.")
+    else:
+        print("\nNothing new to index.")
+if __name__ == "__main__":
+    main()
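
The marker-based stripping in `strip_gutenberg_boilerplate` is easiest to follow on a small input. The following is a minimal, standalone sketch of the same idea, not the repository's function: `strip_boilerplate` is a hypothetical name, the sample text is invented, and the sketch deliberately differs in two details (it searches for the end marker only after the start position, and it returns an empty body when the start marker is the last line).

```python
START_MARKERS = (
    "*** START OF THE PROJECT GUTENBERG",
    "***START OF THE PROJECT GUTENBERG",
    "*** START OF THIS PROJECT GUTENBERG",
)
END_MARKERS = (
    "*** END OF THE PROJECT GUTENBERG",
    "***END OF THE PROJECT GUTENBERG",
    "*** END OF THIS PROJECT GUTENBERG",
)


def strip_boilerplate(text: str) -> str:
    """Return the text between the first start marker line and the first end marker."""
    start = 0
    for marker in START_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            # Skip the rest of the marker line; the body begins on the next line.
            newline = text.find("\n", idx)
            start = len(text) if newline == -1 else newline + 1
            break
    end = len(text)
    for marker in END_MARKERS:
        idx = text.find(marker, start)
        if idx != -1:
            end = idx
            break
    return text[start:end].strip()


# Invented sample in the shape of a Project Gutenberg plain-text file.
sample = (
    "Licence header\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Chapter 1\n"
    "Body text.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Licence footer\n"
)
body = strip_boilerplate(sample)
print(body)
```

Text without any markers passes through unchanged apart from the surrounding whitespace, which matches how the original treats uploads that are not Gutenberg files.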

rag_chain.py ADDED

	@@ -0,0 +1,353 @@

+from functools import lru_cache
+from pathlib import Path
+from typing import Generator
+from google import genai
+from google.genai import types
+from langchain_huggingface import HuggingFaceEmbeddings
+from langchain_chroma import Chroma
+from langchain_core.documents import Document
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+from config import (
+    GOOGLE_API_KEY, GROQ_API_KEY, OPENROUTER_API_KEY,
+    LLM_OPTIONS, DEFAULT_LLM,
+    EMBEDDING_MODEL, VECTORSTORE_DIR, RETRIEVAL_K,
+    CHUNK_SIZE, CHUNK_OVERLAP, DEVICE, PROVIDER_KEYS,
+    USE_HYBRID_SEARCH,
+)
+SYSTEM_PROMPT = (
+    "You are a philosophical assistant with deep knowledge of Western philosophy, "
+    "particularly nihilism, absurdism, pessimism, existentialism, and epistemology. "
+    "Your answers are grounded in the primary texts provided as context.\n\n"
+    "Rules:\n"
+    "- Draw directly from the retrieved context passages.\n"
+    "- Always cite the philosopher and work "
+    "(e.g., 'As Nietzsche writes in *Thus Spoke Zarathustra*...').\n"
+    "- Be intellectually rigorous but accessible.\n"
+    "- If the context is insufficient, say so clearly.\n"
+    "- Present the philosophers' views faithfully without moralizing."
+)
+# ---------------------------------------------------------------------------
+# Cached singletons
+# ---------------------------------------------------------------------------
+@lru_cache(maxsize=1)
+def _get_genai_client() -> genai.Client:
+    return genai.Client(api_key=GOOGLE_API_KEY)
+@lru_cache(maxsize=1)
+def _get_embeddings() -> HuggingFaceEmbeddings:
+    return HuggingFaceEmbeddings(
+        model_name=EMBEDDING_MODEL,
+        model_kwargs={"device": DEVICE},
+        encode_kwargs={"prompt_name": "document", "normalize_embeddings": True},
+        query_encode_kwargs={"prompt_name": "query", "normalize_embeddings": True},
+    )
+@lru_cache(maxsize=1)
+def _get_vectorstore() -> Chroma:
+    return Chroma(
+        collection_name="philosophers",
+        embedding_function=_get_embeddings(),
+        persist_directory=str(VECTORSTORE_DIR),
+    )
+@lru_cache(maxsize=1)
+def _get_bm25_retriever():
+    """Build BM25 index over the full KB (cached after first call)."""
+    from langchain_community.retrievers import BM25Retriever  # requires rank-bm25
+    result = _get_vectorstore().get(include=["documents", "metadatas"])
+    docs = [
+        Document(page_content=d, metadata=m)
+        for d, m in zip(result["documents"], result["metadatas"])
+        if d.strip()
+    ]
+    retriever = BM25Retriever.from_documents(docs)
+    retriever.k = RETRIEVAL_K
+    return retriever
+# ---------------------------------------------------------------------------
+# Public helpers
+# ---------------------------------------------------------------------------
+def vectorstore_exists() -> bool:
+    return (VECTORSTORE_DIR / "chroma.sqlite3").exists()
+def get_all_philosophers() -> list[str]:
+    if not vectorstore_exists():
+        return ["All"]
+    result = _get_vectorstore().get(include=["metadatas"])
+    names = sorted({m["philosopher"] for m in result["metadatas"] if "philosopher" in m})
+    return ["All"] + names
+def get_kb_stats() -> dict:
+    if not vectorstore_exists():
+        return {"total": 0, "sources": {}}
+    result = _get_vectorstore().get(include=["metadatas"])
+    sources: dict[str, set] = {}
+    for m in result["metadatas"]:
+        phil = m.get("philosopher", "Unknown")
+        title = m.get("title", "Unknown")
+        sources.setdefault(phil, set()).add(title)
+    return {"total": len(result["ids"]), "sources": sources}
+# ---------------------------------------------------------------------------
+# Retrieval
+# ---------------------------------------------------------------------------
+def retrieve_docs(
+    input_text: str, philosopher: str = "All"
+) -> tuple[list[Document], list[float]]:
+    """Hybrid BM25 + semantic retrieval.
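+    Returns (docs, scores) with at most RETRIEVAL_K + 2 entries each; scores are
+    the vectorstore's relevance scores, nominally in [0, 1].
+    BM25 results are merged only when no philosopher filter is applied.
+    BM25-only results are tagged with score -1.0 (no embedding similarity).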
+    """
+    vectorstore = _get_vectorstore()
+    search_kwargs: dict = {"k": RETRIEVAL_K}
+    if philosopher != "All":
+        search_kwargs["filter"] = {"philosopher": philosopher}
+    pairs = vectorstore.similarity_search_with_relevance_scores(input_text, **search_kwargs)
+    if USE_HYBRID_SEARCH and philosopher == "All":
+        try:
+            bm25_docs = _get_bm25_retriever().invoke(input_text)
+            seen = {doc.page_content for doc, _ in pairs}
+            for doc in bm25_docs[:2]:
+                if doc.page_content not in seen:
+                    pairs.append((doc, -1.0))
+                    seen.add(doc.page_content)
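+        except Exception as exc:
+            # Hybrid search is best-effort: fall back to semantic-only results.
+            print(f"[retrieve] BM25 retrieval skipped: {exc}")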
+    # Sort: semantic scores descending, BM25 appended at end
+    semantic = sorted([(d, s) for d, s in pairs if s >= 0], key=lambda x: x[1], reverse=True)
+    bm25_only = [(d, s) for d, s in pairs if s < 0]
+    pairs = (semantic + bm25_only)[: RETRIEVAL_K + 2]
+    return [d for d, _ in pairs], [s for _, s in pairs]
+# ---------------------------------------------------------------------------
+# LLM calls — non-streaming
+# ---------------------------------------------------------------------------
+def _call_llm(provider: str, model_id: str, context_str: str, input_text: str) -> str:
+    user_content = (
+        f"Context from philosophical texts:\n{context_str}\n\nQuestion: {input_text}"
+    )
+    if provider == "google":
+        if not GOOGLE_API_KEY:
+            env_var, site = PROVIDER_KEYS["google"]
+            raise ValueError(f"{env_var} not set. Get a free key at {site}")
+        response = _get_genai_client().models.generate_content(
+            model=model_id,
+            contents=user_content,
+            config=types.GenerateContentConfig(
+                system_instruction=SYSTEM_PROMPT, temperature=0.3
+            ),
+        )
+        return response.text or ""
+    elif provider == "groq":
+        if not GROQ_API_KEY:
+            env_var, site = PROVIDER_KEYS["groq"]
+            raise ValueError(f"{env_var} not set. Get a free key at {site}")
+        from openai import OpenAI
+        client = OpenAI(api_key=GROQ_API_KEY, base_url="https://api.groq.com/openai/v1")
+    elif provider == "openrouter":
+        if not OPENROUTER_API_KEY:
+            env_var, site = PROVIDER_KEYS["openrouter"]
+            raise ValueError(f"{env_var} not set. Get a free key at {site}")
+        from openai import OpenAI
+        client = OpenAI(
+            api_key=OPENROUTER_API_KEY,
+            base_url="https://openrouter.ai/api/v1",
+            default_headers={"HTTP-Referer": "https://github.com/Fikri645/philosopher-chat"},
+        )
+    else:
+        raise ValueError(f"Unknown provider: {provider!r}")
+    resp = client.chat.completions.create(
+        model=model_id,
+        messages=[
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": user_content},
+        ],
+        temperature=0.3,
+    )
+    return resp.choices[0].message.content
+# ---------------------------------------------------------------------------
+# LLM calls — streaming
+# ---------------------------------------------------------------------------
+def stream_llm(
+    provider: str, model_id: str, context_str: str, input_text: str
+) -> Generator[str, None, None]:
+    """Yield text chunks for real-time streaming."""
+    user_content = (
+        f"Context from philosophical texts:\n{context_str}\n\nQuestion: {input_text}"
+    )
+    if provider == "google":
+        if not GOOGLE_API_KEY:
+            env_var, site = PROVIDER_KEYS["google"]
+            raise ValueError(f"{env_var} not set. Get a free key at {site}")
+        for chunk in _get_genai_client().models.generate_content_stream(
+            model=model_id,
+            contents=user_content,
+            config=types.GenerateContentConfig(
+                system_instruction=SYSTEM_PROMPT, temperature=0.3
+            ),
+        ):
+            if chunk.text:
+                yield chunk.text
+    elif provider in ("groq", "openrouter"):
+        if provider == "groq":
+            if not GROQ_API_KEY:
+                env_var, site = PROVIDER_KEYS["groq"]
+                raise ValueError(f"{env_var} not set. Get a free key at {site}")
+            from openai import OpenAI
+            client = OpenAI(
+                api_key=GROQ_API_KEY, base_url="https://api.groq.com/openai/v1"
+            )
+        else:
+            if not OPENROUTER_API_KEY:
+                env_var, site = PROVIDER_KEYS["openrouter"]
+                raise ValueError(f"{env_var} not set. Get a free key at {site}")
+            from openai import OpenAI
+            client = OpenAI(
+                api_key=OPENROUTER_API_KEY,
+                base_url="https://openrouter.ai/api/v1",
+                default_headers={
+                    "HTTP-Referer": "https://github.com/Fikri645/philosopher-chat"
+                },
+            )
+        stream = client.chat.completions.create(
+            model=model_id,
+            messages=[
+                {"role": "system", "content": SYSTEM_PROMPT},
+                {"role": "user", "content": user_content},
+            ],
+            temperature=0.3,
+            stream=True,
+        )
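+        for chunk in stream:
+            # Some chunks can arrive with an empty `choices` list (e.g. usage-only chunks).
+            if not chunk.choices:
+                continue
+            content = chunk.choices[0].delta.content
+            if content:
+                yield content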
+    else:
+        raise ValueError(f"Unknown provider: {provider!r}")
+# ---------------------------------------------------------------------------
+# Public query interface
+# ---------------------------------------------------------------------------
+def query(
+    input_text: str, philosopher: str = "All", llm_label: str = DEFAULT_LLM
+) -> dict:
+    """Non-streaming query. Returns answer + context + scores."""
+    provider, model_id = LLM_OPTIONS.get(llm_label, LLM_OPTIONS[DEFAULT_LLM])
+    docs, scores = retrieve_docs(input_text, philosopher)
+    context_str = "\n\n".join(d.page_content for d in docs)
+    answer = _call_llm(provider, model_id, context_str, input_text)
+    return {"answer": answer, "context": docs, "scores": scores}
+# ---------------------------------------------------------------------------
+# UMAP embedding visualization
+# ---------------------------------------------------------------------------
+def get_umap_data() -> dict | None:
+    """Compute 2D UMAP projection of all KB embeddings.
+    Returns dict ready for plotly, or None if unavailable.
+    """
+    import numpy as np
+    try:
+        import umap as umap_module  # type: ignore
+    except ImportError:
+        return None
+    if not vectorstore_exists():
+        return None
+    result = _get_vectorstore().get(include=["embeddings", "metadatas", "documents"])
+    embeddings_raw = result.get("embeddings")
+    if embeddings_raw is None or len(embeddings_raw) == 0:
+        return None
+    embeddings = np.array(embeddings_raw)
+    reducer = umap_module.UMAP(
+        n_components=2, random_state=42, n_neighbors=15, min_dist=0.1
+    )
+    coords = reducer.fit_transform(embeddings)
+    return {
+        "x": coords[:, 0].tolist(),
+        "y": coords[:, 1].tolist(),
+        "philosopher": [m.get("philosopher", "Unknown") for m in result["metadatas"]],
+        "title": [m.get("title", "Unknown") for m in result["metadatas"]],
+        "preview": [d[:120].replace("\n", " ") + "…" for d in result["documents"]],
+    }
+# ---------------------------------------------------------------------------
+# KB management
+# ---------------------------------------------------------------------------
+def add_to_kb(file_path: str | Path, author: str, title: str) -> int:
+    """Chunk, embed, and add a file to the vectorstore. Returns chunk count."""
+    file_path = Path(file_path)
+    if file_path.suffix.lower() == ".pdf":
+        from pypdf import PdfReader
+        reader = PdfReader(str(file_path))
+        # Extract each page once, then drop pages that yield no text.
+        pages_text = (page.extract_text() for page in reader.pages)
+        text = "\n\n".join(t for t in pages_text if t)
+    else:
+        text = file_path.read_text(encoding="utf-8", errors="replace")
+    if not text.strip():
+        raise ValueError("Could not extract text from the uploaded file.")
+    splitter = RecursiveCharacterTextSplitter(
+        chunk_size=CHUNK_SIZE,
+        chunk_overlap=CHUNK_OVERLAP,
+        separators=["\n\n", "\n", ". ", " ", ""],
+    )
+    docs = [
+        Document(
+            page_content=chunk,
+            metadata={
+                "philosopher": author.strip(),
+                "title": title.strip(),
+                "source": f"{author.strip()} — *{title.strip()}*",
+            },
+        )
+        for chunk in splitter.split_text(text)
+    ]
+    _get_vectorstore().add_documents(docs)
+    _get_bm25_retriever.cache_clear()  # invalidate BM25 index after KB change
+    return len(docs)
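
The merge step inside `retrieve_docs` can be isolated from LangChain and Chroma to make its ordering rules explicit. Below is a minimal sketch using plain `(text, score)` tuples. `merge_hybrid` and `BM25_ONLY` are hypothetical names introduced for illustration. Unlike the original, the sketch keeps the two result sets separate rather than partitioning on the sign of the score, which is an assumption made here so that a semantic hit with a negative relevance score cannot be mistaken for a BM25-only hit.

```python
from typing import Sequence

BM25_ONLY = -1.0  # sentinel score for documents found only by BM25


def merge_hybrid(
    semantic: Sequence[tuple[str, float]],
    bm25: Sequence[str],
    k: int,
    bm25_extras: int = 2,
) -> list[tuple[str, float]]:
    """Rank semantic hits by score, then append up to `bm25_extras` unseen BM25 hits."""
    ranked = sorted(semantic, key=lambda pair: pair[1], reverse=True)
    seen = {text for text, _score in ranked}
    extras: list[tuple[str, float]] = []
    # Only the top `bm25_extras` BM25 candidates are considered, as in retrieve_docs.
    for text in bm25[:bm25_extras]:
        if text not in seen:
            extras.append((text, BM25_ONLY))
            seen.add(text)
    return (ranked + extras)[: k + bm25_extras]


merged = merge_hybrid(
    semantic=[("a", 0.4), ("b", 0.9)],
    bm25=["b", "c", "d"],
    k=2,
)
print(merged)
```

In this example `"b"` is already a semantic hit, so only `"c"` is appended with the sentinel score, and `"d"` is never considered because it falls outside the top two BM25 candidates.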

requirements.txt ADDED

	@@ -0,0 +1,17 @@

+google-genai>=1.0.0
+langchain>=0.3.0
+langchain-google-genai>=2.0.0
+langchain-huggingface>=0.1.0
+langchain-community>=0.3.0
+langchain-chroma>=0.1.4
+langchain-text-splitters>=0.3.0
+chromadb>=0.5.0
+sentence-transformers>=3.0.0
+gradio>=4.44.0
+python-dotenv>=1.0.0
+requests>=2.31.0
+pypdf>=4.0.0
+openai>=1.0.0
+rank-bm25>=0.2.2
+umap-learn>=0.5.0
+plotly>=5.0.0