title: Philosopher Chat
emoji: 🏛️
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.15.1
app_file: app.py
pinned: false
license: mit

Philosopher Chat

A RAG (Retrieval-Augmented Generation) chatbot grounded in Western philosophical primary texts. Ask questions about nihilism, existentialism, epistemology, ethics, and more; answers cite passages from 12 primary texts (~5,700 chunks).

Live demo: fikri0o0/philosopher-chat on HuggingFace Spaces


Features

| Feature | Detail |
| --- | --- |
| Query rewriting | Multi-query expansion (LLM paraphrases) fused with RRF for better recall |
| Two-stage retrieval | Hybrid (dense + BM25) → RRF → cross-encoder reranking |
| Corrective RAG | Abstains when retrieval confidence is low instead of hallucinating |
| RAGAS evaluation | 4 metrics, 3-stage ablation; each component quantified, not assumed |
| Streaming | Token-by-token via Google / Groq / OpenRouter |
| 15 LLMs | Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron (all free tier) |
| Think blocks | Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought |
| UMAP viz | 2D projection of all 5,700+ embeddings coloured by philosopher |
| Model comparison | Side-by-side latency + quality comparison across any two models |
| Extendable KB | Upload your own PDF/TXT to add new philosophers |

Knowledge Base

| Philosopher | Works |
| --- | --- |
| Nietzsche | Thus Spoke Zarathustra, Beyond Good and Evil, On the Genealogy of Morality, The Birth of Tragedy |
| Schopenhauer | Essays of Arthur Schopenhauer |
| Hume | An Enquiry Concerning Human Understanding |
| Russell | The Problems of Philosophy |
| Marcus Aurelius | Meditations |
| Plato | The Republic |
| Mill | Utilitarianism |
| Epictetus | The Enchiridion |
| Kant | Fundamental Principles of the Metaphysic of Morals |

All texts are public domain, sourced from Project Gutenberg.


Tech Stack

| Layer | Tool |
| --- | --- |
| LLM routing | 15 models via Google AI Studio, Groq, OpenRouter (all free tier) |
| Embeddings | google/embeddinggemma-300m (HuggingFace, 768-dim) |
| Query transform | Multi-query rewriting (LLM paraphrases → RRF) |
| Retrieval | Hybrid (dense + BM25) → RRF fusion → cross-encoder rerank |
| Reranker | BAAI/bge-reranker-v2-m3 (multilingual cross-encoder) |
| Guardrail | Corrective RAG: cosine-gated abstention on out-of-corpus queries |
| Evaluation | RAGAS metrics (faithfulness, relevancy, context precision/recall) |
| RAG Framework | LangChain LCEL (no chains, direct composition) |
| UI | Gradio 6 |
| Deployment | HuggingFace Spaces |

Retrieval Architecture

Question
   │
   ├─ Query rewriting   (LLM → original + paraphrases)            ─┐
   │     each variant ↓                                            │
   ├─ Dense retrieval   (EmbeddingGemma-300M → ChromaDB cosine)    ├─ RRF fusion → top-20 pool
   ├─ Sparse retrieval  (BM25 / rank-bm25)                        ─┘
   │
   ├─ Cross-encoder rerank  (BGE-reranker-v2-m3) → top-6
   │
   ├─ Corrective gate  (cosine < threshold → abstain)
   │
   └─ LLM answer  (grounded + cited from top-6 chunks)
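
The RRF fusion step in the diagram above can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not the code in rag_chain.py: the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the chunk IDs are all assumptions made for the example. Only the top-20 pool size comes from the diagram.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=20):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from this repo.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on chunk ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_n]]


dense = ["c3", "c1", "c7"]   # hypothetical dense-retrieval ranking
sparse = ["c1", "c9", "c3"]  # hypothetical BM25 ranking
fused = rrf_fuse([dense, sparse])
print(fused)
```

Chunks that rank well in both lists rise to the top, which is why RRF needs no score normalisation between the dense and sparse retrievers.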

The pattern follows modern production RAG: cheap recall first (multi-query hybrid), a precise cross-encoder rerank of the small pool, and an abstention gate so out-of-corpus questions get an honest "I don't know" instead of a hallucination.
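
The abstention gate can be illustrated with a small sketch. It assumes the gate compares the best query-to-chunk cosine similarity against a fixed threshold; the function names and the threshold value `0.35` are hypothetical and are not taken from config.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_abstain(query_vec, chunk_vecs, threshold=0.35):
    """Abstain when even the best-matching chunk falls below the threshold.

    threshold=0.35 is a placeholder; a real value would be tuned on
    in-corpus and out-of-corpus questions.
    """
    if not chunk_vecs:
        return True  # nothing retrieved, so there is nothing to ground on
    best = max(cosine_similarity(query_vec, vec) for vec in chunk_vecs)
    return best < threshold
```

When `should_abstain` returns True, the app would skip generation and return a fixed "I don't know" style message rather than calling the LLM.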

Evaluation

The pipeline is measured, not assumed. The script evaluate.py generates answers for a curated question set with reference answers across a 3-stage ablation (baseline → + reranker → + query rewrite), then an LLM judge scores four RAGAS metrics. Results render live in the app's 📊 Evaluation tab; the full analysis is in the evaluation notebook.

Measuring each component (12 questions, LLM-as-judge)

| Metric | Baseline (Hybrid) | + Reranker | + Query Rewrite | Δ (full) |
| --- | --- | --- | --- | --- |
| Faithfulness | 0.40 | 0.44 | 0.43 | +0.03 |
| Answer Relevancy | 0.94 | 0.90 | 0.91 | −0.03 |
| Context Precision | 1.00 | 1.00 | 0.99 | −0.01 |
| Context Recall | 0.38 | 0.51 | 0.43 | +0.06 |

The reranker is the clear win: Context Recall jumps +0.14 (0.38 → 0.51) and Faithfulness rises (0.40 → 0.44), at the cost of a small dip in Answer Relevancy (0.94 → 0.90). Query rewriting did not help this corpus: relative to the reranker stage it reduced recall (0.51 → 0.43) and left the other metrics roughly flat. That is the point of measuring: the data says ship the reranker and treat query rewriting as corpus-dependent, rather than assuming more components are always better. Two-phase eval (generation, then judging) keeps it reproducible:

pip install -r requirements.txt
pip install --no-deps ragas && pip install -r requirements-eval.txt
python evaluate.py --generate   # phase A: real retrieval + generation → eval_samples.json
# phase B: an LLM judge scores eval_samples.json → eval_results.json

Local Setup

1. Clone and install

git clone https://github.com/Fikri645/philosopher-chat
cd philosopher-chat
pip install -r requirements.txt

2. Set up API keys

# Create .env with your keys:
GOOGLE_API_KEY=...       # https://ai.google.dev  (free)
GROQ_API_KEY=...         # https://console.groq.com  (free)
OPENROUTER_API_KEY=...   # https://openrouter.ai  (free)
HF_TOKEN=...             # https://huggingface.co/settings/tokens  (for gated EmbeddingGemma)
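
A quick pre-flight check can catch a missing key before the app starts. The sketch below assumes the .env values have already been loaded into the process environment and that all four keys are wanted at once, which this README does not state; the helper name `missing_keys` is hypothetical.

```python
import os

# Key names copied from the .env template above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "OPENROUTER_API_KEY", "HF_TOKEN")


def missing_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All keys set.")
```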

3. Build the vectorstore (run once)

python ingest.py

Downloads 12 texts from Project Gutenberg, chunks them, embeds with EmbeddingGemma-300M, and persists to vectorstore/. Takes ~5-10 min on first run (model download + embedding).
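
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a stand-in for whatever splitter ingest.py actually uses; the function name and the `chunk_size` and `overlap` defaults are assumptions, not the values in config.py.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults only.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the price of storing some text twice.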

4. Run the app

python app.py

Open http://localhost:7860 in your browser.


Deploying to HuggingFace Spaces

  1. Fork or push to a new Space (SDK: Gradio)
  2. In Space Settings → Variables and Secrets, add:
    • GOOGLE_API_KEY
    • GROQ_API_KEY
    • OPENROUTER_API_KEY
    • HF_TOKEN (your HF token, needed to download the gated EmbeddingGemma model)
  3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.
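
The first-boot behaviour in step 3 amounts to checking whether a persisted vectorstore already exists. A minimal sketch, assuming the check is simply "the vectorstore/ directory is missing or empty"; the helper name `needs_ingest` and the placeholder file are hypothetical, and the real app may decide differently.

```python
import tempfile
from pathlib import Path


def needs_ingest(store_dir="vectorstore"):
    """True when the vectorstore directory is missing or has no files."""
    path = Path(store_dir)
    return not path.is_dir() or not any(path.iterdir())


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "vectorstore"
    first_boot = needs_ingest(store)           # directory does not exist yet
    store.mkdir()
    (store / "index.bin").write_bytes(b"\0")   # placeholder for persisted data
    later_boot = needs_ingest(store)           # directory now has content
```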

Project Structure

philosopher-chat/
├── app.py              ← Gradio UI + event handlers
├── rag_chain.py        ← LangChain RAG pipeline (retrieval + LLM routing)
├── ingest.py           ← Data ingestion from Project Gutenberg
├── config.py           ← LLM options, embedding model, RAG parameters
├── requirements.txt
├── .gitignore
└── README.md