
title: Philosopher Chat
emoji: 🏛️
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.15.1
app_file: app.py
pinned: false
license: mit

Philosopher Chat

A RAG (Retrieval-Augmented Generation) chatbot grounded in primary texts of Western philosophy. Ask questions about nihilism, existentialism, epistemology, ethics, and more; answers cite passages from 12 primary texts (~5,700 chunks).

Live demo: fikri0o0/philosopher-chat on HuggingFace Spaces

Features

Feature	Detail
Query rewriting	Multi-query expansion (LLM paraphrases) fused with RRF for better recall
Two-stage retrieval	Hybrid (dense + BM25) → RRF → cross-encoder reranking
Corrective RAG	Abstains when retrieval confidence is low instead of hallucinating
RAGAS evaluation	4 metrics, 3-stage ablation — each component quantified, not assumed
Streaming	Token-by-token via Google / Groq / OpenRouter
15 LLMs	Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron — all free tier
Think blocks	Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought
UMAP viz	2D projection of all 5,700+ embeddings coloured by philosopher
Model comparison	Side-by-side latency + quality comparison across any two models
Extendable KB	Upload your own PDF/TXT to add new philosophers
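
As an illustration of the think-block feature, here is a minimal sketch of separating a model's reasoning from its final answer and wrapping the reasoning in a collapsible HTML element. It assumes the reasoning arrives inside `<think>...</think>` tags, as Qwen3 and DeepSeek-R1 style models commonly emit; the function names `split_think` and `render_collapsible` are hypothetical and not taken from app.py.

```python
import html
import re

# Assumption: reasoning is wrapped in <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw model response."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


def render_collapsible(text: str) -> str:
    """Render reasoning as a <details> block, followed by the answer."""
    reasoning, answer = split_think(text)
    if not reasoning:
        return answer  # nothing to collapse
    return (
        "<details><summary>Reasoning</summary>\n\n"
        f"{html.escape(reasoning)}\n\n</details>\n\n{answer}"
    )
```

This handles complete responses only; a streaming UI would also need to deal with a `<think>` tag that has opened but not yet closed.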

Knowledge Base

Philosopher	Works
Nietzsche	Thus Spoke Zarathustra, Beyond Good and Evil, On the Genealogy of Morality, The Birth of Tragedy
Schopenhauer	Essays of Arthur Schopenhauer
Hume	An Enquiry Concerning Human Understanding
Russell	The Problems of Philosophy
Marcus Aurelius	Meditations
Plato	The Republic
Mill	Utilitarianism
Epictetus	The Enchiridion
Kant	Fundamental Principles of the Metaphysic of Morals

All texts are public domain, sourced from Project Gutenberg.

Tech Stack

Layer	Tool
LLM routing	15 models via Google AI Studio, Groq, OpenRouter (all free tier)
Embeddings	`google/embeddinggemma-300m` (HuggingFace, 768-dim)
Query transform	Multi-query rewriting (LLM paraphrases → RRF)
Retrieval	Hybrid (dense + BM25) → RRF fusion → cross-encoder rerank
Reranker	`BAAI/bge-reranker-v2-m3` (multilingual cross-encoder)
Guardrail	Corrective RAG — cosine-gated abstention on out-of-corpus queries
Evaluation	RAGAS metrics (faithfulness, relevancy, context precision/recall)
RAG Framework	LangChain LCEL (runnables composed directly, no legacy Chain classes)
UI	Gradio 6
Deployment	HuggingFace Spaces
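
The RRF (Reciprocal Rank Fusion) step named in the table can be sketched in a few lines. Each ranked list contributes 1 / (k + rank) to a document's score, and the fused list is sorted by total score. The constant `k = 60` is the conventional default for RRF and is an assumption here, as is the function name `rrf_fuse`; the pool size of 20 comes from the retrieval diagram in this README.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 20) -> list[str]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # k=60 is an assumed default
    # Highest fused score first; ties broken by id so the output is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


dense = ["a", "b", "c"]   # ids ranked by the dense retriever
sparse = ["b", "d", "a"]  # ids ranked by BM25
print(rrf_fuse([dense, sparse]))
```

Because RRF only looks at ranks, it can combine dense cosine scores and BM25 scores without normalising them onto a common scale. The same function works for multi-query rewriting: pass one ranked list per query variant.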

Retrieval Architecture

Question
   │
   ├─ Query rewriting   (LLM → original + paraphrases)            ─┐
   │     each variant ↓                                            │
   ├─ Dense retrieval   (EmbeddingGemma-300M → ChromaDB cosine)    ├─ RRF fusion → top-20 pool
   ├─ Sparse retrieval  (BM25 / rank-bm25)                        ─┘
   │
   ├─ Cross-encoder rerank  (BGE-reranker-v2-m3) → top-6
   │
   ├─ Corrective gate  (cosine < threshold → abstain)
   │
   └─ LLM answer  (grounded + cited from top-6 chunks)

The pipeline follows a common retrieve-then-rerank pattern: cheap, high-recall retrieval first (multi-query hybrid search), a more precise cross-encoder rerank of the small candidate pool, and an abstention gate so out-of-corpus questions get an honest "I don't know" instead of a hallucination.
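
A minimal sketch of the abstention gate, under stated assumptions: the diagram only says "cosine < threshold → abstain", so taking the best cosine similarity among the retrieved chunks, the threshold value 0.35, and the name `should_abstain` are all illustrative choices rather than the values used in rag_chain.py.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_abstain(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    threshold: float = 0.35,  # hypothetical value, tune on real queries
) -> bool:
    """Abstain when even the closest retrieved chunk is below the threshold."""
    if not chunk_vecs:
        return True  # nothing retrieved, nothing to ground an answer on
    best = max(cosine(query_vec, c) for c in chunk_vecs)
    return best < threshold
```

When the gate fires, the app would return a fixed "I don't know" style message instead of calling the LLM.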

Evaluation

The pipeline is measured, not assumed. evaluate.py generates answers for a curated set of questions with reference answers, once per stage of a 3-stage ablation (baseline → + reranker → + query rewrite); an LLM judge then scores four RAGAS metrics. Results render live in the app's 📊 Evaluation tab, and the full analysis is in the evaluation notebook.

Measuring each component (12 questions, LLM-as-judge)

Metric	Baseline (Hybrid)	+ Reranker	+ Query Rewrite	Δ (full)
Faithfulness	0.40	0.44	0.43	+0.03
Answer Relevancy	0.94	0.90	0.91	−0.03
Context Precision	1.00	1.00	0.99	−0.01
Context Recall	0.38	0.51	0.43	+0.06
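
The stage-to-stage differences can be recomputed from the table with a few lines of Python. The scores below are copied from the table; the dictionary layout and key names are illustrative. Recomputing from the rounded values gives +0.05 for the full-pipeline Context Recall delta, where the Δ column shows +0.06. A plausible explanation, not confirmed by the source, is that the column was computed from unrounded scores.

```python
# Scores copied from the table: (baseline, + reranker, + query rewrite).
scores = {
    "Faithfulness": (0.40, 0.44, 0.43),
    "Answer Relevancy": (0.94, 0.90, 0.91),
    "Context Precision": (1.00, 1.00, 0.99),
    "Context Recall": (0.38, 0.51, 0.43),
}


def deltas(row: tuple[float, float, float]) -> dict[str, float]:
    """Differences between ablation stages, rounded to two decimals."""
    base, rerank, full = row
    return {
        "reranker_vs_baseline": round(rerank - base, 2),
        "rewrite_vs_reranker": round(full - rerank, 2),
        "full_vs_baseline": round(full - base, 2),
    }


table = {metric: deltas(row) for metric, row in scores.items()}
for metric, d in table.items():
    print(
        f"{metric:18s} {d['reranker_vs_baseline']:+.2f} "
        f"{d['rewrite_vs_reranker']:+.2f} {d['full_vs_baseline']:+.2f}"
    )
```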

The reranker is the clear win: Context Recall rises from 0.38 to 0.51 and Faithfulness from 0.40 to 0.44, at the cost of a small dip in Answer Relevancy (0.94 → 0.90). Query rewriting did not help on this corpus: relative to the reranker stage it lowered Context Recall (0.51 → 0.43) and moved the other metrics by no more than 0.01. That is the point of measuring: the data says ship the reranker and treat query rewriting as corpus-dependent rather than assuming more components are always better. The evaluation runs in two phases (generation, then judging) to keep it reproducible:

pip install -r requirements.txt
pip install --no-deps ragas && pip install -r requirements-eval.txt
python evaluate.py --generate   # phase A: real retrieval + generation → eval_samples.json
# phase B: an LLM judge scores eval_samples.json → eval_results.json

Local Setup

1. Clone and install

git clone https://github.com/Fikri645/philosopher-chat
cd philosopher-chat
pip install -r requirements.txt

2. Set up API keys

# Create .env with your keys:
GOOGLE_API_KEY=...       # https://ai.google.dev  (free)
GROQ_API_KEY=...         # https://console.groq.com  (free)
OPENROUTER_API_KEY=...   # https://openrouter.ai  (free)
HF_TOKEN=...             # https://huggingface.co/settings/tokens  (for gated EmbeddingGemma)
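
Before launching, it can help to check which keys are actually set. The sketch below reads them from the process environment; treating all four as expected is an assumption (the app may run with only a subset of providers configured), and `missing_keys` is a hypothetical helper, not part of this repo.

```python
import os
from typing import Mapping, Optional

# Key names taken from the .env listing; whether all are mandatory is an assumption.
EXPECTED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "OPENROUTER_API_KEY", "HF_TOKEN")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the expected keys that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [key for key in EXPECTED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All expected keys are set.")
```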

3. Build the vectorstore (run once)

python ingest.py

Downloads 12 texts from Project Gutenberg, chunks them, embeds with EmbeddingGemma-300M, and persists to vectorstore/. Takes ~5–10 min on first run (model download + embedding).
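
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. The chunk size and overlap are hypothetical defaults, and ingest.py may well use a LangChain text splitter with different settings; this only shows the general idea of overlapping windows so that a passage cut at a boundary still appears whole in one chunk.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```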

4. Run the app

python app.py

Open http://localhost:7860 in your browser.

Deploying to HuggingFace Spaces

1. Fork or push to a new Space (SDK: Gradio).
2. In Space Settings → Variables and Secrets, add:
   - GOOGLE_API_KEY
   - GROQ_API_KEY
   - OPENROUTER_API_KEY
   - HF_TOKEN (your HF token, needed to download the gated EmbeddingGemma model)
3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.
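
The build-once, load-afterwards behaviour can be sketched as a small guard around the ingestion step. This is an illustration under assumptions: `ensure_vectorstore` is a hypothetical helper, and treating "directory exists and is non-empty" as "cache is valid" is a simplification that the real app may not use.

```python
from pathlib import Path
from typing import Callable


def ensure_vectorstore(store_dir: Path, build: Callable[[Path], None]) -> bool:
    """Run `build` only if the store is missing or empty.

    Returns True if the store was built on this call, False if a cached
    store was found and left untouched.
    """
    if store_dir.is_dir() and any(store_dir.iterdir()):
        return False  # cached store present, skip ingestion
    store_dir.mkdir(parents=True, exist_ok=True)
    build(store_dir)  # e.g. a function wrapping the ingestion logic
    return True
```

A startup script could call `ensure_vectorstore(Path("vectorstore"), build_fn)` before creating the Gradio app, where `build_fn` stands in for whatever ingest.py exposes.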

Project Structure

philosopher-chat/
├── app.py              ← Gradio UI + event handlers
├── rag_chain.py        ← LangChain RAG pipeline (retrieval + LLM routing)
├── ingest.py           ← Data ingestion from Project Gutenberg
├── config.py           ← LLM options, embedding model, RAG parameters
├── evaluate.py         ← RAGAS evaluation (generation, then judging)
├── requirements.txt
├── requirements-eval.txt
├── .gitignore
└── README.md