---
title: Philosopher Chat
emoji: ποΈ
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.15.1
app_file: app.py
pinned: false
license: mit
---
# Philosopher Chat

A RAG (Retrieval-Augmented Generation) chatbot grounded in Western philosophical primary texts.
Ask questions about nihilism, existentialism, epistemology, ethics, and more; answers are
cited directly from 12 primary texts (~5,700 chunks).

**Live demo:** [fikri0o0/philosopher-chat on HuggingFace Spaces](https://huggingface.co/spaces/fikri0o0/philosopher-chat)

---
## Features

| Feature | Detail |
|---|---|
| **Query rewriting** | Multi-query expansion (LLM paraphrases) fused with RRF, aimed at better recall |
| **Two-stage retrieval** | Hybrid (dense + BM25) → RRF → cross-encoder reranking |
| **Corrective RAG** | Abstains when retrieval confidence is low instead of hallucinating |
| **RAGAS evaluation** | 4 metrics, 3-stage ablation; each component quantified, not assumed |
| **Streaming** | Token-by-token via Google / Groq / OpenRouter |
| **15 LLMs** | Gemma 4, Gemini, Llama 4, Qwen3, DeepSeek, Nemotron (all free tier) |
| **Think blocks** | Qwen3 / DeepSeek reasoning rendered as collapsible chains-of-thought |
| **UMAP viz** | 2D projection of all 5,700+ embeddings coloured by philosopher |
| **Model comparison** | Side-by-side latency + quality comparison across any two models |
| **Extendable KB** | Upload your own PDF/TXT to add new philosophers |

---
## Knowledge Base

| Philosopher | Works |
|---|---|
| Nietzsche | *Thus Spoke Zarathustra*, *Beyond Good and Evil*, *On the Genealogy of Morality*, *The Birth of Tragedy* |
| Schopenhauer | *Essays of Arthur Schopenhauer* |
| Hume | *An Enquiry Concerning Human Understanding* |
| Russell | *The Problems of Philosophy* |
| Marcus Aurelius | *Meditations* |
| Plato | *The Republic* |
| Mill | *Utilitarianism* |
| Epictetus | *The Enchiridion* |
| Kant | *Fundamental Principles of the Metaphysic of Morals* |

All texts are public domain, sourced from [Project Gutenberg](https://www.gutenberg.org).

---
## Tech Stack

| Layer | Tool |
|---|---|
| LLM routing | 15 models via Google AI Studio, Groq, OpenRouter (all free tier) |
| Embeddings | `google/embeddinggemma-300m` (HuggingFace, 768-dim) |
| Query transform | Multi-query rewriting (LLM paraphrases → RRF) |
| Retrieval | Hybrid (dense + BM25) → RRF fusion → cross-encoder rerank |
| Reranker | `BAAI/bge-reranker-v2-m3` (multilingual cross-encoder) |
| Guardrail | Corrective RAG: cosine-gated abstention on out-of-corpus queries |
| Evaluation | RAGAS metrics (faithfulness, relevancy, context precision/recall) |
| RAG Framework | LangChain LCEL (no chains, direct composition) |
| UI | Gradio 6 |
| Deployment | HuggingFace Spaces |

---
## Retrieval Architecture

```
Question
   │
   ├─ Query rewriting (LLM → original + paraphrases) ───────────┐
   │     each variant ↓                                         │
   ├─ Dense retrieval  (EmbeddingGemma-300M → ChromaDB cosine) ─┤ RRF fusion → top-20 pool
   ├─ Sparse retrieval (BM25 / rank-bm25) ──────────────────────┘
   │
   ├─ Cross-encoder rerank (BGE-reranker-v2-m3) → top-6
   │
   ├─ Corrective gate (cosine < threshold → abstain)
   │
   └─ LLM answer (grounded + cited from top-6 chunks)
```

The pattern follows a common production RAG layout: cheap recall first (multi-query hybrid),
a precise cross-encoder rerank of the small pool, and an abstention gate so
out-of-corpus questions get an honest "I don't know" instead of a hallucination.
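
The fusion and gating steps described above can be sketched in a few lines of plain Python. This is an illustration only, not the code in `rag_chain.py`: the function names `rrf_fuse` and `should_abstain`, the constant `k=60` (a commonly used RRF default), the threshold `0.35`, and the chunk ids are all assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=20):
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


def should_abstain(best_cosine, threshold=0.35):
    """Corrective gate: abstain when even the best chunk is only a weak match."""
    return best_cosine < threshold


# Hypothetical chunk ids returned by the two retrievers for one query variant.
dense = ["c3", "c1", "c7"]
sparse = ["c1", "c9", "c3"]

pool = rrf_fuse([dense, sparse])
print(pool)  # ['c1', 'c3', 'c9', 'c7']
print(should_abstain(0.2))  # True
```

Chunks that appear in both lists (`c1`, `c3`) rise to the top because their reciprocal ranks add up, which is why RRF needs no score normalisation between the dense and BM25 retrievers.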

## Evaluation

The pipeline is **measured, not assumed**. [`evaluate.py`](evaluate.py) generates
answers for a curated question set with reference answers across a **3-stage ablation**
(baseline → + reranker → + query rewrite), then an LLM judge scores four
[RAGAS](https://docs.ragas.io) metrics. Results render live in the app's
**Evaluation** tab; the full analysis is in the
[evaluation notebook](notebooks/rag_evaluation.ipynb).

### Measuring each component (12 questions, LLM-as-judge)

| Metric | Baseline (Hybrid) | + Reranker | + Query Rewrite | Δ (full) |
|---|:---:|:---:|:---:|:---:|
| **Faithfulness** | 0.40 | 0.44 | 0.43 | +0.03 |
| **Answer Relevancy** | 0.94 | 0.90 | 0.91 | −0.03 |
| **Context Precision** | 1.00 | 1.00 | 0.99 | −0.01 |
| **Context Recall** | 0.38 | 0.51 | 0.43 | +0.06 |

**The reranker is the clear win**: Context Recall jumps +0.14 (0.38 → 0.51) and
Faithfulness rises, at the cost of a small dip in Answer Relevancy (0.94 → 0.90).
**Query rewriting did *not* help this corpus**: it *reduced* recall (0.51 → 0.43) and
left Faithfulness and Answer Relevancy essentially unchanged. That is the
point of measuring: the data says ship the reranker and treat query rewriting as
corpus-dependent rather than assuming more components are always better. Two-phase
eval (generation, then judging) keeps it reproducible:

```bash
pip install -r requirements.txt
pip install --no-deps ragas && pip install -r requirements-eval.txt
python evaluate.py --generate   # phase A: real retrieval + generation → eval_samples.json
# phase B: an LLM judge scores eval_samples.json → eval_results.json
```
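
As a rough illustration of how a per-stage table like the one above can be derived from judge output, the sketch below averages per-question scores and takes the difference between two stages. The data layout, the scores, and the names `judge_scores`, `stage_means`, and `delta` are hypothetical; the actual schema of `eval_results.json` is not documented here. Taking the difference before rounding is one reason a reported delta can differ by 0.01 from subtracting the rounded columns.

```python
from statistics import mean

# Hypothetical per-question judge scores: stage -> metric -> one score per question.
judge_scores = {
    "baseline": {"context_recall": [0.0, 0.5, 1.0, 0.5]},
    "reranker": {"context_recall": [0.5, 0.5, 1.0, 1.0]},
}


def stage_means(per_stage):
    """Average each metric over the question set, separately for every stage."""
    return {
        stage: {metric: mean(values) for metric, values in metrics.items()}
        for stage, metrics in per_stage.items()
    }


def delta(means, metric, base, other):
    """Change in a metric between two stages, taken before any rounding."""
    return means[other][metric] - means[base][metric]


means = stage_means(judge_scores)
recall_gain = delta(means, "context_recall", "baseline", "reranker")
print(f"context_recall: {recall_gain:+.2f}")  # context_recall: +0.25
```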
---

## Local Setup

### 1. Clone and install

```bash
git clone https://github.com/Fikri645/philosopher-chat
cd philosopher-chat
pip install -r requirements.txt
```

### 2. Set up API keys

```bash
# Create .env with your keys:
GOOGLE_API_KEY=...        # https://ai.google.dev (free)
GROQ_API_KEY=...          # https://console.groq.com (free)
OPENROUTER_API_KEY=...    # https://openrouter.ai (free)
HF_TOKEN=...              # https://huggingface.co/settings/tokens (for gated EmbeddingGemma)
```
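
Before building the vectorstore it can help to confirm which keys the process can actually see. The helper below is a hypothetical convenience, not part of the repository. It assumes the variables are already exported or loaded into the process environment, and that all four are wanted; the app may well run with only a subset of providers configured.

```python
import os

# Names taken from the .env example above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "OPENROUTER_API_KEY", "HF_TOKEN")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All keys set")
```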

### 3. Build the vectorstore (run once)

```bash
python ingest.py
```

Downloads 12 texts from Project Gutenberg, chunks them, embeds with EmbeddingGemma-300M,
and persists to `vectorstore/`. Takes ~5–10 min on first run (model download + embedding).
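
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the splitter used by `ingest.py`, whose strategy and parameters are not stated in this README; `chunk_text`, `chunk_size=1000`, and `overlap=200` are assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into character windows that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the price of some duplicated text in the index.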

### 4. Run the app

```bash
python app.py
```

Open http://localhost:7860 in your browser.

---

## Deploying to HuggingFace Spaces

1. Fork or push to a new Space (SDK: **Gradio**)
2. In **Space Settings → Variables and Secrets**, add:
   - `GOOGLE_API_KEY`
   - `GROQ_API_KEY`
   - `OPENROUTER_API_KEY`
   - `HF_TOKEN` (your HF token, needed to download the gated EmbeddingGemma model)
3. On first boot the Space auto-ingests all 12 texts (~10 min); subsequent boots load the cached vectorstore.

---

## Project Structure

```
philosopher-chat/
├── app.py              ← Gradio UI + event handlers
├── rag_chain.py        ← LangChain RAG pipeline (retrieval + LLM routing)
├── ingest.py           ← Data ingestion from Project Gutenberg
├── config.py           ← LLM options, embedding model, RAG parameters
├── requirements.txt
├── .gitignore
└── README.md
```