---
title: AdaptiveRAG
sdk: docker
pinned: true
license: mit
short_description: Agentic + Self-RAG + Modular RAG with visual pipeline UI
---
<div align="center">
# 📚 AdaptiveRAG
### Production-grade RAG combining Modular · Self-RAG · Agentic patterns
[![HF Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Live%20Demo-blue)](https://huggingface.co/spaces/NoobNovel/AdaptiveRAG)
[![GitHub](https://img.shields.io/badge/GitHub-Gh--Novel%2FAdaptiveRAG-black?logo=github)](https://github.com/Gh-Novel/AdaptiveRAG)
[![Python](https://img.shields.io/badge/Python-3.11-blue?logo=python)](https://python.org)
[![Streamlit](https://img.shields.io/badge/UI-Streamlit-red?logo=streamlit)](https://streamlit.io)
**Every stage of the pipeline is visualized live, from raw text to grounded answer with citations.**
[🚀 Try Live Demo](https://huggingface.co/spaces/NoobNovel/AdaptiveRAG) · [💻 Run Locally](#run-locally)
</div>
---
## 🎬 Demo
<!-- Replace the URL below with your actual demo video link -->
> πŸ“Ή **[Watch full pipeline demo β†’](https://your-video-link-here)**
*Shows: question encoding β†’ Self-RAG routing β†’ hybrid retrieval β†’ 2D vector space β†’ self-critique*
---
## 🧠 What makes this different
Most RAG demos do: `embed query β†’ cosine search β†’ stuff into prompt`. This does:
```
User question
↓ embed (MiniLM-L6 → 384-dim vector)
↓ Self-RAG router → RETRIEVE / ANSWER_DIRECTLY / CLARIFY
↓ Planner → break into focused sub-queries
↓ Dense retrieval → ChromaDB cosine similarity (k=12)
↓ Sparse retrieval → BM25 keyword matching (k=12)
↓ RRF fusion → Reciprocal Rank Fusion merge
↓ Cross-encoder → BGE reranker deep relevance scoring (top 5)
↓ LLM answer → Qwen3-VL (local) / LLaMA 3.1 via Groq (hosted)
↓ Self-critique → grounded? complete? confidence score
↓ Refine & retry → if confidence < 0.85
→ Answer + citations + trace
```
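The last two stages of the diagram (self-critique, then refine and retry below 0.85) amount to a bounded loop. The sketch below shows only that control flow. The callables `generate`, `critique`, and `refine`, the `Critique` fields, and the `MAX_ATTEMPTS` cap are hypothetical names chosen for illustration; the README does not state how many retries the real pipeline allows.

```python
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.85  # threshold taken from the pipeline diagram
MAX_ATTEMPTS = 3             # assumption: the retry cap is not documented


@dataclass
class Critique:
    grounded: bool
    complete: bool
    confidence: float


def answer_with_self_critique(
    question: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Critique],
    refine: Callable[[str, Critique], str],
) -> tuple[str, Critique, int]:
    """Generate, critique, and retry until confidence clears the threshold.

    Returns the last answer, its critique, and the number of attempts used.
    """
    query = question
    for attempt in range(1, MAX_ATTEMPTS + 1):
        answer = generate(query)
        verdict = critique(question, answer)
        # Stop on a confident answer, or when the retry budget is spent.
        if verdict.confidence >= CONFIDENCE_THRESHOLD or attempt == MAX_ATTEMPTS:
            break
        query = refine(query, verdict)
    return answer, verdict, attempt
```

In this sketch the critique always judges the answer against the original question, while `refine` rewrites only the query handed to the next generation attempt.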
---
## 🔬 Under-the-Hood Pipeline View
Every step renders its inputs and outputs **as it runs**:
| Step | What you see |
|------|-------------|
| **1 · Question encoding** | Embedding model · 384 dimensions · L2 norm · latency · first-32-dim bar chart · raw `vector[0:8]` values |
| **2 · Self-RAG router** | Color-coded decision pill (`RETRIEVE` / `ANSWER_DIRECTLY` / `CLARIFY`) + LLM reasoning |
| **3 · Planner** | Sub-query cards with rationale for each step |
| **4 · Dense retrieval** | Cosine similarity bar chart + chunk cards with scores |
| **4 · Sparse retrieval** | BM25 normalized score chart + chunk cards |
| **4 · RRF fusion** | Merged ranking chart showing how both lists combine |
| **4 · Cross-encoder rerank** | BGE relevance score chart (final top-5) |
| **4 · Vector space** | 2D PCA scatter: query vs. all hits, colored by source (dense / sparse / both) |
| **5 · Context assembly** | Exact passages handed to the LLM, with metadata |
| **6 · Self-critique** | Grounded ✅ · Complete ✅ · Confidence score vs. threshold |
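
The figures listed for step 1 (dimensions, L2 norm, latency, the first 32 dimensions, and `vector[0:8]`) can all be derived from one embedding call. The following is a minimal sketch under stated assumptions: `encoding_summary` and `fake_embed` are hypothetical names, and the stand-in embedder returns a constant unit vector rather than a real MiniLM embedding.

```python
import math
import time
from typing import Callable, Sequence


def encoding_summary(text: str, embed: Callable[[str], Sequence[float]]) -> dict:
    """Embed `text` and collect the figures shown in the step 1 panel."""
    start = time.perf_counter()
    vector = list(embed(text))
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "dimensions": len(vector),
        "l2_norm": math.sqrt(sum(x * x for x in vector)),
        "latency_ms": latency_ms,
        "bar_chart": vector[:32],  # first 32 dimensions for the bar chart
        "preview": vector[0:8],    # raw vector[0:8] values
    }


def fake_embed(_text: str) -> list[float]:
    """Stand-in embedder (assumption): a constant 384-dim unit vector."""
    return [1.0 / math.sqrt(384)] * 384


summary = encoding_summary("How does Self-RAG decide when to retrieve?", fake_embed)
print(summary["dimensions"], round(summary["l2_norm"], 6))
```

For a normalized embedding the L2 norm should come out close to 1, which makes the value a quick sanity check on the encoder output.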
---
## 🗂️ Knowledge Base
14 foundational AI papers pre-indexed as **1,934 semantic chunks**:
| Category | Papers |
|----------|--------|
| Transformers | Attention Is All You Need · BERT · GPT-3 |
| Diffusion | DDPM · DDIM |
| RAG | RAG Original · RAG Survey · Self-RAG · HyDE |
| Vision | ViT · CLIP |
| Agents | ReAct · Chain-of-Thought |
| LLMs | LLM Survey |
---
## ⚙️ Tech Stack
| Component | Tool | Why |
|-----------|------|-----|
| Vector DB | ChromaDB (local) | No API cost, persistent |
| Dense embeddings | `all-MiniLM-L6-v2` | Fast, 384-dim, normalized |
| Sparse retrieval | `rank-bm25` (BM25Okapi) | Keyword precision |
| Fusion | Reciprocal Rank Fusion | Combines rankings without score normalization |
| Reranker | `BAAI/bge-reranker-base` | Cross-encoder, deep relevance scoring |
| LLM (local) | Qwen3-VL 8B via Ollama | Vision-language, runs on Apple Silicon |
| LLM (hosted) | LLaMA 3.1 8B via Groq | Free API, fast inference |
| UI | Streamlit | Fast to build, easy to demo |
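
Reciprocal Rank Fusion needs no score normalization because it uses only each chunk's rank in each list: a chunk's fused score is the sum of `1 / (k + rank)` over the lists that contain it. Here is a minimal sketch, assuming chunk IDs are strings. The function name `rrf_fuse`, the source labels, and the constant `k = 60` (a commonly used default) are assumptions, since the README does not state the value this project uses.

```python
from collections import defaultdict

RRF_K = 60  # assumption: common default; the project's actual value is not documented


def rrf_fuse(
    rankings: dict[str, list[str]], k: int = RRF_K
) -> list[tuple[str, float, str]]:
    """Merge ranked lists of chunk IDs with Reciprocal Rank Fusion.

    `rankings` maps a source name (e.g. "dense", "sparse") to chunk IDs
    ordered best-first. Returns (chunk_id, score, source_label) tuples,
    best-first, where the label is "both" for chunks found by more than
    one retriever.
    """
    scores: dict[str, float] = defaultdict(float)
    sources: dict[str, set[str]] = defaultdict(set)
    for source, ranked_ids in rankings.items():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
            sources[chunk_id].add(source)

    fused = []
    for chunk_id, score in scores.items():
        found_in = sources[chunk_id]
        label = "both" if len(found_in) > 1 else next(iter(found_in))
        fused.append((chunk_id, score, label))
    # Highest score first; break ties by chunk ID for a stable order.
    fused.sort(key=lambda item: (-item[1], item[0]))
    return fused


hits = rrf_fuse({"dense": ["a", "b", "c"], "sparse": ["b", "d", "a"]})
for chunk_id, score, label in hits:
    print(f"{chunk_id}  {score:.5f}  {label}")
```

Chunks that appear in both lists accumulate two terms, so they tend to rise above chunks found by only one retriever. The same labels could feed the dense / sparse / both coloring in the vector-space view.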
---
## 🚀 Run Locally
```bash
git clone https://github.com/Gh-Novel/AdaptiveRAG
cd AdaptiveRAG
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# start Ollama with the vision-language model
# (`ollama serve` runs in the foreground; use a separate terminal if it is not already running)
ollama serve
ollama pull qwen3-vl:8b-instruct-q8_0-optimized
streamlit run app.py
```
Or use the CLI:
```bash
.venv/bin/python ask.py "How does Self-RAG decide when to retrieve?"
```
---
## ☁️ Hosted on Hugging Face
The live demo runs on HF Spaces (CPU free tier) with **Groq API** handling LLM calls.
- Embedding + retrieval + reranking run locally inside the container (MiniLM + BGE)
- `GROQ_API_KEY` secret drives routing, planning, answering, and self-critique
- Pre-built index (1,934 chunks, ~59 MB) is committed via git-lfs, so no ingestion runs on startup
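
The README does not say how the app chooses between the hosted and local LLM. One plausible approach, sketched here purely as an assumption, is to use Groq whenever the `GROQ_API_KEY` secret is set and fall back to local Ollama otherwise. The function name `select_llm_backend` and the returned labels are hypothetical.

```python
import os
from typing import Mapping, Optional


def select_llm_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "groq" when GROQ_API_KEY is present, otherwise "ollama".

    Hypothetical helper: the real app may select its backend differently.
    """
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only secret as missing.
    return "groq" if env.get("GROQ_API_KEY", "").strip() else "ollama"


print(select_llm_backend({"GROQ_API_KEY": "example-key"}))
print(select_llm_backend({}))
```

Passing the environment in as a mapping keeps the check easy to test without touching real secrets.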
**[🤗 Open Live Demo](https://huggingface.co/spaces/NoobNovel/AdaptiveRAG)**