Spaces:
Sleeping
Sleeping
| title: AdaptiveRAG | |
| sdk: docker | |
| pinned: true | |
| license: mit | |
| short_description: Agentic + Self-RAG + Modular RAG with visual pipeline UI | |
| <div align="center"> | |
| # π AdaptiveRAG | |
| ### Production-grade RAG combining Modular Β· Self-RAG Β· Agentic patterns | |
| [](https://huggingface.co/spaces/NoobNovel/AdaptiveRAG) | |
| [](https://github.com/Gh-Novel/AdaptiveRAG) | |
| [](https://python.org) | |
| [](https://streamlit.io) | |
| **Every stage of the pipeline is visualized live β from raw text to grounded answer with citations.** | |
| [π Try Live Demo](https://huggingface.co/spaces/NoobNovel/AdaptiveRAG) Β· [π» Run Locally](#run-locally) | |
| </div> | |
| --- | |
| ## π¬ Demo | |
| <!-- Replace the URL below with your actual demo video link --> | |
| > πΉ **[Watch full pipeline demo β](https://your-video-link-here)** | |
| *Shows: question encoding β Self-RAG routing β hybrid retrieval β 2D vector space β self-critique* | |
| --- | |
| ## π§ What makes this different | |
| Most RAG demos do: `embed query β cosine search β stuff into prompt`. This does: | |
| ``` | |
| User question | |
| β embed (MiniLM-L6 β 384-dim vector) | |
| β Self-RAG router β RETRIEVE / ANSWER_DIRECTLY / CLARIFY | |
| β Planner β break into focused sub-queries | |
| β Dense retrieval β ChromaDB cosine similarity (k=12) | |
| β Sparse retrieval β BM25 keyword matching (k=12) | |
| β RRF fusion β Reciprocal Rank Fusion merge | |
| β Cross-encoder β BGE reranker deep relevance scoring (top 5) | |
| β LLM answer β Qwen3-VL (local) / LLaMA 3.1 via Groq (hosted) | |
| β Self-critique β grounded? complete? confidence score | |
| β Refine & retry β if confidence < 0.85 | |
| β Answer + citations + trace | |
| ``` | |
| --- | |
| ## π¬ Underhood Pipeline View | |
| Every step renders its inputs and outputs **as it runs**: | |
| | Step | What you see | | |
| |------|-------------| | |
| | **1 Β· Question encoding** | Embedding model Β· 384 dimensions Β· L2 norm Β· latency Β· first-32-dim bar chart Β· raw `vector[0:8]` values | | |
| | **2 Β· Self-RAG router** | Color-coded decision pill (`RETRIEVE` / `ANSWER_DIRECTLY` / `CLARIFY`) + LLM reasoning | | |
| | **3 Β· Planner** | Sub-query cards with rationale for each step | | |
| | **4 Β· Dense retrieval** | Cosine similarity bar chart + chunk cards with scores | | |
| | **4 Β· Sparse retrieval** | BM25 normalized score chart + chunk cards | | |
| | **4 Β· RRF fusion** | Merged ranking chart showing how both lists combine | | |
| | **4 Β· Cross-encoder rerank** | BGE relevance score chart (final top-5) | | |
| | **4 Β· Vector space** | 2D PCA scatter β query vs all hits, colored by source (dense / sparse / both) | | |
| | **5 Β· Context assembly** | Exact passages handed to the LLM, with metadata | | |
| | **6 Β· Self-critique** | Grounded β Β· Complete β Β· Confidence score vs threshold | | |
| --- | |
| ## ποΈ Knowledge Base | |
| 14 foundational AI papers pre-indexed as **1,934 semantic chunks**: | |
| | Category | Papers | | |
| |----------|--------| | |
| | Transformers | Attention Is All You Need Β· BERT Β· GPT-3 | | |
| | Diffusion | DDPM Β· DDIM | | |
| | RAG | RAG Original Β· RAG Survey Β· Self-RAG Β· HyDE | | |
| | Vision | ViT Β· CLIP | | |
| | Agents | ReAct Β· Chain-of-Thought | | |
| | LLMs | LLM Survey | | |
| --- | |
| ## βοΈ Tech Stack | |
| | Component | Tool | Why | | |
| |-----------|------|-----| | |
| | Vector DB | ChromaDB (local) | No API cost, persistent | | |
| | Dense embeddings | `all-MiniLM-L6-v2` | Fast, 384-dim, normalized | | |
| | Sparse retrieval | `rank-bm25` (BM25Okapi) | Keyword precision | | |
| | Fusion | Reciprocal Rank Fusion | Combines rankings without score normalization | | |
| | Reranker | `BAAI/bge-reranker-base` | Cross-encoder, deep relevance scoring | | |
| | LLM (local) | Qwen3-VL 8B via Ollama | Vision-language, runs on Apple Silicon | | |
| | LLM (hosted) | LLaMA 3.1 8B via Groq | Free API, fast inference | | |
| | UI | Streamlit | Fast to build, easy to demo | | |
| --- | |
| ## π Run Locally | |
| ```bash | |
| git clone https://github.com/Gh-Novel/AdaptiveRAG | |
| cd AdaptiveRAG | |
| python -m venv .venv && source .venv/bin/activate | |
| pip install -r requirements.txt | |
| # start Ollama with the vision-language model | |
| ollama serve | |
| ollama pull qwen3-vl:8b-instruct-q8_0-optimized | |
| streamlit run app.py | |
| ``` | |
| Or use the CLI: | |
| ```bash | |
| .venv/bin/python ask.py "How does Self-RAG decide when to retrieve?" | |
| ``` | |
| --- | |
| ## βοΈ Hosted on Hugging Face | |
| The live demo runs on HF Spaces (CPU free tier) with **Groq API** handling LLM calls. | |
| - Embedding + retrieval + reranking run locally inside the container (MiniLM + BGE) | |
| - `GROQ_API_KEY` secret drives routing, planning, answering, and self-critique | |
| - Pre-built index (1,934 chunks, ~59 MB) is committed via git-lfs β no ingestion on startup | |
| **[π€ Open Live Demo](https://huggingface.co/spaces/NoobNovel/AdaptiveRAG)** | |