Spaces: NoobNovel/AdaptiveRAG

NoobNovel committed on Apr 30

Commit e0670a4 · 1 Parent(s): ddcb7f3

AdaptiveRAG: Agentic + Self-RAG + Modular RAG pipeline with visual UI

- Semantic chunking + dual embedding (dense MiniLM + sparse BM25)
- Hybrid retrieval: Chroma cosine + BM25 fused via RRF + BGE cross-encoder rerank
- Self-RAG router (RETRIEVE / ANSWER_DIRECTLY / CLARIFY)
- Agentic loop: plan -> retrieve -> answer -> self-critique -> refine
- Streamlit UI exposing every pipeline stage with 2D vector space scatter plot
- Pre-built index: 1934 chunks across 14 foundational AI papers (git-lfs)
- Auto-switches backend: Ollama locally, Groq API when GROQ_API_KEY is set
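The RRF step in the hybrid-retrieval bullet can be illustrated on its own. The following is a minimal sketch that assumes each retriever returns an ordered list of chunk IDs. `retrieval/hybrid.py` is not shown in this view, and its `reciprocal_rank_fusion` works on `Hit` objects, so `rrf_fuse` below is a hypothetical, simplified stand-in. `k = 60` is the constant from the original RRF paper; the value this project uses is not visible here.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(
    rankings: list[list[str]], k: int = 60, top_k: int | None = None
) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # ranks are 1-based, so the top hit of each list contributes 1 / (k + 1)
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # highest fused score first; chunks found by several retrievers rise to the top
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused if top_k is None else fused[:top_k]
```

RRF uses only ranks, so dense cosine similarities and BM25 scores never have to be put on a common scale. A chunk that appears in both lists is rewarded even if neither retriever ranked it first.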

Files changed (44)

.gitattributes +2 -0
.gitignore +15 -0
.streamlit/config.toml +5 -0
README.md +38 -6
agent/__init__.py +0 -0
agent/critic.py +65 -0
agent/loop.py +198 -0
agent/planner.py +52 -0
agent/router.py +39 -0
agent/tools.py +85 -0
app.py +554 -0
ask.py +32 -0
config.py +57 -0
download_papers.sh +28 -0
ingest.py +43 -0
ingestion/__init__.py +0 -0
ingestion/chunker.py +104 -0
ingestion/embedder.py +32 -0
ingestion/indexer.py +134 -0
ingestion/loader.py +87 -0
llm/__init__.py +0 -0
llm/client_factory.py +16 -0
llm/groq_client.py +133 -0
llm/ollama_client.py +116 -0
requirements.txt +9 -0
retrieval/__init__.py +0 -0
retrieval/dense.py +39 -0
retrieval/hybrid.py +47 -0
retrieval/pipeline.py +12 -0
retrieval/reranker.py +36 -0
retrieval/sparse.py +44 -0
storage/bm25.pkl +3 -0
storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/data_level0.bin +3 -0
storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/header.bin +3 -0
storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/index_metadata.pickle +3 -0
storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/length.bin +3 -0
storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/link_lists.bin +3 -0
storage/chroma/chroma.sqlite3 +3 -0
storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/data_level0.bin +3 -0
storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/header.bin +3 -0
storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/index_metadata.pickle +3 -0
storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/length.bin +3 -0
storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/link_lists.bin +3 -0
storage/manifest.json +19 -0

.gitattributes CHANGED

@@ -1,3 +1,4 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
@@ -33,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+# HF standard LFS patterns
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+storage/chroma/chroma.sqlite3 filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

	@@ -0,0 +1,15 @@

+# Python
+.venv/
+__pycache__/
+*.py[cod]
+*.egg-info/
+.env
+# macOS
+.DS_Store
+# Papers — excluded (public ArXiv PDFs; re-download with download_papers.sh)
+papers/
+# Streamlit
+.streamlit/secrets.toml

.streamlit/config.toml ADDED

	@@ -0,0 +1,5 @@

+[browser]
+gatherUsageStats = false
+[server]
+headless = true

README.md CHANGED

@@ -1,12 +1,44 @@
 ---
 title: AdaptiveRAG
-emoji: 📊
-colorFrom: indigo
-colorTo: gray
 sdk: docker
-pinned: false
 license: mit
-short_description: production-grade RAG using Modular, Self-RAG,  Agentic
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: AdaptiveRAG
+emoji: 📚
+colorFrom: blue
+colorTo: purple
 sdk: docker
+pinned: true
 license: mit
+short_description: Agentic + Self-RAG + Modular RAG with visual pipeline UI
 ---
+# AdaptiveRAG — Agentic + Self-RAG + Modular RAG
+Live demo of a production-grade RAG pipeline — every stage is visible in the UI.
+**Tech stack:** ChromaDB · sentence-transformers · BM25 · Reciprocal Rank Fusion · BGE cross-encoder · LLaMA 3.1 via Groq
+**Knowledge base:** 14 foundational AI papers (Transformers, BERT, GPT-3, DDPM, RAG, Self-RAG, HyDE, ViT, CLIP, ReAct, Chain-of-Thought, LLM Survey)
+## What it shows
+| Stage | What you see |
+|---|---|
+| Question encoding | 384-dim embedding vector + bar chart of first 32 dims |
+| Self-RAG router | RETRIEVE / ANSWER_DIRECTLY / CLARIFY decision + reason |
+| Planner | Sub-query decomposition with rationales |
+| Dense retrieval | Cosine similarity scores vs ChromaDB |
+| Sparse retrieval | BM25 keyword match scores |
+| RRF fusion | Combined ranking chart |
+| Cross-encoder rerank | BGE relevance scores |
+| Vector space | 2D PCA projection of query + hits |
+| Self-critique | Grounded / Complete / Confidence score |
+## Run locally
+```bash
+git clone https://github.com/Gh-Novel/AdaptiveRAG
+cd AdaptiveRAG
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+# needs Ollama running with qwen3-vl:8b-instruct-q8_0-optimized
+streamlit run app.py
+```
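The "Sparse retrieval" row reports BM25 keyword scores, and the app's UI labels them as normalized. `retrieval/sparse.py` is not shown in this view, so what follows is a from-scratch sketch of Okapi BM25 with common defaults (`k1 = 1.5`, `b = 0.75`), whitespace tokenization, and max-normalization to [0, 1]. The tokenizer, the IDF variant, and the normalization are assumptions and may differ from what the project actually does.

```python
from __future__ import annotations

import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each doc for the query, max-normalized to [0, 1]."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # document frequency: number of docs containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    raw: list[float] = []
    for tokens in tokenized:
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF: the +1 inside the log keeps it positive for common terms
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # term-frequency saturation (k1) and document-length normalization (b)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
        raw.append(score)
    top = max(raw)
    return [s / top if top > 0 else 0.0 for s in raw]
```

Unlike the dense retriever, BM25 only rewards exact term overlap. This is why the hybrid pipeline pairs it with embeddings: BM25 catches rare identifiers and acronyms, and dense search catches paraphrases.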

agent/__init__.py ADDED

File without changes

agent/critic.py ADDED

	@@ -0,0 +1,65 @@

+"""Self-critique: judge answer for grounding, completeness, confidence."""
+from __future__ import annotations
+from llm.ollama_client import OllamaClient
+CRITIC_SYSTEM = (
+    "You are a strict reviewer. Judge whether an AI-generated answer is grounded "
+    "in the provided context and whether it fully answers the user's question."
+)
+CRITIC_PROMPT = """Question: {question}
+Answer to review:
+{answer}
+Context that was provided to the answerer:
+{context}
+Score the answer. Return strict JSON only, with no comments or extra text:
+{{
+  "grounded": true | false,
+  "complete": true | false,
+  "confidence": 0.0-1.0,
+  "missing": "<what info is missing or weakly supported, or empty string>"
+}}
+where "grounded" means every factual claim is supported by the context,
+"complete" means the answer fully addresses the question, and "confidence"
+is your overall confidence in the answer.
+"""
+def critique(question: str, answer: str, context: str, llm: OllamaClient | None = None) -> dict:
+    llm = llm or OllamaClient()
+    out = llm.generate_json(
+        prompt=CRITIC_PROMPT.format(question=question, answer=answer, context=context),
+        system=CRITIC_SYSTEM,
+        temperature=0.0,
+    )
+    return {
+        "grounded": bool(out.get("grounded", False)),
+        "complete": bool(out.get("complete", False)),
+        "confidence": float(out.get("confidence", 0.0) or 0.0),
+        "missing": str(out.get("missing", "") or ""),
+    }
+REFINE_SYSTEM = (
+    "You rewrite a search query so it retrieves the missing information."
+)
+REFINE_PROMPT = """Original question: {question}
+A previous attempt was missing the following information:
+{missing}
+Rewrite the query to specifically target the missing information. Output the
+rewritten search query as a single line of text, no quotes, no explanation.
+"""
+def refine_query(question: str, missing: str, llm: OllamaClient | None = None) -> str:
+    llm = llm or OllamaClient()
+    out = llm.generate(
+        prompt=REFINE_PROMPT.format(question=question, missing=missing or "more detail"),
+        system=REFINE_SYSTEM,
+        temperature=0.1,
+    )
+    return out.strip().splitlines()[0] if out.strip() else question
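`critique`, like the router and planner, relies on `generate_json` returning a dict. Models asked for "strict JSON" still sometimes wrap the object in code fences or add `//` comments, and standard JSON rejects both. `llm/ollama_client.py` is not shown in this view, so how `generate_json` handles such replies is unknown. The hypothetical `parse_llm_json` helper below sketches one defensive approach. It tries strict parsing first. If that fails, it falls back to the outermost `{...}` span with trailing `//` comments stripped. In the worst case it returns `{}`, so the `.get(...)` defaults in `critique` take over.

```python
import json
import re


def parse_llm_json(text: str) -> dict:
    """Best-effort parse of a JSON object from a raw LLM reply (hypothetical helper)."""
    # 1. Happy path: the reply is already valid JSON.
    try:
        out = json.loads(text)
        return out if isinstance(out, dict) else {}
    except json.JSONDecodeError:
        pass
    # 2. Keep only the outermost {...} span, dropping surrounding prose and code fences.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return {}
    candidate = text[start : end + 1]
    # 3. Strip `//` comments preceded by whitespace; "http://" inside strings is untouched
    #    because its "//" follows ":" rather than a space or tab.
    candidate = re.sub(r"[ \t]+//[^\n]*", "", candidate)
    try:
        out = json.loads(candidate)
    except json.JSONDecodeError:
        return {}
    return out if isinstance(out, dict) else {}
```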

agent/loop.py ADDED

	@@ -0,0 +1,198 @@

+"""The agentic RAG loop.
+Pipeline:
+  1. Self-RAG router: RETRIEVE / ANSWER_DIRECTLY / CLARIFY
+  2. (RETRIEVE branch) plan -> tools -> answer -> self-critique
+  3. If confidence < threshold and budget left: refine and retry
+"""
+from __future__ import annotations
+from dataclasses import dataclass, field
+from typing import Any
+from config import AGENT_CONFIG
+from agent.critic import critique, refine_query
+from agent.planner import plan
+from agent.router import route
+from agent.tools import ToolResult, vector_search
+from llm.ollama_client import OllamaClient
+from retrieval.dense import Hit
+@dataclass
+class TraceStep:
+    kind: str
+    detail: dict[str, Any] = field(default_factory=dict)
+@dataclass
+class AgentResult:
+    answer: str
+    citations: list[dict]
+    confidence: float
+    trace: list[TraceStep]
+    iterations: int
+    route: str
+ANSWER_SYSTEM = (
+    "You are a careful research assistant. Use ONLY the provided passages to "
+    "answer the question. Cite sources inline with [N] where N is the passage "
+    "number. If the passages are insufficient, say so explicitly."
+)
+ANSWER_PROMPT = """Question: {question}
+Passages:
+{context}
+Write a concise, well-grounded answer. Use inline citations like [1], [2] that
+match the passage numbers above. If multiple passages support a claim, cite
+them all. If the passages do not contain enough information, say so plainly.
+"""
+def _format_context_block(hits: list[Hit]) -> tuple[str, list[dict]]:
+    lines = []
+    citations = []
+    for i, h in enumerate(hits, start=1):
+        meta = h.metadata
+        title = meta.get("title") or meta.get("source_path", "?")
+        pages = f"p.{meta.get('page_start')}-{meta.get('page_end')}"
+        head = f"[{i}] {title} ({pages})"
+        lines.append(f"{head}\n{h.text}")
+        citations.append(
+            {
+                "n": i,
+                "chunk_id": h.chunk_id,
+                "title": title,
+                "source_path": meta.get("source_path"),
+                "page_start": meta.get("page_start"),
+                "page_end": meta.get("page_end"),
+                "score": float(h.score),
+            }
+        )
+    return "\n\n".join(lines), citations
+def _dedupe_hits(hits: list[Hit], limit: int) -> list[Hit]:
+    seen: set[str] = set()
+    out: list[Hit] = []
+    for h in hits:
+        if h.chunk_id in seen:
+            continue
+        seen.add(h.chunk_id)
+        out.append(h)
+        if len(out) >= limit:
+            break
+    return out
+def run_agent(query: str, llm: OllamaClient | None = None) -> AgentResult:
+    llm = llm or OllamaClient()
+    trace: list[TraceStep] = []
+    # 1. Router
+    decision = route(query, llm=llm)
+    trace.append(TraceStep("router", decision))
+    if decision["action"] == "ANSWER_DIRECTLY":
+        ans = llm.generate(
+            prompt=query,
+            system="You are a helpful research assistant. Be concise.",
+            temperature=0.2,
+        )
+        return AgentResult(
+            answer=ans,
+            citations=[],
+            confidence=1.0,
+            trace=trace,
+            iterations=0,
+            route="ANSWER_DIRECTLY",
+        )
+    if decision["action"] == "CLARIFY":
+        ans = llm.generate(
+            prompt=(
+                "The user asked: " + query + "\n\n"
+                "It is too ambiguous to answer well. Ask one short clarifying "
+                "question to narrow it down."
+            ),
+            system="You are a helpful research assistant.",
+            temperature=0.2,
+        )
+        return AgentResult(
+            answer=ans,
+            citations=[],
+            confidence=0.0,
+            trace=trace,
+            iterations=0,
+            route="CLARIFY",
+        )
+    # 2. RETRIEVE branch — agentic loop
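+    current_query = query
+    last_critique: dict[str, Any] = {}
+    accumulated: list[Hit] = []
+    # Defaults in case max_iterations is 0 and the loop body never runs.
+    answer = ""
+    citations: list[dict] = []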
+    for iteration in range(AGENT_CONFIG["max_iterations"]):
+        prior_summary = ""
+        if accumulated:
+            titles = sorted({h.metadata.get("title", "?") for h in accumulated})
+            prior_summary = "Already gathered passages from: " + ", ".join(titles)
+        steps = plan(current_query, prior_summary=prior_summary, llm=llm)
+        trace.append(TraceStep("plan", {"iteration": iteration, "steps": steps}))
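+        iter_hits: list[Hit] = []
+        for step in steps:
+            tool_res: ToolResult = vector_search(step["query"])
+            iter_hits.extend(tool_res.hits)
+            trace.append(
+                TraceStep(
+                    "tool",
+                    {
+                        "tool": "vector_search",
+                        "query": step["query"],
+                        "n_hits": len(tool_res.hits),
+                        "top_titles": [
+                            h.metadata.get("title") for h in tool_res.hits[:3]
+                        ],
+                    },
+                )
+            )
+        # Put this iteration's hits first so refined retrievals can displace
+        # earlier passages once the context is capped at 8 unique chunks.
+        accumulated = iter_hits + accumulated
+        unique_hits = _dedupe_hits(accumulated, limit=8)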
+        context_block, citations = _format_context_block(unique_hits)
+        answer = llm.generate(
+            prompt=ANSWER_PROMPT.format(question=query, context=context_block),
+            system=ANSWER_SYSTEM,
+            temperature=0.1,
+        )
+        trace.append(TraceStep("answer", {"iteration": iteration, "n_passages": len(unique_hits)}))
+        crit = critique(query, answer, context_block, llm=llm)
+        last_critique = crit
+        trace.append(TraceStep("critique", {"iteration": iteration, **crit}))
+        if crit["confidence"] >= AGENT_CONFIG["confidence_threshold"] and crit["grounded"]:
+            return AgentResult(
+                answer=answer,
+                citations=citations,
+                confidence=crit["confidence"],
+                trace=trace,
+                iterations=iteration + 1,
+                route="RETRIEVE",
+            )
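+        # Refine only if another iteration will use the new query.
+        if iteration + 1 < AGENT_CONFIG["max_iterations"]:
+            current_query = refine_query(query, crit.get("missing", ""), llm=llm)
+            trace.append(TraceStep("refine", {"new_query": current_query}))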
+    return AgentResult(
+        answer=answer,
+        citations=citations,
+        confidence=last_critique.get("confidence", 0.0),
+        trace=trace,
+        iterations=AGENT_CONFIG["max_iterations"],
+        route="RETRIEVE",
+    )
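Stripped of retrieval and prompting, the RETRIEVE branch of `run_agent` is a bounded answer -> critique -> refine loop. The minimal sketch below replaces the LLM calls with injected callables (`answer_fn`, `critique_fn`, `refine_fn`, all hypothetical stand-ins) so that the stopping rule can be tested on its own. It accepts the first answer whose critique is grounded and meets the confidence threshold, and it refines only when another iteration remains. The defaults `max_iterations=3` and `threshold=0.7` are placeholders; the real values live in `AGENT_CONFIG` in `config.py`, which is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopOutcome:
    answer: str
    confidence: float
    iterations: int
    queries: list[str] = field(default_factory=list)


def critique_loop(
    question: str,
    answer_fn: Callable[[str], str],
    critique_fn: Callable[[str], dict],
    refine_fn: Callable[[str, str], str],
    max_iterations: int = 3,
    threshold: float = 0.7,
) -> LoopOutcome:
    """Answer, self-critique, and refine until the critic is satisfied or the budget runs out."""
    query = question
    answer, confidence = "", 0.0
    queries: list[str] = []
    for i in range(max_iterations):
        queries.append(query)
        answer = answer_fn(query)
        crit = critique_fn(answer)
        confidence = float(crit.get("confidence", 0.0))
        # Accept only answers that are both confident and grounded in the context.
        if confidence >= threshold and crit.get("grounded", False):
            return LoopOutcome(answer, confidence, i + 1, queries)
        # Refine only when another attempt will actually run.
        if i + 1 < max_iterations:
            query = refine_fn(question, crit.get("missing", ""))
    return LoopOutcome(answer, confidence, max_iterations, queries)
```

Injecting the callables this way lets the control flow be unit-tested with stubs, without a running Ollama or Groq backend.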

agent/planner.py ADDED

	@@ -0,0 +1,52 @@

+"""Multi-step query planner. Break a question into focused sub-queries."""
+from __future__ import annotations
+from config import AGENT_CONFIG
+from llm.ollama_client import OllamaClient
+PLANNER_SYSTEM = (
+    "You are a research planner. Given a user question, decompose it into a small "
+    "number of focused sub-queries. Each sub-query targets one piece of information "
+    "needed to answer the original question. Avoid redundant or overly broad steps."
+)
+PLANNER_PROMPT = """Decompose the user question into 1-{max_steps} focused retrieval sub-queries.
+Use fewer steps when the question is simple; only use multiple steps for genuinely
+multi-part or comparative questions.
+Each sub-query should be a self-contained search query (10-20 words) phrased to
+match passages in academic papers.
+Respond with strict JSON only:
+{{"steps": [
+  {{"query": "<search query>", "rationale": "<what this sub-query is looking for>"}}
+]}}
+User question: {query}
+Context already gathered (may be empty):
+{context_summary}
+"""
+def plan(query: str, prior_summary: str = "", llm: OllamaClient | None = None) -> list[dict]:
+    llm = llm or OllamaClient()
+    out = llm.generate_json(
+        prompt=PLANNER_PROMPT.format(
+            query=query,
+            max_steps=AGENT_CONFIG["max_plan_steps"],
+            context_summary=prior_summary or "(none)",
+        ),
+        system=PLANNER_SYSTEM,
+        temperature=0.1,
+    )
+    steps = out.get("steps") if isinstance(out, dict) else None
+    if not steps or not isinstance(steps, list):
+        return [{"query": query, "rationale": "fallback: use the original question"}]
+    cleaned: list[dict] = []
+    for s in steps[: AGENT_CONFIG["max_plan_steps"]]:
+        if isinstance(s, dict) and s.get("query"):
+            cleaned.append(
+                {"query": str(s["query"]).strip(), "rationale": str(s.get("rationale", "")).strip()}
+            )
+    return cleaned or [{"query": query, "rationale": "fallback"}]
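The planner, router, and critic prompts are all filled in with `str.format`. This is why the literal JSON braces in those templates are doubled (`{{`, `}}`): a single brace would be parsed as a replacement field and raise an error. The short template below is a cut-down, hypothetical stand-in for `PLANNER_PROMPT`. It shows the escaping, and it also shows that a user query containing braces is substituted verbatim rather than re-parsed.

```python
TEMPLATE = """Respond with strict JSON only:
{{"steps": [{{"query": "<search query>"}}]}}
User question: {query}
"""


def render_prompt(query: str) -> str:
    # Doubled braces become literal "{" / "}"; only {query} is a replacement field.
    return TEMPLATE.format(query=query)


def single_brace_fails() -> bool:
    # With single braces, format() treats '"steps"' as a field name and fails.
    try:
        '{"steps": []} {query}'.format(query="x")
    except (KeyError, ValueError, IndexError):
        return True
    return False
```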

agent/router.py ADDED

	@@ -0,0 +1,39 @@

+"""Self-RAG router. Decide whether to retrieve, answer directly, or clarify."""
+from __future__ import annotations
+from llm.ollama_client import OllamaClient
+ROUTER_SYSTEM = (
+    "You are a routing classifier for an AI research assistant whose knowledge base "
+    "contains papers on Transformers, BERT, GPT-3, diffusion (DDPM/DDIM), RAG, "
+    "Self-RAG, HyDE, ViT, CLIP, ReAct, Chain-of-Thought, and an LLM survey. "
+    "Decide how to handle a user query."
+)
+ROUTER_PROMPT = """Classify the query into one of three actions:
+- "RETRIEVE": the user is asking about substantive content (concepts, methods, comparisons,
+  details from papers). The knowledge base is likely needed. Default to this when unsure.
+- "ANSWER_DIRECTLY": pure conversational/meta queries (greetings, "what can you do",
+  "thanks") that need NO knowledge lookup.
+- "CLARIFY": the query is too ambiguous or under-specified to act on (e.g. "tell me more"
+  with no prior context, "what about that paper" with no referent).
+Respond with strict JSON only:
+{{"action": "RETRIEVE" | "ANSWER_DIRECTLY" | "CLARIFY", "reason": "<one short sentence>"}}
+Query: {query}
+"""
+def route(query: str, llm: OllamaClient | None = None) -> dict:
+    llm = llm or OllamaClient()
+    out = llm.generate_json(
+        prompt=ROUTER_PROMPT.format(query=query),
+        system=ROUTER_SYSTEM,
+        temperature=0.0,
+    )
+    action = str(out.get("action", "RETRIEVE")).upper()
+    if action not in {"RETRIEVE", "ANSWER_DIRECTLY", "CLARIFY"}:
+        action = "RETRIEVE"
+    return {"action": action, "reason": out.get("reason", "")}

agent/tools.py ADDED

	@@ -0,0 +1,85 @@

+"""Agent tools.
+The vector_search tool drives the hybrid retriever. image_retrieve_and_reason performs
+multimodal RAG: caption the image, retrieve text by caption+query, then ask
+Qwen3-VL to ground its answer in both image and text.
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from llm.ollama_client import OllamaClient
+from retrieval.dense import Hit
+from retrieval.pipeline import hybrid_retrieve
+@dataclass
+class ToolResult:
+    tool: str
+    query: str
+    hits: list[Hit]
+    notes: str = ""
+def vector_search(query: str, top_n: int | None = None) -> ToolResult:
+    hits = hybrid_retrieve(query, top_n=top_n)
+    return ToolResult(tool="vector_search", query=query, hits=hits)
+CAPTION_SYSTEM = "You describe images in concise, factual language."
+CAPTION_PROMPT = (
+    "Describe this image in 1-3 sentences. Mention the type of figure (chart, diagram, "
+    "screenshot, photo, equation, etc.), key labels, and the main visual content."
+)
+def caption_image(image_path: str, llm: OllamaClient | None = None) -> str:
+    llm = llm or OllamaClient()
+    return llm.generate(
+        prompt=CAPTION_PROMPT,
+        system=CAPTION_SYSTEM,
+        images=[image_path],
+        temperature=0.0,
+    )
+MM_SYSTEM = (
+    "You are a careful research assistant. Answer using ONLY the provided image and "
+    "the cited text passages. If the answer is not supported, say so."
+)
+MM_PROMPT = """Image (provided separately) + question.
+Question: {question}
+Relevant passages:
+{context}
+Answer concisely. When citing a passage, use [N] where N is the passage number.
+"""
+def image_retrieve_and_reason(
+    image_path: str, query: str, llm: OllamaClient | None = None
+) -> dict:
+    llm = llm or OllamaClient()
+    caption = caption_image(image_path, llm=llm)
+    fused_query = f"{caption} {query}".strip()
+    hits = hybrid_retrieve(fused_query)
+    context_block = _format_context(hits)
+    answer = llm.generate(
+        prompt=MM_PROMPT.format(question=query, context=context_block),
+        system=MM_SYSTEM,
+        images=[image_path],
+        temperature=0.1,
+    )
+    return {"caption": caption, "answer": answer, "hits": hits}
+def _format_context(hits: list[Hit]) -> str:
+    lines = []
+    for i, h in enumerate(hits, start=1):
+        meta = h.metadata
+        head = f"[{i}] {meta.get('title', meta.get('source_path', '?'))} "
+        head += f"(p.{meta.get('page_start')}-{meta.get('page_end')})"
+        lines.append(f"{head}\n{h.text}")
+    return "\n\n".join(lines)

app.py ADDED

	@@ -0,0 +1,554 @@

+"""AdaptiveRAG — under-the-hood pipeline visualizer.
+Run: streamlit run app.py
+"""
+from __future__ import annotations
+import json
+import logging
+import os
+import tempfile
+import time
+from pathlib import Path
+os.environ.setdefault("ANONYMIZED_TELEMETRY", "False")
+# suppress harmless noise from Streamlit's torch inspector + ChromaDB posthog client
+logging.getLogger("streamlit.watcher.local_sources_watcher").setLevel(logging.ERROR)
+logging.getLogger("chromadb.telemetry.product.posthog").setLevel(logging.ERROR)
+import numpy as np
+import pandas as pd
+import streamlit as st
+from agent.critic import critique, refine_query
+from agent.planner import plan
+from agent.router import route
+from agent.tools import image_retrieve_and_reason
+from config import AGENT_CONFIG, EMBEDDING_CONFIG, HOSTED, LLM_CONFIG, PATHS, RETRIEVAL_CONFIG
+from ingestion.embedder import embed_query
+from ingestion.indexer import fetch_embeddings
+from llm.client_factory import get_llm
+from retrieval.dense import Hit, dense_search
+from retrieval.hybrid import reciprocal_rank_fusion
+from retrieval.reranker import rerank
+from retrieval.sparse import sparse_search
+st.set_page_config(page_title="AdaptiveRAG — Underhood", page_icon="🔬", layout="wide")
+# ───────────────────────────── styling ──────────────────────────────
+st.markdown(
+    """
+<style>
+  .phase-card {
+    border-left: 4px solid var(--accent, #4f8cff);
+    padding: .6rem 1rem;
+    margin: .25rem 0 .5rem 0;
+    background: rgba(79,140,255,0.06);
+    border-radius: 6px;
+  }
+  .phase-num { color: #4f8cff; font-weight: 700; margin-right: .4rem; }
+  .pill { display: inline-block; padding: .15rem .55rem; border-radius: 999px;
+          font-size: .78rem; font-weight: 600; margin-right: .4rem; }
+  .pill-blue   { background: #1e3a5f; color: #9ec5ff; }
+  .pill-green  { background: #1e4f30; color: #a3e6b5; }
+  .pill-purple { background: #3d2a5e; color: #c8a8f5; }
+  .pill-amber  { background: #5e3f0e; color: #f3c97a; }
+  .pill-red    { background: #5a1f1f; color: #f3a3a3; }
+  .pill-grey   { background: #2c2c33; color: #b8b8c0; }
+  .chunk-card {
+    background: rgba(255,255,255,0.03);
+    border: 1px solid rgba(255,255,255,0.08);
+    border-radius: 6px; padding: .55rem .7rem; margin-bottom: .4rem;
+    font-size: .82rem;
+  }
+  .chunk-meta { color: #9aa3b2; font-size: .73rem; margin-bottom: .25rem; }
+  .mini-vec {
+    font-family: 'SF Mono', Menlo, monospace; font-size: .68rem;
+    color: #8b949e; word-break: break-all;
+  }
+</style>
+""",
+    unsafe_allow_html=True,
+)
+# ───────────────────────────── helpers ──────────────────────────────
+@st.cache_resource
+def _llm():
+    return get_llm()
+def _load_manifest() -> dict:
+    p = PATHS["manifest_path"]
+    return json.loads(p.read_text()) if p.exists() else {}
+def phase_header(num: int, title: str, subtitle: str = "") -> None:
+    st.markdown(
+        f"<div class='phase-card'><span class='phase-num'>STEP {num}</span>"
+        f"<b>{title}</b><br><span style='color:#9aa3b2;font-size:.85rem;'>{subtitle}</span>"
+        f"</div>",
+        unsafe_allow_html=True,
+    )
+def hits_to_df(hits: list[Hit], score_label: str = "score") -> pd.DataFrame:
+    rows = []
+    for h in hits:
+        title = h.metadata.get("title") or h.metadata.get("source_path", "?")
+        short = title.split(" (")[0]
+        if len(short) > 38:
+            short = short[:35] + "…"
+        label = f"{short} · p{h.metadata.get('page_start')} · {h.chunk_id.split('::')[-1]}"
+        rows.append({"chunk": label, score_label: float(h.score), "chunk_id": h.chunk_id})
+    return pd.DataFrame(rows)
+def render_hits(hits: list[Hit], badge_class: str, label: str, max_chars: int = 220) -> None:
+    if not hits:
+        st.caption(f"_(no {label.lower()} hits)_")
+        return
+    for i, h in enumerate(hits, start=1):
+        meta = h.metadata
+        snippet = h.text[:max_chars].replace("\n", " ")
+        if len(h.text) > max_chars:
+            snippet += "…"
+        st.markdown(
+            f"<div class='chunk-card'>"
+            f"<div class='chunk-meta'>"
+            f"<span class='pill {badge_class}'>{label} #{i}</span>"
+            f"score <b>{h.score:.3f}</b> · "
+            f"{meta.get('title','?')} · p.{meta.get('page_start')}–{meta.get('page_end')} · "
+            f"<code>{h.chunk_id}</code>"
+            f"</div>{snippet}</div>",
+            unsafe_allow_html=True,
+        )
+def pca_2d(matrix: np.ndarray) -> np.ndarray:
+    centered = matrix - matrix.mean(axis=0, keepdims=True)
+    _, _, vt = np.linalg.svd(centered, full_matrices=False)
+    return centered @ vt[:2].T
+def vector_space_plot(query_vec: list[float], fused_hits: list[Hit],
+                      dense_ids: set[str], sparse_ids: set[str],
+                      kept_ids: set[str]) -> None:
+    if not fused_hits:
+        st.caption("_(nothing to plot)_")
+        return
+    embs = fetch_embeddings([h.chunk_id for h in fused_hits])
+    rows = []
+    vecs = [np.array(query_vec, dtype=np.float32)]
+    for h in fused_hits:
+        v = embs.get(h.chunk_id)
+        if v is None:
+            continue
+        vecs.append(np.array(v, dtype=np.float32))
+        in_d, in_s = h.chunk_id in dense_ids, h.chunk_id in sparse_ids
+        in_keep = h.chunk_id in kept_ids
+        if in_d and in_s:
+            color = "fused (both)"
+        elif in_d:
+            color = "dense only"
+        elif in_s:
+            color = "sparse only"
+        else:
+            color = "other"
+        title = (h.metadata.get("title") or "?").split(" (")[0][:40]
+        label = f"{title} · p{h.metadata.get('page_start')}"
+        rows.append({"label": label, "color": color, "size": 90 if in_keep else 50})
+    if len(vecs) < 3:
+        st.caption("_(need at least 2 hits for a 2D projection)_")
+        return
+    proj = pca_2d(np.vstack(vecs))
+    df = pd.DataFrame(
+        [{"x": proj[0, 0], "y": proj[0, 1], "label": "🔎 your question",
+          "color": "QUERY", "size": 220}]
+        + [{"x": proj[i + 1, 0], "y": proj[i + 1, 1], **rows[i]}
+           for i in range(len(rows))]
+    )
+    st.scatter_chart(
+        df, x="x", y="y", color="color", size="size",
+        height=380, use_container_width=True,
+    )
+    st.caption(
+        "PCA projection of the query embedding + fused hit embeddings. "
+        "Larger points survived cross-encoder reranking."
+    )
+def render_embedding_card(query: str, qv: list[float], dt: float) -> None:
+    arr = np.array(qv, dtype=np.float32)
+    cols = st.columns([1, 1, 1, 3])
+    cols[0].metric("Model", EMBEDDING_CONFIG["model"].split("/")[-1])
+    cols[1].metric("Dimensions", len(qv))
+    cols[2].metric("L2 norm", f"{float(np.linalg.norm(arr)):.3f}")
+    cols[3].metric("Embed time", f"{dt*1000:.0f} ms")
+    st.caption(f"Question ({len(query)} chars, ~{len(query.split())} words):")
+    st.code(query, language="text")
+    st.caption("First 32 dimensions of the embedding vector:")
+    st.bar_chart(pd.DataFrame({"value": arr[:32]}), height=140, use_container_width=True)
+    preview = ", ".join(f"{x:+.3f}" for x in arr[:8]) + ", …"
+    st.markdown(f"<span class='mini-vec'>vector[0:8] = [{preview}]</span>",
+                unsafe_allow_html=True)
+# ───────────────────────────── pipeline view ──────────────────────────────
+def visual_pipeline(query: str) -> None:
+    llm = _llm()
+    # ── Step 1: embed the question ────────────────────────────────
+    phase_header(1, "Question encoding",
+                 "Convert text → 384-dim dense vector via sentence-transformers (MiniLM-L6).")
+    t0 = time.time()
+    qv = embed_query(query)
+    render_embedding_card(query, qv, time.time() - t0)
+    # ── Step 2: Self-RAG router ────────────────────────────────
+    phase_header(2, "Self-RAG router",
+                 "Decide whether to RETRIEVE, ANSWER_DIRECTLY, or CLARIFY before touching the index.")
+    t0 = time.time()
+    decision = route(query, llm=llm)
+    dt = time.time() - t0
+    pill_map = {"RETRIEVE": "pill-blue", "ANSWER_DIRECTLY": "pill-green", "CLARIFY": "pill-amber"}
+    pill = pill_map.get(decision["action"], "pill-grey")
+    st.markdown(
+        f"<span class='pill {pill}'>{decision['action']}</span>"
+        f"<span style='color:#9aa3b2;'>{decision.get('reason','')}</span>"
+        f"<span style='float:right;color:#9aa3b2;font-size:.78rem;'>"
+        f"router latency: {dt*1000:.0f} ms</span>",
+        unsafe_allow_html=True,
+    )
+    if decision["action"] == "ANSWER_DIRECTLY":
+        st.markdown("### Direct answer (no retrieval)")
+        ans = llm.generate(prompt=query,
+                           system="You are a helpful research assistant. Be concise.",
+                           temperature=0.2)
+        st.markdown(ans)
+        return
+    if decision["action"] == "CLARIFY":
+        st.markdown("### Clarifying question")
+        ans = llm.generate(
+            prompt=("The user asked: " + query +
+                    "\n\nIt is too ambiguous to answer well. Ask one short clarifying question."),
+            system="You are a helpful research assistant.",
+            temperature=0.2,
+        )
+        st.markdown(ans)
+        return
+    # ── Iterations of plan → retrieve → answer → critique ────────
+    accumulated: list[Hit] = []
+    current_query = query
+    for it in range(AGENT_CONFIG["max_iterations"]):
+        st.markdown(f"---\n## 🔁 Iteration {it + 1}")
+        if current_query != query:
+            st.info(f"Refined query → **{current_query}**")
+        # ── Step 3: plan ─────────────────────────────────────
+        phase_header(3, "Planner", "LLM decomposes the question into focused sub-queries.")
+        prior = ""
+        if accumulated:
+            titles = sorted({h.metadata.get("title", "?") for h in accumulated})
+            prior = "Already gathered passages from: " + ", ".join(titles)
+        t0 = time.time()
+        steps = plan(current_query, prior_summary=prior, llm=llm)
+        dt = time.time() - t0
+        st.caption(f"Generated {len(steps)} sub-quer{'y' if len(steps)==1 else 'ies'} in {dt*1000:.0f} ms")
+        for i, s in enumerate(steps, start=1):
+            st.markdown(
+                f"<div class='chunk-card'>"
+                f"<span class='pill pill-purple'>sub-query {i}</span>"
+                f"<b>{s['query']}</b>"
+                f"<div class='chunk-meta' style='margin-top:.3rem;'>"
+                f"rationale: {s.get('rationale','—')}</div></div>",
+                unsafe_allow_html=True,
+            )
+        # ── Step 4: retrieval per sub-query ──────────────────
+        phase_header(
+            4,
+            "Hybrid retrieval per sub-query",
+            f"Dense (Chroma cosine, k={RETRIEVAL_CONFIG['dense_k']}) ∥ "
+            f"Sparse (BM25, k={RETRIEVAL_CONFIG['sparse_k']}) → "
+            f"Reciprocal Rank Fusion → Cross-encoder rerank "
+            f"(BGE, top {RETRIEVAL_CONFIG['rerank_top_n']}).",
+        )
+        for si, step in enumerate(steps, start=1):
+            with st.expander(f"Sub-query {si}: {step['query']}", expanded=(si == 1)):
+                t0 = time.time()
+                dense_hits = dense_search(step["query"])
+                t_dense = time.time() - t0
+                t0 = time.time()
+                sparse_hits = sparse_search(step["query"])
+                t_sparse = time.time() - t0
+                t0 = time.time()
+                fused = reciprocal_rank_fusion([dense_hits, sparse_hits],
+                                               top_k=max(RETRIEVAL_CONFIG["dense_k"],
+                                                         RETRIEVAL_CONFIG["sparse_k"]))
+                t_fuse = time.time() - t0
+                t0 = time.time()
+                reranked = rerank(step["query"], fused)
+                t_rerank = time.time() - t0
+                m1, m2, m3, m4 = st.columns(4)
+                m1.metric("Dense hits", len(dense_hits), f"{t_dense*1000:.0f} ms")
+                m2.metric("Sparse hits", len(sparse_hits), f"{t_sparse*1000:.0f} ms")
+                m3.metric("After RRF", len(fused), f"{t_fuse*1000:.0f} ms")
+                m4.metric("After rerank", len(reranked), f"{t_rerank*1000:.0f} ms")
+                tabs = st.tabs([
+                    "🔵 Dense (vectors)",
+                    "🟢 Sparse (BM25)",
+                    "🟣 RRF fusion",
+                    "🟡 Cross-encoder rerank",
+                    "🗺️ Vector space",
+                ])
+                with tabs[0]:
+                    st.caption("Top-K nearest neighbors by cosine similarity.")
+                    if dense_hits:
+                        st.bar_chart(hits_to_df(dense_hits, "cosine_sim"),
+                                     x="chunk", y="cosine_sim",
+                                     height=260, use_container_width=True)
+                    render_hits(dense_hits[:5], "pill-blue", "DENSE")
+                with tabs[1]:
+                    st.caption("Top-K BM25 keyword matches (normalized).")
+                    if sparse_hits:
+                        st.bar_chart(hits_to_df(sparse_hits, "bm25_norm"),
+                                     x="chunk", y="bm25_norm",
+                                     height=260, use_container_width=True)
+                    render_hits(sparse_hits[:5], "pill-green", "BM25")
+                with tabs[2]:
+                    st.caption(
+                        "Reciprocal Rank Fusion: score(d) = Σ 1/(k + rank). "
+                        "Combines dense + sparse rankings into one merged list."
+                    )
+                    if fused:
+                        st.bar_chart(hits_to_df(fused[:12], "rrf_score"),
+                                     x="chunk", y="rrf_score",
+                                     height=280, use_container_width=True)
+                    render_hits(fused[:5], "pill-purple", "FUSED")
+                with tabs[3]:
+                    st.caption(
+                        "Cross-encoder scores (query, chunk) jointly — much more "
+                        "accurate than bi-encoder cosine, but slower → only run on "
+                        "the fused candidate set."
+                    )
+                    if reranked:
+                        st.bar_chart(hits_to_df(reranked, "ce_score"),
+                                     x="chunk", y="ce_score",
+                                     height=240, use_container_width=True)
+                    render_hits(reranked, "pill-amber", "RERANKED")
+                with tabs[4]:
+                    dense_ids = {h.chunk_id for h in dense_hits}
+                    sparse_ids = {h.chunk_id for h in sparse_hits}
+                    kept_ids = {h.chunk_id for h in reranked}
+                    vector_space_plot(qv, fused[:20], dense_ids, sparse_ids, kept_ids)
+                accumulated.extend(reranked)
+        # ── Step 5: answer ─────────────────────────────────────
+        # Dedupe + cap to 8 passages for the final prompt
+        seen: set[str] = set()
+        unique: list[Hit] = []
+        for h in accumulated:
+            if h.chunk_id in seen:
+                continue
+            seen.add(h.chunk_id)
+            unique.append(h)
+            if len(unique) >= 8:
+                break
+        context_lines, citations = [], []
+        for i, h in enumerate(unique, start=1):
+            meta = h.metadata
+            head = (f"[{i}] {meta.get('title','?')} "
+                    f"(p.{meta.get('page_start')}-{meta.get('page_end')})")
+            context_lines.append(f"{head}\n{h.text}")
+            citations.append({
+                "n": i, "chunk_id": h.chunk_id,
+                "title": meta.get("title"),
+                "source_path": meta.get("source_path"),
+                "page_start": meta.get("page_start"),
+                "page_end": meta.get("page_end"),
+                "score": float(h.score),
+            })
+        context_block = "\n\n".join(context_lines)
+        phase_header(5, "Context assembly + answer generation",
+                     f"Top {len(unique)} unique passages → {LLM_CONFIG['model']} via {LLM_CONFIG['provider']}.")
+        with st.expander("📦 Context handed to the LLM", expanded=False):
+            for c in citations:
+                st.markdown(
+                    f"**[{c['n']}]** {c['title']} · pages {c['page_start']}–{c['page_end']} · "
+                    f"score `{c['score']:.3f}`"
+                )
+            st.code(context_block[:3000] + ("…" if len(context_block) > 3000 else ""),
+                    language="text")
+        t0 = time.time()
+        ANSWER_SYSTEM = (
+            "You are a careful research assistant. Use ONLY the provided passages to "
+            "answer the question. Cite sources inline with [N] where N is the passage "
+            "number. If the passages are insufficient, say so explicitly."
+        )
+        ANSWER_PROMPT = (
+            f"Question: {query}\n\nPassages:\n{context_block}\n\n"
+            "Write a concise, well-grounded answer. Use inline citations like [1], [2] "
+            "that match the passage numbers above."
+        )
+        answer = llm.generate(prompt=ANSWER_PROMPT, system=ANSWER_SYSTEM, temperature=0.1)
+        st.caption(f"LLM generation: {time.time()-t0:.1f} s")
+        st.markdown("### Answer")
+        st.markdown(answer)
+        st.markdown("### Citations")
+        for c in citations:
+            st.markdown(
+                f"**[{c['n']}]** {c['title']} — pages {c['page_start']}–{c['page_end']} "
+                f"· score `{c['score']:.3f}` · `{Path(c['source_path']).name}`"
+            )
+        # ── Step 6: critic ─────────────────────────────────────
+        phase_header(6, "Self-critique",
+                     "LLM scores its own answer for grounding + completeness.")
+        t0 = time.time()
+        crit = critique(query, answer, context_block, llm=llm)
+        c1, c2, c3 = st.columns(3)
+        c1.metric("Grounded", "✅ yes" if crit["grounded"] else "⚠️ no")
+        c2.metric("Complete", "✅ yes" if crit["complete"] else "⚠️ no")
+        c3.metric("Confidence", f"{crit['confidence']:.2f}",
+                  delta=f"threshold {AGENT_CONFIG['confidence_threshold']:.2f}")
+        if crit.get("missing"):
+            st.warning(f"Missing: {crit['missing']}")
+        st.caption(f"Critique latency: {time.time()-t0:.1f} s")
+        if crit["confidence"] >= AGENT_CONFIG["confidence_threshold"] and crit["grounded"]:
+            st.success(f"✓ Confidence {crit['confidence']:.2f} ≥ threshold — answer accepted.")
+            return
+        if it < AGENT_CONFIG["max_iterations"] - 1:
+            st.warning("Confidence below threshold — refining query and retrying.")
+            current_query = refine_query(query, crit.get("missing", ""), llm=llm)
+        else:
+            st.error("Max iterations reached. Returning best-effort answer.")
+# ───────────────────────────── sidebar + tabs ──────────────────────────────
+def _sidebar() -> None:
+    st.sidebar.title("AdaptiveRAG")
+    st.sidebar.caption("Agentic + Self-RAG + Modular RAG")
+    llm = _llm()
+    ok = llm.health()
+    backend = "Groq API" if HOSTED else "Ollama (local)"
+    st.sidebar.markdown(f"**LLM backend**: {'🟢' if ok else '🔴'} {backend}")
+    st.sidebar.markdown(f"**Model**: `{LLM_CONFIG['model']}`")
+    st.sidebar.markdown(f"**Embedder**: `{EMBEDDING_CONFIG['model'].split('/')[-1]}`")
+    st.sidebar.markdown("**Reranker**: `bge-reranker-base`")
+    manifest = _load_manifest()
+    if manifest:
+        st.sidebar.markdown(f"**Index**: {manifest.get('n_chunks','?')} chunks across "
+                            f"{len(manifest.get('chunks_per_doc',{}))} docs")
+        with st.sidebar.expander("Documents"):
+            for doc, n in sorted(manifest.get("chunks_per_doc", {}).items()):
+                st.markdown(f"- `{doc}` — {n}")
+    else:
+        st.sidebar.warning("No index found. Run `python ingest.py --reset`.")
+    st.sidebar.divider()
+    st.sidebar.markdown("### Pipeline")
+    st.sidebar.code(
+        "question\n   ↓ embed (MiniLM)\n   ↓ Self-RAG router\n   ↓ planner → sub-queries\n"
+        "   ↓ dense ∥ sparse\n   ↓ RRF fusion\n   ↓ cross-encoder rerank\n   ↓ LLM answer\n"
+        "   ↓ self-critique → retry?\n   → answer + citations",
+        language="text",
+    )
+def pipeline_tab() -> None:
+    st.subheader("🔬 Under the hood: watch every stage of the agentic RAG pipeline")
+    st.caption(
+        "Each step renders its inputs and outputs as it runs — embedding vector, "
+        "router decision, planner sub-queries, dense vs sparse hits side-by-side, "
+        "RRF fusion, cross-encoder rerank, vector-space projection, answer, self-critique."
+    )
+    samples = [
+        "How does Self-RAG decide when to retrieve, and what reflection tokens does it use?",
+        "Compare DDPM and DDIM sampling — what does DDIM gain by being non-Markovian?",
+        "What is multi-head self-attention and why does parallelism matter?",
+        "How does HyDE improve dense retrieval without relevance labels?",
+        "How does ReAct combine reasoning and acting, vs chain-of-thought?",
+        "hello, what can you do?",
+    ]
+    if "vq" not in st.session_state:
+        st.session_state.vq = samples[0]
+    cols = st.columns(3)
+    for i, s in enumerate(samples):
+        if cols[i % 3].button(s, key=f"vs{i}", use_container_width=True):
+            st.session_state.vq = s
+    q = st.text_area("Question", value=st.session_state.vq, height=80, key="vq_input")
+    if st.button("▶ Run pipeline", type="primary"):
+        if q.strip():
+            visual_pipeline(q.strip())
+def image_tab() -> None:
+    st.subheader("🖼️ Multimodal RAG (vision LLM)")
+    st.caption(
+        "Upload an image (e.g. a figure from a paper). The vision model captions it; "
+        "the caption plus your question drives hybrid retrieval, and the model then "
+        "reasons over the image and the retrieved passages together."
+    )
+    uploaded = st.file_uploader("Image", type=["png", "jpg", "jpeg", "webp"])
+    q = st.text_input("Question about the image", "Explain what this figure shows.")
+    go = st.button("Reason", type="primary", key="img_go")
+    if uploaded:
+        st.image(uploaded, width=400)
+    if not (go and uploaded):
+        return
+    with tempfile.NamedTemporaryFile(suffix=Path(uploaded.name).suffix, delete=False) as f:
+        f.write(uploaded.getbuffer())
+        tmp_path = f.name
+    try:
+        with st.spinner("Captioning → retrieving → multimodal reasoning..."):
+            out = image_retrieve_and_reason(tmp_path, q, llm=_llm())
+        st.markdown("### Caption")
+        st.write(out["caption"])
+        st.markdown("### Answer")
+        st.markdown(out["answer"])
+        st.markdown("### Retrieved passages")
+        for i, h in enumerate(out["hits"], start=1):
+            st.markdown(
+                f"**[{i}]** {h.metadata.get('title')} "
+                f"(p.{h.metadata.get('page_start')}–{h.metadata.get('page_end')}) "
+                f"· score `{h.score:.3f}`"
+            )
+            st.caption(h.text[:300] + ("…" if len(h.text) > 300 else ""))
+    finally:
+        os.unlink(tmp_path)
+def main() -> None:
+    _sidebar()
+    st.title("AdaptiveRAG 📚🔬")
+    st.caption(
+        "Agentic + Self-RAG + Modular RAG over your local paper library — "
+        f"powered by `{LLM_CONFIG['model']}` via **{LLM_CONFIG['provider']}**. "
+        "Every pipeline stage is exposed below."
+    )
+    pipe, img = st.tabs(["🔬 Under-the-hood pipeline", "🖼️ Image Q&A (multimodal)"])
+    with pipe:
+        pipeline_tab()
+    with img:
+        image_tab()
+if __name__ == "__main__":
+    main()
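The RRF tab in Step 4 describes fusion as score(d) = Σ 1/(k + rank). A minimal sketch of that formula, assuming 1-based ranks and k = `RETRIEVAL_CONFIG["rrf_k"]` (60); it works on plain chunk ids and is illustrative only, not the project's `retrieval/hybrid.py`. The app truncates the fused list to `max(dense_k, sparse_k)` before reranking, which `top_k` mimics here.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60,
                           top_k: int | None = None) -> list[tuple[str, float]]:
    """Fuse several ranked id lists into one list sorted by RRF score.

    Every list adds 1 / (k + rank) for each id it contains (rank starts at 1),
    so ids ranked highly by several retrievers accumulate the most score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return fused if top_k is None else fused[:top_k]


dense = ["c3", "c1", "c7"]   # hypothetical dense (cosine) ranking
sparse = ["c1", "c9", "c3"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse], k=60)
print([doc_id for doc_id, _ in fused])  # -> ['c1', 'c3', 'c9', 'c7']
```

`c1` wins even though the dense retriever put `c3` first: both retrievers rank `c1` near the top, and RRF rewards agreement without needing cosine and BM25 scores to be on a comparable scale.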

ask.py ADDED

	@@ -0,0 +1,32 @@

+"""CLI: python ask.py 'your question here'"""
+from __future__ import annotations
+import argparse
+import json
+from agent.loop import run_agent
+def main() -> None:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("question", nargs="+")
+    ap.add_argument("--trace", action="store_true")
+    args = ap.parse_args()
+    q = " ".join(args.question)
+    res = run_agent(q)
+    print("\n=== ROUTE ===")
+    print(res.route)
+    print("\n=== ANSWER ===")
+    print(res.answer)
+    print("\n=== CITATIONS ===")
+    for c in res.citations:
+        print(f"  [{c['n']}] {c['title']} (p.{c['page_start']}-{c['page_end']}) score={c['score']:.3f}")
+    print(f"\nConfidence: {res.confidence:.2f}  iterations: {res.iterations}")
+    if args.trace:
+        print("\n=== TRACE ===")
+        for s in res.trace:
+            print(f"  • {s.kind}: {json.dumps(s.detail, default=str)[:200]}")
+if __name__ == "__main__":
+    main()
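`run_agent` reports a route, answer, citations, confidence and iteration count. The accept/retry rule visible in the app's Step 6 (accept when the critique is grounded and confidence reaches the threshold, otherwise refine and retry up to `max_iterations`) can be sketched as below. `answer_fn`, `critique_fn` and `refine_fn` are hypothetical stand-ins for plan/retrieve/answer, `critique()` and `refine_query()`; returning the highest-confidence answer when nothing is accepted is an assumption, not necessarily what `agent/loop.py` does.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    answer: str
    confidence: float
    iterations: int


def agent_loop(query: str,
               answer_fn: Callable[[str], str],
               critique_fn: Callable[[str, str], dict],
               refine_fn: Callable[[str, str], str],
               max_iterations: int = 3,
               threshold: float = 0.85) -> LoopResult:
    """Answer, self-critique, refine; stop once the critic accepts."""
    current = query
    best_answer, best_conf = "", -1.0
    for it in range(1, max_iterations + 1):
        answer = answer_fn(current)
        crit = critique_fn(query, answer)  # judged against the ORIGINAL question
        conf = float(crit.get("confidence", 0.0))
        if crit.get("grounded") and conf >= threshold:
            return LoopResult(answer, conf, it)  # accepted
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if it < max_iterations:
            # Refine from the original query using what the critic said is missing.
            current = refine_fn(query, crit.get("missing", ""))
    return LoopResult(best_answer, best_conf, max_iterations)  # best effort


confs = iter([0.6, 0.9])
res = agent_loop(
    "q",
    answer_fn=lambda q: f"answer to {q}",
    critique_fn=lambda q, a: {"grounded": True, "confidence": next(confs), "missing": "detail"},
    refine_fn=lambda q, missing: f"{q} {missing}",
)
print(res)  # -> LoopResult(answer='answer to q detail', confidence=0.9, iterations=2)
```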

config.py ADDED

	@@ -0,0 +1,57 @@

+"""Central configuration for AdaptiveRAG."""
+import os
+from pathlib import Path
+ROOT = Path(__file__).parent.resolve()
+# Detect hosting environment
+HOSTED = bool(os.environ.get("GROQ_API_KEY"))
+LLM_CONFIG = {
+    "provider": "groq" if HOSTED else "ollama",
+    "model": "llama-3.1-8b-instant" if HOSTED else "qwen3-vl:8b-instruct-q8_0-optimized",
+    "base_url": "https://api.groq.com/openai/v1" if HOSTED else "http://localhost:11434",
+    "temperature": 0.1,
+    "timeout": 60 if HOSTED else 180,
+    "num_ctx": 8192,
+}
+EMBEDDING_CONFIG = {
+    "model": "sentence-transformers/all-MiniLM-L6-v2",
+    "device": "cpu",
+    "batch_size": 32,
+}
+RERANKER_CONFIG = {
+    "model": "BAAI/bge-reranker-base",
+    "device": "cpu",
+}
+CHUNKING_CONFIG = {
+    "target_chunk_chars": 1400,
+    "max_chunk_chars": 2200,
+    "min_chunk_chars": 350,
+    "overlap_chars": 200,
+}
+RETRIEVAL_CONFIG = {
+    "dense_k": 12,
+    "sparse_k": 12,
+    "rrf_k": 60,
+    "rerank_top_n": 5,
+}
+AGENT_CONFIG = {
+    "max_iterations": 3,
+    "confidence_threshold": 0.85,
+    "max_plan_steps": 3,
+}
+PATHS = {
+    "papers_dir": ROOT / "papers",
+    "chroma_dir": ROOT / "storage" / "chroma",
+    "bm25_path": ROOT / "storage" / "bm25.pkl",
+    "manifest_path": ROOT / "storage" / "manifest.json",
+}
+CHROMA_COLLECTION = "adaptive_rag"
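Several of these knobs depend on each other: the chunker aims between `min_chunk_chars` and `max_chunk_chars`, its carried overlap should stay below the minimum chunk size, and the reranker can only keep as many passages as the fused pool holds. A hypothetical `validate_config` helper (not part of the repo) checking one reasonable set of invariants at startup might look like this; the dicts are copies of the values above.

```python
from __future__ import annotations

CHUNKING_CONFIG = {"target_chunk_chars": 1400, "max_chunk_chars": 2200,
                   "min_chunk_chars": 350, "overlap_chars": 200}
RETRIEVAL_CONFIG = {"dense_k": 12, "sparse_k": 12, "rrf_k": 60, "rerank_top_n": 5}
AGENT_CONFIG = {"max_iterations": 3, "confidence_threshold": 0.85, "max_plan_steps": 3}


def validate_config(chunking: dict, retrieval: dict, agent: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config is consistent."""
    problems: list[str] = []
    c = chunking
    if not c["min_chunk_chars"] < c["target_chunk_chars"] < c["max_chunk_chars"]:
        problems.append("expected min_chunk_chars < target_chunk_chars < max_chunk_chars")
    if not 0 <= c["overlap_chars"] < c["min_chunk_chars"]:
        problems.append("overlap_chars must be >= 0 and below min_chunk_chars")
    # The app fuses down to max(dense_k, sparse_k) candidates before reranking.
    if retrieval["rerank_top_n"] > max(retrieval["dense_k"], retrieval["sparse_k"]):
        problems.append("rerank_top_n exceeds the fused candidate pool")
    if retrieval["rrf_k"] <= 0:
        problems.append("rrf_k must be positive")
    if not 0.0 < agent["confidence_threshold"] <= 1.0:
        problems.append("confidence_threshold must be in (0, 1]")
    if agent["max_iterations"] < 1:
        problems.append("max_iterations must be at least 1")
    return problems


print(validate_config(CHUNKING_CONFIG, RETRIEVAL_CONFIG, AGENT_CONFIG))  # -> []
```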

download_papers.sh ADDED

	@@ -0,0 +1,28 @@

+#!/bin/bash
+mkdir -p papers && cd papers
+# Transformers
+curl -L "https://arxiv.org/pdf/1706.03762" -o "01_attention_is_all_you_need.pdf"
+curl -L "https://arxiv.org/pdf/1810.04805" -o "02_bert.pdf"
+curl -L "https://arxiv.org/pdf/2005.14165" -o "03_gpt3.pdf"
+# Diffusion
+curl -L "https://arxiv.org/pdf/2006.11239" -o "04_ddpm.pdf"
+curl -L "https://arxiv.org/pdf/2010.02502" -o "05_ddim.pdf"
+# RAG
+curl -L "https://arxiv.org/pdf/2005.11401" -o "06_rag_original.pdf"
+curl -L "https://arxiv.org/pdf/2312.10997" -o "07_rag_survey.pdf"
+curl -L "https://arxiv.org/pdf/2310.11511" -o "08_self_rag.pdf"
+curl -L "https://arxiv.org/pdf/2212.10496" -o "09_hyde.pdf"
+# Vision
+curl -L "https://arxiv.org/pdf/2010.11929" -o "10_vit.pdf"
+curl -L "https://arxiv.org/pdf/2103.00020" -o "11_clip.pdf"
+# Agents
+curl -L "https://arxiv.org/pdf/2210.03629" -o "12_react.pdf"
+curl -L "https://arxiv.org/pdf/2201.11903" -o "13_chain_of_thought.pdf"
+curl -L "https://arxiv.org/pdf/2303.18223" -o "14_llm_survey.pdf"
+echo "Downloaded $(ls *.pdf | wc -l) papers"
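`curl -L` without `-f` writes whatever the server returns, so an error or rate-limit page can be saved under a `.pdf` name and only fail later inside `load_pdf`. A minimal sketch of a hypothetical pre-ingestion check (`find_non_pdfs` is not in the repo), assuming a valid file carries the `%PDF-` signature within its first 1 KiB:

```python
from __future__ import annotations

from pathlib import Path


def find_non_pdfs(papers_dir: str | Path) -> list[Path]:
    """Return *.pdf files whose first 1 KiB lacks the %PDF- signature."""
    bad: list[Path] = []
    for path in sorted(Path(papers_dir).glob("*.pdf")):
        with open(path, "rb") as f:
            head = f.read(1024)
        if b"%PDF-" not in head:
            bad.append(path)
    return bad


if __name__ == "__main__":
    for p in find_non_pdfs("papers"):
        print(f"not a PDF, re-download it: {p}")
```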

ingest.py ADDED

	@@ -0,0 +1,43 @@

+"""Ingest all PDFs from the papers/ directory into ChromaDB + BM25."""
+from __future__ import annotations
+import argparse
+import time
+from config import PATHS
+from ingestion.chunker import chunk_document
+from ingestion.indexer import index_chunks
+from ingestion.loader import discover_pdfs, load_pdf
+def main() -> None:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--reset", action="store_true", help="Reset the index first")
+    ap.add_argument("--papers-dir", default=str(PATHS["papers_dir"]))
+    args = ap.parse_args()
+    pdfs = discover_pdfs(args.papers_dir)
+    if not pdfs:
+        print(f"No PDFs in {args.papers_dir}")
+        return
+    print(f"Found {len(pdfs)} PDFs in {args.papers_dir}")
+    all_chunks = []
+    t0 = time.time()
+    for path in pdfs:
+        doc_id = path.stem
+        print(f"\n[{doc_id}]")
+        doc = load_pdf(path)
+        print(f"  Loaded: {len(doc.pages)} pages, title={doc.title!r}")
+        chunks = chunk_document(doc, doc_id=doc_id)
+        print(f"  Chunked: {len(chunks)} chunks (avg {sum(len(c.text) for c in chunks)//max(len(chunks),1)} chars)")
+        all_chunks.extend(chunks)
+    print(f"\nIndexing {len(all_chunks)} chunks total...")
+    manifest = index_chunks(all_chunks, reset=args.reset)
+    dt = time.time() - t0
+    print(f"\nDone in {dt:.1f}s. Manifest: {manifest}")
+if __name__ == "__main__":
+    main()

ingestion/__init__.py ADDED

File without changes

ingestion/chunker.py ADDED

	@@ -0,0 +1,104 @@

+"""Semantic-aware chunking.
+Strategy: split each page into sentences, then greedily group sentences into
+chunks targeting CHUNKING_CONFIG['target_chunk_chars']. Carry an overlap of
+the last few sentences (~overlap_chars) to the next chunk so context isn't
+sliced mid-thought. Headings hint chunk boundaries.
+"""
+from __future__ import annotations
+import re
+from dataclasses import dataclass
+from config import CHUNKING_CONFIG
+from ingestion.loader import LoadedDoc
+@dataclass
+class Chunk:
+    chunk_id: str
+    doc_id: str
+    source_path: str
+    title: str
+    page_start: int
+    page_end: int
+    text: str
+_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9(])")
+_HEADING_HINT = re.compile(r"^(?:[0-9]+(?:\.[0-9]+)*\s+|abstract|introduction|conclusion|references|method|results|discussion|background)\b", re.IGNORECASE)
+def _split_sentences(text: str) -> list[str]:
+    parts: list[str] = []
+    for line in text.splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        if _HEADING_HINT.match(line):
+            parts.append(line)
+            continue
+        for sent in _SENTENCE_SPLIT.split(line):
+            sent = sent.strip()
+            if sent:
+                parts.append(sent)
+    return parts
+def chunk_document(doc: LoadedDoc, doc_id: str) -> list[Chunk]:
+    target = CHUNKING_CONFIG["target_chunk_chars"]
+    max_chars = CHUNKING_CONFIG["max_chunk_chars"]
+    min_chars = CHUNKING_CONFIG["min_chunk_chars"]
+    overlap = CHUNKING_CONFIG["overlap_chars"]
+    units: list[tuple[int, str]] = []
+    for page in doc.pages:
+        for sent in _split_sentences(page.text):
+            units.append((page.page_number, sent))
+    chunks: list[Chunk] = []
+    buf: list[tuple[int, str]] = []
+    buf_len = 0
+    n_carry = 0  # leading entries of buf copied from the previous chunk as overlap
+    def flush() -> None:
+        nonlocal buf, buf_len, n_carry
+        fresh = buf[n_carry:]
+        if not fresh:
+            # Only carried overlap is left; it already ends the previous chunk.
+            buf, buf_len, n_carry = [], 0, 0
+            return
+        text = " ".join(s for _, s in buf).strip()
+        if len(text) < min_chars and chunks:
+            # Too short to stand alone: append only the new sentences to the
+            # previous chunk, since the carried overlap is already in it.
+            tail = " ".join(s for _, s in fresh)
+            chunks[-1].text = (chunks[-1].text + " " + tail).strip()
+            chunks[-1].page_end = buf[-1][0]
+            buf, buf_len, n_carry = [], 0, 0
+            return
+        chunk = Chunk(
+            chunk_id=f"{doc_id}::c{len(chunks):04d}",
+            doc_id=doc_id,
+            source_path=doc.source_path,
+            title=doc.title,
+            page_start=buf[0][0],
+            page_end=buf[-1][0],
+            text=text,
+        )
+        chunks.append(chunk)
+        # Carry trailing sentences (at most `overlap` chars) into the next chunk.
+        carry: list[tuple[int, str]] = []
+        carry_len = 0
+        for pn, s in reversed(buf):
+            if carry_len + len(s) + 1 > overlap:
+                break
+            carry.insert(0, (pn, s))
+            carry_len += len(s) + 1
+        buf, buf_len, n_carry = carry, carry_len, len(carry)
+    for pn, sent in units:
+        is_heading = bool(_HEADING_HINT.match(sent))
+        if is_heading and buf_len >= min_chars:
+            flush()
+        elif buf_len + len(sent) + 1 > max_chars:
+            # Hard cap: close the current chunk before it would exceed
+            # max_chunk_chars (a single over-long sentence still forms its own chunk).
+            flush()
+        buf.append((pn, sent))
+        buf_len += len(sent) + 1
+        if buf_len >= target:
+            flush()
+    flush()
+    return chunks
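The strategy in the module docstring (greedy sentence packing toward a target size, with a sentence-level overlap carried into the next chunk) reduces to the following on a plain list of sentences. This is a simplified sketch with illustrative sizes; it drops pages, heading hints, the hard cap and min-size merging, so it is not a replacement for `chunk_document`.

```python
from __future__ import annotations


def pack_sentences(sentences: list[str], target: int, overlap: int) -> list[str]:
    """Greedily join sentences until a chunk reaches `target` chars.

    After each emitted chunk, the trailing sentences whose combined length
    (plus one separator each) fits in `overlap` are carried into the next one.
    """
    chunks: list[str] = []
    buf: list[str] = []
    n_carry = 0  # how many leading entries of buf are recycled overlap
    for sent in sentences:
        buf.append(sent)
        if sum(len(s) + 1 for s in buf) >= target:
            chunks.append(" ".join(buf))
            carry: list[str] = []
            for s in reversed(buf):
                if sum(len(c) + 1 for c in carry) + len(s) + 1 > overlap:
                    break
                carry.insert(0, s)
            buf, n_carry = carry, len(carry)
    if len(buf) > n_carry:  # emit the tail only if it holds new sentences
        chunks.append(" ".join(buf))
    return chunks


sents = ["Alpha one.", "Beta two.", "Gamma three.", "Delta four.", "Epsilon five."]
for chunk in pack_sentences(sents, target=30, overlap=15):
    print(repr(chunk))
# -> 'Alpha one. Beta two. Gamma three.'
# -> 'Gamma three. Delta four. Epsilon five.'
```

"Gamma three." closes the first chunk and opens the second; that repeated sentence is the overlap. The `n_carry` check keeps a leftover buffer that contains only overlap from becoming a duplicate final chunk.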

ingestion/embedder.py ADDED

	@@ -0,0 +1,32 @@

+"""Dense embeddings via sentence-transformers."""
+from __future__ import annotations
+from functools import lru_cache
+from sentence_transformers import SentenceTransformer
+from config import EMBEDDING_CONFIG
+@lru_cache(maxsize=1)
+def get_embedder() -> SentenceTransformer:
+    return SentenceTransformer(
+        EMBEDDING_CONFIG["model"],
+        device=EMBEDDING_CONFIG["device"],
+    )
+def embed_texts(texts: list[str]) -> list[list[float]]:
+    model = get_embedder()
+    vecs = model.encode(
+        texts,
+        batch_size=EMBEDDING_CONFIG["batch_size"],
+        convert_to_numpy=True,
+        normalize_embeddings=True,
+        show_progress_bar=False,
+    )
+    return vecs.tolist()
+def embed_query(text: str) -> list[float]:
+    return embed_texts([text])[0]
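`embed_texts` passes `normalize_embeddings=True`, so every stored vector has unit length. For unit vectors, cosine similarity equals the plain dot product, and the distance reported by a cosine HNSW space (1 minus similarity) ranks results in exactly the reverse order. A small NumPy check with random stand-in vectors (no model download; 384 matches all-MiniLM-L6-v2's output size):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 384))  # stand-ins for un-normalized sentence embeddings
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)  # what normalize_embeddings=True yields

# Cosine similarity on the raw vectors vs. dot product on the normalized ones.
cosine = (raw @ raw[0]) / (np.linalg.norm(raw, axis=1) * np.linalg.norm(raw[0]))
dot = unit @ unit[0]
distance = 1.0 - dot  # cosine distance

print(np.allclose(cosine, dot))  # -> True
print(np.array_equal(np.argsort(distance), np.argsort(-cosine)))  # -> True
```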

ingestion/indexer.py ADDED

	@@ -0,0 +1,134 @@

+"""ChromaDB management + BM25 corpus persistence."""
+from __future__ import annotations
+import json
+import os
+import pickle
+import re
+os.environ.setdefault("ANONYMIZED_TELEMETRY", "False")
+import chromadb
+from chromadb.config import Settings  # noqa: E402
+from config import CHROMA_COLLECTION, PATHS
+from ingestion.chunker import Chunk
+from ingestion.embedder import embed_texts
+def _ensure_dirs() -> None:
+    PATHS["chroma_dir"].mkdir(parents=True, exist_ok=True)
+    PATHS["bm25_path"].parent.mkdir(parents=True, exist_ok=True)
+def _client() -> chromadb.PersistentClient:
+    return chromadb.PersistentClient(
+        path=str(PATHS["chroma_dir"]),
+        settings=Settings(anonymized_telemetry=False),
+    )
+def get_chroma_collection():
+    _ensure_dirs()
+    client = _client()
+    return client.get_or_create_collection(
+        name=CHROMA_COLLECTION,
+        metadata={"hnsw:space": "cosine"},
+    )
+def reset_index() -> None:
+    _ensure_dirs()
+    client = _client()
+    try:
+        client.delete_collection(CHROMA_COLLECTION)
+    except Exception:
+        pass
+    if PATHS["bm25_path"].exists():
+        PATHS["bm25_path"].unlink()
+    if PATHS["manifest_path"].exists():
+        PATHS["manifest_path"].unlink()
+_TOKEN_RE = re.compile(r"[a-zA-Z0-9]+")
+def tokenize(text: str) -> list[str]:
+    return [t.lower() for t in _TOKEN_RE.findall(text)]
+def index_chunks(chunks: list[Chunk], reset: bool = False) -> dict:
+    _ensure_dirs()
+    if reset:
+        reset_index()
+    coll = get_chroma_collection()
+    ids = [c.chunk_id for c in chunks]
+    docs = [c.text for c in chunks]
+    metas = [
+        {
+            "doc_id": c.doc_id,
+            "source_path": c.source_path,
+            "title": c.title,
+            "page_start": c.page_start,
+            "page_end": c.page_end,
+        }
+        for c in chunks
+    ]
+    print(f"  Embedding {len(docs)} chunks...")
+    embeddings = embed_texts(docs)
+    print(f"  Writing to ChromaDB ({CHROMA_COLLECTION})...")
+    BATCH = 256
+    for i in range(0, len(ids), BATCH):
+        coll.upsert(
+            ids=ids[i : i + BATCH],
+            documents=docs[i : i + BATCH],
+            metadatas=metas[i : i + BATCH],
+            embeddings=embeddings[i : i + BATCH],
+        )
+    print("  Building BM25 corpus...")
+    tokenized = [tokenize(d) for d in docs]
+    with open(PATHS["bm25_path"], "wb") as f:
+        pickle.dump(
+            {"ids": ids, "tokenized": tokenized, "metas": metas, "docs": docs},
+            f,
+        )
+    manifest = {
+        "n_chunks": len(ids),
+        "chunks_per_doc": _group_count([c.doc_id for c in chunks]),
+    }
+    with open(PATHS["manifest_path"], "w") as f:
+        json.dump(manifest, f, indent=2)
+    return manifest
+def _group_count(items: list[str]) -> dict:
+    out: dict = {}
+    for x in items:
+        out[x] = out.get(x, 0) + 1
+    return out
+def fetch_embeddings(chunk_ids: list[str]) -> dict[str, list[float]]:
+    """Pull stored embeddings for a list of chunk ids (used for visualization)."""
+    if not chunk_ids:
+        return {}
+    coll = get_chroma_collection()
+    res = coll.get(ids=list(chunk_ids), include=["embeddings"])
+    out: dict[str, list[float]] = {}
+    for cid, vec in zip(res["ids"], res["embeddings"]):
+        out[cid] = list(vec)
+    return out
+def load_bm25_corpus() -> dict:
+    if not PATHS["bm25_path"].exists():
+        raise FileNotFoundError(
+            f"BM25 corpus not found at {PATHS['bm25_path']}. Run ingestion first."
+        )
+    with open(PATHS["bm25_path"], "rb") as f:
+        return pickle.load(f)
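`index_chunks` pickles the tokenized corpus rather than a fitted scorer, so the sparse side has to rebuild BM25 from `tokenized` after `load_bm25_corpus()`. For reference, a from-scratch Okapi BM25 sketch using the same lowercase `[a-zA-Z0-9]+` tokenizer. The parameters k1 = 1.5 and b = 0.75 are common defaults assumed here, and `retrieval/sparse.py` may use a library with a different IDF variant.

```python
from __future__ import annotations

import math
import re
from collections import Counter

_TOKEN_RE = re.compile(r"[a-zA-Z0-9]+")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in _TOKEN_RE.findall(text)]


class BM25:
    """Okapi BM25 over pre-tokenized documents."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tfs = [Counter(doc) for doc in corpus]
        self.lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.lens) / max(len(corpus), 1)
        n = len(corpus)
        df = Counter(term for doc in corpus for term in set(doc))
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def scores(self, query: list[str]) -> list[float]:
        out: list[float] = []
        for tf, dl in zip(self.tfs, self.lens):
            s = 0.0
            for term in query:
                freq = tf.get(term, 0)
                if freq:
                    # Length normalization: longer-than-average docs are penalized.
                    norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                    s += self.idf[term] * freq * (self.k1 + 1) / (freq + norm)
            out.append(s)
        return out


docs = ["Self-RAG uses reflection tokens to decide when to retrieve.",
        "DDIM sampling is non-Markovian and faster than DDPM.",
        "Dense retrieval with HyDE needs no relevance labels."]
bm25 = BM25([tokenize(d) for d in docs])
print(bm25.scores(tokenize("when does self-rag retrieve")))  # only the first score is non-zero
```

Exact token matching is the point of the sparse leg: "retrieve" does not match "retrieval" in the third document, which is the kind of gap the dense retriever covers before RRF merges the two.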

ingestion/loader.py ADDED

	@@ -0,0 +1,87 @@

+"""PDF loader. Returns per-page text + structural metadata."""
+from __future__ import annotations
+import re
+from dataclasses import dataclass, field
+from pathlib import Path
+import pymupdf
+@dataclass
+class PageText:
+    page_number: int
+    text: str
+@dataclass
+class LoadedDoc:
+    source_path: str
+    title: str
+    pages: list[PageText] = field(default_factory=list)
+    @property
+    def full_text(self) -> str:
+        return "\n\n".join(p.text for p in self.pages)
+_LIGATURES = {
+    "ﬀ": "ff", "ﬁ": "fi", "ﬂ": "fl",
+    "ﬃ": "ffi", "ﬄ": "ffl",
+}
+def _clean(text: str) -> str:
+    for k, v in _LIGATURES.items():
+        text = text.replace(k, v)
+    text = re.sub(r"-\n(?=\w)", "", text)
+    text = re.sub(r"[ \t]+", " ", text)
+    text = re.sub(r"\n{3,}", "\n\n", text)
+    return text.strip()
+_TITLE_OVERRIDES = {
+    "01_attention_is_all_you_need": "Attention Is All You Need (Vaswani et al., 2017)",
+    "02_bert": "BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018)",
+    "03_gpt3": "Language Models are Few-Shot Learners (GPT-3, Brown et al., 2020)",
+    "04_ddpm": "Denoising Diffusion Probabilistic Models (Ho et al., 2020)",
+    "05_ddim": "Denoising Diffusion Implicit Models (Song et al., 2020)",
+    "06_rag_original": "Retrieval-Augmented Generation for Knowledge-Intensive NLP (Lewis et al., 2020)",
+    "07_rag_survey": "Retrieval-Augmented Generation for LLMs: A Survey (Gao et al., 2023)",
+    "08_self_rag": "Self-RAG: Learning to Retrieve, Generate, and Critique (Asai et al., 2023)",
+    "09_hyde": "Precise Zero-Shot Dense Retrieval with HyDE (Gao et al., 2022)",
+    "10_vit": "An Image is Worth 16x16 Words (Vision Transformer, Dosovitskiy et al., 2020)",
+    "11_clip": "Learning Transferable Visual Models from Natural Language Supervision (CLIP, Radford et al., 2021)",
+    "12_react": "ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)",
+    "13_chain_of_thought": "Chain-of-Thought Prompting Elicits Reasoning (Wei et al., 2022)",
+    "14_llm_survey": "A Survey of Large Language Models (Zhao et al., 2023)",
+}
+def load_pdf(path: str | Path) -> LoadedDoc:
+    path = Path(path)
+    pages: list[PageText] = []
+    # The context manager closes the document even if text extraction raises.
+    with pymupdf.open(path) as doc:
+        for i, page in enumerate(doc, start=1):
+            raw = page.get_text("text")
+            cleaned = _clean(raw)
+            if cleaned:
+                pages.append(PageText(page_number=i, text=cleaned))
+    title = _TITLE_OVERRIDES.get(path.stem) or _guess_title(pages, fallback=path.stem)
+    return LoadedDoc(source_path=str(path), title=title, pages=pages)
+def _guess_title(pages: list[PageText], fallback: str) -> str:
+    if not pages:
+        return fallback
+    first = pages[0].text
+    for line in first.splitlines():
+        line = line.strip()
+        if 10 < len(line) < 180 and not line.lower().startswith(("abstract", "arxiv:")):
+            return line
+    return fallback
+def discover_pdfs(papers_dir: str | Path) -> list[Path]:
+    return sorted(Path(papers_dir).glob("*.pdf"))

llm/__init__.py ADDED

File without changes

llm/client_factory.py ADDED

	@@ -0,0 +1,16 @@

+"""Return the right LLM client based on environment.
+Local (Ollama running)  →  OllamaClient
+Hosted (GROQ_API_KEY set)  →  GroqClient
+"""
+from __future__ import annotations
+import os
+def get_llm():
+    if os.environ.get("GROQ_API_KEY"):
+        from llm.groq_client import GroqClient
+        return GroqClient()
+    from llm.ollama_client import OllamaClient
+    return OllamaClient()
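`get_llm()` can return either client, so callers depend on the two classes sharing a method surface. One way to make that contract explicit (a sketch, not code from the repo) is a `typing.Protocol` listing the methods `GroqClient` exposes and that `OllamaClient` is assumed to mirror, given that the app calls `llm.generate(...)` and `llm.health()` on whichever one it gets. `FakeLLM` is a hypothetical offline test double.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LLMClient(Protocol):
    def generate(self, prompt: str, system: str | None = None,
                 temperature: float | None = None,
                 images: list[str] | None = None,
                 format: str | None = None) -> str: ...

    def generate_json(self, prompt: str, system: str | None = None,
                      temperature: float | None = None,
                      images: list[str] | None = None) -> dict[str, Any]: ...

    def health(self) -> bool: ...


class FakeLLM:
    """Hypothetical stand-in for exercising the agent without a backend."""

    def generate(self, prompt: str, system: str | None = None,
                 temperature: float | None = None,
                 images: list[str] | None = None,
                 format: str | None = None) -> str:
        return f"echo: {prompt}"

    def generate_json(self, prompt: str, system: str | None = None,
                      temperature: float | None = None,
                      images: list[str] | None = None) -> dict[str, Any]:
        return {"echo": prompt}

    def health(self) -> bool:
        return True


llm: LLMClient = FakeLLM()
print(isinstance(llm, LLMClient), llm.generate("hi"))  # -> True echo: hi
```

`runtime_checkable` only checks that the methods exist, not their signatures; a static checker such as mypy is what would catch a signature mismatch between the two real clients.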

llm/groq_client.py ADDED

	@@ -0,0 +1,133 @@

+"""Groq API client — drop-in replacement for OllamaClient when running hosted.
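+Free tier: rate limits are per model and change over time (e.g. 14,400 requests/day
+for llama-3.1-8b-instant); check Groq's rate-limits documentation for current values.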
+Text model  : llama-3.1-8b-instant  (fast, cheap)
+Vision model: llama-3.2-11b-vision-preview (images)
+"""
+from __future__ import annotations
+import base64
+import json
+import os
+from pathlib import Path
+from typing import Any
+import requests
+_TEXT_MODEL = "llama-3.1-8b-instant"
+_VISION_MODEL = "llama-3.2-11b-vision-preview"
+_BASE_URL = "https://api.groq.com/openai/v1"
+class GroqClient:
+    def __init__(self) -> None:
+        self.api_key = os.environ.get("GROQ_API_KEY", "")
+        self.temperature = 0.1
+        self.timeout = 60
+    def _headers(self) -> dict[str, str]:
+        return {
+            "Authorization": f"Bearer {self.api_key}",
+            "Content-Type": "application/json",
+        }
+    def _chat(
+        self,
+        messages: list[dict],
+        model: str,
+        temperature: float,
+        json_mode: bool = False,
+    ) -> str:
+        payload: dict[str, Any] = {
+            "model": model,
+            "messages": messages,
+            "temperature": temperature,
+        }
+        if json_mode:
+            payload["response_format"] = {"type": "json_object"}
+        r = requests.post(
+            f"{_BASE_URL}/chat/completions",
+            headers=self._headers(),
+            json=payload,
+            timeout=self.timeout,
+        )
+        r.raise_for_status()
+        return r.json()["choices"][0]["message"]["content"].strip()
+    def _build_messages(
+        self,
+        prompt: str,
+        system: str | None,
+        images: list[str] | None,
+    ) -> tuple[list[dict], str]:
+        msgs: list[dict] = []
+        if system:
+            msgs.append({"role": "system", "content": system})
+        if images:
+            content: list[dict] = [{"type": "text", "text": prompt}]
+            for img_path in images:
+                b64 = self._encode_image(img_path)
+                suffix = Path(img_path).suffix.lower().lstrip(".")
+                mime = {"jpg": "image/jpeg", "jpeg": "image/jpeg",
+                        "png": "image/png", "webp": "image/webp"}.get(suffix, "image/png")
+                content.append({"type": "image_url",
+                                 "image_url": {"url": f"data:{mime};base64,{b64}"}})
+            msgs.append({"role": "user", "content": content})
+            return msgs, _VISION_MODEL
+        msgs.append({"role": "user", "content": prompt})
+        return msgs, _TEXT_MODEL
+    def generate(
+        self,
+        prompt: str,
+        system: str | None = None,
+        temperature: float | None = None,
+        images: list[str] | None = None,
+        format: str | None = None,
+    ) -> str:
+        msgs, model = self._build_messages(prompt, system, images)
+        return self._chat(
+            msgs, model,
+            temperature if temperature is not None else self.temperature,
+            json_mode=(format == "json"),
+        )
+    def generate_json(
+        self,
+        prompt: str,
+        system: str | None = None,
+        temperature: float | None = None,
+        images: list[str] | None = None,
+    ) -> dict[str, Any]:
+        text = self.generate(prompt=prompt, system=system,
+                             temperature=temperature, images=images, format="json")
+        return _safe_json(text)
+    @staticmethod
+    def _encode_image(path: str | Path) -> str:
+        with open(path, "rb") as f:
+            return base64.b64encode(f.read()).decode("utf-8")
+    def health(self) -> bool:
+        if not self.api_key:
+            return False
+        try:
+            r = requests.get(f"{_BASE_URL}/models", headers=self._headers(), timeout=5)
+            return r.status_code == 200
+        except requests.RequestException:
+            return False
+def _safe_json(text: str) -> dict[str, Any]:
+    text = text.strip()
+    try:
+        return json.loads(text)
+    except json.JSONDecodeError:
+        pass
+    start, end = text.find("{"), text.rfind("}")
+    if start != -1 and end > start:
+        try:
+            return json.loads(text[start:end + 1])
+        except json.JSONDecodeError:
+            pass
+    return {"_raw": text}
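
`_build_messages` changes the user turn from a plain string to a list of typed parts when images are attached: the text comes first, followed by one `image_url` part per image that carries a base64 `data:` URI. The sketch below reproduces that layout as a standalone function so it can be checked without an API key. The name `build_messages` is an assumption made for this example, as is accepting raw bytes keyed by file name instead of file paths.

```python
import base64
from pathlib import Path
from typing import Optional

# Same suffix -> MIME mapping as _build_messages; unknown suffixes fall back to PNG.
_MIME = {"jpg": "image/jpeg", "jpeg": "image/jpeg", "png": "image/png", "webp": "image/webp"}


def build_messages(prompt: str, system: Optional[str] = None,
                   images: Optional[dict[str, bytes]] = None) -> list[dict]:
    """Hypothetical helper: `images` maps a file name to that image's raw bytes."""
    msgs: list[dict] = []
    if system:
        msgs.append({"role": "system", "content": system})
    if not images:
        # Text-only request: content is a plain string.
        msgs.append({"role": "user", "content": prompt})
        return msgs
    # Multimodal request: content is a list of typed parts, text first.
    parts: list[dict] = [{"type": "text", "text": prompt}]
    for name, data in images.items():
        mime = _MIME.get(Path(name).suffix.lower().lstrip("."), "image/png")
        b64 = base64.b64encode(data).decode("utf-8")
        parts.append({"type": "image_url",
                      "image_url": {"url": f"data:{mime};base64,{b64}"}})
    msgs.append({"role": "user", "content": parts})
    return msgs


msgs = build_messages("Describe the figure.", system="Be concise.",
                      images={"fig1.JPG": b"\xff\xd8\xff"})
print(msgs[1]["content"][1]["image_url"]["url"])
```

The suffix is lower-cased before lookup, so `fig1.JPG` still resolves to `image/jpeg`.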

llm/ollama_client.py ADDED

	@@ -0,0 +1,116 @@

+"""Thin Ollama HTTP client. Handles text + multimodal (image) requests."""
+from __future__ import annotations
+import base64
+import json
+from pathlib import Path
+from typing import Any
+import requests
+from config import LLM_CONFIG
+class OllamaClient:
+    def __init__(
+        self,
+        model: str | None = None,
+        base_url: str | None = None,
+        temperature: float | None = None,
+        timeout: int | None = None,
+        num_ctx: int | None = None,
+    ) -> None:
+        self.model = model or LLM_CONFIG["model"]
+        self.base_url = (base_url or LLM_CONFIG["base_url"]).rstrip("/")
+        self.temperature = temperature if temperature is not None else LLM_CONFIG["temperature"]
+        self.timeout = timeout or LLM_CONFIG["timeout"]
+        self.num_ctx = num_ctx or LLM_CONFIG["num_ctx"]
+    def _options(self, **overrides: Any) -> dict[str, Any]:
+        opts = {"temperature": self.temperature, "num_ctx": self.num_ctx}
+        opts.update({k: v for k, v in overrides.items() if v is not None})
+        return opts
+    def generate(
+        self,
+        prompt: str,
+        system: str | None = None,
+        temperature: float | None = None,
+        images: list[str] | None = None,
+        format: str | None = None,
+    ) -> str:
+        payload: dict[str, Any] = {
+            "model": self.model,
+            "prompt": prompt,
+            "stream": False,
+            "options": self._options(temperature=temperature),
+        }
+        if system:
+            payload["system"] = system
+        if images:
+            payload["images"] = [self._encode_image(p) for p in images]
+        if format:
+            payload["format"] = format
+        r = requests.post(
+            f"{self.base_url}/api/generate",
+            json=payload,
+            timeout=self.timeout,
+        )
+        r.raise_for_status()
+        return r.json().get("response", "").strip()
+    def generate_json(
+        self,
+        prompt: str,
+        system: str | None = None,
+        temperature: float | None = None,
+        images: list[str] | None = None,
+    ) -> dict[str, Any]:
+        """Ask the model for a JSON object and parse it. Falls back to extraction
+        if the model wraps JSON in prose or fences."""
+        text = self.generate(
+            prompt=prompt,
+            system=system,
+            temperature=temperature,
+            images=images,
+            format="json",
+        )
+        return _safe_json_loads(text)
+    @staticmethod
+    def _encode_image(path: str | Path) -> str:
+        with open(path, "rb") as f:
+            return base64.b64encode(f.read()).decode("utf-8")
+    def health(self) -> bool:
+        try:
+            r = requests.get(f"{self.base_url}/api/tags", timeout=5)
+            return r.status_code == 200
+        except requests.RequestException:
+            return False
+def _safe_json_loads(text: str) -> dict[str, Any]:
+    text = text.strip()
+    try:
+        return json.loads(text)
+    except json.JSONDecodeError:
+        pass
+    if "```" in text:
+        parts = text.split("```")
+        for part in parts:
+            stripped = part.strip()
+            if stripped.startswith("json"):
+                stripped = stripped[4:].strip()
+            try:
+                return json.loads(stripped)
+            except json.JSONDecodeError:
+                continue
+    start = text.find("{")
+    end = text.rfind("}")
+    if start != -1 and end != -1 and end > start:
+        try:
+            return json.loads(text[start : end + 1])
+        except json.JSONDecodeError:
+            pass
+    return {"_raw": text}
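
`generate` sends a single non-streaming POST to Ollama's `/api/generate`. Sampling settings go under `options`, and `system`, `images` and `format` are attached only when they are supplied. Pulling the payload assembly into a pure function makes that logic testable without a running server. In this sketch, the helper name `build_generate_payload`, the model name `llama3.2`, and the default `temperature`/`num_ctx` values are all illustrative assumptions. The real client takes its defaults from `LLM_CONFIG`.

```python
from typing import Any, Optional


def build_generate_payload(
    model: str,
    prompt: str,
    *,
    temperature: float = 0.2,   # illustrative; the client reads LLM_CONFIG["temperature"]
    num_ctx: int = 4096,        # illustrative; the client reads LLM_CONFIG["num_ctx"]
    system: Optional[str] = None,
    images_b64: Optional[list[str]] = None,
    fmt: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical pure-function version of the payload built in OllamaClient.generate."""
    payload: dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of a token stream
        "options": {"temperature": temperature, "num_ctx": num_ctx},
    }
    # Optional fields are attached only when set, as in generate().
    if system:
        payload["system"] = system
    if images_b64:
        payload["images"] = images_b64  # list of base64-encoded image strings
    if fmt:
        payload["format"] = fmt         # e.g. "json", which generate_json passes
    return payload


payload = build_generate_payload("llama3.2", "Summarise the RAG paper.", fmt="json")
```

Because this function does no I/O, the network step in `generate` reduces to `requests.post(url, json=payload, timeout=...)` followed by reading the `response` field.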

requirements.txt ADDED

	@@ -0,0 +1,9 @@

+chromadb==0.5.23
+sentence-transformers==3.3.1
+rank-bm25==0.2.2
+pymupdf==1.25.1
+requests==2.32.3
+numpy==1.26.4
+streamlit==1.41.1
+tqdm==4.67.1
+pillow==11.0.0

retrieval/__init__.py ADDED

File without changes

retrieval/dense.py ADDED

	@@ -0,0 +1,39 @@

+"""Dense (vector) retrieval over ChromaDB."""
+from __future__ import annotations
+from dataclasses import dataclass
+from typing import Any
+from config import RETRIEVAL_CONFIG
+from ingestion.embedder import embed_query
+from ingestion.indexer import get_chroma_collection
+@dataclass
+class Hit:
+    chunk_id: str
+    text: str
+    metadata: dict
+    score: float
+    rank: int
+def dense_search(query: str, k: int | None = None) -> list[Hit]:
+    k = k or RETRIEVAL_CONFIG["dense_k"]
+    coll = get_chroma_collection()
+    qv = embed_query(query)
+    res = coll.query(
+        query_embeddings=[qv],
+        n_results=k,
+        include=["documents", "metadatas", "distances"],
+    )
+    ids = res["ids"][0]
+    docs = res["documents"][0]
+    metas = res["metadatas"][0]
+    dists = res["distances"][0]
+    hits: list[Hit] = []
+    for r, (cid, doc, meta, dist) in enumerate(zip(ids, docs, metas, dists)):
+        # Chroma's cosine space reports distance = 1 - cosine_sim (range [0, 2]);
+        # convert back to similarity and clip negatives to 0 so scores lie in [0, 1]
+        score = max(0.0, 1.0 - float(dist))
+        hits.append(Hit(chunk_id=cid, text=doc, metadata=dict(meta), score=score, rank=r))
+    return hits
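
In cosine space, Chroma reports distance as `1 - cosine_similarity`, so values run from 0 (same direction) to 2 (opposite direction). `dense_search` reverses that conversion and floors the result at 0. The dependency-free sketch below walks through the arithmetic. The three document vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def distance_to_score(dist: float) -> float:
    # Same conversion as dense_search: similarity = 1 - distance, floored at 0.
    return max(0.0, 1.0 - float(dist))


q = [1.0, 0.0]
docs = {"same": [2.0, 0.0], "orthogonal": [0.0, 3.0], "opposite": [-1.0, 0.0]}
for name, v in docs.items():
    dist = 1.0 - cosine_similarity(q, v)  # what a cosine-space index would report
    print(name, dist, distance_to_score(dist))
```

The clip maps every vector pointing away from the query (similarity of 0 or less) to the same score of 0. Only the order of dense hits feeds into RRF, so the clip affects the displayed score but not the fused ranking.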

retrieval/hybrid.py ADDED

	@@ -0,0 +1,47 @@

+"""Reciprocal Rank Fusion of dense + sparse hits."""
+from __future__ import annotations
+from config import RETRIEVAL_CONFIG
+from retrieval.dense import Hit, dense_search
+from retrieval.sparse import sparse_search
+def reciprocal_rank_fusion(
+    rankings: list[list[Hit]],
+    k_const: int | None = None,
+    top_k: int | None = None,
+) -> list[Hit]:
+    k_const = k_const or RETRIEVAL_CONFIG["rrf_k"]
+    fused: dict[str, dict] = {}
+    for ranking in rankings:
+        for r, hit in enumerate(ranking):
+            entry = fused.setdefault(
+                hit.chunk_id,
+                {"hit": hit, "score": 0.0},
+            )
+            entry["score"] += 1.0 / (k_const + r + 1)
+            # prefer higher-ranked instance for the canonical hit object
+            if hit.rank < entry["hit"].rank:
+                entry["hit"] = hit
+    merged = sorted(fused.values(), key=lambda x: x["score"], reverse=True)
+    out: list[Hit] = []
+    for r, entry in enumerate(merged):
+        h = entry["hit"]
+        out.append(
+            Hit(
+                chunk_id=h.chunk_id,
+                text=h.text,
+                metadata=h.metadata,
+                score=entry["score"],
+                rank=r,
+            )
+        )
+    if top_k:
+        out = out[:top_k]
+    return out
+def hybrid_search(query: str, top_k: int | None = None) -> list[Hit]:
+    dense = dense_search(query)
+    sparse = sparse_search(query)
+    return reciprocal_rank_fusion([dense, sparse], top_k=top_k)
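
`reciprocal_rank_fusion` gives each chunk the sum of `1 / (k + rank)` over every list it appears in, with ranks counted from 1. Only positions are used, which is why cosine similarities and BM25 scores can be fused even though they are on unrelated scales. The toy example below runs on bare chunk IDs and assumes k = 60, a common choice for RRF. The project takes its own constant from `RETRIEVAL_CONFIG["rrf_k"]`, and that value is not shown in this diff.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over ranked lists of IDs (best first)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for r, cid in enumerate(ranking):
            # enumerate is 0-based, so r + 1 is the 1-based rank
            scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + r + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


dense = ["a", "b", "c"]   # ids ordered by cosine similarity
sparse = ["c", "a", "d"]  # ids ordered by BM25 score
for cid, s in rrf([dense, sparse]):
    print(cid, round(s, 5))
```

In this example, `a` comes first because it sits near the top of both lists. `c` is first for BM25 but last for dense retrieval, so it takes second place. `b` and `d` were each found by only one retriever, so they trail.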

retrieval/pipeline.py ADDED

	@@ -0,0 +1,12 @@

+"""End-to-end retrieval: hybrid search + cross-encoder reranking."""
+from __future__ import annotations
+from config import RETRIEVAL_CONFIG
+from retrieval.dense import Hit
+from retrieval.hybrid import hybrid_search
+from retrieval.reranker import rerank
+def hybrid_retrieve(query: str, top_n: int | None = None) -> list[Hit]:
+    fused = hybrid_search(query, top_k=max(RETRIEVAL_CONFIG["dense_k"], RETRIEVAL_CONFIG["sparse_k"]))
+    return rerank(query, fused, top_n=top_n)

retrieval/reranker.py ADDED

	@@ -0,0 +1,36 @@

+"""Cross-encoder reranker: scores each (query, chunk) pair jointly to reorder the fused hits."""
+from __future__ import annotations
+from functools import lru_cache
+from sentence_transformers import CrossEncoder
+from config import RERANKER_CONFIG, RETRIEVAL_CONFIG
+from retrieval.dense import Hit
+@lru_cache(maxsize=1)
+def get_reranker() -> CrossEncoder:
+    return CrossEncoder(RERANKER_CONFIG["model"], device=RERANKER_CONFIG["device"])
+def rerank(query: str, hits: list[Hit], top_n: int | None = None) -> list[Hit]:
+    if not hits:
+        return []
+    top_n = top_n or RETRIEVAL_CONFIG["rerank_top_n"]
+    model = get_reranker()
+    pairs = [(query, h.text) for h in hits]
+    scores = model.predict(pairs, show_progress_bar=False)
+    ranked = sorted(zip(hits, scores), key=lambda x: float(x[1]), reverse=True)[:top_n]
+    out: list[Hit] = []
+    for r, (h, s) in enumerate(ranked):
+        out.append(
+            Hit(
+                chunk_id=h.chunk_id,
+                text=h.text,
+                metadata=h.metadata,
+                score=float(s),
+                rank=r,
+            )
+        )
+    return out
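
Reranking is a second pass over the fused candidates. The cross-encoder reads each (query, chunk) pair together, the list is re-sorted by that score, and the result is cut to `top_n`. The sketch below keeps the sort, truncate and re-rank logic, but it swaps the BGE cross-encoder for a toy word-overlap scorer so that it runs offline. `overlap_score` is purely illustrative and does not stand in for a trained model. `Hit` mirrors the dataclass in `retrieval/dense.py`.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class Hit:  # mirrors retrieval.dense.Hit
    chunk_id: str
    text: str
    metadata: dict
    score: float
    rank: int


def overlap_score(query: str, text: str) -> float:
    # Toy stand-in for CrossEncoder.predict: fraction of query words present in text.
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / max(len(q), 1)


def rerank(query: str, hits: list[Hit], top_n: int,
           scorer: Callable[[str, str], float] = overlap_score) -> list[Hit]:
    scored = sorted(((h, scorer(query, h.text)) for h in hits),
                    key=lambda x: x[1], reverse=True)[:top_n]
    # Replace the fused RRF score with the reranker score and renumber ranks from 0.
    return [replace(h, score=float(s), rank=r) for r, (h, s) in enumerate(scored)]


hits = [
    Hit("c1", "diffusion models add noise", {}, 0.033, 0),
    Hit("c2", "self attention computes weighted sums", {}, 0.032, 1),
    Hit("c3", "attention is all you need", {}, 0.031, 2),
]
for h in rerank("how does self attention work", hits, top_n=2):
    print(h.rank, h.chunk_id, round(h.score, 2))
```

As in `rerank` above, the returned hits carry the reranker's score rather than the RRF score. Any downstream threshold therefore operates on cross-encoder relevance.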

retrieval/sparse.py ADDED

	@@ -0,0 +1,44 @@

+"""Sparse retrieval via BM25Okapi over the persisted token corpus."""
+from __future__ import annotations
+from functools import lru_cache
+from rank_bm25 import BM25Okapi
+from config import RETRIEVAL_CONFIG
+from ingestion.indexer import load_bm25_corpus, tokenize
+from retrieval.dense import Hit
+@lru_cache(maxsize=1)
+def _bm25_state():
+    corpus = load_bm25_corpus()
+    bm25 = BM25Okapi(corpus["tokenized"])
+    return bm25, corpus
+def sparse_search(query: str, k: int | None = None) -> list[Hit]:
+    k = k or RETRIEVAL_CONFIG["sparse_k"]
+    bm25, corpus = _bm25_state()
+    tokens = tokenize(query)
+    if not tokens:
+        return []
+    scores = bm25.get_scores(tokens)
+    idx_sorted = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
+    max_score = float(scores[idx_sorted[0]]) if idx_sorted else 0.0
+    hits: list[Hit] = []
+    for r, i in enumerate(idx_sorted):
+        s = float(scores[i])
+        if s <= 0:
+            continue
+        norm = s / max_score if max_score > 0 else 0.0
+        hits.append(
+            Hit(
+                chunk_id=corpus["ids"][i],
+                text=corpus["docs"][i],
+                metadata=dict(corpus["metas"][i]),
+                score=norm,
+                rank=r,
+            )
+        )
+    return hits
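
After scoring, `sparse_search` keeps the top-k BM25 scores and divides each one by the best score, so the top hit gets 1.0. It also drops non-positive scores, which belong to documents that share no query terms. The sketch below isolates that post-processing from `BM25Okapi` and runs it on made-up raw scores. The helper name `top_k_normalized` is hypothetical.

```python
def top_k_normalized(scores: list[float], k: int) -> list[tuple[int, int, float]]:
    """Return (doc_index, rank, normalized_score) for the k best positive scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    max_score = float(scores[order[0]]) if order else 0.0
    out = []
    for r, i in enumerate(order):
        s = float(scores[i])
        if s <= 0:
            continue  # no query term matched this document
        out.append((i, r, s / max_score if max_score > 0 else 0.0))
    return out


raw = [0.0, 8.0, 2.0, 0.0, 4.0]  # hypothetical BM25Okapi.get_scores output
print(top_k_normalized(raw, k=4))
```

Max-normalization is relative to the best hit for each query, so a score of 1.0 does not mean the same thing across queries. RRF only consumes the list order, so this scaling affects the displayed score but not the fused ranking.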

storage/bm25.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:01eb71c2d4f2c3f021e925e3c96a3131fd8406592c017daf8a405ecac1578ec6
+size 5864776

storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/data_level0.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:657eee8d3193abf7c307965c3cb694d08ac955dcc9b6ed3139d1d60e746449b8
+size 1676000

storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/header.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2ca0f721e87b23fa44989e0cb59d71339f5a16cc05a8eb3c7777e658757e2e5
+size 100

storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/index_metadata.pickle ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a9b9b398fb0167ff163c08c65e2728ff53af2a5e421a2355c4ee1ecdc746e2c
+size 36020

storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/length.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d51e5510e0fb53d9746f2be0e91a1f7cc74c25bdf21a7ca8096fccf6074521a3
+size 4000

storage/chroma/bccd7ca5-4f87-4c9e-a569-6cf0dcdced21/link_lists.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:358ea02768d90865ee8a426535f44933635bc209dda6b9481c2b9f221f08b18b
+size 8148

storage/chroma/chroma.sqlite3 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9526de2262c6da38b5d8ca3bb71f7902e912713b831020b7fb9dc22ba9d4d7fc
+size 51924992

storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/data_level0.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be8ace3c8f7cfb3c3d7fca6e733651bff7d18d34a51fbeac7096129dd7bb883d
+size 1676000

storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/header.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2ca0f721e87b23fa44989e0cb59d71339f5a16cc05a8eb3c7777e658757e2e5
+size 100

storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/index_metadata.pickle ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:56cdf88e893d7b0e61a770d05eb551f743b72862a8666e733304527383803612
+size 36020

storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/length.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:49b0ed9d141de28649f8bb3bcf395a50c5a960b5f003bf4b4964e5ac62fba885
+size 4000

storage/chroma/d7228068-4c70-4b64-a819-d7dbd7d28b63/link_lists.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fa7615217e24a544a1ecbaaa207daea4e33fe9dca8b5967ca8c5a90b14a90782
+size 8148
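
The `storage/` entries above are Git LFS pointer files, not the index binaries themselves. Each pointer consists of three `key value` lines: the spec version, the SHA-256 of the real object, and its size in bytes. A checkout made without Git LFS installed leaves only these stubs on disk, so a loader could detect them and fail with a clear message. The sketch below is minimal and only checks the three keys seen here. The helper name `parse_lfs_pointer` is hypothetical.

```python
import re
from typing import Optional

_LFS_VERSION = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Optional[dict]:
    """Return {'oid': hex, 'size': bytes} if `text` looks like an LFS pointer, else None."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # every pointer line is "key value"
        fields[key] = value
    if fields.get("version") != _LFS_VERSION:
        return None
    oid = fields.get("oid", "")
    size = fields.get("size", "")
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", oid) or not size.isdigit():
        return None
    return {"oid": oid.removeprefix("sha256:"), "size": int(size)}


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:01eb71c2d4f2c3f021e925e3c96a3131fd8406592c017daf8a405ecac1578ec6
size 5864776
"""
info = parse_lfs_pointer(pointer)
print(info["size"] / 1e6, "MB")
```

If a check like this returns a dict for `storage/bm25.pkl`, the file on disk is a pointer and has not been unpickled. Running `git lfs pull` fetches the real objects.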

storage/manifest.json ADDED

	@@ -0,0 +1,19 @@

+{
+  "n_chunks": 1934,
+  "chunks_per_doc": {
+    "01_attention_is_all_you_need": 34,
+    "02_bert": 60,
+    "03_gpt3": 237,
+    "04_ddpm": 48,
+    "05_ddim": 45,
+    "06_rag_original": 59,
+    "07_rag_survey": 95,
+    "08_self_rag": 88,
+    "09_hyde": 35,
+    "10_vit": 61,
+    "11_clip": 217,
+    "12_react": 95,
+    "13_chain_of_thought": 135,
+    "14_llm_survey": 725
+  }
+}
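
`storage/manifest.json` records the chunk count for each paper after ingestion, and the per-document counts sum to `n_chunks` (1934). The LLM survey alone accounts for 725 of those chunks, which is more than a third of the index. Below is a small consistency check that could run after re-ingesting. It assumes the manifest keeps this shape, and the function name `check_manifest` is hypothetical.

```python
import json


def check_manifest(raw: str) -> int:
    """Verify that per-document chunk counts sum to n_chunks; return the total."""
    manifest = json.loads(raw)
    total = sum(manifest["chunks_per_doc"].values())
    if total != manifest["n_chunks"]:
        raise ValueError(f"manifest mismatch: {total} != {manifest['n_chunks']}")
    return total


# Inline copy of storage/manifest.json; in practice read it from disk.
manifest_text = json.dumps({
    "n_chunks": 1934,
    "chunks_per_doc": {
        "01_attention_is_all_you_need": 34, "02_bert": 60, "03_gpt3": 237,
        "04_ddpm": 48, "05_ddim": 45, "06_rag_original": 59, "07_rag_survey": 95,
        "08_self_rag": 88, "09_hyde": 35, "10_vit": 61, "11_clip": 217,
        "12_react": 95, "13_chain_of_thought": 135, "14_llm_survey": 725,
    },
})
print(check_manifest(manifest_text))
```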