Space: Hitan2004/agentic-corrective-rag (status: Runtime error)


Commit 1670833 (parent: 60f2a58), committed by 3v324v23 on Apr 11

Auto deploy backend

Files changed (19)

.github/workflows/ci.yml +54 -4
README.md +196 -9
hf_backend/.github/workflows/ci.yml +27 -0
hf_backend/.gitignore +0 -0
hf_backend/Dockerfile +18 -0
hf_backend/Procfile +1 -0
hf_backend/README.md +196 -0
hf_backend/agent.py +141 -0
hf_backend/config.py +26 -0
hf_backend/ingestion.py +127 -0
hf_backend/main.py +104 -0
hf_backend/requirements.txt +17 -0
hf_backend/retriever.py +81 -0
hf_backend/runtime.txt +1 -0
hf_backend/tests/__init__.py +0 -0
hf_backend/tests/test_integration.py +51 -0
hf_backend/tests/test_unit.py +119 -0
pytest.ini +4 -0
tests/test_api.py +12 -0

.github/workflows/ci.yml CHANGED

@@ -1,4 +1,4 @@
-name: RAG Unit Tests
+name: RAG CI/CD
 on:
   push:
@@ -21,7 +21,57 @@ jobs:
       - name: Install dependencies
         run: pip install -r requirements.txt
-      - name: Run unit tests only   # ← integration tests are skipped here
+      - name: Run unit tests only
         env:
-          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}  # add this in GitHub → Settings → Secrets
-        run: pytest tests/test_unit.py -v
+          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
+        run: pytest -v -m "not integration"
+      # 🚀 DEPLOY BACKEND
+      - name: Deploy Backend to HF
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: |
+          set -e
+          pip install huggingface_hub
+          sudo apt-get update
+          sudo apt-get install -y rsync
+          git config --global user.email "you@example.com"
+          git config --global user.name "github-actions"
+          # clone repo
+          git clone https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag hf_backend
+          cd hf_backend
+          # 🔥 FIXED AUTH (IMPORTANT)
+          git remote set-url origin https://user:${HF_TOKEN}@huggingface.co/spaces/Hitan2004/agentic-corrective-rag
+          # copy backend files (exclude UI + .git)
+          rsync -av --exclude='.git' --exclude='ui' ../ ./
+          git add .
+          git commit -m "Auto deploy backend" || echo "No changes to commit"
+          git push
+      # 🎨 DEPLOY UI
+      - name: Deploy UI to HF
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: |
+          set -e
+          git clone https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag-ui hf_ui
+          cd hf_ui
+          # 🔥 FIXED AUTH (IMPORTANT)
+          git remote set-url origin https://user:${HF_TOKEN}@huggingface.co/spaces/Hitan2004/agentic-corrective-rag-ui
+          # copy UI files only
+          rsync -av ../ui/ ./
+          git add .
+          git commit -m "Auto deploy UI" || echo "No changes to commit"
+          git push

README.md CHANGED

@@ -1,9 +1,196 @@
----
-title: Agentic Corrective RAG
-emoji: 🔍
-colorFrom: blue
-colorTo: green
-sdk: docker
-app_port: 7860
-pinned: false
----

+# Agentic Corrective RAG — Document Q&A
+[![RAG Unit Tests](https://github.com/Hitan547/agentic-corrective-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/Hitan547/agentic-corrective-rag/actions)
+![Python](https://img.shields.io/badge/python-3.11-blue)
+![LLM](https://img.shields.io/badge/LLM-LLaMA%203.3%2070B-orange)
+![Framework](https://img.shields.io/badge/framework-LangGraph-green)
+> A production-aware document Q&A system that answers questions **only from your uploaded documents** — not from the model's imagination. Built with hybrid retrieval, cross-encoder reranking, and a self-correcting LangGraph agent that automatically retries if the answer isn't grounded in the source material.
+## 🔗 Live Demo
+| Service | URL |
+|---------|-----|
+| 🖥️ Frontend UI | [hitan2004-agentic-corrective-rag-ui.hf.space](https://hitan2004-agentic-corrective-rag-ui.hf.space) |
+| ⚙️ Backend API | [hitan2004-agentic-corrective-rag.hf.space](https://hitan2004-agentic-corrective-rag.hf.space) |
+| 📖 API Docs | [hitan2004-agentic-corrective-rag.hf.space/docs](https://hitan2004-agentic-corrective-rag.hf.space/docs) |
+## What It Does
+Upload any PDF or TXT file, ask a question, and get an answer backed by:
+- The exact source chunks it used
+- A validation verdict (PASS/FAIL)
+- How many self-correction retries were needed
+## Architecture
+```
+PDF/TXT Upload
+      │
+      ▼
+┌─────────────────────────────────┐
+│         Ingestion Pipeline      │
+│  PyMuPDF → Chunking → Embeddings│
+│  FAISS Index + BM25 Index       │
+└─────────────────────────────────┘
+      │
+      ▼
+┌─────────────────────────────────┐
+│       Hybrid Retrieval          │
+│  FAISS (dense) + BM25 (sparse)  │
+│  → RRF Fusion                   │
+│  → Cross-Encoder Reranking      │
+└─────────────────────────────────┘
+      │
+      ▼
+┌─────────────────────────────────┐
+│     Corrective RAG Agent        │
+│  LangGraph StateGraph           │
+│  Generate → Validate → Retry    │
+│  (up to 3 automatic retries)    │
+└─────────────────────────────────┘
+      │
+      ▼
+  Static HTML UI + FastAPI Backend
+```
+## Tech Stack
+| Layer | Technology |
+|-------|-----------|
+| LLM | LLaMA 3.3 70B via Groq API |
+| Agent Framework | LangGraph (StateGraph) |
+| Dense Retrieval | FAISS + all-MiniLM-L6-v2 |
+| Sparse Retrieval | BM25 (rank-bm25) |
+| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 |
+| Fusion | Reciprocal Rank Fusion (RRF) |
+| PDF Parsing | PyMuPDF (fitz) |
+| Backend | FastAPI |
+| Frontend | Static HTML/CSS/JS |
+| Testing | pytest (unit + integration) |
+| CI/CD | GitHub Actions |
+| Deployment | Hugging Face Spaces (Docker) |
+## Key Features
+- **Hybrid Search** — combines FAISS semantic search and BM25 keyword search, fused with Reciprocal Rank Fusion (RRF)
+- **Cross-Encoder Reranking** — re-scores top candidates by reading query + chunk together for higher precision
+- **Self-Correcting Agent** — LangGraph pipeline automatically detects hallucinations and retries up to 3 times
+- **Hallucination Validation** — a second LLM call checks every answer against the source context before returning it
+- **Session Memory** — remembers last 5 turns of conversation per session
+- **Synchronous Indexing** — reliable document ingestion that completes before returning a response
+- **CI/CD** — unit tests run automatically on every push via GitHub Actions
+## Project Structure
+```
+agentic-corrective-rag/
+├── agent.py          # LangGraph corrective RAG agent
+├── retriever.py      # Hybrid retrieval + RRF + reranking
+├── ingestion.py      # PDF/TXT ingestion + FAISS/BM25 indexing
+├── main.py           # FastAPI backend
+├── config.py         # Configuration and constants
+├── requirements.txt
+├── Dockerfile        # HF Spaces deployment
+├── ui/
+│   └── index.html    # Static HTML/JS frontend
+├── tests/
+│   ├── test_unit.py        # Unit tests (CI)
+│   └── test_integration.py # Integration tests (local only)
+└── .github/
+    └── workflows/
+        └── ci.yml    # GitHub Actions CI pipeline
+```
+## Setup
+### 1. Clone the repo
+```bash
+git clone https://github.com/Hitan547/agentic-corrective-rag.git
+cd agentic-corrective-rag
+```
+### 2. Install dependencies
+```bash
+pip install -r requirements.txt
+```
+### 3. Set up environment
+```bash
+echo "GROQ_API_KEY=your_key_here" > .env
+```
+Get your free API key at [console.groq.com](https://console.groq.com)
+### 4. Run the backend
+```bash
+uvicorn main:app --reload --port 8000
+```
+### 5. Open the frontend
+Open `ui/index.html` in your browser, or serve it locally:
+```bash
+python -m http.server 3000
+# Visit http://localhost:3000/ui/index.html
+```
+## Running Tests
+```bash
+# Unit tests (fast, no API needed)
+python -m pytest tests/test_unit.py -v
+# Integration tests (requires GROQ_API_KEY)
+python -m pytest tests/test_integration.py -v -m integration
+```
+## How the Agent Works
+1. **Generate** — LLaMA 3.3 70B answers using only the retrieved chunks
+2. **Validate** — a second LLM call checks if every claim is supported by the context
+3. **Retry** — if validation fails, the agent retries with the failure reason as feedback
+4. **Stop** — returns the answer after PASS or after 3 retries
+## API Endpoints
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| `GET` | `/` | Health check |
+| `GET` | `/health` | Returns API status + index state |
+| `POST` | `/upload` | Upload and index a PDF or TXT file |
+| `POST` | `/query` | Ask a question, get a grounded answer |
+| `DELETE` | `/session/{id}` | Clear conversation history |
+| `GET` | `/docs` | Interactive Swagger UI |
+## Environment Variables
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `GROQ_API_KEY` | ✅ Yes | Your Groq API key from console.groq.com |
+## Known Limitations
+- **No index persistence** — indexes are stored in-memory and reset on redeploy. Re-upload your document after each redeploy on free hosting.
+- **Free tier cold starts** — HF Spaces free tier may take 30–60 seconds to wake up after inactivity.
+- **Single document at a time** — uploading a new document replaces the previous index.
+## Deployment
+This project is deployed as two separate services on Hugging Face Spaces:
+- **Backend** (`agentic-corrective-rag`) — FastAPI app running in a Docker container
+- **Frontend** (`agentic-corrective-rag-ui`) — Static HTML/JS served via HF Static Space
+## Author
+**Hitan K** — Final-year CS undergraduate (AI specialization)
+[![LinkedIn](https://img.shields.io/badge/LinkedIn-hitan--k-blue)](https://linkedin.com/in/hitan-k)
+[![GitHub](https://img.shields.io/badge/GitHub-Hitan547-black)](https://github.com/Hitan547)
+[![HuggingFace](https://img.shields.io/badge/HuggingFace-Hitan2004-yellow)](https://huggingface.co/Hitan2004)
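The "Hybrid Search" feature above fuses the FAISS and BM25 rankings with Reciprocal Rank Fusion. `retriever.py` (further down in this commit) implements this as `_reciprocal_rank_fusion`. For every ranked list a chunk appears in, it adds `1 / (k + rank + 1)`, with `k = 60` and a 0-based `rank`.

The standalone sketch below applies the same formula to two made-up rankings of chunk IDs. It shows that chunks returned by both retrievers rise above chunks returned by only one:

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse several best-first rankings of chunk IDs; return IDs sorted by fused score."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):  # rank is 0-based, so the top hit adds 1/(k+1)
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


dense = [2, 0, 1]   # made-up FAISS order, best first
sparse = [0, 3, 2]  # made-up BM25 order, best first
print(reciprocal_rank_fusion([dense, sparse]))  # [0, 2, 3, 1]
```

In `hybrid_retrieve`, each input list holds up to `top_k * 3` IDs. The fused list is then cut to `top_k * 2` candidates before cross-encoder reranking.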

hf_backend/.github/workflows/ci.yml ADDED

	@@ -0,0 +1,27 @@

+name: RAG Unit Tests
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+      - name: Run unit tests only   # ← integration tests are skipped here
+        env:
+          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}  # add this in GitHub → Settings → Secrets
+        run: pytest tests/test_unit.py -v

hf_backend/.gitignore ADDED

Binary file (116 Bytes).

hf_backend/Dockerfile ADDED

	@@ -0,0 +1,18 @@

+FROM python:3.11-slim
+WORKDIR /app
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+RUN mkdir -p docs indexes
+EXPOSE 7860
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]

hf_backend/Procfile ADDED

	@@ -0,0 +1 @@


+web: uvicorn main:app --host 0.0.0.0 --port $PORT

hf_backend/README.md ADDED

	@@ -0,0 +1,196 @@

(The 196 added lines are identical to the new `README.md` content shown above and are omitted here.)

hf_backend/agent.py ADDED

	@@ -0,0 +1,141 @@

+#agent.py
+from typing import TypedDict
+from langgraph.graph import StateGraph, END
+from langchain_groq import ChatGroq
+from langchain_core.messages import HumanMessage, AIMessage
+from config import GROQ_API_KEY, GROQ_MODEL, MAX_RETRIES
+llm = ChatGroq(
+    model=GROQ_MODEL,
+    temperature=0,
+    api_key=GROQ_API_KEY,
+)
+class RAGState(TypedDict):
+    question:          str
+    context_chunks:    list
+    answer:            str
+    validation_result: str
+    fail_reason:       str
+    retry_count:       int
+    chat_history:      list
+def generate_node(state: RAGState) -> dict:
+    context_text = "\n\n---\n\n".join(
+        f"[Source: {r['source']}]\n{r['chunk']}"
+        for r in state["context_chunks"]
+    )
+    history_lines = []
+    for msg in state.get("chat_history", [])[-6:]:
+        role = "User" if isinstance(msg, HumanMessage) else "Assistant"
+        history_lines.append(f"{role}: {msg.content}")
+    history_text = "\n".join(history_lines) or "None"
+    correction = ""
+    if state.get("retry_count", 0) > 0:
+        correction = (
+            f"\n\nIMPORTANT CORRECTION REQUIRED: Your previous answer was "
+            f"rejected because: {state.get('fail_reason', 'unverifiable claims')}. "
+            f"Re-answer using ONLY the context provided."
+        )
+    prompt = (
+        "You are an AI assistant that answers questions AND generates content based on provided documents.\n"
+        "Answer ONLY using information from the CONTEXT below.\n"
+        "If the answer cannot be found, say exactly: "
+        '"I don\'t have enough information in the provided documents."\n'
+        "Do NOT invent facts or use outside knowledge."
+        + correction
+        + f"\n\nPREVIOUS CONVERSATION:\n{history_text}"
+        + f"\n\nCONTEXT:\n{context_text}"
+        + f"\n\nQUESTION: {state['question']}\n\nAnswer:"
+    )
+    response = llm.invoke([HumanMessage(content=prompt)])
+    return {"answer": response.content}
+def validate_node(state: RAGState) -> dict:
+    context_text = "\n\n".join(r["chunk"] for r in state["context_chunks"])
+    prompt = (
+        "You are a strict hallucination checker for a RAG system.\n\n"
+        "Given the CONTEXT and the ANSWER below, check:\n"
+        "1. Is every factual claim directly supported by the context?\n"
+        "2. Does the answer address the question?\n"
+        "3. Are there any invented facts not in the context?\n\n"
+        f"Context:\n{context_text}\n\n"
+        f"Question: {state['question']}\n"
+        f"Answer: {state['answer']}\n\n"
+        "Respond in EXACTLY this format:\n"
+        "VERDICT: PASS\n"
+        "REASON: <one sentence>\n\n"
+        "or\n\n"
+        "VERDICT: FAIL\n"
+        "REASON: <one sentence explaining what is wrong>"
+    )
+    result = llm.invoke([HumanMessage(content=prompt)])
+    text   = result.content.strip()
+    verdict = "PASS" if "VERDICT: PASS" in text.upper() else "FAIL"
+    reason  = ""
+    for line in text.splitlines():
+        if line.upper().startswith("REASON:"):
+            reason = line.split(":", 1)[1].strip()
+            break
+    return {"validation_result": verdict, "fail_reason": reason}
+def increment_retry_node(state: RAGState) -> dict:
+    return {"retry_count": state.get("retry_count", 0) + 1}
+def route_after_validation(state: RAGState) -> str:
+    if (
+        state["validation_result"] == "FAIL"
+        and state.get("retry_count", 0) < MAX_RETRIES
+    ):
+        return "retry"
+    return "done"
+def _build_graph():
+    g = StateGraph(RAGState)
+    g.add_node("generate",        generate_node)
+    g.add_node("validate",        validate_node)
+    g.add_node("increment_retry", increment_retry_node)
+    g.set_entry_point("generate")
+    g.add_edge("generate", "validate")
+    g.add_conditional_edges(
+        "validate",
+        route_after_validation,
+        {"retry": "increment_retry", "done": END},
+    )
+    g.add_edge("increment_retry", "generate")
+    return g.compile()
+_rag_graph = _build_graph()
+def run_rag_agent(
+    question:       str,
+    context_chunks: list,
+    chat_history:   list = [],
+) -> tuple:
+    init_state: RAGState = {
+        "question":          question,
+        "context_chunks":    context_chunks,
+        "answer":            "",
+        "validation_result": "",
+        "fail_reason":       "",
+        "retry_count":       0,
+        "chat_history":      chat_history,
+    }
+    final = _rag_graph.invoke(init_state)
+    return final["answer"], final["retry_count"], final["validation_result"]
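`route_after_validation` retries only while `retry_count < MAX_RETRIES`. So "up to 3 retries" means up to four `generate` passes, each followed by a `validate` call. A question that never passes therefore costs eight LLM calls.

The plain-Python sketch below mirrors the graph's control flow, with stub functions in place of `ChatGroq`. `run_loop`, `fake_generate` and `always_fail` are hypothetical names used only for illustration:

```python
from typing import Callable, Tuple

MAX_RETRIES = 3  # same value as config.py


def run_loop(
    generate: Callable[[int], str],
    validate: Callable[[str], str],
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, int, str]:
    """Mirror generate -> validate -> (increment_retry -> generate)* -> END."""
    retry_count = 0
    while True:
        answer = generate(retry_count)        # generate_node
        verdict = validate(answer)            # validate_node
        if verdict == "FAIL" and retry_count < max_retries:
            retry_count += 1                  # increment_retry_node
            continue
        return answer, retry_count, verdict   # same tuple order as run_rag_agent


calls = []


def fake_generate(retry: int) -> str:
    calls.append(retry)
    return f"answer #{retry}"


def always_fail(answer: str) -> str:
    return "FAIL"


print(run_loop(fake_generate, always_fail))  # ('answer #3', 3, 'FAIL')
print(len(calls))                            # 4
```

When every attempt fails, the graph still returns the last rejected answer. `/query` then passes that answer to the client with `validation` set to `FAIL`.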

hf_backend/config.py ADDED

	@@ -0,0 +1,26 @@

+# config.py
+import os
+import warnings
+from dotenv import load_dotenv
+load_dotenv()
+GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
+if not GROQ_API_KEY:
+    warnings.warn("GROQ_API_KEY not set — LLM calls will fail")
+# ── Anchor all paths to the directory this file lives in ──
+_BASE = os.path.dirname(os.path.abspath(__file__))
+GROQ_MODEL        = "llama-3.3-70b-versatile"
+DOCS_DIR          = os.path.join(_BASE, "docs")
+FAISS_INDEX_PATH  = os.path.join(_BASE, "faiss.index")
+BM25_PATH         = os.path.join(_BASE, "bm25.pkl")
+CHUNKS_PATH       = os.path.join(_BASE, "chunks.pkl")
+SOURCES_PATH      = os.path.join(_BASE, "sources.pkl")
+EMBEDDER_NAME = "all-MiniLM-L6-v2"
+RERANKER_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
+CHUNK_SIZE        = 500
+CHUNK_OVERLAP     = 50
+TOP_K             = 5
+MAX_RETRIES       = 3
+MAX_HISTORY_TURNS = 5
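`config.py` sets `MAX_HISTORY_TURNS = 5`, and `main.py` trims each session to the last `MAX_HISTORY_TURNS * 2 = 10` messages. `generate_node` in `agent.py`, however, formats only the last 6 messages into the prompt. The model therefore sees at most three earlier turns, even though five are stored.

A small sketch of the two windows, with plain strings standing in for `HumanMessage`/`AIMessage` objects:

```python
MAX_HISTORY_TURNS = 5  # from config.py
PROMPT_WINDOW = 6      # the [-6:] slice in agent.generate_node

# Seven question/answer turns, appended the way main.py does it (user, then assistant).
history: list[str] = []
for turn in range(1, 8):
    history += [f"Q{turn}", f"A{turn}"]

stored = history[-(MAX_HISTORY_TURNS * 2):]  # what sessions[...] keeps
in_prompt = stored[-PROMPT_WINDOW:]          # what the LLM actually sees

print(stored[0], len(stored))        # Q3 10
print(in_prompt[0], len(in_prompt))  # Q5 6
```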

hf_backend/ingestion.py ADDED

	@@ -0,0 +1,127 @@

+# ingestion.py
+import os, pickle
+from pathlib import Path
+import numpy as np
+import faiss
+from sentence_transformers import SentenceTransformer
+from rank_bm25 import BM25Okapi
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from config import (
+    DOCS_DIR, FAISS_INDEX_PATH, BM25_PATH,
+    CHUNKS_PATH, SOURCES_PATH,
+    EMBEDDER_NAME, CHUNK_SIZE, CHUNK_OVERLAP
+)
+def read_pdf_text(fpath):
+    import fitz  # PyMuPDF
+    doc = fitz.open(fpath)
+    text = []
+    for page in doc:
+        text.append(page.get_text())
+    return "\n".join(text).strip()
+def clean_text(text):
+    return " ".join(text.split())
+def load_documents():
+    docs, filenames = [], []
+    path = Path(DOCS_DIR)
+    path.mkdir(exist_ok=True)
+    for fpath in path.glob("*.txt"):
+        try:
+            text = clean_text(fpath.read_text(encoding="utf-8"))
+            docs.append(text)
+            filenames.append(fpath.name)
+            print(f"  Loaded text: {fpath.name}")
+        except Exception as e:
+            print(f"  Skipped {fpath.name}: {e}")
+    for fpath in path.glob("*.pdf"):
+        try:
+            text = clean_text(read_pdf_text(fpath))
+            if text:
+                docs.append(text)
+                filenames.append(fpath.name)
+                print(f"  Loaded PDF:  {fpath.name}")
+            else:
+                print(f"  WARNING: {fpath.name} extracted empty text")
+        except Exception as e:
+            print(f"  Skipped {fpath.name}: {e}")
+    if not docs:
+        raise FileNotFoundError(
+            f"No .txt or .pdf files found in '{DOCS_DIR}'. "
+            "Add at least one document and re-run."
+        )
+    print(f"\nLoaded {len(docs)} document(s)")
+    return docs, filenames
+def semantic_chunk(docs, filenames):
+    splitter = RecursiveCharacterTextSplitter(
+        chunk_size=CHUNK_SIZE,
+        chunk_overlap=CHUNK_OVERLAP,
+        separators=["\n\n", "\n", ". ", " "],
+    )
+    all_chunks, all_sources = [], []
+    for doc, fname in zip(docs, filenames):
+        chunks = splitter.split_text(doc)
+        all_chunks.extend(chunks)
+        all_sources.extend([fname] * len(chunks))
+    print(f"Created {len(all_chunks)} chunks "
+          f"(avg {sum(len(c) for c in all_chunks)//len(all_chunks)} chars each)")
+    print("\n--- SAMPLE CHUNK ---")
+    print(all_chunks[0][:500])
+    print("--------------------\n")
+    return all_chunks, all_sources
+def build_indexes(chunks, model=None):
+    print("\nBuilding dense embeddings...")
+    if model is None:
+        model = SentenceTransformer(EMBEDDER_NAME)
+    embeddings = model.encode(chunks, show_progress_bar=True, batch_size=32)
+    embeddings = np.array(embeddings, dtype="float32")
+    faiss.normalize_L2(embeddings)
+    dim = embeddings.shape[1]
+    faiss_index = faiss.IndexFlatIP(dim)
+    faiss_index.add(embeddings)
+    print(f"FAISS index: {faiss_index.ntotal} vectors, dim={dim}")
+    tokenized = [c.lower().split() for c in chunks]
+    bm25_index = BM25Okapi(tokenized)
+    print("BM25 index: built")
+    return faiss_index, bm25_index
+def save_indexes(faiss_index, bm25_index, chunks, sources):
+    faiss.write_index(faiss_index, FAISS_INDEX_PATH)
+    with open(BM25_PATH, "wb") as f:
+        pickle.dump(bm25_index, f)
+    with open(CHUNKS_PATH, "wb") as f:
+        pickle.dump(chunks, f)
+    with open(SOURCES_PATH, "wb") as f:
+        pickle.dump(sources, f)
+    print("\nSaved indexes to disk.")
+def run_ingestion(model=None):
+    print("=== Starting ingestion ===\n")
+    docs, filenames = load_documents()
+    chunks, sources = semantic_chunk(docs, filenames)
+    fi, bm25 = build_indexes(chunks, model=model)
+    save_indexes(fi, bm25, chunks, sources)
+    print("\n=== Ingestion complete ===")
+if __name__ == "__main__":
+    run_ingestion()
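`clean_text` runs `" ".join(text.split())` on every document before chunking. This collapses all whitespace, including newlines, into single spaces. As a result, the `"\n\n"` and `"\n"` separators passed to `RecursiveCharacterTextSplitter` in `semantic_chunk` can never match, and splitting falls through to `". "` and `" "`. Despite the function's name, the result is recursive character splitting bounded by `CHUNK_SIZE = 500` characters, not embedding-based semantic chunking.

Below is a minimal sketch of a cleaner that keeps blank-line paragraph breaks, so the first separator stays usable. `clean_text_keep_paragraphs` is a hypothetical helper, not part of the repository. How many breaks survive in practice depends on the text that PyMuPDF extracts.

```python
import re


def clean_text(text: str) -> str:
    # Same as ingestion.clean_text: every whitespace run becomes one space.
    return " ".join(text.split())


def clean_text_keep_paragraphs(text: str) -> str:
    """Collapse whitespace inside paragraphs but keep blank-line paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text)       # blank line(s) separate paragraphs
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in cleaned if p)    # drop whitespace-only paragraphs


raw = "Intro line one\nline two.\n\n\n   Second   paragraph.\n"
print(repr(clean_text(raw)))                  # 'Intro line one line two. Second paragraph.'
print(repr(clean_text_keep_paragraphs(raw)))  # 'Intro line one line two.\n\nSecond paragraph.'
```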

hf_backend/main.py ADDED

	@@ -0,0 +1,104 @@

+import os
+import shutil
+from contextlib import asynccontextmanager
+from fastapi import FastAPI, UploadFile, File, HTTPException
+from pydantic import BaseModel
+from langchain_core.messages import HumanMessage, AIMessage
+from retriever import load_indexes, reload_indexes, hybrid_retrieve, indexes_loaded as _indexes_loaded
+from agent import run_rag_agent
+from ingestion import run_ingestion
+from config import DOCS_DIR, TOP_K, MAX_HISTORY_TURNS
+sessions: dict = {}
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    try:
+        load_indexes()
+    except FileNotFoundError:
+        print("WARNING: No indexes found. Upload documents first.")
+    yield
+app = FastAPI(title="Corrective RAG API", version="1.0", lifespan=lifespan)
+@app.get("/")
+def home():
+    return {"message": "RAG API running 🚀"}
+class QueryRequest(BaseModel):
+    question:   str
+    session_id: str = "default"
+    top_k:      int = TOP_K
+class QueryResponse(BaseModel):
+    answer:       str
+    sources:      list
+    retries_used: int
+    validation:   str
+    session_id:   str
+@app.post("/query", response_model=QueryResponse)
+async def query(req: QueryRequest):
+    if not _indexes_loaded():
+        try:
+            load_indexes()
+        except Exception:
+            pass
+    if not _indexes_loaded():
+        raise HTTPException(
+            status_code=503,
+            detail="Indexes not ready. Upload and index documents first."
+        )
+    results = hybrid_retrieve(req.question, top_k=req.top_k)
+    if not results:
+        raise HTTPException(status_code=404, detail="No relevant chunks found.")
+    history = sessions.get(req.session_id, [])
+    answer, retries, verdict = run_rag_agent(req.question, results, history)
+    history.append(HumanMessage(content=req.question))
+    history.append(AIMessage(content=answer))
+    sessions[req.session_id] = history[-(MAX_HISTORY_TURNS * 2):]
+    return QueryResponse(
+        answer=answer,
+        sources=[{"chunk": r["chunk"][:300], "source": r["source"]} for r in results],
+        retries_used=retries,
+        validation=verdict,
+        session_id=req.session_id,
+    )
+@app.post("/upload")
+async def upload(file: UploadFile = File(...)):
+    allowed = {".txt", ".pdf"}
+    ext = os.path.splitext(file.filename or "")[1].lower()
+    if ext not in allowed:
+        raise HTTPException(status_code=400, detail="Only .txt and .pdf files allowed.")
+    os.makedirs(DOCS_DIR, exist_ok=True)
+    dest = os.path.join(DOCS_DIR, file.filename)
+    with open(dest, "wb") as f:
+        shutil.copyfileobj(file.file, f)
+    _reindex()
+    return {"status": "uploaded", "filename": file.filename,
+            "message": "Indexing complete."}
+def _reindex():
+    try:
+        run_ingestion()
+        print("Ingestion done, reloading indexes...")
+        reload_indexes()
+        print(f"Re-indexing complete. Indexes loaded: {_indexes_loaded()}")
+    except Exception as e:
+        import traceback
+        print(f"Re-indexing failed: {e}")
+        traceback.print_exc()
+@app.delete("/session/{session_id}")
+def clear_session(session_id: str):
+    sessions.pop(session_id, None)
+    return {"status": "cleared", "session_id": session_id}
+@app.get("/health")
+def health():
+    return {"status": "ok", "indexes_loaded": _indexes_loaded()}
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("PORT", 7860)))
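Two details of `/upload` deserve a second look:

- `dest` is built with `os.path.join(DOCS_DIR, file.filename)`. If the multipart parser passes the client's name through unchanged, a name containing `../` could write outside `docs/`.
- `_reindex()` catches every exception, so the endpoint returns `"Indexing complete."` even when ingestion failed.

The hedged sketch below implements both checks in plain Python. `safe_upload_path` and `reindex_ok` are hypothetical helpers. Wiring them into the FastAPI handler is an assumption left open, for example raising `HTTPException(status_code=500, ...)` when `reindex_ok` returns `False`.

```python
import os
from typing import Callable

ALLOWED = {".txt", ".pdf"}


def safe_upload_path(docs_dir: str, filename: str) -> str:
    """Return a destination inside docs_dir, or raise ValueError for unusable names."""
    # Keep only the final path component, treating both / and \ as separators.
    name = os.path.basename(filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("empty or invalid filename")
    if os.path.splitext(name)[1].lower() not in ALLOWED:
        raise ValueError("only .txt and .pdf files allowed")
    root = os.path.abspath(docs_dir)
    dest = os.path.abspath(os.path.join(root, name))
    if os.path.commonpath([root, dest]) != root:  # defensive: must stay under docs_dir
        raise ValueError("path escapes docs_dir")
    return dest


def reindex_ok(run_ingestion: Callable[[], None], reload_indexes: Callable[[], None]) -> bool:
    """Like main._reindex, but report failure to the caller instead of only printing it."""
    try:
        run_ingestion()
        reload_indexes()
        return True
    except Exception as exc:  # broad on purpose, mirroring main._reindex
        print(f"Re-indexing failed: {exc}")
        return False
```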

hf_backend/requirements.txt ADDED

	@@ -0,0 +1,17 @@

+langchain==0.3.25
+langchain-groq==0.3.2
+langgraph==0.3.29
+sentence-transformers==3.4.1
+faiss-cpu==1.13.2
+rank-bm25==0.2.2
+fastapi==0.115.12
+uvicorn==0.34.0
+pymupdf==1.25.3
+python-dotenv==1.1.0
+numpy==1.26.4
+requests==2.32.3
+pydantic>=2.7
+pydantic-core>=2.20.0
+python-multipart==0.0.20
+pytest==8.3.5

hf_backend/retriever.py ADDED

	@@ -0,0 +1,81 @@

+import os
+import pickle
+import numpy as np
+import faiss
+from sentence_transformers import SentenceTransformer, CrossEncoder
+from config import (
+    FAISS_INDEX_PATH, BM25_PATH, CHUNKS_PATH,
+    SOURCES_PATH, EMBEDDER_NAME, RERANKER_MODEL
+)
+_faiss_index = None
+_bm25_index  = None
+_chunks      = None
+_sources     = None
+_model       = None
+_reranker    = None
+def indexes_loaded() -> bool:
+    return _faiss_index is not None
+def load_indexes():
+    global _faiss_index, _bm25_index, _chunks, _sources, _model, _reranker
+    if not os.path.exists(FAISS_INDEX_PATH):
+        print("WARNING: No FAISS index found at startup. Upload documents to initialize.")
+        return
+    _faiss_index = faiss.read_index(FAISS_INDEX_PATH)
+    with open(BM25_PATH,    "rb") as f: _bm25_index = pickle.load(f)
+    with open(CHUNKS_PATH,  "rb") as f: _chunks     = pickle.load(f)
+    with open(SOURCES_PATH, "rb") as f: _sources    = pickle.load(f)
+    _model    = SentenceTransformer(EMBEDDER_NAME)
+    _reranker = CrossEncoder(RERANKER_MODEL)
+    print(f"Indexes loaded: {_faiss_index.ntotal} vectors, {len(_chunks)} chunks")
+def reload_indexes():
+    global _faiss_index, _bm25_index, _chunks, _sources, _model, _reranker
+    _faiss_index = _bm25_index = _chunks = _sources = _model = _reranker = None
+    load_indexes()
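`reload_indexes()` clears every global before calling `load_indexes()`. A query that arrives mid-reload therefore either hits the `RuntimeError` in `hybrid_retrieve`, or, once `_faiss_index` is set but `_model` is not, an `AttributeError` on `None`. If no FAISS file exists after ingestion, the service is left with no index at all. Below is a minimal sketch of a load-then-swap alternative, with the disk-reading step passed in as a `loader` callable. `IndexState` and `IndexHolder` are hypothetical names, not part of this commit.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class IndexState:
    """Everything hybrid retrieval needs, bundled so it can be replaced in one step."""
    faiss_index: Any
    bm25_index: Any
    chunks: list
    sources: list


class IndexHolder:
    """Hypothetical holder: builds a new IndexState off to the side, then swaps it in."""

    def __init__(self, loader: Callable[[], Optional[IndexState]]):
        self._loader = loader          # e.g. reads the FAISS/BM25/pickle files from disk
        self._lock = threading.Lock()  # serialises concurrent reloads
        self._state: Optional[IndexState] = None

    def current(self) -> Optional[IndexState]:
        # Readers take a single reference, so they never see a half-loaded state.
        return self._state

    def reload(self) -> bool:
        with self._lock:
            new_state = self._loader()  # the old state keeps serving meanwhile
            if new_state is None:       # nothing usable on disk: keep what we had
                return False
            self._state = new_state     # one reference assignment
            return True
```

The embedder and reranker could stay loaded as module-level singletons, because `EMBEDDER_NAME` and `RERANKER_MODEL` come from config and do not change when documents are uploaded.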
+def _reciprocal_rank_fusion(lists: list, k: int = 60) -> dict:
+    scores: dict = {}
+    for ranked_list in lists:
+        for rank, doc_id in enumerate(ranked_list):
+            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
+    return scores
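`_reciprocal_rank_fusion` adds `1 / (k + rank)` per list, where `rank` is 1-based (hence `rank + 1` over `enumerate`). A document's fused score is the sum of those terms across the dense and sparse rankings. Below is a worked example on the same lists used in `test_rrf_prefers_doc_appearing_in_both_lists`. It is rewritten with `fractions.Fraction` so the arithmetic is exact. `rrf` is a standalone copy of the rule, not an import from `retriever`.

```python
from fractions import Fraction


def rrf(lists, k=60):
    """Same rule as _reciprocal_rank_fusion, using exact fractions for readability."""
    scores = {}
    for ranked_list in lists:
        for rank, doc_id in enumerate(ranked_list):  # rank is 0-based here
            scores[doc_id] = scores.get(doc_id, Fraction(0)) + Fraction(1, k + rank + 1)
    return scores


dense = [0, 1, 2]   # doc ids ordered by embedding similarity
sparse = [2, 0, 1]  # doc ids ordered by BM25 score
scores = rrf([dense, sparse])
# doc 0: 1/61 + 1/62, doc 1: 1/62 + 1/63, doc 2: 1/63 + 1/61
print(sorted(scores, key=scores.get, reverse=True))  # [0, 2, 1]
```

Because `k = 60` is large relative to the ranks, the terms shrink slowly. A document near the top of both lists therefore outranks one that is first in only one list.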
+def hybrid_retrieve(query: str, top_k: int = 5) -> list:
+    if not indexes_loaded():
+        raise RuntimeError("Indexes not loaded. Call load_indexes() first.")
+    q_emb = _model.encode([query], convert_to_numpy=True).astype("float32")
+    faiss.normalize_L2(q_emb)
+    _, dense_ids  = _faiss_index.search(q_emb, top_k * 3)
+    dense_ranking = [int(i) for i in dense_ids[0] if i >= 0]
+    bm25_scores    = _bm25_index.get_scores(query.lower().split())
+    sparse_ranking = np.argsort(bm25_scores)[::-1][: top_k * 3].tolist()
+    rrf_scores = _reciprocal_rank_fusion([dense_ranking, sparse_ranking])
+    fused_ids  = sorted(rrf_scores, key=rrf_scores.get, reverse=True)[: top_k * 2]
+    candidates = [(query, _chunks[i]) for i in fused_ids]
+    ce_scores  = _reranker.predict(candidates)
+    ranked = sorted(
+        zip(fused_ids, ce_scores),
+        key=lambda x: x[1],
+        reverse=True,
+    )[:top_k]
+    return [
+        {
+            "chunk":     _chunks[i],
+            "source":    _sources[i],
+            "chunk_id":  i,
+            "rrf_score": round(float(rrf_scores[i]), 4),
+            "ce_score":  round(float(score), 4),
+        }
+        for i, score in ranked
+    ]
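`hybrid_retrieve` first widens and then narrows the candidate pool. Each retriever proposes `top_k * 3` ids, RRF keeps the best `top_k * 2`, and the cross-encoder chooses the final `top_k`. Below is a dependency-free toy of the same flow, useful for reading the control flow without FAISS or model downloads. It makes three assumptions. Dense scores are plain cosine similarities over given vectors. The sparse scorer is term overlap, a deliberate stand-in for BM25. `rerank` scores one chunk for the fixed query, standing in for `CrossEncoder.predict`. All names are hypothetical.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf(lists, k=60):
    scores = {}
    for ranked in lists:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return scores


def toy_hybrid_retrieve(query_vec, query_tokens, doc_vecs, doc_tokens, chunks,
                        rerank: Callable[[str], float], top_k: int = 2) -> list:
    """Toy dense -> sparse -> RRF -> rerank pipeline (term overlap replaces BM25)."""
    n = len(chunks)
    pool = top_k * 3  # each retriever contributes up to 3 * top_k candidates
    dense = sorted(range(n), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)[:pool]
    overlap = [len(set(query_tokens) & set(doc_tokens[i])) for i in range(n)]
    sparse = sorted(range(n), key=lambda i: overlap[i], reverse=True)[:pool]
    fused = rrf([dense, sparse])
    fused_ids = sorted(fused, key=fused.get, reverse=True)[: top_k * 2]  # rerank shortlist
    ranked = sorted(((i, rerank(chunks[i])) for i in fused_ids),
                    key=lambda x: x[1], reverse=True)[:top_k]
    return [{"chunk": chunks[i], "chunk_id": i,
             "rrf_score": round(fused[i], 4), "ce_score": round(score, 4)}
            for i, score in ranked]
```

Reranking only the fused shortlist keeps the expensive pairwise model off the long tail, which is presumably why the real function caps candidates at `top_k * 2` before calling the cross-encoder.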

hf_backend/runtime.txt ADDED

	@@ -0,0 +1 @@

+python-3.11.9

hf_backend/tests/__init__.py ADDED

File without changes

hf_backend/tests/test_integration.py ADDED

	@@ -0,0 +1,51 @@

+# tests/test_integration.py
+# Run with:  pytest tests/test_integration.py -v -m integration
+# These call real APIs — don't run in CI automatically.
+import pytest
+pytestmark = pytest.mark.integration   # tag so CI can skip these
+def test_groq_connection_live():
+    from langchain_groq import ChatGroq
+    from langchain_core.messages import HumanMessage
+    from config import GROQ_API_KEY, GROQ_MODEL
+    llm = ChatGroq(model=GROQ_MODEL, temperature=0, api_key=GROQ_API_KEY)
+    r   = llm.invoke([HumanMessage(content="Reply with just the word OK")])
+    assert len(r.content) > 0
+def test_full_pipeline_live():
+    """Ingests a tiny doc, retrieves, runs agent — end to end."""
+    import os
+    from pathlib import Path
+    # Write test doc
+    Path("./docs").mkdir(exist_ok=True)
+    test_file = Path("./docs/_pytest_temp.txt")
+    test_file.write_text(
+        "The Eiffel Tower is in Paris, France. "
+        "It was built in 1889. It is 330 metres tall."
+    )
+    try:
+        from ingestion import run_ingestion
+        from retriever import load_indexes, hybrid_retrieve
+        from agent import run_rag_agent
+        run_ingestion()
+        load_indexes()
+        results = hybrid_retrieve("How tall is the Eiffel Tower?", top_k=3)
+        assert len(results) > 0
+        assert "ce_score" in results[0]          # reranker ran
+        answer, retries, verdict = run_rag_agent(
+            "How tall is the Eiffel Tower?", results
+        )
+        assert "330" in answer or "metres" in answer.lower()
+        assert verdict in {"PASS", "FAIL"}
+    finally:
+        test_file.unlink(missing_ok=True)        # always clean up
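The write/try/finally/unlink pattern in `test_full_pipeline_live` can be packaged as a context manager, so other live tests can reuse it. Below is a minimal sketch; `temp_doc` is a hypothetical helper, not part of this commit. It removes only the temporary document. If `run_ingestion()` writes index files to disk, those files will still contain the document's chunks until the next ingestion.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def temp_doc(docs_dir: Path, name: str, text: str):
    """Write a throwaway document into docs_dir and always remove it afterwards.

    Hypothetical helper mirroring the try/finally in test_full_pipeline_live.
    """
    docs_dir.mkdir(parents=True, exist_ok=True)
    path = docs_dir / name
    path.write_text(text, encoding="utf-8")
    try:
        yield path
    finally:
        path.unlink(missing_ok=True)  # runs even if the test body raises
```

Usage inside the test would be `with temp_doc(Path("./docs"), "_pytest_temp.txt", "..."):` wrapping the ingest, retrieve and agent calls.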

hf_backend/tests/test_unit.py ADDED

	@@ -0,0 +1,119 @@

+# tests/test_unit.py
+import pytest
+# ── RRF logic ─────────────────────────────────────────────────────────────────
+def test_rrf_prefers_doc_appearing_in_both_lists():
+    from retriever import _reciprocal_rank_fusion
+    scores = _reciprocal_rank_fusion([[0, 1, 2], [2, 0, 1]])
+    # doc 2 is rank-0 in sparse and rank-2 in dense → should beat doc 1
+    assert scores[2] > scores[1]
+def test_rrf_returns_all_docs():
+    from retriever import _reciprocal_rank_fusion
+    scores = _reciprocal_rank_fusion([[0, 1], [1, 2]])
+    assert set(scores.keys()) == {0, 1, 2}
+def test_rrf_scores_are_positive():
+    from retriever import _reciprocal_rank_fusion
+    scores = _reciprocal_rank_fusion([[0, 1, 2]])
+    assert all(v > 0 for v in scores.values())
+# ── Config sanity ─────────────────────────────────────────────────────────────
+def test_config_values_are_sane():
+    from config import CHUNK_SIZE, CHUNK_OVERLAP, TOP_K, MAX_RETRIES
+    assert CHUNK_SIZE > CHUNK_OVERLAP,  "overlap must be smaller than chunk size"
+    assert TOP_K > 0,                   "TOP_K must be positive"
+    assert MAX_RETRIES >= 1,            "need at least 1 retry"
+def test_groq_api_key_present(monkeypatch):
+    # patch so we don't need a real key in CI
+    monkeypatch.setenv("GROQ_API_KEY", "gsk_fakekeyfortesting1234567890")
+    import importlib, config
+    importlib.reload(config)             # re-reads env
+    assert len(config.GROQ_API_KEY) > 10
+# ── Agent routing logic ───────────────────────────────────────────────────────
+def test_route_returns_done_on_pass():
+    from agent import route_after_validation
+    state = {"validation_result": "PASS", "retry_count": 0}
+    assert route_after_validation(state) == "done"
+def test_route_returns_retry_on_fail_within_limit():
+    from agent import route_after_validation
+    state = {"validation_result": "FAIL", "retry_count": 0}
+    assert route_after_validation(state) == "retry"
+def test_route_returns_done_when_retries_exhausted():
+    from agent import route_after_validation
+    state = {"validation_result": "FAIL", "retry_count": 3}
+    assert route_after_validation(state) == "done"
+def test_increment_retry_node():
+    from agent import increment_retry_node
+    result = increment_retry_node({"retry_count": 1})
+    assert result["retry_count"] == 2
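The four routing tests pin down the agent's contract without showing `agent.py`. A `PASS` verdict ends the loop. A `FAIL` retries while `retry_count` is low. A `FAIL` at `retry_count == 3` ends it regardless. Below is a minimal sketch that satisfies those tests. It assumes the limit is `MAX_RETRIES = 3`, as in `config.py`, and that the comparison is `<`. The real functions in `agent.py` may differ.

```python
from typing import TypedDict

MAX_RETRIES = 3  # assumption: matches config.MAX_RETRIES


class AgentState(TypedDict, total=False):
    validation_result: str  # "PASS" or "FAIL" from the validation step
    retry_count: int


def route_after_validation(state: AgentState) -> str:
    """Hypothetical router: stop on PASS, retry on FAIL until the budget is spent."""
    if state.get("validation_result") == "PASS":
        return "done"
    if state.get("retry_count", 0) < MAX_RETRIES:
        return "retry"
    return "done"


def increment_retry_node(state: AgentState) -> dict:
    """Return only the changed key; LangGraph merges partial updates into the state."""
    return {"retry_count": state.get("retry_count", 0) + 1}
```

Importing `MAX_RETRIES` in `test_route_returns_done_when_retries_exhausted` would keep that test valid if the limit in `config.py` changes.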
+# ── Retriever output shape (mocked indexes) ───────────────────────────────────
+@pytest.fixture
+def mock_indexes(monkeypatch):
+    """Patches all globals in retriever so no files need to exist."""
+    import numpy as np
+    import retriever
+    # Fake chunks and sources
+    fake_chunks  = ["Paris is in France.", "Tower is 330m tall.", "Built in 1889."]
+    fake_sources = ["doc1.txt", "doc1.txt", "doc1.txt"]
+    # Fake FAISS index that always returns ids [0, 1, 2]
+    class FakeFaiss:
+        ntotal = 3
+        def search(self, vec, k):
+            ids = np.array([[0, 1, 2]])
+            return None, ids
+    # Fake BM25 that returns fixed, descending scores (doc 0 highest)
+    class FakeBM25:
+        def get_scores(self, tokens):
+            return np.array([0.9, 0.5, 0.3])
+    # Fake embedder
+    class FakeModel:
+        def encode(self, texts, convert_to_numpy=True):
+            return np.random.rand(len(texts), 384).astype("float32")
+    # Fake cross-encoder
+    class FakeReranker:
+        def predict(self, pairs):
+            return np.array([0.9, 0.7, 0.5][: len(pairs)])
+    monkeypatch.setattr(retriever, "_faiss_index", FakeFaiss())
+    monkeypatch.setattr(retriever, "_bm25_index",  FakeBM25())
+    monkeypatch.setattr(retriever, "_chunks",      fake_chunks)
+    monkeypatch.setattr(retriever, "_sources",     fake_sources)
+    monkeypatch.setattr(retriever, "_model",       FakeModel())
+    monkeypatch.setattr(retriever, "_reranker",    FakeReranker())
+    return fake_chunks
+def test_hybrid_retrieve_returns_top_k(mock_indexes):
+    from retriever import hybrid_retrieve
+    results = hybrid_retrieve("Where is Paris?", top_k=2)
+    assert len(results) == 2
+def test_hybrid_retrieve_result_has_required_keys(mock_indexes):
+    from retriever import hybrid_retrieve
+    result = hybrid_retrieve("Where is Paris?", top_k=1)[0]
+    assert "chunk"     in result
+    assert "source"    in result
+    assert "rrf_score" in result
+    assert "ce_score"  in result
+def test_hybrid_retrieve_scores_are_floats(mock_indexes):
+    from retriever import hybrid_retrieve
+    result = hybrid_retrieve("test", top_k=1)[0]
+    assert isinstance(result["rrf_score"], float)
+    assert isinstance(result["ce_score"],  float)

pytest.ini ADDED

	@@ -0,0 +1,4 @@

+[pytest]
+markers =
+    integration: marks integration tests
+addopts = -ra

tests/test_api.py ADDED

	@@ -0,0 +1,12 @@

+import sys
+import os
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+from main import app
+from fastapi.testclient import TestClient
+client = TestClient(app)
+def test_health():
+    response = client.get("/health")
+    assert response.status_code == 200