Spaces:

Hitan2004
/

agentic-corrective-rag

Sleeping

App Files Files Community

Hitan2004 commited on Apr 11

Commit

53d50d4

verified ·

1 Parent(s): 472b56b

Update README.md

Browse files

Files changed (1) hide show

README.md +17 -708

README.md CHANGED Viewed

@@ -1,714 +1,23 @@
-# 🧠 Agentic Corrective RAG — Document Q&A with Self-Correction
-<div align="center">
-**Production-grade document retrieval system with self-correcting agent reasoning**
-[![Frontend UI](https://img.shields.io/badge/Frontend-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag-ui)
-[![Backend API](https://img.shields.io/badge/API-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag)
-[![API Docs](https://img.shields.io/badge/Swagger-Docs-green?style=for-the-badge)](https://hitan2004-agentic-corrective-rag.hf.space/docs)
-[![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Hitan547/agentic-corrective-rag)
-[![Python](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](#tech-stack)
-*Upload documents, ask questions, get answers grounded in source material with automated hallucination detection and self-correction.*
-## 🎯 Overview
-Agentic Corrective RAG is a production-grade document Q&A system that combines advanced retrieval techniques with intelligent agent reasoning. Unlike naive RAG systems that often hallucinate, this system automatically validates every answer against source material and retries up to 3 times if validation fails.
-### ⚡ Core Features
-| Feature | Capability |
-|---------|-----------|
-| **Hybrid Retrieval** | FAISS semantic + BM25 keyword search with RRF fusion |
-| **Intelligent Reranking** | Cross-encoder re-scores top-k candidates for precision |
-| **Self-Correcting Agent** | LangGraph pipeline validates answers and auto-retries |
-| **Hallucination Detection** | Second LLM call verifies every claim against context |
-| **Session Memory** | Remembers last 5 conversation turns per session |
-| **Streaming Ingestion** | Synchronous indexing with FAISS + BM25 persistence |
-| **CI/CD Pipeline** | GitHub Actions with unit + integration test separation |
-| **Multi-Service Deployment** | Backend API + separate frontend UI on HuggingFace Spaces |
----
-## 🏗️ Architecture
-### System Diagram
-```
-┌─────────────────────────────────────────────────────────┐
-│            Agentic Corrective RAG Pipeline              │
-└─────────────────────────────────────────────────────────┘
-Document Upload
-    ↓
-┌─────────────────────────────────────────┐
-│         Ingestion Pipeline              │
-│  ┌─────────────────────────────────┐   │
-│  │ PyMuPDF / TXT Parser            │   │
-│  │ Split into 512-token chunks     │   │
-│  │ 20-token overlap for context    │   │
-│  └────────────┬────────────────────┘   │
-│               │                         │
-│  ┌────────────▼───────────────────┐   │
-│  │ Embedding Generation           │   │
-│  │ all-MiniLM-L6-v2 (384-dim)    │   │
-│  └────────────┬───────────────────┘   │
-│               │                         │
-│  ┌────────────▼──────────────────┐    │
-│  │ Index Creation               │    │
-│  │ FAISS (dense vectors)        │    │
-│  │ BM25 (sparse inverted index) │    │
-│  └──────────────────────────────┘    │
-└─────────────────────────────────────────┘
-Query Processing
-    ↓
-┌─────────────────────────────────────────┐
-│      Hybrid Retrieval Pipeline          │
-│                                         │
-│  ┌──────────┐      ┌──────────┐       │
-│  │FAISS Top │      │BM25 Top  │       │
-│  │ 10 Hits  │      │ 10 Hits  │       │
-│  └────┬─────┘      └────┬─────┘       │
-│       └────────┬─────────┘             │
-│                │                       │
-│        ┌───────▼──────────┐           │
-│        │ RRF Fusion       │           │
-│        │ (Top 5 combined) │           │
-│        └���──────┬──────────┘           │
-│                │                       │
-│        ┌───────▼──────────────────┐  │
-│        │ Cross-Encoder Reranking  │  │
-│        │ ms-marco-MiniLM-L-6-v2   │  │
-│        │ Re-score + sort          │  │
-│        └───────┬──────────────────┘  │
-└─────────────────────────────────────────┘
-Agent Reasoning Loop
-    ↓
-┌─────────────────────────────────────────┐
-│      Corrective RAG Agent (LangGraph)    │
-│                                         │
-│  Generate (LLaMA 3.3 70B)              │
-│  ├─ Answer using top-3 chunks          │
-│  └─ Confidence score                   │
-│       ↓                                │
-│  Validate (LLM Validation Call)        │
-│  ├─ Is answer grounded?                │
-│  └─ All claims supported?              │
-│       ↓                                │
-│  Retry Logic (up to 3 times)           │
-│  ├─ If PASS → Return answer            │
-│  ├─ If FAIL & retries left:            │
-│  │   → Use failure reason as feedback  │
-│  │   → Re-retrieve with new query      │
-│  │   → Regenerate answer               │
-│  └─ If 3 retries exhausted → Return    │
-│       best attempt with FAIL verdict   │
-└─────────────────────────────────────────┘
-Response
-    ↓
-JSON with:
-  - answer (generated text)
-  - source_chunks (exact matched context)
-  - validation_verdict (PASS/FAIL)
-  - retry_count (0-3)
-  - confidence (0.0-1.0)
-```
-### Component Breakdown
-#### 1. **Ingestion (`ingestion.py`)**
-Converts documents to searchable indexes
-```python
-def ingest_documents(file_path: str) -> Dict:
-    """
-    Input: PDF or TXT file
-    Process:
-      1. Extract text with PyMuPDF or plain read
-      2. Split into 512-token chunks (20-token overlap)
-      3. Generate embeddings (all-MiniLM-L6-v2)
-      4. Create FAISS dense index
-      5. Create BM25 sparse index
-    Output: Ready for retrieval
-    """
-```
-**Supported Formats:**
-- PDF (single/multi-page)
-- TXT (plain text)
-- Auto-detects and routes to correct parser
-#### 2. **Retriever (`retriever.py`)**
-Hybrid search with intelligent ranking
-```python
-def hybrid_retrieve(query: str, k: int = 5) -> List[Chunk]:
-    """
-    Process:
-      1. Dense retrieval: FAISS semantic search (top 10)
-      2. Sparse retrieval: BM25 keyword search (top 10)
-      3. RRF Fusion: Merge and rank by reciprocal rank
-      4. Cross-Encoder: Re-rank top-5 using semantic + lexical
-    Output: Top-k chunks with scores
-    """
-```
-**Fusion Algorithm (RRF):**
-```
-For each document d:
-  score(d) = Σ(1 / (rank_dense(d) + k)) + Σ(1 / (rank_sparse(d) + k))
-Where k=60 (typical offset to avoid division by zero)
-```
-#### 3. **Agent (`agent.py`)**
-Self-correcting reasoning loop using LangGraph
-```python
-class CorrectiveRAGAgent:
-    """
-    State machine with 4 nodes:
-    Generate Node:
-      - Takes query + top-3 chunks
-      - Calls LLaMA 3.3 70B
-      - Returns answer + initial confidence
-    Validate Node:
-      - Takes answer + source chunks
-      - Calls validation LLM (fact-checking)
-      - Checks: Is answer grounded? All claims supported?
-      - Returns verdict (PASS/FAIL)
-    Retry Logic:
-      - If PASS → End, return answer
-      - If FAIL and retry_count < 3:
-        → Inform agent of failure reason
-        → Re-retrieve with modified query
-        → Regenerate answer
-      - If 3 retries exhausted → Return best attempt
-    Output Node:
-      - Formats response
-      - Includes source chunks
-      - Validation verdict
-      - Retry count
-    """
-```
-#### 4. **FastAPI Backend (`main.py`)**
-REST API orchestrating the full pipeline
-```python
-@app.post("/upload")
-async def upload_document(file: UploadFile) -> Dict:
-    """
-    - Receives PDF/TXT file
-    - Calls ingestion pipeline
-    - Returns: {status, message, doc_size, chunk_count}
-    """
-@app.post("/query")
-async def query_documents(query: str, session_id: str) -> Dict:
-    """
-    - Receives question
-    - Runs corrective agent
-    - Returns:
-      {
-        "answer": str,
-        "source_chunks": [chunk1, chunk2, chunk3],
-        "validation_verdict": "PASS" or "FAIL",
-        "retry_count": 0-3,
-        "confidence": 0.0-1.0
-      }
-    """
-```
----
-## 🧪 Testing Architecture
-### Unit Tests (`tests/test_unit.py`)
-```python
-✅ test_rrf_fusion
-   - Verifies Reciprocal Rank Fusion math
-   - Checks score normalization
-✅ test_cross_encoder_reranking
-   - Validates reranking modifies order
-   - Confirms scores are properly scaled
-✅ test_config_validation
-   - Ensures chunk_size > 0
-   - Validates max_retries in range
-✅ test_chunk_processing
-   - Tests document splitting logic
-   - Checks overlap preservation
-✅ test_agent_routing
-   - Verifies state machine transitions
-   - Confirms node execution order
-```
-**Run locally:**
-```bash
-pytest tests/test_unit.py -v
-```
-### Integration Tests (`tests/test_integration.py`)
-```python
-✅ test_full_pipeline_end_to_end
-   - Upload document
-   - Index with FAISS + BM25
-   - Query with agent
-   - Validate response structure
-   - Requires GROQ_API_KEY
-✅ test_groq_api_connection
-   - Confirms Groq API is reachable
-   - Tests actual LLM inference
-   - Validates response format
-✅ test_retrieval_quality
-   - Uploads test document
-   - Queries for information
-   - Verifies retrieved chunks contain answer
-✅ test_agent_hallucination_detection
-   - Forces out-of-context query
-   - Confirms validation catches hallucination
-   - Checks retry mechanism
-```
-**Run locally (requires API key):**
-```bash
-export GROQ_API_KEY=your_key
-pytest tests/test_integration.py -v -m integration
-```
-### CI/CD Test Strategy
-**GitHub Actions:**
-```yaml
-on: [push, pull_request]
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
-      - run: pip install -r requirements.txt
-      - run: pytest tests/test_unit.py -v
-        # ✅ Unit tests run (fast, no API)
-      - run: pytest tests/test_integration.py -v -m "not integration"
-        # ✅ Integration tests skip (expensive API calls)
-```
-**Key Insight:** Tests marked with `@pytest.mark.integration` are automatically skipped in CI but run locally with API key. This prevents wasting API credits while maintaining code quality.
 ---
-## 📊 Model & LLM Stack
-### Retrieval Models
-| Component | Model | Capability |
-|-----------|-------|-----------|
-| **Dense Embeddings** | `all-MiniLM-L6-v2` | 384-dim vectors, optimized for retrieval |
-| **Sparse Search** | BM25 (rank-bm25 lib) | Keyword indexing, recall enhancement |
-| **Reranker** | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Semantic + lexical re-scoring |
-### Reasoning Engine
-| Component | Model | Role |
-|-----------|-------|------|
-| **Main Generator** | LLaMA 3.3 70B (Groq API) | Answer generation from context |
-| **Validator** | LLaMA 3.3 70B (Groq API) | Hallucination detection & fact-checking |
-### Why These Choices?
-✅ **all-MiniLM-L6-v2**
-- 384-dim embeddings (good balance of size/quality)
-- Specifically trained for retrieval tasks
-- Fast inference, low memory
-✅ **BM25**
-- Complementary to dense embeddings (catches keyword matches)
-- Sparse representation (memory efficient)
-- Proven effective in hybrid search
-✅ **Cross-Encoder Reranking**
-- Reads query + chunk together (interaction model)
-- Higher precision than encoding separately
-- Scales to top-k reranking
-✅ **LLaMA 3.3 70B via Groq**
-- Strong reasoning on diverse topics
-- Fast inference (Groq's optimized runtime)
-- Production-grade availability
-- Cost-effective for hobby projects
----
-## 🚀 Quick Start
-### Prerequisites
-- Python 3.10+
-- Free Groq API key (from console.groq.com)
-- 1GB disk for models + indexes
-### Local Setup (10 minutes)
-```bash
-# 1. Clone repository
-git clone https://github.com/Hitan547/agentic-corrective-rag.git
-cd agentic-corrective-rag
-# 2. Create virtual environment
-python -m venv venv
-source venv/bin/activate  # Windows: venv\Scripts\activate
-# 3. Install dependencies
-pip install -r requirements.txt
-# 4. Set up environment
-echo "GROQ_API_KEY=your_api_key_here" > .env
-# 5. Run backend
-uvicorn main:app --reload --port 8000
-# 6. In another terminal, serve frontend
-python -m http.server 3000 --directory ui
-# 7. Open browser
-# → http://localhost:3000/index.html
-```
-### Docker Setup
-```bash
-# Build
-docker build -t agentic-rag:latest .
-# Run
-docker run -e GROQ_API_KEY=your_key -p 8000:8000 agentic-rag:latest
-# Access at http://localhost:8000
-```
-### HuggingFace Spaces Deployment
-**Backend Space:**
-1. Create new Space (Python)
-2. Add secret: `GROQ_API_KEY`
-3. Push repo (includes Dockerfile)
-4. Auto-deploys as FastAPI service
-**Frontend Space:**
-1. Create new Space (Static)
-2. Push `ui/` directory
-3. Serves HTML directly
----
-## 🔌 REST API Reference
-### GET `/health`
-System health check
-**Response:**
-```json
-{
-  "status": "online",
-  "model": "corrective-rag-v1",
-  "indexes": {
-    "faiss": "ready",
-    "bm25": "ready"
-  },
-  "sessions": 42
-}
-```
-### POST `/upload`
-Upload and index a document
-**Request:**
-```bash
-curl -X POST \
-  -F "file=@document.pdf" \
-  http://localhost:8000/upload
-```
-**Response:**
-```json
-{
-  "status": "success",
-  "message": "Document indexed successfully",
-  "doc_name": "document.pdf",
-  "chunk_count": 24,
-  "token_count": 12345,
-  "file_size_bytes": 2048000
-}
-```
-### POST `/query`
-Ask a question about uploaded documents
-**Request:**
-```json
-{
-  "query": "What is the main thesis?",
-  "session_id": "user_123",
-  "temperature": 0.7,
-  "max_retries": 3
-}
-```
-**Response:**
-```json
-{
-  "answer": "The main thesis argues that...",
-  "source_chunks": [
-    {
-      "text": "The thesis states that...",
-      "chunk_id": 3,
-      "score": 0.92
-    },
-    {
-      "text": "This is supported by...",
-      "chunk_id": 5,
-      "score": 0.87
-    }
-  ],
-  "validation_verdict": "PASS",
-  "retry_count": 0,
-  "confidence": 0.94,
-  "processing_time_ms": 3200
-}
-```
-### DELETE `/session/{id}`
-Clear conversation history for a session
-**Response:**
-```json
-{
-  "status": "success",
-  "message": "Session cleared"
-}
-```
-### GET `/docs`
-Interactive Swagger UI
-Navigate to: `http://localhost:8000/docs`
----
-## 📁 Project Structure
-```
-agentic-corrective-rag/
-├── agent.py
-│   └── CorrectiveRAGAgent
-│       ├── generate(query, chunks) → answer
-│       ├── validate(answer, chunks) → verdict
-│       └── retry_loop() → final_answer
-├── retriever.py
-│   ├── hybrid_retrieve() → RRF + reranking
-│   ├── faiss_search() → dense vectors
-│   └── bm25_search() → keyword search
-├── ingestion.py
-│   ├── ingest_pdf()
-│   ├── ingest_txt()
-│   └── create_indexes() → FAISS + BM25
-├── main.py
-│   ├── FastAPI app
-│   ├── /upload endpoint
-│   ├── /query endpoint
-│   └── /session/{id} endpoint
-├── config.py
-│   ├── CHUNK_SIZE = 512
-│   ├── CHUNK_OVERLAP = 20
-│   ├── MAX_RETRIES = 3
-│   └── MODEL_PARAMS = {...}
-├── requirements.txt
-├── Dockerfile
-├── .github/workflows/ci.yml
-├── ui/
-│   └── index.html (static HTML/JS frontend)
-├── tests/
-│   ├── test_unit.py
-│   │   ├── test_rrf_fusion
-│   │   ├── test_cross_encoder_reranking
-│   │   └── test_config_validation
-│   └── test_integration.py
-│       ├── test_full_pipeline_end_to_end
-│       ├── test_groq_api_connection
-│       └── test_agent_hallucination_detection
-└── README.md
-```
----
-## 🔄 CI/CD Pipeline
-### GitHub Actions Workflow
-**Trigger:** Push to main or PR
-```yaml
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
-        with:
-          python-version: '3.10'
-      - name: Install dependencies
-        run: pip install -r requirements.txt
-      - name: Run unit tests
-        run: pytest tests/test_unit.py -v
-        # ✅ Fast tests, no external API calls
-      - name: Skip integration tests in CI
-        run: pytest tests/test_integration.py -v -m "not integration"
-        # ✅ Prevents wasting Groq API credits
-      - name: Docker build test
-        run: docker build -t agentic-rag:test .
-        # ✅ Ensures Dockerfile is valid
-```
-### Deployment Pipeline
-**Backend (API Service):**
-1. HuggingFace Space (Docker runtime)
-2. Auto-deploys on push to `main`
-3. Exposes FastAPI at `https://hitan2004-agentic-corrective-rag.hf.space`
-**Frontend (Static Service):**
-1. HuggingFace Space (Static runtime)
-2. Auto-deploys on push to `main`
-3. Serves HTML at `https://hitan2004-agentic-corrective-rag-ui.hf.space`
----
-## 🎓 What I Learned
-✅ **Advanced Retrieval**
-- Hybrid search (dense + sparse) outperforms single modality
-- RRF fusion effectively combines different ranking signals
-- Cross-encoders improve precision over bi-encoders
-- Trade-off: reranking adds latency but improves quality
-✅ **Agent-Based Reasoning**
-- State machines (LangGraph) cleanly express retry logic
-- Validation is critical for production RAG systems
-- Feedback loops enable graceful degradation
-- Session memory prevents repeated errors
-✅ **Production ML System Design**
-- Test separation (unit vs. integration) reduces CI/CD costs
-- Configuration as code improves reproducibility
-- Synchronous indexing ensures consistency
-- Proper error handling for external API calls
-✅ **LLM Integration**
-- Groq API's speed enables interactive applications
-- Temperature tuning affects consistency vs. creativity
-- Prompt engineering for specific tasks (validation vs. generation)
-- Cost-benefit of multi-turn API calls
-✅ **Full-Stack Web Development**
-- FastAPI for modern async backends
-- Static HTML/JS for simple UIs
-- Docker for reproducible deployments
-- GitHub Actions for automated testing and CI/CD
----
-## 📈 Performance Metrics
-### Retrieval Quality
-| Scenario | Metric | Value |
-|----------|--------|-------|
-| Exact answer in docs | Recall@3 | 94% |
-| Paraphrased answer | Recall@5 | 87% |
-| Complex multi-doc answer | Recall@10 | 92% |
-### Agent Performance
-| Metric | Value |
-|--------|-------|
-| Validation PASS rate (correct answers) | 97% |
-| Hallucination detection rate | 94% |
-| Avg retries (when needed) | 1.2 |
-| Zero-shot success (no retries) | 89% |
-### Latency (end-to-end, on Groq API)
-| Operation | Time |
-|-----------|------|
-| Hybrid retrieval | 200ms |
-| Reranking (top-10) | 150ms |
-| LLM generation | 1500ms |
-| Validation call | 1200ms |
-| **Total (no retries)** | **3050ms** |
----
-## 🤝 Contributing
-This is a portfolio project. Contributions are welcome!
-**Ideas for enhancement:**
-- [ ] Add multi-document support (merge indexes)
-- [ ] Implement persistent vector DB (Pinecone/Weaviate)
-- [ ] Add citation highlighting in frontend
-- [ ] Implement streaming responses with Server-Sent Events
-- [ ] Add support for images (multimodal embeddings)
----
-## 📜 License
-MIT License — Use freely for learning or commercial purposes.
----
-## 📞 Contact
-**Hitan K** — AI Systems Engineer
-- 🔗 [LinkedIn](https://linkedin.com/in/hitan-k)
-- 🐙 [GitHub](https://github.com/Hitan547)
-- 🤗 [HuggingFace](https://huggingface.co/Hitan2004)
-- 📧 [Email](mailto:hitan.k@outlook.com)
 ---
-<div align="center">
-**⭐ Found this helpful? Please star the repo! ⭐**
-*Built with ❤️ for production and learning.*
-</div>

 ---
+title: Agentic Corrective RAG
+emoji: 🤖
+colorFrom: blue
+colorTo: purple
+sdk: docker
+app_file: main.py
+pinned: false
 ---
+# 🤖 Agentic Corrective RAG
+AI-powered Retrieval-Augmented Generation system with self-correcting agent loop.
+## Features
+- Hybrid Retrieval (FAISS + BM25)
+- Cross-Encoder Reranking
+- Agent-based reasoning with retries
+- Groq LLM integration
+- CI/CD with GitHub Actions
+## Usage
+Upload documents and ask questions to get accurate answers with sources.