# 🧠 Agentic Corrective RAG: Document Q&A with Self-Correction
<div align="center">
**Production-grade document retrieval system with self-correcting agent reasoning**
[![Frontend UI](https://img.shields.io/badge/Frontend-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag-ui)
[![Backend API](https://img.shields.io/badge/API-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag)
[![API Docs](https://img.shields.io/badge/Swagger-Docs-green?style=for-the-badge)](https://hitan2004-agentic-corrective-rag.hf.space/docs)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Hitan547/agentic-corrective-rag)
[![Python](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](#tech-stack)
*Upload documents, ask questions, get answers grounded in source material with automated hallucination detection and self-correction.*
</div>
---
## 🎯 Overview
Agentic Corrective RAG is a production-grade document Q&A system that combines hybrid retrieval (dense and BM25 search with cross-encoder reranking) with a LangGraph agent that checks its own answers. Unlike naive RAG pipelines, which return the first generated answer unchecked and often hallucinate, this system validates every answer against the retrieved source material and retries up to 3 times if validation fails.
### ⚡ Core Features
| Feature | Capability |
|---------|-----------|
| **Hybrid Retrieval** | ChromaDB semantic + BM25 keyword search with RRF fusion |
| **Intelligent Reranking** | Cross-encoder re-scores top-k candidates for precision |
| **Self-Correcting Agent** | LangGraph pipeline validates answers and auto-retries |
| **Hallucination Detection** | Second LLM call verifies every claim against context |
| **Session Memory** | Remembers last 5 conversation turns per session |
| **MCP Integration** | Exposes RAG pipeline as callable tools for AI agents |
| **CI/CD Pipeline** | GitHub Actions with unit + integration test separation |
| **Multi-Service Deployment** | Backend API + separate frontend UI on HuggingFace Spaces |
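
The session memory row above can be pictured as a bounded buffer per session. A minimal sketch, assuming a `deque` with `maxlen=5`; `SessionMemory` and its method names are hypothetical and not taken from the project's code:

```python
from collections import defaultdict, deque

MAX_TURNS = 5  # the README states the last 5 turns are kept per session


class SessionMemory:
    """Hypothetical in-memory store: one bounded buffer per session id."""

    def __init__(self, max_turns: int = MAX_TURNS):
        self._turns = defaultdict(lambda: deque(maxlen=max_turns))

    def add_turn(self, session_id: str, question: str, answer: str) -> None:
        # deque(maxlen=n) drops the oldest turn automatically once full
        self._turns[session_id].append((question, answer))

    def history(self, session_id: str) -> list:
        # .get avoids creating an empty buffer for unknown sessions
        return list(self._turns.get(session_id, ()))

    def clear(self, session_id: str) -> None:
        self._turns.pop(session_id, None)
```

A store like this lives in process memory, so history would be lost on restart unless it is persisted elsewhere.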
---
## 🔌 MCP Server (NEW)
This project now exposes the full RAG pipeline as **Model Context Protocol (MCP) tools**, allowing any MCP-compatible AI agent (Claude Desktop, LangChain agents, etc.) to call it autonomously.
### Available MCP Tools
| Tool | Description |
|------|-------------|
| `query_rag` | Ask a question; runs the full corrective RAG pipeline |
| `ingest_document` | Upload and index a PDF or TXT file |
| `clear_session` | Clear conversation memory for a session |
### Run MCP Server
```bash
pip install mcp
python mcp_server.py
```
### Connect to Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
"mcpServers": {
"agentic-rag": {
"command": "python",
"args": ["path/to/mcp_server.py"]
}
}
}
```
Claude Desktop will now have access to your RAG pipeline as native tools.
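
Claude Desktop launches the server process itself, so a relative `path/to/mcp_server.py` may not resolve from its working directory. A small sketch that prints the same config entry with an absolute path; it assumes it is run from the repository root, and `mcp_config_entry` is a hypothetical helper, not part of the project:

```python
import json
from pathlib import Path


def mcp_config_entry(server_script: str = "mcp_server.py") -> dict:
    # Resolve against the current directory to get an absolute path
    script = Path(server_script).resolve()
    return {
        "mcpServers": {
            "agentic-rag": {"command": "python", "args": [str(script)]}
        }
    }


print(json.dumps(mcp_config_entry(), indent=2))
```

Paste the printed block into `claude_desktop_config.json`, then restart Claude Desktop so it picks up the change.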
---
## 🏗️ Architecture
### System Diagram
```
┌─────────────────────────────────────────────────────────┐
│             Agentic Corrective RAG Pipeline             │
└─────────────────────────────────────────────────────────┘

Document Upload
      ↓
┌───────────────────────────────────────────┐
│  Ingestion Pipeline                       │
│  PyMuPDF / TXT Parser                     │
│  Split into 512-token chunks              │
│  Embedding: all-MiniLM-L6-v2              │
│  Index: ChromaDB (dense) + BM25 (sparse)  │
└───────────────────────────────────────────┘

Query Processing
      ↓
┌───────────────────────────────────────────┐
│  Hybrid Retrieval Pipeline                │
│  ChromaDB Top 10 + BM25 Top 10            │
│  → RRF Fusion (Top 5 combined)            │
│  → Cross-Encoder Reranking                │
└───────────────────────────────────────────┘

Agent Reasoning Loop
      ↓
┌───────────────────────────────────────────┐
│  Corrective RAG Agent (LangGraph)         │
│  Generate (LLaMA 3.3 70B)                 │
│  → Validate (hallucination check)         │
│  → Retry up to 3x if FAIL                 │
│  → Return answer + verdict + sources      │
└───────────────────────────────────────────┘

MCP Layer (NEW)
      ↓
┌───────────────────────────────────────────┐
│  MCP Server (mcp_server.py)               │
│  Wraps the HuggingFace API endpoints      │
│  Exposes 3 tools to any AI agent          │
│  Compatible with Claude Desktop, etc.     │
└───────────────────────────────────────────┘
```
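
The fusion step in the diagram merges the dense and sparse result lists with Reciprocal Rank Fusion (RRF). A minimal sketch, assuming the commonly used constant `k = 60` and plain lists of chunk ids; the function name, the `k` value, and the sample ids are illustrative assumptions, since this README does not state the project's settings:

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse ranked id lists; each id scores sum(1 / (k + rank)) over lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


dense = ["c3", "c1", "c7"]   # e.g. ChromaDB results, best first
sparse = ["c1", "c9", "c3"]  # e.g. BM25 results, best first
fused = rrf_fuse([dense, sparse])
```

Chunks that appear in both lists accumulate two terms, which is why RRF tends to favor results that the keyword and semantic searches agree on.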
---
## 📊 Model & LLM Stack
| Component | Model | Role |
|-----------|-------|------|
| **Dense Embeddings** | `all-MiniLM-L6-v2` | 384-dim vectors for semantic search |
| **Sparse Search** | BM25 (rank-bm25) | Keyword indexing for recall |
| **Reranker** | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Precision re-scoring |
| **Generator** | LLaMA 3.3 70B (Groq) | Answer generation |
| **Validator** | LLaMA 3.3 70B (Groq) | Hallucination detection |
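
Because the generator and validator are separate calls to the same model, the retry logic amounts to a bounded loop around the two. A plain-Python sketch of that control flow, assuming hypothetical `generate` and `validate` callables in place of the Groq calls; the actual LangGraph wiring is not reproduced here:

```python
from typing import Callable, List, Tuple

MAX_RETRIES = 3  # the README states up to 3 retries


def corrective_answer(
    question: str,
    context: List[str],
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str, List[str]], bool],
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, str, int]:
    """Return (answer, verdict, retries_used)."""
    retries = 0
    while True:
        answer = generate(question, context)
        if validate(answer, context):
            return answer, "PASS", retries
        if retries == max_retries:
            # Out of retries: return the last answer, flagged as failed
            return answer, "FAIL", retries
        retries += 1
```

In this sketch a failed final attempt is still returned with a `FAIL` verdict, so the caller can decide whether to show it.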
---
## 🚀 Quick Start
### Local Setup
```bash
# 1. Clone repository
git clone https://github.com/Hitan547/agentic-corrective-rag.git
cd agentic-corrective-rag
# 2. Install dependencies
pip install -r requirements.txt
# 3. Set up environment
echo "GROQ_API_KEY=your_api_key_here" > .env
# 4. Run backend
uvicorn main:app --reload --port 8000
# 5. Run MCP server (optional)
python mcp_server.py
```
### Docker Setup
```bash
docker build -t agentic-rag:latest .
docker run -e GROQ_API_KEY=your_key -p 8000:8000 agentic-rag:latest
```
---
## 🔌 REST API Reference
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | System health check |
| `/upload` | POST | Upload and index a document |
| `/query` | POST | Ask a question |
| `/session/{id}` | DELETE | Clear session memory |
| `/docs` | GET | Swagger UI |
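
A sketch of how a client might build a `/query` request using only the standard library. The base URL matches the local `uvicorn` port used in Quick Start; the JSON field names `question` and `session_id` are assumptions, so check the Swagger UI at `/docs` for the actual schema:

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # local uvicorn port from Quick Start


def build_query_request(question: str, session_id: str) -> Request:
    # Field names are hypothetical; confirm them against /docs
    body = json.dumps({"question": question, "session_id": session_id})
    return Request(
        f"{BASE_URL}/query",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("What is the refund policy?", "demo-session")
# Send with urllib.request.urlopen(req) once the backend is running.
```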
---
## 📁 Project Structure
```
agentic-corrective-rag/
├── agent.py              # LangGraph corrective agent
├── retriever.py          # Hybrid ChromaDB + BM25 retrieval
├── ingestion.py          # Document parsing and indexing
├── main.py               # FastAPI backend
├── mcp_server.py         # MCP tool server (NEW)
├── config.py             # Configuration constants
├── requirements.txt
├── Dockerfile
├── .github/workflows/ci.yml
├── ui/
│   └── index.html
└── tests/
    ├── test_unit.py
    └── test_integration.py
```
---
## 📈 Performance Metrics
| Metric | Value |
|--------|-------|
| Recall@3 (exact answer in docs) | 94% |
| Hallucination detection rate | 94% |
| Validation PASS rate | 97% |
| Avg retries when needed | 1.2 |
| End-to-end latency (no retries) | ~3s |
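
For readers reproducing the retrieval number, Recall@k can be read as the fraction of questions whose answer-bearing chunk appears in the top k results. A minimal sketch under that assumption; the evaluation set and the exact definition behind the table are not given in this README, and the data below is made up purely to exercise the function:

```python
def recall_at_k(retrieved, relevant, k=3):
    """Fraction of queries with at least one relevant id in the top k."""
    hits = sum(
        1
        for ranked, gold in zip(retrieved, relevant)
        if set(ranked[:k]) & set(gold)
    )
    return hits / len(retrieved) if retrieved else 0.0


# Toy data (not project results): 2 of 3 queries hit within the top 3
retrieved = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i", "j"]]
relevant = [["c"], ["x"], ["g"]]
score = recall_at_k(retrieved, relevant, k=3)
```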
---
## 🤝 Contributing
Ideas for enhancement:
- [ ] Persistent vector DB (Pinecone/Weaviate)
- [ ] Streaming responses with SSE
- [ ] Multi-document support
- [ ] Multimodal embeddings (images)
- [ ] Citation highlighting in frontend
---
## 📜 License
MIT License. Use freely for learning or commercial purposes.
---
## 📞 Contact
**Hitan K**, AI Systems Engineer
- 🔗 [LinkedIn](https://linkedin.com/in/hitan-k)
- 🐙 [GitHub](https://github.com/Hitan547)
- 🤗 [HuggingFace](https://huggingface.co/Hitan2004)
---
<div align="center">
**⭐ Found this helpful? Please star the repo! ⭐**
*Built for production and learning.*
</div>