	# 🧠 Agentic Corrective RAG — Document Q&A with Self-Correction

	<div align="center">

	Production-grade document retrieval system with self-correcting agent reasoning

	[![Frontend UI](https://img.shields.io/badge/Frontend-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag-ui)
	[![Backend API](https://img.shields.io/badge/API-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/agentic-corrective-rag)
	[![API Docs](https://img.shields.io/badge/Swagger-Docs-green?style=for-the-badge)](https://hitan2004-agentic-corrective-rag.hf.space/docs)
	[![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Hitan547/agentic-corrective-rag)
	[![Python](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](#tech-stack)

	Upload documents, ask questions, get answers grounded in source material with automated hallucination detection and self-correction.

	</div>

	---

	## 🎯 Overview

	Agentic Corrective RAG is a production-grade document Q&A system that pairs hybrid retrieval (dense and BM25 search fused with RRF, then cross-encoder reranking) with a LangGraph agent that checks its own output. Where a naive RAG pipeline returns whatever the generator produces, this system validates every answer against the retrieved source material with a second LLM call and retries up to 3 times if validation fails.

	### ⚡ Core Features

	| Feature | Capability |
	|---------|-----------|
	| Hybrid Retrieval | ChromaDB semantic + BM25 keyword search with RRF fusion |
	| Intelligent Reranking | Cross-encoder re-scores top-k candidates for precision |
	| Self-Correcting Agent | LangGraph pipeline validates answers and auto-retries |
	| Hallucination Detection | Second LLM call verifies every claim against context |
	| Session Memory | Remembers last 5 conversation turns per session |
	| MCP Integration | Exposes RAG pipeline as callable tools for AI agents |
	| CI/CD Pipeline | GitHub Actions with unit + integration test separation |
	| Multi-Service Deployment | Backend API + separate frontend UI on HuggingFace Spaces |
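
The Session Memory row can be pictured as a small bounded buffer per session. This is a minimal sketch, assuming the history lives in process memory; `SessionMemory`, `add_turn`, `history`, and `clear` are hypothetical names for illustration, not the project's actual API.

```python
from collections import defaultdict, deque

MAX_TURNS = 5  # the feature table states the last 5 turns are kept per session


class SessionMemory:
    """Hypothetical in-process store: one bounded buffer per session id."""

    def __init__(self, max_turns: int = MAX_TURNS) -> None:
        self._turns = defaultdict(lambda: deque(maxlen=max_turns))

    def add_turn(self, session_id: str, question: str, answer: str) -> None:
        # deque(maxlen=...) drops the oldest turn automatically once full.
        self._turns[session_id].append((question, answer))

    def history(self, session_id: str) -> list[tuple[str, str]]:
        # .get() does not trigger the default factory, so unknown
        # sessions do not leave empty buffers behind.
        return list(self._turns.get(session_id, ()))

    def clear(self, session_id: str) -> None:
        self._turns.pop(session_id, None)


memory = SessionMemory()
for i in range(7):
    memory.add_turn("demo", f"question {i}", f"answer {i}")
print(len(memory.history("demo")))  # 5: turns 0 and 1 were evicted
```

A bounded deque keeps the prompt size predictable, since only the most recent turns are ever sent back to the generator.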

	---

	## 🔌 MCP Server (NEW)

	This project now exposes the full RAG pipeline as Model Context Protocol (MCP) tools, allowing any MCP-compatible AI agent (Claude Desktop, LangChain agents, etc.) to call it autonomously.

	### Available MCP Tools

	| Tool | Description |
	|------|-------------|
	| `query_rag` | Ask a question; runs the full corrective RAG pipeline |
	| `ingest_document` | Upload and index a PDF or TXT file |
	| `clear_session` | Clear conversation memory for a session |

	### Run MCP Server

	```bash
	pip install mcp
	python mcp_server.py
	```

	### Connect to Claude Desktop

	Add to your `claude_desktop_config.json`:

	```json
	{
	  "mcpServers": {
	    "agentic-rag": {
	      "command": "python",
	      "args": ["path/to/mcp_server.py"]
	    }
	  }
	}
	```

	After you restart Claude Desktop, the RAG pipeline is available as native tools. Replace `path/to/mcp_server.py` with the absolute path to the script on your machine.
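
If you would rather generate the config entry than edit it by hand, a short script can fill in the absolute path. This is a sketch under assumptions: `build_mcp_entry` is a hypothetical helper, and it assumes you run it from the repository root with the same interpreter that has `mcp` installed.

```python
import json
import sys
from pathlib import Path


def build_mcp_entry(server_script: str, name: str = "agentic-rag") -> dict:
    """Return an mcpServers fragment that uses an absolute script path."""
    script = Path(server_script).expanduser().resolve()
    return {
        "mcpServers": {
            name: {
                # sys.executable points at the interpreter running this script,
                # which is assumed to be the one with the dependencies installed.
                "command": sys.executable,
                "args": [str(script)],
            }
        }
    }


entry = build_mcp_entry("mcp_server.py")
print(json.dumps(entry, indent=2))
```

Merge the printed fragment into your existing `claude_desktop_config.json` rather than overwriting the file, so other configured servers are preserved.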

	---

	## 🏗️ Architecture

	### System Diagram

	```
	┌─────────────────────────────────────────────────────────┐
	│             Agentic Corrective RAG Pipeline             │
	└─────────────────────────────────────────────────────────┘

	Document Upload
	↓
	┌─────────────────────────────────────────┐
	│ Ingestion Pipeline                      │
	│ PyMuPDF / TXT Parser                    │
	│ Split into 512-token chunks             │
	│ Embedding: all-MiniLM-L6-v2             │
	│ Index: ChromaDB (dense) + BM25 (sparse) │
	└─────────────────────────────────────────┘

	Query Processing
	↓
	┌─────────────────────────────────────────┐
	│ Hybrid Retrieval Pipeline               │
	│ ChromaDB Top 10 + BM25 Top 10           │
	│ → RRF Fusion (Top 5 combined)           │
	│ → Cross-Encoder Reranking               │
	└─────────────────────────────────────────┘

	Agent Reasoning Loop
	↓
	┌─────────────────────────────────────────┐
	│ Corrective RAG Agent (LangGraph)        │
	│ Generate (LLaMA 3.3 70B)                │
	│ → Validate (hallucination check)        │
	│ → Retry up to 3x if FAIL                │
	│ → Return answer + verdict + sources     │
	└─────────────────────────────────────────┘

	MCP Layer (NEW)
	↓
	┌─────────────────────────────────────────┐
	│ MCP Server (mcp_server.py)              │
	│ Wraps the HuggingFace API endpoints     │
	│ Exposes 3 tools to any AI agent         │
	│ Compatible with Claude Desktop, etc.    │
	└─────────────────────────────────────────┘
	```
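
The RRF fusion step in the retrieval stage can be sketched in a few lines. This is an illustration of Reciprocal Rank Fusion in general, not the code in `retriever.py`: the function name `rrf_fuse`, the chunk ids, and the smoothing constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict

RRF_K = 60  # assumed smoothing constant; not stated in this README


def rrf_fuse(rankings: list[list[str]], top_n: int = 5, k: int = RRF_K) -> list[str]:
    """Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


dense_hits = ["c3", "c1", "c7", "c2"]   # e.g. the ChromaDB ordering
sparse_hits = ["c1", "c9", "c3", "c4"]  # e.g. the BM25 ordering
print(rrf_fuse([dense_hits, sparse_hits], top_n=3))  # ['c1', 'c3', 'c9']
```

RRF only uses rank positions, so the dense similarity scores and BM25 scores never need to be normalized against each other. Chunks that appear in both lists (`c1`, `c3`) rise to the top.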

	---

	## 📊 Model & LLM Stack

	| Component | Model | Role |
	|-----------|-------|------|
	| Dense Embeddings | `all-MiniLM-L6-v2` | 384-dim vectors for semantic search |
	| Sparse Search | BM25 (rank-bm25) | Keyword indexing for recall |
	| Reranker | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Precision re-scoring |
	| Generator | LLaMA 3.3 70B (Groq) | Answer generation |
	| Validator | LLaMA 3.3 70B (Groq) | Hallucination detection |
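
The Generator and Validator rows work together as a generate, validate, retry loop. The project builds this with LangGraph; the sketch below swaps that for a plain Python loop with stub callables so the control flow is easy to see. `corrective_answer`, `Result`, and the stub functions are hypothetical names, and the real prompts and Groq calls are not shown.

```python
from dataclasses import dataclass
from typing import Callable

MAX_RETRIES = 3  # the README states up to 3 retries


@dataclass
class Result:
    answer: str
    verdict: str   # "PASS" or "FAIL"
    attempts: int  # total generation calls made


def corrective_answer(
    question: str,
    context: str,
    generate: Callable[[str, str, str], str],
    validate: Callable[[str, str], bool],
    max_retries: int = MAX_RETRIES,
) -> Result:
    feedback = ""
    answer = ""
    # One initial attempt plus up to max_retries retries.
    for attempt in range(1, max_retries + 2):
        answer = generate(question, context, feedback)
        if validate(answer, context):
            return Result(answer, "PASS", attempt)
        # Pass the rejected answer back so the next attempt can avoid it.
        feedback = f"Previous answer was not supported by the context: {answer}"
    return Result(answer, "FAIL", max_retries + 1)


def fake_generate(question: str, context: str, feedback: str) -> str:
    # Stub generator: wrong on the first call, grounded once it gets feedback.
    return "Paris" if feedback else "Lyon"


def fake_validate(answer: str, context: str) -> bool:
    # Stub validator: a substring check standing in for the second LLM call.
    return answer in context


result = corrective_answer(
    "What is the capital of France?",
    "Paris is the capital of France.",
    fake_generate,
    fake_validate,
)
print(result.verdict, result.attempts)  # PASS 2
```

Returning the verdict alongside the answer lets callers decide what to do with a `FAIL` after the retries run out, instead of silently returning an unverified answer.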

	---

	## 🚀 Quick Start

	### Local Setup

	```bash
	# 1. Clone repository
	git clone https://github.com/Hitan547/agentic-corrective-rag.git
	cd agentic-corrective-rag

	# 2. Install dependencies
	pip install -r requirements.txt

	# 3. Set up environment
	echo "GROQ_API_KEY=your_api_key_here" > .env

	# 4. Run backend
	uvicorn main:app --reload --port 8000

	# 5. Run MCP server (optional)
	python mcp_server.py
	```

	### Docker Setup

	```bash
	docker build -t agentic-rag:latest .
	docker run -e GROQ_API_KEY=your_key -p 8000:8000 agentic-rag:latest
	```

	---

	## 🔌 REST API Reference

	| Endpoint | Method | Description |
	|----------|--------|-------------|
	| `/health` | GET | System health check |
	| `/upload` | POST | Upload and index a document |
	| `/query` | POST | Ask a question |
	| `/session/{id}` | DELETE | Clear session memory |
	| `/docs` | GET | Swagger UI |
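
A standard-library client for these endpoints might look like the sketch below. The endpoint paths and methods come from the table above and the base URL from the Quick Start command. The JSON field names `question` and `session_id` are assumptions, so check the Swagger UI at `/docs` for the real request schema before relying on them.

```python
import json
from urllib import parse, request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command in Quick Start


def build_query_request(question: str, session_id: str) -> request.Request:
    # Assumed payload shape; the actual field names are defined by the API.
    body = json.dumps({"question": question, "session_id": session_id}).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_clear_request(session_id: str) -> request.Request:
    # Quote the id so characters such as "/" cannot alter the path.
    return request.Request(
        f"{BASE_URL}/session/{parse.quote(session_id, safe='')}",
        method="DELETE",
    )


def send(req: request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_query_request("What does the report conclude?", "demo")
print(req.get_method(), req.full_url)  # POST http://localhost:8000/query
```

With the backend running, `send(req)` performs the call. Building the request separately from sending it keeps the example runnable without a live server.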

	---

	## 📁 Project Structure

	```
	agentic-corrective-rag/
	├── agent.py             # LangGraph corrective agent
	├── retriever.py         # Hybrid ChromaDB + BM25 retrieval
	├── ingestion.py         # Document parsing and indexing
	├── main.py              # FastAPI backend
	├── mcp_server.py        # MCP tool server (NEW)
	├── config.py            # Configuration constants
	├── requirements.txt
	├── Dockerfile
	├── .github/workflows/ci.yml
	├── ui/
	│   └── index.html
	└── tests/
	    ├── test_unit.py
	    └── test_integration.py
	```

	---

	## 📈 Performance Metrics

	| Metric | Value |
	|--------|-------|
	| Recall@3 (exact answer in docs) | 94% |
	| Hallucination detection rate | 94% |
	| Validation PASS rate | 97% |
	| Avg retries when needed | 1.2 |
	| End-to-end latency (no retries) | ~3s |
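
This README does not describe how Recall@3 was measured. One common reading, sometimes called hit rate at k, is the fraction of questions for which at least one answer-bearing chunk appears in the top k results. The sketch below computes that on toy data. The function name and the chunk ids are illustrative only and are not the project's evaluation set.

```python
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int = 3) -> float:
    """Fraction of queries with at least one relevant chunk id in the top k."""
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(retrieved, relevant)
        if gold.intersection(ranked[:k])
    )
    return hits / len(retrieved)


# Toy data: one ranked result list and one gold set per question.
retrieved = [
    ["c1", "c4", "c9"],
    ["c2", "c5", "c6"],
    ["c7", "c8", "c3"],
    ["c5", "c1", "c2"],
]
relevant = [{"c4"}, {"c9"}, {"c3"}, {"c2", "c8"}]
print(recall_at_k(retrieved, relevant, k=3))  # 0.75
```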

	---

	## 🤝 Contributing

	Ideas for enhancement:
	- [ ] Persistent vector DB (Pinecone/Weaviate)
	- [ ] Streaming responses with SSE
	- [ ] Multi-document support
	- [ ] Multimodal embeddings (images)
	- [ ] Citation highlighting in frontend

	---

	## 📜 License

	MIT License — Use freely for learning or commercial purposes.

	---

	## 📞 Contact

	Hitan K — AI Systems Engineer

	- 🔗 [LinkedIn](https://linkedin.com/in/hitan-k)
	- 🐙 [GitHub](https://github.com/Hitan547)
	- 🤗 [HuggingFace](https://huggingface.co/Hitan2004)

	---

	<div align="center">

	⭐ Found this helpful? Please star the repo! ⭐

	Built for production and learning.

	</div>