---
title: DocMind — Grounded RAG Document Intelligence
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit
---

# 🧠 DocMind — Grounded RAG Document Intelligence

A **production-grade** Retrieval-Augmented Generation system that doesn't just retrieve and generate — it **verifies every claim** against source documents using NLI-based grounding.

## ✨ Key Features

| Feature | Description |
|---------|-------------|
| 📄 **Multi-format Ingestion** | PDF, DOCX, TXT — chunked at 400 tokens with sentence-boundary awareness |
| 🔍 **Hybrid Retrieval** | BM25 (sparse) + BGE-M3 (dense) fused via Reciprocal Rank Fusion |
| 🎯 **Attributed Generation** | Every sentence cites its source chunk — no uncited claims allowed |
| 🛡️ **NLI Grounding Gate** | DeBERTa cross-encoder verifies each claim against cited evidence |
| 🚦 **Intent Router** | Sensitive queries are intercepted before reaching the LLM |
| 📊 **Multi-level Summaries** | Quick, Structured, and Key Points extraction |
| 📑 **Multi-Document Mode** | Compare up to 3 documents with color-coded source tracking |
| 💬 **Chat History** | Persistent conversation with export support |
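
The ingestion row above mentions 400-token chunks with sentence-boundary awareness. A minimal sketch of that idea follows, assuming a naive regex sentence splitter and whitespace-separated words as a stand-in for real tokenizer tokens; the function name `chunk_sentences` is hypothetical and not taken from the DocMind code.

```python
import re
from typing import List


def chunk_sentences(text: str, max_tokens: int = 400) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Assumption: a "token" is a whitespace-separated word. A real pipeline
    would count tokens with the embedding model's tokenizer instead.
    """
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single sentence longer than max_tokens becomes its own
        # oversized chunk; sentences are never cut in the middle.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    demo = "One two three. Four five six. Seven eight nine."
    for chunk in chunk_sentences(demo, max_tokens=6):
        print(chunk)
```

Packing whole sentences keeps each chunk readable on its own, which matters when chunks are later shown as cited evidence.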

## 🏗️ Architecture

```
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Upload     │──▶│   Parse &    │──▶│   Chunk &    │
│   Document   │   │   Extract    │   │   Embed      │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌────────────────────────▼───────────┐
                    │         Dual Index Storage         │
                    │  BM25 (in-memory) │ Qdrant (dense) │
                    └────────────────────────┬───────────┘
                                             │
┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
│   User       │──▶│   Intent     │──▶│   Hybrid     │
│   Query      │   │   Router     │   │   Retrieval  │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌──────────────┐   ┌──────▼───────┐
                    │   Grounding  │◀──│   Attributed │
                    │   Gate (NLI) │   │   Generation │
                    └──────┬───────┘   └──────────────┘
                           │
                    ┌──────▼───────┐
                    │   Serve or   │
                    │   Refuse     │
                    └──────────────┘
```
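
The last two stages of the diagram (attributed generation, then the grounding gate that serves or refuses) can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the `[n]` citation format, the `0.5` threshold, and the names `grounding_gate` and `toy_overlap_score` are all hypothetical. The scorer is passed in as a plain function so that a real NLI cross-encoder could replace the toy word-overlap stand-in used here.

```python
import re
from typing import Callable, Dict, List, Tuple

CITATION = re.compile(r"\[(\d+)\]")  # assumed citation format, e.g. "[3]"


def grounding_gate(
    answer: str,
    chunks: Dict[int, str],
    entail_score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from DocMind
) -> Tuple[bool, List[str]]:
    """Return (serve, failed_sentences) for a cited answer."""
    # Assumes each citation marker sits before the sentence's final punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    failed: List[str] = []
    for sentence in sentences:
        cited = [int(m) for m in CITATION.findall(sentence)]
        claim = CITATION.sub("", sentence).strip()
        evidence = [chunks[i] for i in cited if i in chunks]
        # Uncited sentence or unknown chunk id: nothing to verify against.
        if not evidence:
            failed.append(sentence)
            continue
        # The claim passes if at least one cited chunk supports it.
        if max(entail_score(e, claim) for e in evidence) < threshold:
            failed.append(sentence)
    return (bool(sentences) and not failed, failed)


def toy_overlap_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: share of hypothesis words found in the premise."""
    words = re.findall(r"\w+", hypothesis.lower())
    premise_words = set(re.findall(r"\w+", premise.lower()))
    return sum(w in premise_words for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    store = {1: "The contract ends on 31 March 2025."}
    answer = "The contract ends on 31 March 2025 [1]. Penalties apply after that."
    print(grounding_gate(answer, store, toy_overlap_score))
```

Word overlap is not entailment; it only keeps the example dependency-free. The gate logic (refuse on any uncited or unsupported sentence) is the part being illustrated.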

## 🛠️ Tech Stack

| Layer | Tool | Cost |
|-------|------|------|
| LLM | Groq API (Llama 3.1 70B) | Free tier |
| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
| Sparse Retrieval | bm25s | Free |
| Vector DB | Qdrant (local / cloud) | Free |
| NLI Grounding | DeBERTa v3 cross-encoder | Free |
| UI | Streamlit | Free |
| Hosting | Hugging Face Spaces (Docker) | Free |
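
The sparse and dense layers in the table each return their own ranked list of chunks, and the feature list says the two are merged with Reciprocal Rank Fusion. A minimal sketch of RRF follows. The constant `k = 60` is the value commonly used in the literature, and whether DocMind uses it is an assumption; the function name and chunk ids are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; items missing from a list contribute nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (stable sort).
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    bm25_hits = ["c2", "c1", "c3"]   # hypothetical sparse ranking
    dense_hits = ["c1", "c4", "c2"]  # hypothetical dense ranking
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

RRF uses only rank positions, so the BM25 and cosine-similarity scores never need to be put on a common scale.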

## 🚀 Quick Start

```bash
# 1. Clone
git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
cd docmind

# 2. Set up environment
cp .env.example .env
# Edit .env with your GROQ_API_KEY

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run
streamlit run app.py
```

## ⚠️ Known Limitations

- **Free-tier rate limits**: Groq allows ~14,400 tokens/min; limits differ per model, so heavy usage may be throttled
- **CPU inference**: BGE-M3 and DeBERTa run on CPU, so the first query takes ~5 s while the models load
- **Memory**: The two models use ~3 GB of RAM combined, which fits within the 16 GB limit of HF Spaces
- **No persistence**: In-memory BM25 index is rebuilt on each document upload
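
One common way to live with the rate limits noted in the first bullet is to retry throttled calls with exponential backoff. The sketch below is generic and makes no claim about how DocMind or the Groq client handles throttling: `call_with_backoff`, `is_rate_limited`, and the delay values are hypothetical, and the caller decides which exceptions count as rate-limit errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_rate_limited: Callable[[Exception], bool],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff while it is rate-limited."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not throttling, or when out of retries.
            if not is_rate_limited(exc) or attempt == max_retries:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_llm_call() -> str:
        # Simulated API call: throttled twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("429 rate limit")
        return "ok"

    result = call_with_backoff(
        flaky_llm_call,
        is_rate_limited=lambda e: "429" in str(e),
        sleep=lambda s: None,  # skip real waiting in the demo
    )
    print(result, "after", attempts["n"], "attempts")
```

Injecting `sleep` keeps the helper testable without real waiting; in production the default `time.sleep` applies.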

## 📄 License

MIT