---
title: DocMind — Grounded RAG Document Intelligence
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit
---
🧠 DocMind — Grounded RAG Document Intelligence
A production-grade Retrieval-Augmented Generation system that doesn't just retrieve and generate — it verifies every claim against source documents using NLI-based grounding.
✨ Key Features
| Feature | Description |
|---|---|
| 📄 Multi-format Ingestion | PDF, DOCX, TXT — chunked at 400 tokens with sentence-boundary awareness |
| 🔍 Hybrid Retrieval | BM25 (sparse) + BGE-M3 (dense) fused via Reciprocal Rank Fusion |
| 🎯 Attributed Generation | Every sentence cites its source chunk — no uncited claims allowed |
| 🛡️ NLI Grounding Gate | DeBERTa cross-encoder verifies each claim against cited evidence |
| 🚦 Intent Router | Sensitive queries are intercepted before reaching the LLM |
| 📊 Multi-level Summaries | Quick, Structured, and Key Points extraction |
| 📑 Multi-Document Mode | Compare up to 3 documents with color-coded source tracking |
| 💬 Chat History | Persistent conversation with export support |
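
The ingestion row above mentions 400-token chunks with sentence-boundary awareness. A minimal sketch of that idea follows. It is an illustration, not DocMind's actual chunker: the `chunk_text` name, the regex sentence splitter, and counting whitespace-separated words as "tokens" are all assumptions standing in for whatever tokenizer the project really uses.

```python
import re


def chunk_text(text: str, max_tokens: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    Assumption: whitespace-separated words stand in for real tokens.
    A single sentence longer than max_tokens is kept whole as its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0

    for sentence in sentences:
        n_tokens = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and current_len + n_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_tokens

    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks only ever break between sentences, a citation that points at a chunk always points at complete sentences, which is what the grounding step needs as evidence.
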
🏗️ Architecture
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│    Upload    │──▶│   Parse &    │──▶│   Chunk &    │
│   Document   │   │   Extract    │   │    Embed     │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌────────────────────────▼───────────┐
                    │         Dual Index Storage         │
                    │ BM25 (in-memory) │ Qdrant (dense)  │
                    └────────────────────────┬───────────┘
                                             │
┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
│     User     │──▶│    Intent    │──▶│    Hybrid    │
│    Query     │   │    Router    │   │  Retrieval   │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                   ┌──────────────┐   ┌──────▼───────┐
                   │  Grounding   │◀──│  Attributed  │
                   │  Gate (NLI)  │   │  Generation  │
                   └──────┬───────┘   └──────────────┘
                          │
                   ┌──────▼───────┐
                   │   Serve or   │
                   │    Refuse    │
                   └──────────────┘
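
The final "Serve or Refuse" step in the diagram can be sketched as a small gate over (claim, cited evidence) pairs. This is a hedged illustration only: `grounding_gate`, `GateResult`, the 0.5 threshold, and the all-or-nothing refusal policy are assumptions, and `toy_entail_prob` is a substring-matching stand-in for the DeBERTa cross-encoder so the example runs without downloading a model.

```python
from typing import Callable, NamedTuple


class GateResult(NamedTuple):
    serve: bool             # True if the answer may be shown to the user
    unsupported: list[str]  # claims the scorer did not find entailed


def grounding_gate(
    claims: list[tuple[str, str]],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> GateResult:
    """Check each (claim, cited_evidence) pair with an entailment scorer.

    entail_prob(premise, hypothesis) is expected to return the probability
    that the premise entails the hypothesis.
    """
    unsupported = [
        claim
        for claim, evidence in claims
        if entail_prob(evidence, claim) < threshold
    ]
    # Assumed policy: one unsupported claim is enough to refuse the whole
    # answer, and an answer with no claims at all is refused too.
    return GateResult(serve=bool(claims) and not unsupported, unsupported=unsupported)


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (NOT a real NLI model): 1.0 on a substring match."""
    return 1.0 if hypothesis.lower().rstrip(".") in premise.lower() else 0.0


answer = [
    ("Paris is the capital of France.",
     "It is well known that Paris is the capital of France."),
    ("Paris has 10 million residents.",
     "Paris is the capital of France."),
]
result = grounding_gate(answer, toy_entail_prob)
```

In a real deployment the scorer would wrap the NLI model's entailment probability. Passing it in as a callable keeps the gate logic testable on its own.
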
🛠️ Tech Stack
| Layer | Tool | Cost |
|---|---|---|
| LLM | Groq API (Llama 3.1 70B) | Free tier |
| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
| Sparse Retrieval | bm25s | Free |
| Vector DB | Qdrant (local / cloud) | Free |
| NLI Grounding | DeBERTa v3 cross-encoder | Free |
| UI | Streamlit | Free |
| Hosting | Hugging Face Spaces (Docker) | Free |
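
The sparse and dense layers in the table each return their own ranked list of chunks. One way to merge them is Reciprocal Rank Fusion, sketched below. The function name, the chunk IDs, and the constant `k = 60` (a commonly used default for RRF) are illustrative assumptions and not necessarily what DocMind ships.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused score means better.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers for the same query.
bm25_hits = ["c1", "c2", "c3"]
dense_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF uses only rank positions, so the BM25 scores and the dense similarity scores never need to be normalized onto a common scale.
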
🚀 Quick Start
# 1. Clone
git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
cd docmind
# 2. Set up environment
cp .env.example .env
# Edit .env with your GROQ_API_KEY
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run
streamlit run app.py
⚠️ Known Limitations
- Free tier rate limits: Groq allows ~14,400 tokens/min — heavy usage may hit throttling
- CPU inference: BGE-M3 and DeBERTa run on CPU — first query takes ~5s for model loading
- Memory: Both models consume ~3GB RAM combined — fits within HF Spaces 16GB limit
- No persistence: In-memory BM25 index is rebuilt on each document upload
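
For the rate-limit item above, a client-side budget can delay requests before the API throttles them. The sketch below is one possible approach under stated assumptions: `TokenBudget` is a hypothetical helper, the 14,400 figure is taken from the list above, and treating the limit as a sliding 60-second window is a guess about how the provider enforces it. No Groq API calls are made.

```python
import time
from collections import deque


class TokenBudget:
    """Sliding-window budget of tokens per window (hypothetical helper)."""

    def __init__(self, tokens_per_minute: int = 14_400,
                 window_s: float = 60.0, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.window_s = window_s
        self.clock = clock      # injectable so the logic can be tested
        self.events = deque()   # (timestamp, tokens) for recent requests
        self.used = 0           # tokens spent inside the current window

    def _expire(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_s:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait until a request of `tokens` fits (0.0 if it fits now)."""
        if tokens > self.limit:
            raise ValueError("request is larger than the whole budget")
        now = self.clock()
        self._expire(now)
        needed = self.used + tokens - self.limit
        if needed <= 0:
            return 0.0
        freed = 0
        for timestamp, spent in self.events:
            freed += spent
            if freed >= needed:
                # Enough budget is free once this request leaves the window.
                return timestamp + self.window_s - now
        return self.window_s  # not reached, because needed <= self.used

    def record(self, tokens: int) -> None:
        """Register tokens spent by a request that was just sent."""
        now = self.clock()
        self._expire(now)
        self.events.append((now, tokens))
        self.used += tokens
```

A caller would check `wait_time(n)`, sleep for that long, send the request, and then call `record(n)` with the tokens actually used.
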
📄 License
MIT