---
title: DocMind Grounded RAG Document Intelligence
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit
---
# 🧠 DocMind — Grounded RAG Document Intelligence
A **production-grade** Retrieval-Augmented Generation system that doesn't just retrieve and generate — it **verifies every claim** against source documents using NLI-based grounding.
## ✨ Key Features
| Feature | Description |
|---------|-------------|
| 📄 **Multi-format Ingestion** | PDF, DOCX, TXT — chunked at 400 tokens with sentence-boundary awareness |
| 🔍 **Hybrid Retrieval** | BM25 (sparse) + BGE-M3 (dense) fused via Reciprocal Rank Fusion |
| 🎯 **Attributed Generation** | Every sentence cites its source chunk — no uncited claims allowed |
| 🛡️ **NLI Grounding Gate** | DeBERTa cross-encoder verifies each claim against cited evidence |
| 🚦 **Intent Router** | Sensitive queries are intercepted before reaching the LLM |
| 📊 **Multi-level Summaries** | Quick, Structured, and Key Points extraction |
| 📑 **Multi-Document Mode** | Compare up to 3 documents with color-coded source tracking |
| 💬 **Chat History** | Persistent conversation with export support |
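
The sentence-aware chunking mentioned in the ingestion row can be sketched roughly as follows. This is a minimal illustration, not DocMind's actual chunker: the function names are hypothetical, the sentence splitter is a naive regex, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_text(text: str, max_tokens: int = 400) -> list[str]:
    """Pack whole sentences into chunks of at most max_tokens words."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())  # word count as a rough token estimate
        # Close the current chunk before it would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single sentence longer than max_tokens becomes its own chunk.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks only ever break between sentences, a cited chunk always contains complete statements, which is what a per-claim verification step needs.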
## 🏗️ Architecture
```
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│  Upload      │──▶│  Parse &     │──▶│  Chunk &     │
│  Document    │   │  Extract     │   │  Embed       │
└──────────────┘   └──────────────┘   └──────┬───────┘
                    ┌────────────────────────▼───────────┐
                    │         Dual Index Storage         │
                    │  BM25 (in-memory) │ Qdrant (dense) │
                    └────────────────────────┬───────────┘
┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
│  User        │──▶│  Intent      │──▶│  Hybrid      │
│  Query       │   │  Router      │   │  Retrieval   │
└──────────────┘   └──────────────┘   └──────┬───────┘
                   ┌──────────────┐   ┌──────▼───────┐
                   │  Grounding   │◀──│  Attributed  │
                   │  Gate (NLI)  │   │  Generation  │
                   └──────┬───────┘   └──────────────┘
                   ┌──────▼───────┐
                   │  Serve or    │
                   │  Refuse      │
                   └──────────────┘
```
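
The last two stages of the diagram (grounding gate, then serve or refuse) could look something like the sketch below. Several things here are assumptions for illustration: the `[n]` citation syntax, the `grounding_gate` name, the 0.5 threshold, and the rule that one supporting chunk is enough. The `entail_score` callable is a hypothetical stand-in for the NLI cross-encoder, which takes a premise and a hypothesis and returns an entailment probability.

```python
import re
from typing import Callable

# Assumed citation syntax: a bracketed chunk number such as "[2]".
CITATION = re.compile(r"\s*\[(\d+)\]")


def grounding_gate(
    sentences: list[str],
    chunks: dict[int, str],
    entail_score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> tuple[bool, list[str]]:
    """Return (serve, rejected_sentences) for a drafted answer."""
    rejected: list[str] = []
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        claim = CITATION.sub("", sentence).strip()
        # A sentence passes if at least one cited chunk entails it.
        # Uncited sentences have nothing to check against, so they fail.
        supported = any(
            n in chunks and entail_score(chunks[n], claim) >= threshold
            for n in cited
        )
        if not supported:
            rejected.append(sentence)
    return (not rejected, rejected)
```

Passing the scorer in as a callable keeps the gate logic testable without loading a model; in a real deployment it would wrap the cross-encoder's entailment output.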
## 🛠️ Tech Stack
| Layer | Tool | Cost |
|-------|------|------|
| LLM | Groq API (Llama 3.1 70B) | Free tier |
| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
| Sparse Retrieval | bm25s | Free |
| Vector DB | Qdrant (local / cloud) | Free |
| NLI Grounding | DeBERTa v3 cross-encoder | Free |
| UI | Streamlit | Free |
| Hosting | Hugging Face Spaces (Docker) | Free |
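
Reciprocal Rank Fusion, which combines the sparse (bm25s) and dense (Qdrant) result lists, scores each chunk by summing `1 / (k + rank)` over every list it appears in. The sketch below assumes each retriever returns an ordered list of chunk IDs; the function name is illustrative, and `k = 60` is the commonly used default rather than a value this README specifies.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


sparse_hits = ["c1", "c2", "c3"]  # hypothetical BM25 ranking
dense_hits = ["c3", "c1", "c4"]   # hypothetical dense ranking
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

RRF uses only ranks, never raw scores, so BM25 scores and embedding similarities do not need to be normalised onto a common scale before fusing.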
## 🚀 Quick Start
```bash
# 1. Clone
git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
cd docmind

# 2. Set up environment
cp .env.example .env
# Edit .env with your GROQ_API_KEY

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run
streamlit run app.py
```
## ⚠️ Known Limitations
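- **Free tier rate limits**: Groq's free tier is throttled (~14,400 tokens/min is the working figure here; actual limits vary by model and change over time), so heavy usage may hit rate-limit errors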
- **CPU inference**: BGE-M3 and DeBERTa run on CPU — first query takes ~5s for model loading
- **Memory**: Both models consume ~3GB RAM combined — fits within HF Spaces 16GB limit
- **No persistence**: In-memory BM25 index is rebuilt on each document upload
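
One way to stay under a tokens-per-minute limit is to track spend on the client side before each LLM call. The sketch below is a sliding-window budget under stated assumptions: the `TokenBudget` class is hypothetical and not part of DocMind, token counts are supplied by the caller, and the limit should be set to whatever applies to your Groq plan.

```python
import time
from collections import deque
from typing import Callable


class TokenBudget:
    """Sliding 60-second window over tokens already spent."""

    def __init__(self, tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = tokens_per_minute
        self.clock = clock  # injectable so the window can be tested
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)
        self.used = 0

    def _expire(self, now: float) -> None:
        # Drop spends that are at least 60 seconds old.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True if it fits, else return False."""
        now = self.clock()
        self._expire(now)
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


budget = TokenBudget(tokens_per_minute=14_400)  # figure quoted in the list above
```

A caller would check `budget.try_spend(estimated_tokens)` before each request and wait or queue when it returns `False`, instead of sending the request and handling a throttling error afterwards.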
## 📄 License
MIT