---
title: DocMind — Grounded RAG Document Intelligence
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit
---

# 🧠 DocMind — Grounded RAG Document Intelligence

A **production-grade** Retrieval-Augmented Generation system that goes beyond retrieve-and-generate: it **verifies every claim** against the source documents using NLI-based grounding.

## ✨ Key Features

| Feature | Description |
|---------|-------------|
| 📄 **Multi-format Ingestion** | PDF, DOCX, and TXT, chunked at 400 tokens with sentence-boundary awareness |
| 🔍 **Hybrid Retrieval** | BM25 (sparse) + BGE-M3 (dense) fused via Reciprocal Rank Fusion |
| 🎯 **Attributed Generation** | Every sentence cites its source chunk; uncited claims are not allowed |
| 🛡️ **NLI Grounding Gate** | DeBERTa cross-encoder verifies each claim against cited evidence |
| 🚦 **Intent Router** | Sensitive queries are intercepted before reaching the LLM |
| 📊 **Multi-level Summaries** | Quick, Structured, and Key Points extraction |
| 📑 **Multi-Document Mode** | Compare up to 3 documents with color-coded source tracking |
| 💬 **Chat History** | Persistent conversation with export support |

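The sentence-boundary-aware chunking from the ingestion row can be sketched roughly as follows. This is a minimal illustration, not DocMind's actual chunker: it assumes a "token" is a whitespace-separated word and splits sentences with a naive regex, and `chunk_text` and `max_tokens` are hypothetical names.

```python
import re


def chunk_text(text: str, max_tokens: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens words.

    Assumption: tokens are approximated by whitespace-separated words. A real
    pipeline would count tokens with the embedding model's tokenizer.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single sentence longer than max_tokens becomes its own chunk.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks are only closed between sentences, no sentence is cut in half, at the cost of chunks that are usually somewhat shorter than the limit.
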
## 🏗️ Architecture

```
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│  Upload      │──▶│  Parse &     │──▶│  Chunk &     │
│  Document    │   │  Extract     │   │  Embed       │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌────────────────────────▼───────────┐
                    │         Dual Index Storage         │
                    │ BM25 (in-memory) │ Qdrant (dense)  │
                    └────────────────────────┬───────────┘
                                             │
┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
│  User        │──▶│  Intent      │──▶│  Hybrid      │
│  Query       │   │  Router      │   │  Retrieval   │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                   ┌──────────────┐   ┌──────▼───────┐
                   │  Grounding   │◀──│  Attributed  │
                   │  Gate (NLI)  │   │  Generation  │
                   └──────┬───────┘   └──────────────┘
                          │
                   ┌──────▼───────┐
                   │  Serve or    │
                   │  Refuse      │
                   └──────────────┘
```

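The last two stages of the diagram (attributed generation, then the grounding gate that decides between serving and refusing) could look roughly like this. It is a sketch under stated assumptions, not the project's implementation: the `[n]` citation format, the `0.5` threshold, and the function name `grounding_gate` are all hypothetical, and `entail_prob` stands in for the DeBERTa cross-encoder so the example runs without model downloads.

```python
import re
from typing import Callable

# Assumed citation format: sentences carry markers such as "[2]".
CITATION = re.compile(r"\[(\d+)\]")


def grounding_gate(
    answer_sentences: list[str],
    chunks: dict[int, str],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> tuple[bool, list[str]]:
    """Return (serve, failures).

    serve is True only if every sentence cites at least one known chunk and
    is entailed by at least one of the chunks it cites.
    """
    failures: list[str] = []
    for sentence in answer_sentences:
        cited = [int(i) for i in CITATION.findall(sentence)]
        claim = CITATION.sub("", sentence).strip()
        evidence = [chunks[i] for i in cited if i in chunks]
        # Uncited claims, or citations to unknown chunks, fail outright.
        if not evidence:
            failures.append(sentence)
            continue
        # entail_prob(premise, hypothesis) is a placeholder for the NLI model.
        if max(entail_prob(e, claim) for e in evidence) < threshold:
            failures.append(sentence)
    return (not failures, failures)
```

Passing the scorer in as a callable keeps the gate logic testable on its own; in a real deployment it would wrap the cross-encoder's entailment probability.
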
## 🛠️ Tech Stack

| Layer | Tool | Cost |
|-------|------|------|
| LLM | Groq API (Llama 3.1 70B) | Free tier |
| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
| Sparse Retrieval | bm25s | Free |
| Vector DB | Qdrant (local / cloud) | Free |
| NLI Grounding | DeBERTa v3 cross-encoder | Free |
| UI | Streamlit | Free |
| Hosting | Hugging Face Spaces (Docker) | Free |

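The sparse retriever (bm25s) and the dense retriever (BGE-M3 vectors in Qdrant) each return their own ranking, and Key Features says the two are fused via Reciprocal Rank Fusion. A minimal sketch of that fusion step is shown below. The constant `k = 60` is the value commonly used for RRF and is an assumption here, as are the function name and the chunk IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs: score(d) = sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["c3", "c1", "c7"]   # hypothetical sparse results, best first
dense_hits = ["c1", "c4", "c3"]  # hypothetical dense results, best first
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF uses only rank positions, so the BM25 scores and the embedding similarities never need to be normalized onto a common scale.
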
## 🚀 Quick Start

```bash
# 1. Clone
git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
cd docmind

# 2. Set up environment
cp .env.example .env
# Edit .env with your GROQ_API_KEY

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run
streamlit run app.py
```

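Step 2 only works if `GROQ_API_KEY` actually reaches the app. The sketch below shows one way to fail fast when it is missing. It is an illustration only: it assumes `.env` contains simple `KEY=VALUE` lines and parses them with the standard library, whereas the app itself may load the file differently; `load_env_file` and `require_groq_key` are hypothetical helpers.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_groq_key(path: str = ".env") -> str:
    """Return GROQ_API_KEY from the environment or the .env file, else raise."""
    key = os.environ.get("GROQ_API_KEY") or load_env_file(path).get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env (see step 2).")
    return key
```
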
## ⚠️ Known Limitations

- **Free-tier rate limits**: Groq allows ~14,400 tokens/min, so heavy usage may be throttled (limits vary by model, so check your account's current values)
- **CPU inference**: BGE-M3 and DeBERTa run on CPU; the first query takes ~5 s while the models load
- **Memory**: The two models consume ~3 GB of RAM combined, which fits within the 16 GB limit on HF Spaces
- **No persistence**: The in-memory BM25 index is rebuilt on each document upload

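One common way to live with the rate limits above is to retry throttled calls with exponential backoff. The following is a hedged sketch rather than code from this project: `RateLimitError` is a placeholder for whatever exception the LLM client raises on an HTTP 429, and `with_backoff` and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for the error the LLM client raises when throttled."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on RateLimitError, doubling the wait (plus jitter) each time."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            # Give up once the retry budget is spent.
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the wait can be stubbed out in tests; in normal use the default `time.sleep` applies.
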
## 📄 License

MIT