---
title: "DocMind: Grounded RAG Document Intelligence"
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit
---

# 🧠 DocMind: Grounded RAG Document Intelligence

A **production-grade** Retrieval-Augmented Generation (RAG) system that doesn't just retrieve and generate: it **verifies every claim** against the source documents using NLI-based (natural language inference) grounding.

## ✨ Key Features

| Feature | Description |
|---------|-------------|
| 📄 **Multi-format Ingestion** | PDF, DOCX, and TXT, split into 400-token chunks that respect sentence boundaries |
| 🔍 **Hybrid Retrieval** | BM25 (sparse) + BGE-M3 (dense), fused via Reciprocal Rank Fusion (RRF) |
| 🎯 **Attributed Generation** | Every sentence cites its source chunk; uncited claims are not allowed |
| 🛡️ **NLI Grounding Gate** | A DeBERTa cross-encoder verifies each claim against its cited evidence |
| 🚦 **Intent Router** | Sensitive queries are intercepted before they reach the LLM |
| 📊 **Multi-level Summaries** | Quick, Structured, and Key Points modes |
| 📑 **Multi-Document Mode** | Compare up to 3 documents with color-coded source tracking |
| 💬 **Chat History** | Persistent conversation with export support |

## 🏗️ Architecture

```
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│    Upload    │──▶│   Parse &    │──▶│   Chunk &    │
│   Document   │   │   Extract    │   │    Embed     │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                   ┌─────────────────────────▼───────────┐
                   │         Dual Index Storage          │
                   │  BM25 (in-memory) │ Qdrant (dense)  │
                   └─────────────────────────┬───────────┘
                                             │
┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
│     User     │──▶│    Intent    │──▶│    Hybrid    │
│    Query     │   │    Router    │   │  Retrieval   │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                   ┌──────────────┐   ┌──────▼───────┐
                   │  Grounding   │◀──│  Attributed  │
                   │  Gate (NLI)  │   │  Generation  │
                   └──────┬───────┘   └──────────────┘
                          │
                   ┌──────▼───────┐
                   │   Serve or   │
                   │    Refuse    │
                   └──────────────┘
```

## 🛠️ Tech Stack

| Layer | Tool | Cost |
|-------|------|------|
| LLM | Groq API (Llama 3.1 70B) | Free tier |
| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
| Sparse Retrieval | bm25s | Free |
| Vector DB | Qdrant (local / cloud) | Free |
| NLI Grounding | DeBERTa v3 cross-encoder | Free |
| UI | Streamlit | Free |
| Hosting | Hugging Face Spaces (Docker) | Free |

## 🚀 Quick Start

```bash
# 1. Clone
git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
cd docmind

# 2. Set up environment
cp .env.example .env
# Edit .env and set your GROQ_API_KEY

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run
streamlit run app.py
```

## ⚠️ Known Limitations

- **Free-tier rate limits**: Groq allows ~14,400 tokens/min, so heavy usage may be throttled
- **CPU inference**: BGE-M3 and DeBERTa run on CPU; the first query takes ~5 s while the models load
- **Memory**: The two models use ~3 GB of RAM combined, which fits within the 16 GB HF Spaces limit
- **No persistence**: The BM25 index lives in memory and is rebuilt on each document upload

## 📄 License

MIT