title: DocMind - Grounded RAG Document Intelligence
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit

🧠 DocMind — Grounded RAG Document Intelligence

A production-grade Retrieval-Augmented Generation system that doesn't just retrieve and generate — it verifies every claim against source documents using NLI-based grounding.

✨ Key Features

| Feature | Description |
|---|---|
| 📄 Multi-format Ingestion | PDF, DOCX, TXT; chunked at 400 tokens with sentence-boundary awareness |
| 🔍 Hybrid Retrieval | BM25 (sparse) + BGE-M3 (dense) fused via Reciprocal Rank Fusion |
| 🎯 Attributed Generation | Every sentence cites its source chunk; no uncited claims allowed |
| 🛡️ NLI Grounding Gate | DeBERTa cross-encoder verifies each claim against cited evidence |
| 🚦 Intent Router | Sensitive queries are intercepted before reaching the LLM |
| 📊 Multi-level Summaries | Quick, Structured, and Key Points extraction |
| 📑 Multi-Document Mode | Compare up to 3 documents with color-coded source tracking |
| 💬 Chat History | Persistent conversation with export support |
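
To make the fusion step of Hybrid Retrieval concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is an illustration under assumptions, not DocMind's actual code: the function name `reciprocal_rank_fusion`, the sample chunk IDs, and the smoothing constant `k=60` (a commonly used default) are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not DocMind's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


bm25_hits = ["c3", "c1", "c7"]   # hypothetical sparse (BM25) ranking
dense_hits = ["c1", "c4", "c3"]  # hypothetical dense (BGE-M3) ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

With these sample rankings, `c1` comes out first because it sits near the top of both lists, while chunks found by only one retriever fall behind. RRF only needs ranks, so the BM25 and dense scores never have to be put on a common scale.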

🏗️ Architecture

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Upload     │──▶│   Parse &    │──▶│   Chunk &    │
│   Document   │   │   Extract    │   │   Embed      │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌────────────────────────▼──────────────┐
                    │          Dual Index Storage           │
                    │   BM25 (in-memory)  │  Qdrant (dense) │
                    └────────────────────────┬──────────────┘
                                             │
┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
│   User       │──▶│   Intent     │──▶│   Hybrid     │
│   Query      │   │   Router     │   │   Retrieval  │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌──────────────┐   ┌──────▼───────┐
                    │   Grounding  │◀──│   Attributed │
                    │   Gate (NLI) │   │   Generation │
                    └──────┬───────┘   └──────────────┘
                           │
                    ┌──────▼───────┐
                    │   Serve or   │
                    │   Refuse     │
                    └──────────────┘
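
The last two stages of the diagram (Grounding Gate, then Serve or Refuse) can be pictured as a small gate function. The sketch below is an assumption about how such a gate might look, not DocMind's implementation: the `[c1]`-style citation markers, the `0.5` threshold, and the names `grounding_gate` and `entail_prob` are hypothetical. `overlap_stub` is a toy word-overlap scorer used only so the example runs without a model; a real gate would call the DeBERTa cross-encoder there.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: citations look like "[c1]"; DocMind's real marker format may differ.
CITATION = re.compile(r"\[(\w+)\]")


def grounding_gate(
    sentences: List[str],
    chunks: Dict[str, str],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Return (serve, failures); serve is True only if every sentence passes."""
    failures = []
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        if not cited:
            failures.append((sentence, "no citation"))
            continue
        claim = CITATION.sub("", sentence).strip()
        evidence = [chunks[c] for c in cited if c in chunks]
        # A claim passes if at least one cited chunk entails it.
        if not any(entail_prob(ev, claim) >= threshold for ev in evidence):
            failures.append((sentence, "not entailed by cited evidence"))
    return (not failures, failures)


def overlap_stub(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI model: share of hypothesis words found in the premise."""
    p = set(re.findall(r"\w+", premise.lower()))
    h = set(re.findall(r"\w+", hypothesis.lower()))
    return len(p & h) / len(h) if h else 0.0


chunks = {"c1": "Documents are chunked at 400 tokens with sentence-boundary awareness."}
draft = [
    "Documents are chunked at 400 tokens [c1].",
    "The index is stored on disk.",
    "Qdrant requires a GPU [c1].",
]
serve, failures = grounding_gate(draft, chunks, overlap_stub)
```

Here the draft is refused: the second sentence carries no citation and the third is not supported by the chunk it cites. Checking each sentence only against its own cited chunks keeps the NLI input short and makes every failure traceable to a specific claim.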

🛠️ Tech Stack

| Layer | Tool | Cost |
|---|---|---|
| LLM | Groq API (Llama 3.1 70B) | Free tier |
| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
| Sparse Retrieval | bm25s | Free |
| Vector DB | Qdrant (local / cloud) | Free |
| NLI Grounding | DeBERTa v3 cross-encoder | Free |
| UI | Streamlit | Free |
| Hosting | Hugging Face Spaces (Docker) | Free |

🚀 Quick Start

# 1. Clone
git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
cd docmind

# 2. Set up environment
cp .env.example .env
# Edit .env with your GROQ_API_KEY

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run
streamlit run app.py

⚠️ Known Limitations

  • Free tier rate limits: Groq allows ~14,400 tokens/min, so heavy usage may be throttled
  • CPU inference: BGE-M3 and DeBERTa run on CPU; the first query takes ~5s because the models must load
  • Memory: the two models use ~3GB of RAM combined, which fits within the HF Spaces 16GB limit
  • No persistence: the in-memory BM25 index is rebuilt on each document upload
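
One way to stay under a per-minute token limit like the one noted above is to track recent usage in a sliding window and wait before any request that would exceed it. The sketch below is an assumption about how a client could do this and is not part of DocMind: the class name `TokenBudget`, its methods, and the injectable clock are hypothetical, and the limit should be set to whatever your Groq plan currently allows.

```python
import time
from collections import deque


class TokenBudget:
    """Sliding-window budget: at most `limit` tokens per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the class can be tested without sleeping
        self.events = deque()   # (timestamp, tokens) pairs, oldest first
        self.used = 0

    def _expire(self, now):
        # Drop usage records that have left the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens):
        """Seconds to wait until `tokens` more fit in the window (0.0 if they fit now)."""
        if tokens > self.limit:
            raise ValueError("request exceeds the whole per-window limit")
        now = self.clock()
        self._expire(now)
        need = tokens - (self.limit - self.used)
        if need <= 0:
            return 0.0
        # Walk the oldest records until enough tokens will have been released.
        for timestamp, spent in self.events:
            need -= spent
            if need <= 0:
                return timestamp + self.window - now
        return self.window

    def record(self, tokens):
        """Register tokens spent by a request that has just been sent."""
        self.events.append((self.clock(), tokens))
        self.used += tokens


# Per-minute figure quoted above; verify it against your own plan.
budget = TokenBudget(limit=14_400)
```

Before each LLM call, estimate the request size `n`, call `time.sleep(budget.wait_time(n))`, and then `budget.record(n)` once the request has been sent.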

📄 License

MIT