metadata

title: DocMind — Grounded RAG Document Intelligence
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit

🧠 DocMind — Grounded RAG Document Intelligence

A production-grade Retrieval-Augmented Generation (RAG) system that goes beyond retrieve-and-generate: it verifies every generated claim against the source documents using NLI-based grounding.

✨ Key Features

| Feature | Description |
|---|---|
| 📄 Multi-format Ingestion | PDF, DOCX, and TXT, chunked at 400 tokens with sentence-boundary awareness |
| 🔍 Hybrid Retrieval | BM25 (sparse) + BGE-M3 (dense), fused via Reciprocal Rank Fusion |
| 🎯 Attributed Generation | Every sentence cites its source chunk; uncited claims are not allowed |
| 🛡️ NLI Grounding Gate | DeBERTa cross-encoder verifies each claim against its cited evidence |
| 🚦 Intent Router | Sensitive queries are intercepted before reaching the LLM |
| 📊 Multi-level Summaries | Quick, Structured, and Key Points extraction |
| 📑 Multi-Document Mode | Compare up to 3 documents with color-coded source tracking |
| 💬 Chat History | Persistent conversation with export support |
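
To make the Hybrid Retrieval row concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. It is an illustration, not DocMind's actual code: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. The value of k is an assumption; 60 is a
    commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the sparse (BM25) and dense (BGE-M3) retrievers.
bm25_hits = ["c3", "c1", "c7"]
dense_hits = ["c1", "c4", "c3"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because RRF uses only ranks, the BM25 and dense scores never need to be normalized onto a common scale; chunks that appear near the top of both lists rise to the top of the fused list.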

🏗️ Architecture

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Upload     │──▶│   Parse &    │──▶│   Chunk &    │
│   Document   │   │   Extract    │   │   Embed      │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌────────────────────────▼───────────┐
                    │         Dual Index Storage          │
                    │   BM25 (in-memory)  │  Qdrant (dense) │
                    └────────────────────────┬───────────┘
                                             │
┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
│   User       │──▶│   Intent     │──▶│   Hybrid     │
│   Query      │   │   Router     │   │   Retrieval  │
└──────────────┘   └──────────────┘   └──────┬───────┘
                                             │
                    ┌──────────────┐   ┌──────▼───────┐
                    │   Grounding  │◀──│   Attributed │
                    │   Gate (NLI) │   │   Generation │
                    └──────┬───────┘   └──────────────┘
                           │
                    ┌──────▼───────┐
                    │   Serve or   │
                    │   Refuse     │
                    └──────────────┘
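
The last two stages of the diagram (Grounding Gate, then Serve or Refuse) can be sketched as below. This is a simplified illustration under stated assumptions: `Claim`, `grounding_gate`, and the `threshold` value are hypothetical names and settings, and `toy_entail_prob` is a word-overlap stand-in for the DeBERTa cross-encoder, not a real NLI model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Claim:
    text: str                # one generated sentence
    chunk_id: Optional[str]  # ID of the chunk it cites, or None if uncited


def grounding_gate(
    claims: List[Claim],
    chunks: Dict[str, str],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off; the real value is not documented here
) -> Tuple[bool, List[str]]:
    """Return (serve, reasons). The answer is served only if every claim
    cites a known chunk and that chunk entails the claim."""
    reasons = []
    for claim in claims:
        evidence = chunks.get(claim.chunk_id) if claim.chunk_id else None
        if evidence is None:
            reasons.append(f"uncited or unknown source: {claim.text!r}")
        elif entail_prob(evidence, claim.text) < threshold:
            reasons.append(f"not supported by {claim.chunk_id}: {claim.text!r}")
    return (not reasons, reasons)


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    # Stand-in scorer: fraction of hypothesis words that occur in the premise.
    # A real gate would call an NLI cross-encoder here instead.
    premise_words = set(premise.lower().split())
    words = hypothesis.lower().split()
    return sum(w in premise_words for w in words) / max(len(words), 1)


chunks = {"c1": "the warranty lasts two years"}
supported = [Claim("the warranty lasts two years", "c1")]
unsupported = [Claim("the warranty lasts ten years", "c1")]
uncited = [Claim("the warranty is transferable", None)]

print(grounding_gate(supported, chunks, toy_entail_prob, threshold=0.9))
print(grounding_gate(unsupported, chunks, toy_entail_prob, threshold=0.9))
print(grounding_gate(uncited, chunks, toy_entail_prob, threshold=0.9))
```

Passing the scorer in as a function keeps the gate logic separate from the model, so the stand-in can be swapped for a real entailment model without changing the serve-or-refuse decision.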

🛠️ Tech Stack

| Layer | Tool | Cost |
|---|---|---|
| LLM | Groq API (Llama 3.1 70B) | Free tier |
| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
| Sparse Retrieval | bm25s | Free |
| Vector DB | Qdrant (local / cloud) | Free |
| NLI Grounding | DeBERTa v3 cross-encoder | Free |
| UI | Streamlit | Free |
| Hosting | Hugging Face Spaces (Docker) | Free |

🚀 Quick Start

# 1. Clone
git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
cd docmind

# 2. Set up environment
cp .env.example .env
# Edit .env with your GROQ_API_KEY

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run
streamlit run app.py
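
If the app fails at startup, a missing `GROQ_API_KEY` is a likely cause. The helper below is a small, hypothetical sketch of a fail-fast check (the name `require_env` is an assumption, not part of DocMind). Note that `os.environ` does not read `.env` by itself; how the app loads that file (for example with a dotenv loader) is not stated here.

```python
import os


def require_env(name, environ=None):
    """Return a non-empty environment variable or raise a clear error.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it before running the app"
        )
    return value


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"GROQ_API_KEY": " gsk_example "}  # placeholder value, not a real key
api_key = require_env("GROQ_API_KEY", demo_env)
print(api_key)
```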

⚠️ Known Limitations

- Free tier rate limits: Groq allows ~14,400 tokens/min, so heavy usage may be throttled
- CPU inference: BGE-M3 and DeBERTa run on CPU, so the first query takes ~5 s while the models load
- Memory: Both models consume ~3 GB RAM combined, which fits within the 16 GB limit on HF Spaces
- No persistence: The in-memory BM25 index is rebuilt on each document upload

📄 License

MIT