	---
	title: DocMind — Grounded RAG Document Intelligence
	emoji: 🧠
	colorFrom: indigo
	colorTo: purple
	sdk: docker
	pinned: true
	license: mit
	---

	# 🧠 DocMind — Grounded RAG Document Intelligence

	A production-grade Retrieval-Augmented Generation (RAG) system that goes beyond retrieve-and-generate: it verifies every generated claim against the source documents using NLI-based grounding.

	## ✨ Key Features

	| Feature | Description |
	|---------|-------------|
	| 📄 Multi-format Ingestion | PDF, DOCX, TXT; chunked at 400 tokens with sentence-boundary awareness |
	| 🔍 Hybrid Retrieval | BM25 (sparse) + BGE-M3 (dense) fused via Reciprocal Rank Fusion |
	| 🎯 Attributed Generation | Every sentence cites its source chunk; no uncited claims allowed |
	| 🛡️ NLI Grounding Gate | DeBERTa cross-encoder verifies each claim against cited evidence |
	| 🚦 Intent Router | Sensitive queries are intercepted before reaching the LLM |
	| 📊 Multi-level Summaries | Quick, Structured, and Key Points extraction |
	| 📑 Multi-Document Mode | Compare up to 3 documents with color-coded source tracking |
	| 💬 Chat History | Persistent conversation with export support |
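
The ingestion row above mentions 400-token chunks with sentence-boundary awareness. A minimal sketch of that idea follows, assuming whitespace-separated words as a stand-in for tokens and a naive regex sentence splitter; `chunk_text` is a hypothetical name, not necessarily what DocMind uses, and a real pipeline would likely count tokens with the embedding model's tokenizer.

```python
import re


def chunk_text(text: str, max_tokens: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Assumption: a "token" is a whitespace-separated word.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than the budget cannot stay whole:
        # flush the open chunk, then cut the sentence on word boundaries.
        while len(words) > max_tokens:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        if not words:
            continue
        # Start a new chunk when this sentence would overflow the open one.
        if len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Calling `chunk_text(document_text)` returns a list of strings, each at most 400 words long under the assumption above, and sentences are only split when a single sentence exceeds the budget on its own.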

	## 🏗️ Architecture

	```
	┌──────────────┐   ┌──────────────┐   ┌──────────────┐
	│    Upload    │──▶│   Parse &    │──▶│   Chunk &    │
	│   Document   │   │   Extract    │   │    Embed     │
	└──────────────┘   └──────────────┘   └──────┬───────┘
	                                             │
	                    ┌────────────────────────▼───────────┐
	                    │         Dual Index Storage         │
	                    │ BM25 (in-memory) │ Qdrant (dense)  │
	                    └────────────────────────┬───────────┘
	                                             │
	┌──────────────┐   ┌──────────────┐   ┌──────▼───────┐
	│     User     │──▶│    Intent    │──▶│    Hybrid    │
	│    Query     │   │    Router    │   │  Retrieval   │
	└──────────────┘   └──────────────┘   └──────┬───────┘
	                                             │
	                   ┌──────────────┐   ┌──────▼───────┐
	                   │  Grounding   │◀──│  Attributed  │
	                   │  Gate (NLI)  │   │  Generation  │
	                   └──────┬───────┘   └──────────────┘
	                          │
	                   ┌──────▼───────┐
	                   │   Serve or   │
	                   │    Refuse    │
	                   └──────────────┘
	```
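
The Hybrid Retrieval step in the diagram merges the BM25 and dense result lists with Reciprocal Rank Fusion. The sketch below shows the standard RRF formula, where each document scores the sum of `1 / (k + rank)` over the lists it appears in. The function name, the chunk IDs, and the constant `k = 60` (the value commonly used in the RRF literature) are assumptions for illustration; this README does not state which constant DocMind uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one scored ranking.

    rankings: one list per retriever, best result first.
    k: smoothing constant (assumed default, not taken from DocMind).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the sparse and dense retrievers.
bm25_hits = ["c1", "c2", "c3"]
dense_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is one reason it is a common choice for hybrid retrieval.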

	## 🛠️ Tech Stack

	| Layer | Tool | Cost |
	|-------|------|------|
	| LLM | Groq API (Llama 3.1 70B) | Free tier |
	| Embeddings | BAAI/bge-m3 (self-hosted) | Free |
	| Sparse Retrieval | bm25s | Free |
	| Vector DB | Qdrant (local / cloud) | Free |
	| NLI Grounding | DeBERTa v3 cross-encoder | Free |
	| UI | Streamlit | Free |
	| Hosting | Hugging Face Spaces (Docker) | Free |
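
To illustrate how the NLI grounding layer could decide between serving and refusing an answer, here is a minimal sketch of the gate logic. Everything named here is hypothetical: `Claim`, `grounding_gate`, the `0.5` threshold, and the rule that an answer with no claims is refused are assumptions, and `toy_overlap_scorer` is only a word-overlap stand-in for the DeBERTa cross-encoder so the example runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Claim:
    sentence: str  # one generated sentence
    evidence: str  # text of the chunk that the sentence cites


# Assumed scorer signature: (premise, hypothesis) -> entailment score in [0, 1].
EntailmentScorer = Callable[[str, str], float]


def grounding_gate(
    claims: list[Claim], scorer: EntailmentScorer, threshold: float = 0.5
) -> tuple[bool, list[Claim]]:
    """Return (serve, failed_claims).

    The answer is served only if it has at least one claim and every
    claim is entailed by its cited evidence at or above the threshold.
    """
    failed = [c for c in claims if scorer(c.evidence, c.sentence) < threshold]
    return (bool(claims) and not failed, failed)


def toy_overlap_scorer(premise: str, hypothesis: str) -> float:
    """Stand-in scorer: fraction of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    if not hypothesis_words:
        return 0.0
    return len(premise_words & hypothesis_words) / len(hypothesis_words)
```

In a real deployment the scorer argument would wrap the NLI model's entailment probability; keeping it as a plain callable makes the gate easy to test in isolation.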

	## 🚀 Quick Start

	```bash
	# 1. Clone
	git clone https://huggingface.co/spaces/YOUR_USERNAME/docmind
	cd docmind

	# 2. Set up environment
	cp .env.example .env
	# Edit .env with your GROQ_API_KEY

	# 3. Install dependencies
	pip install -r requirements.txt

	# 4. Run
	streamlit run app.py
	```

	## ⚠️ Known Limitations

	- Free-tier rate limits: Groq allows ~14,400 tokens/min, so heavy usage may be throttled
	- CPU inference: BGE-M3 and DeBERTa run on CPU, so the first query takes ~5s while the models load
	- Memory: the two models together consume ~3GB RAM, which fits within the 16GB HF Spaces limit
	- No persistence: the in-memory BM25 index is rebuilt on each document upload
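
One way to stay under the tokens-per-minute limit noted above is to track token spend client-side in a sliding window and delay requests that would exceed it. The sketch below is an assumption about how that could be done, not DocMind's actual behavior; `TokenBudget` and its methods are hypothetical names, and the limit is passed in rather than hard-coded because provider limits can change.

```python
import time
from collections import deque
from typing import Callable


class TokenBudget:
    """Sliding-window budget for tokens spent per time window."""

    def __init__(
        self,
        tokens_per_minute: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = tokens_per_minute
        self.window = window_seconds
        self.clock = clock
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)
        self.used = 0

    def _expire(self, now: float) -> None:
        # Drop spend that has aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait until `tokens` fits in the window (0.0 if it fits now)."""
        if tokens > self.limit:
            raise ValueError("request is larger than the whole budget")
        now = self.clock()
        self._expire(now)
        needed = self.used + tokens - self.limit
        if needed <= 0:
            return 0.0
        freed = 0
        for timestamp, spent in self.events:
            freed += spent
            if freed >= needed:
                return max(0.0, timestamp + self.window - now)
        return self.window  # not reached while tokens <= limit

    def record(self, tokens: int) -> None:
        """Register tokens that were just spent."""
        now = self.clock()
        self._expire(now)
        self.events.append((now, tokens))
        self.used += tokens


# Limit taken from the figure quoted in this README.
budget = TokenBudget(tokens_per_minute=14_400)
```

Before each LLM call, the app would call `time.sleep(budget.wait_time(estimated_tokens))` and then `budget.record(actual_tokens)` once the response reports its usage.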

	## 📄 License

	MIT