	---
	title: DocMind-Agentic-Research
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	---

	<div align="center">

	<h1>🧠 DocMind — Agentic Research Platform</h1>
	<img src="https://readme-typing-svg.demolab.com?font=Fira+Code&size=22&duration=3000&pause=1000&color=4f8ef7&center=true&vCenter=true&width=700&lines=LangGraph+%C2%B7+5+Agents+%C2%B7+Hybrid+RAG;Qwen+2.5-7B+%C2%B7+3+LLM+Calls+per+Query;Deployed+Free+on+HuggingFace+Spaces" alt="Typing SVG"/>

	<br/>

	[![Python](https://img.shields.io/badge/Python-3.10+-3b82f6?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
	[![LangGraph](https://img.shields.io/badge/LangGraph-0.2-06b6d4?style=for-the-badge)](https://github.com/langchain-ai/langgraph)
	[![LangChain](https://img.shields.io/badge/LangChain-0.3-4f46e5?style=for-the-badge)](https://langchain.com/)
	[![Flask](https://img.shields.io/badge/Flask-3.1-3b82f6?style=for-the-badge&logo=flask&logoColor=white)](https://flask.palletsprojects.com/)
	[![Docker](https://img.shields.io/badge/Docker-Ready-3b82f6?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/)
	[![HuggingFace](https://img.shields.io/badge/HuggingFace-Spaces-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces)
	[![Status](https://img.shields.io/badge/Status-Active-22c55e?style=for-the-badge)](#)

	<br/>

	🧠 DocMind — A clean, minimal agentic document research platform. Five specialized LangGraph agents plan, retrieve, grade, generate, and critique answers from uploaded PDFs and web pages using hybrid search and Qwen 2.5-7B — all running free on HuggingFace Spaces.

	<br/>

	---

	</div>

	## Table of Contents
	- [Features](#-features)
	- [Architecture](#️-architecture)
	- [Getting Started](#-getting-started)
	- [Docker Deployment](#-docker-deployment)
	- [App Modules](#-app-modules)
	- [ML Models](#-ml-models)
	- [Project Structure](#-project-structure)
	- [Author](#-author)
	- [Contributing](#-contributing)
	- [Disclaimer](#disclaimer)
	- [License](#-license)

	---

	## ✨ Features

	<table>
	<tr><td>🧠 <b>LangGraph State Machine</b></td><td>Five agents wired into a linear StateGraph — Planner → Retriever → Grader → Generator → Critic.</td></tr>
	<tr><td>🔍 <b>Hybrid RAG (FAISS + BM25)</b></td><td>Semantic vector search combined with BM25 keyword search, fused via Reciprocal Rank Fusion for precision retrieval.</td></tr>
	<tr><td>🤖 <b>Multi-Agent Orchestration</b></td><td>Planner, Retriever, Grader, Generator, and Critic agents each have a specialized role. Only the Planner, Generator, and Critic call the LLM, so a query costs 3 LLM calls.</td></tr>
	<tr><td>⚡ <b>Score-Based Grading</b></td><td>Grader uses hybrid search scores + keyword overlap — no LLM call needed, instant and deterministic relevance scoring.</td></tr>
	<tr><td>📄 <b>PDF & URL Ingestion</b></td><td>Upload PDF files up to 10 MB or paste any public URL — both are chunked, embedded, and indexed automatically.</td></tr>
	<tr><td>🔒 <b>Secure by Design</b></td><td>Stateless REST backend, no user data persisted, HF token kept server-side only.</td></tr>
	<tr><td>🐳 <b>Containerized Deployment</b></td><td>Docker-first with Gunicorn, embedding model pre-downloaded at build time for fast cold starts.</td></tr>
	</table>
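
The hybrid retrieval row above mentions Reciprocal Rank Fusion. As a rough illustration (not the project's actual `vector_store.py`), the sketch below fuses two ranked lists with the standard RRF formula, where each document earns `1 / (k + rank)` from every list it appears in. It uses `k = 60`, as listed under ML Models. The function name and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top hit.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs: one ranking from vector search, one from BM25.
faiss_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Chunks that rank well in both lists (`c1`, `c3`) end up ahead of chunks that appear in only one, which is the point of fusing the two searches.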

	---

	## 🏗️ Architecture

	```
	┌──────────────────────────────────────────────────────────────┐
	│                   DocMind: LangGraph Flow                    │
	│                                                              │
	│  PDF / URL ──▶ Ingestor ──▶ FAISS+BM25 Hybrid Vector Store   │
	│                                        │                     │
	│  User Query ──▶ [PLANNER Agent]        │  (Qwen 2.5-7B, 0.3) │
	│                        │               │                     │
	│                   [RETRIEVER] ◀────────┘  (FAISS+BM25+RRF)   │
	│                        │                                     │
	│                    [GRADER]      (score-based, no LLM call)  │
	│                        │                                     │
	│                   [GENERATOR]    (Qwen 2.5-7B, 0.4)          │
	│                        │                                     │
	│                    [CRITIC]      (Qwen 2.5-7B, 0.1)          │
	│                        │                                     │
	│                    [OUTPUT]      Flask API + Single-Page UI  │
	└──────────────────────────────────────────────────────────────┘
	```
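
To make the linear flow concrete, here is a dependency-free sketch of five nodes passing a shared state dictionary from one to the next. It is not the project's `research_graph.py` and does not use LangGraph. Every node body is a stub, and the state keys (`sub_queries`, `chunks`, `relevant`, `answer`, `critique`, `trace`) are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def planner(state: State) -> State:
    # Stub: the real planner calls the LLM to plan the research.
    return {**state, "sub_queries": [state["query"]]}


def retriever(state: State) -> State:
    # Stub: the real retriever runs hybrid FAISS+BM25 search.
    return {**state, "chunks": ["chunk about " + q for q in state["sub_queries"]]}


def grader(state: State) -> State:
    # Stub: keeps every chunk; the real grader filters by a relevance score.
    return {**state, "relevant": list(state["chunks"])}


def generator(state: State) -> State:
    # Stub: the real generator calls the LLM to write a cited answer.
    return {**state, "answer": "Draft based on %d chunk(s)" % len(state["relevant"])}


def critic(state: State) -> State:
    # Stub: the real critic calls the LLM to check the answer.
    return {**state, "critique": "not checked (stub)"}


PIPELINE: List[Node] = [planner, retriever, grader, generator, critic]


def run_pipeline(query: str) -> State:
    """Run each node once, in order, and record which nodes ran."""
    state: State = {"query": query, "trace": []}
    for node in PIPELINE:
        state = node(state)
        state["trace"] = state["trace"] + [node.__name__]
    return state


result = run_pipeline("What is hybrid search?")
```

In a LangGraph `StateGraph`, the same shape would be expressed as five nodes joined by edges, with the graph handling the state hand-off that the `for` loop does here.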

	---

	## 🚀 Getting Started

	### Prerequisites
	- Python 3.10+ · Docker · Git · Free HuggingFace account

	### Local Installation

	```bash
	git clone https://github.com/mnoorchenar/docmind.git
	cd docmind

	python -m venv venv
	source venv/bin/activate # Windows: venv\Scripts\activate

	pip install -r requirements.txt

	cp .env.example .env
	# Edit .env — set HF_TOKEN to your free HuggingFace Read token

	python app.py
	```

	Open `http://localhost:7860` 🎉

	### Getting your free HuggingFace token
	1. Create a free account at [huggingface.co](https://huggingface.co)
	2. Go to Settings → Access Tokens → New Token → Role: Read
	3. Copy the token and set it as `HF_TOKEN` in your `.env` file or Space secrets
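
As a small illustration of step 3, the sketch below reads `HF_TOKEN` from the process environment and fails early with a clear message when it is missing. The helper name `require_hf_token` is hypothetical, and the snippet makes no assumption about how the app itself loads `.env`.

```python
import os


def require_hf_token(env=os.environ):
    """Return the HuggingFace token from the environment or raise."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to .env or to the Space secrets"
        )
    return token


# Usage with an explicit mapping (handy in tests); the value is a placeholder.
example_token = require_hf_token({"HF_TOKEN": " hf_your_token_here "})
```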

	---

	## 🐳 Docker Deployment

	```bash
	docker build -t docmind .
	docker run -p 7860:7860 -e HF_TOKEN=hf_your_token_here docmind
	```

	---

	## 📊 App Modules

	| Module | Description | Status |
	|--------|-------------|--------|
	| 📤 Upload & Index | PDF / URL ingest, chunk, embed (local BAAI model), FAISS+BM25 index | ✅ Live |
	| 🔍 Research Query | LangGraph 5-agent pipeline with real-time trace log | ✅ Live |
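
The "chunk" step in Upload & Index can be pictured with a much simpler stand-in than the `RecursiveCharacterTextSplitter` the project lists: fixed-size character windows that overlap, so text cut at a boundary still appears whole in a neighbouring chunk. The function name and the default `chunk_size` and `overlap` values below are assumptions, not the project's settings.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, overlap=4)
```

Unlike this sketch, a recursive splitter prefers to break on paragraph and sentence boundaries before falling back to raw character counts.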

	---

	## 🧠 ML Models

	```python
	stack = {
	    # ── LLM (LangChain LCEL chains) ──────────────────────────────────────────
	    "llm": "Qwen/Qwen2.5-7B-Instruct",  # via HF Router
	    "lcel_chain": "ChatPromptTemplate | ChatOpenAI | StrOutputParser",
	    "retry": "ChatOpenAI.with_retry(stop_after_attempt=2)",

	    # ── RAG (LangChain + custom hybrid) ──────────────────────────────────────
	    "splitter": "RecursiveCharacterTextSplitter (langchain-text-splitters)",
	    "documents": "langchain_core.documents.Document",
	    "embeddings": "HuggingFaceEmbeddings (BAAI/bge-small-en-v1.5, local)",
	    "vector_index": "FAISS IndexFlatIP (cosine)",
	    "keyword_index": "BM25Okapi (rank-bm25)",
	    "fusion": "Reciprocal Rank Fusion (RRF k=60)",
	    "grader": "score-based (hybrid score × 0.7 + keyword overlap × 0.3)",

	    # ── Orchestration (LangGraph) ─────────────────────────────────────────────
	    "graph": "LangGraph 0.2 StateGraph (5 nodes, linear pipeline)",
	}
	```
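
The `grader` entry above gives a weighted formula: hybrid score × 0.7 + keyword overlap × 0.3. A minimal sketch of that formula follows. It assumes the hybrid score is already scaled to the range 0 to 1 and that keyword overlap means the fraction of distinct query words found in the chunk. Both are assumptions, and the function names are hypothetical, so the project's `grader.py` may differ.

```python
def keyword_overlap(query, chunk):
    """Fraction of distinct query words that also occur in the chunk."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words)


def grade(query, chunk, hybrid_score, w_hybrid=0.7, w_overlap=0.3):
    """Combine the retrieval score and keyword overlap into one score."""
    # hybrid_score is assumed to be normalised to [0, 1] by the caller.
    return w_hybrid * hybrid_score + w_overlap * keyword_overlap(query, chunk)


score = grade(
    "hybrid search fusion",
    "Hybrid search combines FAISS and BM25",
    hybrid_score=0.8,
)
```

Because the score depends only on numbers already produced by retrieval plus a set intersection, grading needs no LLM call and gives the same result on every run.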

	---

	## 📁 Project Structure

	```
	docmind/
	├── 📄 app.py                        # Flask entry point, 5 REST routes
	├── 📄 requirements.txt
	├── 📄 Dockerfile                    # Port 7860, embedding model pre-downloaded
	├── 📄 .env.example
	├── 📂 agents/
	│   ├── 📄 llm_factory.py            # get_llm() → LangChain ChatOpenAI (HF Router)
	│   ├── 📄 planner.py                # LCEL: ChatPromptTemplate | ChatOpenAI | StrOutputParser
	│   ├── 📄 retriever.py              # Hybrid FAISS+BM25 search wrapper
	│   ├── 📄 grader.py                 # Score-based relevance grading (no LLM call)
	│   ├── 📄 generator.py              # LCEL chain: cited answer generation
	│   └── 📄 critic.py                 # LCEL chain: hallucination detection
	├── 📂 graph/
	│   └── 📄 research_graph.py         # LangGraph StateGraph (5 nodes, linear pipeline)
	├── 📂 rag/
	│   ├── 📄 ingestor.py               # RecursiveCharacterTextSplitter + Document objects
	│   ├── 📄 vector_store.py           # FAISS + BM25 + RRF, accepts Document or dict
	│   └── 📄 embeddings.py             # LangChain HuggingFaceEmbeddings (bge-small-en-v1.5)
	├── 📂 tracing/
	│   └── 📄 tracer.py                 # Thread-safe in-memory trace store
	├── 📂 templates/
	│   └── 📄 index.html                # Dark-mode single-page UI
	└── 📂 docs/
	    └── 📄 project-template.html     # Portfolio showcase page
	```

	---

	## 👨‍💻 Author

	<div align="center">
	<table><tr><td align="center" width="100%">
	<img src="https://avatars.githubusercontent.com/mnoorchenar" width="120" style="border-radius:50%;border:3px solid #4f46e5" alt="Mohammad Noorchenarboo"/>
	<h3>Mohammad Noorchenarboo</h3>
	<code>Data Scientist</code> | <code>AI Researcher</code> | <code>Biostatistician</code>
	📍 Ontario, Canada    📧 mohammadnoorchenarboo@gmail.com

	[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mnoorchenar)
	[![HuggingFace](https://img.shields.io/badge/HuggingFace-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces)
	[![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mnoorchenar)
	</td></tr></table>
	</div>

	---

	## 🤝 Contributing

	1. Fork the repository
	2. Create a feature branch: `git checkout -b feature/amazing-feature`
	3. Commit: `git commit -m 'Add amazing feature'`
	4. Push: `git push origin feature/amazing-feature`
	5. Open a Pull Request

	---

	## Disclaimer

	<span style="color:red">This project is developed strictly for educational and research purposes. All LLM outputs are AI-generated and may contain inaccuracies. No real user data is stored. Provided "as is" without warranty of any kind.</span>

	---

	## 📜 License

	Distributed under the MIT License.

	<div align="center">
	<img src="https://capsule-render.vercel.app/api?type=waving&color=0:3b82f6,100:4f46e5&height=120&section=footer&text=Made%20with%20%E2%9D%A4%EF%B8%8F%20by%20Mohammad%20Noorchenarboo&fontColor=ffffff&fontSize=18&fontAlignY=80" width="100%"/>
	</div>