docmind / README.md
mnoorchenar's picture
Update 2026-03-22 20:53:33
693f74a
---
title: DocMind-Agentic-Research
colorFrom: blue
colorTo: indigo
sdk: docker
---
<div align="center">
<h1>🧠 DocMind β€” Agentic Research Platform</h1>
<img src="https://readme-typing-svg.demolab.com?font=Fira+Code&size=22&duration=3000&pause=1000&color=4f8ef7&center=true&vCenter=true&width=700&lines=LangGraph+%C2%B7+5+Agents+%C2%B7+Hybrid+RAG;Qwen+2.5-7B+%C2%B7+3+LLM+Calls+per+Query;Deployed+Free+on+HuggingFace+Spaces" alt="Typing SVG"/>
<br/>
[![Python](https://img.shields.io/badge/Python-3.10+-3b82f6?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
[![LangGraph](https://img.shields.io/badge/LangGraph-0.2-06b6d4?style=for-the-badge)](https://github.com/langchain-ai/langgraph)
[![LangChain](https://img.shields.io/badge/LangChain-0.3-4f46e5?style=for-the-badge)](https://langchain.com/)
[![Flask](https://img.shields.io/badge/Flask-3.1-3b82f6?style=for-the-badge&logo=flask&logoColor=white)](https://flask.palletsprojects.com/)
[![Docker](https://img.shields.io/badge/Docker-Ready-3b82f6?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-Spaces-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces)
[![Status](https://img.shields.io/badge/Status-Active-22c55e?style=for-the-badge)](#)
<br/>
**🧠 DocMind** β€” A clean, minimal agentic document research platform. Five specialized LangGraph agents plan, retrieve, grade, generate, and critique answers from uploaded PDFs and web pages using hybrid search and Qwen 2.5-7B β€” all running free on HuggingFace Spaces.
<br/>
---
</div>
## Table of Contents
- [Features](#-features)
- [Architecture](#️-architecture)
- [Getting Started](#-getting-started)
- [Docker Deployment](#-docker-deployment)
- [Dashboard Modules](#-dashboard-modules)
- [ML Models](#-ml-models)
- [Project Structure](#-project-structure)
- [Author](#-author)
- [Contributing](#-contributing)
- [Disclaimer](#disclaimer)
- [License](#-license)
---
## ✨ Features
<table>
<tr><td>🧠 <b>LangGraph State Machine</b></td><td>Five agents wired into a linear StateGraph β€” Planner β†’ Retriever β†’ Grader β†’ Generator β†’ Critic.</td></tr>
<tr><td>πŸ” <b>Hybrid RAG (FAISS + BM25)</b></td><td>Semantic vector search combined with BM25 keyword search, fused via Reciprocal Rank Fusion for precision retrieval.</td></tr>
<tr><td>πŸ€– <b>Multi-Agent Orchestration</b></td><td>Planner, Retriever, Grader, Generator, and Critic agents each with specialized roles β€” only 3 LLM calls per query.</td></tr>
<tr><td>⚑ <b>Score-Based Grading</b></td><td>Grader uses hybrid search scores + keyword overlap β€” no LLM call needed, instant and deterministic relevance scoring.</td></tr>
<tr><td>πŸ“„ <b>PDF &amp; URL Ingestion</b></td><td>Upload PDF files up to 10 MB or paste any public URL β€” both are chunked, embedded, and indexed automatically.</td></tr>
<tr><td>πŸ”’ <b>Secure by Design</b></td><td>Stateless REST backend, no user data persisted, HF token kept server-side only.</td></tr>
<tr><td>🐳 <b>Containerized Deployment</b></td><td>Docker-first with Gunicorn, embedding model pre-downloaded at build time for fast cold starts.</td></tr>
</table>
---
## πŸ—οΈ Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DocMind β€” LangGraph Flow β”‚
β”‚ β”‚
β”‚ PDF / URL ──▢ Ingestor ──▢ FAISS+BM25 Hybrid Vector Store β”‚
β”‚ β”‚ β”‚
β”‚ User Query ──▢ [PLANNER Agent] β”‚ (Qwen 2.5-7B, 0.3) β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ [RETRIEVER] β—€β”€β”€β”€β”€β”€β”€β”˜ (FAISS+BM25+RRF) β”‚
β”‚ β”‚ β”‚
β”‚ [GRADER] (score-based, no LLM call) β”‚
β”‚ β”‚ β”‚
β”‚ [GENERATOR] (Qwen 2.5-7B, 0.4) β”‚
β”‚ β”‚ β”‚
β”‚ [CRITIC] (Qwen 2.5-7B, 0.1) β”‚
β”‚ β”‚ β”‚
β”‚ [OUTPUT] Flask API + Single-Page UI β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## πŸš€ Getting Started
### Prerequisites
- Python 3.10+ Β· Docker Β· Git Β· Free HuggingFace account
### Local Installation
```bash
git clone https://github.com/mnoorchenar/docmind.git
cd docmind
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env β€” set HF_TOKEN to your free HuggingFace Read token
python app.py
```
Open `http://localhost:7860` πŸŽ‰
### Getting your free HuggingFace token
1. Create a free account at [huggingface.co](https://huggingface.co)
2. Go to Settings β†’ Access Tokens β†’ New Token β†’ Role: **Read**
3. Copy the token and set it as `HF_TOKEN` in your `.env` file or Space secrets
---
## 🐳 Docker Deployment
```bash
docker build -t docmind .
docker run -p 7860:7860 -e HF_TOKEN=hf_your_token_here docmind
```
---
## πŸ“Š App Modules
| Module | Description | Status |
|--------|-------------|--------|
| πŸ“€ Upload & Index | PDF / URL ingest, chunk, embed (local BAAI model), FAISS+BM25 index | βœ… Live |
| πŸ” Research Query | LangGraph 5-agent pipeline with real-time trace log | βœ… Live |
---
## 🧠 ML Models
```python
stack = {
# ── LLM (LangChain LCEL chains) ──────────────────────────────────────────
"llm": "Qwen/Qwen2.5-7B-Instruct", # via HF Router
"lcel_chain": "ChatPromptTemplate | ChatOpenAI | StrOutputParser",
"retry": "ChatOpenAI.with_retry(stop_after_attempt=2)",
# ── RAG (LangChain + custom hybrid) ──────────────────────────────────────
"splitter": "RecursiveCharacterTextSplitter (langchain-text-splitters)",
"documents": "langchain_core.documents.Document",
"embeddings": "HuggingFaceEmbeddings (BAAI/bge-small-en-v1.5, local)",
"vector_index": "FAISS IndexFlatIP (cosine)",
"keyword_index": "BM25Okapi (rank-bm25)",
"fusion": "Reciprocal Rank Fusion (RRF k=60)",
"grader": "score-based (hybrid score Γ— 0.7 + keyword overlap Γ— 0.3)",
# ── Orchestration (LangGraph) ─────────────────────────────────────────────
"graph": "LangGraph 0.2 StateGraph β€” 5 nodes, linear pipeline",
}
```
---
## πŸ“ Project Structure
```
docmind/
β”œβ”€β”€ πŸ“„ app.py # Flask entry point, 5 REST routes
β”œβ”€β”€ πŸ“„ requirements.txt
β”œβ”€β”€ πŸ“„ Dockerfile # Port 7860, embedding model pre-downloaded
β”œβ”€β”€ πŸ“„ .env.example
β”œβ”€β”€ πŸ“‚ agents/
β”‚ β”œβ”€β”€ πŸ“„ llm_factory.py # get_llm() β†’ LangChain ChatOpenAI (HF Router)
β”‚ β”œβ”€β”€ πŸ“„ planner.py # LCEL: ChatPromptTemplate | ChatOpenAI | StrOutputParser
β”‚ β”œβ”€β”€ πŸ“„ retriever.py # Hybrid FAISS+BM25 search wrapper
β”‚ β”œβ”€β”€ πŸ“„ grader.py # Score-based relevance grading (no LLM call)
β”‚ β”œβ”€β”€ πŸ“„ generator.py # LCEL chain β€” cited answer generation
β”‚ └── πŸ“„ critic.py # LCEL chain β€” hallucination detection
β”œβ”€β”€ πŸ“‚ graph/
β”‚ └── πŸ“„ research_graph.py # LangGraph StateGraph (5 nodes, linear pipeline)
β”œβ”€β”€ πŸ“‚ rag/
β”‚ β”œβ”€β”€ πŸ“„ ingestor.py # RecursiveCharacterTextSplitter + Document objects
β”‚ β”œβ”€β”€ πŸ“„ vector_store.py # FAISS + BM25 + RRF, accepts Document or dict
β”‚ └── πŸ“„ embeddings.py # LangChain HuggingFaceEmbeddings (bge-small-en-v1.5)
β”œβ”€β”€ πŸ“‚ tracing/
β”‚ └── πŸ“„ tracer.py # Thread-safe in-memory trace store
β”œβ”€β”€ πŸ“‚ templates/
β”‚ └── πŸ“„ index.html # Dark-mode single-page UI
└── πŸ“‚ docs/
└── πŸ“„ project-template.html # Portfolio showcase page
```
---
## πŸ‘¨β€πŸ’» Author
<div align="center">
<table><tr><td align="center" width="100%">
<img src="https://avatars.githubusercontent.com/mnoorchenar" width="120" style="border-radius:50%;border:3px solid #4f46e5" alt="Mohammad Noorchenarboo"/>
<h3>Mohammad Noorchenarboo</h3>
<code>Data Scientist</code> &nbsp;|&nbsp; <code>AI Researcher</code> &nbsp;|&nbsp; <code>Biostatistician</code>
πŸ“ Ontario, Canada &nbsp;&nbsp; πŸ“§ mohammadnoorchenarboo@gmail.com
[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mnoorchenar)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces)
[![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mnoorchenar)
</td></tr></table>
</div>
---
## 🀝 Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit: `git commit -m 'Add amazing feature'`
4. Push: `git push origin feature/amazing-feature`
5. Open a Pull Request
---
## Disclaimer
<span style="color:red">This project is developed strictly for educational and research purposes. All LLM outputs are AI-generated and may contain inaccuracies. No real user data is stored. Provided "as is" without warranty of any kind.</span>
---
## πŸ“œ License
Distributed under the **MIT License**.
<div align="center">
<img src="https://capsule-render.vercel.app/api?type=waving&color=0:3b82f6,100:4f46e5&height=120&section=footer&text=Made%20with%20%E2%9D%A4%EF%B8%8F%20by%20Mohammad%20Noorchenarboo&fontColor=ffffff&fontSize=18&fontAlignY=80" width="100%"/>
</div>