---
title: DocMind-Agentic-Research
colorFrom: blue
colorTo: indigo
sdk: docker
---
# 🧠 DocMind: Agentic Research Platform
[Python](https://www.python.org/) · [LangGraph](https://github.com/langchain-ai/langgraph) · [LangChain](https://langchain.com/) · [Flask](https://flask.palletsprojects.com/) · [Docker](https://www.docker.com/) · [HuggingFace Spaces](https://huggingface.co/mnoorchenar/spaces)
**🧠 DocMind** is a clean, minimal agentic document research platform. Five specialized LangGraph agents plan, retrieve, grade, generate, and critique answers from uploaded PDFs and web pages using hybrid search and Qwen 2.5-7B, all running for free on HuggingFace Spaces.
---
## Table of Contents
- [Features](#-features)
- [Architecture](#️-architecture)
- [Getting Started](#-getting-started)
- [Docker Deployment](#-docker-deployment)
- [App Modules](#-app-modules)
- [ML Models](#-ml-models)
- [Project Structure](#-project-structure)
- [Author](#-author)
- [Contributing](#-contributing)
- [Disclaimer](#disclaimer)
- [License](#-license)
---
## ✨ Features
| Feature | Description |
|---------|-------------|
| 🧠 LangGraph State Machine | Five agents wired into a linear StateGraph: Planner → Retriever → Grader → Generator → Critic. |
| 🔍 Hybrid RAG (FAISS + BM25) | Semantic vector search combined with BM25 keyword search, fused via Reciprocal Rank Fusion for precise retrieval. |
| 🤖 Multi-Agent Orchestration | Planner, Retriever, Grader, Generator, and Critic agents, each with a specialized role; only 3 LLM calls per query. |
| ⚡ Score-Based Grading | The Grader combines hybrid search scores with keyword overlap, so relevance scoring needs no LLM call and is instant and deterministic. |
| 📄 PDF & URL Ingestion | Upload PDF files up to 10 MB or paste any public URL; both are chunked, embedded, and indexed automatically. |
| 🔒 Secure by Design | Stateless REST backend, no user data persisted, HF token kept server-side only. |
| 🐳 Containerized Deployment | Docker-first with Gunicorn; the embedding model is pre-downloaded at build time for fast cold starts. |
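
The fusion step can be illustrated with a minimal Reciprocal Rank Fusion sketch. It uses k=60 as listed under ML Models; the function name, the chunk IDs, and the two input rankings are hypothetical, and the project's own `rag/vector_store.py` may differ in detail.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one scored ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to stay deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the semantic (FAISS) and keyword (BM25) searches.
semantic = ["chunk_a", "chunk_b", "chunk_c"]
keyword = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([semantic, keyword])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```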
---
## 🏗️ Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                   DocMind: LangGraph Flow                    │
│                                                              │
│  PDF / URL ──▶ Ingestor ──▶ FAISS+BM25 Hybrid Vector Store   │
│                                        │                     │
│  User Query ──▶ [PLANNER Agent]        │  (Qwen 2.5-7B, 0.3) │
│                        │               │                     │
│                   [RETRIEVER] ◀────────┘  (FAISS+BM25+RRF)   │
│                        │                                     │
│                    [GRADER]     (score-based, no LLM call)   │
│                        │                                     │
│                   [GENERATOR]   (Qwen 2.5-7B, 0.4)           │
│                        │                                     │
│                    [CRITIC]     (Qwen 2.5-7B, 0.1)           │
│                        │                                     │
│                    [OUTPUT]     Flask API + Single-Page UI   │
└──────────────────────────────────────────────────────────────┘
```
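As a rough illustration of the linear flow in the diagram, the sketch below chains five stand-in nodes over a shared state dictionary. It is plain Python, not LangGraph's `StateGraph` API, and the state keys and placeholder outputs are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def make_node(name: str, key: str) -> Node:
    """Build a stand-in node that writes one placeholder field and logs its name."""
    def node(state: State) -> State:
        return {**state, key: f"<{name} output>", "trace": state["trace"] + [name]}
    return node


# Node order follows the diagram; the state keys are hypothetical.
PIPELINE: List[Node] = [
    make_node("planner", "plan"),
    make_node("retriever", "chunks"),
    make_node("grader", "graded_chunks"),
    make_node("generator", "answer"),
    make_node("critic", "critique"),
]


def run_pipeline(query: str) -> State:
    """Pass the state through every node in order, with no branching."""
    state: State = {"query": query, "trace": []}
    for node in PIPELINE:
        state = node(state)
    return state


result = run_pipeline("What does the report conclude?")
print(result["trace"])
```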
---
## 🚀 Getting Started
### Prerequisites
- Python 3.10+ · Docker · Git · Free HuggingFace account
### Local Installation
```bash
git clone https://github.com/mnoorchenar/docmind.git
cd docmind
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set HF_TOKEN to your free HuggingFace Read token
python app.py
```
Open `http://localhost:7860` in your browser.
### Getting your free HuggingFace token
1. Create a free account at [huggingface.co](https://huggingface.co)
2. Go to Settings → Access Tokens → New Token → Role: **Read**
3. Copy the token and set it as `HF_TOKEN` in your `.env` file or Space secrets
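
A minimal sketch of how a server-side process might read the token, assuming `HF_TOKEN` has already been exported into the environment. Python does not load a `.env` file on its own, so a loader such as python-dotenv would be one option. The helper name `load_hf_token` is hypothetical and is not taken from the project.

```python
import os
from typing import Mapping


def load_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the HuggingFace token, failing early if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or the Space secrets")
    return token


# Example with an explicit mapping, so the real environment is not required here.
token = load_hf_token({"HF_TOKEN": "hf_your_token_here"})
print(f"Token loaded ({len(token)} characters)")
```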
---
## 🐳 Docker Deployment
```bash
docker build -t docmind .
docker run -p 7860:7860 -e HF_TOKEN=hf_your_token_here docmind
```
---
## 📊 App Modules
| Module | Description | Status |
|--------|-------------|--------|
| 📤 Upload & Index | PDF / URL ingest, chunk, embed (local BAAI model), FAISS+BM25 index | ✅ Live |
| 🔍 Research Query | LangGraph 5-agent pipeline with real-time trace log | ✅ Live |
---
## 🧠 ML Models
```python
stack = {
    # ── LLM (LangChain LCEL chains) ──────────────────────────────────
    "llm": "Qwen/Qwen2.5-7B-Instruct",  # via HF Router
    "lcel_chain": "ChatPromptTemplate | ChatOpenAI | StrOutputParser",
    "retry": "ChatOpenAI.with_retry(stop_after_attempt=2)",
    # ── RAG (LangChain + custom hybrid) ──────────────────────────────
    "splitter": "RecursiveCharacterTextSplitter (langchain-text-splitters)",
    "documents": "langchain_core.documents.Document",
    "embeddings": "HuggingFaceEmbeddings (BAAI/bge-small-en-v1.5, local)",
    "vector_index": "FAISS IndexFlatIP (cosine)",
    "keyword_index": "BM25Okapi (rank-bm25)",
    "fusion": "Reciprocal Rank Fusion (RRF k=60)",
    "grader": "score-based (hybrid score × 0.7 + keyword overlap × 0.3)",
    # ── Orchestration (LangGraph) ────────────────────────────────────
    "graph": "LangGraph 0.2 StateGraph: 5 nodes, linear pipeline",
}
```
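The grader formula in the stack above (hybrid score × 0.7 + keyword overlap × 0.3) could be sketched as follows. The source does not define keyword overlap or the score range, so this example assumes a hybrid score already normalized to [0, 1] and measures overlap as the fraction of distinct query terms found in the chunk. All function names are hypothetical.

```python
import re


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of distinct query terms that also occur in the chunk text."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    if not query_terms:
        return 0.0
    chunk_terms = set(re.findall(r"\w+", text.lower()))
    return len(query_terms & chunk_terms) / len(query_terms)


def grade(query: str, text: str, hybrid_score: float,
          w_hybrid: float = 0.7, w_overlap: float = 0.3) -> float:
    """Weighted relevance score; deterministic and free of LLM calls."""
    return w_hybrid * hybrid_score + w_overlap * keyword_overlap(query, text)


# Hypothetical chunk with an assumed normalized hybrid score of 0.5.
score = grade(
    query="hybrid search fusion",
    text="Reciprocal rank fusion merges hybrid results.",
    hybrid_score=0.5,
)
print(round(score, 2))
```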
---
## 📁 Project Structure
```
docmind/
├── 📄 app.py                      # Flask entry point, 5 REST routes
├── 📄 requirements.txt
├── 📄 Dockerfile                  # Port 7860, embedding model pre-downloaded
├── 📄 .env.example
├── 📁 agents/
│   ├── 📄 llm_factory.py          # get_llm() → LangChain ChatOpenAI (HF Router)
│   ├── 📄 planner.py              # LCEL: ChatPromptTemplate | ChatOpenAI | StrOutputParser
│   ├── 📄 retriever.py            # Hybrid FAISS+BM25 search wrapper
│   ├── 📄 grader.py               # Score-based relevance grading (no LLM call)
│   ├── 📄 generator.py            # LCEL chain: cited answer generation
│   └── 📄 critic.py               # LCEL chain: hallucination detection
├── 📁 graph/
│   └── 📄 research_graph.py       # LangGraph StateGraph (5 nodes, linear pipeline)
├── 📁 rag/
│   ├── 📄 ingestor.py             # RecursiveCharacterTextSplitter + Document objects
│   ├── 📄 vector_store.py         # FAISS + BM25 + RRF, accepts Document or dict
│   └── 📄 embeddings.py           # LangChain HuggingFaceEmbeddings (bge-small-en-v1.5)
├── 📁 tracing/
│   └── 📄 tracer.py               # Thread-safe in-memory trace store
├── 📁 templates/
│   └── 📄 index.html              # Dark-mode single-page UI
└── 📁 docs/
    └── 📄 project-template.html   # Portfolio showcase page
```
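For the thread-safe in-memory trace store noted next to `tracing/tracer.py`, one possible shape is a dictionary of event lists guarded by a lock. This is a sketch under assumptions: the class name, method names, and event fields are hypothetical and not taken from the project.

```python
import threading
from collections import defaultdict
from typing import Dict, List


class TraceStore:
    """Keeps per-run trace events in memory, guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: Dict[str, List[dict]] = defaultdict(list)

    def add(self, run_id: str, agent: str, message: str) -> None:
        with self._lock:
            self._events[run_id].append({"agent": agent, "message": message})

    def get(self, run_id: str) -> List[dict]:
        # Return a copy so callers cannot mutate the stored list.
        with self._lock:
            return list(self._events.get(run_id, []))


# Several threads log to the same run concurrently.
store = TraceStore()
threads = [
    threading.Thread(target=store.add, args=("run-1", f"agent-{i}", "done"))
    for i in range(5)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(store.get("run-1")))
```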
---
## 👨‍💻 Author