---
title: DocMind-Agentic-Research
colorFrom: blue
colorTo: indigo
sdk: docker
---

# 🧠 DocMind: Agentic Research Platform

[![Python](https://img.shields.io/badge/Python-3.10+-3b82f6?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) [![LangGraph](https://img.shields.io/badge/LangGraph-0.2-06b6d4?style=for-the-badge)](https://github.com/langchain-ai/langgraph) [![LangChain](https://img.shields.io/badge/LangChain-0.3-4f46e5?style=for-the-badge)](https://langchain.com/) [![Flask](https://img.shields.io/badge/Flask-3.1-3b82f6?style=for-the-badge&logo=flask&logoColor=white)](https://flask.palletsprojects.com/) [![Docker](https://img.shields.io/badge/Docker-Ready-3b82f6?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/) [![HuggingFace](https://img.shields.io/badge/HuggingFace-Spaces-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces) [![Status](https://img.shields.io/badge/Status-Active-22c55e?style=for-the-badge)](#)
**🧠 DocMind** is a clean, minimal agentic platform for document research. Five specialized LangGraph agents plan, retrieve, grade, generate, and critique answers drawn from uploaded PDFs and web pages, using hybrid search and Qwen 2.5-7B, all running for free on HuggingFace Spaces.
---
## Table of Contents

- [Features](#-features)
- [Architecture](#️-architecture)
- [Getting Started](#-getting-started)
- [Docker Deployment](#-docker-deployment)
- [App Modules](#-app-modules)
- [ML Models](#-ml-models)
- [Project Structure](#-project-structure)
- [Author](#-author)
- [Contributing](#-contributing)
- [Disclaimer](#disclaimer)
- [License](#-license)

---

## ✨ Features
### 🧠 LangGraph State Machine
Five agents are wired into a linear `StateGraph`: Planner → Retriever → Grader → Generator → Critic.
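What a linear `StateGraph` does at run time can be sketched without the framework: one shared state dict flows through the five nodes in a fixed order, and each node returns a partial update that is merged into it. This is a simplified, framework-free illustration, not the project's `research_graph.py`: `run_linear_pipeline` and the stub agents are hypothetical, and the real nodes call Qwen 2.5-7B or the hybrid index. In LangGraph itself the same wiring is declared with `add_node` and `add_edge` on a `StateGraph` and then compiled with `compile()`.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Node = Callable[[State], State]


def run_linear_pipeline(nodes: List[Tuple[str, Node]], state: State) -> State:
    """Run each node in order and merge its partial update into the shared state.

    Roughly mirrors LangGraph's default for keys without a reducer:
    a node's returned value overwrites the previous value for that key.
    """
    visited: List[str] = []
    for name, node in nodes:
        update = node(state)
        state = {**state, **update}
        visited.append(name)
    return {**state, "trace": visited}


# Hypothetical stub agents: no LLM or vector-store calls, only state updates.
def planner(state: State) -> State:
    return {"sub_queries": [state["question"]]}


def retriever(state: State) -> State:
    return {"docs": [f"chunk for: {q}" for q in state["sub_queries"]]}


def grader(state: State) -> State:
    return {"relevant_docs": list(state["docs"])}


def generator(state: State) -> State:
    return {"answer": f"Answer based on {len(state['relevant_docs'])} chunk(s)"}


def critic(state: State) -> State:
    return {"verdict": "supported" if state["relevant_docs"] else "unsupported"}


PIPELINE = [
    ("planner", planner),
    ("retriever", retriever),
    ("grader", grader),
    ("generator", generator),
    ("critic", critic),
]

result = run_linear_pipeline(PIPELINE, {"question": "What is RRF?"})
print(result["trace"])   # ['planner', 'retriever', 'grader', 'generator', 'critic']
print(result["answer"])  # Answer based on 1 chunk(s)
```

Because each node returns only the keys it changes, nodes stay decoupled and can be tested one at a time.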
### 🔍 Hybrid RAG (FAISS + BM25)
Semantic vector search (FAISS) is combined with BM25 keyword search, and the two rankings are fused with Reciprocal Rank Fusion (RRF) for more precise retrieval.
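Reciprocal Rank Fusion only needs each retriever's ranked list, not its raw scores, which is why it can merge FAISS cosine similarities with BM25 scores that live on different scales. A minimal sketch, assuming each retriever returns an ordered list of chunk IDs (the IDs below are made up) and using the k=60 constant listed under ML Models; the project's own `vector_store.py` may structure this differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists of chunk IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-3 results from each retriever (chunk IDs only).
faiss_ranking = ["c3", "c1", "c7"]  # semantic (vector) search order
bm25_ranking = ["c1", "c9", "c3"]   # keyword (BM25) search order

fused = reciprocal_rank_fusion([faiss_ranking, bm25_ranking])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear in both lists (`c1`, `c3`) outrank chunks that only one retriever found (`c9`, `c7`), which is the behaviour that makes the fused ranking more robust than either list alone.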
### 🤖 Multi-Agent Orchestration
The Planner, Retriever, Grader, Generator, and Critic agents each have a specialized role. Only the Planner, Generator, and Critic call the LLM, so each query needs just 3 LLM calls.
### ⚡ Score-Based Grading
The Grader scores relevance from the hybrid search score plus keyword overlap instead of calling an LLM, which makes grading instant and deterministic.
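The grading formula given under ML Models is hybrid score × 0.7 + keyword overlap × 0.3. A minimal sketch of it, under assumptions the source does not state: the hybrid (RRF) score is divided by the best score in the batch so both terms lie in [0, 1] (raw RRF scores are small, at most 2/61 with two retrievers and k=60), keyword overlap is the fraction of query words found in the chunk, and chunks below a hypothetical threshold of 0.3 are dropped. `grade`, `keyword_overlap`, and `tokenize` are illustrative names, not the project's `grader.py`.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(query: str, chunk: str) -> float:
    """Fraction of query tokens that also appear in the chunk (0.0 to 1.0)."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(chunk)) / len(query_tokens)


def grade(query: str, hits: List[Tuple[str, float]],
          threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Keep hits whose 0.7 * normalized hybrid score + 0.3 * overlap >= threshold."""
    if not hits:
        return []
    max_score = max(score for _, score in hits) or 1.0  # avoid dividing by zero
    graded = []
    for text, score in hits:
        relevance = 0.7 * (score / max_score) + 0.3 * keyword_overlap(query, text)
        if relevance >= threshold:
            graded.append((text, round(relevance, 3)))
    return graded


hits = [
    ("RRF fuses hybrid search results", 0.032),  # hypothetical fused scores
    ("Docker image build steps", 0.008),
]
print(grade("hybrid search fusion", hits))  # [('RRF fuses hybrid search results', 0.9)]
```

No model call is involved, so the same query and hits always produce the same grades.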
### 📄 PDF & URL Ingestion
Upload PDF files up to 10 MB or paste any public URL; both are chunked, embedded, and indexed automatically.
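Ingestion in the project uses LangChain's `RecursiveCharacterTextSplitter`, which tries paragraph, line, and word boundaries before falling back to raw characters. The sketch below swaps in a simpler fixed-window splitter to show what chunking with overlap means, plus the size guard an upload endpoint needs. The chunk size (500), overlap (50), the helper names, and the reading of "10 MB" as 10 MiB are all assumptions, not the project's settings.

```python
from typing import List

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # "10 MB" read as 10 MiB (assumption)


def check_upload_size(data: bytes) -> None:
    """Reject uploads larger than the stated limit before any parsing work."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("PDF exceeds the 10 MB upload limit")


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows; neighbours share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


check_upload_size(b"%PDF-1.7 ...")  # a small payload passes silently
chunks = chunk_text("a" * 1200)
print([len(c) for c in chunks])  # [500, 500, 300]
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk. In Flask, a global request cap can also be set with `app.config["MAX_CONTENT_LENGTH"]`, which rejects oversized requests with a 413 error.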
### 🔒 Secure by Design
Stateless REST backend: no user data is persisted, and the HF token stays server-side only.
### 🐳 Containerized Deployment
Docker-first deployment with Gunicorn; the embedding model is pre-downloaded at build time for fast cold starts.
---

## 🏗️ Architecture

```
┌────────────────────────────────────────────────────────────┐
│  DocMind: LangGraph Flow                                   │
│                                                            │
│  PDF / URL ──▶ Ingestor ──▶ FAISS+BM25 Hybrid Vector Store │
│                                               │            │
│  User Query ──▶ [PLANNER Agent]               │            │
│                        │  (Qwen 2.5-7B, 0.3)  │            │
│                        ▼                      │            │
│                   [RETRIEVER] ◀───────────────┘            │
│                        │  (FAISS+BM25+RRF)                 │
│                        ▼                                   │
│                     [GRADER]                               │
│                        │  (score-based, no LLM call)       │
│                        ▼                                   │
│                   [GENERATOR]                              │
│                        │  (Qwen 2.5-7B, 0.4)               │
│                        ▼                                   │
│                     [CRITIC]                               │
│                        │  (Qwen 2.5-7B, 0.1)               │
│                        ▼                                   │
│                     [OUTPUT]  Flask API + Single-Page UI   │
└────────────────────────────────────────────────────────────┘
```

---

## 🚀 Getting Started

### Prerequisites

- Python 3.10+ · Docker · Git · Free HuggingFace account

### Local Installation

```bash
git clone https://github.com/mnoorchenar/docmind.git
cd docmind
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and set HF_TOKEN to your free HuggingFace Read token
python app.py
```

Open `http://localhost:7860` 🎉

### Getting your free HuggingFace token

1. Create a free account at [huggingface.co](https://huggingface.co)
2. Go to Settings → Access Tokens → New Token → Role: **Read**
3. Copy the token and set it as `HF_TOKEN` in your `.env` file or in your Space secrets

---

## 🐳 Docker Deployment

```bash
docker build -t docmind .
docker run -p 7860:7860 -e HF_TOKEN=hf_your_token_here docmind
```

---

## 📊 App Modules

| Module | Description | Status |
|--------|-------------|--------|
| 📤 Upload & Index | PDF / URL ingest, chunk, embed (local BAAI model), FAISS+BM25 index | ✅ Live |
| 🔍 Research Query | LangGraph 5-agent pipeline with real-time trace log | ✅ Live |

---

## 🧠 ML Models

```python
stack = {
    # ── LLM (LangChain LCEL chains) ──────────────────────────────
    "llm":           "Qwen/Qwen2.5-7B-Instruct",  # via HF Router
    "lcel_chain":    "ChatPromptTemplate | ChatOpenAI | StrOutputParser",
    "retry":         "ChatOpenAI.with_retry(stop_after_attempt=2)",

    # ── RAG (LangChain + custom hybrid) ──────────────────────────
    "splitter":      "RecursiveCharacterTextSplitter (langchain-text-splitters)",
    "documents":     "langchain_core.documents.Document",
    "embeddings":    "HuggingFaceEmbeddings (BAAI/bge-small-en-v1.5, local)",
    "vector_index":  "FAISS IndexFlatIP (cosine)",
    "keyword_index": "BM25Okapi (rank-bm25)",
    "fusion":        "Reciprocal Rank Fusion (RRF k=60)",
    "grader":        "score-based (hybrid score × 0.7 + keyword overlap × 0.3)",

    # ── Orchestration (LangGraph) ────────────────────────────────
    "graph":         "LangGraph 0.2 StateGraph, 5 nodes, linear pipeline",
}
```

---

## 📁 Project Structure

```
docmind/
├── 📄 app.py                   # Flask entry point, 5 REST routes
├── 📄 requirements.txt
├── 📄 Dockerfile               # Port 7860, embedding model pre-downloaded
├── 📄 .env.example
├── 📂 agents/
│   ├── 📄 llm_factory.py       # get_llm() → LangChain ChatOpenAI (HF Router)
│   ├── 📄 planner.py           # LCEL: ChatPromptTemplate | ChatOpenAI | StrOutputParser
│   ├── 📄 retriever.py         # Hybrid FAISS+BM25 search wrapper
│   ├── 📄 grader.py            # Score-based relevance grading (no LLM call)
│   ├── 📄 generator.py         # LCEL chain: cited answer generation
│   └── 📄 critic.py            # LCEL chain: hallucination detection
├── 📂 graph/
│   └── 📄 research_graph.py    # LangGraph StateGraph (5 nodes, linear pipeline)
├── 📂 rag/
│   ├── 📄 ingestor.py          # RecursiveCharacterTextSplitter + Document objects
│   ├── 📄 vector_store.py      # FAISS + BM25 + RRF, accepts Document or dict
│   └── 📄 embeddings.py        # LangChain HuggingFaceEmbeddings (bge-small-en-v1.5)
├── 📂 tracing/
│   └── 📄 tracer.py            # Thread-safe in-memory trace store
├── 📂 templates/
│   └── 📄 index.html           # Dark-mode single-page UI
└── 📂 docs/
    └── 📄 project-template.html  # Portfolio showcase page
```

---

## 👨‍💻 Author
Mohammad Noorchenarboo


Data Scientist | AI Researcher | Biostatistician

📍 Ontario, Canada · 📧 mohammadnoorchenarboo@gmail.com

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mnoorchenar) [![HuggingFace](https://img.shields.io/badge/HuggingFace-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces) [![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mnoorchenar)
---

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit your changes: `git commit -m 'Add amazing feature'`
4. Push the branch: `git push origin feature/amazing-feature`
5. Open a Pull Request

---

## Disclaimer

This project is developed strictly for educational and research purposes. All LLM outputs are AI-generated and may contain inaccuracies. No real user data is stored. The software is provided "as is", without warranty of any kind.

---

## 📜 License

Distributed under the **MIT License**.