🧠 DocMind — Agentic Research Platform

[![Python](https://img.shields.io/badge/Python-3.10+-3b82f6?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) [![LangGraph](https://img.shields.io/badge/LangGraph-0.2-06b6d4?style=for-the-badge)](https://github.com/langchain-ai/langgraph) [![LangChain](https://img.shields.io/badge/LangChain-0.3-4f46e5?style=for-the-badge)](https://langchain.com/) [![Flask](https://img.shields.io/badge/Flask-3.1-3b82f6?style=for-the-badge&logo=flask&logoColor=white)](https://flask.palletsprojects.com/) [![Docker](https://img.shields.io/badge/Docker-Ready-3b82f6?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/) [![HuggingFace](https://img.shields.io/badge/HuggingFace-Spaces-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces) [![Status](https://img.shields.io/badge/Status-Active-22c55e?style=for-the-badge)](#)

**🧠 DocMind** is a clean, minimal agentic document research platform. Five specialized LangGraph agents plan, retrieve, grade, generate, and critique answers from uploaded PDFs and web pages using hybrid search and Qwen 2.5-7B, all running at no cost on HuggingFace Spaces.

---

Mohammad Noorchenarboo

Data Scientist | AI Researcher | Biostatistician

📍 Ontario, Canada · 📧 mohammadnoorchenarboo@gmail.com

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mnoorchenar) [![HuggingFace](https://img.shields.io/badge/HuggingFace-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces) [![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mnoorchenar)

| Feature | Description |
|---|---|
| 🧠 LangGraph State Machine | Five agents wired into a linear StateGraph: Planner → Retriever → Grader → Generator → Critic. |
| 🔍 Hybrid RAG (FAISS + BM25) | Semantic vector search combined with BM25 keyword search, fused via Reciprocal Rank Fusion for precise retrieval. |
| 🤖 Multi-Agent Orchestration | Planner, Retriever, Grader, Generator, and Critic agents, each with a specialized role; only 3 LLM calls per query. |
| ⚡ Score-Based Grading | The Grader combines hybrid search scores with keyword overlap, so relevance scoring needs no LLM call and is fast and deterministic. |
| 📄 PDF & URL Ingestion | Upload PDF files up to 10 MB or paste any public URL; both are chunked, embedded, and indexed automatically. |
| 🔒 Secure by Design | Stateless REST backend, no user data persisted, HF token kept server-side only. |
| 🐳 Containerized Deployment | Docker-first with Gunicorn; the embedding model is pre-downloaded at build time for fast cold starts. |
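
To illustrate the hybrid retrieval feature above, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not DocMind's actual code: the function name `reciprocal_rank_fusion`, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration. Each input list stands in for a ranking that FAISS or BM25 might return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed default, not a DocMind setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the two retrievers (best match first).
semantic_hits = ["chunk_3", "chunk_1", "chunk_7"]  # e.g. from vector search
keyword_hits = ["chunk_1", "chunk_9", "chunk_3"]   # e.g. from BM25

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Because the fusion uses only ranks, the raw FAISS distances and BM25 scores never need to be put on a common scale, which is the usual reason for choosing this method.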
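
The score-based grading feature above can be sketched the same way. This is one possible reading, not the project's implementation: the helper names, the equal weighting, the `0.4` threshold, and the assumption that the retrieval score is already normalized to the range 0 to 1 are all hypothetical.

```python
import re


def keyword_overlap(query, text):
    """Return the fraction of distinct query terms that also occur in the text."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    text_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def grade_chunk(query, text, retrieval_score, weight=0.5, threshold=0.4):
    """Blend a retrieval score with keyword overlap; no LLM call is involved.

    retrieval_score is assumed to lie in [0, 1]. weight and threshold are
    illustrative values, not DocMind settings.
    """
    relevance = weight * retrieval_score + (1 - weight) * keyword_overlap(query, text)
    return relevance >= threshold, relevance


query = "hybrid search fusion"
keep, relevance = grade_chunk(
    query, "Hybrid search combines BM25 and vector search.", retrieval_score=0.8
)
drop, low_relevance = grade_chunk(
    query, "Docker builds the image.", retrieval_score=0.1
)
print(keep, round(relevance, 3))
print(drop, round(low_relevance, 3))
```

The same inputs always produce the same grade, which is what makes a grader of this kind deterministic and cheap compared with asking an LLM to judge relevance.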