| title: DocMind-Agentic-Research | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| <div align="center"> | |
| <h1>🧠 DocMind: Agentic Research Platform</h1> | |
| <img src="https://readme-typing-svg.demolab.com?font=Fira+Code&size=22&duration=3000&pause=1000&color=4f8ef7&center=true&vCenter=true&width=700&lines=LangGraph+%C2%B7+5+Agents+%C2%B7+Hybrid+RAG;Qwen+2.5-7B+%C2%B7+3+LLM+Calls+per+Query;Deployed+Free+on+HuggingFace+Spaces" alt="Typing SVG"/> | |
| <br/> | |
| [Python](https://www.python.org/) | |
| [LangGraph](https://github.com/langchain-ai/langgraph) | |
| [LangChain](https://langchain.com/) | |
| [Flask](https://flask.palletsprojects.com/) | |
| [Docker](https://www.docker.com/) | |
| [HuggingFace Spaces](https://huggingface.co/mnoorchenar/spaces) | |
| <br/> | |
| **🧠 DocMind** is a clean, minimal agentic document research platform. Five specialized LangGraph agents plan, retrieve, grade, generate, and critique answers from uploaded PDFs and web pages using hybrid search and Qwen 2.5-7B, all running free on HuggingFace Spaces. | |
| <br/> | |
| --- | |
| </div> | |
| ## Table of Contents | |
| - [Features](#-features) | |
| - [Architecture](#️-architecture) | |
| - [Getting Started](#-getting-started) | |
| - [Docker Deployment](#-docker-deployment) | |
| - [App Modules](#-app-modules) | |
| - [ML Models](#-ml-models) | |
| - [Project Structure](#-project-structure) | |
| - [Author](#-author) | |
| - [Contributing](#-contributing) | |
| - [Disclaimer](#disclaimer) | |
| - [License](#-license) | |
| --- | |
| ## ✨ Features | |
| <table> | |
| <tr><td>🧠 <b>LangGraph State Machine</b></td><td>Five agents wired into a linear StateGraph: Planner → Retriever → Grader → Generator → Critic.</td></tr> | |
| <tr><td>🔍 <b>Hybrid RAG (FAISS + BM25)</b></td><td>Semantic vector search combined with BM25 keyword search; the two rankings are merged with Reciprocal Rank Fusion.</td></tr> | |
| <tr><td>🤖 <b>Multi-Agent Orchestration</b></td><td>Planner, Retriever, Grader, Generator, and Critic agents each have a specialized role. Only the Planner, Generator, and Critic call the LLM, so a query costs 3 LLM calls.</td></tr> | |
| <tr><td>⚡ <b>Score-Based Grading</b></td><td>The Grader combines hybrid search scores with keyword overlap, so relevance scoring needs no LLM call and is fast and deterministic.</td></tr> | |
| <tr><td>📄 <b>PDF & URL Ingestion</b></td><td>Upload PDF files up to 10 MB or paste any public URL; both are chunked, embedded, and indexed automatically.</td></tr> | |
| <tr><td>🔒 <b>Secure by Design</b></td><td>Stateless REST backend, no user data persisted, HF token kept server-side only.</td></tr> | |
| <tr><td>🐳 <b>Containerized Deployment</b></td><td>Docker-first with Gunicorn; the embedding model is pre-downloaded at build time for fast cold starts.</td></tr> | |
| </table> | |
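The fusion step named in the table can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not the project's `vector_store.py`; the function name `reciprocal_rank_fusion`, the chunk ids, and the two input rankings are hypothetical. Only `k=60` comes from this README.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical result orders from the two retrievers.
semantic_hits = ["chunk-3", "chunk-1", "chunk-7"]  # e.g. FAISS order
keyword_hits = ["chunk-1", "chunk-9", "chunk-3"]   # e.g. BM25 order

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, the raw FAISS and BM25 scores never need to be put on a common scale.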
| --- | |
| ## 🏗️ Architecture | |
| ``` | |
| ┌──────────────────────────────────────────────────────────────┐ | |
| │                   DocMind - LangGraph Flow                   │ | |
| │                                                              │ | |
| │  PDF / URL ──▶ Ingestor ──▶ FAISS+BM25 Hybrid Vector Store   │ | |
| │                                        │                     │ | |
| │  User Query ──▶ [PLANNER Agent]        │  (Qwen 2.5-7B, 0.3) │ | |
| │                        │               │                     │ | |
| │                   [RETRIEVER] ◀────────┘  (FAISS+BM25+RRF)   │ | |
| │                        │                                     │ | |
| │                    [GRADER]     (score-based, no LLM call)   │ | |
| │                        │                                     │ | |
| │                   [GENERATOR]   (Qwen 2.5-7B, 0.4)           │ | |
| │                        │                                     │ | |
| │                    [CRITIC]     (Qwen 2.5-7B, 0.1)           │ | |
| │                        │                                     │ | |
| │                    [OUTPUT]     Flask API + Single-Page UI   │ | |
| └──────────────────────────────────────────────────────────────┘ | |
| ``` | |
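The flow in the diagram is a straight line: each agent reads the shared state and passes an updated copy on. The sketch below shows that shape in plain Python so it runs without dependencies. It is a stand-in for the LangGraph `StateGraph` the project uses, and every node body, state key, and the `run_pipeline` helper are hypothetical placeholders, not the real agents.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]

def planner(state: State) -> State:
    # Placeholder: the real planner calls the LLM to plan the research.
    return {**state, "plan": [state["query"]]}

def retriever(state: State) -> State:
    # Placeholder: the real retriever runs hybrid FAISS+BM25 search.
    return {**state, "chunks": ["chunk about " + q for q in state["plan"]]}

def grader(state: State) -> State:
    # Placeholder: keeps every chunk; the real grader filters by score.
    return {**state, "relevant": list(state["chunks"])}

def generator(state: State) -> State:
    # Placeholder: the real generator calls the LLM to write a cited answer.
    return {**state, "answer": f"Answer based on {len(state['relevant'])} chunk(s)."}

def critic(state: State) -> State:
    # Placeholder: the real critic calls the LLM to check the answer.
    return {**state, "approved": bool(state["relevant"])}

# Linear pipeline: the order matches the diagram above.
PIPELINE: List[Node] = [planner, retriever, grader, generator, critic]

def run_pipeline(query: str) -> State:
    state: State = {"query": query, "trace": []}
    for node in PIPELINE:
        state = node(state)
        # Record which node ran, similar in spirit to a trace log.
        state["trace"] = state["trace"] + [node.__name__]
    return state

result = run_pipeline("What is hybrid RAG?")
print(result["trace"])
print(result["answer"])
```

In LangGraph the same ordering would be expressed as nodes joined by edges; the plain loop here only illustrates how state moves through five steps.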
| --- | |
| ## 🚀 Getting Started | |
| ### Prerequisites | |
| - Python 3.10+ · Docker · Git · Free HuggingFace account | |
| ### Local Installation | |
| ```bash | |
| git clone https://github.com/mnoorchenar/docmind.git | |
| cd docmind | |
| python -m venv venv | |
| source venv/bin/activate # Windows: venv\Scripts\activate | |
| pip install -r requirements.txt | |
| cp .env.example .env | |
| # Edit .env: set HF_TOKEN to your free HuggingFace Read token | |
| python app.py | |
| ``` | |
| Open `http://localhost:7860` in your browser. | |
| ### Getting your free HuggingFace token | |
| 1. Create a free account at [huggingface.co](https://huggingface.co) | |
| 2. Go to Settings → Access Tokens → New Token → Role: **Read** | |
| 3. Copy the token and set it as `HF_TOKEN` in your `.env` file or Space secrets | |
| --- | |
| ## 🐳 Docker Deployment | |
| ```bash | |
| docker build -t docmind . | |
| docker run -p 7860:7860 -e HF_TOKEN=hf_your_token_here docmind | |
| ``` | |
| --- | |
| ## 📊 App Modules | |
| | Module | Description | Status | | |
| |--------|-------------|--------| | |
| 📤 Upload & Index | PDF / URL ingest, chunk, embed (local BAAI model), FAISS+BM25 index | ✅ Live | |
| 🔍 Research Query | LangGraph 5-agent pipeline with real-time trace log | ✅ Live | |
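The "chunk" step in Upload & Index splits extracted text into overlapping pieces before embedding. The project uses LangChain's `RecursiveCharacterTextSplitter`; the sketch below is a simpler fixed-size sliding window that only illustrates the idea of overlap. The function name `chunk_text` and the sizes used are hypothetical, not the project's settings.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks that share `overlap` characters.

    A simplified stand-in for a recursive splitter: it cuts on character
    counts only and ignores paragraph or sentence boundaries.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Tiny example so the overlap is easy to see.
chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

Each chunk begins with the last character of the previous one, which is the overlap that keeps context from being lost at chunk boundaries.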
| --- | |
| ## 🧠 ML Models | |
| ```python | |
| stack = { | |
| # ── LLM (LangChain LCEL chains) ────────────────────────────────────────── | |
| "llm": "Qwen/Qwen2.5-7B-Instruct", # via HF Router | |
| "lcel_chain": "ChatPromptTemplate | ChatOpenAI | StrOutputParser", | |
| "retry": "ChatOpenAI.with_retry(stop_after_attempt=2)", | |
| # ── RAG (LangChain + custom hybrid) ────────────────────────────────────── | |
| "splitter": "RecursiveCharacterTextSplitter (langchain-text-splitters)", | |
| "documents": "langchain_core.documents.Document", | |
| "embeddings": "HuggingFaceEmbeddings (BAAI/bge-small-en-v1.5, local)", | |
| "vector_index": "FAISS IndexFlatIP (cosine)", | |
| "keyword_index": "BM25Okapi (rank-bm25)", | |
| "fusion": "Reciprocal Rank Fusion (RRF k=60)", | |
| "grader": "score-based (hybrid score × 0.7 + keyword overlap × 0.3)", | |
| # ── Orchestration (LangGraph) ──────────────────────────────────────────── | |
| "graph": "LangGraph 0.2 StateGraph, 5 nodes, linear pipeline", | |
| } | |
| ``` | |
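The grader formula in the stack above (hybrid score × 0.7 + keyword overlap × 0.3) can be worked through on a small example. This is a sketch under stated assumptions: the hybrid score is taken to be already scaled to the range 0 to 1, keyword overlap is defined here as the fraction of query terms found in the chunk, and the `0.5` threshold is hypothetical. Only the two weights come from this README.

```python
import re

def keyword_overlap(query, chunk):
    """Fraction of distinct query terms that also occur in the chunk."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    if not query_terms:
        return 0.0
    chunk_terms = set(re.findall(r"\w+", chunk.lower()))
    return len(query_terms & chunk_terms) / len(query_terms)

def grade(query, chunk, hybrid_score, threshold=0.5):
    """Return (score, is_relevant) using the 0.7 / 0.3 weighting."""
    score = 0.7 * hybrid_score + 0.3 * keyword_overlap(query, chunk)
    return score, score >= threshold

# Two of the three query terms ("hybrid", "fusion") appear in the chunk.
score, relevant = grade(
    "hybrid search fusion",
    "Reciprocal rank fusion merges hybrid results.",
    hybrid_score=0.8,
)
print(round(score, 2), relevant)
```

The score is a pure function of its inputs, which is why grading needs no LLM call and gives the same result every time.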
| --- | |
| ## 📁 Project Structure | |
| ``` | |
| docmind/ | |
| ├── app.py                       # Flask entry point, 5 REST routes | |
| ├── requirements.txt | |
| ├── Dockerfile                   # Port 7860, embedding model pre-downloaded | |
| ├── .env.example | |
| ├── agents/ | |
| │   ├── llm_factory.py           # get_llm() → LangChain ChatOpenAI (HF Router) | |
| │   ├── planner.py               # LCEL: ChatPromptTemplate | ChatOpenAI | StrOutputParser | |
| │   ├── retriever.py             # Hybrid FAISS+BM25 search wrapper | |
| │   ├── grader.py                # Score-based relevance grading (no LLM call) | |
| │   ├── generator.py             # LCEL chain: cited answer generation | |
| │   └── critic.py                # LCEL chain: hallucination detection | |
| ├── graph/ | |
| │   └── research_graph.py        # LangGraph StateGraph (5 nodes, linear pipeline) | |
| ├── rag/ | |
| │   ├── ingestor.py              # RecursiveCharacterTextSplitter + Document objects | |
| │   ├── vector_store.py          # FAISS + BM25 + RRF, accepts Document or dict | |
| │   └── embeddings.py            # LangChain HuggingFaceEmbeddings (bge-small-en-v1.5) | |
| ├── tracing/ | |
| │   └── tracer.py                # Thread-safe in-memory trace store | |
| ├── templates/ | |
| │   └── index.html               # Dark-mode single-page UI | |
| └── docs/ | |
|     └── project-template.html    # Portfolio showcase page | |
| ``` | |
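The tree lists `tracing/tracer.py` as a thread-safe in-memory trace store. One common way to build such a store is a dictionary guarded by a lock, as sketched below. The class name `TraceStore`, its methods, and the event fields are hypothetical and may differ from the actual module.

```python
import threading
import time
from collections import defaultdict

class TraceStore:
    """Keep per-run trace events in memory, safe to use from several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = defaultdict(list)

    def log(self, run_id, agent, message):
        event = {"ts": time.time(), "agent": agent, "message": message}
        # The lock serializes writers so concurrent appends cannot interleave.
        with self._lock:
            self._events[run_id].append(event)

    def get(self, run_id):
        # Return a copy so callers cannot mutate the stored list.
        with self._lock:
            return list(self._events.get(run_id, []))

store = TraceStore()
threads = [
    threading.Thread(target=store.log, args=("run-1", f"agent-{i}", "done"))
    for i in range(5)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(store.get("run-1")))
```

Because everything lives in process memory, traces disappear on restart, which fits the README's statement that no user data is persisted.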
| --- | |
| ## 👨‍💻 Author | |
| <div align="center"> | |
| <table><tr><td align="center" width="100%"> | |
| <img src="https://avatars.githubusercontent.com/mnoorchenar" width="120" style="border-radius:50%;border:3px solid #4f46e5" alt="Mohammad Noorchenarboo"/> | |
| <h3>Mohammad Noorchenarboo</h3> | |
| <code>Data Scientist</code> | <code>AI Researcher</code> | <code>Biostatistician</code> | |
| 📍 Ontario, Canada · 📧 mohammadnoorchenarboo@gmail.com | |
| [LinkedIn](https://www.linkedin.com/in/mnoorchenar) | |
| [HuggingFace Spaces](https://huggingface.co/mnoorchenar/spaces) | |
| [GitHub](https://github.com/mnoorchenar) | |
| </td></tr></table> | |
| </div> | |
| --- | |
| ## 🤝 Contributing | |
| 1. Fork the repository | |
| 2. Create a feature branch: `git checkout -b feature/amazing-feature` | |
| 3. Commit: `git commit -m 'Add amazing feature'` | |
| 4. Push: `git push origin feature/amazing-feature` | |
| 5. Open a Pull Request | |
| --- | |
| ## Disclaimer | |
| <span style="color:red">This project is developed strictly for educational and research purposes. All LLM outputs are AI-generated and may contain inaccuracies. No real user data is stored. Provided "as is" without warranty of any kind.</span> | |
| --- | |
| ## 📄 License | |
| Distributed under the **MIT License**. | |
| <div align="center"> | |
| <img src="https://capsule-render.vercel.app/api?type=waving&color=0:3b82f6,100:4f46e5&height=120&section=footer&text=Made%20with%20%E2%9D%A4%EF%B8%8F%20by%20Mohammad%20Noorchenarboo&fontColor=ffffff&fontSize=18&fontAlignY=80" width="100%"/> | |
| </div> |