title: DocMind-Agentic-Research
colorFrom: blue
colorTo: indigo
sdk: docker

🧠 DocMind — Agentic Research Platform

A clean, minimal agentic document research platform. Five specialized LangGraph agents plan, retrieve, grade, generate, and critique answers from uploaded PDFs and web pages using hybrid search and Qwen 2.5-7B, all running at no cost on HuggingFace Spaces.

Features
Architecture
Getting Started
Docker Deployment
App Modules
ML Models
Project Structure
Author
Contributing
Disclaimer
License

✨ Features

🧠 LangGraph State Machine	Five agents wired into a linear StateGraph — Planner → Retriever → Grader → Generator → Critic.
🔍 Hybrid RAG (FAISS + BM25)	Semantic vector search combined with BM25 keyword search, fused via Reciprocal Rank Fusion for precision retrieval.
🤖 Multi-Agent Orchestration	Planner, Retriever, Grader, Generator, and Critic agents, each with a specialized role; only 3 LLM calls per query (Planner, Generator, and Critic).
⚡ Score-Based Grading	Grader uses hybrid search scores + keyword overlap — no LLM call needed, instant and deterministic relevance scoring.
📄 PDF & URL Ingestion	Upload PDF files up to 10 MB or paste any public URL — both are chunked, embedded, and indexed automatically.
🔒 Secure by Design	Stateless REST backend, no user data persisted, HF token kept server-side only.
🐳 Containerized Deployment	Docker-first with Gunicorn, embedding model pre-downloaded at build time for fast cold starts.
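
The fusion step named in the Hybrid RAG feature can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not the project's own code: the function name, the chunk IDs, and the two input rankings are hypothetical, and only the constant k=60 comes from this README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


faiss_ranking = ["c2", "c1", "c3"]  # hypothetical semantic-search order
bm25_ranking = ["c1", "c4", "c2"]   # hypothetical keyword-search order
fused = reciprocal_rank_fusion([faiss_ranking, bm25_ranking])
```

Chunks that appear in both lists accumulate two reciprocal-rank terms, so they rise above chunks that only one retriever found.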

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                   DocMind — LangGraph Flow                    │
│                                                              │
│  PDF / URL ──▶ Ingestor ──▶ FAISS+BM25 Hybrid Vector Store  │
│                                    │                         │
│  User Query ──▶ [PLANNER Agent]    │   (Qwen 2.5-7B, 0.3)   │
│                      │             │                         │
│                 [RETRIEVER] ◀──────┘  (FAISS+BM25+RRF)      │
│                      │                                       │
│                 [GRADER]  (score-based, no LLM call)         │
│                      │                                       │
│                 [GENERATOR]         (Qwen 2.5-7B, 0.4)       │
│                      │                                       │
│                  [CRITIC]           (Qwen 2.5-7B, 0.1)       │
│                      │                                       │
│                  [OUTPUT]  Flask API + Single-Page UI         │
└──────────────────────────────────────────────────────────────┘
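
The linear flow in the diagram can be sketched in plain Python as a list of functions that each take and return a shared state dictionary. This is an illustration of the control flow only: it does not use the LangGraph API, every node body is a trivial stand-in (no LLM or index is called), and the state keys are hypothetical.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def planner(state: State) -> State:
    # Stand-in for the LLM planner: treat the query as a single sub-question.
    return {**state, "plan": [state["query"]]}


def retriever(state: State) -> State:
    # Stand-in for hybrid search: keep chunks sharing a word with the query.
    words = set(state["query"].lower().split())
    hits = [c for c in state["corpus"] if words & set(c.lower().split())]
    return {**state, "chunks": hits}


def grader(state: State) -> State:
    # Stand-in for score-based grading: keep at most the first two chunks.
    return {**state, "graded": state["chunks"][:2]}


def generator(state: State) -> State:
    # Stand-in for the LLM generator: concatenate the graded chunks.
    return {**state, "answer": " ".join(state["graded"])}


def critic(state: State) -> State:
    # Stand-in for the LLM critic: flag answers with no supporting chunks.
    return {**state, "supported": bool(state["graded"])}


PIPELINE: List[Node] = [planner, retriever, grader, generator, critic]


def run_pipeline(query: str, corpus: List[str]) -> State:
    state: State = {"query": query, "corpus": corpus, "trace": []}
    for node in PIPELINE:
        state = node(state)
        # Record each completed step, loosely mirroring a trace log.
        state["trace"] = state["trace"] + [node.__name__]
    return state


result = run_pipeline(
    "what is bm25",
    ["BM25 is a ranking function", "FAISS indexes vectors"],
)
```

Because the pipeline is linear, each node only needs the keys written by the nodes before it, which keeps every agent independently testable.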

🚀 Getting Started

Prerequisites

Python 3.10+ · Docker · Git · Free HuggingFace account

Local Installation

git clone https://github.com/mnoorchenar/docmind.git
cd docmind

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt

cp .env.example .env
# Edit .env — set HF_TOKEN to your free HuggingFace Read token

python app.py

Open http://localhost:7860 🎉

Getting your free HuggingFace token

Create a free account at huggingface.co
Go to Settings → Access Tokens → New Token → Role: Read
Copy the token and set it as HF_TOKEN in your .env file or Space secrets

🐳 Docker Deployment

docker build -t docmind .
docker run -p 7860:7860 -e HF_TOKEN=hf_your_token_here docmind

📊 App Modules

Module	Description	Status
📤 Upload & Index	PDF / URL ingest, chunk, embed (local BAAI model), FAISS+BM25 index	✅ Live
🔍 Research Query	LangGraph 5-agent pipeline with real-time trace log	✅ Live
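
The chunking step of Upload & Index can be pictured with a simplified fixed-size splitter. The project itself lists RecursiveCharacterTextSplitter, which splits on separators recursively; the sketch below is a plainer stand-in, and the function name and the default sizes (500 characters, 50 overlap) are assumptions rather than the project's settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of indexing some text twice.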

🧠 ML Models

stack = {
    # ── LLM (LangChain LCEL chains) ──────────────────────────────────────────
    "llm":             "Qwen/Qwen2.5-7B-Instruct",         # via HF Router
    "lcel_chain":      "ChatPromptTemplate | ChatOpenAI | StrOutputParser",
    "retry":           "ChatOpenAI.with_retry(stop_after_attempt=2)",

    # ── RAG (LangChain + custom hybrid) ──────────────────────────────────────
    "splitter":        "RecursiveCharacterTextSplitter (langchain-text-splitters)",
    "documents":       "langchain_core.documents.Document",
    "embeddings":      "HuggingFaceEmbeddings (BAAI/bge-small-en-v1.5, local)",
    "vector_index":    "FAISS IndexFlatIP (inner product; cosine on normalized vectors)",
    "keyword_index":   "BM25Okapi (rank-bm25)",
    "fusion":          "Reciprocal Rank Fusion (RRF k=60)",
    "grader":          "score-based (hybrid score × 0.7 + keyword overlap × 0.3)",

    # ── Orchestration (LangGraph) ─────────────────────────────────────────────
    "graph":           "LangGraph 0.2 StateGraph — 5 nodes, linear pipeline",
}
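
The grader formula listed in the stack can be sketched as a weighted sum. Only the 0.7 and 0.3 weights come from this README. The sketch assumes the hybrid score is already scaled to [0, 1] and defines keyword overlap as the fraction of query words found in the chunk; both are assumptions, and the function names are hypothetical.

```python
def keyword_overlap(query: str, chunk: str) -> float:
    """Fraction of distinct query words that also occur in the chunk (assumed definition)."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words)


def grade(hybrid_score: float, query: str, chunk: str) -> float:
    """Relevance = hybrid score x 0.7 + keyword overlap x 0.3, with no LLM call."""
    return 0.7 * hybrid_score + 0.3 * keyword_overlap(query, chunk)


score = grade(0.5, "hybrid search fusion", "Hybrid search combines FAISS and BM25")
```

Here two of the three query words appear in the chunk, so the result is 0.7 × 0.5 + 0.3 × 2/3, roughly 0.55. The same inputs always give the same score, which is what makes this kind of grading deterministic.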

📁 Project Structure

docmind/
├── 📄 app.py                     # Flask entry point, 5 REST routes
├── 📄 requirements.txt
├── 📄 Dockerfile                 # Port 7860, embedding model pre-downloaded
├── 📄 .env.example
├── 📂 agents/
│   ├── 📄 llm_factory.py         # get_llm() → LangChain ChatOpenAI (HF Router)
│   ├── 📄 planner.py             # LCEL: ChatPromptTemplate | ChatOpenAI | StrOutputParser
│   ├── 📄 retriever.py           # Hybrid FAISS+BM25 search wrapper
│   ├── 📄 grader.py              # Score-based relevance grading (no LLM call)
│   ├── 📄 generator.py           # LCEL chain — cited answer generation
│   └── 📄 critic.py              # LCEL chain — hallucination detection
├── 📂 graph/
│   └── 📄 research_graph.py      # LangGraph StateGraph (5 nodes, linear pipeline)
├── 📂 rag/
│   ├── 📄 ingestor.py            # RecursiveCharacterTextSplitter + Document objects
│   ├── 📄 vector_store.py        # FAISS + BM25 + RRF, accepts Document or dict
│   └── 📄 embeddings.py          # LangChain HuggingFaceEmbeddings (bge-small-en-v1.5)
├── 📂 tracing/
│   └── 📄 tracer.py              # Thread-safe in-memory trace store
├── 📂 templates/
│   └── 📄 index.html             # Dark-mode single-page UI
└── 📂 docs/
    └── 📄 project-template.html  # Portfolio showcase page

👨‍💻 Author

Mohammad Noorchenarboo

Data Scientist | AI Researcher | Biostatistician
📍 Ontario, Canada
📧 mohammadnoorchenarboo@gmail.com

🤝 Contributing

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Commit: git commit -m 'Add amazing feature'
Push: git push origin feature/amazing-feature
Open a Pull Request

Disclaimer

This project is developed strictly for educational and research purposes. All LLM outputs are AI-generated and may contain inaccuracies. No real user data is stored. Provided "as is" without warranty of any kind.

📜 License

Distributed under the MIT License.