title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false

Multimodal Math Mentor

AI-powered JEE Math Problem Solver with RAG, Multi-Agent System, HITL, and Memory.

Live Demo: huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor

Architecture

flowchart LR
    A[Input] --> B[Extractor]
    B -->|low conf| H1[HITL Review]
    H1 --> C
    B --> C[Guardrail]
    C -->|invalid| X[Reject]
    C --> D[Parser]
    D -->|ambiguous| H2[HITL Clarify]
    H2 --> D
    D --> E[Router]
    E --> F[RAG + Memory]
    F --> G[Solver]
    G --> V[Verifier]
    V -->|retry| G
    V -->|low conf| H3[HITL Verify]
    V --> K[Explainer]
    H3 --> K
    K --> M[Save to Memory]

Tech Stack

| Component | Technology |
| --- | --- |
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | HuggingFace Spaces |
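
Reciprocal rank fusion, named in the RAG Retrieval row, merges the FAISS and BM25 result lists using only each document's rank in each list. The following is a minimal sketch of the general technique, not the project's `rag/retriever.py`; the function name, the constant `k=60` (a commonly used default), and the file names are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. The value k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the vector and keyword retrievers.
faiss_hits = ["limits.md", "derivatives.md", "integrals.md"]
bm25_hits = ["derivatives.md", "matrices.md", "limits.md"]

fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, and no score normalization between FAISS distances and BM25 scores is needed because only ranks are used.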

Agents (7+)

| # | Agent | Role | Tools |
| --- | --- | --- | --- |
| 1 | Extractor | OCR/ASR/text input processing | EasyOCR, Whisper |
| 2 | Guardrail | Filters off-topic input and injection attempts | LLM |
| 3 | Parser | Structures problem into JSON | LLM |
| 4 | Router | Topic classification + strategy + tool selection | LLM |
| 5 | Solver | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | Verifier | Correctness + confidence check, triggers retries | SymPy, LLM |
| 7 | Explainer | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | Memory Saver | Stores problem-solution pairs for reuse | JSONL + embeddings |

HITL (Human-in-the-Loop) Interrupt Points

  • After extraction: Low OCR/ASR confidence → user reviews text
  • After parsing: Ambiguous input → user clarifies
  • After verification: Low confidence or incorrect → user decides

Agent Communication

All agents share state through a single MathMentorState TypedDict that LangGraph passes from node to node. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and a MemorySaver checkpointer to pause at an interrupt and resume afterwards.
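
A conditional edge comes down to a plain function that reads the shared state and returns the name of the next node. The sketch below shows what the branch after the Verifier could look like. The field names, the node names, the retry budget, and the confidence threshold are all assumptions for illustration; the real schema lives in `agents/state.py`.

```python
from typing import TypedDict


class MathMentorState(TypedDict, total=False):
    # Illustrative fields only; the project's actual schema may differ.
    problem_text: str
    solution: str
    is_correct: bool
    verifier_confidence: float
    retry_count: int


MAX_RETRIES = 2          # assumed retry budget
CONFIDENCE_FLOOR = 0.7   # assumed threshold for asking a human


def route_after_verifier(state: MathMentorState) -> str:
    """Pick the next node name the way a conditional edge would."""
    if not state.get("is_correct", False):
        # Incorrect: retry the solver while budget remains, else ask the user.
        if state.get("retry_count", 0) < MAX_RETRIES:
            return "solver"
        return "hitl_verify"
    if state.get("verifier_confidence", 0.0) < CONFIDENCE_FLOOR:
        return "hitl_verify"
    return "explainer"


print(route_after_verifier({"is_correct": True, "verifier_confidence": 0.9}))
```

In LangGraph, a function of this shape is typically registered with `add_conditional_edges` on the graph builder, so the branching logic stays testable without running the full pipeline.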

Setup

# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor

# Install dependencies
poetry install

# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings

# Build RAG index
poetry run python rag/indexer.py

# Run the app
poetry run python app.py

Configuration

Set these in .env or via the UI settings panel:

LLM_BASE_URL=http://localhost:11434/v1   # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3                          # Any model name
LLM_API_KEY=not-needed                    # API key if required
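
For readers wiring these variables into their own code, here is one way to read them with fallbacks. This is a sketch under assumptions, not the contents of `config.py` or `llm/client.py`: the `LLMSettings` class and `load_llm_settings` function are hypothetical, and the fallback values simply mirror the example above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env=None) -> LLMSettings:
    """Read the three LLM_* variables, falling back to the example values."""
    env = os.environ if env is None else env
    return LLMSettings(
        # Strip a trailing slash so later URL joins do not produce "//".
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1").rstrip("/"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )


settings = load_llm_settings({"LLM_MODEL": "my-model"})
print(settings.model, settings.base_url)
```

Passing the environment as a mapping keeps the loader easy to test and leaves room for values that come from the UI settings panel instead of `.env`.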

Features

  • Multimodal Input: Text, Image (EasyOCR), Audio (Whisper)
  • Human-in-the-Loop: 3 interrupt points (extraction, parsing, verification)
  • Memory & Self-Learning: Stores problems in JSONL, retrieves similar via cosine similarity, learns from user feedback
  • Hybrid RAG: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
  • Math Computation: SymPy sandbox (subprocess isolation) for reliable arithmetic
  • Web Search: DuckDuckGo integration when router determines it's needed
  • Streaming UI: Real-time pipeline progress bar with per-agent status
  • Multi-turn Chat: Follow-up questions with full conversation context
  • Diagrams: LLM-driven expression extraction → matplotlib plots
  • Solver Retries: Verifier feedback passed back to solver for self-correction
  • Guardrail: Prompt injection detection including OCR-injected attacks
  • Evaluation Suite: 25 curated test problems with batch runner and markdown reports
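
The memory feature above retrieves similar past problems by cosine similarity over stored embeddings. A minimal pure-Python sketch of that lookup follows. The record keys (`problem`, `embedding`), the function names, and the toy 3-dimensional vectors are assumptions; the project's `memory/store.py` format is not shown here, and real embeddings from all-MiniLM-L6-v2 are much longer.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_vec, jsonl_lines, top_k=1, min_score=0.0):
    """Return up to top_k (score, problem) pairs, best match first."""
    scored = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        score = cosine_similarity(query_vec, record["embedding"])
        if score >= min_score:
            scored.append((score, record["problem"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy memory with hypothetical records, one JSON object per line.
memory = [
    '{"problem": "Solve x^2 - 4 = 0", "embedding": [1.0, 0.0, 0.0]}',
    '{"problem": "Find P(A and B)", "embedding": [0.0, 1.0, 0.0]}',
]
best = most_similar([0.9, 0.1, 0.0], memory)
print(best[0][1])
```

A `min_score` cutoff is one simple way to avoid reusing a stored solution for a problem that is only loosely related.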

Evaluation

poetry run python eval/run_eval.py

Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) at 3 difficulty levels, then generates:

  • Console summary with per-topic accuracy
  • JSON report with full results
  • Markdown report table
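
The per-topic accuracy and the Markdown table can be derived from a flat list of results. The sketch below shows one way to do that. The result keys (`topic`, `correct`) and both function names are assumptions, the sample results are made up for illustration, and none of this is taken from `eval/run_eval.py`.

```python
from collections import defaultdict


def per_topic_accuracy(results):
    """Map each topic to the fraction of its problems marked correct."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for result in results:
        totals[result["topic"]] += 1
        correct[result["topic"]] += int(result["correct"])
    return {topic: correct[topic] / totals[topic] for topic in sorted(totals)}


def markdown_report(accuracy):
    """Render the accuracy mapping as a two-column Markdown table."""
    lines = ["| Topic | Accuracy |", "| --- | --- |"]
    for topic, value in accuracy.items():
        lines.append(f"| {topic} | {value:.0%} |")
    return "\n".join(lines)


# Made-up sample results, not output from the real suite.
results = [
    {"topic": "algebra", "correct": True},
    {"topic": "algebra", "correct": False},
    {"topic": "calculus", "correct": True},
]
accuracy = per_topic_accuracy(results)
report = markdown_report(accuracy)
print(report)
```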

Project Structure

Multimodal_Math_Mentor/
├── app.py                    # Gradio UI + event handlers
├── config.py                 # Settings, env loading, log suppression
├── agents/
│   ├── graph.py              # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py              # MathMentorState TypedDict
│   ├── guardrail_agent.py    # Input validation + injection detection
│   ├── parser_agent.py       # Structured problem extraction
│   ├── router_agent.py       # Topic/strategy/tool classification
│   ├── solver_agent.py       # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py     # Correctness verification
│   └── explainer_agent.py    # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py       # Text passthrough
│   ├── image_handler.py      # EasyOCR (lazy loaded)
│   └── audio_handler.py      # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py            # FAISS index builder
│   ├── retriever.py          # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/       # 20 math topic documents
├── memory/
│   ├── store.py              # JSONL read/write/feedback
│   └── retriever.py          # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py         # SymPy sandbox (subprocess)
│   ├── web_search.py         # DuckDuckGo search
│   └── plotter.py            # Matplotlib function plotter
├── llm/
│   └── client.py             # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py          # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json    # 25 curated problems
│   └── run_eval.py           # Batch evaluation + reporting
└── requirements.txt          # HF Spaces dependencies