---
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.23.0"
app_file: app.py
pinned: false
---

# Multimodal Math Mentor

An AI-powered solver for JEE math problems that combines retrieval-augmented generation (RAG), a multi-agent pipeline, human-in-the-loop (HITL) review, and memory.

**Live Demo:** [huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor](https://huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor)

## Architecture

```mermaid
flowchart LR
    A[Input] --> B[Extractor]
    B -->|low conf| H1[HITL Review]
    H1 --> C
    B --> C[Guardrail]
    C -->|invalid| X[Reject]
    C --> D[Parser]
    D -->|ambiguous| H2[HITL Clarify]
    H2 --> D
    D --> E[Router]
    E --> F[RAG + Memory]
    F --> G[Solver]
    G --> V[Verifier]
    V -->|retry| G
    V -->|low conf| H3[HITL Verify]
    V --> K[Explainer]
    H3 --> K
    K --> M[Save to Memory]
```

## Tech Stack

| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | Hugging Face Spaces |
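
The hybrid retrieval row above merges a FAISS ranking and a BM25 ranking with reciprocal rank fusion. The following is a minimal sketch of that fusion step, not the project's actual `rag/retriever.py`: the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. The value k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical document IDs returned by each retriever, best match first.
faiss_hits = ["quadratics", "binomial", "limits"]
bm25_hits = ["binomial", "matrices", "quadratics"]

fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept, which is the usual motivation for fusing vector and keyword search.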

## Agents (8)

| # | Agent | Role | Tools |
|---|-------|------|-------|
| 1 | **Extractor** | OCR/ASR/text input processing | EasyOCR, Whisper |
| 2 | **Guardrail** | Filters off-topic input and prompt-injection attempts | LLM |
| 3 | **Parser** | Structures problem into JSON | LLM |
| 4 | **Router** | Topic classification + strategy + tool selection | LLM |
| 5 | **Solver** | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | **Verifier** | Correctness + confidence check, triggers retries | SymPy, LLM |
| 7 | **Explainer** | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | **Memory Saver** | Stores problem-solution pairs for reuse | JSONL + embeddings |

### HITL (Human-in-the-Loop) Interrupt Points
- **After extraction**: Low OCR/ASR confidence → user reviews text
- **After parsing**: Ambiguous input → user clarifies
- **After verification**: Low confidence or an incorrect result → user decides how to proceed

### Agent Communication
All agents share state through a single `MathMentorState` TypedDict that LangGraph passes from node to node. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and LangGraph's `MemorySaver` checkpointer to pause and resume at the HITL interrupt points.
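
As a rough illustration of the branching described above, the sketch below shows a shared state type and two routing functions of the kind that LangGraph's `StateGraph.add_conditional_edges` accepts. The field names, node names, threshold, and retry limit are all assumptions for this example and are not taken from `agents/state.py` or `agents/graph.py`.

```python
from typing import Literal, TypedDict


class MathMentorState(TypedDict, total=False):
    # Illustrative fields only; the project's real schema may differ.
    problem_text: str
    is_valid: bool
    is_correct: bool
    confidence: float
    retries: int


CONFIDENCE_THRESHOLD = 0.7  # assumed value
MAX_RETRIES = 2             # assumed value


def route_after_guardrail(state: MathMentorState) -> Literal["parser", "reject"]:
    """Guardrail pass/fail: continue to the parser or reject the input."""
    return "parser" if state.get("is_valid", False) else "reject"


def route_after_verifier(
    state: MathMentorState,
) -> Literal["solver", "hitl_verify", "explainer"]:
    """Retry the solver, hand off to a human, or proceed to the explainer."""
    is_correct = state.get("is_correct", False)
    if not is_correct and state.get("retries", 0) < MAX_RETRIES:
        return "solver"  # feed verifier feedback back for another attempt
    if not is_correct or state.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return "hitl_verify"  # retries exhausted or low confidence
    return "explainer"


print(route_after_verifier({"is_correct": True, "confidence": 0.9}))
```

Because each routing function reads only the shared state and returns a node name, it can be unit-tested without building the graph or calling an LLM.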

## Setup

```bash
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor

# Install dependencies
poetry install

# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings

# Build RAG index
poetry run python rag/indexer.py

# Run the app
poetry run python app.py
```

## Configuration

Set these in `.env` or via the UI settings panel:

```
LLM_BASE_URL=http://localhost:11434/v1   # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3                          # Any model name
LLM_API_KEY=not-needed                    # API key if required
```
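
One way these variables might be read at startup is sketched below. This is not the project's `config.py`: the `LLMSettings` class and `load_llm_settings` function are hypothetical names, and the fallback defaults simply mirror the example values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env=None):
    """Read LLM settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return LLMSettings(
        # Fallbacks are assumptions that mirror the example .env values.
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )


settings = load_llm_settings({"LLM_MODEL": "my-model"})
print(settings.model, settings.base_url)
```

Passing the mapping explicitly keeps the loader easy to test and lets a UI settings panel supply overrides without mutating the process environment.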

## Features

- **Multimodal Input**: Text, Image (EasyOCR), Audio (Whisper)
- **Human-in-the-Loop**: 3 interrupt points (extraction, parsing, verification)
- **Memory & Self-Learning**: Stores solved problems in JSONL, retrieves similar past problems via cosine similarity, and learns from user feedback
- **Hybrid RAG**: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
- **Math Computation**: SymPy sandbox (subprocess isolation) for reliable arithmetic
- **Web Search**: DuckDuckGo integration, used when the router determines it is needed
- **Streaming UI**: Real-time pipeline progress bar with per-agent status
- **Multi-turn Chat**: Follow-up questions with full conversation context
- **Diagrams**: LLM-driven expression extraction → matplotlib plots
- **Solver Retries**: Verifier feedback passed back to solver for self-correction
- **Guardrail**: Prompt injection detection including OCR-injected attacks
- **Evaluation Suite**: 25 curated test problems with batch runner and markdown reports
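
The memory lookup listed above (JSONL storage plus cosine similarity) can be sketched as follows. This is an assumption-laden illustration rather than the code in `memory/`: the record fields `problem` and `embedding` are hypothetical, and toy 3-dimensional vectors stand in for real sentence embeddings.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_embedding, jsonl_lines, top_k=1):
    """Return the top_k stored records closest to the query embedding."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(query_embedding, r["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical memory file contents, one JSON record per line.
memory_lines = [
    '{"problem": "Solve x^2 - 5x + 6 = 0", "embedding": [0.9, 0.1, 0.0]}',
    '{"problem": "Find P(two heads in three tosses)", "embedding": [0.0, 0.2, 0.9]}',
]

best = most_similar([1.0, 0.0, 0.0], memory_lines)
print(best[0]["problem"])
```

A linear scan like this is adequate for a small memory file; a larger store would more likely reuse a vector index.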

## Evaluation

```bash
poetry run python eval/run_eval.py
```

Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) at 3 difficulty levels and generates:
- Console summary with per-topic accuracy
- JSON report with full results
- Markdown report table

## Project Structure

```
Multimodal_Math_Mentor/
├── app.py                    # Gradio UI + event handlers
├── config.py                 # Settings, env loading, log suppression
├── agents/
│   ├── graph.py              # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py              # MathMentorState TypedDict
│   ├── guardrail_agent.py    # Input validation + injection detection
│   ├── parser_agent.py       # Structured problem extraction
│   ├── router_agent.py       # Topic/strategy/tool classification
│   ├── solver_agent.py       # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py     # Correctness verification
│   └── explainer_agent.py    # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py       # Text passthrough
│   ├── image_handler.py      # EasyOCR (lazy loaded)
│   └── audio_handler.py      # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py             # FAISS index builder
│   ├── retriever.py           # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/       # 20 math topic documents
├── memory/
│   ├── store.py               # JSONL read/write/feedback
│   └── retriever.py           # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py          # SymPy sandbox (subprocess)
│   ├── web_search.py          # DuckDuckGo search
│   └── plotter.py             # Matplotlib function plotter
├── llm/
│   └── client.py              # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py           # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json     # 25 curated problems
│   └── run_eval.py            # Batch evaluation + reporting
└── requirements.txt           # HF Spaces dependencies
```