---
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.23.0"
app_file: app.py
pinned: false
---
# Multimodal Math Mentor
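AI-powered solver for JEE math problems, combining retrieval-augmented generation (RAG), a multi-agent pipeline, human-in-the-loop (HITL) review, and memory of previously solved problems.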
**Live Demo:** [huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor](https://huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor)
## Architecture
```mermaid
flowchart LR
A[Input] --> B[Extractor]
B -->|low conf| H1[HITL Review]
H1 --> C
B --> C[Guardrail]
C -->|invalid| X[Reject]
C --> D[Parser]
D -->|ambiguous| H2[HITL Clarify]
H2 --> D
D --> E[Router]
E --> F[RAG + Memory]
F --> G[Solver]
G --> V[Verifier]
V -->|retry| G
V -->|low conf| H3[HITL Verify]
V --> K[Explainer]
H3 --> K
K --> M[Save to Memory]
```
## Tech Stack
| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | HuggingFace Spaces |
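Reciprocal rank fusion merges the FAISS and BM25 result lists by rank rather than by raw score, so the two incompatible score scales never need to be normalized: each document gets $\sum_r 1/(k + \text{rank}_r(d))$ over the lists it appears in. The sketch below assumes the retrievers return document IDs best-first and uses $k = 60$, the constant from the original RRF paper; the project's actual `k`, the function name, and the document IDs in the example are assumptions, not taken from `rag/retriever.py`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs (best first) into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    using 1-based ranks. A list that misses a document contributes nothing.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Illustrative IDs: dense (FAISS) and keyword (BM25) hits for the same query.
faiss_hits = ["quadratics", "sequences", "complex_numbers"]
bm25_hits = ["sequences", "complex_numbers", "quadratics"]
print(reciprocal_rank_fusion([faiss_hits, bm25_hits]))
# ['sequences', 'quadratics', 'complex_numbers']
```

A document ranked near the top by both retrievers beats one ranked first by only one of them, which is the behavior hybrid retrieval is after.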
## Agents (8)
| # | Agent | Role | Tools |
|---|-------|------|-------|
| 1 | **Extractor** | OCR/ASR/text input processing | EasyOCR, Whisper |
| 2 | **Guardrail** | Filters off-topic input and prompt-injection attempts | LLM |
| 3 | **Parser** | Structures problem into JSON | LLM |
| 4 | **Router** | Topic classification + strategy + tool selection | LLM |
| 5 | **Solver** | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | **Verifier** | Correctness + confidence check, triggers retries | SymPy, LLM |
| 7 | **Explainer** | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | **Memory Saver** | Stores problem-solution pairs for reuse | JSONL + embeddings |
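The Memory Saver's storage pattern (problem-solution pairs in JSONL, looked up by embedding similarity) can be sketched with the standard library alone. This is a minimal illustration, not the code in `memory/store.py` or `memory/retriever.py`: the record fields (`problem`, `solution`, `embedding`) and the function names are hypothetical, and embeddings are assumed to be precomputed, for example with the all-MiniLM-L6-v2 model listed in the tech stack.

```python
import json
import math
from typing import Dict, List, Sequence, Tuple


def append_record(path: str, problem: str, solution: str, embedding: Sequence[float]) -> None:
    """Append one problem-solution pair to the JSONL store (one JSON object per line)."""
    record = {"problem": problem, "solution": solution, "embedding": list(embedding)}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_records(path: str) -> List[Dict]:
    """Read every stored record, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; zero vectors count as dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: Sequence[float], records: List[Dict], top_k: int = 3,
                 min_score: float = 0.0) -> List[Tuple[float, Dict]]:
    """Return up to top_k (score, record) pairs, best match first."""
    scored = [(cosine_similarity(query, r["embedding"]), r) for r in records]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Appending one line per solve keeps writes cheap and crash-tolerant; the trade-off is a linear scan at lookup time, which is why caching the loaded records (as the project structure notes for `memory/retriever.py`) matters.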
### HITL (Human-in-the-Loop) Interrupt Points
- **After extraction**: Low OCR/ASR confidence → user reviews text
- **After parsing**: Ambiguous input → user clarifies
- **After verification**: Low confidence or incorrect → user decides
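For image input, the extraction check can be driven by EasyOCR's own per-fragment confidences: `reader.readtext(image)` returns `(bounding_box, text, confidence)` tuples by default. The sketch below averages them and flags the result for user review below a cut-off. The threshold value and the helper names are hypothetical; this README does not say how `image_handler.py` aggregates confidence.

```python
from statistics import mean
from typing import Any, Sequence, Tuple

# Shape of one EasyOCR readtext() result: (bounding_box, text, confidence).
OcrResult = Tuple[Any, str, float]

# Hypothetical cut-off; the project's real threshold is not documented here.
REVIEW_THRESHOLD = 0.6


def summarize_ocr(results: Sequence[OcrResult]) -> Tuple[str, float]:
    """Join detected fragments in the order returned and average their confidences."""
    if not results:
        return "", 0.0
    text = " ".join(fragment for _, fragment, _ in results)
    confidence = mean(score for _, _, score in results)
    return text, confidence


def needs_extraction_review(results: Sequence[OcrResult],
                            threshold: float = REVIEW_THRESHOLD) -> bool:
    """True when the user should confirm the extracted text before solving."""
    _, confidence = summarize_ocr(results)
    return confidence < threshold
```

An empty OCR result scores 0.0 and is always sent to review, which is the safer default for a blank or unreadable image.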
### Agent Communication
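All agents share state through the `MathMentorState` TypedDict (defined in `agents/state.py`), which LangGraph passes from node to node. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and LangGraph's `MemorySaver` checkpointer, an in-memory checkpointer distinct from the Memory Saver agent above, to pause and resume at HITL interrupts.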
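A conditional edge in LangGraph is a plain function that reads the shared state and returns the name of the next node. The sketch below shows what the guardrail and verifier branches could look like. It is an assumption-laden illustration: the state fields, node names, retry limit, and confidence threshold are all hypothetical, since the README does not list the fields of `MathMentorState`.

```python
from typing import Literal, TypedDict


# Hypothetical subset of MathMentorState; the real field names may differ.
class RoutingState(TypedDict, total=False):
    is_valid: bool      # guardrail verdict
    is_correct: bool    # verifier verdict
    confidence: float   # verifier confidence in [0, 1]
    retry_count: int    # solver retries already used


MAX_RETRIES = 2             # assumption
CONFIDENCE_THRESHOLD = 0.7  # assumption


def route_after_guardrail(state: RoutingState) -> Literal["parser", "reject"]:
    """Continue to the parser only when the guardrail accepts the input."""
    return "parser" if state.get("is_valid", False) else "reject"


def route_after_verifier(state: RoutingState) -> Literal["solver", "hitl_verify", "explainer"]:
    """Retry the solver, hand off to a human, or approve for explanation."""
    correct = state.get("is_correct", False)
    if not correct and state.get("retry_count", 0) < MAX_RETRIES:
        return "solver"        # verifier feedback goes back to the solver
    if not correct or state.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return "hitl_verify"   # retries exhausted or low confidence: user decides
    return "explainer"
```

Functions of this shape are what `StateGraph.add_conditional_edges` accepts, e.g. `graph.add_conditional_edges("verifier", route_after_verifier)`. Keeping the routing logic free of LLM calls makes the branching deterministic and easy to unit-test.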
## Setup
```bash
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor
# Install dependencies
poetry install
# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings
# Build RAG index
poetry run python rag/indexer.py
# Run the app
poetry run python app.py
```
## Configuration
Set these in `.env` or via the UI settings panel:
```
LLM_BASE_URL=http://localhost:11434/v1 # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3 # Any model name
LLM_API_KEY=not-needed # API key if required
```
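The three variables above are all an OpenAI-compatible client needs. As a rough sketch of how they might be read (this is not the contents of `config.py` or `llm/client.py`), the helper below pulls them from the environment, treating the example values as defaults, which is an assumption. OpenAI-compatible servers, including Ollama's `/v1` endpoint, serve chat completions at `<base_url>/chat/completions` and accept the key as a Bearer token.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "LLMSettings":
        """Read LLM_* variables; defaults mirror the example .env (an assumption)."""
        env = os.environ if env is None else env
        return cls(
            base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1").rstrip("/"),
            model=env.get("LLM_MODEL", "llama3"),
            api_key=env.get("LLM_API_KEY", "not-needed"),
        )

    @property
    def chat_completions_url(self) -> str:
        # OpenAI-compatible servers expose chat completions under the base URL.
        return f"{self.base_url}/chat/completions"

    @property
    def headers(self) -> Dict[str, str]:
        # Local servers such as Ollama ignore the key; hosted APIs require it.
        return {"Authorization": f"Bearer {self.api_key}"}
```

Values in `.env` only reach `os.environ` once something loads the file, typically `python-dotenv`'s `load_dotenv()`; passing an explicit mapping to `from_env` keeps the settings testable without touching the real environment.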
## Features
- **Multimodal Input**: Text, Image (EasyOCR), Audio (Whisper)
- **Human-in-the-Loop**: 3 interrupt points (extraction, parsing, verification)
- **Memory & Self-Learning**: Stores solved problems in JSONL, retrieves similar past problems via cosine similarity, and learns from user feedback
- **Hybrid RAG**: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
- **Math Computation**: SymPy sandbox (subprocess isolation) for reliable arithmetic
- **Web Search**: DuckDuckGo integration when router determines it's needed
- **Streaming UI**: Real-time pipeline progress bar with per-agent status
- **Multi-turn Chat**: Follow-up questions with full conversation context
- **Diagrams**: LLM-driven expression extraction → matplotlib plots
- **Solver Retries**: Verifier feedback passed back to solver for self-correction
- **Guardrail**: Prompt-injection detection, including injected instructions hidden in images and picked up by OCR
- **Evaluation Suite**: 25 curated test problems with batch runner and markdown reports
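The SymPy sandbox isolates computation in a child process so that a runaway or crashing calculation cannot take down the Gradio app. The sketch below shows the general pattern with `subprocess.run` and a timeout; the helper names `run_sandboxed` and `sympy_simplify` and the 5-second limit are hypothetical, not the API of `tools/calculator.py`. Process isolation contains crashes and hangs but is not a security boundary on its own, since `sympify` evaluates its input.

```python
import subprocess
import sys


def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Run a Python snippet in a separate interpreter and return its stdout."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return "ERROR: timed out"
    if completed.returncode != 0:
        # Surface only the final traceback line, e.g. "ValueError: bad input".
        lines = completed.stderr.strip().splitlines()
        return "ERROR: " + (lines[-1] if lines else "unknown failure")
    return completed.stdout.strip()


def sympy_simplify(expression: str, timeout: float = 5.0) -> str:
    """Simplify an expression with SymPy inside the sandbox (requires sympy)."""
    code = (
        "import sympy\n"
        f"print(sympy.simplify(sympy.sympify({expression!r})))"
    )
    return run_sandboxed(code, timeout)
```

With SymPy installed, `sympy_simplify("sin(x)**2 + cos(x)**2")` should return `1`. Returning error strings instead of raising lets the solver feed failures back to the LLM as ordinary tool output.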
## Evaluation
```bash
poetry run python eval/run_eval.py
```
Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) with 3 difficulty levels. Generates:
- Console summary with per-topic accuracy
- JSON report with full results
- Markdown report table
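Per-topic accuracy and the Markdown table boil down to a group-by over the per-problem results. The sketch below assumes each result is a dict with hypothetical `topic` and `correct` keys; the actual result schema of `eval/run_eval.py` is not shown in this README.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def per_topic_accuracy(results: Iterable[Mapping]) -> Dict[str, Tuple[int, int, float]]:
    """Map each topic to (correct, total, accuracy), topics sorted alphabetically."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # topic -> [correct, total]
    for result in results:
        tally = counts[result["topic"]]
        tally[1] += 1
        if result["correct"]:
            tally[0] += 1
    return {topic: (hits, total, hits / total)
            for topic, (hits, total) in sorted(counts.items())}


def to_markdown(summary: Dict[str, Tuple[int, int, float]]) -> str:
    """Render the per-topic summary as a Markdown table."""
    lines = ["| Topic | Correct | Total | Accuracy |", "|---|---|---|---|"]
    for topic, (hits, total, accuracy) in summary.items():
        lines.append(f"| {topic} | {hits} | {total} | {accuracy:.0%} |")
    return "\n".join(lines)
```

The same summary dict can also be dumped with `json.dumps` for the JSON report, so console, JSON, and Markdown outputs stay consistent.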
## Project Structure
```
Multimodal_Math_Mentor/
├── app.py                  # Gradio UI + event handlers
├── config.py               # Settings, env loading, log suppression
├── agents/
│   ├── graph.py            # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py            # MathMentorState TypedDict
│   ├── guardrail_agent.py  # Input validation + injection detection
│   ├── parser_agent.py     # Structured problem extraction
│   ├── router_agent.py     # Topic/strategy/tool classification
│   ├── solver_agent.py     # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py   # Correctness verification
│   └── explainer_agent.py  # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py     # Text passthrough
│   ├── image_handler.py    # EasyOCR (lazy loaded)
│   └── audio_handler.py    # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py          # FAISS index builder
│   ├── retriever.py        # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/     # 20 math topic documents
├── memory/
│   ├── store.py            # JSONL read/write/feedback
│   └── retriever.py        # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py       # SymPy sandbox (subprocess)
│   ├── web_search.py       # DuckDuckGo search
│   └── plotter.py          # Matplotlib function plotter
├── llm/
│   └── client.py           # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py        # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json  # 25 curated problems
│   └── run_eval.py         # Batch evaluation + reporting
└── requirements.txt        # HF Spaces dependencies
```