---
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.23.0"
app_file: app.py
pinned: false
---

# Multimodal Math Mentor

AI-powered JEE Math Problem Solver with RAG, Multi-Agent System, HITL, and Memory.

**Live Demo:** [huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor](https://huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor)

## Architecture

```mermaid
flowchart LR
    A[Input] --> B[Extractor]
    B -->|low conf| H1[HITL Review]
    H1 --> C
    B --> C[Guardrail]
    C -->|invalid| X[Reject]
    C --> D[Parser]
    D -->|ambiguous| H2[HITL Clarify]
    H2 --> D
    D --> E[Router]
    E --> F[RAG + Memory]
    F --> G[Solver]
    G --> V[Verifier]
    V -->|retry| G
    V -->|low conf| H3[HITL Verify]
    V --> K[Explainer]
    H3 --> K
    K --> M[Save to Memory]
```

## Tech Stack

| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | HuggingFace Spaces |
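
The hybrid retrieval row above names reciprocal rank fusion (RRF) as the way FAISS and BM25 results are combined. The sketch below shows one way such a fusion can work. The function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the document IDs are illustrative assumptions, not taken from `rag/retriever.py`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers (not real project output).
faiss_hits = ["limits.md", "derivatives.md", "integrals.md"]
bm25_hits = ["derivatives.md", "matrices.md", "limits.md"]

fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever still appear further down.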

## Agents (7+)

| # | Agent | Role | Tools |
|---|-------|------|-------|
| 1 | **Extractor** | OCR/ASR/text input processing | EasyOCR, Whisper |
| 2 | **Guardrail** | Filters off-topic, injection attempts | LLM |
| 3 | **Parser** | Structures problem into JSON | LLM |
| 4 | **Router** | Topic classification + strategy + tool selection | LLM |
| 5 | **Solver** | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | **Verifier** | Correctness + confidence check, triggers retries | SymPy, LLM |
| 7 | **Explainer** | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | **Memory Saver** | Stores problem-solution pairs for reuse | JSONL + embeddings |

### HITL (Human-in-the-Loop) Interrupt Points
- **After extraction**: Low OCR/ASR confidence → user reviews text
- **After parsing**: Ambiguous input → user clarifies
- **After verification**: Low confidence or incorrect → user decides

### Agent Communication
All agents share state via LangGraph's `MathMentorState` TypedDict. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and `MemorySaver` checkpointer for pause/resume.
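
To make the branching concrete, here is a minimal sketch of a shared state type and a routing function for the verifier step. The field names (`verified`, `confidence`, `retries`), the thresholds, and the node names are assumptions for illustration only; the actual schema lives in `agents/state.py` and may differ. The sketch deliberately avoids importing LangGraph so it runs on its own.

```python
from typing import TypedDict


class MathMentorState(TypedDict, total=False):
    # Field names are illustrative assumptions, not the project's schema.
    verified: bool
    confidence: float
    retries: int


MAX_RETRIES = 2          # assumed retry budget
CONFIDENCE_FLOOR = 0.7   # assumed threshold for asking a human


def route_after_verifier(state: MathMentorState) -> str:
    """Return the name of the next node, as a conditional edge would."""
    if not state.get("verified", False):
        # Incorrect answer: retry the solver until the budget is spent.
        if state.get("retries", 0) < MAX_RETRIES:
            return "solver"
        return "hitl_verify"
    if state.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        return "hitl_verify"
    return "explainer"


print(route_after_verifier({"verified": False, "retries": 0}))
print(route_after_verifier({"verified": True, "confidence": 0.4}))
print(route_after_verifier({"verified": True, "confidence": 0.95}))
```

In a LangGraph graph, a function of this shape is the kind of callable typically passed to `add_conditional_edges`, with the returned string selecting the next node.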

## Setup

```bash
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor

# Install dependencies
poetry install

# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings

# Build RAG index
poetry run python rag/indexer.py

# Run the app
poetry run python app.py
```

## Configuration

Set these in `.env` or via the UI settings panel:

```
LLM_BASE_URL=http://localhost:11434/v1   # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3                          # Any model name
LLM_API_KEY=not-needed                    # API key if required
```
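
As a rough illustration of how these three variables could be read at startup, the sketch below falls back to the sample values shown above when a variable is unset. The `LLMSettings` class and `load_llm_settings` function are hypothetical names, not the contents of `config.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env=None):
    """Read the three LLM_* variables, falling back to the sample values."""
    env = os.environ if env is None else env
    return LLMSettings(
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )


# Passing a dict instead of os.environ keeps the example deterministic.
settings = load_llm_settings({"LLM_MODEL": "mistral"})
print(settings)
```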

## Features

- **Multimodal Input**: Text, Image (EasyOCR), Audio (Whisper)
- **Human-in-the-Loop**: 3 interrupt points (extraction, parsing, verification)
- **Memory & Self-Learning**: Stores problems in JSONL, retrieves similar via cosine similarity, learns from user feedback
- **Hybrid RAG**: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
- **Math Computation**: SymPy sandbox (subprocess isolation) for reliable arithmetic
- **Web Search**: DuckDuckGo integration when router determines it's needed
- **Streaming UI**: Real-time pipeline progress bar with per-agent status
- **Multi-turn Chat**: Follow-up questions with full conversation context
- **Diagrams**: LLM-driven expression extraction → matplotlib plots
- **Solver Retries**: Verifier feedback passed back to solver for self-correction
- **Guardrail**: Prompt injection detection including OCR-injected attacks
- **Evaluation Suite**: 25 curated test problems with batch runner and markdown reports
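
The memory feature in the list above retrieves similar past problems via cosine similarity. A minimal sketch of that lookup follows, using toy 3-dimensional vectors in place of real sentence embeddings. The entry layout (`problem`, `embedding`) and the function names are assumptions, not the format used by `memory/store.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query_vec, memory, top_k=1):
    """Return the top_k stored entries ranked by cosine similarity."""
    scored = [(cosine_similarity(query_vec, e["embedding"]), e) for e in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


# Toy vectors standing in for real sentence embeddings.
memory = [
    {"problem": "Solve x^2 - 4 = 0", "embedding": [1.0, 0.0, 0.0]},
    {"problem": "Find P(A and B)", "embedding": [0.0, 1.0, 0.0]},
]
best = most_similar([0.9, 0.1, 0.0], memory)
print(best[0]["problem"])
```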

## Evaluation

```bash
poetry run python eval/run_eval.py
```

Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) with 3 difficulty levels. Generates:
- Console summary with per-topic accuracy
- JSON report with full results
- Markdown report table

## Project Structure

```
Multimodal_Math_Mentor/
├── app.py                    # Gradio UI + event handlers
├── config.py                 # Settings, env loading, log suppression
├── agents/
│   ├── graph.py              # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py              # MathMentorState TypedDict
│   ├── guardrail_agent.py    # Input validation + injection detection
│   ├── parser_agent.py       # Structured problem extraction
│   ├── router_agent.py       # Topic/strategy/tool classification
│   ├── solver_agent.py       # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py     # Correctness verification
│   └── explainer_agent.py    # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py       # Text passthrough
│   ├── image_handler.py      # EasyOCR (lazy loaded)
│   └── audio_handler.py      # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py            # FAISS index builder
│   ├── retriever.py          # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/       # 20 math topic documents
├── memory/
│   ├── store.py              # JSONL read/write/feedback
│   └── retriever.py          # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py         # SymPy sandbox (subprocess)
│   ├── web_search.py         # DuckDuckGo search
│   └── plotter.py            # Matplotlib function plotter
├── llm/
│   └── client.py             # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py          # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json    # 25 curated problems
│   └── run_eval.py           # Batch evaluation + reporting
└── requirements.txt          # HF Spaces dependencies
```