---
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.23.0"
app_file: app.py
pinned: false
---
# Multimodal Math Mentor
AI-powered JEE Math Problem Solver with RAG, Multi-Agent System, HITL, and Memory.
**Live Demo:** [huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor](https://huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor)
## Architecture
```mermaid
flowchart LR
A[Input] --> B[Extractor]
B -->|low conf| H1[HITL Review]
H1 --> C
B --> C[Guardrail]
C -->|invalid| X[Reject]
C --> D[Parser]
D -->|ambiguous| H2[HITL Clarify]
H2 --> D
D --> E[Router]
E --> F[RAG + Memory]
F --> G[Solver]
G --> V[Verifier]
V -->|retry| G
V -->|low conf| H3[HITL Verify]
V --> K[Explainer]
H3 --> K
K --> M[Save to Memory]
```
## Tech Stack
| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | HuggingFace Spaces |
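The hybrid retrieval row names reciprocal rank fusion (RRF) as the way the FAISS and BM25 rankings are merged. A minimal sketch of RRF follows; the function name, the document ids, and the constant `k=60` (a commonly used default) are assumptions for illustration, not taken from `rag/retriever.py`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for the same query.
faiss_hits = ["limits.md", "derivatives.md", "integrals.md"]
bm25_hits = ["derivatives.md", "matrices.md", "limits.md"]

fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
print(fused)
```

Documents ranked well by both retrievers rise to the top, and no score normalization between vector distances and BM25 scores is needed because only ranks are used.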
## Agents (8)
| # | Agent | Role | Tools |
|---|-------|------|-------|
| 1 | **Extractor** | OCR/ASR/text input processing | EasyOCR, Whisper |
| 2 | **Guardrail** | Filters off-topic, injection attempts | LLM |
| 3 | **Parser** | Structures problem into JSON | LLM |
| 4 | **Router** | Topic classification + strategy + tool selection | LLM |
| 5 | **Solver** | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | **Verifier** | Correctness + confidence check, triggers retries | SymPy, LLM |
| 7 | **Explainer** | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | **Memory Saver** | Stores problem-solution pairs for reuse | JSONL + embeddings |
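The Memory Saver row describes problem-solution pairs stored as JSONL with embeddings and reused later. A minimal sketch of that lookup, assuming each JSONL record carries hypothetical `problem`, `solution`, and `embedding` fields (the real schema in `memory/store.py` may differ) and using toy 3-dimensional vectors in place of real sentence embeddings:

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, jsonl_lines, top_k=1):
    """Return the top_k stored records closest to query_vec."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    records.sort(
        key=lambda rec: cosine_similarity(query_vec, rec["embedding"]),
        reverse=True,
    )
    return records[:top_k]


# Hypothetical memory file contents, one JSON object per line.
memory_lines = [
    json.dumps({"problem": "Solve x^2 - 4 = 0", "solution": "x = 2 or x = -2",
                "embedding": [0.9, 0.1, 0.0]}),
    json.dumps({"problem": "Find P(A and B)", "solution": "P(A) * P(B|A)",
                "embedding": [0.0, 1.0, 0.0]}),
]

best = most_similar([1.0, 0.0, 0.0], memory_lines)[0]
print(best["problem"])
```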
### HITL (Human-in-the-Loop) Interrupt Points
- **After extraction**: Low OCR/ASR confidence → user reviews text
- **After parsing**: Ambiguous input → user clarifies
- **After verification**: Low confidence or incorrect → user decides
### Agent Communication
All agents share state through the `MathMentorState` TypedDict that LangGraph passes from node to node. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and a `MemorySaver` checkpointer for pause/resume.
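To make the branching concrete, here is a sketch of a shared state type and the routing decision after the Verifier, following the architecture diagram (retry the Solver, escalate to HITL, or continue to the Explainer). The field names, thresholds, and node names are assumptions for illustration; the real definitions live in `agents/state.py` and `agents/graph.py`.

```python
from typing import TypedDict


class MathMentorState(TypedDict, total=False):
    # Field names below are illustrative guesses, not the project's schema.
    problem_text: str
    is_correct: bool
    verifier_confidence: float
    retries: int


MAX_RETRIES = 2              # assumed retry budget
CONFIDENCE_THRESHOLD = 0.7   # assumed cut-off for "low confidence"


def route_after_verifier(state: MathMentorState) -> str:
    """Return the name of the next node, given the verifier's verdict."""
    confident = state.get("verifier_confidence", 0.0) >= CONFIDENCE_THRESHOLD
    if state.get("is_correct", False) and confident:
        return "explainer"
    if state.get("retries", 0) < MAX_RETRIES:
        return "solver"       # feed verifier feedback back for another attempt
    return "hitl_verify"      # retries exhausted: pause for human review


print(route_after_verifier({"is_correct": True, "verifier_confidence": 0.9}))
print(route_after_verifier({"is_correct": False, "retries": 0}))
print(route_after_verifier({"is_correct": False, "retries": 2}))
```

In LangGraph, a function of this shape is what gets registered on a `StateGraph` with `add_conditional_edges`, which maps the returned string to the next node.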
## Setup
```bash
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor
# Install dependencies
poetry install
# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings
# Build RAG index
poetry run python rag/indexer.py
# Run the app
poetry run python app.py
```
## Configuration
Set these in `.env` or via the UI settings panel:
```
LLM_BASE_URL=http://localhost:11434/v1 # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3 # Any model name
LLM_API_KEY=not-needed # API key if required
```
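A sketch of how these three variables might be read on the Python side. The `LLMSettings` class and `load_llm_settings` function are hypothetical names, and the fallback defaults simply mirror the example values above; the project's actual loader is in `config.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env=None):
    """Read LLM settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return LLMSettings(
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )


# Passing an explicit mapping makes the loader easy to exercise without a .env file.
settings = load_llm_settings({"LLM_MODEL": "mistral"})
print(settings.model, settings.base_url)
```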
## Features
- **Multimodal Input**: Text, Image (EasyOCR), Audio (Whisper)
- **Human-in-the-Loop**: 3 interrupt points (extraction, parsing, verification)
- **Memory & Self-Learning**: Stores solved problems in JSONL, retrieves similar problems via cosine similarity, and learns from user feedback
- **Hybrid RAG**: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
- **Math Computation**: SymPy sandbox (subprocess isolation) for reliable arithmetic
- **Web Search**: DuckDuckGo integration when router determines it's needed
- **Streaming UI**: Real-time pipeline progress bar with per-agent status
- **Multi-turn Chat**: Follow-up questions with full conversation context
- **Diagrams**: LLM-driven expression extraction → matplotlib plots
- **Solver Retries**: Verifier feedback passed back to solver for self-correction
- **Guardrail**: Prompt injection detection including OCR-injected attacks
- **Evaluation Suite**: 25 curated test problems with batch runner and markdown reports
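The subprocess isolation mentioned for the math sandbox can be sketched as follows. This is an assumption about the general approach, not the code in `tools/calculator.py`: `run_isolated` is a hypothetical helper that runs a snippet in a separate Python interpreter with a timeout, so a hung or crashing computation cannot take the app down with it. It does not restrict what the snippet itself may do.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 5.0) -> str:
    """Run a Python snippet in a child interpreter and return its stdout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        raise RuntimeError("computation timed out") from None
    if result.returncode != 0:
        # Surface the child's error text instead of crashing the caller.
        raise RuntimeError(result.stderr.strip() or "computation failed")
    return result.stdout.strip()


print(run_isolated("print(2 + 3)"))
```

With SymPy installed, a call such as `run_isolated("from sympy import sympify; print(sympify('2**10 + 1'))")` would return `1025`.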
## Evaluation
```bash
poetry run python eval/run_eval.py
```
Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) with 3 difficulty levels. Generates:
- Console summary with per-topic accuracy
- JSON report with full results
- Markdown report table
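As an illustration of the per-topic summary and the Markdown table, here is a small sketch. The result records with `topic` and `correct` keys are an assumed shape, and both function names are hypothetical; `eval/run_eval.py` may organize its output differently.

```python
from collections import defaultdict


def per_topic_accuracy(results):
    """Map each topic to the fraction of its problems marked correct."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in results:
        totals[record["topic"]] += 1
        correct[record["topic"]] += int(record["correct"])
    return {topic: correct[topic] / totals[topic] for topic in totals}


def to_markdown(accuracy):
    """Render the accuracy mapping as a Markdown table, topics sorted by name."""
    lines = ["| Topic | Accuracy |", "|---|---|"]
    for topic in sorted(accuracy):
        lines.append(f"| {topic} | {accuracy[topic]:.0%} |")
    return "\n".join(lines)


# Made-up results, only to show the shape of the data.
sample_results = [
    {"topic": "algebra", "correct": True},
    {"topic": "algebra", "correct": False},
    {"topic": "calculus", "correct": True},
]

accuracy = per_topic_accuracy(sample_results)
print(to_markdown(accuracy))
```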
## Project Structure
```
Multimodal_Math_Mentor/
├── app.py                      # Gradio UI + event handlers
├── config.py                   # Settings, env loading, log suppression
├── agents/
│   ├── graph.py                # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py                # MathMentorState TypedDict
│   ├── guardrail_agent.py      # Input validation + injection detection
│   ├── parser_agent.py         # Structured problem extraction
│   ├── router_agent.py         # Topic/strategy/tool classification
│   ├── solver_agent.py         # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py       # Correctness verification
│   └── explainer_agent.py      # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py         # Text passthrough
│   ├── image_handler.py        # EasyOCR (lazy loaded)
│   └── audio_handler.py        # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py              # FAISS index builder
│   ├── retriever.py            # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/         # 20 math topic documents
├── memory/
│   ├── store.py                # JSONL read/write/feedback
│   └── retriever.py            # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py           # SymPy sandbox (subprocess)
│   ├── web_search.py           # DuckDuckGo search
│   └── plotter.py              # Matplotlib function plotter
├── llm/
│   └── client.py               # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py            # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json      # 25 curated problems
│   └── run_eval.py             # Batch evaluation + reporting
└── requirements.txt            # HF Spaces dependencies
```