metadata
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
Multimodal Math Mentor
AI-powered JEE Math Problem Solver with RAG, Multi-Agent System, HITL, and Memory.
Live Demo: huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor
Architecture
flowchart LR
A[Input] --> B[Extractor]
B -->|low conf| H1[HITL Review]
H1 --> C
B --> C[Guardrail]
C -->|invalid| X[Reject]
C --> D[Parser]
D -->|ambiguous| H2[HITL Clarify]
H2 --> D
D --> E[Router]
E --> F[RAG + Memory]
F --> G[Solver]
G --> V[Verifier]
V -->|retry| G
V -->|low conf| H3[HITL Verify]
V --> K[Explainer]
H3 --> K
K --> M[Save to Memory]
Tech Stack
| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | HuggingFace Spaces |
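The hybrid retrieval row above names reciprocal rank fusion as the way FAISS and BM25 results are combined. The sketch below shows the general technique, not the project's actual retriever: the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the result is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


faiss_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical dense (vector) results
bm25_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical keyword results
fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents that appear in only one, without needing the two scoring scales to be comparable.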
Agents (7+)
| # | Agent | Role | Tools |
|---|---|---|---|
| 1 | Extractor | OCR/ASR/text input processing | EasyOCR, Whisper |
| 2 | Guardrail | Filters off-topic, injection attempts | LLM |
| 3 | Parser | Structures problem into JSON | LLM |
| 4 | Router | Topic classification + strategy + tool selection | LLM |
| 5 | Solver | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | Verifier | Correctness + confidence check, triggers retries | SymPy, LLM |
| 7 | Explainer | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | Memory Saver | Stores problem-solution pairs for reuse | JSONL + embeddings |
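As a rough illustration of how problem-solution pairs stored as JSONL with embeddings could be looked up again, the sketch below ranks stored records by cosine similarity to a query vector. The record fields (`problem`, `answer`, `embedding`) and the tiny 3-dimensional vectors are assumptions; the real store would hold sentence-transformer embeddings and may use a different schema.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, jsonl_text, top_k=1):
    """Parse JSONL records and return the top_k closest to query_vec."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    records.sort(key=lambda r: cosine(query_vec, r["embedding"]), reverse=True)
    return records[:top_k]


# Hypothetical memory contents; field names are illustrative only.
memory_jsonl = "\n".join([
    json.dumps({"problem": "Solve x^2 - 4 = 0", "answer": "x = 2 or x = -2",
                "embedding": [0.9, 0.1, 0.0]}),
    json.dumps({"problem": "P(two heads in two tosses)", "answer": "1/4",
                "embedding": [0.0, 0.2, 0.9]}),
])

best = most_similar([1.0, 0.0, 0.1], memory_jsonl)[0]
print(best["problem"])  # Solve x^2 - 4 = 0
```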
HITL (Human-in-the-Loop) Interrupt Points
- After extraction: Low OCR/ASR confidence → user reviews text
- After parsing: Ambiguous input → user clarifies
- After verification: Low confidence or incorrect → user decides
Agent Communication
All agents share state through MathMentorState, a TypedDict that LangGraph passes from node to node. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and a MemorySaver checkpointer for pause/resume.
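To make the shared-state and branching idea concrete, here is a dependency-free sketch of a state TypedDict and two routing functions of the kind that could be handed to LangGraph's `add_conditional_edges`. The field names, node names, retry limit, and confidence threshold are all assumptions for illustration; the project's `state.py` and `graph.py` may define them differently.

```python
from typing import TypedDict


class MathMentorState(TypedDict, total=False):
    # Illustrative fields only; the real TypedDict may differ.
    problem_text: str
    is_valid: bool      # set by the guardrail
    verified: bool      # set by the verifier
    confidence: float   # verifier confidence in [0, 1]
    retries: int        # solver attempts already retried


MAX_RETRIES = 2             # assumed retry budget
CONFIDENCE_THRESHOLD = 0.7  # assumed HITL cut-off


def route_after_guardrail(state: MathMentorState) -> str:
    """Guardrail pass/fail branch: return the name of the next node."""
    return "parser" if state.get("is_valid", False) else "reject"


def route_after_verifier(state: MathMentorState) -> str:
    """Verifier branch: retry the solver, ask a human, or go on to explain."""
    verified = state.get("verified", False)
    if not verified and state.get("retries", 0) < MAX_RETRIES:
        return "solver"
    if not verified or state.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return "hitl_verify"
    return "explainer"


print(route_after_verifier({"verified": True, "confidence": 0.9}))  # explainer
```

Keeping the routing decisions as small pure functions over the state makes each branch easy to test without running the full graph.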
Setup
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor
# Install dependencies
poetry install
# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings
# Build RAG index
poetry run python rag/indexer.py
# Run the app
poetry run python app.py
Configuration
Set these in .env or via the UI settings panel:
LLM_BASE_URL=http://localhost:11434/v1 # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3 # Any model name
LLM_API_KEY=not-needed # API key if required
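For readers wiring these variables into their own code, the following sketch reads the three settings from the environment and falls back to the sample values shown above. The `LLMSettings` class and `load_llm_settings` function are hypothetical names, not the project's `config.py`; loading the `.env` file itself (for example with a dotenv library) is left out.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env=None):
    """Build settings from a mapping (defaults to os.environ).

    The fallbacks mirror the sample values in this README; they are
    assumptions, not necessarily the project's real defaults.
    """
    env = os.environ if env is None else env
    return LLMSettings(
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )


# Passing a plain dict makes the loader easy to exercise without touching os.environ.
settings = load_llm_settings({"LLM_MODEL": "mistral"})
print(settings.model)     # mistral
print(settings.base_url)  # http://localhost:11434/v1
```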
Features
- Multimodal Input: Text, Image (EasyOCR), Audio (Whisper)
- Human-in-the-Loop: 3 interrupt points (extraction, parsing, verification)
- Memory & Self-Learning: Stores problems in JSONL, retrieves similar via cosine similarity, learns from user feedback
- Hybrid RAG: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
- Math Computation: SymPy sandbox (subprocess isolation) for reliable arithmetic
- Web Search: DuckDuckGo integration when router determines it's needed
- Streaming UI: Real-time pipeline progress bar with per-agent status
- Multi-turn Chat: Follow-up questions with full conversation context
- Diagrams: LLM-driven expression extraction → matplotlib plots
- Solver Retries: Verifier feedback passed back to solver for self-correction
- Guardrail: Prompt injection detection including OCR-injected attacks
- Evaluation Suite: 25 curated test problems with batch runner and markdown reports
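The subprocess isolation mentioned for math computation can be sketched as below: a snippet runs in a separate Python interpreter with a timeout, so a crash or an endless loop cannot take down the app. This is a simplified illustration, not the project's `calculator.py`. The helper name `run_isolated` is hypothetical, the example evaluates plain arithmetic instead of SymPy code to stay dependency-free, and process isolation alone is not a complete security sandbox.

```python
import subprocess
import sys


def run_isolated(code, timeout=5.0):
    """Run a Python snippet in a child interpreter; return (ok, output_or_error)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child is killed once the timeout elapses.
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


ok, out = run_isolated("print(2**10 + 1)")
print(ok, out)  # True 1025
```

A caller would typically treat a `False` result as a signal to fall back to another strategy or to report the error back to the solver.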
Evaluation
poetry run python eval/run_eval.py
Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) with 3 difficulty levels. Generates:
- Console summary with per-topic accuracy
- JSON report with full results
- Markdown report table
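As a rough picture of what the per-topic summary and markdown table involve, the sketch below aggregates a list of result records. The record shape (`topic`, `correct`) and both function names are assumptions; the actual `run_eval.py` output format may differ.

```python
from collections import defaultdict


def per_topic_accuracy(results):
    """Return {topic: fraction of problems answered correctly}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r["topic"]] += 1
        correct[r["topic"]] += int(r["correct"])
    return {topic: correct[topic] / totals[topic] for topic in totals}


def markdown_table(accuracy):
    """Render the accuracy mapping as a two-column markdown table."""
    lines = ["| Topic | Accuracy |", "|---|---|"]
    for topic in sorted(accuracy):
        lines.append(f"| {topic} | {accuracy[topic]:.0%} |")
    return "\n".join(lines)


# Hypothetical results; the real suite covers 25 problems.
results = [
    {"topic": "algebra", "correct": True},
    {"topic": "algebra", "correct": False},
    {"topic": "calculus", "correct": True},
]
acc = per_topic_accuracy(results)
report = markdown_table(acc)
print(report)
```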
Project Structure
Multimodal_Math_Mentor/
├── app.py                      # Gradio UI + event handlers
├── config.py                   # Settings, env loading, log suppression
├── agents/
│   ├── graph.py                # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py                # MathMentorState TypedDict
│   ├── guardrail_agent.py      # Input validation + injection detection
│   ├── parser_agent.py         # Structured problem extraction
│   ├── router_agent.py         # Topic/strategy/tool classification
│   ├── solver_agent.py         # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py       # Correctness verification
│   └── explainer_agent.py      # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py         # Text passthrough
│   ├── image_handler.py        # EasyOCR (lazy loaded)
│   └── audio_handler.py        # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py              # FAISS index builder
│   ├── retriever.py            # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/         # 20 math topic documents
├── memory/
│   ├── store.py                # JSONL read/write/feedback
│   └── retriever.py            # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py           # SymPy sandbox (subprocess)
│   ├── web_search.py           # DuckDuckGo search
│   └── plotter.py              # Matplotlib function plotter
├── llm/
│   └── client.py               # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py            # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json      # 25 curated problems
│   └── run_eval.py             # Batch evaluation + reporting
└── requirements.txt            # HF Spaces dependencies