---
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.23.0"
app_file: app.py
pinned: false
---

# Multimodal Math Mentor

An AI-powered JEE math problem solver built on RAG, a multi-agent system, human-in-the-loop (HITL) review, and memory.

**Live Demo:** [huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor](https://huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor)

## Architecture

```mermaid
flowchart LR
    A[Input] --> B[Extractor]
    B -->|low conf| H1[HITL Review]
    H1 --> C
    B --> C[Guardrail]
    C -->|invalid| X[Reject]
    C --> D[Parser]
    D -->|ambiguous| H2[HITL Clarify]
    H2 --> D
    D --> E[Router]
    E --> F[RAG + Memory]
    F --> G[Solver]
    G --> V[Verifier]
    V -->|retry| G
    V -->|low conf| H3[HITL Verify]
    V --> K[Explainer]
    H3 --> K
    K --> M[Save to Memory]
```

## Tech Stack

| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | Hugging Face Spaces |

## Agents (8)

| # | Agent | Role | Tools |
|---|-------|------|-------|
| 1 | **Extractor** | Processes text, image (OCR), and audio (ASR) input | EasyOCR, Whisper |
| 2 | **Guardrail** | Filters off-topic input and prompt-injection attempts | LLM |
| 3 | **Parser** | Structures the problem into JSON | LLM |
| 4 | **Router** | Topic classification + strategy + tool selection | LLM |
| 5 | **Solver** | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | **Verifier** | Checks correctness and confidence, triggers retries | SymPy, LLM |
| 7 | **Explainer** | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | **Memory Saver** | Stores problem-solution pairs for reuse | JSONL + embeddings |

### HITL (Human-in-the-Loop) Interrupt Points

- **After extraction**: low OCR/ASR confidence → user reviews the extracted text
- **After parsing**: ambiguous input → user clarifies
- **After verification**: low confidence or an incorrect answer → user decides

### Agent Communication

All agents share state through the `MathMentorState` TypedDict defined in `agents/state.py`. The LangGraph graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and LangGraph's `MemorySaver` checkpointer (distinct from the Memory Saver agent) to pause and resume a run at each HITL interrupt.

## Setup

```bash
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor

# Install dependencies
poetry install

# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings

# Build RAG index
poetry run python rag/indexer.py

# Run the app
poetry run python app.py
```

## Configuration

Set these in `.env` or via the UI settings panel:

```
LLM_BASE_URL=http://localhost:11434/v1   # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3                         # Any model name
LLM_API_KEY=not-needed                   # API key if required
```

## Features

- **Multimodal Input**: Text, Image (EasyOCR), Audio (Whisper)
- **Human-in-the-Loop**: 3 interrupt points (extraction, parsing, verification)
- **Memory & Self-Learning**: Stores solved problems in JSONL, retrieves similar ones via cosine similarity, and learns from user feedback
- **Hybrid RAG**: 20 knowledge base documents, FAISS vector search + BM25 keyword search merged with reciprocal rank fusion
- **Math Computation**: SymPy sandbox (subprocess isolation) for reliable arithmetic
- **Web Search**: DuckDuckGo search, invoked when the Router decides it is needed
- **Streaming UI**: Real-time pipeline progress bar with per-agent status
- **Multi-turn Chat**: Follow-up questions with full conversation context
- **Diagrams**: LLM-driven expression extraction → Matplotlib plots
- **Solver Retries**: Verifier feedback is passed back to the Solver for self-correction
- **Guardrail**: Prompt-injection detection, including injection attempts hidden in images and surfaced by OCR
- **Evaluation Suite**: 25 curated test problems with a batch runner and Markdown reports

## Evaluation

```bash
poetry run python eval/run_eval.py
```

Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) at 3 difficulty levels and generates:

- A console summary with per-topic accuracy
- A JSON report with full results
- A Markdown report table

## Project Structure

```
Multimodal_Math_Mentor/
├── app.py                  # Gradio UI + event handlers
├── config.py               # Settings, env loading, log suppression
├── agents/
│   ├── graph.py            # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py            # MathMentorState TypedDict
│   ├── guardrail_agent.py  # Input validation + injection detection
│   ├── parser_agent.py     # Structured problem extraction
│   ├── router_agent.py     # Topic/strategy/tool classification
│   ├── solver_agent.py     # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py   # Correctness verification
│   └── explainer_agent.py  # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py     # Text passthrough
│   ├── image_handler.py    # EasyOCR (lazy loaded)
│   └── audio_handler.py    # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py          # FAISS index builder
│   ├── retriever.py        # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/     # 20 math topic documents
├── memory/
│   ├── store.py            # JSONL read/write/feedback
│   └── retriever.py        # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py       # SymPy sandbox (subprocess)
│   ├── web_search.py       # DuckDuckGo search
│   └── plotter.py          # Matplotlib function plotter
├── llm/
│   └── client.py           # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py        # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json  # 25 curated problems
│   └── run_eval.py         # Batch evaluation + reporting
└── requirements.txt        # HF Spaces dependencies
```
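The hybrid RAG retriever merges the FAISS and BM25 result lists with reciprocal rank fusion (RRF). Below is a minimal sketch of that fusion step, not the code in `rag/retriever.py`. It assumes each retriever returns document IDs ordered best-first and uses the commonly cited smoothing constant `k = 60`. Each document scores the sum of `1 / (k + rank)` over every list it appears in.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank is 1-based), and the contributions are summed. k = 60 is an
    assumed default, not a value taken from this project.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical document IDs: one list from FAISS, one from BM25.
    faiss_hits = ["quadratics", "probability_basics", "limits"]
    bm25_hits = ["probability_basics", "binomial", "quadratics"]
    for doc_id, score in reciprocal_rank_fusion([faiss_hits, bm25_hits]):
        print(f"{doc_id}: {score:.4f}")
```

A document returned by both retrievers outranks one that only a single retriever found. Because RRF works on ranks alone, FAISS similarities and BM25 scores never need to be put on a common scale.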
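The memory layer stores solved problems as JSONL and retrieves similar ones by cosine similarity. The sketch below shows the idea under simplifying assumptions. Each JSONL record is assumed to carry an `embedding` list next to `problem` and `solution` fields (hypothetical field names). Vectors are compared in pure Python, whereas `memory/retriever.py` uses sentence-transformers embeddings with caching. The `top_k` and `threshold` defaults are illustrative.

```python
import json
import math
from collections.abc import Iterable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def load_memory(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines into records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def find_similar(query_vec: list[float], records: list[dict],
                 top_k: int = 3, threshold: float = 0.8) -> list[tuple[float, dict]]:
    """Return up to top_k (score, record) pairs at or above the assumed threshold."""
    scored = [(cosine_similarity(query_vec, r["embedding"]), r) for r in records]
    matches = [pair for pair in scored if pair[0] >= threshold]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:top_k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real ones would come from all-MiniLM-L6-v2.
    jsonl = [
        '{"problem": "Solve x^2 - 5x + 6 = 0", "solution": "x = 2, 3", "embedding": [0.9, 0.1, 0.0]}',
        '{"problem": "P(two heads in two tosses)", "solution": "1/4", "embedding": [0.0, 0.2, 0.98]}',
    ]
    memory = load_memory(jsonl)
    for score, record in find_similar([1.0, 0.05, 0.0], memory):
        print(f"{score:.3f}  {record['problem']} -> {record['solution']}")
```

The threshold keeps weakly related problems out of the Solver's context. In this toy run only the quadratic problem clears it.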
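The calculator runs SymPy in a subprocess so that a hanging or crashing computation cannot take down the app. This is a minimal sketch of that isolation pattern, assuming the tool hands a code snippet to a fresh interpreter and reads its stdout. The function name `run_sandboxed` and the 5-second timeout are hypothetical. In `tools/calculator.py` the snippet would presumably import SymPy. A child process contains crashes and runaway loops but is not, by itself, a security boundary.

```python
import subprocess
import sys


def run_sandboxed(code: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Run `code` in a fresh Python interpreter and capture its output.

    Hypothetical helper: returns (ok, text), where text is stdout on success
    or a short error message if the child fails or exceeds the timeout.
    """
    try:
        result = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and the user site directory)
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} s"
    if result.returncode != 0:
        # The last stderr line usually holds the exception type and message.
        lines = result.stderr.strip().splitlines()
        return False, lines[-1] if lines else f"exit code {result.returncode}"
    return True, result.stdout.strip()


if __name__ == "__main__":
    # With SymPy installed, the snippet could instead be e.g.
    # "import sympy as sp; x = sp.symbols('x'); print(sp.solve(x**2 - 5*x + 6, x))"
    print(run_sandboxed("print(2**10)"))
```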
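The graph branches on the Verifier's outcome: retry the Solver, pause for human review, or continue to the Explainer. In LangGraph, a conditional edge is driven by a function that reads the state and returns the name of the next node. Here is a plain-Python sketch of such a routing function. The state keys (`is_correct`, `confidence`, `retry_count`), the node names, and both thresholds are hypothetical stand-ins, not the actual fields of `MathMentorState`.

```python
from typing import Literal, TypedDict


class VerifierState(TypedDict):
    """Hypothetical subset of the shared state read after verification."""
    is_correct: bool
    confidence: float
    retry_count: int


MAX_RETRIES = 2          # assumed retry budget
CONFIDENCE_FLOOR = 0.7   # assumed confidence threshold for HITL review


def route_after_verifier(state: VerifierState) -> Literal["solver", "hitl_verify", "explainer"]:
    """Pick the next node: retry the solver, ask a human, or explain the answer."""
    if not state["is_correct"] and state["retry_count"] < MAX_RETRIES:
        return "solver"        # send verifier feedback back for self-correction
    if not state["is_correct"] or state["confidence"] < CONFIDENCE_FLOOR:
        return "hitl_verify"   # retries exhausted or low confidence: user decides
    return "explainer"


if __name__ == "__main__":
    print(route_after_verifier({"is_correct": True, "confidence": 0.95, "retry_count": 0}))
```

With LangGraph, a function like this would typically be passed to `StateGraph.add_conditional_edges` with the verifier node as the source. The `Literal` return type documents every possible destination in one place.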