---
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.23.0"
app_file: app.py
pinned: false
---
# Multimodal Math Mentor

An AI-powered solver for JEE math problems that combines retrieval-augmented generation (RAG), a multi-agent pipeline, human-in-the-loop (HITL) review, and memory.

**Live Demo:** [huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor](https://huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor)
## Architecture

```mermaid
flowchart LR
    A[Input] --> B[Extractor]
    B -->|low conf| H1[HITL Review]
    H1 --> C
    B --> C[Guardrail]
    C -->|invalid| X[Reject]
    C --> D[Parser]
    D -->|ambiguous| H2[HITL Clarify]
    H2 --> D
    D --> E[Router]
    E --> F[RAG + Memory]
    F --> G[Solver]
    G --> V[Verifier]
    V -->|retry| G
    V -->|low conf| H3[HITL Verify]
    V --> K[Explainer]
    H3 --> K
    K --> M[Save to Memory]
```
## Tech Stack

| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | HuggingFace Spaces |
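
The hybrid retrieval row above combines two ranked lists with reciprocal rank fusion. The following is a minimal sketch of that fusion step, not the project's actual `rag/retriever.py`: the function name, the document IDs, and the constant `k=60` (a commonly used default) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1), and the per-list scores are summed. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a keyword search.
faiss_hits = ["limits.md", "derivatives.md", "matrices.md"]
bm25_hits = ["derivatives.md", "probability.md", "limits.md"]

fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever can still appear lower in the fused list.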
## Agents (7+)

| # | Agent | Role | Tools |
|---|-------|------|-------|
| 1 | **Extractor** | OCR/ASR/text input processing | EasyOCR, Whisper |
| 2 | **Guardrail** | Filters off-topic, injection attempts | LLM |
| 3 | **Parser** | Structures problem into JSON | LLM |
| 4 | **Router** | Topic classification + strategy + tool selection | LLM |
| 5 | **Solver** | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
| 6 | **Verifier** | Correctness + confidence check, triggers retries | SymPy, LLM |
| 7 | **Explainer** | Student-friendly explanation + diagram generation | Matplotlib, LLM |
| 8 | **Memory Saver** | Stores problem-solution pairs for reuse | JSONL + embeddings |
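
To illustrate the "JSONL + embeddings" storage used by the Memory Saver, here is a rough sketch of looking up a previously solved problem by cosine similarity. The record fields (`problem`, `answer`, `embedding`), the helper names, and the three-dimensional toy vectors are assumptions; the real store would hold sentence-transformer embeddings and may use a different schema.

```python
import json
import math

# Hypothetical JSONL content: one stored problem per line.
JSONL_TEXT = """\
{"problem": "Solve x^2 - 5x + 6 = 0", "answer": "x = 2 or x = 3", "embedding": [0.9, 0.1, 0.0]}
{"problem": "P(exactly two heads in three fair tosses)", "answer": "3/8", "embedding": [0.0, 0.2, 0.9]}
"""


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, records, top_k=1):
    """Return the top_k (score, record) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, r["embedding"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


records = [json.loads(line) for line in JSONL_TEXT.splitlines() if line.strip()]
best = most_similar([1.0, 0.0, 0.0], records)
print(best[0][1]["problem"])
```

In practice a minimum-similarity cutoff would likely be applied so that unrelated past problems are not passed to the solver.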
### HITL (Human-in-the-Loop) Interrupt Points

- **After extraction**: Low OCR/ASR confidence → user reviews text
- **After parsing**: Ambiguous input → user clarifies
- **After verification**: Low confidence or incorrect → user decides
### Agent Communication

All agents share state through the `MathMentorState` TypedDict that LangGraph passes from node to node. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and a `MemorySaver` checkpointer to pause and resume.
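
As a rough illustration of how a conditional edge can branch on shared state, the sketch below defines a small TypedDict and a routing function for the step after verification. It is not the project's `agents/state.py` or `agents/graph.py`: the field names, node names, retry limit, and confidence threshold are all hypothetical. In LangGraph, a function of this shape is typically registered with `StateGraph.add_conditional_edges`.

```python
from typing import TypedDict


class SketchState(TypedDict, total=False):
    """Hypothetical subset of the shared state; field names are assumptions."""
    problem_text: str
    verified: bool
    confidence: float
    retries: int


MAX_RETRIES = 2             # assumed retry budget
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff for skipping human review


def route_after_verifier(state: SketchState) -> str:
    """Return the name of the next node based on the verifier's result."""
    if state.get("verified") and state.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD:
        return "explainer"
    if state.get("retries", 0) < MAX_RETRIES:
        return "solver"  # feed verifier feedback back for another attempt
    return "hitl_verify"  # retries exhausted: ask the user to decide


print(route_after_verifier({"verified": True, "confidence": 0.9}))
```

Keeping the routing logic in a pure function like this makes each branch easy to unit-test without building the whole graph.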
## Setup

```bash
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor

# Install dependencies
poetry install

# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings

# Build RAG index
poetry run python rag/indexer.py

# Run the app
poetry run python app.py
```
## Configuration

Set these in `.env` or via the UI settings panel:

```
LLM_BASE_URL=http://localhost:11434/v1  # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3                        # Any model name
LLM_API_KEY=not-needed                  # API key if required
```
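
For readers wiring these variables into their own code, here is a minimal sketch of reading them from the environment. It is not the project's `config.py`: the `LLMSettings` class and `load_llm_settings` function are hypothetical names, and treating the example values above as fallback defaults is an assumption. It also assumes the `.env` file has already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the three LLM variables."""
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env=None) -> LLMSettings:
    """Read LLM settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return LLMSettings(
        # Fallbacks mirror the example values above; that is an assumption.
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )


# Passing an explicit mapping makes the function easy to test.
settings = load_llm_settings({"LLM_MODEL": "mistral"})
print(settings.model, settings.base_url)
```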
## Features

- **Multimodal Input**: Text, Image (EasyOCR), Audio (Whisper)
- **Human-in-the-Loop**: 3 interrupt points (extraction, parsing, verification)
- **Memory & Self-Learning**: Stores solved problems in JSONL, retrieves similar ones via cosine similarity, and learns from user feedback
- **Hybrid RAG**: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
- **Math Computation**: SymPy sandbox (subprocess isolation) for reliable arithmetic
- **Web Search**: DuckDuckGo integration when the router determines it's needed
- **Streaming UI**: Real-time pipeline progress bar with per-agent status
- **Multi-turn Chat**: Follow-up questions with full conversation context
- **Diagrams**: LLM-driven expression extraction → matplotlib plots
- **Solver Retries**: Verifier feedback passed back to the solver for self-correction
- **Guardrail**: Prompt injection detection, including OCR-injected attacks
- **Evaluation Suite**: 25 curated test problems with batch runner and markdown reports
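
The subprocess isolation mentioned under Math Computation can be sketched as follows. This is only an illustration of the pattern, not the project's `tools/calculator.py`: to stay dependency-free it evaluates plain Python arithmetic in the child process instead of calling SymPy, and the function name, timeout, and error strings are assumptions. Note that `eval` with empty builtins is not a security boundary on its own; the point here is that a crash or hang in the child cannot take down the main app.

```python
import subprocess
import sys

# Code run in the child process. A real tool would call SymPy here;
# plain arithmetic is used so the sketch has no third-party dependencies.
CHILD_CODE = """
import sys
expr = sys.stdin.read()
print(eval(expr, {"__builtins__": {}}, {}))
"""


def run_isolated(expr: str, timeout: float = 5.0) -> str:
    """Evaluate an arithmetic expression in a separate Python process."""
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", CHILD_CODE],  # -I: isolated mode
            input=expr,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "error: timed out"
    if result.returncode != 0:
        return "error: evaluation failed"
    return result.stdout.strip()


print(run_isolated("2**10 + 3*7"))
```

A failing expression such as `1/0` raises inside the child only, so the caller receives an error string rather than an exception.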
## Evaluation

```bash
poetry run python eval/run_eval.py
```

Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) at 3 difficulty levels. It generates:

- Console summary with per-topic accuracy
- JSON report with full results
- Markdown report table
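
The per-topic accuracy summary and markdown table could be computed along these lines. This is a sketch under assumptions, not the project's `eval/run_eval.py`: the result fields (`topic`, `correct`), the function names, and the sample rows are invented purely for illustration and are not actual evaluation results.

```python
from collections import defaultdict


def per_topic_accuracy(results):
    """Map each topic to the fraction of its problems marked correct."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r["topic"]] += 1
        correct[r["topic"]] += int(r["correct"])
    return {topic: correct[topic] / totals[topic] for topic in totals}


def to_markdown(accuracy):
    """Render the accuracy mapping as a markdown table, topics sorted by name."""
    lines = ["| Topic | Accuracy |", "|---|---|"]
    for topic in sorted(accuracy):
        lines.append(f"| {topic} | {accuracy[topic]:.0%} |")
    return "\n".join(lines)


# Made-up rows for illustration only.
sample_results = [
    {"topic": "algebra", "correct": True},
    {"topic": "algebra", "correct": False},
    {"topic": "calculus", "correct": True},
]

accuracy = per_topic_accuracy(sample_results)
report = to_markdown(accuracy)
print(report)
```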
## Project Structure

```
Multimodal_Math_Mentor/
├── app.py                    # Gradio UI + event handlers
├── config.py                 # Settings, env loading, log suppression
├── agents/
│   ├── graph.py              # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py              # MathMentorState TypedDict
│   ├── guardrail_agent.py    # Input validation + injection detection
│   ├── parser_agent.py       # Structured problem extraction
│   ├── router_agent.py       # Topic/strategy/tool classification
│   ├── solver_agent.py       # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py     # Correctness verification
│   └── explainer_agent.py    # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py       # Text passthrough
│   ├── image_handler.py      # EasyOCR (lazy loaded)
│   └── audio_handler.py      # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py            # FAISS index builder
│   ├── retriever.py          # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/       # 20 math topic documents
├── memory/
│   ├── store.py              # JSONL read/write/feedback
│   └── retriever.py          # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py         # SymPy sandbox (subprocess)
│   ├── web_search.py         # DuckDuckGo search
│   └── plotter.py            # Matplotlib function plotter
├── llm/
│   └── client.py             # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py          # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json    # 25 curated problems
│   └── run_eval.py           # Batch evaluation + reporting
└── requirements.txt          # HF Spaces dependencies
```