	---
	title: Multimodal Math Mentor
	emoji: 🧮
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: "5.23.0"
	app_file: app.py
	pinned: false
	---

	# Multimodal Math Mentor

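	An AI-powered solver for JEE (Joint Entrance Examination) math problems, built on retrieval-augmented generation (RAG), a multi-agent pipeline, human-in-the-loop (HITL) review, and persistent memory.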

	Live Demo: [huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor](https://huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor)

	## Architecture

	```mermaid
	flowchart LR
	A[Input] --> B[Extractor]
	B -->|low conf| H1[HITL Review]
	H1 --> C
	B --> C[Guardrail]
	C -->|invalid| X[Reject]
	C --> D[Parser]
	D -->|ambiguous| H2[HITL Clarify]
	H2 --> D
	D --> E[Router]
	E --> F[RAG + Memory]
	F --> G[Solver]
	G --> V[Verifier]
	V -->|retry| G
	V -->|low conf| H3[HITL Verify]
	V --> K[Explainer]
	H3 --> K
	K --> M[Save to Memory]
	```

	## Tech Stack

	| Component | Technology |
	|---|---|
	| UI | Gradio |
	| Agent Orchestration | LangGraph |
	| Vector Store | FAISS |
	| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
	| OCR | EasyOCR |
	| ASR | Whisper |
	| Math Computation | SymPy (sandboxed) |
	| LLM | Configurable (any OpenAI-compatible API) |
	| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
	| Web Search | DuckDuckGo |
	| Deployment | HuggingFace Spaces |
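
The hybrid retrieval row above merges a FAISS ranking and a BM25 ranking with reciprocal rank fusion. A minimal sketch of that fusion step follows, assuming each retriever returns document IDs in best-first order. The function name, the document IDs, and the constant `k = 60` (a commonly used default) are illustrative assumptions and are not taken from `rag/retriever.py`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the dense (FAISS) and keyword (BM25) retrievers.
faiss_hits = ["derivatives", "limits", "integrals"]
bm25_hits = ["limits", "matrices", "derivatives"]
fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the front, while a document found by only one retriever still keeps a place in the fused list.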

	## Agents (8)

	| # | Agent | Role | Tools |
	|---|-------|------|-------|
	| 1 | Extractor | OCR/ASR/text input processing | EasyOCR, Whisper |
	| 2 | Guardrail | Filters off-topic, injection attempts | LLM |
	| 3 | Parser | Structures problem into JSON | LLM |
	| 4 | Router | Topic classification + strategy + tool selection | LLM |
	| 5 | Solver | Solves via RAG + Memory + Web Search | SymPy, DuckDuckGo, LLM |
	| 6 | Verifier | Correctness + confidence check, triggers retries | SymPy, LLM |
	| 7 | Explainer | Student-friendly explanation + diagram generation | Matplotlib, LLM |
	| 8 | Memory Saver | Stores problem-solution pairs for reuse | JSONL + embeddings |
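
The Memory Saver row stores problem-solution pairs as JSONL together with embeddings, and similar problems are later found by cosine similarity. The sketch below illustrates that idea under stated assumptions: the record fields (`problem`, `solution`, `embedding`) and function names are hypothetical, and hand-written 3-dimensional vectors stand in for real sentence embeddings.

```python
import json
import math
import os
import tempfile


def append_record(path, problem, solution, embedding):
    """Append one problem-solution pair as a single JSON line."""
    record = {"problem": problem, "solution": solution, "embedding": embedding}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(path, query_embedding, top_k=1):
    """Return the top_k stored records ranked by cosine similarity."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    records.sort(key=lambda r: cosine(r["embedding"], query_embedding),
                 reverse=True)
    return records[:top_k]


# Toy vectors stand in for real sentence embeddings (assumption).
store_path = os.path.join(tempfile.mkdtemp(), "memory.jsonl")
append_record(store_path, "Solve x^2 - 4 = 0", "x = 2 or x = -2", [1.0, 0.0, 0.1])
append_record(store_path, "Differentiate sin(x)", "cos(x)", [0.0, 1.0, 0.2])
best = most_similar(store_path, [0.9, 0.1, 0.0])[0]
print(best["problem"])
```

An append-only JSONL file keeps writes cheap; the linear scan at query time is acceptable for a small store and can be replaced by a cached or indexed lookup as the store grows.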

	### HITL (Human-in-the-Loop) Interrupt Points
	- After extraction: Low OCR/ASR confidence → user reviews text
	- After parsing: Ambiguous input → user clarifies
	- After verification: Low confidence or incorrect → user decides

	### Agent Communication
	All agents share state through a `MathMentorState` TypedDict that LangGraph passes from node to node. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and LangGraph's `MemorySaver` checkpointer for pause/resume.
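
As a rough illustration of the branching described here, the sketch below shows a shared state type and a routing function for the edge that leaves the verifier. In LangGraph, a function of this shape is what gets passed to `StateGraph.add_conditional_edges`; the sketch deliberately avoids importing LangGraph so it runs standalone. The state fields, thresholds, and node names are assumptions, not the contents of `agents/state.py` or `agents/graph.py`.

```python
from typing import TypedDict


class MathMentorState(TypedDict, total=False):
    # Hypothetical fields; the real schema lives in agents/state.py.
    is_correct: bool
    confidence: float
    retry_count: int


MAX_RETRIES = 2             # assumed retry budget
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff for asking a human


def route_after_verifier(state: MathMentorState) -> str:
    """Pick the next node after verification: retry, HITL review, or explain."""
    if not state.get("is_correct", False):
        if state.get("retry_count", 0) < MAX_RETRIES:
            return "solver"       # pass verifier feedback back to the solver
        return "hitl_verify"      # retries exhausted, let the user decide
    if state.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return "hitl_verify"      # correct but uncertain, ask the user
    return "explainer"


print(route_after_verifier({"is_correct": True, "confidence": 0.9}))
```

Keeping the routing decision in a pure function of the state makes each branch easy to unit-test without running the full graph.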

	## Setup

	```bash
	# Clone
	git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
	cd Multimodal_Math_Mentor

	# Install dependencies
	poetry install

	# Configure LLM
	cp .env.example .env
	# Edit .env with your LLM settings

	# Build RAG index
	poetry run python rag/indexer.py

	# Run the app
	poetry run python app.py
	```

	## Configuration

	Set these in `.env` or via the UI settings panel:

	```
	LLM_BASE_URL=http://localhost:11434/v1 # Ollama, OpenAI, Together, etc.
	LLM_MODEL=llama3 # Any model name
	LLM_API_KEY=not-needed # API key if required
	```
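
One way these three variables could be read in Python is sketched below. The `LLMSettings` class and `load_llm_settings` function are hypothetical names, using the example values above as fallbacks is an assumption, and loading `.env` into the process environment is assumed to happen elsewhere (the project's own logic is in `config.py`).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env=None):
    """Read the three LLM_* variables from a mapping (os.environ by default).

    Missing variables fall back to the example values shown above,
    which is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    return LLMSettings(
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1").rstrip("/"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )


# Passing a plain dict makes the loader easy to test without touching os.environ.
settings = load_llm_settings({"LLM_MODEL": "mistral"})
print(settings)
```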

	## Features

	- Multimodal Input: Text, Image (EasyOCR), Audio (Whisper)
	- Human-in-the-Loop: 3 interrupt points (extraction, parsing, verification)
	- Memory & Self-Learning: Stores problems in JSONL, retrieves similar via cosine similarity, learns from user feedback
	- Hybrid RAG: 20 knowledge base documents, FAISS vector search + BM25 keyword search with reciprocal rank fusion
	- Math Computation: SymPy sandbox (subprocess isolation) for reliable arithmetic
	- Web Search: DuckDuckGo integration when router determines it's needed
	- Streaming UI: Real-time pipeline progress bar with per-agent status
	- Multi-turn Chat: Follow-up questions with full conversation context
	- Diagrams: LLM-driven expression extraction → matplotlib plots
	- Solver Retries: Verifier feedback passed back to solver for self-correction
	- Guardrail: Prompt injection detection including OCR-injected attacks
	- Evaluation Suite: 25 curated test problems with batch runner and markdown reports
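
The subprocess isolation mentioned under Math Computation can be sketched as follows. This is an illustration under assumptions, not the code in `tools/calculator.py`: `run_isolated` and `sympy_simplify` are hypothetical names, and the 5-second timeout is arbitrary. Running the computation in a child interpreter bounds its runtime and keeps a crash out of the main process; on its own it does not restrict file or network access.

```python
import subprocess
import sys


def run_isolated(code, timeout=5.0):
    """Run Python code in a fresh interpreter and return its stdout.

    Raises subprocess.TimeoutExpired if the child exceeds the timeout and
    RuntimeError if it exits with a non-zero status.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


def sympy_simplify(expression, timeout=5.0):
    """Simplify an expression with SymPy in a child process (needs sympy)."""
    # repr() embeds the expression as a string literal in the child's code.
    code = (
        "import sympy\n"
        f"print(sympy.simplify(sympy.sympify({expression!r})))"
    )
    return run_isolated(code, timeout=timeout)


# Works without SymPy installed: the child just evaluates plain Python.
print(run_isolated("print(2 + 3)"))
```

With SymPy installed, `sympy_simplify("sin(x)**2 + cos(x)**2")` would run the simplification in the child process and return its printed result as a string.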

	## Evaluation

	```bash
	poetry run python eval/run_eval.py
	```

	Runs 25 problems across 4 topics (algebra, probability, calculus, linear algebra) with 3 difficulty levels. Generates:
	- Console summary with per-topic accuracy
	- JSON report with full results
	- Markdown report table
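
The per-topic summary and the Markdown table can be pictured with the sketch below. The result fields (`topic`, `correct`), the function names, and the sample data are made up for illustration; they are not the format or output of `eval/run_eval.py`.

```python
from collections import defaultdict


def per_topic_accuracy(results):
    """Map each topic to (correct, total, accuracy) from a list of results."""
    counts = defaultdict(lambda: [0, 0])  # topic -> [correct, total]
    for r in results:
        counts[r["topic"]][0] += int(r["correct"])
        counts[r["topic"]][1] += 1
    return {t: (c, n, c / n) for t, (c, n) in counts.items()}


def markdown_table(summary):
    """Render the summary as a Markdown table, topics in alphabetical order."""
    lines = ["| Topic | Correct | Total | Accuracy |", "|---|---|---|---|"]
    for topic in sorted(summary):
        c, n, acc = summary[topic]
        lines.append(f"| {topic} | {c} | {n} | {acc:.0%} |")
    return "\n".join(lines)


# Made-up results for illustration only.
sample = [
    {"topic": "algebra", "correct": True},
    {"topic": "algebra", "correct": False},
    {"topic": "calculus", "correct": True},
]
report = markdown_table(per_topic_accuracy(sample))
print(report)
```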

	## Project Structure

	```
	Multimodal_Math_Mentor/
	├── app.py                     # Gradio UI + event handlers
	├── config.py                  # Settings, env loading, log suppression
	├── agents/
	│   ├── graph.py               # LangGraph state machine (10 nodes, 5 conditional edges)
	│   ├── state.py               # MathMentorState TypedDict
	│   ├── guardrail_agent.py     # Input validation + injection detection
	│   ├── parser_agent.py        # Structured problem extraction
	│   ├── router_agent.py        # Topic/strategy/tool classification
	│   ├── solver_agent.py        # RAG + Memory + Web + SymPy solving
	│   ├── verifier_agent.py      # Correctness verification
	│   └── explainer_agent.py     # Step-by-step explanation + diagrams
	├── input_handlers/
	│   ├── text_handler.py        # Text passthrough
	│   ├── image_handler.py       # EasyOCR (lazy loaded)
	│   └── audio_handler.py       # Whisper (lazy loaded)
	├── rag/
	│   ├── indexer.py             # FAISS index builder
	│   ├── retriever.py           # Hybrid FAISS+BM25 retrieval
	│   └── knowledge_base/        # 20 math topic documents
	├── memory/
	│   ├── store.py               # JSONL read/write/feedback
	│   └── retriever.py           # Cosine similarity search (cached)
	├── tools/
	│   ├── calculator.py          # SymPy sandbox (subprocess)
	│   ├── web_search.py          # DuckDuckGo search
	│   └── plotter.py             # Matplotlib function plotter
	├── llm/
	│   └── client.py              # OpenAI-compatible LLM factory
	├── ui/
	│   └── callbacks.py           # Pipeline orchestration + settings
	├── eval/
	│   ├── test_problems.json     # 25 curated problems
	│   └── run_eval.py            # Batch evaluation + reporting
	└── requirements.txt           # HF Spaces dependencies
	```