---
title: Multimodal Math Mentor
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
---

Multimodal Math Mentor

AI-powered JEE Math Problem Solver with RAG, Multi-Agent System, HITL, and Memory.

Live Demo: huggingface.co/spaces/Amit-kr26/Multimodal_Math_Mentor

Architecture

```mermaid
flowchart LR
    A[Input] --> B[Extractor]
    B -->|low conf| H1[HITL Review]
    H1 --> C
    B --> C[Guardrail]
    C -->|invalid| X[Reject]
    C --> D[Parser]
    D -->|ambiguous| H2[HITL Clarify]
    H2 --> D
    D --> E[Router]
    E --> F[RAG + Memory]
    F --> G[Solver]
    G --> V[Verifier]
    V -->|retry| G
    V -->|low conf| H3[HITL Verify]
    V --> K[Explainer]
    H3 --> K
    K --> M[Save to Memory]
```
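The branching in the flowchart comes down to small routing functions that pick the next node. The framework-free sketch below shows the Guardrail and Verifier edges. The retry limit, the confidence threshold, and every name in it are hypothetical, because the README does not give them.

```python
from dataclasses import dataclass

# Hypothetical values: the README does not state the retry limit or the threshold.
MAX_SOLVER_RETRIES = 2
VERIFY_CONFIDENCE_THRESHOLD = 0.7


@dataclass
class VerifierResult:
    is_correct: bool
    confidence: float  # 0.0 to 1.0


def route_after_guardrail(is_valid: bool) -> str:
    """C -->|invalid| X, otherwise continue to the Parser."""
    return "parser" if is_valid else "reject"


def route_after_verifier(result: VerifierResult, retries_so_far: int) -> str:
    """Choose the node after the Verifier, mirroring the V --> edges."""
    if not result.is_correct and retries_so_far < MAX_SOLVER_RETRIES:
        return "solver"       # V -->|retry| G: feedback goes back to the Solver
    if not result.is_correct or result.confidence < VERIFY_CONFIDENCE_THRESHOLD:
        return "hitl_verify"  # V -->|low conf| H3: the user decides
    return "explainer"        # V --> K
```

Once retries run out, an answer that still fails verification falls through to the HITL Verify step, which matches the "low confidence or incorrect → user decides" interrupt point.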

Tech Stack

| Component | Technology |
|---|---|
| UI | Gradio |
| Agent Orchestration | LangGraph |
| Vector Store | FAISS |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| OCR | EasyOCR |
| ASR | Whisper |
| Math Computation | SymPy (sandboxed) |
| LLM | Configurable (any OpenAI-compatible API) |
| RAG Retrieval | Hybrid: FAISS + BM25 (reciprocal rank fusion) |
| Web Search | DuckDuckGo |
| Deployment | HuggingFace Spaces |
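The RAG Retrieval row combines FAISS and BM25 results with reciprocal rank fusion (RRF). In RRF, each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below assumes k = 60, the constant commonly used for RRF; the value this project uses is not stated.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Ranks start at 1. k = 60 is an assumption (the usual RRF default).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


faiss_hits = ["quadratics", "limits", "matrices"]     # toy vector-search ranking
bm25_hits = ["limits", "probability", "quadratics"]   # toy keyword-search ranking
fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
```

Because RRF works only with ranks, it needs no normalization between cosine scores and BM25 scores. Here "limits" ends up first because it ranks high in both lists.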

Agents (7+)

| # | Agent | Role | Tools |
|---|---|---|---|
| 1 | Extractor | Processes OCR, ASR, and text input | EasyOCR, Whisper |
| 2 | Guardrail | Filters off-topic inputs and prompt-injection attempts | LLM |
| 3 | Parser | Structures the problem into JSON | LLM |
| 4 | Router | Topic classification, strategy, and tool selection | LLM |
| 5 | Solver | Solves using RAG, memory, and web search | SymPy, DuckDuckGo, LLM |
| 6 | Verifier | Checks correctness and confidence; triggers retries | SymPy, LLM |
| 7 | Explainer | Student-friendly explanation and diagram generation | Matplotlib, LLM |
| 8 | Memory Saver | Stores problem-solution pairs for reuse | JSONL + embeddings |
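The Parser turns the problem into JSON, and ambiguous input triggers HITL Clarify. The sketch below shows one way to validate that JSON and flag unusable replies as ambiguous. The field names in `ParsedProblem` are hypothetical because the real schema in `agents/parser_agent.py` is not documented here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedProblem:
    # Hypothetical fields: the real parser schema is not shown in the README.
    problem_text: str
    topic: str = "unknown"
    variables: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    needs_clarification: bool = False


def parse_llm_output(raw: str) -> ParsedProblem:
    """Validate the Parser LLM's JSON reply; anything unusable is flagged as ambiguous."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ParsedProblem(problem_text=raw, needs_clarification=True)
    if not isinstance(data, dict):
        return ParsedProblem(problem_text=raw, needs_clarification=True)

    problem_text = str(data.get("problem_text", "")).strip()
    return ParsedProblem(
        problem_text=problem_text,
        topic=str(data.get("topic", "unknown")),
        variables=list(data.get("variables", [])),
        constraints=list(data.get("constraints", [])),
        # An empty problem statement also needs the user's help (HITL Clarify).
        needs_clarification=bool(data.get("needs_clarification", False)) or not problem_text,
    )
```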

HITL (Human-in-the-Loop) Interrupt Points

- After extraction: Low OCR/ASR confidence → user reviews the text
- After parsing: Ambiguous input → user clarifies
- After verification: Low confidence or incorrect answer → user decides
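The three interrupt points can be expressed as one gate function that either names the user action to wait for or lets the pipeline continue. This is a sketch: the thresholds and action names are hypothetical, since the README names the interrupt points but not their cut-offs.

```python
from typing import Optional

# Hypothetical thresholds: not specified in the README.
EXTRACTION_CONFIDENCE_MIN = 0.6
VERIFIER_CONFIDENCE_MIN = 0.7


def hitl_action(stage: str, *, confidence: float = 1.0,
                ambiguous: bool = False, verified: bool = True) -> Optional[str]:
    """Return the user action a stage should pause for, or None to continue automatically."""
    if stage == "extraction" and confidence < EXTRACTION_CONFIDENCE_MIN:
        return "review_text"  # user reviews the OCR/ASR text
    if stage == "parsing" and ambiguous:
        return "clarify"      # user clarifies an ambiguous problem
    if stage == "verification" and (not verified or confidence < VERIFIER_CONFIDENCE_MIN):
        return "decide"       # user accepts, rejects, or corrects the answer
    return None
```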

Agent Communication

All agents share state through MathMentorState, a TypedDict defined in agents/state.py that serves as the LangGraph graph state. The graph uses conditional edges for branching (guardrail pass/fail, verification retry/approve, HITL interrupts) and LangGraph's MemorySaver checkpointer to pause and resume runs.
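The toy sketch below shows the shared-state and pause/resume idea without LangGraph. The `MathMentorState` fields are a hypothetical subset (the real definition is in `agents/state.py`). `InMemoryCheckpoints` is a hypothetical stand-in for what a checkpointer does: it stores a state snapshot per conversation thread. It is not LangGraph's MemorySaver API.

```python
import copy
from typing import Optional, TypedDict


class MathMentorState(TypedDict, total=False):
    # Hypothetical subset of fields, for illustration only.
    raw_input: str
    extracted_text: str
    parsed_problem: dict
    solution: str
    verifier_confidence: float
    retry_count: int
    awaiting_human: Optional[str]  # which HITL point paused the run, if any


class InMemoryCheckpoints:
    """Toy checkpointer: one deep-copied state snapshot per thread ID."""

    def __init__(self) -> None:
        self._snapshots: dict[str, MathMentorState] = {}

    def save(self, thread_id: str, state: MathMentorState) -> None:
        self._snapshots[thread_id] = copy.deepcopy(state)

    def load(self, thread_id: str) -> Optional[MathMentorState]:
        snapshot = self._snapshots.get(thread_id)
        return copy.deepcopy(snapshot) if snapshot is not None else None


def resume_with_user_text(store: InMemoryCheckpoints, thread_id: str,
                          corrected_text: str) -> MathMentorState:
    """Resume a run paused at the extraction review using the user's corrected text."""
    state = store.load(thread_id)
    if state is None:
        raise KeyError(f"no paused run for thread {thread_id!r}")
    state["extracted_text"] = corrected_text
    state["awaiting_human"] = None  # the interrupt has been answered
    store.save(thread_id, state)
    return state
```

Deep copies keep a saved snapshot from changing when a node later mutates its working state.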

Setup

```bash
# Clone
git clone https://github.com/Amit-kr26/Multimodal_Math_Mentor
cd Multimodal_Math_Mentor

# Install dependencies
poetry install

# Configure LLM
cp .env.example .env
# Edit .env with your LLM settings

# Build RAG index
poetry run python rag/indexer.py

# Run the app
poetry run python app.py
```

Configuration

Set these in .env or via the UI settings panel:

```bash
LLM_BASE_URL=http://localhost:11434/v1   # Ollama, OpenAI, Together, etc.
LLM_MODEL=llama3                          # Any model name
LLM_API_KEY=not-needed                    # API key if required
```
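A minimal sketch of reading these three variables with fallbacks. The defaults are copied from the example above. Whether `config.py` uses the same defaults is an assumption, and `LLMSettings` and `load_llm_settings` are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read LLM_BASE_URL, LLM_MODEL and LLM_API_KEY, falling back to the README's example values."""
    env = os.environ if env is None else env
    return LLMSettings(
        base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("LLM_MODEL", "llama3"),
        api_key=env.get("LLM_API_KEY", "not-needed"),
    )
```

These values correspond to the `base_url` and `api_key` arguments of an OpenAI-compatible client such as `openai.OpenAI`. A placeholder key such as `not-needed` is useful for servers like Ollama that do not check it.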

Features

- Multimodal Input: Text, Image (EasyOCR), Audio (Whisper)
- Human-in-the-Loop: 3 interrupt points (extraction, parsing, verification)
- Memory & Self-Learning: Stores problems in JSONL, retrieves similar past problems via cosine similarity, and learns from user feedback
- Hybrid RAG: 20 knowledge base documents; FAISS vector search + BM25 keyword search with reciprocal rank fusion
- Math Computation: SymPy sandbox (subprocess isolation) for reliable arithmetic
- Web Search: DuckDuckGo integration, used when the router determines it is needed
- Streaming UI: Real-time pipeline progress bar with per-agent status
- Multi-turn Chat: Follow-up questions with full conversation context
- Diagrams: LLM-driven expression extraction → Matplotlib plots
- Solver Retries: Verifier feedback passed back to the solver for self-correction
- Guardrail: Prompt-injection detection, including attacks injected through OCR text
- Evaluation Suite: 25 curated test problems with a batch runner and Markdown reports
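A minimal sketch of memory lookup: each JSONL line holds a stored problem with its embedding, and the closest ones are ranked by cosine similarity. The record layout (`problem`, `embedding`) is hypothetical. Real all-MiniLM-L6-v2 embeddings have 384 dimensions; the example uses 3-dimensional toy vectors.

```python
import json
import math
from collections.abc import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty or zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float], jsonl_lines: Iterable[str],
                 top_k: int = 3, min_score: float = 0.0) -> list[tuple[float, dict]]:
    """Return (score, record) pairs for the stored problems closest to the query embedding."""
    scored = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines in the JSONL file
        record = json.loads(line)
        scored.append((cosine_similarity(query, record["embedding"]), record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, record) for score, record in scored[:top_k] if score >= min_score]


memory_lines = [
    '{"problem": "p1", "embedding": [1, 0, 0]}',
    '{"problem": "p2", "embedding": [0, 1, 0]}',
    '{"problem": "p3", "embedding": [1, 1, 0]}',
]
hits = most_similar([1, 0, 0], memory_lines, top_k=2)
```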

Evaluation

```bash
poetry run python eval/run_eval.py
```

Runs the 25 problems across 4 topics (algebra, probability, calculus, linear algebra) at 3 difficulty levels and generates:

- Console summary with per-topic accuracy
- JSON report with full results
- Markdown report table
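A minimal sketch of the per-topic accuracy summary and the Markdown table. The result records (`topic`, `correct`) are a hypothetical shape; the actual format written by `eval/run_eval.py` is not shown.

```python
from collections import Counter


def per_topic_accuracy(results: list[dict]) -> dict[str, float]:
    """Map each topic to the fraction of its problems answered correctly."""
    totals: Counter = Counter()
    correct: Counter = Counter()
    for result in results:
        totals[result["topic"]] += 1
        if result["correct"]:
            correct[result["topic"]] += 1
    return {topic: correct[topic] / totals[topic] for topic in sorted(totals)}


def markdown_table(accuracy: dict[str, float]) -> str:
    """Render the accuracy dict as a two-column Markdown table."""
    lines = ["| Topic | Accuracy |", "|---|---|"]
    for topic, acc in accuracy.items():
        lines.append(f"| {topic} | {acc:.0%} |")
    return "\n".join(lines)


sample = [
    {"topic": "algebra", "correct": True},
    {"topic": "algebra", "correct": False},
    {"topic": "calculus", "correct": True},
]
report = markdown_table(per_topic_accuracy(sample))
```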

Project Structure

```
Multimodal_Math_Mentor/
├── app.py                    # Gradio UI + event handlers
├── config.py                 # Settings, env loading, log suppression
├── agents/
│   ├── graph.py              # LangGraph state machine (10 nodes, 5 conditional edges)
│   ├── state.py              # MathMentorState TypedDict
│   ├── guardrail_agent.py    # Input validation + injection detection
│   ├── parser_agent.py       # Structured problem extraction
│   ├── router_agent.py       # Topic/strategy/tool classification
│   ├── solver_agent.py       # RAG + Memory + Web + SymPy solving
│   ├── verifier_agent.py     # Correctness verification
│   └── explainer_agent.py    # Step-by-step explanation + diagrams
├── input_handlers/
│   ├── text_handler.py       # Text passthrough
│   ├── image_handler.py      # EasyOCR (lazy loaded)
│   └── audio_handler.py      # Whisper (lazy loaded)
├── rag/
│   ├── indexer.py            # FAISS index builder
│   ├── retriever.py          # Hybrid FAISS+BM25 retrieval
│   └── knowledge_base/       # 20 math topic documents
├── memory/
│   ├── store.py              # JSONL read/write/feedback
│   └── retriever.py          # Cosine similarity search (cached)
├── tools/
│   ├── calculator.py         # SymPy sandbox (subprocess)
│   ├── web_search.py         # DuckDuckGo search
│   └── plotter.py            # Matplotlib function plotter
├── llm/
│   └── client.py             # OpenAI-compatible LLM factory
├── ui/
│   └── callbacks.py          # Pipeline orchestration + settings
├── eval/
│   ├── test_problems.json    # 25 curated problems
│   └── run_eval.py           # Batch evaluation + reporting
└── requirements.txt          # HF Spaces dependencies
```
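`tools/calculator.py` is described as a SymPy sandbox with subprocess isolation. The sketch below shows the general pattern under a stated assumption: "isolation" here means only a separate interpreter (`-I` isolated mode) plus a timeout. It is not a security boundary, and `run_sandboxed` is a hypothetical name.

```python
import subprocess
import sys


def run_sandboxed(code: str, timeout_s: float = 5.0) -> str:
    """Run a Python snippet in a separate interpreter and return its stdout or an error string."""
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: ignore PYTHON* env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child if this expires
        )
    except subprocess.TimeoutExpired:
        return "error: timed out"
    if completed.returncode != 0:
        last_line = completed.stderr.strip().splitlines()[-1:]
        return "error: " + (last_line[0] if last_line else "unknown failure")
    return completed.stdout.strip()
```

With SymPy installed, a call such as `run_sandboxed("import sympy as sp; x = sp.symbols('x'); print(sp.solve(x**2 - 5*x + 6, x))")` would return the printed roots. An infinite loop or a crash only affects the child process, not the app.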