# MVM²: Multi-Modal Multi-Model Mathematical Reasoning Verification System

**VNR VJIET Major Project 2025**

**Team:** Brahma Teja, Vinith Kulkarni, Varshith Dharmaj V, Bhavitha Yaragorla

![Status](https://img.shields.io/badge/status-production--ready-green)
![Version](https://img.shields.io/badge/version-2.0.0-blue)
![Python](https://img.shields.io/badge/python-3.10+-blue)
![Docker](https://img.shields.io/badge/docker-enabled-blue)

---

## 📄 Problem Statement

Validating mathematical reasoning generated by Large Language Models (LLMs) is critical but challenging, especially when inputs are multimodal (images of handwritten or printed text).

**Key Challenges:**

1. **Hallucinations:** LLMs often generate plausible-sounding but logically flawed steps.
2. **OCR Noise:** Extracting math from images introduces errors (e.g., confusing '5' with 'S' or dropping integral signs) that downstream verifiers accept without question.
3. **Lack of Formal Uncertainty:** Existing systems do not account for OCR confidence when making final validity judgments.

**MVM² Solution:** A unified pipeline that combines **OCR with formal uncertainty propagation**, **symbolic verification (SymPy)**, and **multi-agent LLM consensus** to robustly verify mathematical solutions.

---

## 🏗️ System Architecture

The system follows a modular, service-oriented architecture located in the `backend/` directory:

| Service | Responsibility |
|---|---|
| **1. Input Receiver** | (`backend/input_receiver.py`) Validates text/image inputs via Pydantic models. |
| **2. Preprocessing** | (`backend/preprocessing_service.py`) Cleans images using OpenCV (denoising, binarization). |
| **3. OCR Service** | (`backend/ocr_service.py`) Hybrid engine combining Tesseract and specialized handwriting models. **Calculates OCR Confidence ($C_{ocr}$).** |
| **4. Representation** | (`backend/representation_service.py`) Normalizes inputs into a canonical LaTeX-like Intermediate Representation (IR). |
| **5. Verification** | (`backend/verification_service.py`) Orchestrates **SymPy** for arithmetic checks and **Multi-Agent LLMs** (Solver, Critic, Verifier) for logic. |
| **6. Classification** | (`backend/classifier_service.py`) Aggregates scores using the **MVM² Hybrid Formula**. |
| **7. Reporting** | (`backend/reporting_service.py`) Generates detailed JSON/HTML reports for the user. |

---

## ⭐ Key Innovations

### 1. OCR-Aware Confidence Propagation

Unlike standard pipelines that treat OCR text as ground truth, MVM² formally propagates visual uncertainty into the final confidence score ($C_{final}$).

$$ C_{final} = S_{weighted} \times (0.9 + 0.1 \times C_{ocr}) $$

The multiplier ranges from 0.9 (when $C_{ocr} = 0$) to 1.0 (when $C_{ocr} = 1$), so an ambiguous input image lowers the final confidence by up to 10%. This keeps the system from reporting full confidence on noisy data.

### 2. Step-Level Multi-Agent Consensus

We deploy a **Multi-Agent System** (Solver, Critic, Verifier) to analyze solution steps. We compute a **Hallucination Rate** by checking consensus across agents for each step.

- **Agreement:** Increases confidence in the step.
- **Disagreement:** Flags the step as a potential hallucination.

### 3. Hybrid Scoring Mechanism

The final validity score ($S_{weighted}$) is a weighted ensemble of three distinct signals:

- **Symbolic Score ($\alpha=0.40$):** SymPy's formal verification of arithmetic.
- **Logical Score ($\beta=0.35$):** LLM consensus on reasoning flow.
- **Classifier Score ($\gamma=0.25$):** Rule-based patterns (e.g., detecting uncertainty keywords).

$$ S_{weighted} = 0.40 \cdot S_{sym} + 0.35 \cdot S_{log} + 0.25 \cdot S_{clf} $$

---

## 🚀 Getting Started

### Prerequisites

- Python 3.10+
- Tesseract OCR installed ([Instructions](https://github.com/tesseract-ocr/tesseract))
- Google Gemini API Key

### Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/mvm2.git
   cd mvm2
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Set API Key:
   ```powershell
   # Windows PowerShell
   $env:GEMINI_API_KEY="your_api_key_here"
   ```

### Running the System

**1. Backend API (FastAPI)**

```bash
python backend/main.py
# Server runs at http://localhost:8000
```

**2. Frontend Interface**

Open `frontend/index.html` in your web browser. *(No build step required for this lightweight UI)*

**3. Docker Deployment**

MVM² is container-ready. We provide a full docker-compose setup.

```bash
docker-compose up --build -d
```

- Backend API will be available at `http://localhost:8000`
- Frontend UI will be available at `http://localhost:8080`

---

## 🧪 Experiments & Evaluation

We provide a custom evaluation suite to reproduce our ablation studies.

### 1. Dataset

The evaluation uses `datasets/sample_data.json`. You can add your own samples to this file.

### 2. Running Ablation Modes

The `run_evaluation.py` script automatically compares 4 system configurations:

| Mode | Description | Hypothesis |
|---|---|---|
| `single_llm_only` | Baseline (1 Agent) | High hallucination rate, low accuracy. |
| `llm_plus_sympy` | Hybrid (1 Agent + SymPy) | Better arithmetic, still hallucinates logic. |
| `multi_agent_no_ocr_conf` | Multi-Agent Consensus | Low hallucination, but overconfident on noisy images. |
| **`full_mvm2`** | **Complete System** | **Highest reliability and calibrated confidence.** |

**Command:**

```bash
# Run full evaluation suite
python scripts/run_evaluation.py
```

### 3. Results

Outputs are saved to `evaluation_results.csv`, which contains:

- Accuracy (Exact Match)
- Hallucination Rate
- Latency (ms)
- Verdicts

---

## 📁 Project Structure

```
math_verification_mvp/
├── backend/
│   ├── config.py                  # Central Configuration
│   ├── core/                      # Core Logic Services (MVM² Modules)
│   │   ├── input_receiver.py
│   │   ├── ocr_service.py
│   │   ├── verification_service.py
│   │   ├── classifier_service.py
│   │   └── ...
│   ├── tests/                     # Unit Tests
│   └── main.py                    # FastAPI Entry Point
├── frontend/                      # Lightweight UI
├── datasets/                      # Evaluation Data & Results
├── scripts/                       # Evaluation & Benchmark Scripts
│   ├── run_evaluation.py
│   ├── run_benchmarks.py
│   └── quick_test.py
├── docs/                          # Documentation
└── requirements.txt               # Dependencies
```
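
As a worked illustration of the two scoring formulas from the Key Innovations section, the sketch below computes $S_{weighted}$ and $C_{final}$ in plain Python. It is not taken from `backend/classifier_service.py`: the function names `weighted_score` and `final_confidence` are hypothetical, and the check that every score lies in $[0, 1]$ is an assumption, since the README does not state the score ranges.

```python
# Weights as quoted in the Hybrid Scoring Mechanism section.
ALPHA, BETA, GAMMA = 0.40, 0.35, 0.25


def _check_unit_interval(name: str, value: float) -> None:
    # Assumption: all scores and confidences are normalized to [0, 1].
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be in [0, 1], got {value}")


def weighted_score(s_sym: float, s_log: float, s_clf: float) -> float:
    """S_weighted = 0.40 * S_sym + 0.35 * S_log + 0.25 * S_clf."""
    for name, value in (("s_sym", s_sym), ("s_log", s_log), ("s_clf", s_clf)):
        _check_unit_interval(name, value)
    return ALPHA * s_sym + BETA * s_log + GAMMA * s_clf


def final_confidence(s_weighted: float, c_ocr: float) -> float:
    """C_final = S_weighted * (0.9 + 0.1 * C_ocr)."""
    _check_unit_interval("s_weighted", s_weighted)
    _check_unit_interval("c_ocr", c_ocr)
    return s_weighted * (0.9 + 0.1 * c_ocr)


if __name__ == "__main__":
    # Hypothetical inputs: arithmetic fully verified, partial LLM consensus,
    # moderate classifier score, and a mid-quality OCR read.
    s = weighted_score(s_sym=1.0, s_log=0.8, s_clf=0.6)
    print(f"S_weighted = {s:.4f}")
    print(f"C_final    = {final_confidence(s, c_ocr=0.5):.4f}")
```

With these illustrative inputs, $S_{weighted}$ is about 0.83 and the OCR multiplier is 0.95, giving a $C_{final}$ of roughly 0.79. Setting `c_ocr=0.0` shows the largest reduction the formula allows, a factor of 0.9.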