MVM²: Multi-Modal Multi-Model Mathematical Reasoning Verification System

VNR VJIET Major Project 2025
Team: Brahma Teja, Vinith Kulkarni, Varshith Dharmaj V, Bhavitha Yaragorla



📄 Problem Statement

Validating mathematical reasoning generated by Large Language Models (LLMs) is critical but challenging, especially when inputs are multimodal (images of handwritten or printed text).

Key Challenges:

  1. Hallucinations: LLMs often generate plausible-sounding but logically flawed steps.
  2. OCR Noise: Extracting math from images introduces errors (e.g., confusing '5' with 'S' or missing integrals) that downstream verifiers blindly accept.
  3. Lack of Formal Uncertainty: Existing systems do not account for OCR confidence when making final validity judgments.

MVM² Solution: A unified pipeline that combines OCR with formal uncertainty propagation, symbolic verification (SymPy), and multi-agent LLM consensus to robustly verify mathematical solutions.


🏗️ System Architecture

The system follows a modular service-oriented architecture located in the backend/ directory:

| Service | Responsibility |
|---|---|
| 1. Input Receiver (backend/input_receiver.py) | Validates text/image inputs via Pydantic models. |
| 2. Preprocessing (backend/preprocessing_service.py) | Cleans images using OpenCV (denoising, binarization). |
| 3. OCR Service (backend/ocr_service.py) | Hybrid engine combining Tesseract and specialized handwritten-text models. Calculates OCR Confidence ($C_{ocr}$). |
| 4. Representation (backend/representation_service.py) | Normalizes inputs into a canonical LaTeX-like Intermediate Representation (IR). |
| 5. Verification (backend/verification_service.py) | Orchestrates SymPy for arithmetic checks and Multi-Agent LLMs (Solver, Critic, Verifier) for logic. |
| 6. Classification (backend/classifier_service.py) | Aggregates scores using the MVM² Hybrid Formula. |
| 7. Reporting (backend/reporting_service.py) | Generates detailed JSON/HTML reports for the user. |

⭐ Key Innovations

1. OCR-Aware Confidence Propagation

Unlike standard pipelines that treat OCR text as ground truth, MVM² formally propagates visual uncertainty into the final confidence score ($C_{final}$).

$$C_{final} = S_{weighted} \times (0.9 + 0.1 \times C_{ocr})$$

This scales the final score down by up to 10% when the input image is ambiguous: the multiplier ranges from 0.9 at $C_{ocr} = 0$ to 1.0 at $C_{ocr} = 1$. The penalty reduces the risk of overconfident false positives on noisy data.
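
As a worked illustration of the formula above, the following minimal sketch applies the OCR multiplier to a weighted score. The function name `propagate_ocr_confidence` and the clamping of inputs to [0, 1] are assumptions made for this example, not code taken from the project.

```python
def propagate_ocr_confidence(s_weighted: float, c_ocr: float) -> float:
    """Return C_final = S_weighted * (0.9 + 0.1 * C_ocr).

    Hypothetical helper: the name and the clamping of both inputs to
    [0, 1] are assumptions, not part of the MVM² codebase.
    """
    s = min(max(s_weighted, 0.0), 1.0)
    c = min(max(c_ocr, 0.0), 1.0)
    return s * (0.9 + 0.1 * c)


if __name__ == "__main__":
    # Same verification score, clean vs. noisy OCR input.
    print(round(propagate_ocr_confidence(0.80, 1.0), 4))  # multiplier 1.00
    print(round(propagate_ocr_confidence(0.80, 0.2), 4))  # multiplier 0.92
```

With the same weighted score of 0.80, dropping OCR confidence from 1.0 to 0.2 lowers the final confidence from 0.80 to 0.736.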

2. Step-Level Multi-Agent Consensus

We deploy a Multi-Agent System (Solver, Critic, Verifier) to analyze solution steps. We compute a Hallucination Rate by checking consensus across agents for each step.

  • Agreement: raises confidence in the step.
  • Disagreement: flags the step as a potential hallucination.
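
The consensus check can be sketched as follows. This is only an illustration under stated assumptions: each agent returns a boolean verdict per step, any non-unanimous step counts as a potential hallucination, and the rate is the fraction of flagged steps. The function name `hallucination_rate` and the agent keys are hypothetical; the project's actual rule may differ.

```python
from typing import Dict, List


def hallucination_rate(step_votes: List[Dict[str, bool]]) -> float:
    """Fraction of steps on which the agents do not all agree.

    Each element maps an agent name to its verdict for one step
    (True = step judged valid). Treating any non-unanimous step as a
    potential hallucination is an assumption made for this sketch.
    """
    if not step_votes:
        return 0.0
    flagged = sum(1 for votes in step_votes if len(set(votes.values())) > 1)
    return flagged / len(step_votes)


if __name__ == "__main__":
    votes = [
        {"solver": True, "critic": True, "verifier": True},     # agreement
        {"solver": True, "critic": False, "verifier": True},    # disagreement
        {"solver": True, "critic": True, "verifier": True},     # agreement
        {"solver": False, "critic": False, "verifier": False},  # unanimous reject
    ]
    print(hallucination_rate(votes))  # one of four steps is flagged
```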

3. Hybrid Scoring Mechanism

The final validity score ($S_{weighted}$) is a weighted ensemble of three distinct signals:

  • Symbolic Score ($\alpha=0.40$): SymPy's formal verification of arithmetic.
  • Logical Score ($\beta=0.35$): LLM consensus on reasoning flow.
  • Classifier Score ($\gamma=0.25$): Rule-based patterns (e.g., detecting uncertainty keywords).

$$S_{weighted} = 0.40 \cdot S_{sym} + 0.35 \cdot S_{log} + 0.25 \cdot S_{clf}$$
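
The weighted ensemble is simple enough to check by hand. The snippet below is a minimal sketch using the three weights listed above; the function and constant names are illustrative assumptions, and the component scores are assumed to lie in [0, 1].

```python
# Weights from the list above: symbolic, logical, classifier.
ALPHA, BETA, GAMMA = 0.40, 0.35, 0.25


def weighted_score(s_sym: float, s_log: float, s_clf: float) -> float:
    """Return S_weighted = ALPHA*S_sym + BETA*S_log + GAMMA*S_clf.

    Hypothetical helper; each input is assumed to be a score in [0, 1].
    """
    return ALPHA * s_sym + BETA * s_log + GAMMA * s_clf


if __name__ == "__main__":
    # 0.40*1.0 + 0.35*0.8 + 0.25*0.6 = 0.40 + 0.28 + 0.15
    print(round(weighted_score(1.0, 0.8, 0.6), 4))
```

Because the weights sum to 1.0, $S_{weighted}$ stays within [0, 1] whenever each component score does.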


🚀 Getting Started

Prerequisites

  • Python 3.10+
  • Tesseract OCR installed (Instructions)
  • Google Gemini API Key

Installation

  1. Clone the repository:
    git clone https://github.com/yourusername/mvm2.git
    cd mvm2
    
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Set API Key:
    # Windows PowerShell
    $env:GEMINI_API_KEY="your_api_key_here"
    # macOS/Linux (bash/zsh)
    export GEMINI_API_KEY="your_api_key_here"
    

Running the System

1. Backend API (FastAPI)

python backend/main.py
# Server runs at http://localhost:8000

2. Frontend Interface

Open frontend/index.html in your web browser. No build step is required for this lightweight UI.

3. Docker Deployment

MVM² is container-ready. We provide a full docker-compose setup.

docker-compose up --build -d

  • Backend API will be available at http://localhost:8000
  • Frontend UI will be available at http://localhost:8080

🧪 Experiments & Evaluation

We provide a custom evaluation suite to reproduce our ablation studies.

1. Dataset

The evaluation uses datasets/sample_data.json. You can add your own samples here.

2. Running Ablation Modes

The run_evaluation.py script automatically compares four system configurations:

| Mode | Description | Hypothesis |
|---|---|---|
| single_llm_only | Baseline (1 Agent) | High hallucination rate, low accuracy. |
| llm_plus_sympy | Hybrid (1 Agent + SymPy) | Better arithmetic, still hallucinates logic. |
| multi_agent_no_ocr_conf | Multi-Agent Consensus | Low hallucination, but overconfident on noisy images. |
| full_mvm2 | Complete System | Highest reliability and calibrated confidence. |

Command:

python scripts/run_evaluation.py
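
One way to picture the four modes is as a set of feature toggles. The sketch below is an assumption about how such a configuration could look, not the script's actual implementation: the `AblationConfig` class and its field names are hypothetical, and the source does not state whether multi_agent_no_ocr_conf keeps SymPy enabled (it is assumed enabled here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """Hypothetical feature toggles for one evaluation mode."""
    num_agents: int           # 1 = single LLM, 3 = Solver/Critic/Verifier
    use_sympy: bool           # symbolic arithmetic checks
    use_ocr_confidence: bool  # apply the OCR multiplier to the final score


ABLATION_MODES = {
    "single_llm_only": AblationConfig(1, False, False),
    "llm_plus_sympy": AblationConfig(1, True, False),
    # Assumption: SymPy stays enabled in this mode; the source does not say.
    "multi_agent_no_ocr_conf": AblationConfig(3, True, False),
    "full_mvm2": AblationConfig(3, True, True),
}

if __name__ == "__main__":
    for name, cfg in ABLATION_MODES.items():
        print(f"{name}: {cfg}")
```

Under this reading, each mode differs from the previous one by a single toggle, which is what lets an ablation attribute a change in accuracy or hallucination rate to one component.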

3. Results

Outputs are saved to evaluation_results.csv, which contains:

  • Accuracy (Exact Match)
  • Hallucination Rate
  • Latency (ms)
  • Verdicts
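
To compare modes, the CSV can be averaged per configuration. The sketch below is illustrative only: the column names (`mode`, `correct`, `hallucination_rate`, `latency_ms`) are assumptions about the file layout, and the sample rows are placeholders rather than real results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize(csv_text: str) -> dict:
    """Average each metric per mode.

    Assumed (hypothetical) columns: mode, correct (0/1),
    hallucination_rate, latency_ms.
    """
    rows_by_mode = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows_by_mode[row["mode"]].append(row)
    return {
        mode: {
            "accuracy": mean(int(r["correct"]) for r in rows),
            "hallucination_rate": mean(float(r["hallucination_rate"]) for r in rows),
            "latency_ms": mean(float(r["latency_ms"]) for r in rows),
        }
        for mode, rows in rows_by_mode.items()
    }


if __name__ == "__main__":
    # Placeholder rows for demonstration; not measured results.
    placeholder = (
        "mode,correct,hallucination_rate,latency_ms\n"
        "full_mvm2,1,0.0,100\n"
        "full_mvm2,0,0.5,300\n"
    )
    print(summarize(placeholder))
```

To run this on a real file, read it with `open("evaluation_results.csv").read()` and adjust the column names to match the actual header.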

📁 Project Structure

math_verification_mvp/
├── backend/
│   ├── config.py             # Central Configuration
│   ├── core/                 # Core Logic Services (MVM² Modules)
│   │   ├── input_receiver.py
│   │   ├── ocr_service.py
│   │   ├── verification_service.py
│   │   ├── classifier_service.py
│   │   └── ...
│   ├── tests/                # Unit Tests
│   └── main.py               # FastAPI Entry Point
├── frontend/                 # Lightweight UI
├── datasets/                 # Evaluation Data & Results
├── scripts/                  # Evaluation & Benchmark Scripts
│   ├── run_evaluation.py
│   ├── run_benchmarks.py
│   └── quick_test.py
├── docs/                     # Documentation
└── requirements.txt          # Dependencies
