Varshith dharmaj committed on Mar 12 · commit 64fc2b8 (parent a3ed50f): Upload docs/README.md with huggingface_hub

# MVM²: Multi-Modal Multi-Model Mathematical Reasoning Verification System

**VNR VJIET Major Project 2025**

**Team:** Brahma Teja, Vinith Kulkarni, Varshith Dharmaj V, Bhavitha Yaragorla

![Status](https://img.shields.io/badge/status-production--ready-green)
![Version](https://img.shields.io/badge/version-2.0.0-blue)
![Python](https://img.shields.io/badge/python-3.10+-blue)
![Docker](https://img.shields.io/badge/docker-enabled-blue)

---
## 📄 Problem Statement

Validating mathematical reasoning generated by Large Language Models (LLMs) is critical but challenging, especially when the input is multimodal (e.g., images of handwritten or printed mathematics).

**Key Challenges:**

1.  **Hallucinations:** LLMs often generate plausible-sounding but logically flawed steps.
2.  **OCR Noise:** Extracting math from images introduces errors (e.g., confusing '5' with 'S' or dropping integral signs) that downstream verifiers blindly accept.
3.  **Lack of Formal Uncertainty:** Existing systems do not account for OCR confidence when making final validity judgments.

**MVM² Solution:** A unified pipeline that combines **OCR with formal uncertainty propagation**, **symbolic verification (SymPy)**, and **multi-agent LLM consensus** to robustly verify mathematical solutions.

---
## 🏗️ System Architecture

The system follows a modular, service-oriented architecture located in the `backend/` directory:

| Service | Responsibility |
|---|---|
| **1. Input Receiver** | (`backend/input_receiver.py`) Validates text/image inputs via Pydantic models. |
| **2. Preprocessing** | (`backend/preprocessing_service.py`) Cleans images using OpenCV (denoising, binarization). |
| **3. OCR Service** | (`backend/ocr_service.py`) Hybrid engine combining Tesseract with specialized handwriting-recognition models. **Computes the OCR confidence ($C_{ocr}$).** |
| **4. Representation** | (`backend/representation_service.py`) Normalizes inputs into a canonical LaTeX-like Intermediate Representation (IR). |
| **5. Verification** | (`backend/verification_service.py`) Orchestrates **SymPy** for arithmetic checks and **Multi-Agent LLMs** (Solver, Critic, Verifier) for logic. |
| **6. Classification** | (`backend/classifier_service.py`) Aggregates scores using the **MVM² Hybrid Formula**. |
| **7. Reporting** | (`backend/reporting_service.py`) Generates detailed JSON/HTML reports for the user. |

---
## ⭐ Key Innovations

### 1. OCR-Aware Confidence Propagation

Unlike standard pipelines that treat OCR text as ground truth, MVM² formally propagates visual uncertainty into the final confidence score ($C_{final}$):

$$
C_{final} = S_{weighted} \times (0.9 + 0.1 \times C_{ocr})
$$

This penalizes a verification result when the input image is ambiguous, reducing false positives on noisy data. Assuming $C_{ocr} \in [0, 1]$, the multiplier ranges from 0.9 at $C_{ocr} = 0$ to 1.0 at $C_{ocr} = 1$, so the penalty is at most 10% of $S_{weighted}$.
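
The propagation rule can be sketched in a few lines of Python. `final_confidence` is a hypothetical helper for illustration, not code taken from `classifier_service.py`, and its range check assumes $C_{ocr}$ is reported in $[0, 1]$.

```python
def final_confidence(s_weighted: float, c_ocr: float) -> float:
    """Scale the weighted validity score by OCR confidence.

    Implements C_final = S_weighted * (0.9 + 0.1 * C_ocr).
    Assumption: c_ocr is a probability-like value in [0, 1].
    """
    if not 0.0 <= c_ocr <= 1.0:
        raise ValueError(f"c_ocr must be in [0, 1], got {c_ocr}")
    return s_weighted * (0.9 + 0.1 * c_ocr)


if __name__ == "__main__":
    # Same weighted score, progressively noisier OCR.
    for c_ocr in (1.0, 0.5, 0.0):
        print(f"C_ocr={c_ocr:.1f} -> C_final={final_confidence(0.8, c_ocr):.2f}")
```

With $S_{weighted} = 0.8$, this prints 0.80, 0.76, and 0.72: the score shrinks smoothly as OCR confidence drops, but never by more than 10%.
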
### 2. Step-Level Multi-Agent Consensus

We deploy a **multi-agent system** (Solver, Critic, Verifier) to analyze each solution step, and we compute a **Hallucination Rate** by checking agent consensus on every step:

- **Agreement:** increases confidence in the step.
- **Disagreement:** flags the step as a potential hallucination.
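
A minimal sketch of the per-step consensus check follows. It assumes each agent returns a boolean pass/fail verdict per step and that a step counts as a potential hallucination whenever the verdicts are not unanimous; `step_disagrees` and `hallucination_rate` are hypothetical names, not the project's API.

```python
from typing import Sequence


def step_disagrees(verdicts: Sequence[bool]) -> bool:
    """Flag a step when the agents do not all return the same verdict."""
    return len(set(verdicts)) > 1


def hallucination_rate(step_verdicts: Sequence[Sequence[bool]]) -> float:
    """Fraction of steps flagged as potential hallucinations."""
    if not step_verdicts:
        return 0.0
    flagged = sum(step_disagrees(v) for v in step_verdicts)
    return flagged / len(step_verdicts)


if __name__ == "__main__":
    # Verdicts from (Solver, Critic, Verifier) for each of four steps.
    steps = [
        (True, True, True),
        (True, False, True),   # Critic objects: flagged
        (True, True, True),
        (False, False, True),  # split vote: flagged
    ]
    print(hallucination_rate(steps))  # 0.5
```

Treating a unanimous rejection as an agreed error rather than a hallucination flag is a choice made for this sketch; the actual service may count it differently.
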
### 3. Hybrid Scoring Mechanism

The final validity score ($S_{weighted}$) is a weighted ensemble of three distinct signals:

- **Symbolic Score ($\alpha=0.40$):** SymPy's formal verification of arithmetic.
- **Logical Score ($\beta=0.35$):** LLM consensus on reasoning flow.
- **Classifier Score ($\gamma=0.25$):** Rule-based patterns (e.g., detecting uncertainty keywords).

$$
S_{weighted} = 0.40 \cdot S_{sym} + 0.35 \cdot S_{log} + 0.25 \cdot S_{clf}
$$

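
The ensemble is a plain weighted sum, sketched below with the weights listed above. `weighted_score` is a hypothetical helper, and the component scores are assumed to already be normalized to $[0, 1]$.

```python
from typing import Mapping

# Weights as stated above: alpha (symbolic), beta (logical), gamma (classifier).
WEIGHTS: Mapping[str, float] = {"symbolic": 0.40, "logical": 0.35, "classifier": 0.25}


def weighted_score(s_sym: float, s_log: float, s_clf: float) -> float:
    """S_weighted = 0.40 * S_sym + 0.35 * S_log + 0.25 * S_clf."""
    return (
        WEIGHTS["symbolic"] * s_sym
        + WEIGHTS["logical"] * s_log
        + WEIGHTS["classifier"] * s_clf
    )


if __name__ == "__main__":
    # Sanity check: the weights sum to 1, so S_weighted stays in [0, 1]
    # whenever each component score does.
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    # Arithmetic checks out, logic is mostly sound, classifier is unsure.
    print(round(weighted_score(1.0, 0.8, 0.6), 2))  # 0.83
```

The result can then be fed into the OCR-aware scaling from the first innovation to obtain $C_{final}$.
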

---

## 🚀 Getting Started

### Prerequisites

- Python 3.10+
- Tesseract OCR installed ([Instructions](https://github.com/tesseract-ocr/tesseract))
- Google Gemini API key

### Installation

1.  Clone the repository:
    ```bash
    git clone https://github.com/yourusername/mvm2.git
    cd mvm2
    ```
2.  Install dependencies:
    ```bash
    pip install -r requirements.txt
    ```
3.  Set the API key:
    ```powershell
    # Windows PowerShell
    $env:GEMINI_API_KEY="your_api_key_here"
    ```
    On Linux or macOS, use `export GEMINI_API_KEY="your_api_key_here"` instead.
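
Before starting the backend, a quick pre-flight check can confirm that the key is visible to Python. This is a minimal sketch that assumes the backend reads the key from the `GEMINI_API_KEY` environment variable set above; `require_api_key` is a hypothetical helper, not part of the project.

```python
import os


def require_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper: assumes the backend reads the key from this variable.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; set it before running backend/main.py")
    return key


if __name__ == "__main__":
    try:
        key = require_api_key()
        # Print only a short prefix so the full key never ends up in logs.
        print(f"GEMINI_API_KEY found (starts with {key[:4]}...)")
    except RuntimeError as err:
        print(err)
```

Failing early with an explicit message is usually easier to debug than an authentication error surfacing mid-request.
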
### Running the System

**1. Backend API (FastAPI)**

```bash
python backend/main.py
# Server runs at http://localhost:8000
```

**2. Frontend Interface**

Open `frontend/index.html` in your web browser.
*(No build step is required for this lightweight UI.)*

**3. Docker Deployment**

MVM² is container-ready; a full `docker-compose` setup is provided.

```bash
docker-compose up --build -d
```

- The backend API will be available at `http://localhost:8000`.
- The frontend UI will be available at `http://localhost:8080`.

---
## 🧪 Experiments & Evaluation

We provide a custom evaluation suite to reproduce our ablation studies.

### 1. Dataset

The evaluation uses `datasets/sample_data.json`; you can add your own samples to this file.

### 2. Running Ablation Modes

The `scripts/run_evaluation.py` script automatically compares four system configurations:

| Mode | Description | Hypothesis |
|---|---|---|
| `single_llm_only` | Baseline (1 Agent) | High hallucination rate, low accuracy. |
| `llm_plus_sympy` | Hybrid (1 Agent + SymPy) | Better arithmetic, still hallucinates logic. |
| `multi_agent_no_ocr_conf` | Multi-Agent Consensus | Low hallucination, but overconfident on noisy images. |
| **`full_mvm2`** | **Complete System** | **Highest reliability and calibrated confidence.** |

**Command:**

```bash
python scripts/run_evaluation.py
```

### 3. Results

Outputs are saved to `evaluation_results.csv`, which contains:

- Accuracy (exact match)
- Hallucination rate
- Latency (ms)
- Verdicts

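
To compare the ablation modes after a run, the CSV can be averaged per configuration with the standard library alone. This is a sketch: the column names `mode`, `accuracy`, and `latency_ms` are assumptions for illustration, so check the header that `run_evaluation.py` actually writes and adjust them. The inline rows are toy values, not real results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, TextIO


def summarize(
    f: TextIO,
    mode_col: str = "mode",  # hypothetical column name
    metric_cols: Sequence[str] = ("accuracy", "latency_ms"),  # hypothetical
) -> Dict[str, Dict[str, float]]:
    """Average each numeric metric column per ablation mode."""
    rows_by_mode: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(f):
        rows_by_mode[row[mode_col]].append(row)
    return {
        mode: {col: mean(float(r[col]) for r in rows) for col in metric_cols}
        for mode, rows in rows_by_mode.items()
    }


if __name__ == "__main__":
    # Toy rows for illustration only; not actual MVM² results.
    toy_csv = io.StringIO(
        "mode,accuracy,latency_ms\n"
        "single_llm_only,1,1200\n"
        "single_llm_only,0,1100\n"
        "full_mvm2,1,2500\n"
        "full_mvm2,1,2300\n"
    )
    for mode, metrics in summarize(toy_csv).items():
        print(mode, metrics)
```

For a real run, pass an open file instead, e.g. `with open("evaluation_results.csv", newline="") as f: print(summarize(f))`.
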

---

## 📁 Project Structure

```
math_verification_mvp/
├── backend/
│   ├── config.py             # Central Configuration
│   ├── core/                 # Core Logic Services (MVM² Modules)
│   │   ├── input_receiver.py
│   │   ├── ocr_service.py
│   │   ├── verification_service.py
│   │   ├── classifier_service.py
│   │   └── ...
│   ├── tests/                # Unit Tests
│   └── main.py               # FastAPI Entry Point
├── frontend/                 # Lightweight UI
├── datasets/                 # Evaluation Data & Results
├── scripts/                  # Evaluation & Benchmark Scripts
│   ├── run_evaluation.py
│   ├── run_benchmarks.py
│   └── quick_test.py
├── docs/                     # Documentation
└── requirements.txt          # Dependencies
```