Varshith dharmaj commited on
Commit
64fc2b8
·
verified ·
1 Parent(s): a3ed50f

Upload docs/README.md with huggingface_hub

Files changed (1)
  1. docs/README.md +208 -0
docs/README.md ADDED
@@ -0,0 +1,208 @@
# MVM²: Multi-Modal Multi-Model Mathematical Reasoning Verification System

**VNR VJIET Major Project 2025**
**Team:** Brahma Teja, Vinith Kulkarni, Varshith Dharmaj V, Bhavitha Yaragorla

![Status](https://img.shields.io/badge/status-production--ready-green)
![Version](https://img.shields.io/badge/version-2.0.0-blue)
![Python](https://img.shields.io/badge/python-3.10+-blue)
![Docker](https://img.shields.io/badge/docker-enabled-blue)

---

## 📄 Problem Statement

Validating mathematical reasoning generated by Large Language Models (LLMs) is critical but challenging, especially when inputs are multimodal (images of handwritten or printed text).

**Key Challenges:**
1. **Hallucinations:** LLMs often generate plausible-sounding but logically flawed steps.
2. **OCR Noise:** Extracting math from images introduces errors (e.g., confusing '5' with 'S' or missing integrals) that downstream verifiers blindly accept.
3. **Lack of Formal Uncertainty:** Existing systems do not account for OCR confidence when making final validity judgments.

**MVM² Solution:** A unified pipeline that combines **OCR with formal uncertainty propagation**, **symbolic verification (SymPy)**, and **multi-agent LLM consensus** to robustly verify mathematical solutions.

---

## 🏗️ System Architecture

The system follows a modular, service-oriented architecture located in the `backend/` directory:

| Service | Responsibility |
|---|---|
| **1. Input Receiver** | (`backend/input_receiver.py`) Validates text/image inputs via Pydantic models. |
| **2. Preprocessing** | (`backend/preprocessing_service.py`) Cleans images using OpenCV (denoising, binarization). |
| **3. OCR Service** | (`backend/ocr_service.py`) Hybrid engine combining Tesseract and specialized handwriting models. **Calculates OCR Confidence ($C_{ocr}$).** |
| **4. Representation** | (`backend/representation_service.py`) Normalizes inputs into a canonical LaTeX-like Intermediate Representation (IR). |
| **5. Verification** | (`backend/verification_service.py`) Orchestrates **SymPy** for arithmetic checks and **Multi-Agent LLMs** (Solver, Critic, Verifier) for logic. |
| **6. Classification** | (`backend/classifier_service.py`) Aggregates scores using the **MVM² Hybrid Formula**. |
| **7. Reporting** | (`backend/reporting_service.py`) Generates detailed JSON/HTML reports for the user. |

---

## ⭐ Key Innovations

### 1. OCR-Aware Confidence Propagation
Unlike standard pipelines that treat OCR text as ground truth, MVM² formally propagates visual uncertainty into the final confidence score ($C_{final}$).

$$
C_{final} = S_{weighted} \times (0.9 + 0.1 \times C_{ocr})
$$

Because the multiplier ranges from 0.9 (when $C_{ocr} = 0$) to 1.0 (when $C_{ocr} = 1$), an ambiguous input image lowers the final confidence by up to 10%, which reduces overconfident verdicts on noisy data.

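The propagation formula can be written as a small function. This is a minimal sketch, not the project's code: the function name `propagate_ocr_confidence` is hypothetical, and the check that both scores lie in [0, 1] is an assumption the README does not state explicitly.

```python
def propagate_ocr_confidence(s_weighted: float, c_ocr: float) -> float:
    """Apply C_final = S_weighted * (0.9 + 0.1 * C_ocr).

    Assumption: both inputs are normalized to the range [0, 1].
    """
    if not (0.0 <= s_weighted <= 1.0 and 0.0 <= c_ocr <= 1.0):
        raise ValueError("s_weighted and c_ocr must lie in [0, 1]")
    return s_weighted * (0.9 + 0.1 * c_ocr)


if __name__ == "__main__":
    # Same weighted score, three levels of OCR confidence.
    for c_ocr in (1.0, 0.5, 0.0):
        c_final = propagate_ocr_confidence(0.8, c_ocr)
        print(f"C_ocr={c_ocr:.1f} -> C_final={c_final:.3f}")
```

With $S_{weighted} = 0.8$, the final confidence moves from 0.80 (clean image) to 0.72 (fully ambiguous image).
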
### 2. Step-Level Multi-Agent Consensus
We deploy a **Multi-Agent System** (Solver, Critic, Verifier) to analyze solution steps. We compute a **Hallucination Rate** by checking consensus across agents for each step.
- **Agreement:** Increases confidence in the step.
- **Disagreement:** Flags the step as a potential hallucination.

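One possible reading of the consensus check is sketched below. The README does not define the Hallucination Rate precisely, so this sketch assumes that each agent returns one boolean verdict per step, that a step is flagged when the agents are not unanimous, and that the rate is the fraction of flagged steps. The function names are hypothetical.

```python
from typing import Dict, List


def flag_disagreements(verdicts: Dict[str, List[bool]]) -> List[int]:
    """Return the indices of steps on which the agents do not all agree."""
    per_agent = list(verdicts.values())
    if not per_agent:
        raise ValueError("at least one agent is required")
    n_steps = len(per_agent[0])
    if any(len(v) != n_steps for v in per_agent):
        raise ValueError("every agent must judge the same number of steps")
    # zip(*per_agent) yields one tuple of verdicts per step.
    return [i for i, step in enumerate(zip(*per_agent)) if len(set(step)) > 1]


def hallucination_rate(verdicts: Dict[str, List[bool]]) -> float:
    """Fraction of steps flagged as potential hallucinations (assumed definition)."""
    n_steps = len(next(iter(verdicts.values()), []))
    if n_steps == 0:
        return 0.0
    return len(flag_disagreements(verdicts)) / n_steps


if __name__ == "__main__":
    example = {
        "solver":   [True, True,  True, True],
        "critic":   [True, False, True, True],
        "verifier": [True, True,  True, False],
    }
    print(flag_disagreements(example))   # steps 1 and 3 (zero-based) are disputed
    print(hallucination_rate(example))   # 2 of 4 steps flagged
```
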
### 3. Hybrid Scoring Mechanism
The final validity score ($S_{weighted}$) is a weighted ensemble of three distinct signals:
- **Symbolic Score ($\alpha=0.40$):** SymPy's formal verification of arithmetic.
- **Logical Score ($\beta=0.35$):** LLM consensus on reasoning flow.
- **Classifier Score ($\gamma=0.25$):** Rule-based patterns (e.g., detecting uncertainty keywords).

$$
S_{weighted} = 0.40 \cdot S_{sym} + 0.35 \cdot S_{log} + 0.25 \cdot S_{clf}
$$

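As a worked example of the weighted ensemble, the sketch below evaluates the formula with the three weights listed above. The function and constant names are hypothetical, and the input scores in the example are illustrative values, not measured results.

```python
# Weights taken from the formula above; they sum to 1.0.
ALPHA, BETA, GAMMA = 0.40, 0.35, 0.25


def weighted_score(s_sym: float, s_log: float, s_clf: float) -> float:
    """Compute S_weighted = 0.40 * S_sym + 0.35 * S_log + 0.25 * S_clf."""
    return ALPHA * s_sym + BETA * s_log + GAMMA * s_clf


if __name__ == "__main__":
    # Illustrative inputs: arithmetic fully verified, partial agent
    # agreement, weaker rule-based signal.
    score = weighted_score(s_sym=1.0, s_log=0.8, s_clf=0.6)
    print(f"S_weighted = {score:.2f}")  # 0.40 + 0.28 + 0.15
```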
---

## 🚀 Getting Started

### Prerequisites
- Python 3.10+
- Tesseract OCR installed ([Instructions](https://github.com/tesseract-ocr/tesseract))
- Google Gemini API Key

### Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/mvm2.git
   cd mvm2
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Set the API key:
   ```powershell
   # Windows PowerShell
   $env:GEMINI_API_KEY="your_api_key_here"
   ```

### Running the System

**1. Backend API (FastAPI)**
```bash
python backend/main.py
# Server runs at http://localhost:8000
```

**2. Frontend Interface**
Open `frontend/index.html` in your web browser.
*(No build step required for this lightweight UI)*

**3. Docker Deployment**
MVM² is container-ready. We provide a full docker-compose setup.
```bash
docker-compose up --build -d
```
- Backend API will be available at `http://localhost:8000`
- Frontend UI will be available at `http://localhost:8080`

---

## 🧪 Experiments & Evaluation

We provide a custom evaluation suite to reproduce our ablation studies.

### 1. Dataset
The evaluation uses `datasets/sample_data.json`. You can add your own samples to this file.

### 2. Running Ablation Modes
The `scripts/run_evaluation.py` script automatically compares four system configurations:

| Mode | Description | Hypothesis |
|---|---|---|
| `single_llm_only` | Baseline (1 Agent) | High hallucination rate, low accuracy. |
| `llm_plus_sympy` | Hybrid (1 Agent + SymPy) | Better arithmetic, still hallucinates logic. |
| `multi_agent_no_ocr_conf` | Multi-Agent Consensus | Low hallucination, but overconfident on noisy images. |
| **`full_mvm2`** | **Complete System** | **Highest reliability and calibrated confidence.** |

**Command:**
```bash
python scripts/run_evaluation.py
```

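One way to picture the four ablation modes is as a set of feature flags, sketched below. The mode names come from the table; the `AblationMode` class, the flag names, and the flag values are an interpretation and not taken from `run_evaluation.py`. In particular, treating `multi_agent_no_ocr_conf` as "everything except OCR confidence" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AblationMode:
    """Hypothetical feature flags for one evaluation configuration."""
    multi_agent: bool          # Solver, Critic and Verifier instead of one LLM
    use_sympy: bool            # symbolic arithmetic checks
    use_ocr_confidence: bool   # scale the score by the OCR confidence


ABLATION_MODES = {
    "single_llm_only": AblationMode(False, False, False),
    "llm_plus_sympy": AblationMode(False, True, False),
    # Assumption: this mode keeps SymPy and only drops OCR confidence.
    "multi_agent_no_ocr_conf": AblationMode(True, True, False),
    "full_mvm2": AblationMode(True, True, True),
}


def enabled_components(mode_name: str) -> List[str]:
    """List the components switched on for a given mode name."""
    mode = ABLATION_MODES[mode_name]
    components = []
    if mode.multi_agent:
        components.append("multi_agent")
    if mode.use_sympy:
        components.append("sympy")
    if mode.use_ocr_confidence:
        components.append("ocr_confidence")
    return components


if __name__ == "__main__":
    for name in ABLATION_MODES:
        print(name, enabled_components(name))
```

Keeping the flags in one table makes it easy to see that each mode differs from the next by a single component, which is what an ablation study needs.
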
### 3. Results
Outputs are saved to `evaluation_results.csv`, which contains:
- Accuracy (Exact Match)
- Hallucination Rate
- Latency (ms)
- Verdicts

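A results file of this kind could be summarized per mode with the standard library, as in the sketch below. The column names (`mode`, `accuracy`, `hallucination_rate`, `latency_ms`) are assumptions, since the README lists the metrics but not the CSV header, and the inline rows are made-up placeholders used only to exercise the function, not project results. To use it on a real file, pass `open("evaluation_results.csv", newline="")` after adjusting the column names.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Assumed column names; adjust to match the real CSV header.
METRICS = ("accuracy", "hallucination_rate", "latency_ms")


def summarize(csv_file):
    """Average each metric per mode from an open CSV file object."""
    rows_by_mode = defaultdict(list)
    for row in csv.DictReader(csv_file):
        rows_by_mode[row["mode"]].append(row)
    return {
        mode: {m: mean(float(r[m]) for r in rows) for m in METRICS}
        for mode, rows in rows_by_mode.items()
    }


# Placeholder rows, not real measurements.
PLACEHOLDER_CSV = """mode,accuracy,hallucination_rate,latency_ms
mode_a,1,0.0,100
mode_a,0,0.5,300
mode_b,1,0.0,200
"""

if __name__ == "__main__":
    for mode, stats in summarize(io.StringIO(PLACEHOLDER_CSV)).items():
        print(mode, stats)
```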
---

## 📁 Project Structure

```
math_verification_mvp/
├── backend/
│   ├── config.py                # Central Configuration
│   ├── core/                    # Core Logic Services (MVM² Modules)
│   │   ├── input_receiver.py
│   │   ├── ocr_service.py
│   │   ├── verification_service.py
│   │   ├── classifier_service.py
│   │   └── ...
│   ├── tests/                   # Unit Tests
│   └── main.py                  # FastAPI Entry Point
├── frontend/                    # Lightweight UI
├── datasets/                    # Evaluation Data & Results
├── scripts/                     # Evaluation & Benchmark Scripts
│   ├── run_evaluation.py
│   ├── run_benchmarks.py
│   └── quick_test.py
├── docs/                        # Documentation
└── requirements.txt             # Dependencies
```
