# 🧠 Manthan-M1

"Manthan" means churning: the churning of ideas to produce clarity, depth, and structured reasoning.

Manthan-M1 is a ~24–25B-parameter multimodal reasoning model built in India for high-performance STEM, competitive exam solving, and multilingual understanding.
It combines a Vision-Language encoder with a fine-tuned reasoning LLM and is optimized for structured, tool-augmented problem solving.

Built independently with a focus on Indian academic excellence 🇮🇳
## Overview

- Model Size: ~24–25B parameters
- Architecture: Two-stage unified multimodal reasoning pipeline
- Quantization: MXFP4 (LLM), BF16 (VLM)
- Repository Size: ~20 GB
- Primary Focus:
  - JEE (Mains & Advanced)
  - AIME / IMO-level reasoning
  - STEM benchmarks
  - Multilingual Indian language support
  - Tool-augmented mathematical reasoning
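MXFP4 refers to the microscaling 4-bit float format: weights are stored as 4-bit E2M1 values that share one power-of-two scale per block. The sketch below is illustrative only (it is not the quantizer actually used for these weights, and block handling is simplified), but it shows why the format is so compact.

```python
import math

# Magnitudes representable by an E2M1 element
# (1 sign bit + 2 exponent bits + 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(weights):
    """Quantize one block of floats to (shared_exponent, 4-bit codes)."""
    amax = max(abs(w) for w in weights) or 1.0
    # Shared power-of-two scale chosen so the largest magnitude fits
    # under 6.0, the largest representable E2M1 value.
    shared_exp = math.ceil(math.log2(amax / 6.0))
    scale = 2.0 ** shared_exp
    codes = []
    for w in weights:
        target = abs(w) / scale
        # Round to the nearest representable magnitude.
        mag = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - target))
        codes.append((w < 0.0, mag))
    return shared_exp, codes

def dequantize_block(shared_exp, codes):
    scale = 2.0 ** shared_exp
    return [(-scale if neg else scale) * E2M1_VALUES[mag] for neg, mag in codes]
```

Each block stores only 4 bits per weight plus one shared scale, which is what lets a ~20B-parameter LLM fit in this repository alongside a BF16 vision encoder.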
## Architecture

Manthan-M1 is a two-stage reasoning system wrapped as a single model interface:

### 1️⃣ Vision Module

- Based on the Qwen-3-VL encoder
- Converts images into dense embeddings
- Handles:
  - Diagrams
  - Geometry figures
  - OCR-heavy exam sheets
  - Charts and tables

### 2️⃣ Reasoning Module

- Based on openai/gpt-oss-20b
- Heavily fine-tuned on:
  - 30 years of competitive math exams
  - Olympiad-style problems
  - Indian entrance examinations
  - Structured reasoning datasets
- Optimized for tool usage and step-based reasoning

Designed to outperform larger open-weight baselines on math-heavy benchmarks.
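The two-stage flow above can be sketched as plain Python. The class and method names here are illustrative stand-ins, not the actual Manthan-M1 API: a vision encoder turns an image into embedding vectors, which condition the reasoning LLM together with the text prompt.

```python
class VisionEncoderStub:
    """Stand-in for stage 1 (Qwen-3-VL encoder -> dense embeddings)."""
    def encode(self, image_pixels):
        # Fake embedding: one vector per row of pixels.
        return [[sum(row) / len(row)] for row in image_pixels]

class ReasoningLLMStub:
    """Stand-in for stage 2 (fine-tuned gpt-oss-20b)."""
    def generate(self, embeddings, prompt):
        return f"<answer to {prompt!r} using {len(embeddings)} vision tokens>"

class TwoStagePipeline:
    """Wraps both stages behind one generate() call, as the model card describes."""
    def __init__(self, vision, llm):
        self.vision = vision
        self.llm = llm

    def generate(self, image_pixels, prompt):
        embeddings = self.vision.encode(image_pixels)  # stage 1: image -> embeddings
        return self.llm.generate(embeddings, prompt)   # stage 2: embeddings + text -> answer
```

Text-only prompts simply skip stage 1 and go straight to the reasoning LLM.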
## 🎯 Target Performance (Indian Competitive Exams)

| Exam | Accuracy (with tools) |
|---|---|
| JEE Mains | ~98.3% |
| JEE Advanced | ~97.1% |
| AIME 2024 | ~99.9% |
| AIME 2025 | ~99.7–100% |
| IMO 2025 (Answer Bench) | 39/44 (Gold Medal range) |

For comparison, GPT-OSS-20B:

- AIME 2024 (with tools): ~96.0%
- AIME 2025 (with tools): ~98.7%
## Benchmark Results

### 🧠 STEM & Puzzle (Multimodal)
| Benchmark | Manthan-M1 |
|---|---|
| We-Math | 83.5 |
| DynaMath | 84.9 |
| ZEROBench | 8 |
| ZEROBench_sub | 34.8 |
| BabyVision | 38.0 |
### General VQA
| Benchmark | Score |
|---|---|
| RealWorldQA | 82.5 |
| MMStar | 81.2 |
| HallusionBench | 69.0 |
| MMBenchEN-DEV-v1.1 | 92.0 |
| SimpleVQA | 69.0 |
### OCR & Document Intelligence
| Benchmark | Score |
|---|---|
| OmniDocBench1.5 | 88.0 |
| CharXiv (RQ) | 75.0 |
| MMLongBench-Doc | 59.5 |
| CC-OCR | 80.0 |
| AI2D_TEST | 92.5 |
| OCRBench | 91.0 |
### 🧭 Spatial Intelligence
| Benchmark | Score |
|---|---|
| ERQA | 60.0 |
| CountBench | 95.0 |
| RefCOCO (avg) | 89.0 |
| ODInW13 | 45.0 |
| EmbSpatialBench | 82.0 |
| RefSpatialBench | 70.0 |
| LingoQA | 75.0 |
| Hypersim | 12.0 |
| SUNRGBD | 36.0 |
| Nuscene | 15.0 |
### 🎥 Video Understanding
| Benchmark | Score |
|---|---|
| VideoMME (w sub.) | 87.0 |
| VideoMME (w/o sub.) | 84.0 |
| VideoMMMU | 85.0 |
| MLVU (M-Avg) | 85.8 |
| MVBench | 76.0 |
| LVBench | 74.0 |
| MMVU | 78.5 |
### Agent Benchmarks

#### General Agent
| Benchmark | Score |
|---|---|
| BFCL-V4 | 71.0 |
| TAU2-Bench | 86.0 |
| VITA-Bench | 45.0 |
| DeepPlanning | 30.0 |
| Tool Decathlon | 35.0 |
| MCP-Mark | 45.0 |
#### Search Agent
| Benchmark | Score |
|---|---|
| HLE w/ tool | 47.0 |
| BrowseComp | 66.5 |
| BrowseComp-zh | 68.0 |
| WideSearch | 73.0 |
| Seal-0 | 46.5 |
### Multilingual Performance
| Benchmark | Score |
|---|---|
| MMMLU | 88.0 |
| MMLU-ProX | 84.0 |
| NOVA-63 | 55.5 |
| INCLUDE | 86.0 |
| Global PIQA | 90.0 |
| PolyMATH | 70.0 |
| WMT24++ | 79.0 |
| MAXIFE | 85.0 |
Strong support for:
- Hindi
- Tamil
- Telugu
- Bengali
- Marathi
- Gujarati
- Code-mixed Hinglish
### 💻 Coding & Tool Use
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 78.0 |
| SWE-bench Multilingual | 71.0 |
| SecCodeBench | 65.0 |
| Terminal Bench 2 | 50.0 |
## Repository Layout

```
Manthan-M1/
├── config.json
├── model.safetensors.index.json
├── vlm/
├── llm/
├── vlm_processor/
└── llm_tokenizer/
```

- `vlm/`: Vision encoder weights (BF16)
- `llm/`: Reasoning LLM weights (MXFP4)
- `vlm_processor/`: Image processor + tokenizer
- `llm_tokenizer/`: LLM tokenizer
## Usage

### Multimodal (Image + Text)

```python
import torch
from transformers import AutoProcessor, AutoTokenizer
from PIL import Image

from modeling_unified import ManthanM1

# Load the unified two-stage model (vision encoder + reasoning LLM).
model = ManthanM1.from_pretrained(
    "/tmp/Manthan-M1",
    dtype=torch.bfloat16,
    device_map="auto",
)

# Each stage has its own preprocessor.
vlm_processor = AutoProcessor.from_pretrained("/tmp/Manthan-M1/vlm_processor")
llm_tokenizer = AutoTokenizer.from_pretrained("/tmp/Manthan-M1/llm_tokenizer")

image = Image.open("test_image.jpg").convert("RGB")

response = model.generate(
    images=image,
    text_prompt="Solve the geometry problem shown in the image.",
    vlm_processor=vlm_processor,
    llm_tokenizer=llm_tokenizer,
    max_new_tokens=1024,
)
print(response)
```
### Text-Only

```python
# Text-only prompts skip the vision stage entirely.
response = model.generate(
    text_prompt="Prove that the sum of the first n odd numbers is n^2.",
    llm_tokenizer=llm_tokenizer,
    max_new_tokens=1024,
)
print(response)
```
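The accuracy targets above are reported "with tools", i.e. the model is run inside a loop that executes its tool calls and feeds results back. The runtime below is a generic sketch of that pattern; the `TOOLS` registry, message format, and `run_with_tools` helper are illustrative assumptions, not the actual Manthan-M1 tool protocol.

```python
TOOLS = {
    # Demo-only arithmetic tool; a real deployment would sandbox execution.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_with_tools(model_step, prompt, max_steps=5):
    """Drive a model until it produces a final answer.

    model_step(transcript) must return either
    {"tool": name, "input": arg} to request a tool call, or
    {"answer": text} to finish.
    """
    transcript = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        step = model_step(transcript)
        if "answer" in step:
            return step["answer"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[step["tool"]](step["input"])
        transcript.append({"role": "tool", "content": result})
    raise RuntimeError("no final answer within the step budget")
```

The loop terminates either when the model emits an answer or when the step budget is exhausted, which keeps runaway tool chains bounded.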
## Training Data (High-Level)

- 30 years of:
  - Indian competitive exams
  - Olympiad-style math
  - Engineering entrance problems
- Structured reasoning datasets
- Multilingual Indic corpora
- Diagram-heavy math datasets
## 🇮🇳 Philosophy

Manthan-M1 is built with a simple belief:

India doesn't just need AI that chats. It needs AI that solves.

Structured reasoning. Exam-grade mathematics. Indic-native understanding. Tool-augmented intelligence.
## License

Apache 2.0

## Author

Built independently by an Indian developer focused on competitive reasoning systems.

*Manthan-M1: Churning Intelligence.*