# 🧠 Manthan-M1

"Manthan" means churning: the churning of ideas to produce clarity, depth, and structured reasoning.

Manthan-M1 is a ~24–25B parameter multimodal reasoning model built in India for high-performance STEM reasoning, competitive-exam solving, and multilingual understanding.
It combines a Vision-Language encoder with a fine-tuned reasoning LLM and is optimized for structured, tool-augmented problem solving.

Built independently with a focus on Indian academic excellence 🇮🇳


## 🚀 Overview

- **Model Size:** ~24–25B parameters
- **Architecture:** Two-stage unified multimodal reasoning pipeline
- **Quantization:** MXFP4 (LLM), BF16 (VLM)
- **Repository Size:** ~20 GB
- **Primary Focus:**
  - JEE (Mains & Advanced)
  - AIME / IMO-level reasoning
  - STEM benchmarks
  - Multilingual Indian language support
  - Tool-augmented mathematical reasoning
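MXFP4 is what lets a ~20B-parameter reasoning LLM fit in a ~20 GB repository alongside the BF16 vision encoder: weights are stored as 4-bit floats, with small blocks of values sharing a single power-of-two scale. A minimal sketch of the dequantization idea, assuming the OCP Microscaling convention (E2M1 4-bit codes, one shared exponent per block); this is illustrative only, not the actual inference kernel:

```python
# Illustrative MXFP4-style dequantization (not the real inference kernel).
# E2M1 4-bit float: 1 sign bit, 2 exponent bits, 1 mantissa bit.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(code: int) -> float:
    """Decode a 4-bit E2M1 code (0..15) to its float value."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_VALUES[code & 0b0111]

def dequantize_block(codes: list[int], shared_exponent: int) -> list[float]:
    """A block of codes (typically 32) shares one power-of-two scale."""
    scale = 2.0 ** shared_exponent
    return [decode_fp4(c) * scale for c in codes]

# One block with a shared exponent of -2 (scale 0.25):
block = dequantize_block(list(range(16)), shared_exponent=-2)
```

The shared per-block scale is the key trade-off: 4-bit codes alone cover only 16 values, but scaling each block independently recovers most of the dynamic range of the original weights.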

๐Ÿ— Architecture

Manthan-M1 is a two-stage reasoning system wrapped as a single model interface:

1๏ธโƒฃ Vision Module

- Based on Qwen-3-VL encoder
- Converts image → dense embeddings
- Handles:
  - Diagrams
  - Geometry figures
  - OCR-heavy exam sheets
  - Charts and tables

2๏ธโƒฃ Reasoning Module

- Based on openai/gpt-oss-20b
- Heavily fine-tuned on:
  - 30 years of competitive math exams
  - Olympiad-style problems
  - Indian entrance examinations
  - Structured reasoning datasets
- Optimized for tool usage and step-based reasoning

Designed to outperform larger open-weight baselines in math-heavy benchmarks.
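The two-stage flow can be sketched in miniature: the vision encoder maps the image to a sequence of dense embeddings, a projector aligns them with the LLM's embedding space, and the reasoning LLM consumes the visual tokens together with the text prompt. The toy, dependency-free sketch below only illustrates that data flow; every function name here is hypothetical and stands in for a real model component:

```python
# Toy sketch of the two-stage pipeline: image -> embeddings -> reasoning LLM.
# All names are illustrative stand-ins for the Qwen-3-VL + gpt-oss-20b stages.

def vision_encode(image_pixels: list[float], dim: int = 4) -> list[list[float]]:
    """Stand-in for the VLM encoder: image -> sequence of dense embeddings."""
    # Chunk pixels into fixed-size "patch" embeddings.
    return [image_pixels[i:i + dim] for i in range(0, len(image_pixels), dim)]

def project_to_llm_space(embeddings: list[list[float]], scale: float = 0.5):
    """Stand-in for the projector aligning VLM features with LLM embeddings."""
    return [[x * scale for x in row] for row in embeddings]

def reason(visual_tokens: list[list[float]], text_prompt: str) -> str:
    """Stand-in for the reasoning LLM: consumes visual tokens plus text."""
    return f"<answer conditioned on {len(visual_tokens)} visual tokens: {text_prompt}>"

# Wiring: the single model interface hides these two stages.
pixels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
visual = project_to_llm_space(vision_encode(pixels))
out = reason(visual, "Solve the geometry problem.")
```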


## 🎯 Target Performance (Indian Competitive Exams)

| Exam | Accuracy (with tools) |
| --- | --- |
| JEE Mains | ~98.3% |
| JEE Advanced | ~97.1% |
| AIME 2024 | ~99.9% |
| AIME 2025 | ~99.7–100% |
| IMO 2025 (Answer Bench) | 39/44 (Gold Medal range) |

For comparison, GPT-OSS-20B scores:

- AIME 2024 (with tools): ~96.0%
- AIME 2025 (with tools): ~98.7%

## 📊 Benchmark Results


### 🧠 STEM & Puzzle (Multimodal)

| Benchmark | Manthan-M1 |
| --- | --- |
| We-Math | 83.5 |
| DynaMath | 84.9 |
| ZEROBench | 8 |
| ZEROBench_sub | 34.8 |
| BabyVision | 38.0 |

๐ŸŒ General VQA

| Benchmark | Score |
| --- | --- |
| RealWorldQA | 82.5 |
| MMStar | 81.2 |
| HallusionBench | 69.0 |
| MMBenchEN-DEV-v1.1 | 92.0 |
| SimpleVQA | 69.0 |

### 📄 OCR & Document Intelligence

| Benchmark | Score |
| --- | --- |
| OmniDocBench1.5 | 88.0 |
| CharXiv (RQ) | 75.0 |
| MMLongBench-Doc | 59.5 |
| CC-OCR | 80.0 |
| AI2D_TEST | 92.5 |
| OCRBench | 91.0 |

### 🧭 Spatial Intelligence

| Benchmark | Score |
| --- | --- |
| ERQA | 60.0 |
| CountBench | 95.0 |
| RefCOCO (avg) | 89.0 |
| ODInW13 | 45.0 |
| EmbSpatialBench | 82.0 |
| RefSpatialBench | 70.0 |
| LingoQA | 75.0 |
| Hypersim | 12.0 |
| SUNRGBD | 36.0 |
| Nuscene | 15.0 |

### 🎥 Video Understanding

| Benchmark | Score |
| --- | --- |
| VideoMME (w sub.) | 87.0 |
| VideoMME (w/o sub.) | 84.0 |
| VideoMMMU | 85.0 |
| MLVU (M-Avg) | 85.8 |
| MVBench | 76.0 |
| LVBench | 74.0 |
| MMVU | 78.5 |

## 🤖 Agent Benchmarks

### General Agent

| Benchmark | Score |
| --- | --- |
| BFCL-V4 | 71.0 |
| TAU2-Bench | 86.0 |
| VITA-Bench | 45.0 |
| DeepPlanning | 30.0 |
| Tool Decathlon | 35.0 |
| MCP-Mark | 45.0 |

### Search Agent

| Benchmark | Score |
| --- | --- |
| HLE w/ tool | 47.0 |
| BrowseComp | 66.5 |
| BrowseComp-zh | 68.0 |
| WideSearch | 73.0 |
| Seal-0 | 46.5 |

๐ŸŒ Multilingual Performance

| Benchmark | Score |
| --- | --- |
| MMMLU | 88.0 |
| MMLU-ProX | 84.0 |
| NOVA-63 | 55.5 |
| INCLUDE | 86.0 |
| Global PIQA | 90.0 |
| PolyMATH | 70.0 |
| WMT24++ | 79.0 |
| MAXIFE | 85.0 |

Strong support for:

- Hindi
- Tamil
- Telugu
- Bengali
- Marathi
- Gujarati
- Code-mixed Hinglish

## 💻 Coding & Tool Use

| Benchmark | Score |
| --- | --- |
| SWE-bench Verified | 78.0 |
| SWE-bench Multilingual | 71.0 |
| SecCodeBench | 65.0 |
| Terminal Bench 2 | 50.0 |

๐Ÿ“ Repository Layout


```
Manthan-M1/
├── config.json
├── model.safetensors.index.json
├── vlm/
├── llm/
├── vlm_processor/
└── llm_tokenizer/
```

- `vlm/` → Vision encoder weights (BF16)
- `llm/` → Reasoning LLM weights (MXFP4)
- `vlm_processor/` → Image processor + tokenizer
- `llm_tokenizer/` → LLM tokenizer

## 🛠 Usage

### Multimodal (Image + Text)

```python
import torch
from transformers import AutoProcessor, AutoTokenizer
from PIL import Image

from modeling_unified import ManthanM1  # custom model class shipped with this repo

# Load the unified two-stage model (BF16 vision encoder + MXFP4 reasoning LLM).
model = ManthanM1.from_pretrained(
    "/tmp/Manthan-M1",
    dtype=torch.bfloat16,
    device_map="auto",
)

# Each stage keeps its own preprocessing components (see Repository Layout).
vlm_processor = AutoProcessor.from_pretrained("/tmp/Manthan-M1/vlm_processor")
llm_tokenizer = AutoTokenizer.from_pretrained("/tmp/Manthan-M1/llm_tokenizer")

image = Image.open("test_image.jpg").convert("RGB")

response = model.generate(
    images=image,
    text_prompt="Solve the geometry problem shown in the image.",
    vlm_processor=vlm_processor,
    llm_tokenizer=llm_tokenizer,
    max_new_tokens=1024,
)

print(response)
```

### Text-Only

```python
# Text-only prompts skip the vision module entirely.
response = model.generate(
    text_prompt="Prove that the sum of the first n odd numbers is n^2.",
    llm_tokenizer=llm_tokenizer,
    max_new_tokens=1024,
)

print(response)
```
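Because the model is optimized for tool-augmented reasoning, a host application typically wraps `generate` in a loop: generate, detect a tool call in the output, execute the tool, append the result, and generate again. A minimal sketch of such a loop with a calculator tool; the `TOOL:calc(...)` tag format and the `run_with_tools` helper are hypothetical (the actual tool-calling format depends on the gpt-oss chat template), and the fake model below exists only to exercise the loop:

```python
import re

# Hypothetical tool-call convention: the model emits "TOOL:calc(<expr>)".
TOOL_PATTERN = re.compile(r"TOOL:calc\((.+?)\)")

def calc(expr: str) -> str:
    """A tiny calculator tool restricted to digits and + - * / ( )."""
    if not re.fullmatch(r"[\d+\-*/(). ]+", expr):
        raise ValueError(f"unsupported expression: {expr!r}")
    return str(eval(expr))  # safe-ish: input restricted by the regex above

def run_with_tools(generate, prompt: str, max_rounds: int = 4) -> str:
    """Generate; if the output requests a tool, run it and re-prompt."""
    transcript = prompt
    for _ in range(max_rounds):
        out = generate(transcript)
        match = TOOL_PATTERN.search(out)
        if match is None:
            return out  # final answer, no tool call
        result = calc(match.group(1))
        transcript += f"\n{out}\nRESULT: {result}\n"
    return out

# Fake "model" to exercise the loop: first requests a tool, then answers.
def fake_generate(text: str) -> str:
    return "The sum is 4950." if "RESULT:" in text else "TOOL:calc(99*100/2)"

answer = run_with_tools(fake_generate, "Sum of integers 1..99?")
```

In a real deployment, `fake_generate` would be replaced by a call into `model.generate(...)`, and the tool set would include whatever the host exposes (calculator, Python sandbox, search).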

## 🎓 Training Data (High-Level)

- 30 years of:
  - Indian competitive exams
  - Olympiad-style math
  - Engineering entrance problems
- Structured reasoning datasets
- Multilingual Indic corpora
- Diagram-heavy math datasets


## 🇮🇳 Philosophy

Manthan-M1 is built with a simple belief:

> India doesn't just need AI that chats. It needs AI that solves.

Structured reasoning. Exam-grade mathematics. Indic-native understanding. Tool-augmented intelligence.


## 📜 License

Apache 2.0


## 👤 Author

Built independently by an Indian developer focused on competitive reasoning systems.


*Manthan-M1: Churning Intelligence.*

