
Overview

Chitrapathak-2 (Chitra: Image; Pathak: Reader) is a VLM-based multilingual Optical Character Recognition (OCR) system designed specifically for the linguistic diversity and document complexity of the Indian ecosystem. Trained for high-fidelity OCR on Indic language book pages, Chitrapathak-2 demonstrates strong generalization across 10 major Indian languages and English. It is the second model in the Chitrapathak OCR series, continuing the effort to build robust OCR systems for Indic scripts and multilingual documents.

Model Summary

Architecture: Vision encoder + 3B decoder LLM
Languages: Hindi, Sanskrit, Bengali, Telugu, Tamil, Marathi, Kannada, Malayalam, Odia, Punjabi, and English
Use Cases: OCR for printed text in multilingual books, PDF documents, etc.
Frameworks: TRL 0.22.1, Transformers 4.56.0, PyTorch 2.6.0+cu124
Training Strategy: Supervised fine-tuning (SFT), mixed precision (FP16 / bfloat16), multi-node training, DeepSpeed ZeRO-2 optimization

Usage

Using transformers

from PIL import Image
from transformers import AutoTokenizer, AutoProcessor, AutoModelForImageTextToText

model_path = "krutrim-ai-labs/Chitrapathak-2"

model = AutoModelForImageTextToText.from_pretrained(
    model_path, 
    torch_dtype="auto", 
    device_map="auto", 
    attn_implementation="flash_attention_2"
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)


def perform_ocr(image_path, model, processor, max_new_tokens=4096):
    image = Image.open(image_path)
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image", "image": f"file://{image_path}"},
            {"type": "text", "text": "Perform OCR on this image and transcribe all visible text exactly as it appears."},
        ]},
    ]
    # Render the chat template, then tokenize the text and image together
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt")
    inputs = inputs.to(model.device)

    # Greedy decoding; strip the prompt tokens from each generated sequence
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    generated_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]

    output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    return output_text[0]

image_path = "/path/to/your/document.jpg"
result = perform_ocr(image_path, model, processor, max_new_tokens=15000)
print(result)
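The latency figures reported later in this card assume input images of roughly 1024x1024, so very large scans can be downscaled before OCR. A minimal sketch (the helper name and the 1024 px target are illustrative choices, not part of the model's API):

```python
from PIL import Image

def load_for_ocr(image_path, max_side=1024):
    """Open an image and shrink it so its longer side is at most
    `max_side`, preserving aspect ratio. Illustrative helper, not
    part of the model's API."""
    image = Image.open(image_path).convert("RGB")
    if max(image.size) > max_side:
        image.thumbnail((max_side, max_side), Image.LANCZOS)
    return image
```

The resized image can then be passed to the processor in place of the raw `Image.open(image_path)` call.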

Using vLLM

  1. Start the vLLM server:
vllm serve krutrim-ai-labs/Chitrapathak-2
  2. Predict with the model:
from openai import OpenAI
import base64

client = OpenAI(api_key="123", base_url="http://localhost:8000/v1")

model = "krutrim-ai-labs/Chitrapathak-2"

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def perform_ocr(img_base64):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{img_base64}"},
                    },
                    {
                        "type": "text",
                        "text": "Perform OCR on this image and transcribe all visible text exactly as it appears.",
                    },
                ],
            }
        ],
        temperature=0.0,
        max_tokens=15000
    )
    return response.choices[0].message.content

test_img_path = "/path/to/your/document.jpg"
img_base64 = encode_image(test_img_path)
print(perform_ocr(img_base64))
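For multi-page workloads, the same client can be looped over a directory of page images. A hedged sketch reusing the `client` and `model` defined above (`ocr_directory` is an illustrative helper, not part of the released tooling; the `image/png` data URL mirrors the example above):

```python
import base64
from pathlib import Path

def ocr_directory(client, model, directory, exts=(".jpg", ".jpeg", ".png")):
    """Run OCR over every image file in `directory` via an
    OpenAI-compatible client (e.g. the vLLM server above).
    Returns a {filename: transcription} mapping."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() not in exts:
            continue  # skip non-image files
        b64 = base64.b64encode(path.read_bytes()).decode("utf-8")
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": "Perform OCR on this image and transcribe all visible text exactly as it appears."},
            ]}],
            temperature=0.0,
            max_tokens=15000,
        )
        results[path.name] = response.choices[0].message.content
    return results
```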

Evaluation Results

Indic OCR Performance

Model Bn Word ↓ Bn Char ↓ Hi Word ↓ Hi Char ↓ Kn Word ↓ Kn Char ↓ Ml Word ↓ Ml Char ↓ Mr Word ↓ Mr Char ↓ Or Word ↓ Or Char ↓ Pa Word ↓ Pa Char ↓ Ta Word ↓ Ta Char ↓ Te Word ↓ Te Char ↓
Maya 99.42 95.77 99.7 94.91 - - - - - - - - - - - - - -
PALO 96.3 91.15 99.26 91.98 - - - - - - - - - - - - - -
Pangea 94.66 80.33 99.53 91.5 - - - - - - - - - - 99.44 84.13 99.95 89.91
Chitrarth-1 96.16 84.65 98.56 89.81 99.58 85.29 99.62 94.77 99.66 86.58 99.99 93.21 99.16 90.17 99.1 89.94 99.86 89.02
LLaMA-4 maverick 31.52 13.21 25.73 11.91 36.9 11.17 75.5 45.75 20.94 8.05 97.51 86.78 29.77 12.68 31.36 10.79 57.07 18.72
Gemma-3 27B 42.15 24.41 46.47 29.5 84.22 54.24 92.06 72.64 50.4 31.06 92.67 70.72 70.88 42.65 39.52 16.51 86.76 54.14
GPT-4o 55.51 32.68 54.62 35.54 94.33 69.79 94.67 78.47 63.44 37.93 94.61 73.46 68.88 40.71 74.35 43.39 95.97 70.08
Nanonets-OCR2-3B 28.56 12.42 32.26 16.78 99.38 93.07 97.24 89.81 40.97 15.92 99.82 97.11 98.70 82.84 95.25 78.83 99.42 89.39
Chitrapathak-1 17.14 7.03 25.55 13.74 26.24 8.78 71.97 48.19 15.68 6.09 50.72 31.62 17.7 7.87 19.25 5.81 38.79 11
Chitrapathak-2 14.51 5.47 19.87 8.36 18.8 4.81 64.47 34.7 9.82 2.27 44.74 21.83 15.24 7.06 17.66 5.68 31.81 6.69
Gemini-2.5 Flash 11.3 4.04 16.01 5.88 17.18 4.38 59.64 30.6 8.06 1.79 41.7 18.6 14.56 4.98 15.26 3.01 33.32 7.16
  • The above table shows the performance comparison of models on the IndicVisionBench-OCR benchmark.
  • The metrics used above are word-level ANLS and character-level ANLS.
  • The best value in each column is in bold and the second best is underlined.
  • Chitrapathak-2 delivers SOTA OCR accuracy in Telugu and is remarkably close to Gemini-2.5 in other Indic languages, with an average difference of only 2.21 (word) and 1.83 (char) across the nine Indic languages, and a maximum gap of just 4.83 (word) and 4.10 (char).
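The ↓ arrows indicate that these ANLS scores are error rates, so lower is better. One common way such scores are computed is as an edit distance normalized by the reference length; the sketch below illustrates the idea at both word and character level (the benchmark's exact scoring implementation may differ):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def normalized_error(prediction, reference, level="char"):
    """Edit distance normalized by reference length, in percent
    (0 = perfect transcription). `level` switches between character-
    and word-level scoring, mirroring the table's two columns."""
    if level == "word":
        prediction, reference = prediction.split(), reference.split()
    if not reference:
        return 0.0 if not prediction else 100.0
    return 100.0 * levenshtein(prediction, reference) / len(reference)
```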

English OCR Performance

Model Synthdog ANLS-Word ↓ Synthdog ANLS-Char ↓ SROIE %Match ↑
Gemma-3 27B 61.56 30.29 68.37
Llama-4 maverick 29.37 14.09 70.32
GPT-4o 82.22 73.65 36.09
Nanonets-OCR2-3B 23.9 10.8 72.33
Chitrapathak-2 24.9 20.2 68.95
Gemini-2.5 Flash 22.43 15.33 70.1
  • The above table shows the performance comparison of models on Synthdog and SROIE benchmarks.
  • The metrics for Synthdog are ANLS-word and ANLS-char. In these metrics, lower is better.
  • The metric for SROIE is %Match, indicating the percentage of fields that matched exactly with the ground truth. Here, higher is better.
  • As we see, Chitrapathak-2 retains much of the English OCR capabilities of its base model Nanonets-OCR2-3B.
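The SROIE %Match metric described above can be sketched as an exact comparison over extracted fields (the field names and dictionary structure here are illustrative; the benchmark's official scorer may differ):

```python
def exact_match_rate(predicted_fields, gold_fields):
    """Percentage of ground-truth fields whose predicted value matches
    exactly. Missing predictions count as mismatches."""
    if not gold_fields:
        return 0.0
    hits = sum(predicted_fields.get(k) == v for k, v in gold_fields.items())
    return 100.0 * hits / len(gold_fields)
```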

We also evaluated the model on the Old books OCR dataset, which consists of scanned book pages in English.

Model ANLS-Word ↓ ANLS-Char ↓
Nanonets-OCR2-3B 4.33 2.96
Chitrapathak-2 4.49 1.89
Gemini-2.5 Flash 4.36 1.86

Token Efficiency and Latency Breakdown of Chitrapathak-2

Metric bn hi kn ml mr or pa ta te en
Tokens / Word 5.9 4.8 11.2 12.6 6.5 11.7 6.9 9.4 13.2 1.4
Tokens (200 words) 1174.8 951.4 2242.2 2514.0 1292.4 2334.2 1387.2 1873.6 2646.6 280.0
Latency (200 words) 4.9s 4.0s 9.2s 10.3s 5.3s 9.5s 5.7s 7.7s 10.8s 1.3s

Note that the above latency values are calculated assuming the size of the input image to be ~1024x1024.

Observations

  • TTFT (Time-to-First-Token): ~125 ms
  • Inter-token latency: ~4 ms per token
  • Language impact: Latency varies with tokenization efficiency.
    • English and Hindi → Lower latency due to compact token-to-word ratios.
    • Telugu and Malayalam → Higher latency due to fragmented tokenization (larger number of tokens per word).
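The per-language latencies in the table are roughly consistent with a simple model: total latency ≈ TTFT + generated tokens × inter-token latency. A back-of-the-envelope sketch using the figures above (an estimate only, not a substitute for profiling on your own hardware):

```python
def estimate_latency(tokens_per_word, words, ttft_ms=125.0, per_token_ms=4.0):
    """Rough end-to-end generation latency in seconds, from the measured
    TTFT (~125 ms) and inter-token latency (~4 ms/token) above."""
    return (ttft_ms + tokens_per_word * words * per_token_ms) / 1000.0

# English, 200 words at 1.4 tokens/word: 0.125 + 280 * 0.004 ≈ 1.25 s
# Telugu, 200 words at 13.2 tokens/word: 0.125 + 2640 * 0.004 ≈ 10.7 s
```

Both estimates land close to the 1.3 s and 10.8 s reported in the table.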

Example OCR Outputs

Model outputs from Chitrapathak-2 for two sample book pages, one English and one Hindi (the input images are not reproduced in this text version):
CHAPTER XIV
MILT GODDARD returned from Pancake that night,
bringing letters for Taylor.

Sitting on the deacon's bench in the men's shanty John
opened them. One was from his father. The address was
typewritten, but within was a scant page of Luke's scrawl.
It had been years since the old man had touched pen to
paper for his son and that fact was thrilling!

"You are crazy to talk of that much pine. It can't be
done. Don't believe everything they tell you up there
just because you're a gullible cub. I'm sending Rowe to
Pancake Monday night just to see how big a fool you are.
Your mother is well. Yours, etc. L. Taylor."

John breathed deeply and smiled and scratched his
head and re-read the crabbed sentences. Beneath their
crustiness was genuine interest, a willingness, after Luke's
manner, to take him seriously at last, an indication that
the favors he had asked two months before and which had
drawn only a cruel trick now were his.

Yesterday he would have tried to calculate the profit
that might accrue to him from Luke Taylor's aid; tonight
he saw only in that note a promise that the burden on
Helen Foraker's shoulders would be lightened. She had
helped him, she had shaped him, she had taught him;
and now, perhaps, he could repay some of that obligation.
He could not know what waited just over the horizon
of time!

The other letter was in a smudged, scrawled envelope,
140
हिन्दू मत और मसीही मत ।
१ ईश्वर |
भूमिका |
इन व्याख्याओं में हमारा विशेष अभिप्राय यह है
कि हम हिन्दू मत और मसीही मत के मुख्य सिद्धान्तों
पर सोच विचार करके निर्णय करें कि वे कहां लो
समान हैं और कहां लो उन में भिन्नता पाई जाती है।
यह नहीं समझना चाहिये कि मसीही और हिन्दू
मत हर एक बात में विरोधी हैं और कभी यह नहीं
समझना चाहिये कि हिन्दू और मसीही आपस में शत्रु
हैं। मेरा आसरा है कि यह बात प्रगट होगी कि दोनों
मतों की मनसा और अभिप्राय एक है और दोनों में
कई एक सिद्धान्त हैं जो कुछ समान हैं तौभी बहुत सी
बातें हैं जिन में विरुद्धता और भिन्नता पाई जाती है।
हर एक प्रकार से हमारे लिये यह लाभदायक बात
होगी कि हम किसी प्रकार की समानता पाके आनन्दित
होवें और भिन्नता देखके निरूपण करें कि कौन २
सिद्धान्त यथार्थ और उत्तम और स्वीकार करने के
योग्य हैं।
कभी न भूलना चाहिये कि मसीहियों के लिये यह
बात काफी नहीं है कि वे इस बात को स्थापित करें
कि अमुक २ सिद्धान्त बैबल में हैं क्योंकि हिन्दू नह
मानते हैं कि बैबल प्रामाणिक और ईश्वरीय पुस्तक है

* The example images shown above were obtained from publicly available internet sources and are used only for demonstration of OCR outputs.


Highlights

  • Supports 9 major Indic languages + English
  • Optimized for printed book pages, PDF documents, and scanned text
  • Strong layout robustness — handles multi-column, multi-font, and dense paragraphs with ease
  • Second-best overall model on IndicVisionBench-OCR, ranking just behind Gemini-2.5, and the best-performing model for Telugu
  • Compatible with vLLM and Hugging Face inference pipelines
  • Multilingual generalization across diverse font styles, layouts, and page qualities

Limitations

  • The model performs OCR only; it is not intended for document-intelligence use cases.
  • Given an image, the model returns only the OCR transcription in its default output format.
  • Performance may drop on handwritten, noisy, or low-resolution images.
  • Some degradation may occur on rare Indic/English scripts or non-book domains (e.g., forms).
  • Performance degrades on index pages and other complex page layouts.

License

This model is distributed under the Krutrim Community License Agreement v1.0.
Ensure compliance before any commercial or redistributed usage.

Update: This model is released for research purposes only, since the underlying Qwen-2.5-VL model has moved from the Apache license to a more restrictive one.

Citation

If you use Chitrapathak-2 in your research, please cite:

@misc{faraz2026indicocr,
      title={Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems}, 
      author={Ali Faraz and Raja Kolla and Ashish Kulkarni and Shubham Agarwal},
      year={2026},
      eprint={2602.16430},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.16430}, 
}

Acknowledgements

Chitrapathak-2 builds upon the foundations of the following projects and open-source efforts:

Model size: 4B params · Tensor type: BF16 (Safetensors)
