---
license: apache-2.0
base_model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
tags:
- vision
- ocr
- document-understanding
- qwen2.5-vl
- lora
- latex
- handwriting
- invoice
---

# CernisOCR

A vision-language OCR model fine-tuned from Qwen2.5-VL-7B-Instruct that handles mathematical formulas, handwritten text, and structured documents in a single model.

## Model Description

CernisOCR is a vision-language model optimized for OCR across multiple document domains. Unlike domain-specific OCR models, it unifies three traditionally separate tasks in one model:

- **Mathematical LaTeX conversion**: Converts handwritten or printed mathematical formulas to LaTeX notation
- **Handwritten text transcription**: Transcribes cursive and printed handwriting
- **Structured document extraction**: Extracts structured data from invoices and receipts

**Key Features:**
- Multi-domain capability in a single model
- Handles varied image types, layouts, and text styles
- Extracts both raw text and structured information
- Robust to noise and variable image quality

## Training Details

- **Base Model**: Qwen2.5-VL-7B-Instruct
- **Training Data**: approximately 10,000 samples from three domains:
  - LaTeX OCR: 3,978 samples (mathematical notation)
  - Invoices & Receipts: 2,043 samples (structured documents)
  - Handwritten Text: 3,978 samples (handwriting transcription)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Loss**: Reduced from 4.802 to 0.116 (a 97.6% reduction)
- **Training Time**: ~8.7 minutes on an RTX 5090
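
The figures in this section can be sanity-checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative and not part of any training script.

```python
# Per-domain sample counts as listed in Training Details
samples = {
    "latex_ocr": 3978,
    "invoices_receipts": 2043,
    "handwritten_text": 3978,
}
total = sum(samples.values())

# Share of each domain in the training mix, in percent
shares = {name: round(100 * n / total, 1) for name, n in samples.items()}

# Relative reduction in training loss, in percent
initial_loss, final_loss = 4.802, 0.116
loss_reduction_pct = round(100 * (initial_loss - final_loss) / initial_loss, 1)

print(total, shares, loss_reduction_pct)
```

Note that the per-domain counts sum to 9,999, one short of the rounded 10,000 headline figure.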

## Intended Use

This model is designed for:
- Mathematical formula recognition and LaTeX conversion
- Handwritten text transcription
- Invoice and receipt data extraction
- Multi-domain document processing workflows
- Applications requiring unified OCR across different document types
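
For multi-domain workflows, one possible approach is to route each image to a task-specific prompt. The sketch below assumes the three prompts used elsewhere in this model card; the task keys (`latex`, `handwriting`, `invoice`) and the `build_messages` helper are hypothetical names introduced for illustration, not part of the model or of Unsloth.

```python
# Prompts taken from this model card; the dictionary keys are hypothetical labels.
TASK_PROMPTS = {
    "latex": "Write the LaTeX representation for this image.",
    "handwriting": "Transcribe the handwritten text in this image.",
    "invoice": "Extract and structure all text content from this invoice/receipt image.",
}


def build_messages(task: str) -> list[dict]:
    """Return a single-turn chat message list for the given task.

    The image itself is not embedded here; only an image placeholder is
    added, and the actual image is supplied when the inputs are processed.
    """
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}")
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": TASK_PROMPTS[task]},
        ],
    }]


messages = build_messages("invoice")
```

Keeping the prompts in one place makes it easier to stay close to the wording shown in this card, which is presumably what the model saw during fine-tuning.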

## How to Use

```python
from unsloth import FastVisionModel
from PIL import Image

# Load the model and its processor (Unsloth returns the processor as "tokenizer")
model, tokenizer = FastVisionModel.from_pretrained(
    "coolAI/cernis-ocr",  # or "coolAI/cernis-vision-ocr" for merged model
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# One prompt per task; pick the one that matches your image
PROMPTS = {
    # Example 1: LaTeX conversion
    "latex": "Write the LaTeX representation for this image.",
    # Example 2: Handwritten transcription
    "handwriting": "Transcribe the handwritten text in this image.",
    # Example 3: Invoice extraction
    "invoice": "Extract and structure all text content from this invoice/receipt image.",
}

image = Image.open("formula.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": PROMPTS["latex"]},
    ],
}]

# Build the prompt text, then let the processor combine it with the image
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Generate; temperature only takes effect when do_sample=True
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
text = tokenizer.decode(generated, skip_special_tokens=True)
print(text)
```

## Citation

If you use this model, please cite:

```bibtex
@misc{cernis-ocr,
  title={CernisOCR: A Unified Multi-Domain OCR Model},
  author={Cernis AI},
  year={2025},
  howpublished={\url{https://huggingface.co/coolAI/cernis-ocr}}
}
```

## Acknowledgments

Built using [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning. Training data was sourced from publicly available OCR datasets on Hugging Face.