@@ -1,21 +1,117 @@
 ---
-base_model: unsloth/qwen2.5-vl-7b-instruct-bnb-4bit
-tags:
-- text-generation-inference
-- transformers
-- unsloth
-- qwen2_5_vl
 license: apache-2.0
-language:
-- en
 ---
-# Uploaded finetuned  model
-- **Developed by:** coolAI
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/qwen2.5-vl-7b-instruct-bnb-4bit
-This qwen2_5_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
 license: apache-2.0
+base_model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
+tags:
+- vision
+- ocr
+- document-understanding
+- qwen2.5-vl
+- lora
+- latex
+- handwriting
+- invoice
 ---
+# CernisOCR
+A vision-language OCR model fine-tuned from Qwen2.5-VL-7B-Instruct that handles mathematical formulas, handwritten text, and structured documents in a single model.
+## Model Description
+CernisOCR is a vision-language model optimized for OCR across multiple document domains. Unlike domain-specific OCR models, it combines three traditionally separate tasks in a single model:
+- **Mathematical LaTeX conversion**: Converts handwritten or printed mathematical formulas to LaTeX notation
+- **Handwritten text transcription**: Transcribes cursive and printed handwriting
+- **Structured document extraction**: Extracts structured data from invoices and receipts
+**Key Features:**
+- Multi-domain capability in a single model
+- Handles varied image types, layouts, and text styles
+- Extracts both raw text and structured information
+- Robust to noise and variable image quality
+## Training Details
+- **Base Model**: Qwen2.5-VL-7B-Instruct
+- **Training Data**: approximately 10,000 samples (9,999 as itemized below) from three domains:
+  - LaTeX OCR: 3,978 samples (mathematical notation)
+  - Invoices & Receipts: 2,043 samples (structured documents)
+  - Handwritten Text: 3,978 samples (handwriting transcription)
+- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+- **Training Loss**: Reduced from 4.802 to 0.116 (a 97.6% reduction)
+- **Training Time**: ~8.7 minutes on an RTX 5090
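The figures in the list above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch: the numbers are copied from the list, and the variable names (`samples`, `shares`, `loss_reduction_pct`) are illustrative, not part of the model's training code. It shows that the per-domain counts sum to 9,999 (one short of the rounded 10,000) and that the quoted 97.6% is the relative drop in training loss.

```python
# Figures copied from the Training Details list; names are illustrative.
samples = {
    "LaTeX OCR": 3978,
    "Invoices & Receipts": 2043,
    "Handwritten Text": 3978,
}
loss_start, loss_end = 4.802, 0.116

# Total number of training samples across the three domains.
total = sum(samples.values())

# Share of each domain in the training mix, in percent (one decimal place).
shares = {name: round(100 * n / total, 1) for name, n in samples.items()}

# Relative reduction in training loss, in percent (one decimal place).
loss_reduction_pct = round(100 * (loss_start - loss_end) / loss_start, 1)

print(total)
print(shares)
print(loss_reduction_pct)
```

The mix is roughly 40% LaTeX, 40% handwriting, and 20% invoices and receipts, so the structured-document domain is the least represented of the three.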
+## Intended Use
+This model is designed for:
+- Mathematical formula recognition and LaTeX conversion
+- Handwritten text transcription
+- Invoice and receipt data extraction
+- Multi-domain document processing workflows
+- Applications requiring unified OCR across different document types
+## How to Use
+```python
+from unsloth import FastVisionModel
+from PIL import Image
+# Load the model and tokenizer (for vision models, Unsloth returns a processor that also handles images)
+model, tokenizer = FastVisionModel.from_pretrained(
+    "coolAI/cernis-ocr",  # or "coolAI/cernis-vision-ocr" for merged model
+    load_in_4bit=True,
+)
+FastVisionModel.for_inference(model)
+# The three examples below each assign `messages`; keep only the one you need.
+# Example 1: LaTeX conversion
+image = Image.open("formula.png")
+messages = [{
+    "role": "user",
+    "content": [
+        {"type": "image", "image": image},
+        {"type": "text", "text": "Write the LaTeX representation for this image."}
+    ]
+}]
+# Example 2: Handwritten transcription
+messages = [{
+    "role": "user",
+    "content": [
+        {"type": "image", "image": image},
+        {"type": "text", "text": "Transcribe the handwritten text in this image."}
+    ]
+}]
+# Example 3: Invoice extraction
+messages = [{
+    "role": "user",
+    "content": [
+        {"type": "image", "image": image},
+        {"type": "text", "text": "Extract and structure all text content from this invoice/receipt image."}
+    ]
+}]
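+# Generate (uses whichever `messages` was assigned last)
+input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
+inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
+outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7)
+# Decode only the newly generated tokens, skipping the echoed prompt
+prompt_len = inputs["input_ids"].shape[1]
+text = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)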
+```
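Because the three examples above differ only in their prompt text, one way to avoid reassigning `messages` is a small lookup helper. The sketch below is a suggestion, not part of the model's API: `PROMPTS`, `build_messages`, and the task keys (`"latex"`, `"handwriting"`, `"invoice"`) are hypothetical names, while the prompt strings are copied verbatim from the examples.

```python
from typing import Any

# Prompt strings copied from the usage examples; the task keys are hypothetical.
PROMPTS = {
    "latex": "Write the LaTeX representation for this image.",
    "handwriting": "Transcribe the handwritten text in this image.",
    "invoice": "Extract and structure all text content from this invoice/receipt image.",
}


def build_messages(task: str, image: Any) -> list[dict]:
    """Build a single-turn chat message list for one of the three OCR tasks."""
    if task not in PROMPTS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": PROMPTS[task]},
        ],
    }]
```

With this in place, `messages = build_messages("invoice", image)` would take the place of the hand-written dictionaries, and the generation step stays the same.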
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{cernis-ocr,
+  title={CernisOCR: A Unified Multi-Domain OCR Model},
+  author={Cernis AI},
+  year={2025},
+  howpublished={\url{https://huggingface.co/coolAI/cernis-ocr}}
+}
+```
+## Acknowledgments
+Built using [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning. Training data sourced from publicly available OCR datasets on Hugging Face.