---
pipeline_tag: image-to-text
library_name: transformers
license: other
tags:
  - ocr
  - multilingual
language:
  - en
  - zh
  - bo
  - mn
  - kk
  - ky
  - za
---

# CrossLing-OCR-Mini

🚀 **CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages and complex document layouts**.  
The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.

---

## 1. Model Overview

CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.  
Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.

### Key Features
- Multilingual OCR with structure-aware text recognition  
- Specialized optimization for low-resource and complex scripts  
- Lightweight (~580MB) and efficient inference  
- Designed exclusively for research and academic benchmarking  

### Supported Languages
- **High-resource languages**: Chinese, English  
- **Low-resource languages (specially optimized)**:  
  **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**

Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.

---

## 2. Usage / Inference

CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.  
The following example runs **single-image OCR inference** in plain text recognition mode.

### Requirements
- Python ≥ 3.8  
- PyTorch (`torch`)  
- `transformers` (latest version recommended)  
- CUDA-enabled GPU (recommended for optimal performance; the example below assumes one)

```bash
pip install -U transformers accelerate
```

### Simple OCR Inference Example

```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"

# Load the tokenizer and model; trust_remote_code=True runs the custom
# modeling code shipped in the model repository.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",  # places the weights on the GPU at load time
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id,
)
# device_map="cuda" already placed the model on the GPU,
# so only switch to evaluation mode here.
model = model.eval()

# Input image
image_file = "test.png"

# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr",
)
print("Predicted OCR result:\n")
print(result)
```

### Notes

* `ocr_type="ocr"` enables plain text OCR mode
* The model automatically handles multilingual text recognition
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
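
To run the same plain text OCR call over many images, a small wrapper can help. The following is a minimal sketch, not part of the model's API: `ocr_directory`, `IMAGE_SUFFIXES`, and the `ocr_fn` callback are hypothetical names introduced here, and the suffix list is an assumption about which files count as images.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed set of image file extensions; adjust to your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def ocr_directory(image_dir: str, ocr_fn: Callable[[str], str]) -> Dict[str, str]:
    """Apply `ocr_fn` to every image file in `image_dir`.

    `ocr_fn` takes an image path and returns the recognized text.
    Returns a dict mapping file name to OCR result, in sorted path order.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(image_dir).iterdir()):
        # Skip subdirectories and files that do not look like images.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr_fn(str(path))
    return results
```

With the model and tokenizer loaded as in the inference example, this could be called as `ocr_directory("scans", lambda p: model.chat(tokenizer, p, ocr_type="ocr"))`, where `scans` is a placeholder directory name.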

---

## 3. Performance Notes & Limitations

While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

These challenges will be addressed in future versions of the model.

---

## 4. Model Variants

| Version                       | Intended Use                | Availability        |
| ----------------------------- | --------------------------- | ------------------- |
| **CrossLing-OCR-Mini**        | Research & academic use     | ✅ Open-sourced      |
| **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |

📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
**[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**

The performance differences between the Mini and Pro-Preview versions are illustrated below.

![Mini_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)

---

## 5. Intended Use

This model is **strictly intended for**:

* Academic research
* Scientific experimentation
* OCR benchmarking and method comparison
* Low-resource language OCR studies
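
For benchmarking and method comparison, OCR output is commonly scored with character error rate (CER): the Levenshtein edit distance between the prediction and the reference, divided by the reference length. The sketch below is one plain-Python way to compute it. The function names and the NFC normalization step are assumptions made here, not the evaluation protocol used by the authors.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` against `reference`."""
    # Assumption: compare NFC-normalized strings so that equivalent
    # Unicode encodings of the same text are not counted as errors.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because CER is normalized by reference length, it can exceed 1.0 when the prediction is much longer than the reference.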

---

## 6. Prohibited Use & Disclaimer

This model **must not be used** for:

* Any illegal or unlawful activities
* Applications violating social ethics, public order, or applicable laws
* Surveillance, discrimination, or harmful automated decision-making

**Disclaimer**:

* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**

---

## 7. Ethical Considerations & Bias

CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
However, like all OCR systems, the model may reflect biases present in its training data, including:

* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout styles

Users are encouraged to:

* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios
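
One inexpensive way to screen outputs before downstream use is to check which scripts the recognized text actually contains. The sketch below is an illustration under stated assumptions, not a feature of the model: `script_histogram` is a hypothetical helper that uses the first word of each letter's Unicode character name as a rough script label.

```python
import unicodedata
from collections import Counter


def script_histogram(text: str) -> Counter:
    """Count the letters in `text` by a rough script label.

    The label is the first word of the Unicode character name,
    for example "LATIN", "CJK", or "TIBETAN". Digits, punctuation,
    combining marks, and whitespace are skipped.
    """
    counts: Counter = Counter()
    for ch in text:
        # Keep only characters in a Unicode letter category (Lu, Ll, Lo, ...).
        if not unicodedata.category(ch).startswith("L"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


hist = script_histogram("OCR 文字 བོད")
print(hist)
```

A page expected to be Tibetan whose histogram is dominated by another label is a candidate for manual review. Languages that share a script receive the same label, so this check cannot tell them apart.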

---

## 8. License

This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.

For commercial licensing or extended usage, please contact the authors.

---

## 9. Citation

If you use CrossLing-OCR-Mini in your research, please cite:

```bibtex
@misc{crossling-ocr-mini,
  title     = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
  author    = {CrossLing Team},
  year      = {2025},
  note      = {Research-only OCR model}
}
```

---

## 10. Contact

For questions, collaboration, or commercial inquiries:

📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**

---

## 11. Acknowledgement

This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
