---
license: apache-2.0
---

# CrossLing-OCR-Mini

🚀 **CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages**.

---

## 1. Model Overview

Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.

### Key Features
- Multilingual OCR with structure-aware text recognition  
- Specialized optimization for low-resource and complex scripts  
- Lightweight (~580MB) and efficient inference  

### Supported Languages
- **High-resource languages**: Chinese, English  
- **Low-resource languages (specially optimized)**:  
  **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**, and others

Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.

---

## 2. Usage / Inference

CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.  
The following example demonstrates **single-image OCR inference** for plain text recognition.

### Requirements
- Python ≥ 3.8  
- `transformers` and `accelerate` (latest versions recommended)  
- CUDA-enabled GPU (recommended for optimal performance)

```bash
pip install -U transformers accelerate
```

### Simple OCR Inference Example

```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"

# Load tokenizer and model.
# trust_remote_code=True runs the custom model code shipped in the repository.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id
)

# Switch to inference mode and make sure the model is on the GPU
model = model.eval().cuda()

# Path to the input image
image_file = "test.png"

# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr"
)

print("Predicted OCR result:\n")
print(result)
```

### Notes

* `ocr_type="ocr"` enables plain text OCR mode
* The model automatically handles multilingual text recognition
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
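
The single-image call above can be extended to a folder of images. The following is a minimal sketch, not part of the released code: `list_images`, `ocr_folder`, `run_ocr`, and `IMAGE_SUFFIXES` are hypothetical helper names, and the list of image extensions is an assumption. The model call is passed in as a plain callable, so the helper itself does not depend on the model being loaded.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def list_images(folder: str, suffixes: Iterable[str] = IMAGE_SUFFIXES) -> List[Path]:
    """Return image files in `folder` (non-recursive), sorted by path."""
    allowed = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in allowed
    )


def ocr_folder(run_ocr: Callable[[str], str], folder: str) -> Dict[str, str]:
    """Apply `run_ocr` to every image in `folder`; map file name -> recognized text."""
    return {path.name: run_ocr(str(path)) for path in list_images(folder)}


# Hypothetical usage with the model and tokenizer loaded as in the example above:
# results = ocr_folder(lambda p: model.chat(tokenizer, p, ocr_type="ocr"), "scans/")
```

Images are processed one at a time, which keeps memory use predictable on consumer-grade GPUs.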

---

## 3. Performance Notes & Limitations

While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

These challenges will be addressed in future versions of the model.

---

## 4. Model Variants

| Version                       | Intended Use                        | Availability        |
| ----------------------------- | ----------------------------------- | ------------------- |
| **CrossLing-OCR-Mini**        | Research and academic purposes only | ✅ Open-sourced      |
| **CrossLing-OCR-Pro-Preview** | Commercial / production purposes    | 🔒 Contact required |



The performance differences between the Mini and Pro-Preview versions are illustrated below.

![Performance comparison between CrossLing-OCR-Mini and CrossLing-OCR-Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)

---


## 5. Prohibited Use & Disclaimer

This model **must not be used** for:

* Any illegal or unlawful activities
* Applications that violate applicable laws or regulations
* Surveillance or profiling that infringes on individual rights
* Discriminatory or harmful automated decision-making in sensitive contexts

**Disclaimer**:

* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**

---

## 6. Ethical Considerations & Bias

CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
However, like all OCR systems, the model may reflect biases present in its training data, including:

* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout variations
* Reduced robustness on degraded, historical, or low-resolution documents

Users are encouraged to:

* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios
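
One way to evaluate outputs and check for uneven performance across languages is to compute the character error rate (CER) per language on a small set of manually transcribed samples. The sketch below is illustrative and is not the evaluation protocol used by the authors: `edit_distance`, `cer`, and `cer_by_language` are hypothetical helpers, and the choice to compare NFC-normalized Unicode code points (rather than grapheme clusters) is an assumption that may matter for scripts with combining marks such as Tibetan.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer_by_language(samples: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Pooled CER per language from (language, reference, prediction) triples."""
    edits: Dict[str, int] = defaultdict(int)
    chars: Dict[str, int] = defaultdict(int)
    for lang, ref, hyp in samples:
        ref = unicodedata.normalize("NFC", ref)
        hyp = unicodedata.normalize("NFC", hyp)
        edits[lang] += edit_distance(ref, hyp)
        chars[lang] += len(ref)
    return {lang: edits[lang] / chars[lang] for lang in edits if chars[lang] > 0}
```

A large gap in CER between languages on comparable documents is a signal to review outputs for the weaker language more closely before downstream use.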

---

## 7. License

This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.

For commercial licensing or extended usage, please contact the authors.

---


## 8. Contact

For questions, collaboration, or commercial inquiries:

📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**


