---
license: apache-2.0
---

# CrossLing-OCR-Mini

**CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource languages and complex document layouts** in multilingual settings.
The model emphasizes accurate text recognition while preserving the original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.

---

## 1. Model Overview

CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
Despite its compact size (~580 MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.

### Key Features

- Multilingual OCR with structure-aware text recognition
- Specialized optimization for low-resource and complex scripts
- Lightweight (~580 MB) with efficient inference
- Designed exclusively for research and academic benchmarking

### Supported Languages

- **High-resource languages**: Chinese, English
- **Low-resource languages (specially optimized)**:
  **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**

Experimental results indicate that CrossLing-OCR-Mini **matches or outperforms mainstream OCR systems** on multiple low-resource languages.

---

## 2. Usage / Inference

CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.
The following example demonstrates **single-image OCR inference** for plain text recognition.

### Requirements

- Python ≥ 3.8
- `transformers` (latest version recommended)
- `accelerate` (needed when passing `device_map`)
- CUDA-enabled GPU (recommended for optimal performance)

```bash
pip install -U transformers accelerate
```

### Simple OCR Inference Example

```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"
# Load tokenizer and model (trust_remote_code runs the custom code shipped with the model repo)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id,
)
# Switch to inference mode on the GPU
model = model.eval().cuda()
# Input image
image_file = "test.png"
# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr",
)
print("Predicted OCR result:\n")
print(result)
```

### Notes

* `ocr_type="ocr"` enables plain text OCR mode
* The model automatically handles multilingual text recognition
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference

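### Batch Inference Sketch

The single-image call can be wrapped in a loop to process several files. The sketch below is a minimal illustration, not part of the model's API: `ocr_images` is a hypothetical helper name, and it assumes that `model` and `tokenizer` were loaded as in the example above and that `model.chat` accepts one image path per call with `ocr_type="ocr"`, exactly as shown there. Calling `ocr_images(model, tokenizer, ["page1.png", "page2.png"])` would return a dictionary that maps each path to its recognized text.

```python
from pathlib import Path
from typing import Dict, Iterable, Union


def ocr_images(model, tokenizer, image_paths: Iterable[Union[str, Path]]) -> Dict[str, str]:
    """Run plain text OCR on each image and map its path to the recognized text."""
    results: Dict[str, str] = {}
    for path in image_paths:
        # Same call as the single-image example: plain text mode via ocr_type="ocr"
        results[str(path)] = model.chat(tokenizer, str(path), ocr_type="ocr")
    return results
```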
---

## 3. Performance Notes & Limitations

While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

These challenges will be addressed in future versions of the model.

---

## 4. Model Variants

| Version | Intended Use | Availability |
| ----------------------------- | --------------------------- | ------------------- |
| **CrossLing-OCR-Mini** | Research & academic use | ✅ Open-sourced |
| **CrossLing-OCR-Pro-Preview** | Commercial / production use | Contact required |

📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
**[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**

The performance differences between the Mini and Pro-Preview versions are illustrated below.



---

## 5. Intended Use

This model is **strictly intended for**:

* Academic research
* Scientific experimentation
* OCR benchmarking and method comparison
* Low-resource language OCR studies

---

## 6. Prohibited Use & Disclaimer

This model **must not be used** for:

* Any illegal activity
* Applications violating social ethics, public order, or applicable laws
* Surveillance, discrimination, or harmful automated decision-making

**Disclaimer**:

* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**

---

## 7. Ethical Considerations & Bias

CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
However, like all OCR systems, the model may reflect biases present in its training data, including:

* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout styles

Users are encouraged to:

* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios

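### Per-Language Evaluation Sketch

One way to check for uneven performance across languages is to compute the character error rate (CER) separately for each language on your own labeled samples. The sketch below is illustrative only and is not the evaluation protocol used by the authors: `edit_distance` and `cer_by_language` are hypothetical helper names, the input is assumed to be a list of `(language, reference, prediction)` triples that you supply, and distances are counted in Unicode code points, which may overstate errors for scripts with stacked or combining characters. For example, `cer_by_language([("en", "hello", "hallo")])` gives a CER of 0.2 for `"en"` (one substitution over five reference characters).

```python
from typing import Dict, List, Tuple


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cer_by_language(samples: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Aggregate CER per language from (language, reference, prediction) triples."""
    errors: Dict[str, int] = {}
    lengths: Dict[str, int] = {}
    for language, reference, prediction in samples:
        errors[language] = errors.get(language, 0) + edit_distance(reference, prediction)
        lengths[language] = lengths.get(language, 0) + len(reference)
    # Languages whose references are all empty are skipped to avoid division by zero
    return {lang: errors[lang] / lengths[lang] for lang in errors if lengths[lang] > 0}
```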
---

## 8. License

This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.

For commercial licensing or extended usage, please contact the authors.

---

## 9. Citation

If you use CrossLing-OCR-Mini in your research, please cite:

```bibtex
@misc{crossling-ocr-mini,
  title  = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
  author = {CrossLing Team},
  year   = {2025},
  note   = {Research-only OCR model}
}
```

---

## 10. Contact

For questions, collaboration, or commercial inquiries:

📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**

---

## 11. Acknowledgement

This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.