---
license: apache-2.0
---
# CrossLing-OCR-Mini
🚀 **CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages**.
---
## 1. Model Overview
Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
### Key Features
- Multilingual OCR with structure-aware text recognition
- Specialized optimization for low-resource and complex scripts
- Lightweight (~580MB) and efficient inference
### Supported Languages
- **High-resource languages**: Chinese, English
- **Low-resource languages (specially optimized)**:
**Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang, etc.**
Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
---
## 2. Usage / Inference
CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.
The following example demonstrates **single-image OCR inference** for plain text recognition.
### Requirements
- Python ≥ 3.8
- `transformers` (latest version recommended)
- CUDA-enabled GPU (recommended for optimal performance)
```bash
pip install -U transformers accelerate
```
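
Before loading the model, it can help to confirm that the requirements listed above are met. The following is a minimal sketch using mostly the standard library; `check_environment` is a hypothetical helper name, not part of this repository, and it assumes that PyTorch (`torch`) is the backend used for GPU inference.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the listed requirements appear to be satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "transformers": importlib.util.find_spec("transformers") is not None,
        "accelerate": importlib.util.find_spec("accelerate") is not None,
        "cuda": False,
    }
    # Only import torch when it is installed (assumed backend for GPU inference).
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            report["cuda"] = bool(torch.cuda.is_available())
        except (ImportError, OSError):
            # A broken torch install is treated as "no CUDA available".
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```
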
### Simple OCR Inference Example
```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id
)
model = model.eval().cuda()

# Input image
image_file = "test.png"

# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr"
)

print("Predicted OCR result:\n")
print(result)
```
### Notes
* `ocr_type="ocr"` enables plain text OCR mode
* The model automatically handles multilingual text recognition
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
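
To run the same plain text OCR call over a folder of images, one possible approach is a small wrapper that collects results and failures separately. This is only a sketch: `ocr_folder`, `IMAGE_SUFFIXES`, and the `"scans"` folder name are hypothetical, and the usage comment assumes `model` and `tokenizer` were loaded as in the inference example above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def ocr_folder(folder, ocr_fn):
    """Run ocr_fn on every image in folder and return (results, errors)."""
    results, errors = {}, {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        try:
            results[path.name] = ocr_fn(str(path))
        except Exception as exc:
            # Keep going so that one unreadable image does not stop the batch.
            errors[path.name] = repr(exc)
    return results, errors


# Usage with the model and tokenizer from the example above:
# results, errors = ocr_folder(
#     "scans", lambda p: model.chat(tokenizer, p, ocr_type="ocr")
# )
```
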
---
## 3. Performance Notes & Limitations
While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
These challenges will be addressed in future versions of the model.
---
## 4. Model Variants
| Version | Intended Use | Availability |
| ----------------------------- | --------------------------- | ------------------- |
| **CrossLing-OCR-Mini** | Research and academic purposes only | ✅ Open-sourced |
| **CrossLing-OCR-Pro-Preview** | Commercial / production purposes | 🔒 Contact required |
The performance differences between the Mini and Pro-Preview versions are illustrated below.

---
## 5. Prohibited Use & Disclaimer
This model **must not be used** for:
* Any illegal or unlawful activities
* Applications that violate applicable laws or regulations
* Surveillance or profiling that infringes on individual rights
* Discriminatory or harmful automated decision-making in sensitive contexts
**Disclaimer**:
* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**
---
## 6. Ethical Considerations & Bias
CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
However, like all OCR systems, the model may reflect biases present in its training data, including:
* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout variations
* Reduced robustness on degraded, historical, or low-resolution documents
Users are encouraged to:
* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios
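
One common way to evaluate OCR outputs per language is the character error rate (CER): the edit distance between a reference transcription and the model output, divided by the reference length. The sketch below is a generic pure-Python implementation, not the evaluation protocol used by the authors; the function names are illustrative, and applying NFC normalization before comparison is an assumption that may not suit every script.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate of hypothesis against a non-empty reference."""
    # Assumption: NFC normalization so equivalent encodings compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters
```
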
---
## 7. License
This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.
For commercial licensing or extended usage, please contact the authors.
---
## 8. Contact
For questions, collaboration, or commercial inquiries:
📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**