---
license: apache-2.0
---

# CrossLing-OCR-Mini

**CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages**.

---

## 1. Model Overview

Despite its compact size (~580 MB), the model achieves strong recognition performance across **11 languages** and remains deployable on **consumer-grade GPUs**.

### Key Features

- Multilingual OCR with structure-aware text recognition
- Specialized optimization for low-resource and complex scripts
- Lightweight (~580 MB) with efficient inference

### Supported Languages

- **High-resource languages**: Chinese, English
- **Low-resource languages (specially optimized)**: Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang, and others

Experimental results indicate that CrossLing-OCR-Mini **matches or outperforms mainstream OCR systems** on multiple low-resource languages.

---

## 2. Usage / Inference

CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.
The following example demonstrates **single-image OCR inference** for plain text recognition.

### Requirements

- Python ≥ 3.8
- `transformers` (latest version recommended)
- CUDA-enabled GPU (recommended for optimal performance)

```bash
pip install -U transformers accelerate
```

### Simple OCR Inference Example

```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id
)
model = model.eval().cuda()

# Input image
image_file = "test.png"

# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr"
)
print("Predicted OCR result:\n")
print(result)
```

### Notes

* `ocr_type="ocr"` selects plain-text OCR mode
* The model handles multilingual text automatically; no language needs to be specified
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
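
The single-image call above extends naturally to a folder of scans. The following is a minimal sketch, not part of the model's API: `ocr_folder`, `ocr_fn`, and `IMAGE_SUFFIXES` are hypothetical names introduced here for illustration. The helper takes the OCR call as a plain function, so it can be tried with a stub before the model is loaded, and the commented line at the end shows how it might be wired to `model.chat` from the example above.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical default; adjust to the image formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def ocr_folder(
    folder: str,
    ocr_fn: Callable[[str], str],
    suffixes: Iterable[str] = IMAGE_SUFFIXES,
) -> Dict[str, str]:
    """Run `ocr_fn` on every image file in `folder`.

    Returns a dict mapping file name -> recognized text, in sorted path order.
    """
    allowed = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and files that are not images.
        if path.is_file() and path.suffix.lower() in allowed:
            results[path.name] = ocr_fn(str(path))
    return results


# Assumed wiring to the model loaded in the example above (not executed here):
# texts = ocr_folder("scans", lambda p: model.chat(tokenizer, p, ocr_type="ocr"))
```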

---

## 3. Performance Notes & Limitations

While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

These challenges will be addressed in future versions of the model.

---

## 4. Model Variants

| Version                       | Intended Use                        | Availability     |
| ----------------------------- | ----------------------------------- | ---------------- |
| **CrossLing-OCR-Mini**        | Research and academic purposes only | ✅ Open-sourced   |
| **CrossLing-OCR-Pro-Preview** | Commercial / production purposes    | Contact required |

The performance differences between the Mini and Pro-Preview versions are illustrated below.



---

## 5. Prohibited Use & Disclaimer

This model **must not be used** for:

* Any illegal or unlawful activities
* Applications that violate applicable laws or regulations
* Surveillance or profiling that infringes on individual rights
* Discriminatory or harmful automated decision-making in sensitive contexts

**Disclaimer**:

* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**

---

## 6. Ethical Considerations & Bias

CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
However, like all OCR systems, the model may reflect biases present in its training data, including:

* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout variations
* Reduced robustness on degraded, historical, or low-resolution documents

Users are encouraged to:

* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios
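
One way to evaluate outputs before downstream use, and to check for uneven performance across languages, is to measure the character error rate (CER) per language on a small set of images with known transcriptions. The sketch below is an illustration under stated assumptions, not the evaluation protocol used by the authors: `edit_distance` and `cer_by_language` are hypothetical helper names, and CER is taken here as total Levenshtein edits divided by total reference characters. It counts Unicode code points, so for scripts with combining marks, such as Tibetan, grapheme-level counting may be more appropriate.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer_by_language(samples: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute CER per language.

    `samples` yields (language, reference_text, predicted_text) triples.
    Languages whose references are all empty are omitted from the result.
    """
    edits: Dict[str, int] = defaultdict(int)
    chars: Dict[str, int] = defaultdict(int)
    for lang, ref, hyp in samples:
        edits[lang] += edit_distance(ref, hyp)
        chars[lang] += len(ref)
    return {lang: edits[lang] / chars[lang] for lang in edits if chars[lang] > 0}


# Assumed usage: `pairs` holds (language, ground_truth, model_output) triples
# collected by running the model on your own labeled images.
# print(cer_by_language(pairs))
```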

---

## 7. License

This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.

For commercial licensing or extended usage, please contact the authors.

---

## 8. Contact

For questions, collaboration, or commercial inquiries:

📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**