---
license: apache-2.0
---
# CrossLing-OCR-Mini
🚀 **CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages**.
---
## 1. Model Overview
Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
### Key Features
- Multilingual OCR with structure-aware text recognition
- Specialized optimization for low-resource and complex scripts
- Lightweight (~580MB) and efficient inference
### Supported Languages
- **High-resource languages**: Chinese, English
- **Low-resource languages (specially optimized)**:
**Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang, etc.**
Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
---
## 2. Usage / Inference
CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.
The following example demonstrates **single-image OCR inference** for plain text recognition.
### Requirements
- Python ≥ 3.8
- `transformers` (latest version recommended)
- CUDA-enabled GPU (recommended for optimal performance)
```bash
pip install -U transformers accelerate
```
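Before loading the model, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch using only the standard library. The helper name `check_environment` is hypothetical, and including `torch` in the package list is an assumption (the README does not list it, but the inference example moves the model to CUDA).

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8),
                      packages=("transformers", "accelerate", "torch")):
    """Return a dict describing whether each requirement appears to be met.

    Hypothetical helper: it only checks that the packages can be found,
    not that their versions are recent enough or that a GPU is present.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'missing'}")
```
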
### Simple OCR Inference Example
```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id,
)
model = model.eval().cuda()

# Input image
image_file = "test.png"

# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr",
)

print("Predicted OCR result:\n")
print(result)
```
### Notes
* `ocr_type="ocr"` enables plain text OCR mode
* The model automatically handles multilingual text recognition
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
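
The example above handles a single image. To process a folder of scans, one option is a small wrapper that applies the same call to each file. This is a sketch, not part of the model's API: `ocr_directory` and `IMAGE_EXTENSIONS` are hypothetical names, and the list of accepted file extensions is an assumption, since the README does not state which formats are supported.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumption: the README does not list supported image formats.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def ocr_directory(ocr_fn: Callable[[str], str],
                  folder: str,
                  extensions: Iterable[str] = IMAGE_EXTENSIONS) -> Dict[str, str]:
    """Apply ocr_fn to every image file in folder.

    Returns a dict mapping file name to recognized text, in sorted
    file order. Subdirectories and non-image files are skipped.
    """
    allowed = {ext.lower() for ext in extensions}
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in allowed:
            results[path.name] = ocr_fn(str(path))
    return results


# Usage with the tokenizer and model loaded as in the example above
# ("scans/" is a placeholder folder name):
# texts = ocr_directory(
#     lambda p: model.chat(tokenizer, p, ocr_type="ocr"),
#     "scans/",
# )
```

Passing the OCR call in as a function keeps the loop independent of the model, so it can be tried with a stub before loading any weights.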
---
## 3. Performance Notes & Limitations
While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
These challenges will be addressed in future versions of the model.
---
## 4. Model Variants
| Version                       | Intended Use                        | Availability        |
| ----------------------------- | ----------------------------------- | ------------------- |
| **CrossLing-OCR-Mini**        | Research and academic purposes only | ✅ Open-sourced      |
| **CrossLing-OCR-Pro-Preview** | Commercial / production purposes    | 🔒 Contact required |
The performance differences between the Mini and Pro-Preview versions are illustrated below.
![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
---
## 5. Prohibited Use & Disclaimer
This model **must not be used** for:
* Any illegal or unlawful activities
* Applications that violate applicable laws or regulations
* Surveillance or profiling that infringes on individual rights
* Discriminatory or harmful automated decision-making in sensitive contexts
**Disclaimer**:
* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**
---
## 6. Ethical Considerations & Bias
CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
However, like all OCR systems, the model may reflect biases present in its training data, including:
* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout variations
* Reduced robustness on degraded, historical, or low-resolution documents
Users are encouraged to:
* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios
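
One common way to evaluate OCR output before downstream use is the character error rate (CER): the edit distance between the prediction and a ground-truth transcription, divided by the length of the transcription. The sketch below is a generic, standard-library implementation and is not the evaluation protocol used by the authors. The function names are hypothetical, and it compares Unicode code points, which may not match user-perceived characters in scripts with combining marks such as Tibetan.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(prediction: str, reference: str) -> float:
    """CER = edit distance / reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(prediction, reference) / len(reference)
```

Computing CER separately per language on a small hand-checked sample is one way to surface the uneven performance across scripts noted above.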
---
## 7. License
This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.
For commercial licensing or extended usage, please contact the authors.
---
## 8. Contact
For questions, collaboration, or commercial inquiries:
📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**