---
license: apache-2.0
---

# CrossLing-OCR-Mini 🚀

**CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages**.

---

## 1. Model Overview

Despite its compact size (~580 MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.

### Key Features

- Multilingual OCR with structure-aware text recognition
- Specialized optimization for low-resource and complex scripts
- Lightweight (~580 MB) and efficient inference

### Supported Languages

- **High-resource languages**: Chinese, English
- **Low-resource languages (specially optimized)**: **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang, etc.**

Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.

---

## 2. Usage / Inference

CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.
The following example demonstrates **single-image OCR inference** for plain text recognition.

### Requirements

- Python ≥ 3.8
- `transformers` (latest version recommended)
- CUDA-enabled GPU (recommended for optimal performance; the example below assumes one is available)

```bash
pip install -U transformers accelerate
```

### Simple OCR Inference Example

```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"

# Load tokenizer and model (trust_remote_code runs the model code shipped in the repository)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id
)

# Set evaluation mode and make sure the model is on the GPU
model = model.eval().cuda()

# Input image
image_file = "test.png"

# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr"
)

print("Predicted OCR result:\n")
print(result)
```

### Notes

* `ocr_type="ocr"` enables plain text OCR mode
* The model automatically handles multilingual text recognition
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference

---

## 3. Performance Notes & Limitations

While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

These challenges will be addressed in future versions of the model.

---

## 4. Model Variants

| Version                       | Intended Use                        | Availability        |
| ----------------------------- | ----------------------------------- | ------------------- |
| **CrossLing-OCR-Mini**        | Research and academic purposes only | ✅ Open-sourced      |
| **CrossLing-OCR-Pro-Preview** | Commercial / production purposes    | 🔒 Contact required |

The performance differences between the Mini and Pro-Preview versions are illustrated below.

![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)

---

## 5. Prohibited Use & Disclaimer

This model **must not be used** for:

* Any illegal or unlawful activities
* Applications that violate applicable laws or regulations
* Surveillance or profiling that infringes on individual rights
* Discriminatory or harmful automated decision-making in sensitive contexts

**Disclaimer**:

* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**

---

## 6. Ethical Considerations & Bias

CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
However, like all OCR systems, the model may reflect biases present in its training data, including:

* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout variations
* Reduced robustness on degraded, historical, or low-resolution documents

Users are encouraged to:

* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios

---

## 7. License

This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.

For commercial licensing or extended usage, please contact the authors.

---

## 8. Contact

For questions, collaboration, or commercial inquiries:

📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
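
To act on the recommendation in Section 6 to evaluate outputs before downstream use, one option is to measure character error rate (CER) against a small set of hand-checked transcriptions. The sketch below is an illustration only and is not part of the model's API: `edit_distance` and `cer` are hypothetical helper names, and the metric counts Unicode code points, so scripts with combining marks (such as Tibetan) may call for grapheme-level segmentation instead.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        # Convention assumed here: empty reference scores 0 only for empty output
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)
```

With a trusted transcription of `test.png` in hand, `cer(reference_text, result)` gives a per-image score for the `result` string returned in the usage example; averaging it per language is one way to surface the uneven performance across scripts noted above.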