---
pipeline_tag: image-to-text
library_name: transformers
license: other
tags:
- ocr
- multilingual
language:
- en
- zh
- bo
- mn
- kk
- ky
- za
---

# CrossLing-OCR-Mini 🚀

**CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource languages and complex document layouts**. The model emphasizes accurate text recognition while preserving the original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.

---

## 1. Model Overview

CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**. Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.

### Key Features

- Multilingual OCR with structure-aware text recognition
- Specialized optimization for low-resource and complex scripts
- Lightweight (~580MB) with efficient inference
- Designed exclusively for research and academic benchmarking

### Supported Languages

- **High-resource languages**: Chinese, English
- **Low-resource languages (specially optimized)**: **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**

Experimental results indicate that CrossLing-OCR-Mini **matches or outperforms mainstream OCR systems** on multiple low-resource languages.

---

## 2. Usage / Inference

CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library. The example in this section demonstrates **single-image OCR inference** for plain text recognition.

### Requirements

- Python ≥ 3.8
- `transformers` (latest version recommended)
- `accelerate` (needed for `device_map` and `low_cpu_mem_usage`)
- CUDA-enabled GPU (recommended for optimal performance)

```bash
pip install -U transformers accelerate
```

### Simple OCR Inference Example

```python
from transformers import AutoModel, AutoTokenizer

# Hugging Face model id
model_id = "NCUTNLP/CrossLing-OCR-Mini"

# Load tokenizer and model.
# trust_remote_code=True lets Transformers run the custom model code
# shipped in the model repository.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="cuda",
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id
)

# Switch to inference mode and make sure the weights are on the GPU
model = model.eval().cuda()

# Input image
image_file = "test.png"

# Perform plain text OCR
result = model.chat(
    tokenizer,
    image_file,
    ocr_type="ocr"
)

print("Predicted OCR result:\n")
print(result)
```

### Notes

* `ocr_type="ocr"` enables plain text OCR mode
* The model handles multilingual text recognition automatically
* For best results, input images should be clear and upright
* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference

---

## 3. Performance Notes & Limitations

While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

These challenges will be addressed in future versions of the model.

---

## 4. Model Variants

| Version                       | Intended Use                | Availability        |
| ----------------------------- | --------------------------- | ------------------- |
| **CrossLing-OCR-Mini**        | Research & academic use     | ✅ Open-sourced      |
| **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |

📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
**[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**

The performance differences between the Mini and Pro-Preview versions are illustrated below.

![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)

---

## 5. Intended Use

This model is **strictly intended for**:

* Academic research
* Scientific experimentation
* OCR benchmarking and method comparison
* Low-resource language OCR studies

---

## 6. Prohibited Use & Disclaimer

This model **must not be used** for:

* Any illegal or unlawful activities
* Applications violating social ethics, public order, or applicable laws
* Surveillance, discrimination, or harmful automated decision-making

**Disclaimer**:

* Any misuse of this model is **solely the responsibility of the user**
* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
* Outputs generated by this model **do not represent the views or positions of the authors**

---

## 7. Ethical Considerations & Bias

CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**. However, like all OCR systems, the model may reflect biases present in its training data, including:

* Uneven performance across languages and scripts
* Sensitivity to document quality, typography, and layout styles

Users are encouraged to:

* Carefully evaluate outputs before downstream use
* Avoid deploying the model in high-risk or sensitive decision-making scenarios

---

## 8. License

This model is released **for research purposes only**.
Commercial use is **not permitted** without explicit authorization.

For commercial licensing or extended usage, please contact the authors.

---

## 9. Citation

If you use CrossLing-OCR-Mini in your research, please cite:

```bibtex
@misc{crossling-ocr-mini,
  title  = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
  author = {CrossLing Team},
  year   = {2025},
  note   = {Research-only OCR model}
}
```

---

## 10. Contact

For questions, collaboration, or commercial inquiries:

📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**

---

## 11. Acknowledgement

This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
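
---

## 12. Appendix: Scoring OCR Output (Sketch)

For the benchmarking and method-comparison use listed under Intended Use, OCR output is commonly scored with character error rate (CER): the edit distance between the reference text and the prediction, divided by the reference length. The sketch below is not part of the CrossLing-OCR release and is not the evaluation protocol behind the reported results. The function names `edit_distance` and `character_error_rate` are illustrative, and the NFC normalization step is an assumption that may not suit every script or benchmark.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit_distance(ref, hyp) / len(ref), after NFC normalization (an assumption)."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        # Convention chosen here: an empty reference scores 0.0 only for an empty prediction
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical reference and prediction strings, for illustration only
    print(character_error_rate("hello world", "hello wor1d"))
```

This version counts Unicode code points rather than grapheme clusters, so for scripts with combining marks or stacked characters (for example Tibetan) a single visual character may count as several units. A benchmark that defines its own tokenization or normalization should take precedence over this sketch.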