NCUTNLP/CrossLing-OCR-Mini

@@ -1,190 +1,193 @@
---- ---
-# CrossLing-OCR-Mini
-🚀 **CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource multilingual languages and complex document layouts**.
-The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
----
-## 1. Model Overview
-CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
-Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
-### Key Features
-- Multilingual OCR with structure-aware text recognition
-- Specialized optimization for low-resource and complex scripts
-- Lightweight (~580MB) and efficient inference
-- Designed exclusively for research and academic benchmarking
-### Supported Languages
-- **High-resource languages**: Chinese, English
-- **Low-resource languages (specially optimized)**:
-  **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
-Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
----
-## 2. Usage / Inference
-CrossLing-OCR-Mini can be directly used with the 🤗 **Transformers** library.
-The following example demonstrates **single-image OCR inference** for plain text recognition.
-### Requirements
-- Python ≥ 3.8
-- `transformers` (latest version recommended)
-- CUDA-enabled GPU (recommended for optimal performance)
-```bash
-pip install -U transformers accelerate
-````
-### Simple OCR Inference Example
-```python
-from transformers import AutoModel, AutoTokenizer
-# Hugging Face model id
-model_id = "NCUTNLP/CrossLing-OCR-Mini"
-# Load tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained(
-    model_id,
-    trust_remote_code=True
-)
-model = AutoModel.from_pretrained(
-    model_id,
-    trust_remote_code=True,
-    low_cpu_mem_usage=True,
-    device_map="cuda",
-    use_safetensors=True,
-    pad_token_id=tokenizer.eos_token_id
-)
-model = model.eval().cuda()
-# Input image
-image_file = "test.png"
-# Perform plain text OCR
-result = model.chat(
-    tokenizer,
-    image_file,
-    ocr_type="ocr"
-)
-print("Predicted OCR result:\n")
-print(result)
-```
-### Notes
-* `ocr_type="ocr"` enables plain text OCR mode
-* The model automatically handles multilingual text recognition
-* For best results, input images should be clear and upright
-* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
----
-## 3. Performance Notes & Limitations
-While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
-* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
-* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
-These challenges will be addressed in future versions of the model.
----
-## 4. Model Variants
-| Version                       | Intended Use                | Availability        |
-| ----------------------------- | --------------------------- | ------------------- |
-| **CrossLing-OCR-Mini**        | Research & academic use     | ✅ Open-sourced      |
-| **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |
-📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
-**[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
-The performance differences between the Mini and Pro-Preview versions are illustrated below.
-![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
----
-## 5. Intended Use
-This model is **strictly intended for**:
-* Academic research
-* Scientific experimentation
-* OCR benchmarking and method comparison
-* Low-resource language OCR studies
----
-## 6. Prohibited Use & Disclaimer
-This model **must not be used** for:
-* Any illegal or unlawful activities
-* Applications violating social ethics, public order, or applicable laws
-* Surveillance, discrimination, or harmful automated decision-making
-**Disclaimer**:
-* Any misuse of this model is **solely the responsibility of the user**
-* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
-* Outputs generated by this model **do not represent the views or positions of the authors**
----
-## 7. Ethical Considerations & Bias
-CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
-However, like all OCR systems, the model may reflect biases present in its training data, including:
-* Uneven performance across languages and scripts
-* Sensitivity to document quality, typography, and layout styles
-Users are encouraged to:
-* Carefully evaluate outputs before downstream use
-* Avoid deploying the model in high-risk or sensitive decision-making scenarios
----
-## 8. License
-This model is released **for research purposes only**.
-Commercial use is **not permitted** without explicit authorization.
-For commercial licensing or extended usage, please contact the authors.
----
-## 9. Citation
-If you use CrossLing-OCR-Mini in your research, please cite:
-```bibtex
-@misc{crossling-ocr-mini,
-  title     = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
-  author    = {CrossLing Team},
-  year      = {2025},
-  note      = {Research-only OCR model}
-}
-```
----
-## 10. Contact
-For questions, collaboration, or commercial inquiries:
-📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
----
-## 11. Acknowledgement
-This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
-```

+---
+license: apache-2.0
+---
+# CrossLing-OCR-Mini
+🚀 **CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages and complex document layouts**.
+The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
+---
+## 1. Model Overview
+CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
+Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
+### Key Features
+- Multilingual OCR with structure-aware text recognition
+- Specialized optimization for low-resource and complex scripts
+- Lightweight (~580MB) and efficient inference
+- Designed exclusively for research and academic benchmarking
+### Supported Languages
+- **High-resource languages**: Chinese, English
+- **Low-resource languages (specially optimized)**:
+  **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
+Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
+---
+## 2. Usage / Inference
+CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.
+The following example demonstrates **single-image OCR inference** for plain text recognition.
+### Requirements
+- Python ≥ 3.8
+- `transformers` (latest version recommended)
+- CUDA-enabled GPU (recommended for optimal performance)
+```bash
+pip install -U transformers accelerate
+```
+### Simple OCR Inference Example
+```python
+from transformers import AutoModel, AutoTokenizer
+# Hugging Face model id
+model_id = "NCUTNLP/CrossLing-OCR-Mini"
+# Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained(
+    model_id,
+    trust_remote_code=True
+)
+model = AutoModel.from_pretrained(
+    model_id,
+    trust_remote_code=True,
+    low_cpu_mem_usage=True,
+    device_map="cuda",
+    use_safetensors=True,
+    pad_token_id=tokenizer.eos_token_id
+)
+model = model.eval().cuda()
+# Input image
+image_file = "test.png"
+# Perform plain text OCR
+result = model.chat(
+    tokenizer,
+    image_file,
+    ocr_type="ocr"
+)
+print("Predicted OCR result:\n")
+print(result)
+```
+### Notes
+* `ocr_type="ocr"` enables plain text OCR mode
+* The model automatically handles multilingual text recognition
+* For best results, input images should be clear and upright
+* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
+---
+## 3. Performance Notes & Limitations
+While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
+* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
+* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
+These challenges will be addressed in future versions of the model.
+---
+## 4. Model Variants
+| Version                       | Intended Use                | Availability        |
+| ----------------------------- | --------------------------- | ------------------- |
+| **CrossLing-OCR-Mini**        | Research & academic use     | ✅ Open-sourced      |
+| **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |
+📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
+**[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
+The performance differences between the Mini and Pro-Preview versions are illustrated below.
+![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
+---
+## 5. Intended Use
+This model is **strictly intended for**:
+* Academic research
+* Scientific experimentation
+* OCR benchmarking and method comparison
+* Low-resource language OCR studies
+---
+## 6. Prohibited Use & Disclaimer
+This model **must not be used** for:
+* Any illegal or unlawful activities
+* Applications violating social ethics, public order, or applicable laws
+* Surveillance, discrimination, or harmful automated decision-making
+**Disclaimer**:
+* Any misuse of this model is **solely the responsibility of the user**
+* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
+* Outputs generated by this model **do not represent the views or positions of the authors**
+---
+## 7. Ethical Considerations & Bias
+CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
+However, like all OCR systems, the model may reflect biases present in its training data, including:
+* Uneven performance across languages and scripts
+* Sensitivity to document quality, typography, and layout styles
+Users are encouraged to:
+* Carefully evaluate outputs before downstream use
+* Avoid deploying the model in high-risk or sensitive decision-making scenarios
+---
+## 8. License
+This model is released **for research purposes only**.
+Commercial use is **not permitted** without explicit authorization.
+For commercial licensing or extended usage, please contact the authors.
+---
+## 9. Citation
+If you use CrossLing-OCR-Mini in your research, please cite:
+```bibtex
+@misc{crossling-ocr-mini,
+  title     = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
+  author    = {CrossLing Team},
+  year      = {2025},
+  note      = {Research-only OCR model}
+}
+```
+---
+## 10. Contact
+For questions, collaboration, or commercial inquiries:
+📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
+---
+## 11. Acknowledgement
+This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
+```
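
Reviewer sketch (not part of the commit): the model card positions the model for OCR benchmarking and method comparison, but it names no metric and shows only single-image inference. The following is a minimal sketch of a benchmarking loop, assuming character error rate (CER) as the metric and assuming that `model.chat(tokenizer, image_file, ocr_type="ocr")` returns the recognized text as a string, as the inference example suggests. The helpers `edit_distance`, `character_error_rate`, and `benchmark` are hypothetical names introduced here for illustration; they are not part of the model's API.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(prediction: str, reference: str) -> float:
    """Edits needed to turn the prediction into the reference, per reference character."""
    if not reference:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, reference) / len(reference)


def benchmark(model, tokenizer, samples: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level CER over (image_file, reference_text) pairs.

    Assumption: `model.chat(...)` returns a plain string, as in the
    single-image example from the model card.
    """
    total_edits = 0
    total_chars = 0
    for image_file, reference in samples:
        prediction = model.chat(tokenizer, image_file, ocr_type="ocr")
        reference = reference.strip()
        total_edits += edit_distance(prediction.strip(), reference)
        total_chars += len(reference)
    return total_edits / total_chars if total_chars else 0.0
```

With the `model` and `tokenizer` loaded as in the inference example, `benchmark(model, tokenizer, [("page1.png", "reference text")])` would return a single corpus-level score (the file name and reference string here are placeholders). Note that Python strings are compared by Unicode code point, so scripts that build one visible glyph from several code points, such as Tibetan, are scored per code point rather than per glyph in this sketch.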