NCUTNLP/CrossLing-OCR-Mini

Delete README.md

by shajiu - opened Jan 1

base: refs/heads/main ← from: refs/pr/9

Files changed (1)

README.md +0 -190

README.md DELETED

@@ -1,190 +0,0 @@
---- ---
-# CrossLing-OCR-Mini
-🚀 **CrossLing-OCR-Mini** is a lightweight multilingual OCR model designed for **low-resource languages and complex document layouts**.
-The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
----
-## 1. Model Overview
-CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
-Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
-### Key Features
-- Multilingual OCR with structure-aware text recognition
-- Specialized optimization for low-resource and complex scripts
-- Lightweight (~580MB) and efficient inference
-- Designed exclusively for research and academic benchmarking
-### Supported Languages
-- **High-resource languages**: Chinese, English
-- **Low-resource languages (specially optimized)**:
-  **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
-Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
----
-## 2. Usage / Inference
-CrossLing-OCR-Mini can be directly used with the 🤗 **Transformers** library.
-The following example demonstrates **single-image OCR inference** for plain text recognition.
-### Requirements
-- Python ≥ 3.8
-- `transformers` (latest version recommended)
-- CUDA-enabled GPU (recommended for optimal performance)
-```bash
-pip install -U transformers accelerate
-```
-### Simple OCR Inference Example
-```python
-from transformers import AutoModel, AutoTokenizer
-# Hugging Face model id
-model_id = "NCUTNLP/CrossLing-OCR-Mini"
-# Load tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained(
-    model_id,
-    trust_remote_code=True
-)
-model = AutoModel.from_pretrained(
-    model_id,
-    trust_remote_code=True,
-    low_cpu_mem_usage=True,
-    device_map="cuda",
-    use_safetensors=True,
-    pad_token_id=tokenizer.eos_token_id
-)
-model = model.eval().cuda()
-# Input image
-image_file = "test.png"
-# Perform plain text OCR
-result = model.chat(
-    tokenizer,
-    image_file,
-    ocr_type="ocr"
-)
-print("Predicted OCR result:\n")
-print(result)
-```
-### Notes
-* `ocr_type="ocr"` enables plain text OCR mode
-* The model automatically handles multilingual text recognition
-* For best results, input images should be clear and upright
-* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
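
The single-image call in the example can be extended to a folder of images. The following is a minimal sketch, assuming the `model.chat(tokenizer, image_file, ocr_type="ocr")` interface shown in the deleted README. The helper name `ocr_folder` and the `IMAGE_EXTENSIONS` set are hypothetical and not part of the model's API.

```python
from pathlib import Path

# Hypothetical helper, not part of the CrossLing-OCR-Mini API.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed set of input formats


def ocr_folder(model, tokenizer, folder, ocr_type="ocr"):
    """Run OCR on every image file in `folder` and return {filename: text}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip files that are not images
        # Same call as the single-image example: tokenizer, image path, mode.
        results[path.name] = model.chat(tokenizer, str(path), ocr_type=ocr_type)
    return results
```

With `model` and `tokenizer` loaded as in the README example, `ocr_folder(model, tokenizer, "scans/")` would return one recognized string per image, where `scans/` is a placeholder directory name.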
----
-## 3. Performance Notes & Limitations
-While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
-* OCR accuracy on **Mongolian and Uyghur** still has room for improvement
-* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
-These challenges will be addressed in future versions of the model.
----
-## 4. Model Variants
-| Version                       | Intended Use                | Availability        |
-| ----------------------------- | --------------------------- | ------------------- |
-| **CrossLing-OCR-Mini**        | Research & academic use     | ✅ Open-sourced      |
-| **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |
-📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
-**[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
-The performance differences between the Mini and Pro-Preview versions are illustrated below.
-![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
----
-## 5. Intended Use
-This model is **strictly intended for**:
-* Academic research
-* Scientific experimentation
-* OCR benchmarking and method comparison
-* Low-resource language OCR studies
----
-## 6. Prohibited Use & Disclaimer
-This model **must not be used** for:
-* Any unlawful activities
-* Applications violating social ethics, public order, or applicable laws
-* Surveillance, discrimination, or harmful automated decision-making
-**Disclaimer**:
-* Any misuse of this model is **solely the responsibility of the user**
-* The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
-* Outputs generated by this model **do not represent the views or positions of the authors**
----
-## 7. Ethical Considerations & Bias
-CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
-However, like all OCR systems, the model may reflect biases present in its training data, including:
-* Uneven performance across languages and scripts
-* Sensitivity to document quality, typography, and layout styles
-Users are encouraged to:
-* Carefully evaluate outputs before downstream use
-* Avoid deploying the model in high-risk or sensitive decision-making scenarios
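
One way to check for uneven performance across languages before downstream use is to compute a per-language character error rate (CER) on a labeled sample. The README does not say which metric the authors used, so CER is an assumption here, and the function names and the `(language, reference, prediction)` tuple layout are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def cer_by_language(samples):
    """Aggregate CER per language from (language, reference, prediction) tuples."""
    totals = {}
    for lang, ref, hyp in samples:
        errors, chars = totals.get(lang, (0, 0))
        totals[lang] = (errors + edit_distance(ref, hyp), chars + len(ref))
    # Languages with no reference characters are omitted to avoid dividing by zero.
    return {lang: e / c for lang, (e, c) in totals.items() if c > 0}
```

Python strings are compared by Unicode code point, so for scripts with combining marks, such as Tibetan, this counts code points rather than visually distinct characters. Normalizing both strings the same way before comparison keeps the scores consistent.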
----
-## 8. License
-This model is released **for research purposes only**.
-Commercial use is **not permitted** without explicit authorization.
-For commercial licensing or extended usage, please contact the authors.
----
-## 9. Citation
-If you use CrossLing-OCR-Mini in your research, please cite:
-```bibtex
-@misc{crossling-ocr-mini,
-  title     = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
-  author    = {CrossLing Team},
-  year      = {2025},
-  note      = {Research-only OCR model}
-}
-```
----
-## 10. Contact
-For questions, collaboration, or commercial inquiries:
-📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
----
-## 11. Acknowledgement
-This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
-```