Files changed (1)
  1. README.md +0 -190
README.md DELETED
@@ -1,190 +0,0 @@
- --- ---
- # CrossLing-OCR-Mini
-
- 🚀 **CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource languages and complex document layouts**.
- The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
-
- ---
-
- ## 1. Model Overview
-
- CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
- Despite its compact size (~580 MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
-
- ### Key Features
- - Multilingual OCR with structure-aware text recognition
- - Specialized optimization for low-resource and complex scripts
- - Lightweight (~580 MB) with efficient inference
- - Designed exclusively for research and academic benchmarking
-
- ### Supported Languages
- - **High-resource languages**: Chinese, English
- - **Low-resource languages (specially optimized)**:
-   **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
-
- Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
-
- ---
-
- ## 2. Usage / Inference
-
- CrossLing-OCR-Mini can be used directly with the 🤗 **Transformers** library.
- The following example demonstrates **single-image OCR inference** for plain text recognition.
-
- ### Requirements
- - Python ≥ 3.8
- - `transformers` (latest version recommended)
- - `accelerate` (needed for the `device_map` and `low_cpu_mem_usage` options used below)
- - CUDA-enabled GPU (recommended for optimal performance)
-
- ```bash
- pip install -U transformers accelerate
- ```
-
- ### Simple OCR Inference Example
-
- ```python
- from transformers import AutoModel, AutoTokenizer
-
- # Hugging Face model id
- model_id = "NCUTNLP/CrossLing-OCR-Mini"
-
- # Load tokenizer and model (trust_remote_code runs the repository's custom model code)
- tokenizer = AutoTokenizer.from_pretrained(
-     model_id,
-     trust_remote_code=True
- )
- model = AutoModel.from_pretrained(
-     model_id,
-     trust_remote_code=True,
-     low_cpu_mem_usage=True,
-     device_map="cuda",
-     use_safetensors=True,
-     pad_token_id=tokenizer.eos_token_id
- )
- # device_map="cuda" already places the weights on the GPU, so only switch to eval mode
- model = model.eval()
-
- # Input image
- image_file = "test.png"
-
- # Perform plain text OCR
- result = model.chat(
-     tokenizer,
-     image_file,
-     ocr_type="ocr"
- )
- print("Predicted OCR result:\n")
- print(result)
- ```
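
The single-image call above extends to a folder of scans with a small loop. The sketch below is one possible way to do that and is not part of the model's API: `ocr_folder`, `IMAGE_SUFFIXES`, and the `scans` directory are hypothetical names. The recognition step is passed in as a plain callable, so the helper assumes nothing about the model beyond the `model.chat(tokenizer, image_file, ocr_type="ocr")` call shown in the README.

```python
from pathlib import Path
from typing import Callable, Dict

# Extensions treated as images here (an assumption; adjust to your data).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def ocr_folder(folder: str, recognize: Callable[[str], str]) -> Dict[str, str]:
    """Apply `recognize` to each image file in `folder`, in sorted path order.

    `recognize` receives an image path and returns the recognized text.
    Returns a mapping from file name to recognized text.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and files that are not images.
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        results[path.name] = recognize(str(path))
    return results


# Usage, with `model` and `tokenizer` loaded as in the README example:
# texts = ocr_folder("scans", lambda p: model.chat(tokenizer, p, ocr_type="ocr"))
```

Passing the recognizer as a callable keeps the loop testable without a GPU and lets the same helper drive other OCR systems in a comparison.
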
-
- ### Notes
-
- * `ocr_type="ocr"` enables plain text OCR mode
- * The model automatically handles multilingual text recognition
- * For best results, input images should be clear and upright
- * Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
-
- ---
-
- ## 3. Performance Notes & Limitations
-
- While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
-
- * OCR accuracy on **Mongolian and Uyghur** still has room for improvement
- * Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
-
- These challenges will be addressed in future versions of the model.
-
- ---
-
- ## 4. Model Variants
-
- | Version                       | Intended Use                | Availability        |
- | ----------------------------- | --------------------------- | ------------------- |
- | **CrossLing-OCR-Mini**        | Research & academic use     | ✅ Open-sourced     |
- | **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |
-
- 📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
- **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
-
- The performance differences between the Mini and Pro-Preview versions are illustrated below.
-
- ![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
-
- ---
-
- ## 5. Intended Use
-
- This model is **strictly intended for**:
-
- * Academic research
- * Scientific experimentation
- * OCR benchmarking and method comparison
- * Low-resource language OCR studies
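
For the benchmarking use listed above, character error rate (CER) is one common way to compare OCR output against a reference transcription. The README does not say which metric the authors used, so the following is a generic sketch rather than their evaluation code; `edit_distance` and `cer` are illustrative names. CER is computed here as the Levenshtein distance between the two strings divided by the reference length.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference length."""
    if not reference:
        # Convention chosen for this sketch: an empty reference scores 0.0
        # only when the hypothesis is empty as well.
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

This version counts Unicode code points. For scripts that rely on combining marks or stacked characters, code points can differ from user-perceived characters, so results may not match a grapheme-based CER.
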
-
- ---
-
- ## 6. Prohibited Use & Disclaimer
-
- This model **must not be used** for:
-
- * Any illegal or unlawful activities
- * Applications violating social ethics, public order, or applicable laws
- * Surveillance, discrimination, or harmful automated decision-making
-
- **Disclaimer**:
-
- * Any misuse of this model is **solely the responsibility of the user**
- * The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
- * Outputs generated by this model **do not represent the views or positions of the authors**
-
- ---
-
- ## 7. Ethical Considerations & Bias
-
- CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
- However, like all OCR systems, the model may reflect biases present in its training data, including:
-
- * Uneven performance across languages and scripts
- * Sensitivity to document quality, typography, and layout styles
-
- Users are encouraged to:
-
- * Carefully evaluate outputs before downstream use
- * Avoid deploying the model in high-risk or sensitive decision-making scenarios
-
- ---
-
- ## 8. License
-
- This model is released **for research purposes only**.
- Commercial use is **not permitted** without explicit authorization.
-
- For commercial licensing or extended usage, please contact the authors.
-
- ---
-
- ## 9. Citation
-
- If you use CrossLing-OCR-Mini in your research, please cite:
-
- ```bibtex
- @misc{crossling-ocr-mini,
-   title  = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
-   author = {CrossLing Team},
-   year   = {2025},
-   note   = {Research-only OCR model}
- }
- ```
-
- ---
-
178
- ## 10. Contact
179
-
180
- For questions, collaboration, or commercial inquiries:
181
-
182
- 📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
183
-
184
- ---
185
-
186
- ## 11. Acknowledgement
187
-
188
- This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
189
-