NCUTNLP committed on
Commit 5cfe879 · verified · 1 Parent(s): 8f0d9e8

Update README.md

  1. README.md +163 -193
README.md CHANGED
@@ -1,193 +1,163 @@
- ---
- license: apache-2.0
- ---
-
- # CrossLing-OCR-Mini
-
- 🚀 **CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource multilingual languages and complex document layouts**.
- The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
-
- ---
-
- ## 1. Model Overview
-
- CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
- Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
-
- ### Key Features
- - Multilingual OCR with structure-aware text recognition
- - Specialized optimization for low-resource and complex scripts
- - Lightweight (~580MB) and efficient inference
- - Designed exclusively for research and academic benchmarking
-
- ### Supported Languages
- - **High-resource languages**: Chinese, English
- - **Low-resource languages (specially optimized)**:
- **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
-
- Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
-
- ---
-
- ## 2. Usage / Inference
-
- CrossLing-OCR-Mini can be directly used with the 🤗 **Transformers** library.
- The following example demonstrates **single-image OCR inference** for plain text recognition.
-
- ### Requirements
- - Python ≥ 3.8
- - `transformers` (latest version recommended)
- - CUDA-enabled GPU (recommended for optimal performance)
-
- ```bash
- pip install -U transformers accelerate
- ````
-
- ### Simple OCR Inference Example
-
- ```python
- from transformers import AutoModel, AutoTokenizer
-
- # Hugging Face model id
- model_id = "NCUTNLP/CrossLing-OCR-Mini"
- # Load tokenizer and model
- tokenizer = AutoTokenizer.from_pretrained(
-     model_id,
-     trust_remote_code=True
- )
- model = AutoModel.from_pretrained(
-     model_id,
-     trust_remote_code=True,
-     low_cpu_mem_usage=True,
-     device_map="cuda",
-     use_safetensors=True,
-     pad_token_id=tokenizer.eos_token_id
- )
- model = model.eval().cuda()
- # Input image
- image_file = "test.png"
- # Perform plain text OCR
- result = model.chat(
-     tokenizer,
-     image_file,
-     ocr_type="ocr"
- )
- print("Predicted OCR result:\n")
- print(result)
- ```
-
- ### Notes
-
- * `ocr_type="ocr"` enables plain text OCR mode
- * The model automatically handles multilingual text recognition
- * For best results, input images should be clear and upright
- * Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
-
- ---
-
- ## 3. Performance Notes & Limitations
-
- While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
-
- * OCR accuracy on **Mongolian and Uyghur** still has room for improvement
- * Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
-
- These challenges will be addressed in future versions of the model.
-
- ---
-
- ## 4. Model Variants
-
- | Version | Intended Use | Availability |
- | ----------------------------- | --------------------------- | ------------------- |
- | **CrossLing-OCR-Mini** | Research & academic use | ✅ Open-sourced |
- | **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |
-
- 📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
- **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
-
- The performance differences between the Mini and Pro-Preview versions are illustrated below.
-
- ![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
-
- ---
-
- ## 5. Intended Use
-
- This model is **strictly intended for**:
-
- * Academic research
- * Scientific experimentation
- * OCR benchmarking and method comparison
- * Low-resource language OCR studies
-
- ---
-
- ## 6. Prohibited Use & Disclaimer
-
- This model **must not be used** for:
-
- * Any illegal or unlawful activities
- * Applications violating social ethics, public order, or applicable laws
- * Surveillance, discrimination, or harmful automated decision-making
-
- **Disclaimer**:
-
- * Any misuse of this model is **solely the responsibility of the user**
- * The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
- * Outputs generated by this model **do not represent the views or positions of the authors**
-
- ---
-
- ## 7. Ethical Considerations & Bias
-
- CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
- However, like all OCR systems, the model may reflect biases present in its training data, including:
-
- * Uneven performance across languages and scripts
- * Sensitivity to document quality, typography, and layout styles
-
- Users are encouraged to:
-
- * Carefully evaluate outputs before downstream use
- * Avoid deploying the model in high-risk or sensitive decision-making scenarios
-
- ---
-
- ## 8. License
-
- This model is released **for research purposes only**.
- Commercial use is **not permitted** without explicit authorization.
-
- For commercial licensing or extended usage, please contact the authors.
-
- ---
-
- ## 9. Citation
-
- If you use CrossLing-OCR-Mini in your research, please cite:
-
- ```bibtex
- @misc{crossling-ocr-mini,
- title = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
- author = {CrossLing Team},
- year = {2025},
- note = {Research-only OCR model}
- }
- ```
-
- ---
-
- ## 10. Contact
-
- For questions, collaboration, or commercial inquiries:
-
- 📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
-
- ---
-
- ## 11. Acknowledgement
-
- This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
-
- ```
 
+ ---
+ license: apache-2.0
+ ---
+
+ # CrossLing-OCR-Mini
+
+ 🚀 **CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource multilingual languages**, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
+
+ ---
+
+ ## 1. Model Overview
+
+ Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
+
+ ### Key Features
+ - Multilingual OCR with structure-aware text recognition
+ - Specialized optimization for low-resource and complex scripts
+ - Lightweight (~580MB) and efficient inference
+
+ ### Supported Languages
+ - **High-resource languages**: Chinese, English
+ - **Low-resource languages (specially optimized)**:
+ **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
+
+ Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
+
+ ---
+
+ ## 2. Usage / Inference
+
+ CrossLing-OCR-Mini can be directly used with the 🤗 **Transformers** library.
+ The following example demonstrates **single-image OCR inference** for plain text recognition.
+
+ ### Requirements
+ - Python 3.8
+ - `transformers` (latest version recommended)
+ - CUDA-enabled GPU (recommended for optimal performance)
+
+ ```bash
+ pip install -U transformers accelerate
+ ````
+
+ ### Simple OCR Inference Example
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ # Hugging Face model id
+ model_id = "NCUTNLP/CrossLing-OCR-Mini"
+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_id,
+     trust_remote_code=True
+ )
+ model = AutoModel.from_pretrained(
+     model_id,
+     trust_remote_code=True,
+     low_cpu_mem_usage=True,
+     device_map="cuda",
+     use_safetensors=True,
+     pad_token_id=tokenizer.eos_token_id
+ )
+ model = model.eval().cuda()
+ # Input image
+ image_file = "test.png"
+ # Perform plain text OCR
+ result = model.chat(
+     tokenizer,
+     image_file,
+     ocr_type="ocr"
+ )
+ print("Predicted OCR result:\n")
+ print(result)
+ ```
+
+ ### Notes
+
+ * `ocr_type="ocr"` enables plain text OCR mode
+ * The model automatically handles multilingual text recognition
+ * For best results, input images should be clear and upright
+ * Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
+
+ ---
+
+ ## 3. Performance Notes & Limitations
+
+ While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
+
+ * OCR accuracy on **Mongolian and Uyghur** still has room for improvement
+ * Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
+
+ These challenges will be addressed in future versions of the model.
+
+ ---
+
+ ## 4. Model Variants
+
+ | Version | Intended Use | Availability |
+ | ----------------------------- | --------------------------- | ------------------- |
+ | **CrossLing-OCR-Mini** | Research and academic purposes only | ✅ Open-sourced |
+ | **CrossLing-OCR-Pro-Preview** | Commercial / production purposes | 🔒 Contact required |
+
+
+
+ The performance differences between the Mini and Pro-Preview versions are illustrated below.
+
+ ![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
+
+ ---
+
+
+ ## 5. Prohibited Use & Disclaimer
+
+ This model **must not be used** for:
+
+ * Any illegal or unlawful activities
+ * Applications that violate applicable laws or regulations
+ * Surveillance or profiling that infringes on individual rights
+ * Discriminatory or harmful automated decision-making in sensitive contexts
+
+ **Disclaimer**:
+
+ * Any misuse of this model is **solely the responsibility of the user**
+ * The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
+ * Outputs generated by this model **do not represent the views or positions of the authors**
+
+ ---
+
+ ## 6. Ethical Considerations & Bias
+
+ CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
+ However, like all OCR systems, the model may reflect biases present in its training data, including:
+
+ * Uneven performance across languages and scripts
+ * Sensitivity to document quality, typography, and layout variations
+ * Reduced robustness on degraded, historical, or low-resolution documents
+
+ Users are encouraged to:
+
+ * Carefully evaluate outputs before downstream use
+ * Avoid deploying the model in high-risk or sensitive decision-making scenarios
+
+ ---
+
+ ## 7. License
+
+ This model is released **for research purposes only**.
+ Commercial use is **not permitted** without explicit authorization.
+
+ For commercial licensing or extended usage, please contact the authors.
+
+ ---
+
+
+ ## 8. Contact
+
+ For questions, collaboration, or commercial inquiries:
+
+ 📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
+
+
+
+ ```
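
The README in this commit asks users to evaluate outputs carefully before downstream use, but it does not say which metric the authors rely on. One common choice for OCR is character error rate (CER): the Levenshtein edit distance between the reference text and the prediction, divided by the reference length. The following is a minimal sketch under that assumption. The names `edit_distance` and `cer` are illustrative and are not part of the model's code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] is the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Assuming `model.chat` returns the recognized text as a string, as the `print(result)` call in the README suggests, a single prediction could be scored with `cer(reference_text, result)`. Python iterates strings by code point, so for scripts that use combining marks (Tibetan, for example) this counts code points rather than visual characters; normalizing both strings the same way first, for instance with `unicodedata.normalize`, keeps the comparison consistent.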