Upload README.md

#8
by shajiu - opened
Files changed (1)
  1. README.md +193 -190
README.md CHANGED
@@ -1,190 +1,193 @@
1
- --- ---
2
- # CrossLing-OCR-Mini
3
-
4
- 🚀 **CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource multilingual languages and complex document layouts**.
5
- The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
6
-
7
- ---
8
-
9
- ## 1. Model Overview
10
-
11
- CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
12
- Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
13
-
14
- ### Key Features
15
- - Multilingual OCR with structure-aware text recognition
16
- - Specialized optimization for low-resource and complex scripts
17
- - Lightweight (~580MB) and efficient inference
18
- - Designed exclusively for research and academic benchmarking
19
-
20
- ### Supported Languages
21
- - **High-resource languages**: Chinese, English
22
- - **Low-resource languages (specially optimized)**:
23
- **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
24
-
25
- Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
26
-
27
- ---
28
-
29
- ## 2. Usage / Inference
30
-
31
- CrossLing-OCR-Mini can be directly used with the 🤗 **Transformers** library.
32
- The following example demonstrates **single-image OCR inference** for plain text recognition.
33
-
34
- ### Requirements
35
- - Python ≥ 3.8
36
- - `transformers` (latest version recommended)
37
- - CUDA-enabled GPU (recommended for optimal performance)
38
-
39
- ```bash
40
- pip install -U transformers accelerate
41
- ````
42
-
43
- ### Simple OCR Inference Example
44
-
45
- ```python
46
- from transformers import AutoModel, AutoTokenizer
47
-
48
- # Hugging Face model id
49
- model_id = "NCUTNLP/CrossLing-OCR-Mini"
50
- # Load tokenizer and model
51
- tokenizer = AutoTokenizer.from_pretrained(
52
- model_id,
53
- trust_remote_code=True
54
- )
55
- model = AutoModel.from_pretrained(
56
- model_id,
57
- trust_remote_code=True,
58
- low_cpu_mem_usage=True,
59
- device_map="cuda",
60
- use_safetensors=True,
61
- pad_token_id=tokenizer.eos_token_id
62
- )
63
- model = model.eval().cuda()
64
- # Input image
65
- image_file = "test.png"
66
- # Perform plain text OCR
67
- result = model.chat(
68
- tokenizer,
69
- image_file,
70
- ocr_type="ocr"
71
- )
72
- print("Predicted OCR result:\n")
73
- print(result)
74
- ```
75
-
76
- ### Notes
77
-
78
- * `ocr_type="ocr"` enables plain text OCR mode
79
- * The model automatically handles multilingual text recognition
80
- * For best results, input images should be clear and upright
81
- * Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
82
-
83
- ---
84
-
85
- ## 3. Performance Notes & Limitations
86
-
87
- While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
88
-
89
- * OCR accuracy on **Mongolian and Uyghur** still has room for improvement
90
- * Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
91
-
92
- These challenges will be addressed in future versions of the model.
93
-
94
- ---
95
-
96
- ## 4. Model Variants
97
-
98
- | Version | Intended Use | Availability |
99
- | ----------------------------- | --------------------------- | ------------------- |
100
- | **CrossLing-OCR-Mini** | Research & academic use | ✅ Open-sourced |
101
- | **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |
102
-
103
- 📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
104
- **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
105
-
106
- The performance differences between the Mini and Pro-Preview versions are illustrated below.
107
-
108
- ![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
109
-
110
- ---
111
-
112
- ## 5. Intended Use
113
-
114
- This model is **strictly intended for**:
115
-
116
- * Academic research
117
- * Scientific experimentation
118
- * OCR benchmarking and method comparison
119
- * Low-resource language OCR studies
120
-
121
- ---
122
-
123
- ## 6. Prohibited Use & Disclaimer
124
-
125
- This model **must not be used** for:
126
-
127
- * Any illegal or unlawful activities
128
- * Applications violating social ethics, public order, or applicable laws
129
- * Surveillance, discrimination, or harmful automated decision-making
130
-
131
- **Disclaimer**:
132
-
133
- * Any misuse of this model is **solely the responsibility of the user**
134
- * The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
135
- * Outputs generated by this model **do not represent the views or positions of the authors**
136
-
137
- ---
138
-
139
- ## 7. Ethical Considerations & Bias
140
-
141
- CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
142
- However, like all OCR systems, the model may reflect biases present in its training data, including:
143
-
144
- * Uneven performance across languages and scripts
145
- * Sensitivity to document quality, typography, and layout styles
146
-
147
- Users are encouraged to:
148
-
149
- * Carefully evaluate outputs before downstream use
150
- * Avoid deploying the model in high-risk or sensitive decision-making scenarios
151
-
152
- ---
153
-
154
- ## 8. License
155
-
156
- This model is released **for research purposes only**.
157
- Commercial use is **not permitted** without explicit authorization.
158
-
159
- For commercial licensing or extended usage, please contact the authors.
160
-
161
- ---
162
-
163
- ## 9. Citation
164
-
165
- If you use CrossLing-OCR-Mini in your research, please cite:
166
-
167
- ```bibtex
168
- @misc{crossling-ocr-mini,
169
- title = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
170
- author = {CrossLing Team},
171
- year = {2025},
172
- note = {Research-only OCR model}
173
- }
174
- ```
175
-
176
- ---
177
-
178
- ## 10. Contact
179
-
180
- For questions, collaboration, or commercial inquiries:
181
-
182
- 📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
183
-
184
- ---
185
-
186
- ## 11. Acknowledgement
187
-
188
- This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
189
-
190
- ```
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+
5
+ # CrossLing-OCR-Mini
6
+
7
+ 🚀 **CrossLing-OCR-Mini** is a lightweight OCR model designed for **low-resource multilingual languages and complex document layouts**.
8
+ The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for **multilingual OCR research and academic benchmarking**.
9
+
10
+ ---
11
+
12
+ ## 1. Model Overview
13
+
14
+ CrossLing-OCR-Mini targets OCR scenarios involving **low-resource scripts, diverse writing directions, and complex layouts**.
15
+ Despite its compact size (~580MB), the model demonstrates strong recognition performance across **11 languages**, while remaining deployable on **consumer-grade GPUs**.
16
+
17
+ ### Key Features
18
+ - Multilingual OCR with structure-aware text recognition
19
+ - Specialized optimization for low-resource and complex scripts
20
+ - Lightweight (~580MB) and efficient inference
21
+ - Designed exclusively for research and academic benchmarking
22
+
23
+ ### Supported Languages
24
+ - **High-resource languages**: Chinese, English
25
+ - **Low-resource languages (specially optimized)**:
26
+ **Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang**
27
+
28
+ Experimental results indicate that CrossLing-OCR-Mini **outperforms or matches mainstream OCR systems** on multiple low-resource languages.
29
+
30
+ ---
31
+
32
+ ## 2. Usage / Inference
33
+
34
+ CrossLing-OCR-Mini can be directly used with the 🤗 **Transformers** library.
35
+ The following example demonstrates **single-image OCR inference** for plain text recognition.
36
+
37
+ ### Requirements
38
+ - Python ≥ 3.8
39
+ - `transformers` (latest version recommended)
40
+ - CUDA-enabled GPU (recommended for optimal performance)
41
+
42
+ ```bash
43
+ pip install -U transformers accelerate
44
+ ````
45
+
46
+ ### Simple OCR Inference Example
47
+
48
+ ```python
49
+ from transformers import AutoModel, AutoTokenizer
50
+
51
+ # Hugging Face model id
52
+ model_id = "NCUTNLP/CrossLing-OCR-Mini"
53
+ # Load tokenizer and model
54
+ tokenizer = AutoTokenizer.from_pretrained(
55
+ model_id,
56
+ trust_remote_code=True
57
+ )
58
+ model = AutoModel.from_pretrained(
59
+ model_id,
60
+ trust_remote_code=True,
61
+ low_cpu_mem_usage=True,
62
+ device_map="cuda",
63
+ use_safetensors=True,
64
+ pad_token_id=tokenizer.eos_token_id
65
+ )
66
+ model = model.eval().cuda()
67
+ # Input image
68
+ image_file = "test.png"
69
+ # Perform plain text OCR
70
+ result = model.chat(
71
+ tokenizer,
72
+ image_file,
73
+ ocr_type="ocr"
74
+ )
75
+ print("Predicted OCR result:\n")
76
+ print(result)
77
+ ```
78
+
79
+ ### Notes
80
+
81
+ * `ocr_type="ocr"` enables plain text OCR mode
82
+ * The model automatically handles multilingual text recognition
83
+ * For best results, input images should be clear and upright
84
+ * Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
85
+
86
+ ---
87
+
88
+ ## 3. Performance Notes & Limitations
89
+
90
+ While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:
91
+
92
+ * OCR accuracy on **Mongolian and Uyghur** still has room for improvement
93
+ * Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs
94
+
95
+ These challenges will be addressed in future versions of the model.
96
+
97
+ ---
98
+
99
+ ## 4. Model Variants
100
+
101
+ | Version | Intended Use | Availability |
102
+ | ----------------------------- | --------------------------- | ------------------- |
103
+ | **CrossLing-OCR-Mini** | Research & academic use | ✅ Open-sourced |
104
+ | **CrossLing-OCR-Pro-Preview** | Commercial / production use | 🔒 Contact required |
105
+
106
+ 📩 For access to **CrossLing-OCR-Pro-Preview**, please contact:
107
+ **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
108
+
109
+ The performance differences between the Mini and Pro-Preview versions are illustrated below.
110
+
111
+ ![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)
112
+
113
+ ---
114
+
115
+ ## 5. Intended Use
116
+
117
+ This model is **strictly intended for**:
118
+
119
+ * Academic research
120
+ * Scientific experimentation
121
+ * OCR benchmarking and method comparison
122
+ * Low-resource language OCR studies
123
+
124
+ ---
125
+
126
+ ## 6. Prohibited Use & Disclaimer
127
+
128
+ This model **must not be used** for:
129
+
130
+ * Any illegal or unlawful activities
131
+ * Applications violating social ethics, public order, or applicable laws
132
+ * Surveillance, discrimination, or harmful automated decision-making
133
+
134
+ **Disclaimer**:
135
+
136
+ * Any misuse of this model is **solely the responsibility of the user**
137
+ * The authors and maintainers **do not endorse** and **are not liable for** any consequences arising from improper or malicious use
138
+ * Outputs generated by this model **do not represent the views or positions of the authors**
139
+
140
+ ---
141
+
142
+ ## 7. Ethical Considerations & Bias
143
+
144
+ CrossLing-OCR-Mini is developed to support research on **low-resource and underrepresented languages**.
145
+ However, like all OCR systems, the model may reflect biases present in its training data, including:
146
+
147
+ * Uneven performance across languages and scripts
148
+ * Sensitivity to document quality, typography, and layout styles
149
+
150
+ Users are encouraged to:
151
+
152
+ * Carefully evaluate outputs before downstream use
153
+ * Avoid deploying the model in high-risk or sensitive decision-making scenarios
154
+
155
+ ---
156
+
157
+ ## 8. License
158
+
159
+ This model is released **for research purposes only**.
160
+ Commercial use is **not permitted** without explicit authorization.
161
+
162
+ For commercial licensing or extended usage, please contact the authors.
163
+
164
+ ---
165
+
166
+ ## 9. Citation
167
+
168
+ If you use CrossLing-OCR-Mini in your research, please cite:
169
+
170
+ ```bibtex
171
+ @misc{crossling-ocr-mini,
172
+ title = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
173
+ author = {CrossLing Team},
174
+ year = {2025},
175
+ note = {Research-only OCR model}
176
+ }
177
+ ```
178
+
179
+ ---
180
+
181
+ ## 10. Contact
182
+
183
+ For questions, collaboration, or commercial inquiries:
184
+
185
+ 📧 **[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)**
186
+
187
+ ---
188
+
189
+ ## 11. Acknowledgement
190
+
191
+ This project aims to advance **low-resource multilingual OCR research** and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
192
+
193
+ ```
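
Note (not part of the diff above): the README lists OCR benchmarking and method comparison as an intended use but does not say which metric to report. One common choice is character error rate (CER). The sketch below is a minimal, dependency-free illustration under that assumption. The function names `edit_distance` and `cer` are hypothetical and do not come from the CrossLing-OCR-Mini code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        # Convention assumed here: an empty reference scores 0.0 only
        # when the hypothesis is empty as well.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # `hyp` could be the string returned by model.chat(...) in the README example.
    print(cer("low-resource OCR", "low-resourse OCR"))
```

This version counts Unicode code points. For scripts that rely on combining marks, such as Tibetan, a grapheme-cluster-level comparison may be more appropriate, and text normalization choices should be reported alongside the score.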