	---
	license: apache-2.0
	---

	# CrossLing-OCR-Mini

	🚀 CrossLing-OCR-Mini is a lightweight multilingual OCR model with a focus on low-resource languages.

	---

	## 1. Model Overview

	Despite its compact size (~580MB), the model demonstrates strong recognition performance across 11 languages, while remaining deployable on consumer-grade GPUs.

	### Key Features
	- Multilingual OCR with structure-aware text recognition
	- Specialized optimization for low-resource and complex scripts
	- Lightweight (~580MB) and efficient inference

	### Supported Languages
	- High-resource languages: Chinese, English
	- Low-resource languages (specially optimized): Tibetan, Mongolian, Kazakh, Kyrgyz, Zhuang, etc.

	Experimental results indicate that CrossLing-OCR-Mini outperforms or matches mainstream OCR systems on multiple low-resource languages.

	---

	## 2. Usage / Inference

	CrossLing-OCR-Mini can be used directly with the 🤗 Transformers library.
	The following example runs plain-text OCR on a single image.

	### Requirements
	- Python ≥ 3.8
	- `transformers` and `accelerate` (latest versions recommended)
	- CUDA-enabled GPU (recommended for optimal performance)

	```bash
	pip install -U transformers accelerate
	```
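Before loading the model, it can help to confirm that the requirements above are met. The following is a minimal sketch, not part of the official release: the helper name `check_environment` is hypothetical, and the package list (`torch`, `transformers`, `accelerate`) is an assumption based on the install command and the CUDA recommendation.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8),
                      packages=("torch", "transformers", "accelerate")):
    """Report whether the interpreter and the listed packages are available.

    Hypothetical helper: the default package list is an assumption,
    not an official requirement list.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    report["cuda"] = False
    if report.get("torch"):
        # Imported lazily so the check also runs when PyTorch is absent.
        import torch

        report["cuda"] = torch.cuda.is_available()
    return report
```

Calling `check_environment()` returns a dictionary of booleans. A `False` entry for a package suggests re-running the install command above, and `cuda: False` means inference would fall back to a setup this README does not cover.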

	### Simple OCR Inference Example

	```python
	from transformers import AutoModel, AutoTokenizer

	# Hugging Face model id
	model_id = "NCUTNLP/CrossLing-OCR-Mini"

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained(
	    model_id,
	    trust_remote_code=True,
	)
	model = AutoModel.from_pretrained(
	    model_id,
	    trust_remote_code=True,
	    low_cpu_mem_usage=True,
	    device_map="cuda",
	    use_safetensors=True,
	    pad_token_id=tokenizer.eos_token_id,
	)
	model = model.eval().cuda()

	# Input image
	image_file = "test.png"

	# Perform plain text OCR
	result = model.chat(
	    tokenizer,
	    image_file,
	    ocr_type="ocr",
	)
	print("Predicted OCR result:\n")
	print(result)
	```

	### Notes

	* `ocr_type="ocr"` enables plain text OCR mode
	* The model automatically handles multilingual text recognition
	* For best results, input images should be clear and upright
	* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
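To process a whole folder instead of a single image, the same `model.chat` call can be wrapped in a loop. The sketch below is illustrative only: `collect_images` and `ocr_folder` are hypothetical helper names, the extension set is an assumption, and the code assumes `model.chat(tokenizer, path, ocr_type="ocr")` behaves as in the single-image example.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def collect_images(folder):
    """Return image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(model, tokenizer, folder, ocr_type="ocr"):
    """Run OCR on every image in `folder`.

    Returns a dict mapping file name to the recognized text, or to None
    when recognition of that file raised an exception.
    """
    results = {}
    for path in collect_images(folder):
        try:
            # Same call as the single-image example; the path is passed as a string.
            results[path.name] = model.chat(tokenizer, str(path), ocr_type=ocr_type)
        except Exception as exc:  # keep going if one image fails
            print(f"Failed on {path.name}: {exc}")
            results[path.name] = None
    return results
```

With the `model` and `tokenizer` loaded as above, `ocr_folder(model, tokenizer, "scans/")` would return one entry per image, where `scans/` stands in for your own directory.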

	---

	## 3. Performance Notes & Limitations

	While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

	* OCR accuracy on Mongolian and Uyghur still has room for improvement
	* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

	These challenges will be addressed in future versions of the model.

	---

	## 4. Model Variants

	| Version | Intended Use | Availability |
	| ------------------------- | ----------------------------------- | ------------------- |
	| CrossLing-OCR-Mini | Research and academic purposes only | ✅ Open-sourced |
	| CrossLing-OCR-Pro-Preview | Commercial / production purposes | 🔒 Contact required |



	The performance differences between the Mini and Pro-Preview versions are illustrated below.

	![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)

	---


	## 5. Prohibited Use & Disclaimer

	This model must not be used for:

	* Any illegal or unlawful activities
	* Applications that violate applicable laws or regulations
	* Surveillance or profiling that infringes on individual rights
	* Discriminatory or harmful automated decision-making in sensitive contexts

	Disclaimer:

	* Any misuse of this model is solely the responsibility of the user
	* The authors and maintainers do not endorse and are not liable for any consequences arising from improper or malicious use
	* Outputs generated by this model do not represent the views or positions of the authors

	---

	## 6. Ethical Considerations & Bias

	CrossLing-OCR-Mini is developed to support research on low-resource and underrepresented languages.
	However, like all OCR systems, the model may reflect biases present in its training data, including:

	* Uneven performance across languages and scripts
	* Sensitivity to document quality, typography, and layout variations
	* Reduced robustness on degraded, historical, or low-resolution documents

	Users are encouraged to:

	* Carefully evaluate outputs before downstream use
	* Avoid deploying the model in high-risk or sensitive decision-making scenarios
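One common way to evaluate OCR output before downstream use is the character error rate (CER): the edit distance between a reference transcription and the model output, divided by the reference length. The sketch below is a generic implementation, not the evaluation protocol used by the authors, and the function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_item != hyp_item),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that this counts Unicode code points, so for scripts with stacked or combining characters (for example Tibetan) one visible glyph may count as several characters. Normalizing both strings the same way before comparison keeps scores comparable across languages.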

	---

	## 7. License

	This model is released for research purposes only.
	Commercial use is not permitted without explicit authorization.

	For commercial licensing or extended usage, please contact the authors.

	---


	## 8. Contact

	For questions, collaboration, or commercial inquiries:

	📧 [zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)


