	---
	license: apache-2.0
	---

	# CrossLing-OCR-Mini

	🚀 CrossLing-OCR-Mini is a lightweight OCR model designed for low-resource languages in multilingual settings and for complex document layouts.
	The model emphasizes accurate text recognition while preserving original document structure, making it particularly suitable for multilingual OCR research and academic benchmarking.

	---

	## 1. Model Overview

	CrossLing-OCR-Mini targets OCR scenarios involving low-resource scripts, diverse writing directions, and complex layouts.
	Despite its compact size (~580MB), the model demonstrates strong recognition performance across 11 languages, while remaining deployable on consumer-grade GPUs.

	### Key Features
	- Multilingual OCR with structure-aware text recognition
	- Specialized optimization for low-resource and complex scripts
	- Lightweight (~580MB) and efficient inference
	- Designed exclusively for research and academic benchmarking

	### Supported Languages
	- High-resource languages: Chinese, English
	- Low-resource languages (specially optimized), including: Tibetan, Mongolian, Uyghur, Kazakh, Kyrgyz, Zhuang

	Experimental results indicate that CrossLing-OCR-Mini outperforms or matches mainstream OCR systems on multiple low-resource languages.

	---

	## 2. Usage / Inference

	CrossLing-OCR-Mini can be loaded with the 🤗 Transformers library. Because it ships custom model code, `trust_remote_code=True` is required.
	The following example demonstrates single-image OCR inference for plain text recognition.

	### Requirements
	- Python ≥ 3.8
	- `transformers` (latest version recommended)
	- CUDA-enabled GPU (required by the example below, which loads the model with `device_map="cuda"`)

	```bash
	pip install -U torch transformers accelerate
	```
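Before running the example, a quick sanity check of the requirements can save a failed model download. The sketch below is not part of the model's code: the helper name `check_environment` is hypothetical, it uses only the Python standard library to look for installed packages, and it queries `torch.cuda.is_available()` only if PyTorch is present. Treating `torch` as a requirement is an assumption based on Transformers needing PyTorch to load this model.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Report whether the basic requirements listed above appear to be met."""
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,
    }
    # find_spec only locates a package; it does not import it.
    for pkg in ("torch", "transformers", "accelerate"):
        report[f"{pkg}_installed"] = importlib.util.find_spec(pkg) is not None

    # CUDA can only be queried when PyTorch is installed.
    report["cuda_available"] = False
    if report["torch_installed"]:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False`, the example below will fail at `device_map="cuda"` rather than fall back to the CPU.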

	### Simple OCR Inference Example

	```python
	from transformers import AutoModel, AutoTokenizer

	# Hugging Face model id
	model_id = "NCUTNLP/CrossLing-OCR-Mini"

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained(
	    model_id,
	    trust_remote_code=True,
	)
	model = AutoModel.from_pretrained(
	    model_id,
	    trust_remote_code=True,
	    low_cpu_mem_usage=True,
	    device_map="cuda",
	    use_safetensors=True,
	    pad_token_id=tokenizer.eos_token_id,
	)
	model = model.eval().cuda()

	# Input image
	image_file = "test.png"

	# Perform plain text OCR
	result = model.chat(
	    tokenizer,
	    image_file,
	    ocr_type="ocr",
	)
	print("Predicted OCR result:\n")
	print(result)
	```
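To process a folder of images rather than a single file, the call can be wrapped in a loop. The sketch below is an illustration, not an official batch API: `ocr_directory` and the extension list are hypothetical, and it assumes `model.chat` accepts a file path exactly as in the single-image example. Passing the OCR call in as a function keeps the loop independent of the model, so it can be tried with a stub first.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical set of image extensions to pick up.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def ocr_directory(ocr_fn: Callable[[str], str], folder: str) -> Dict[str, str]:
    """Run `ocr_fn` on every image in `folder`, in name order, and collect results."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            results[path.name] = ocr_fn(str(path))
    return results


# Usage with the model loaded above (assumes `model.chat` takes a file path):
# outputs = ocr_directory(
#     lambda p: model.chat(tokenizer, p, ocr_type="ocr"),
#     "images/",
# )
# for name, text in outputs.items():
#     print(f"== {name} ==\n{text}")
```

Images are processed one at a time, which keeps GPU memory use the same as in the single-image example.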

	### Notes

	* `ocr_type="ocr"` enables plain text OCR mode
	* No language setting is needed; the model recognizes multilingual text automatically
	* For best results, input images should be clear and upright
	* Consumer-grade GPUs (e.g., RTX 3060 / 3090) are sufficient for inference
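The notes recommend clear, upright inputs. One way to approximate this is a small preprocessing step with Pillow (an extra dependency assumed here, not listed in the requirements) that applies the EXIF orientation tag, converts to RGB, and caps the longest side. The `prepare_image` helper and its 2048-pixel cap are illustrative choices, not settings documented for CrossLing-OCR-Mini. Because the example above passes a file path to `model.chat`, the helper writes the cleaned image to a new path.

```python
from pathlib import Path

from PIL import Image, ImageOps


def prepare_image(src: str, dst: str, max_side: int = 2048) -> str:
    """Upright, RGB-convert and (if needed) downscale an image before OCR.

    `max_side` is an illustrative default, not a documented model limit.
    """
    with Image.open(src) as img:
        # Apply the EXIF orientation tag so camera photos are stored upright.
        img = ImageOps.exif_transpose(img)
        # Flatten palette, alpha and grayscale modes to plain RGB.
        img = img.convert("RGB")
        # Shrink in place (never enlarges), preserving the aspect ratio.
        img.thumbnail((max_side, max_side))
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        img.save(dst)
    return dst


# image_file = prepare_image("raw/photo.jpg", "clean/photo.png")
```

EXIF correction only helps when the camera recorded an orientation tag; it does not deskew scans that were fed in rotated.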

	---

	## 3. Performance Notes & Limitations

	While CrossLing-OCR-Mini achieves strong overall performance, several limitations remain:

	* OCR accuracy on Mongolian and Uyghur still has room for improvement
	* Performance may degrade on extremely noisy, handwritten, or out-of-distribution inputs

	These challenges will be addressed in future versions of the model.

	---

	## 4. Model Variants

	| Version | Intended Use | Availability |
	| --- | --- | --- |
	| CrossLing-OCR-Mini | Research & academic use | ✅ Open-sourced |
	| CrossLing-OCR-Pro-Preview | Commercial / production use | 🔒 Contact required |

	📩 For access to CrossLing-OCR-Pro-Preview, please contact:
	[zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)

	The performance differences between the Mini and Pro-Preview versions are illustrated below.

	![Mini\_Pro-Preview](https://cdn-uploads.huggingface.co/production/uploads/6956446a7ebeda1aa80be895/EcKEhwz-6VzPCmHqszIJy.png)

	---

	## 5. Intended Use

	This model is strictly intended for:

	* Academic research
	* Scientific experimentation
	* OCR benchmarking and method comparison
	* Low-resource language OCR studies
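For OCR benchmarking and method comparison, character error rate (CER) is a widely used metric: the character-level edit distance between the model output and a ground-truth transcription, divided by the number of reference characters. The sketch below is a plain-Python illustration, not the evaluation code behind the results reported for this model. Counting Unicode code points after NFC normalization (rather than grapheme clusters) is an assumption, and it matters for scripts with combining marks such as Tibetan.

```python
import unicodedata
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total edits divided by total reference characters over (reference, hypothesis) pairs."""
    total_edits = 0
    total_chars = 0
    for reference, hypothesis in pairs:
        # Normalize so that composed and decomposed forms compare equal.
        reference = unicodedata.normalize("NFC", reference)
        hypothesis = unicodedata.normalize("NFC", hypothesis)
        total_edits += levenshtein(reference, hypothesis)
        total_chars += len(reference)
    if total_chars == 0:
        raise ValueError("CER is undefined for empty references")
    return total_edits / total_chars


def cer(reference: str, hypothesis: str) -> float:
    """CER of a single prediction."""
    return corpus_cer([(reference, hypothesis)])
```

Corpus-level CER sums edits before dividing, so long pages weigh more than short lines; averaging per-sample `cer` values instead gives every sample equal weight. Reports should state which convention they use.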

	---

	## 6. Prohibited Use & Disclaimer

	This model must not be used for:

	* Any illegal or unlawful activities
	* Applications violating social ethics, public order, or applicable laws
	* Surveillance, discrimination, or harmful automated decision-making

	Disclaimer:

	* Any misuse of this model is solely the responsibility of the user
	* The authors and maintainers do not endorse and are not liable for any consequences arising from improper or malicious use
	* Outputs generated by this model do not represent the views or positions of the authors

	---

	## 7. Ethical Considerations & Bias

	CrossLing-OCR-Mini is developed to support research on low-resource and underrepresented languages.
	However, like all OCR systems, the model may reflect biases present in its training data, including:

	* Uneven performance across languages and scripts
	* Sensitivity to document quality, typography, and layout styles

	Users are encouraged to:

	* Carefully evaluate outputs before downstream use
	* Avoid deploying the model in high-risk or sensitive decision-making scenarios
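Because performance can vary across scripts, one inexpensive check before downstream use is to confirm that each output is mostly written in the script you expect. A mismatch (for example, Latin letters where Tibetan was expected) often signals a failed recognition. The sketch below uses the first word of each letter's Unicode character name (from Python's `unicodedata`) as a rough script label. This heuristic and the helper names are assumptions for illustration, not part of CrossLing-OCR-Mini. Expected labels must be supplied per sample, since some languages (e.g. Kazakh) are written in more than one script.

```python
import unicodedata
from collections import Counter
from typing import Dict, List


def dominant_script(text: str) -> str:
    """Most frequent script label among letters, e.g. 'LATIN', 'CJK', 'TIBETAN'."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation and combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def flag_script_mismatches(outputs: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Ids of samples whose dominant script differs from the expected label."""
    return [
        sample_id
        for sample_id, text in outputs.items()
        if sample_id in expected and dominant_script(text) != expected[sample_id]
    ]


# flag_script_mismatches({"p1": "中文", "p2": "hello"}, {"p1": "CJK", "p2": "TIBETAN"})
```

Flagged samples are candidates for manual review, not confirmed errors: mixed-script documents can legitimately have a different dominant script.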

	---

	## 8. License

	This model is released for research purposes only.
	Commercial use is not permitted without explicit authorization.

	For commercial licensing or extended usage, please contact the authors.

	---

	## 9. Citation

	If you use CrossLing-OCR-Mini in your research, please cite:

	```bibtex
	@misc{crossling-ocr-mini,
	  title  = {CrossLing-OCR: Advancing Low-Resource Multilingual Text Recognition through Multi-Stage Vision-Language Training},
	  author = {{CrossLing Team}},
	  year   = {2025},
	  note   = {Research-only OCR model}
	}
	```

	---

	## 10. Contact

	For questions, collaboration, or commercial inquiries:

	📧 [zhumx@ncut.edu.cn](mailto:zhumx@ncut.edu.cn)

	---

	## 11. Acknowledgement

	This project aims to advance low-resource multilingual OCR research and contribute to the accessibility of underrepresented languages in the global AI ecosystem.
