---
library_name: peft
tags:
- unsloth
license: apache-2.0
datasets:
- iapp/thai_handwriting_dataset
- Thinnaphat/TH-HANDWRITTEN-CPE-OPH2025
- openthaigpt/thai-ocr-evaluation
base_model:
- deepseek-ai/DeepSeek-OCR
language:
- th
- en
pipeline_tag: image-text-to-text
---
# DeepSeek-OCR Thai

**DeepSeek-OCR Thai** is a fine-tuned version of [DeepSeek-OCR](https://huggingface.co/unsloth/DeepSeek-OCR) optimized for recognizing Thai handwriting. It builds on the DeepSeek-OCR architecture and was adapted with Low-Rank Adaptation (LoRA) for Thai OCR tasks, focusing in particular on handwritten text, which often challenges general-purpose OCR systems.
- **Base Model:** `unsloth/DeepSeek-OCR`
- **Language(s):** Thai, English
- **Task:** Optical Character Recognition (OCR) for Thai Language
## Inference
Install the required libraries:
```bash
pip install -U addict transformers unsloth unsloth_zoo
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
```
```py
from unsloth import FastVisionModel
import torch
from transformers import AutoModel
from peft import PeftModel
import os
import builtins
# Fix for Unsloth/PEFT issue: "name 'VARIANT_KWARG_KEYS' is not defined"
builtins.VARIANT_KWARG_KEYS = ['alora_offsets']
os.environ["UNSLOTH_WARN_UNINITIALIZED"] = '0'
adapter = "sthaps/DeepSeek-ocr-Thai"

# Download the base model weights locally
from huggingface_hub import snapshot_download
snapshot_download("unsloth/DeepSeek-OCR", local_dir = "deepseek_ocr")
model, tokenizer = FastVisionModel.from_pretrained(
"./deepseek_ocr",
load_in_4bit = False, # Use 4bit to reduce memory use. False for 16bit LoRA.
auto_model = AutoModel,
trust_remote_code = True,
unsloth_force_compile = True,
use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
)
model = PeftModel.from_pretrained(model, adapter)
model = model.eval().to(torch.bfloat16)
prompt = "<image>\nFree OCR"
image = "download.jpeg"
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(
    tokenizer,
    prompt = prompt,
    image_file = image,
    output_path = "output2",
    base_size = 1024,   # Gundam mode
    image_size = 640,
    crop_mode = True,
    save_results = True,
    test_compress = False,
)
print(res)
```
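The resolution presets listed in the comments above can be collected into a small lookup table so the mode is selected by name. This is an illustrative helper, not part of the DeepSeek-OCR API; the values are exactly those from the comments:

```python
# Resolution presets for model.infer(), taken from the comments above.
PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_kwargs(mode: str) -> dict:
    """Return the keyword arguments for model.infer() for a named preset."""
    if mode not in PRESETS:
        raise ValueError(f"unknown mode: {mode!r}")
    return dict(PRESETS[mode])
```

With this helper, the call above becomes `model.infer(tokenizer, prompt=prompt, image_file=image, output_path="output2", save_results=True, test_compress=False, **infer_kwargs("gundam"))`.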
## Training Data
The model was fine-tuned on a comprehensive collection of Thai OCR datasets, including:
- **iapp/thai_handwriting_dataset**: A dataset focused on various styles of Thai handwriting.
- **Thinnaphat/TH-HANDWRITTEN-CPE-OPH2025**: Recent Thai handwritten data.
- **openthaigpt/thai-ocr-evaluation**: Standard Thai OCR evaluation data used to broaden the model's robustness.
- **[OCR image data for Thai documents](https://www.kaggle.com/datasets/appenlimited/ocr-image-data-for-thai-documents)**: Additional samples.
The training data was preprocessed to a conversation format with images saved locally and mapped to instruction-based prompts:
- **Instruction:** `<image>\nFree OCR. `
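The exact preprocessing script is not included here, but a plausible sketch of the conversation format described above looks like the following. The field names follow the chat-style layout commonly used for unsloth vision fine-tuning, and the image path and transcription are placeholders:

```python
# Hypothetical example of one training record in conversation format.
# Field names are assumptions based on common unsloth vision chat templates.
def to_conversation(image_path: str, transcription: str) -> dict:
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image_path},
                    {"type": "text", "text": "<image>\nFree OCR. "},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": transcription}],
            },
        ]
    }
```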
## Training Procedure
The model was trained using the `unsloth` library for memory-efficient and fast training.
- **Method:** PEFT (LoRA)
- **LoRA Config:**
- Rank (R): 16
- Alpha: 16
- Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Hyperparameters:**
- Optimizer: `adamw_8bit`
- Learning Rate: `2e-4`
- Batch Size: `8`
- Gradient Accumulation Steps: `2` (Effective Batch Size: 16)
- Training Epochs: 1
- LR Scheduler: `linear`
- Warmup Steps: 50
- Precision: `bf16` (if supported) or `fp16`
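The settings above can be summarized as plain configuration dicts (illustrative only; the actual training script is not part of this repository). Note how the effective batch size of 16 follows from the per-device batch size and gradient accumulation:

```python
# Summary of the fine-tuning configuration listed above (illustrative).
lora_config = {
    "r": 16,
    "lora_alpha": 16,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

training_args = {
    "optim": "adamw_8bit",
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 1,
    "lr_scheduler_type": "linear",
    "warmup_steps": 50,
}

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_batch_size = (
    training_args["per_device_train_batch_size"]
    * training_args["gradient_accumulation_steps"]
)  # 16
```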