---
library_name: peft
tags:
- unsloth
license: apache-2.0
datasets:
- iapp/thai_handwriting_dataset
- Thinnaphat/TH-HANDWRITTEN-CPE-OPH2025
- openthaigpt/thai-ocr-evaluation
base_model:
- deepseek-ai/DeepSeek-OCR
language:
- th
- en
pipeline_tag: image-text-to-text
---

**DeepSeek-OCR Thai** is a fine-tuned version of [DeepSeek-OCR](https://huggingface.co/unsloth/DeepSeek-OCR) optimized for recognizing Thai handwriting. It builds on the DeepSeek-OCR architecture and was adapted with Low-Rank Adaptation (LoRA) for Thai OCR tasks, with a particular focus on handwritten text, which often challenges general-purpose OCR systems.

- **Base Model:** `unsloth/DeepSeek-OCR`
- **Language(s):** Thai, English
- **Task:** Optical Character Recognition (OCR) for the Thai language

## Inference

Required libraries:

```
pip install -U addict transformers unsloth unsloth_zoo
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
```

```py
import os
import builtins

import torch
from transformers import AutoModel
from huggingface_hub import snapshot_download
from peft import PeftModel
from unsloth import FastVisionModel

# Fix for Unsloth/PEFT issue: "name 'VARIANT_KWARG_KEYS' is not defined"
builtins.VARIANT_KWARG_KEYS = ["alora_offsets"]
os.environ["UNSLOTH_WARN_UNINITIALIZED"] = "0"

adapter = "sthaps/DeepSeek-ocr-Thai"

# Download the base model locally
snapshot_download("unsloth/DeepSeek-OCR", local_dir = "deepseek_ocr")

model, tokenizer = FastVisionModel.from_pretrained(
    "./deepseek_ocr",
    load_in_4bit = False,  # Use 4bit to reduce memory use. False for 16bit LoRA.
    auto_model = AutoModel,
    trust_remote_code = True,
    unsloth_force_compile = True,
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for long context
)

# Load the fine-tuned Thai LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter)
model = model.eval().to(torch.bfloat16)

prompt = "\nFree OCR"
image = "download.jpeg"

# Resolution presets:
# Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
# Small:  base_size = 640,  image_size = 640,  crop_mode = False
# Base:   base_size = 1024, image_size = 1024, crop_mode = False
# Large:  base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640,  crop_mode = True

res = model.infer(
    tokenizer,
    prompt = prompt,
    image_file = image,
    output_path = "output2",
    base_size = 1024,
    image_size = 640,
    crop_mode = True,
    save_results = True,
    test_compress = False,
)
print(res)
```

## Training Data

The model was fine-tuned on a collection of Thai OCR datasets, including:

- **iapp/thai_handwriting_dataset**: A dataset covering various styles of Thai handwriting.
- **Thinnaphat/TH-HANDWRITTEN-CPE-OPH2025**: Recent Thai handwritten data.
- **openthaigpt/thai-ocr-evaluation**: Standard Thai OCR evaluation data used to broaden the model's robustness.
- **[OCR image data for Thai documents](https://www.kaggle.com/datasets/appenlimited/ocr-image-data-for-thai-documents)**: Additional samples.

The training data was preprocessed into a conversation format, with images saved locally and mapped to instruction-based prompts:

- **Instruction:** `\nFree OCR. `

## Training Procedure

The model was trained using the `unsloth` library for memory-efficient and fast training.
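As a first step, each (image, transcription) pair is wrapped into a chat-style record carrying the instruction prompt above. A minimal sketch of this preprocessing step (the helper name and message field names are assumptions, following the common vision-chat schema, not the repository's exact code):

```python
# Hypothetical sketch: map one dataset sample to a conversation record.
# The "messages" schema below is an assumption based on the usual
# vision-chat format; the actual preprocessing code is not published here.

INSTRUCTION = "\nFree OCR. "  # instruction prompt used during fine-tuning

def to_conversation(image_path: str, text: str) -> dict:
    """Wrap one (image, transcription) pair as an instruction-style chat."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image_path},  # locally saved image
                    {"type": "text", "text": INSTRUCTION},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": text}],  # ground-truth transcription
            },
        ]
    }

sample = to_conversation("images/0001.png", "สวัสดีครับ")
```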
- **Method:** PEFT (LoRA)
- **LoRA Config:**
  - Rank (R): 16
  - Alpha: 16
  - Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Hyperparameters:**
  - Optimizer: `adamw_8bit`
  - Learning Rate: `2e-4`
  - Batch Size: `8`
  - Gradient Accumulation Steps: `2` (Effective Batch Size: 16)
  - Training Epochs: 1
  - LR Scheduler: `linear`
  - Warmup Steps: 50
  - Precision: `bf16` (if supported) or `fp16`
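The hyperparameters above could be wired together roughly as follows. This is a non-authoritative reconstruction, not the actual training script: the use of `trl`'s `SFTConfig` and of `FastVisionModel.get_peft_model` is an assumption based on typical `unsloth` vision fine-tuning recipes, and `model` is the base model loaded as in the inference snippet.

```python
# Sketch only: configuration reconstructed from the listed hyperparameters.
import torch
from unsloth import FastVisionModel
from trl import SFTConfig

# Attach LoRA adapters with the reported rank/alpha and target modules
model = FastVisionModel.get_peft_model(
    model,
    r = 16,           # LoRA rank
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

args = SFTConfig(
    per_device_train_batch_size = 8,
    gradient_accumulation_steps = 2,   # effective batch size: 8 * 2 = 16
    num_train_epochs = 1,
    learning_rate = 2e-4,
    optim = "adamw_8bit",
    lr_scheduler_type = "linear",
    warmup_steps = 50,
    bf16 = torch.cuda.is_bf16_supported(),
    fp16 = not torch.cuda.is_bf16_supported(),
    output_dir = "outputs",
)
```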