Image-Text-to-Text
PEFT
Safetensors
Thai
English
unsloth

**DeepSeek-OCR Thai ** is a fine-tuned version of DeepSeek-OCR specifically optimized for recognizing Thai handwriting. This model leverages the power of the DeepSeek-OCR architecture and has been adapted using Low-Rank Adaptation (LoRA) for high-performance Thai OCR tasks, particularly focusing on handwritten text which often presents challenges for general-purpose OCR systems.

  • Base Model: unsloth/DeepSeek-OCR
  • Language(s): Thai, Eng
  • Task: Optical Character Recognition (OCR) for Thai Language

Inference

requirement libraries.

pip install -U addict transformers unsloth unsloth_zoo
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
from unsloth import FastVisionModel
import torch
from transformers import AutoModel
import os
import builtins

# Fix for Unsloth/PEFT issue: "name 'VARIANT_KWARG_KEYS' is not defined"
builtins.VARIANT_KWARG_KEYS = ['alora_offsets']
os.environ["UNSLOTH_WARN_UNINITIALIZED"] = '0'

adapter = "sthaps/DeepSeek-ocr-Thai"
from huggingface_hub import snapshot_download
snapshot_download("unsloth/DeepSeek-OCR", local_dir = "deepseek_ocr")
model, tokenizer = FastVisionModel.from_pretrained(
    "./deepseek_ocr",
    load_in_4bit = False, # Use 4bit to reduce memory use. False for 16bit LoRA.
    auto_model = AutoModel,
    trust_remote_code = True,
    unsloth_force_compile = True,
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
)
model = PeftModel.from_pretrained(model, adapter)
model = model.eval().to(torch.bfloat16)

prompt = "<image>\nFree OCR"
image = "download.jpeg"
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False

# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image, output_path = "output2", base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = False)
print(res)

Training Data

The model was fine-tuned on a comprehensive collection of Thai OCR datasets, including:

  • iapp/thai_handwriting_dataset: A dataset focused on various styles of Thai handwriting.
  • Thinnaphat/TH-HANDWRITTEN-CPE-OPH2025: Recent Thai handwritten data.
  • openthaigpt/thai-ocr-evaluation: Standard Thai OCR evaluation data used to broaden the model's robustness.
  • OCR image data for Thai documents: Additional samples.

The training data was preprocessed to a conversation format with images saved locally and mapped to instruction-based prompts:

  • Instruction: <image>\nFree OCR.

Training Procedure

The model was trained using the unsloth library for memory-efficient and fast training.

  • Method: PEFT (LoRA)
  • LoRA Config:
    • Rank (R): 16
    • Alpha: 16
    • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Hyperparameters:
    • Optimizer: adamw_8bit
    • Learning Rate: 2e-4
    • Batch Size: 8
    • Gradient Accumulation Steps: 2 (Effective Batch Size: 16)
    • Training Epochs: 1
    • LR Scheduler: linear
    • Warmup Steps: 50
    • Precision: bf16 (if supported) or fp16
Downloads last month
48
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sthaps/DeepSeek-ocr-Thai

Adapter
(24)
this model

Datasets used to train sthaps/DeepSeek-ocr-Thai

Collection including sthaps/DeepSeek-ocr-Thai