|
|
--- |
|
|
library_name: peft |
|
|
tags: |
|
|
- unsloth |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- iapp/thai_handwriting_dataset |
|
|
- Thinnaphat/TH-HANDWRITTEN-CPE-OPH2025 |
|
|
- openthaigpt/thai-ocr-evaluation |
|
|
base_model: |
|
|
- deepseek-ai/DeepSeek-OCR |
|
|
language: |
|
|
- th |
|
|
- en |
|
|
pipeline_tag: image-text-to-text |
|
|
--- |
|
|
|
|
|
**DeepSeek-OCR Thai** is a fine-tuned version of [DeepSeek-OCR](https://huggingface.co/unsloth/DeepSeek-OCR) specifically optimized for recognizing Thai handwriting. This model leverages the DeepSeek-OCR architecture and has been adapted using Low-Rank Adaptation (LoRA) for high-performance Thai OCR tasks, particularly focusing on handwritten text, which often presents challenges for general-purpose OCR systems.
|
|
|
|
|
- **Base Model:** `unsloth/DeepSeek-OCR` |
|
|
- **Language(s):** Thai, English
|
|
- **Task:** Optical Character Recognition (OCR) for Thai Language |
|
|
|
|
|
## Inference |
|
|
Install the required libraries:
|
|
```bash
|
|
pip install -U addict transformers unsloth unsloth_zoo |
|
|
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 |
|
|
``` |
|
|
|
|
|
```py |
|
|
from unsloth import FastVisionModel |
|
|
import torch |
|
|
from transformers import AutoModel |
|
|
import os |
|
|
import builtins |
from peft import PeftModel  # needed for PeftModel.from_pretrained below
|
|
|
|
|
# Fix for Unsloth/PEFT issue: "name 'VARIANT_KWARG_KEYS' is not defined" |
|
|
builtins.VARIANT_KWARG_KEYS = ['alora_offsets'] |
|
|
os.environ["UNSLOTH_WARN_UNINITIALIZED"] = '0' |
|
|
|
|
|
adapter = "sthaps/DeepSeek-ocr-Thai" |
|
|
from huggingface_hub import snapshot_download |
|
|
snapshot_download("unsloth/DeepSeek-OCR", local_dir = "deepseek_ocr") |
|
|
model, tokenizer = FastVisionModel.from_pretrained( |
|
|
"./deepseek_ocr", |
|
|
load_in_4bit = False, # Use 4bit to reduce memory use. False for 16bit LoRA. |
|
|
auto_model = AutoModel, |
|
|
trust_remote_code = True, |
|
|
unsloth_force_compile = True, |
|
|
use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context |
|
|
) |
|
|
model = PeftModel.from_pretrained(model, adapter) |
|
|
model = model.eval().to(torch.bfloat16) |
|
|
|
|
|
prompt = "<image>\nFree OCR" |
|
|
image = "download.jpeg" |
|
|
# Tiny: base_size = 512, image_size = 512, crop_mode = False |
|
|
# Small: base_size = 640, image_size = 640, crop_mode = False |
|
|
# Base: base_size = 1024, image_size = 1024, crop_mode = False |
|
|
# Large: base_size = 1280, image_size = 1280, crop_mode = False |
|
|
|
|
|
# Gundam: base_size = 1024, image_size = 640, crop_mode = True |
|
|
res = model.infer(
    tokenizer,
    prompt = prompt,
    image_file = image,
    output_path = "output2",
    base_size = 1024,
    image_size = 640,
    crop_mode = True,
    save_results = True,
    test_compress = False,
)
|
|
print(res) |
|
|
``` |
|
|
|
|
|
## Training Data |
|
|
The model was fine-tuned on a comprehensive collection of Thai OCR datasets, including: |
|
|
- **iapp/thai_handwriting_dataset**: A dataset focused on various styles of Thai handwriting. |
|
|
- **Thinnaphat/TH-HANDWRITTEN-CPE-OPH2025**: Recent Thai handwritten data. |
|
|
- **openthaigpt/thai-ocr-evaluation**: Standard Thai OCR evaluation data used to broaden the model's robustness. |
|
|
- **[OCR image data for Thai documents](https://www.kaggle.com/datasets/appenlimited/ocr-image-data-for-thai-documents)**: Additional samples. |
|
|
|
|
|
The training data was preprocessed to a conversation format with images saved locally and mapped to instruction-based prompts: |
|
|
- **Instruction:** `<image>\nFree OCR. ` |
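As a sketch of this mapping (the exact field names of the original pipeline are not published; `image_path` and `label_text` here are illustrative), each (image, transcription) pair can be converted to a conversation record like this:

```python
def to_conversation(image_path, label_text):
    """Map one (image, transcription) pair to an instruction-style
    conversation record, as commonly used for vision fine-tuning."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image_path},
                    {"type": "text", "text": "<image>\nFree OCR. "},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": label_text}],
            },
        ]
    }

record = to_conversation("page_001.png", "สวัสดีครับ")
```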
|
|
|
|
|
## Training Procedure |
|
|
The model was trained using the `unsloth` library for memory-efficient and fast training. |
|
|
|
|
|
- **Method:** PEFT (LoRA) |
|
|
- **LoRA Config:** |
|
|
- Rank (R): 16 |
|
|
- Alpha: 16 |
|
|
- Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
|
|
- **Hyperparameters:** |
|
|
- Optimizer: `adamw_8bit` |
|
|
- Learning Rate: `2e-4` |
|
|
- Batch Size: `8` |
|
|
- Gradient Accumulation Steps: `2` (Effective Batch Size: 16) |
|
|
- Training Epochs: 1 |
|
|
- LR Scheduler: `linear` |
|
|
- Warmup Steps: 50 |
|
|
- Precision: `bf16` (if supported) or `fp16` |
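The configuration above translates into an Unsloth/TRL setup roughly along these lines. This is a hedged sketch, not the exact training script: the `train_dataset` variable and the data-collator wiring are assumptions, and `model`/`tokenizer` are assumed to come from `FastVisionModel.from_pretrained` as in the inference snippet.

```python
import torch
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

# Attach LoRA adapters with the listed config (r=16, alpha=16).
model = FastVisionModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer),
    train_dataset = train_dataset,  # conversation-format dataset (assumed name)
    args = SFTConfig(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 2,   # effective batch size: 8 * 2 = 16
        learning_rate = 2e-4,
        num_train_epochs = 1,
        lr_scheduler_type = "linear",
        warmup_steps = 50,
        optim = "adamw_8bit",
        bf16 = torch.cuda.is_bf16_supported(),
        fp16 = not torch.cuda.is_bf16_supported(),
        remove_unused_columns = False,     # keep image columns for the collator
    ),
)
trainer.train()
```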