ViTOCR-base / README.md
metadata
language:
  - km
license: apache-2.0
tags:
  - ocr
  - transformer
  - vision
pipeline_tag: image-to-text

ViTOCR

This repository contains a checkpoint for Khmer OCR built on a pure Transformer architecture. The input image is split into patches and embedded, the patch sequence is encoded by a Transformer encoder, and the text is then decoded autoregressively, one token at a time.
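To make "decoded autoregressively" concrete, here is a minimal sketch of a greedy decoding loop. It is a generic illustration, not this checkpoint's actual decoder: `step_fn`, `toy_step`, the token ids, and the choice of greedy search over beam search are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 32,
) -> List[int]:
    """Generate token ids one at a time until EOS or max_len is reached.

    step_fn is a hypothetical stand-in for the decoder: given the tokens
    produced so far, it returns one score per vocabulary entry.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        # Greedy choice: pick the highest-scoring next token.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the BOS marker


def toy_step(tokens: Sequence[int]) -> List[float]:
    """Toy decoder (assumption): emits ids 2, 3, 4 and then EOS (id 1)."""
    script = [2, 3, 4, 1]
    target = script[min(len(tokens) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode(toy_step, bos_id=0, eos_id=1))  # prints [2, 3, 4]
```

In a real model, each call to `step_fn` would run the decoder over the encoder output and the tokens generated so far, and the resulting ids would be mapped back to Khmer characters.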

Installation

pip install onnxruntime pillow torch torchvision numpy

Get the inference script

Download from this model repo:

curl -L -o onnx_inference.py https://huggingface.co/metythorn/ViTOCR-base/resolve/main/onnx_inference.py

Or copy onnx_inference.py from the repository files into your project directory.
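If curl is not available, the same download can be sketched in Python with only the standard library. The helper names `resolve_url` and `fetch` are hypothetical. The URL pattern is taken from the curl command above.

```python
import urllib.request
from pathlib import Path

BASE = "https://huggingface.co"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hub repository."""
    return f"{BASE}/{repo_id}/resolve/{revision}/{filename}"


def fetch(repo_id: str, filename: str, dest_dir: str = ".") -> Path:
    """Download filename into dest_dir unless it is already there."""
    dest = Path(dest_dir) / filename
    if not dest.exists():
        urllib.request.urlretrieve(resolve_url(repo_id, filename), dest)
    return dest
```

Calling `fetch("metythorn/ViTOCR-base", "onnx_inference.py")` should save the script to the current directory. The same call may work for `model.onnx` and `config.json`, assuming those files are published in the repository under those names.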

Usage

from onnx_inference import ONNXPredictor

predictor = ONNXPredictor(
    model_path="model.onnx",
    config_path="config.json",
    providers=["CPUExecutionProvider"],  # or include CUDAExecutionProvider if available
)
result = predictor.predict("sample_image.png")
print("Predicted text:", result)
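Rather than hard-coding the provider list, you can ask ONNX Runtime which providers are installed and fall back to CPU. The sketch below assumes nothing about `ONNXPredictor` beyond the `providers` argument shown above. `choose_providers` is a hypothetical helper, and the CUDA provider is normally only reported when the `onnxruntime-gpu` package is installed instead of `onnxruntime`.

```python
def choose_providers(
    available,
    preferred=("CUDAExecutionProvider", "CPUExecutionProvider"),
):
    """Keep the preferred providers that are installed, in priority order."""
    chosen = [p for p in preferred if p in available]
    # Always return something usable, even if the lookup came back empty.
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed; assume CPU only for this sketch.
    available = ["CPUExecutionProvider"]

providers = choose_providers(available)
print("Using providers:", providers)
```

The resulting list can be passed as `providers=providers` when constructing `ONNXPredictor`.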