---
language:
- km
license: apache-2.0
tags:
- ocr
- transformer
- vision
pipeline_tag: image-to-text
---

# ViTOCR

This repository contains a pure Transformer checkpoint for Khmer OCR. Each input image is split into patches and embedded, the patch sequence is encoded by a Transformer encoder, and the output text is then decoded autoregressively.
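
To make the patch-embedding step concrete, here is a minimal NumPy sketch of how an image can be cut into non-overlapping patches and flattened into a token sequence. The 32x128 image size, the 16x16 patch size, and the `patchify` helper are assumptions chosen for illustration; they are not taken from this checkpoint's configuration, and the learned linear projection that would follow is omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch_h: int, patch_w: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_h * patch_w * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_h or w % patch_w:
        raise ValueError("image size must be divisible by the patch size")
    grid_h, grid_w = h // patch_h, w // patch_w
    # (grid_h, patch_h, grid_w, patch_w, C) -> (grid_h, grid_w, patch_h, patch_w, C)
    patches = image.reshape(grid_h, patch_h, grid_w, patch_w, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(grid_h * grid_w, patch_h * patch_w * c)


# Assumed sizes: a 32x128 RGB text-line image and 16x16 patches.
image = np.arange(32 * 128 * 3, dtype=np.float32).reshape(32, 128, 3)
tokens = patchify(image, 16, 16)
print(tokens.shape)  # (16, 768): 2 x 8 patches, each 16 * 16 * 3 values
```

In a real model each flattened patch would then be projected to the encoder width and combined with positional information before entering the Transformer encoder.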

## Installation
```bash
pip install onnxruntime pillow torch torchvision numpy
```
## Get the inference script
Download from this model repo:
```bash
curl -L -o onnx_inference.py https://huggingface.co/metythorn/ViTOCR-base/resolve/main/onnx_inference.py
```
Or copy `onnx_inference.py` from the repository files into your project directory.

## Usage
```python
from onnx_inference import ONNXPredictor

predictor = ONNXPredictor(
    model_path="model.onnx",
    config_path="config.json",
    providers=["CPUExecutionProvider"],  # or include CUDAExecutionProvider if available
)
result = predictor.predict("sample_image.png")
print("Predicted text:", result)
```
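
If you are unsure whether a GPU provider is usable on your machine, one option is to build the `providers` list from what ONNX Runtime reports. The sketch below uses `onnxruntime.get_available_providers()`; the `choose_providers` helper and its preference order are assumptions for illustration and are not part of `onnx_inference.py`.

```python
def choose_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are available, in preference order.

    Hypothetical helper; falls back to the CPU provider if nothing matches.
    """
    chosen = [name for name in preferred if name in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed; assume CPU only for this sketch.
    available = ["CPUExecutionProvider"]

providers = choose_providers(available)
print("Using providers:", providers)
```

The resulting list can be passed as the `providers` argument in the usage example above.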