---
language:
- km
license: apache-2.0
tags:
- ocr
- transformer
- vision
pipeline_tag: image-to-text
---
# ViTOCR
This repository contains a pure Transformer-based checkpoint for Khmer OCR. Images are patch-embedded and encoded by a Transformer encoder, then decoded autoregressively.
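
To make the patch-embedding step concrete, here is a minimal NumPy sketch of how an image can be cut into flattened, non-overlapping patches before it reaches the encoder. The image size (32x128, one channel) and patch size (16) are illustrative assumptions, not values read from this checkpoint's `config.json`, and a real model would follow this step with a learned linear projection and position embeddings.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Hypothetical sizes: a 32x128 grayscale text line cut into 16x16 patches.
image = np.zeros((32, 128, 1), dtype=np.float32)
tokens = patchify(image, patch_size=16)
print(tokens.shape)  # (16, 256): 2 x 8 patches, each 16 * 16 * 1 values
```

Each row of `tokens` corresponds to one patch, ordered left to right and then top to bottom, which is the sequence a Transformer encoder would consume.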
## Installation
```bash
pip install onnxruntime pillow torch torchvision numpy
```
## Get the inference script
Download it from this model repo:
```bash
curl -L -o onnx_inference.py https://huggingface.co/metythorn/ViTOCR-base/resolve/main/onnx_inference.py
```
Or copy `onnx_inference.py` from the repository files into your project directory.
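
If you prefer to fetch files from Python rather than `curl`, the sketch below builds the same `resolve/main` URL and downloads a file with the standard library only. The helper names `resolve_url` and `fetch` are hypothetical, and the idea that `model.onnx` and `config.json` sit at the repository root under those names is an assumption; check the repository file list before relying on it.

```python
from pathlib import Path
from urllib.request import urlretrieve

REPO_ID = "metythorn/ViTOCR-base"  # taken from the curl command above


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch(filename: str, dest_dir: str = ".") -> Path:
    """Download `filename` into `dest_dir` unless it is already there."""
    target = Path(dest_dir) / filename
    if not target.exists():
        urlretrieve(resolve_url(filename), str(target))
    return target


print(resolve_url("onnx_inference.py"))
```

Calling `fetch("onnx_inference.py")` reproduces the `curl` command; under the assumption stated above, `fetch("model.onnx")` and `fetch("config.json")` would retrieve the other two files the usage example expects.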
## Usage
```python
from onnx_inference import ONNXPredictor

# Load the ONNX model and its configuration
predictor = ONNXPredictor(
    model_path="model.onnx",
    config_path="config.json",
    providers=["CPUExecutionProvider"],  # or include CUDAExecutionProvider if available
)

# Run OCR on a single image file
result = predictor.predict("sample_image.png")
print("Predicted text:", result)
```
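
Rather than hard-coding the `providers` list, you can build it from what ONNX Runtime reports at run time. The sketch below is one possible approach; `choose_providers` is a hypothetical helper, not part of `onnx_inference.py`. Note that `CUDAExecutionProvider` is typically only listed when the GPU build (`onnxruntime-gpu`) is installed instead of the CPU-only `onnxruntime` package.

```python
def choose_providers(available):
    """Prefer CUDA when it is reported, always keeping CPU as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed; fall back to the CPU default
    available = []

providers = choose_providers(available)
print(providers)
```

The resulting list can then be passed as `providers=providers` when constructing `ONNXPredictor`.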