Instructions to use gg4ever/trOCR-final with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use gg4ever/trOCR-final with Transformers:
# Use a pipeline as a high-level helper # Warning: Pipeline type "image-to-text" is no longer supported in transformers v5. # You must load the model directly (see below) or downgrade to v4.x with: # 'pip install "transformers<5.0.0' from transformers import pipeline pipe = pipeline("image-to-text", model="gg4ever/trOCR-final")# Load model directly from transformers import AutoTokenizer, AutoModelForImageTextToText tokenizer = AutoTokenizer.from_pretrained("gg4ever/trOCR-final") model = AutoModelForImageTextToText.from_pretrained("gg4ever/trOCR-final") - Notebooks
- Google Colab
- Kaggle
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText
tokenizer = AutoTokenizer.from_pretrained("gg4ever/trOCR-final")
model = AutoModelForImageTextToText.from_pretrained("gg4ever/trOCR-final")Quick Links
trOCR-final
fine-tuned for VisionEncoderDecoderModel(encoder , decoder) encoder = 'facebook/deit-base-distilled-patch16-384' decoder = 'klue/roberta-base'
How to Get Started with the Model
from transformers import VisionEncoderDecoderModel,AutoTokenizer, TrOCRProcessor
import torch
from PIL import Image
device = torch.device('cuda') # change 'cuda' if you need.
image_path='(your image path)'
image = Image.open(image_path)
#model can be .jpg or .png
#hugging face download: https://huggingface.co/gg4ever/trOCR-final
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
trocr_model = "gg4ever/trOCR-final"
model = VisionEncoderDecoderModel.from_pretrained(trocr_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(trocr_model)
pixel_values = (processor(image, return_tensors="pt").pixel_values).to(device)
generated_ids = model.generate(pixel_values)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
Training Details
Training Data
1M words generated by TextRecognitionDataGenerator(trdg) : https://github.com/Belval/TextRecognitionDataGenerator/blob/master/trdg/run.py
1.1M words from AI-hub OCR words dataset : https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=81
Training Hyperparameters
| hyperparameters | values |
|---|---|
| predict_with_generate | True |
| evaluation_strategy | "steps" |
| per_device_train_batch_size | 32 |
| per_device_eval_batch_size | 32 |
| num_train_epochs | 2 |
| fp16 | True |
| learning_rate | 4e-5 |
| eval_stept | 10000 |
| warmup_steps | 20000 |
| weight_decay | 0.01 |
- Downloads last month
- 12
# Use a pipeline as a high-level helper # Warning: Pipeline type "image-to-text" is no longer supported in transformers v5. # You must load the model directly (see below) or downgrade to v4.x with: # 'pip install "transformers<5.0.0' from transformers import pipeline pipe = pipeline("image-to-text", model="gg4ever/trOCR-final")