---
license: apache-2.0
language:
- ru
- en
base_model:
- raxtemur/trocr-base-ru
pipeline_tag: image-to-text
tags:
- htr
- trocr
- image-to-text
- transformers
datasets:
- CherryJam/ru-dialectological-stackmix
- CherryJam/ru-dialectological-fonts-aug
---

# Russian Dialectic HTR using TrOCR

The TrOCR-base-ru model was fine-tuned on synthetic datasets: one generated with the [StackMix](https://huggingface.co/datasets/CherryJam/ru-dialectological-stackmix) method and one based on various [fonts](https://huggingface.co/datasets/CherryJam/ru-dialectological-fonts-aug).
For more information, see the [GitHub repository](https://github.com/DialecticalHTR/RuDialect-HTR).

## Model description

TrOCR-base-ru was fine-tuned for handwritten Russian text recognition on dialectological cards. The model was trained for 20 epochs with a batch size of 16 on two NVIDIA T4 GPUs. Fine-tuning took approximately 28 hours.

## Example Usage

```python
# Load libraries
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import matplotlib.pyplot as plt
from PIL import Image

# Load image
img_path = 'path/to/image'
image = Image.open(img_path).convert("RGB")

# Load model and processor
model_name = "Daniil-Domino/trocr-base-ru-dialectic-stackmix"
processor = TrOCRProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# Preprocess and run inference
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Output result
print(generated_text)

# Display input image
plt.axis("off")
plt.imshow(image)
plt.show()
```

## Metrics

Key evaluation metrics on the validation set:
- **CER**: 14.87%
- **WER**: 45.71%
- **Accuracy**: 55.28%
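
This card does not state how the metrics were computed. As a rough guide to reading them, the sketch below uses the common definitions: CER and WER as Levenshtein edit distance divided by the reference length (in characters and in whitespace-separated words, respectively). Treating accuracy as the share of exactly matching predictions is an assumption, not something this card confirms. The function names and the sample strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, predictions):
    """Character error rate: character edits / total reference characters."""
    errors = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    return errors / sum(len(r) for r in references)


def wer(references, predictions):
    """Word error rate: word edits / total reference words."""
    pairs = [(r.split(), p.split()) for r, p in zip(references, predictions)]
    errors = sum(edit_distance(r, p) for r, p in pairs)
    return errors / sum(len(r) for r, _ in pairs)


def accuracy(references, predictions):
    """Assumed definition: fraction of predictions that match exactly."""
    exact = sum(r == p for r, p in zip(references, predictions))
    return exact / len(references)


# Illustrative data (not taken from the validation set); references must be non-empty
references = ["мама мыла раму", "дом"]
predictions = ["мама мыла рану", "дом"]

print(f"CER: {cer(references, predictions):.2%}")
print(f"WER: {wer(references, predictions):.2%}")
print(f"Accuracy: {accuracy(references, predictions):.2%}")
```

Under these definitions a single wrong character marks the whole word and the whole line as incorrect, which is why WER is typically much higher than CER for the same predictions.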