---
license: apache-2.0
language:
- ru
- en
base_model:
- raxtemur/trocr-base-ru
pipeline_tag: image-to-text
tags:
- htr
- trocr
- image-to-text
- transformers
datasets:
- CherryJam/ru-dialectological-stackmix
- CherryJam/ru-dialectological-fonts-aug
---

# Russian Dialectic HTR using TrOCR

The TrOCR-base-ru model was fine-tuned on two synthetic datasets: one generated with the [StackMix](https://huggingface.co/datasets/CherryJam/ru-dialectological-stackmix) method and one rendered from a variety of handwriting [fonts](https://huggingface.co/datasets/CherryJam/ru-dialectological-fonts-aug).
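StackMix-style generation builds synthetic text lines by stacking crops of real handwriting side by side. The following is only an illustrative sketch of that core idea using toy nested-list "images" (the segment data, padding value, and `hstack` helper are assumptions for illustration, not the actual StackMix implementation, which operates on real handwriting crops with alignment and blending):

```python
# Illustrative sketch of the StackMix idea: form a synthetic text-line
# "image" by horizontally concatenating per-word crops. Images are toy
# nested lists of grayscale pixel rows; this is NOT the real StackMix code.

def hstack(segments):
    """Concatenate segment images left-to-right, padding to a common height."""
    height = max(len(seg) for seg in segments)
    line = [[] for _ in range(height)]
    for seg in segments:
        width = len(seg[0])
        for row in range(height):
            if row < len(seg):
                line[row].extend(seg[row])
            else:
                line[row].extend([255] * width)  # pad shorter crops with white

    return line

# Two toy "word crops" of different heights (2x3 and 3x2 pixels).
word_a = [[0, 10, 20],
          [30, 40, 50]]
word_b = [[60, 70],
          [80, 90],
          [100, 110]]

line = hstack([word_a, word_b])
print(len(line), len(line[0]))  # 3 rows, 5 columns
```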

For more information, see the [GitHub repository](https://github.com/DialecticalHTR/RuDialect-HTR).

## Model description

TrOCR-base-ru was fine-tuned for handwritten Russian text recognition in dialectological cards. The model was trained for 20 epochs with a batch size of 16 on two NVIDIA T4 GPUs; fine-tuning took approximately 28 hours.

## Example Usage

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import matplotlib.pyplot as plt
from PIL import Image

# Load image
img_path = 'path/to/image'
image = Image.open(img_path).convert("RGB")

# Load model and processor
model_name = "Daniil-Domino/trocr-base-ru-dialectic-stackmix"
processor = TrOCRProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# Preprocess and run inference
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Output result
print(generated_text)

# Display input image
plt.axis("off")
plt.imshow(image)
plt.show()
```

## Metrics

Below are the key evaluation metrics on the validation set:

- **CER**: 14.87%
- **WER**: 45.71%
- **Accuracy**: 55.28%
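
CER is the Levenshtein (edit) distance between the predicted and reference strings, normalized by the reference length; WER is the same quantity computed over words. A minimal pure-Python sketch of these definitions (for illustration only; this is not the evaluation code that produced the numbers above):

```python
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: edit distance over word sequences."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

print(cer("привет мир", "привет мыр"))  # 0.1 (1 substitution / 10 characters)
print(wer("привет мир", "привет мыр"))  # 0.5 (1 wrong word / 2 words)
```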