Model description
Model Name: multicentury-htr-model
Model Version: 202509_tf32
Model Type: Transformer-based OCR (TrOCR)
Base Model: microsoft/trocr-large-handwritten
Purpose: Handwritten text recognition
Languages: Swedish, Finnish
License: Apache 2.0
This model is a fine-tuned version of the microsoft/trocr-large-handwritten model, specialized for recognizing handwritten text. It has been trained on various datasets spanning the 16th to the 20th centuries and can be used for applications such as document digitization, form recognition, or any task involving handwritten text extraction.
Model Architecture
The model is based on a Transformer architecture (TrOCR) with an encoder-decoder setup:
- The encoder processes images of handwritten text.
- The decoder generates corresponding text output.
Intended Use
This model is designed for handwritten text recognition and is intended for use in:
- Document digitization (e.g., archival work, historical manuscripts)
- Handwritten notes transcription
Training data
The training dataset includes more than 913 000 samples of handwritten and typewritten text rows, covering a wide variety of handwriting styles and text samples.
Evaluation
The model was evaluated on a test dataset. The key metrics are:
Character Error Rate (CER): 2.8
Test Dataset Description: size ~111 800 text rows
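For reference, CER is conventionally defined as the Levenshtein (edit) distance between the predicted and the reference text, divided by the number of reference characters. The sketch below illustrates that definition only; it is not the evaluation script used for this model. The function names `levenshtein` and `cer` are hypothetical, and the corpus-level convention (total edits over total reference characters) is an assumption, since this card does not state which convention was used.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer(predictions: list[str], references: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters (assumed convention)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

Multiplying the returned fraction by 100 gives the rate as a percentage.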
Used Hyperparameters
Evaluation strategy: epoch
Train batch size per device: 16
Learning rate: 2.2e-5
Scheduler: polynomial
Optimizer: AdamW
Number of epochs: 14
FP16 mixed precision training: True
Half precision backend: cuda_amp
Input image size: 192 x 1024
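The input size matters because a ViT encoder splits the image into fixed-size patches, and the number of patches determines how many position embeddings are needed. The short calculation below compares the patch grid for the 192 x 1024 input with that of the base model. It assumes a patch size of 16 and a base resolution of 384 x 384, which are the values of the TrOCR base checkpoints as far as we know and are not stated in this card; `patch_grid` is a hypothetical helper name.

```python
def patch_grid(height: int, width: int, patch_size: int = 16) -> tuple[int, int]:
    """Rows and columns of non-overlapping patches for a given image size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch_size, width // patch_size


# Fine-tuned input size from the hyperparameter list (height x width)
rows, cols = patch_grid(192, 1024)
# Assumed pretraining resolution of the base model
base_rows, base_cols = patch_grid(384, 384)

print(rows, cols, rows * cols)                 # 12 64 768
print(base_rows, base_cols, base_rows * base_cols)  # 24 24 576
```

Under these assumptions the grid changes from 24 x 24 to 12 x 64, so the pretrained position embeddings have to be interpolated to the new shape at load time.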
How to Use the Model
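You can use the model with Hugging Face’s pipeline function or by loading the processor and model manually. The example below loads them manually and patches the ViT embedding layers so that the model accepts the 192 x 1024 input size.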
from transformers.models.vit.modeling_vit import ViTPatchEmbeddings, ViTEmbeddings
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

def load_custom_trocr_model():
    """Load a TrOCR model with custom image size support"""
    original_embeddings_forward = ViTEmbeddings.forward

    # Always apply patches for models saved with custom image sizes
    def universal_patch_forward(self, *args, **kwargs):
        pixel_values = args[0] if args else kwargs['pixel_values']
        embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
        return embeddings

    def universal_embeddings_forward(self, *args, **kwargs):
        kwargs['interpolate_pos_encoding'] = True
        return original_embeddings_forward(self, *args, **kwargs)

    # Apply patches
    ViTPatchEmbeddings.forward = universal_patch_forward
    ViTEmbeddings.forward = universal_embeddings_forward

    # Load model and processor
    processor = TrOCRProcessor.from_pretrained(
        "Kansallisarkisto/multicentury-htr-model",
        use_fast=True,
        do_resize=True,
        size={'height': 192, 'width': 1024},
    )
    model = VisionEncoderDecoderModel.from_pretrained("Kansallisarkisto/multicentury-htr-model")
    return processor, model

# Load model and processor
processor, model = load_custom_trocr_model()

# Open an image of handwritten text
image = Image.open("path_to_image.jpg")

# Preprocess and predict
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
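For pages that have already been segmented into text rows, the single-image steps above can be wrapped in a small batching helper. This is a sketch under stated assumptions: `transcribe_lines` and its `batch_size` parameter are hypothetical names, and `processor` and `model` are assumed to be the objects returned by `load_custom_trocr_model()`. Converting each image to RGB is a precaution for grayscale scans and is not part of the original example.

```python
def transcribe_lines(images, processor, model, batch_size=8):
    """Transcribe a list of PIL text-row images in fixed-size batches."""
    texts = []
    for start in range(0, len(images), batch_size):
        # Convert to RGB so that grayscale scans get three channels
        batch = [img.convert("RGB") for img in images[start:start + batch_size]]
        # Resize and normalize the whole batch in one call
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

A call such as `transcribe_lines([Image.open(p) for p in paths], processor, model)` would return one string per image, in input order; `paths` here stands for a hypothetical list of row-image files. A smaller `batch_size` lowers memory use at the cost of more forward passes.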
Limitations and Biases
The model was trained primarily on handwritten text that uses basic Latin characters (A-Z, a-z) and the Nordic special characters (å, ä, ö). It has not been trained on non-Latin writing systems such as Chinese characters or the Cyrillic, Arabic, or Hebrew scripts. The model may not generalize well to languages other than Finnish, Swedish, or English.
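One way to screen material against this limitation is to check existing transcriptions or ground-truth text for letters from other scripts before running or fine-tuning the model. The sketch below is an illustration using Python's standard `unicodedata` module; the function name `non_latin_letters` is hypothetical, and the check is coarse: it looks only at Unicode character names, so it detects script, not language.

```python
import unicodedata


def non_latin_letters(text: str) -> list[str]:
    """Return the alphabetic characters in `text` that are not Latin-script letters."""
    found = []
    for ch in text:
        # Unicode names of Latin letters start with "LATIN", including å, ä and ö
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            found.append(ch)
    return found
```

An empty result means every letter belongs to the Latin script; digits, punctuation, and whitespace are ignored.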
Future Work
Potential improvements for this model include:
- Expanding training data: Incorporating more diverse handwriting styles and languages.
- Optimizing for specific domains: Fine-tuning the model on domain-specific handwriting.
Citation
If you use this model in your work, please cite it as:
@misc{multicentury_htr_model_202509_tf32,
author = {Kansallisarkisto},
title = {Multicentury HTR Model: Handwritten Text Recognition},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Kansallisarkisto/multicentury-htr-model/}},
}
Model Card Authors
Author: Kansallisarkisto
Contact Information: mikko.lipsanen@kansallisarkisto.fi, ilkka.jokipii@kansallisarkisto.fi