
TrOCR-SWIN HTR

Model Description

This is a TrOCR (Transformer-based Optical Character Recognition) model fine-tuned for handwritten text recognition. It uses a SWIN (Shifted Window Transformer) backbone as the image encoder and a BERT-based decoder for text generation.

  • Architecture: VisionEncoderDecoder with SWIN encoder and BERT decoder
  • Encoder: SwinForImageClassification (hidden size 1024, GELU activation)
  • Decoder: BertForMaskedLM (hidden size 768, 12 layers, 12 attention heads)
  • Training Time: 1.75 hours (6292 seconds)
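The encoder/decoder pairing above can be sketched with the transformers config classes. This is a minimal sketch: the `embed_dim` and `depths` values are assumptions chosen so that a Swin-base-style encoder reproduces the stated 1024-dim hidden size; only the dimensions listed above come from this card.

from transformers import BertConfig, SwinConfig, VisionEncoderDecoderConfig

# Swin encoder: embed_dim=128 with 4 stages gives a final hidden size of
# 128 * 2**3 = 1024, matching the encoder spec above (embed_dim/depths
# themselves are assumptions, not stated in the card).
encoder_config = SwinConfig(embed_dim=128, depths=[2, 2, 6, 2])

# BERT-style decoder with the dimensions listed above.
decoder_config = BertConfig(hidden_size=768, num_hidden_layers=12,
                            num_attention_heads=12)

# The helper marks the decoder as a decoder and adds cross-attention.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)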

Intended Use

  • Handwritten text recognition (HTR)
  • Document digitization
  • Historical document processing

Training Configuration

Key Hyperparameters:

  • Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-8)
  • Batch Handling: Even batches enabled, seedable sampler
  • Precision: BF16 disabled
  • DataLoader: Pin memory enabled, 0 workers, no drop last
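Assuming training used standard PyTorch components, the optimizer and dataloader settings above correspond to something like the following sketch (the model and dataset here are stand-ins):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(4, 2)  # stand-in for the actual VisionEncoderDecoder model

# Adam with the listed beta1/beta2/epsilon values
optimizer = torch.optim.Adam(model.parameters(), betas=(0.9, 0.999), eps=1e-8)

# DataLoader matching the listed settings: pinned memory, 0 workers,
# and incomplete final batches kept (drop_last=False).
dataset = TensorDataset(torch.randn(10, 4))
loader = DataLoader(dataset, batch_size=4, pin_memory=True,
                    num_workers=0, drop_last=False)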

Decoder Specifications:

  • Vocabulary Size: 119,547
  • Max Position Embeddings: 512
  • Hidden Dropout Probability: 0.1
  • Attention Dropout Probability: 0.1
  • Layer Normalization EPS: 1e-12
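As a sketch, the decoder configuration above can be reconstructed with `BertConfig` (the 119,547-token vocabulary matches multilingual BERT; everything else is taken directly from the list above):

from transformers import BertConfig

# Decoder configuration matching the specifications listed above.
decoder_config = BertConfig(
    vocab_size=119547,                  # multilingual-BERT-sized vocabulary
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=512,
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    layer_norm_eps=1e-12,
)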

Accelerator Configuration:

  • Even batches: true
  • Non-blocking: false
  • Split batches: false
  • Use seedable sampler: true

Usage

from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained('model_name')
model = VisionEncoderDecoderModel.from_pretrained('model_name')

# Load a handwriting image as RGB (the path is illustrative)
image = Image.open('handwriting_sample.png').convert('RGB')

# Preprocess the image and generate text
pixel_values = processor(image, return_tensors="pt").pixel_values
outputs = model.generate(pixel_values)
texts = processor.batch_decode(outputs, skip_special_tokens=True)

print(texts[0])

Limitations

  • Primarily trained on handwritten text samples
  • Performance may vary with printed text or unusual fonts
  • Best results with clear, legible handwriting

Training Data

To be updated.

Environmental Impact

  • GPU: NVIDIA T4 (16 GB VRAM)
  • Environment: Google Colab
  • Training Time: 1.75 hours

Model Card Contact

For questions about this model, please check the original training logs or contact the model owner.

Model size: 0.3B parameters (Safetensors; tensor types I64 and F32)