Handwritten Chess Notation Recognition with TrOCR

Model Description

This model is a fine-tuned version of Microsoft's TrOCR-Large, trained specifically to recognize handwritten chess notation in images of chess scoresheets. Given an image of a handwritten move, it transcribes the move in Standard Algebraic Notation (SAN).

Model Type: Vision Encoder-Decoder
Architecture: TrOCR (Transformer-based Optical Character Recognition)
Base Model: microsoft/trocr-large-handwritten

Intended Uses & Limitations

Intended Use

  • Transcription of handwritten chess moves from scoresheet images
  • Digitization of historical chess games
  • Chess notation recognition in mobile apps
  • Educational tools for chess analysis

Limitations

  • Works best with clear handwriting
  • Trained specifically on chess notation (not general text)
  • May struggle with heavily cursive handwriting
  • Requires well-lit, focused images

Training Data

The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.

Dataset Characteristics:

  • Format: PNG images with move text annotations
  • Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
  • Split: 88% training, 12% validation
  • Handwriting styles: Multiple variations
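
SAN labels follow a constrained grammar, which makes annotations easy to sanity-check before training. The sketch below uses a simplified regular expression (an illustration for this card, not part of the released pipeline) covering the common cases: castling, piece moves, pawn moves and captures, promotion, and check/checkmate suffixes.

```python
import re

# Simplified SAN pattern: castling, piece moves, pawn moves/captures,
# optional promotion, optional check (+) or checkmate (#) suffix.
SAN_RE = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def looks_like_san(move: str) -> bool:
    """Rough shape check for a move label; not a full legality check."""
    return SAN_RE.fullmatch(move) is not None
```

A filter like this catches transcription typos in labels (e.g. "e44") before they reach the training set.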

Training Procedure

Preprocessing

Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens.
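
The image side of this preprocessing can be sketched with PIL alone (the training pipeline presumably routes images through the TrOCR processor; the function name below is hypothetical):

```python
from PIL import Image

TARGET_SIZE = (384, 384)   # model input resolution
MAX_LABEL_TOKENS = 16      # labels are padded/truncated to this length

def preprocess_image(img: Image.Image) -> Image.Image:
    """Match the model's expected input: 3-channel RGB at 384x384."""
    return img.convert("RGB").resize(TARGET_SIZE)
```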

Training Hyperparameters

  • Epochs: 6
  • Batch Size: 1 (with gradient accumulation of 8)
  • Learning Rate: 3e-5
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • Warmup Steps: 200
  • Mixed Precision: FP16
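
These settings map onto Hugging Face Seq2SeqTrainingArguments roughly as follows (a sketch; output_dir and any option not listed above are placeholders):

```python
from transformers import Seq2SeqTrainingArguments

# Effective batch size = 1 x 8 gradient-accumulation steps = 8
training_args = Seq2SeqTrainingArguments(
    output_dir="./trocr-chess",   # placeholder path
    num_train_epochs=6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_steps=200,
    fp16=True,
    predict_with_generate=True,   # decode during evaluation
)
```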

Hardware

  • GPU: NVIDIA (8+ GB VRAM recommended)
  • Training Time: ~6 hours

Evaluation Results

Metric                       Value
---------------------------  -------------
Accuracy                     ~92%
Character Error Rate (CER)   ~3%
Inference Speed              ~100 ms/image
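
Character Error Rate is edit distance normalized by reference length. A minimal sketch of the metric (hypothetical helper names; not the evaluation script used for this card):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: edit count normalized by reference length."""
    return levenshtein(prediction, reference) / max(len(reference), 1)
```

On short chess moves a single wrong character already yields a large per-move CER (e.g. "Nf6" vs. "Nf3" is one substitution over three characters), so a ~3% corpus-level CER implies most moves are transcribed exactly.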

How to Use

Direct Inference

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten")
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten")

# Load and process image
image = Image.open("chess_move.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate prediction
generated_ids = model.generate(pixel_values)
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Predicted move: {predicted_text}")