Handwritten Chess Notation Recognition with TrOCR
Model Description
This model is a fine-tuned version of Microsoft's TrOCR large model, trained to recognize handwritten chess notation from images of chess scoresheets. It transcribes handwritten chess moves into Standard Algebraic Notation (SAN).
Model Type: Vision Encoder-Decoder
Architecture: TrOCR (Transformer-based Optical Character Recognition)
Base Model: microsoft/trocr-large-handwritten
Intended Uses & Limitations
Intended Use
- Transcription of handwritten chess moves from scoresheet images
- Digitization of historical chess games
- Chess notation recognition in mobile apps
- Educational tools for chess analysis
Limitations
- Works best with clear handwriting
- Trained specifically on chess notation (not general text)
- May struggle with heavily cursive or overlapping handwriting
- Requires well-lit, focused images
Training Data
The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.
Dataset Characteristics:
- Format: PNG images with move text annotations
- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
- Split: 88% training, 12% validation
- Handwriting styles: Multiple variations
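The exact split procedure is not published; a minimal sketch of a reproducible 88/12 train/validation split over (image path, label) pairs (file names and the seed are illustrative assumptions):

```python
import random

def split_dataset(samples, train_fraction=0.88, seed=42):
    """Shuffle and split (image_path, label) pairs into train and validation sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names; 13,731 samples as in the dataset above
samples = [(f"move_{i:05d}.png", "e4") for i in range(13731)]
train, val = split_dataset(samples)
print(len(train), len(val))  # 12083 1648
```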
Training Procedure
Preprocessing
Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens.
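The `TrOCRProcessor` performs these steps internally; as a rough sketch of the equivalent transforms (the 0.5 normalization constants, the `-100` label pad id, and the bilinear resize are assumptions, matching common ViT-style pipelines):

```python
import numpy as np
from PIL import Image

def preprocess_image(img: Image.Image, size: int = 384) -> np.ndarray:
    """Resize to size x size, convert to RGB, and normalize to roughly [-1, 1]."""
    img = img.convert("RGB").resize((size, size), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = (arr - 0.5) / 0.5          # assumed mean/std of 0.5, ViT-style
    return arr.transpose(2, 0, 1)    # HWC -> CHW for the vision encoder

def pad_labels(token_ids, max_len=16, pad_id=-100):
    """Truncate/pad label token ids to 16; -100 is ignored by the training loss."""
    ids = token_ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

img = Image.new("RGB", (640, 120), "white")  # stand-in for a scoresheet crop
x = preprocess_image(img)
print(x.shape)  # (3, 384, 384)
```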
Training Hyperparameters
- Epochs: 6
- Batch Size: 1 (gradient accumulation over 8 steps; effective batch size 8)
- Learning Rate: 3e-5
- Optimizer: AdamW
- Weight Decay: 0.01
- Warmup Steps: 200
- Mixed Precision: FP16
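The original training script is not included; the hyperparameters above map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows (`output_dir` is a placeholder, and `predict_with_generate` is an assumption for generation-based evaluation):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="trocr-chess-checkpoints",  # placeholder
    num_train_epochs=6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size 8
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_steps=200,
    fp16=True,                      # mixed precision
    predict_with_generate=True,     # assumption: evaluate via generation
)
```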
Hardware
- GPU: NVIDIA (8+ GB VRAM recommended)
- Training Time: ~6 hours
Evaluation Results
| Metric | Value |
|---|---|
| Accuracy (exact match) | ~92% |
| Character Error Rate (CER) | ~3% |
| Inference Speed | ~100 ms/image |
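The evaluation code is not provided; the two quality metrics are typically computed as below. A self-contained sketch with a pure-Python Levenshtein distance (the sample moves are illustrative):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(predictions, references):
    """Character error rate: total edit distance / total reference characters."""
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return errors / sum(len(r) for r in references)

def exact_match(predictions, references):
    """Fraction of moves transcribed exactly right."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

preds = ["e4", "Nf3", "O-O", "Qxf7#"]
refs  = ["e4", "Nf3", "O-O", "Qxf7+"]
print(exact_match(preds, refs))  # 0.75
print(cer(preds, refs))          # 1 error / 13 chars ~ 0.077
```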
How to Use
Direct Inference
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten")
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten")

# Load and preprocess the image
image = Image.open("chess_move.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the prediction
generated_ids = model.generate(pixel_values)
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Predicted move: {predicted_text}")
```
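Since OCR output is occasionally garbled, it can help to sanity-check predictions against a SAN pattern before accepting them. A minimal regex sketch (this is an assumption about downstream usage, not part of the model; it checks surface shape only, not move legality):

```python
import re

# Rough SAN shape: castling, or an optional piece letter, optional
# disambiguation, optional capture, destination square, optional
# promotion, optional check/mate suffix.
SAN_RE = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def looks_like_san(move: str) -> bool:
    """True if the predicted string is plausibly a SAN move."""
    return bool(SAN_RE.match(move.strip()))

print(looks_like_san("Qxf7#"))  # True
print(looks_like_san("z9"))     # False
```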