Handwritten Chess Notation Recognition with TrOCR
Model Description
This model is a fine-tuned version of Microsoft's TrOCR large model, trained to recognize handwritten chess notation from images of chess scoresheets. It transcribes handwritten chess moves into Standard Algebraic Notation (SAN).
Model Type: Vision Encoder-Decoder
Architecture: TrOCR (Transformer-based Optical Character Recognition)
Base Model: microsoft/trocr-large-handwritten
Intended Uses & Limitations
Intended Use
- Transcription of handwritten chess moves from scoresheet images
- Digitization of historical chess games
- Chess notation recognition in mobile apps
- Educational tools for chess analysis
Limitations
- Works best with clear handwriting
- Trained specifically on chess notation (not general text)
- May struggle with heavily cursive or overlapping handwriting
- Requires well-lit, focused images
Training Data
The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.
Dataset Characteristics:
- Format: PNG images with move text annotations
- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
- Split: 88% training, 12% validation
- Handwriting styles: Multiple variations
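The exact split procedure is not published; a minimal sketch of a reproducible 88/12 train/validation split over (image path, label) pairs (file names and the seed are illustrative assumptions):

```python
import random

def split_dataset(samples, train_fraction=0.88, seed=42):
    """Shuffle and split (image_path, label) pairs into train and validation sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names; 13,731 samples as in the dataset above
samples = [(f"move_{i:05d}.png", "e4") for i in range(13731)]
train, val = split_dataset(samples)
print(len(train), len(val))  # 12083 1648
```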
Training Procedure
Preprocessing
Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens.
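The `TrOCRProcessor` performs these steps internally; as a rough sketch of the equivalent transforms (the 0.5 normalization constants, the `-100` label pad id, and the bilinear resize are assumptions, matching common ViT-style pipelines):

```python
import numpy as np
from PIL import Image

def preprocess_image(img: Image.Image, size: int = 384) -> np.ndarray:
    """Resize to size x size, convert to RGB, and normalize to roughly [-1, 1]."""
    img = img.convert("RGB").resize((size, size), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = (arr - 0.5) / 0.5          # assumed mean/std of 0.5, ViT-style
    return arr.transpose(2, 0, 1)    # HWC -> CHW for the vision encoder

def pad_labels(token_ids, max_len=16, pad_id=-100):
    """Truncate/pad label token ids to 16; -100 is ignored by the training loss."""
    ids = token_ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

img = Image.new("RGB", (640, 120), "white")  # stand-in for a scoresheet crop
x = preprocess_image(img)
print(x.shape)  # (3, 384, 384)
```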
Training Hyperparameters
- Epochs: 6
- Batch Size: 1 (gradient accumulation over 8 steps; effective batch size 8)
- Learning Rate: 3e-5
- Optimizer: AdamW
- Weight Decay: 0.01
- Warmup Steps: 200
- Mixed Precision: FP16
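The original training script is not included; the hyperparameters above map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows (`output_dir` is a placeholder, and `predict_with_generate` is an assumption for generation-based evaluation):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="trocr-chess-checkpoints",  # placeholder
    num_train_epochs=6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size 8
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_steps=200,
    fp16=True,                      # mixed precision
    predict_with_generate=True,     # assumption: evaluate via generation
)
```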
Hardware
- GPU: NVIDIA (8+ GB VRAM recommended)
- Training Time: ~6 hours
Evaluation Results
| Metric | Value |
|---|---|
| Accuracy (exact match) | ~92% |
| Character Error Rate (CER) | ~3% |
| Inference Speed | ~100 ms/image |
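The evaluation code is not provided; the two quality metrics are typically computed as below. A self-contained sketch with a pure-Python Levenshtein distance (the sample moves are illustrative):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(predictions, references):
    """Character error rate: total edit distance / total reference characters."""
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return errors / sum(len(r) for r in references)

def exact_match(predictions, references):
    """Fraction of moves transcribed exactly right."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

preds = ["e4", "Nf3", "O-O", "Qxf7#"]
refs  = ["e4", "Nf3", "O-O", "Qxf7+"]
print(exact_match(preds, refs))  # 0.75
print(cer(preds, refs))          # 1 error / 13 chars ~ 0.077
```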
How to Use
Direct Inference
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten")
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten")

# Load and preprocess the image
image = Image.open("chess_move.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate the prediction
generated_ids = model.generate(pixel_values)
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Predicted move: {predicted_text}")
```
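Since OCR output is occasionally garbled, it can help to sanity-check predictions against a SAN pattern before accepting them. A minimal regex sketch (this is an assumption about downstream usage, not part of the model; it checks surface shape only, not move legality):

```python
import re

# Rough SAN shape: castling, or an optional piece letter, optional
# disambiguation, optional capture, destination square, optional
# promotion, optional check/mate suffix.
SAN_RE = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def looks_like_san(move: str) -> bool:
    """True if the predicted string is plausibly a SAN move."""
    return bool(SAN_RE.match(move.strip()))

print(looks_like_san("Qxf7#"))  # True
print(looks_like_san("z9"))     # False
```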