---
language: en
license: mit
tags:
- chess
- ocr
- handwritten-text-recognition
- trocr
- transformers
- computer-vision
- image-to-text
datasets:
- handwritten-chess-notation
metrics:
- accuracy
- cer
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/chess_example.png
  example_title: Chess Move Example
pipeline_tag: image-to-text
---

# Handwritten Chess Notation Recognition with TrOCR

## Model Description

This is a fine-tuned version of Microsoft's TrOCR-Large model, trained specifically to recognize handwritten chess notation from images of chess scoresheets. The model transcribes chess moves in Standard Algebraic Notation (SAN) from handwritten images.

- **Model Type:** Vision Encoder-Decoder
- **Architecture:** TrOCR (Transformer-based Optical Character Recognition)
- **Base Model:** `microsoft/trocr-large-handwritten`

## Intended Uses & Limitations

### Intended Use

- Transcription of handwritten chess moves from scoresheet images
- Digitization of historical chess games
- Chess notation recognition in mobile apps
- Educational tools for chess analysis

### Limitations

- Works best with clear handwriting
- Trained specifically on chess notation (not general text)
- May struggle with heavily cursive handwriting
- Requires well-lit, focused images

## Training Data

The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.

**Dataset Characteristics:**
- Format: PNG images with move text annotations
- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
- Split: 88% training, 12% validation
- Handwriting styles: Multiple variations

## Training Procedure

### Preprocessing

Images were resized to 384x384 pixels and converted to RGB. Text labels were tokenized with a chess-specific vocabulary and padded or truncated to a maximum length of 16 tokens.
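The image and label preprocessing described above can be sketched as plain helper functions. This is an illustrative sketch, not the released training code; in particular, `pad_id=1` is an assumption (the RoBERTa-style pad token id used by the TrOCR tokenizer family), and the helper names are hypothetical:

```python
from PIL import Image

def preprocess_image(path: str, size=(384, 384)) -> Image.Image:
    """Convert an image to RGB and resize it to the model's 384x384 input."""
    return Image.open(path).convert("RGB").resize(size)

def pad_or_truncate(token_ids: list, max_length: int = 16, pad_id: int = 1) -> list:
    """Pad or truncate a token-id sequence to the fixed 16-token label length.

    pad_id=1 is assumed here; check the actual tokenizer's pad_token_id.
    """
    token_ids = token_ids[:max_length]
    return token_ids + [pad_id] * (max_length - len(token_ids))
```

In practice both steps are handled by `TrOCRProcessor` (image normalization) and its tokenizer (`padding="max_length"`, `max_length=16`, `truncation=True`); the sketch only makes the fixed shapes explicit.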
### Training Hyperparameters

- **Epochs:** 6
- **Batch Size:** 1 (with gradient accumulation of 8, for an effective batch size of 8)
- **Learning Rate:** 3e-5
- **Optimizer:** AdamW
- **Weight Decay:** 0.01
- **Warmup Steps:** 200
- **Mixed Precision:** FP16

### Hardware

- **GPU:** NVIDIA (8+ GB VRAM recommended)
- **Training Time:** ~6 hours

## Evaluation Results

| Metric | Value |
|--------|-------|
| Accuracy | ~92% |
| Character Error Rate (CER) | ~3% |
| Inference Speed | ~100 ms/image |

## How to Use

### Direct Inference

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten")
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten")

# Load and preprocess the image
image = Image.open("chess_move.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate and decode the prediction
generated_ids = model.generate(pixel_values)
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Predicted move: {predicted_text}")
```
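### Post-Processing (Optional)

Because the model is trained to emit SAN moves only (e.g., "e4", "Nf3", "O-O", "Qxf7#"), a lightweight validity check on predictions can flag obvious transcription errors before they reach downstream analysis. The regex filter below is an illustrative addition, not part of the released pipeline, and it checks syntax only (not whether the move is legal in a given position):

```python
import re

# One SAN move: castling, or an optional piece letter, optional
# file/rank disambiguation, optional capture marker, a target square,
# optional promotion, and an optional check/mate suffix.
SAN_RE = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def looks_like_san(move: str) -> bool:
    """Return True if a predicted string is plausibly a SAN chess move."""
    return bool(SAN_RE.match(move.strip()))
```

All the example labels from the training data ("e4", "Nf3", "O-O", "Qxf7#") pass this check, while garbled predictions such as stray words do not.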