uchihamadara1816/TROCR-Chess

+---
+language: en
+license: mit
+tags:
+- chess
+- ocr
+- handwritten-text-recognition
+- trocr
+- transformers
+- computer-vision
+- image-to-text
+datasets:
+- handwritten-chess-notation
+metrics:
+- accuracy
+- cer
+widget:
+- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/chess_example.png
+  example_title: Chess Move Example
+pipeline_tag: image-to-text
+---
+# Handwritten Chess Notation Recognition with TrOCR
+## Model Description
+This model is a fine-tuned version of Microsoft's TrOCR-Large handwritten checkpoint, trained to recognize handwritten chess notation in images taken from chess scoresheets. It transcribes each handwritten move into Standard Algebraic Notation (SAN).
+- **Model Type:** Vision Encoder-Decoder
+- **Architecture:** TrOCR (Transformer-based Optical Character Recognition)
+- **Base Model:** `microsoft/trocr-large-handwritten`
+## Intended Uses & Limitations
+### Intended Use
+- Transcription of handwritten chess moves from scoresheet images
+- Digitization of historical chess games
+- Chess notation recognition in mobile apps
+- Educational tools for chess analysis
+### Limitations
+- Works best with clear, legible handwriting
+- Trained only on chess notation, so it is not suited to general text recognition
+- May struggle with heavily cursive handwriting
+- Requires well-lit, in-focus images
+## Training Data
+The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.
+**Dataset Characteristics:**
+- Format: PNG images with move text annotations
+- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
+- Split: 88% training, 12% validation
+- Handwriting styles: Multiple variations
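To illustrate the SAN label format listed above, the following sketch checks whether a string has the shape of a SAN move. The pattern and the `is_san_like` helper are illustrative assumptions and are not part of the model or its training pipeline. The check is purely syntactic: it says nothing about whether a move is legal in a given position.

```python
import re

# Simplified SAN shape (an assumption for illustration, not a full SAN grammar).
SAN_PATTERN = re.compile(
    r"(?:"
    r"O-O(?:-O)?"                           # castling, kingside or queenside
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"     # piece move, optional disambiguation and capture
    r"|(?:[a-h]x)?[a-h][1-8](?:=[QRBN])?"   # pawn move or capture, optional promotion
    r")[+#]?"                               # optional check or checkmate suffix
)


def is_san_like(move: str) -> bool:
    """Return True if `move` matches the simplified SAN pattern above."""
    return SAN_PATTERN.fullmatch(move.strip()) is not None


if __name__ == "__main__":
    for candidate in ["e4", "Nf3", "O-O", "Qxf7#", "e9", "Zf3"]:
        print(candidate, is_san_like(candidate))
```

A check like this could be used to flag labels or predictions that cannot be valid notation, for example a transcription containing a rank outside 1 to 8.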
+## Training Procedure
+### Preprocessing
+Images were converted to RGB and resized to 384×384 pixels. Move labels were tokenized with a chess-specific vocabulary and padded or truncated to a maximum length of 16 tokens.
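The label side of this preprocessing can be sketched in plain Python. This is a minimal illustration under stated assumptions: `PAD_TOKEN_ID` is a hypothetical value (the real one comes from the tokenizer), and replacing padding with `-100` so the loss ignores it is a common practice for encoder-decoder fine-tuning that this card does not confirm was used here.

```python
from typing import List

MAX_LABEL_LENGTH = 16  # maximum label length stated in the preprocessing description
PAD_TOKEN_ID = 1       # hypothetical; use the tokenizer's actual pad_token_id
IGNORE_INDEX = -100    # default ignore_index of PyTorch's cross-entropy loss


def pad_or_truncate(token_ids: List[int],
                    max_length: int = MAX_LABEL_LENGTH,
                    pad_id: int = PAD_TOKEN_ID) -> List[int]:
    """Clip the sequence to max_length, then right-pad it to exactly max_length."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


def mask_padding(labels: List[int], pad_id: int = PAD_TOKEN_ID) -> List[int]:
    """Replace padding positions with IGNORE_INDEX (assumed, not confirmed by the card)."""
    return [IGNORE_INDEX if token == pad_id else token for token in labels]


if __name__ == "__main__":
    example_ids = [0, 512, 733, 2]  # made-up token IDs standing in for a short move
    padded = pad_or_truncate(example_ids)
    print(len(padded), mask_padding(padded))
```

Because SAN moves are only a few characters long, a 16-token limit leaves ample room, so truncation should rarely apply in practice.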
+### Training Hyperparameters
+- **Epochs:** 6
+- **Batch Size:** 1 (gradient accumulation over 8 steps, for an effective batch size of 8 per device)
+- **Learning Rate:** 3e-5
+- **Optimizer:** AdamW
+- **Weight Decay:** 0.01
+- **Warmup Steps:** 200
+- **Mixed Precision:** FP16
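The hyperparameters above, combined with the dataset size and split from the Training Data section, imply a rough training schedule. The sketch below works through that arithmetic. It assumes a single GPU and that a final partial batch still counts as one optimizer step; the exact step count depends on how the training loop rounds, so treat the numbers as estimates. The `schedule_summary` helper is hypothetical.

```python
import math


def schedule_summary(dataset_size: int = 13_731,
                     train_fraction: float = 0.88,
                     epochs: int = 6,
                     batch_size: int = 1,
                     grad_accum_steps: int = 8,
                     warmup_steps: int = 200) -> dict:
    """Estimate optimizer steps from the values reported in this card."""
    train_images = round(dataset_size * train_fraction)
    effective_batch = batch_size * grad_accum_steps
    # One optimizer step per `effective_batch` images; round the last partial batch up.
    steps_per_epoch = math.ceil(train_images / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "train_images": train_images,
        "val_images": dataset_size - train_images,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": warmup_steps / total_steps,
    }


if __name__ == "__main__":
    for name, value in schedule_summary().items():
        print(f"{name}: {value}")
```

With the values from this card, the estimate comes to about 12,083 training images, roughly 1,511 optimizer steps per epoch, and roughly 9,066 steps in total, so the 200 warmup steps cover about 2% of training.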
+### Hardware
+- **GPU:** NVIDIA (8+ GB VRAM recommended)
+- **Training Time:** ~6 hours
+## Evaluation Results
+| Metric | Value |
+|--------|-------|
+| Accuracy | ~92% |
+| Character Error Rate (CER) | ~3% |
+| Inference Speed | ~100 ms/image |
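The card does not state how accuracy and CER were computed. As a minimal sketch, assuming accuracy means exact-match accuracy per move and CER is the total character edit distance divided by the total number of reference characters, the two metrics could be calculated as follows. The `edit_distance` and `evaluate` helpers are illustrative, not the author's evaluation code.

```python
from typing import Dict, List


def edit_distance(prediction: str, reference: str) -> int:
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(reference) + 1))
    for i, pred_char in enumerate(prediction, start=1):
        current = [i]
        for j, ref_char in enumerate(reference, start=1):
            substitution_cost = 0 if pred_char == ref_char else 1
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + substitution_cost))  # substitution
        previous = current
    return previous[-1]


def evaluate(predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Exact-match accuracy and corpus-level CER (assumed definitions)."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal in length")
    exact_matches = sum(p == r for p, r in zip(predictions, references))
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return {"accuracy": exact_matches / len(references), "cer": total_edits / total_chars}


if __name__ == "__main__":
    print(evaluate(["e4", "Nf3", "O-O", "Qxf7+"], ["e4", "Nf3", "O-O", "Qxf7#"]))
```

In the usage example, one of four moves differs by a single character, giving an accuracy of 0.75 and a CER of 1/13. The gap illustrates why the two metrics can diverge: a single wrong character makes the whole move count as incorrect.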
+## How to Use
+### Direct Inference
+```python
+from PIL import Image
+from transformers import TrOCRProcessor, VisionEncoderDecoderModel
+# "username/trocr-chess-handwritten" is a placeholder; substitute the actual repository ID
+model_id = "username/trocr-chess-handwritten"
+# Load the fine-tuned model and its processor
+model = VisionEncoderDecoderModel.from_pretrained(model_id)
+processor = TrOCRProcessor.from_pretrained(model_id)
+# Load the move image and convert it to RGB, as expected by the processor
+image = Image.open("chess_move.png").convert("RGB")
+pixel_values = processor(images=image, return_tensors="pt").pixel_values
+# Generate token IDs and decode them into the predicted move
+generated_ids = model.generate(pixel_values)
+predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(f"Predicted move: {predicted_text}")