---
language: en
license: mit
tags:
- chess
- ocr
- handwritten-text-recognition
- trocr
- transformers
- computer-vision
- image-to-text
datasets:
- handwritten-chess-notation
metrics:
- accuracy
- cer
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/chess_example.png
example_title: Chess Move Example
pipeline_tag: image-to-text
---
# Handwritten Chess Notation Recognition with TrOCR
## Model Description
This model is a fine-tuned version of Microsoft's TrOCR-Large, trained to recognize handwritten chess notation from images of chess scoresheets. It transcribes chess moves written in Standard Algebraic Notation (SAN).
**Model Type:** Vision Encoder-Decoder
**Architecture:** TrOCR (Transformer-based Optical Character Recognition)
**Base Model:** `microsoft/trocr-large-handwritten`
## Intended Uses & Limitations
### Intended Use
- Transcription of handwritten chess moves from scoresheet images
- Digitization of historical chess games
- Chess notation recognition in mobile apps
- Educational tools for chess analysis
### Limitations
- Works best with clear handwriting
- Trained specifically on chess notation (not general text)
- May struggle with heavily cursive handwriting
- Requires well-lit, focused images
## Training Data
The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.
**Dataset Characteristics:**
- Format: PNG images with move text annotations
- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
- Split: 88% training, 12% validation
- Handwriting styles: Multiple variations
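For readers curious what "SAN format" labels look like in practice, a small regex-based validator (illustrative only, not part of the released model or dataset tooling) can check whether a transcribed string is plausibly a SAN move:

```python
import re

# Hypothetical SAN validator covering the label formats described above:
# pawn and piece moves, captures, promotions, castling, check (+) and mate (#).
SAN_PATTERN = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def is_valid_san(move: str) -> bool:
    """Return True if the string looks like a SAN chess move."""
    return bool(SAN_PATTERN.match(move))

for move in ["e4", "Nf3", "O-O", "Qxf7#", "exd5", "a8=Q+"]:
    print(move, is_valid_san(move))
```

A check like this can serve as a cheap post-processing filter on model outputs, flagging transcriptions that cannot be legal notation.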
## Training Procedure
### Preprocessing
Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens.
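The image side of this pipeline can be sketched in a few lines of PIL. This is a minimal illustration of the resize and color-conversion steps described above, using a synthetic stand-in image since the actual training script is not published:

```python
from PIL import Image

# Synthetic stand-in for a cropped scoresheet cell (grayscale, arbitrary size)
raw = Image.new("L", (640, 240), color=255)

# Preprocessing as described: convert to RGB, resize to the model's 384x384 input
image = raw.convert("RGB").resize((384, 384))
print(image.mode, image.size)
```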
### Training Hyperparameters
- **Epochs:** 6
- **Batch Size:** 1 (gradient accumulation over 8 steps; effective batch size 8)
- **Learning Rate:** 3e-5
- **Optimizer:** AdamW
- **Weight Decay:** 0.01
- **Warmup Steps:** 200
- **Mixed Precision:** FP16
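The hyperparameters above can be collected into a single configuration. The exact training script is not published, so the sketch below uses a plain dict whose keys follow the naming conventions of `transformers`' `Seq2SeqTrainingArguments`:

```python
# Training configuration mirroring the hyperparameters listed above
# (key names follow transformers' Seq2SeqTrainingArguments conventions).
training_config = dict(
    num_train_epochs=6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # gradients accumulated over 8 steps
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_steps=200,
    fp16=True,                       # mixed-precision training
)

effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(f"Effective batch size: {effective_batch}")
```

Accumulating gradients over 8 steps with a per-device batch of 1 yields an effective batch size of 8, which keeps memory within the 8 GB VRAM budget noted below.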
### Hardware
- **GPU:** NVIDIA (8+ GB VRAM recommended)
- **Training Time:** ~6 hours
## Evaluation Results
| Metric | Value |
|--------|-------|
| Accuracy | ~92% |
| Character Error Rate (CER) | ~3% |
| Inference Speed | ~100 ms/image |
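The evaluation code is not published, but Character Error Rate is straightforward to reproduce: it is the Levenshtein (edit) distance between prediction and reference, divided by the reference length. A minimal implementation:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: Levenshtein distance over reference length."""
    m, n = len(reference), len(hypothesis)
    # Single-row dynamic-programming table for edit distance
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            if reference[i - 1] == hypothesis[j - 1]:
                dp[j] = prev
            else:
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])
            prev = cur
    return dp[n] / max(m, 1)

# One wrong character in a four-character move -> CER of 0.25
print(cer("Qxf7", "Qxf6"))  # → 0.25
```

On short strings like chess moves, a single character error produces a large per-sample CER, so a corpus-level ~3% CER implies most moves are transcribed exactly.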
## How to Use
### Direct Inference
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten")
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten")

# Load and preprocess the scoresheet image
image = Image.open("chess_move.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate and decode the predicted move
generated_ids = model.generate(pixel_values)
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Predicted move: {predicted_text}")
```