---
language: en
license: mit
tags:
- chess
- ocr
- handwritten-text-recognition
- trocr
- transformers
- computer-vision
- image-to-text
datasets:
- handwritten-chess-notation
metrics:
- accuracy
- cer
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/chess_example.png
  example_title: Chess Move Example
pipeline_tag: image-to-text
---
|
|
|
|
|
# Handwritten Chess Notation Recognition with TrOCR |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This is a fine-tuned version of Microsoft's TrOCR-Large model, trained specifically to recognize handwritten chess notation on scoresheet images. Given a cropped image of a single handwritten move, the model transcribes it in Standard Algebraic Notation (SAN).
|
|
|
|
|
**Model Type:** Vision Encoder-Decoder |
|
|
**Architecture:** TrOCR (Transformer-based Optical Character Recognition) |
|
|
**Base Model:** `microsoft/trocr-large-handwritten` |
|
|
|
|
|
## Intended Uses & Limitations |
|
|
|
|
|
### Intended Use |
|
|
- Transcription of handwritten chess moves from scoresheet images |
|
|
- Digitization of historical chess games |
|
|
- Chess notation recognition in mobile apps |
|
|
- Educational tools for chess analysis |
|
|
|
|
|
### Limitations |
|
|
- Works best with clear handwriting |
|
|
- Trained specifically on chess notation (not general text) |
|
|
- May struggle with extremely cursive handwriting |
|
|
- Requires well-lit, focused images |
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation. |
|
|
|
|
|
**Dataset Characteristics:** |
|
|
- Format: PNG images with move text annotations |
|
|
- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#") |
|
|
- Split: 88% training, 12% validation |
|
|
- Handwriting styles: Multiple variations |
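
For reference, SAN strings like those above follow a narrow grammar, so label files can be sanity-checked with a simple pattern. This regex is purely illustrative (it is not part of the model or its tokenizer, and it does not cover every edge case such as pawn-promotion underlining styles):

```python
import re

# Illustrative only: matches common SAN moves such as
# "e4", "Nf3", "O-O", "O-O-O", "exd5", "Qxf7#", "e8=Q+".
SAN_PATTERN = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def looks_like_san(move: str) -> bool:
    """Return True if the string resembles a SAN chess move."""
    return SAN_PATTERN.match(move) is not None
```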
|
|
|
|
|
## Training Procedure |
|
|
|
|
|
### Preprocessing |
|
|
Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens. |
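
A minimal sketch of what this preprocessing amounts to, using only Pillow for the image side and plain Python for label padding (in practice the `TrOCRProcessor` handles the image resizing and normalization; the `-100` pad value below follows the standard PyTorch convention of ignoring those positions in the cross-entropy loss):

```python
from PIL import Image

def preprocess_image(path: str) -> Image.Image:
    """Resize a scoresheet crop to the model's 384x384 input and force RGB."""
    return Image.open(path).convert("RGB").resize((384, 384))

def pad_or_truncate(token_ids: list, max_length: int = 16, pad_id: int = -100) -> list:
    """Fix label token ids to the 16-token length used during training."""
    ids = token_ids[:max_length]
    return ids + [pad_id] * (max_length - len(ids))
```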
|
|
|
|
|
### Training Hyperparameters |
|
|
- **Epochs:** 6 |
|
|
- **Batch Size:** 1 (with gradient accumulation over 8 steps; effective batch size 8)
|
|
- **Learning Rate:** 3e-5 |
|
|
- **Optimizer:** AdamW |
|
|
- **Weight Decay:** 0.01 |
|
|
- **Warmup Steps:** 200 |
|
|
- **Mixed Precision:** FP16 |
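
The hyperparameters above translate roughly into the following `transformers` configuration. This is a reconstruction, not the exact training script; argument names follow `Seq2SeqTrainingArguments`, and the output directory is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the training setup described above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./trocr-chess",        # placeholder path
    num_train_epochs=6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # effective batch size 8
    learning_rate=3e-5,                # AdamW is the default optimizer
    weight_decay=0.01,
    warmup_steps=200,
    fp16=True,
)
```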
|
|
|
|
|
### Hardware |
|
|
- **GPU:** NVIDIA (8+ GB VRAM recommended) |
|
|
- **Training Time:** ~6 hours |
|
|
|
|
|
## Evaluation Results |
|
|
|
|
|
| Metric | Value | |
|
|
|--------|-------| |
|
|
| Accuracy | ~92% | |
|
|
| Character Error Rate (CER) | ~3% | |
|
|
| Inference Speed | ~100 ms/image | |
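
Character Error Rate can be recomputed from predictions and references with a standard Levenshtein edit distance; the following is a self-contained sketch (the figures reported above were not necessarily produced by this exact snippet):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def cer(predictions, references) -> float:
    """Character Error Rate: total edit distance over total reference characters."""
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return edits / chars
```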
|
|
|
|
|
## How to Use |
|
|
|
|
|
### Direct Inference |
|
|
```python |
|
|
from transformers import TrOCRProcessor, VisionEncoderDecoderModel |
|
|
from PIL import Image |
|
|
|
|
|
|
|
# Load model and processor |
|
|
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten") |
|
|
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten") |
|
|
|
|
|
# Load and process image |
|
|
image = Image.open("chess_move.png").convert("RGB") |
|
|
pixel_values = processor(image, return_tensors="pt").pixel_values |
|
|
|
|
|
# Generate prediction |
|
|
generated_ids = model.generate(pixel_values) |
|
|
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0] |
|
|
print(f"Predicted move: {predicted_text}") |
```