---
language: en
license: mit
tags:
- chess
- ocr
- handwritten-text-recognition
- trocr
- transformers
- computer-vision
- image-to-text
datasets:
- handwritten-chess-notation
metrics:
- accuracy
- cer
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/chess_example.png
  example_title: Chess Move Example
pipeline_tag: image-to-text
---

# Handwritten Chess Notation Recognition with TrOCR

## Model Description

This is a fine-tuned version of Microsoft's TrOCR-Large model, trained to recognize handwritten chess notation in images of chess scoresheets. It transcribes moves written in Standard Algebraic Notation (SAN) from cropped images of individual handwritten moves.

**Model Type:** Vision Encoder-Decoder  
**Architecture:** TrOCR (Transformer-based Optical Character Recognition)  
**Base Model:** `microsoft/trocr-large-handwritten`

## Intended Uses & Limitations

### Intended Use
- Transcription of handwritten chess moves from scoresheet images
- Digitization of historical chess games
- Chess notation recognition in mobile apps
- Educational tools for chess analysis

### Limitations
- Works best with clear handwriting
- Trained specifically on chess notation (not general text)
- May struggle with heavily cursive or slanted handwriting
- Requires well-lit, focused images

## Training Data

The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.

**Dataset Characteristics:**
- Format: PNG images with move text annotations
- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
- Split: 88% training, 12% validation
- Handwriting styles: Multiple variations
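SAN labels follow a small, regular grammar, so predicted or annotated moves can be sanity-checked with a regular expression. A minimal sketch (the pattern and helper name are illustrative and not part of the released training pipeline; this checks surface form only, not move legality):

```python
import re

# Matches ordinary SAN moves (e4, Nf3, exd5, e8=Q, Qxf7#) and castling (O-O, O-O-O).
SAN_PATTERN = re.compile(
    r"^(?:"
    r"O-O(?:-O)?"                                      # castling
    r"|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?"   # piece/pawn moves, captures, promotion
    r")[+#]?$"                                         # optional check/checkmate suffix
)

def is_valid_san(move: str) -> bool:
    """Return True if the string looks like a SAN chess move."""
    return bool(SAN_PATTERN.match(move))
```

For example, `is_valid_san("Qxf7#")` accepts while `is_valid_san("z9")` rejects, which makes it useful for flagging obviously garbled transcriptions.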

## Training Procedure

### Preprocessing
Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens.
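The resize-and-convert step can be sketched with plain Pillow (the `TrOCRProcessor` normally performs this internally; the function name here is illustrative):

```python
from PIL import Image

TARGET_SIZE = (384, 384)  # input resolution used during fine-tuning

def preprocess_image(path: str) -> Image.Image:
    """Open an image, force RGB, and resize to the model's input resolution."""
    image = Image.open(path).convert("RGB")
    return image.resize(TARGET_SIZE, Image.BILINEAR)
```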

### Training Hyperparameters
- **Epochs:** 6
- **Batch Size:** 1 (with gradient accumulation of 8)
- **Learning Rate:** 3e-5
- **Optimizer:** AdamW
- **Weight Decay:** 0.01
- **Warmup Steps:** 200
- **Mixed Precision:** FP16
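The hyperparameters above map directly onto Hugging Face `Seq2SeqTrainingArguments`; a hedged sketch (the output directory is a placeholder, and any settings not listed above are left at their defaults — AdamW is the default optimizer):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./trocr-chess-checkpoints",  # placeholder path
    num_train_epochs=6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_steps=200,
    fp16=True,
    predict_with_generate=True,      # generate sequences during evaluation
)
```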

### Hardware
- **GPU:** NVIDIA (8+ GB VRAM recommended)
- **Training Time:** ~6 hours

## Evaluation Results

| Metric | Value |
|--------|-------|
| Accuracy | ~92% |
| Character Error Rate (CER) | ~3% |
| Inference Speed | ~100 ms/image |
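Character Error Rate is the character-level Levenshtein (edit) distance between prediction and reference, summed over the evaluation set and divided by the total number of reference characters. A self-contained sketch for reproducing the metric:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(predictions: list[str], references: list[str]) -> float:
    """Total edit distance over total reference characters."""
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return edits / chars
```

For example, predicting `"Nf6"` for the label `"Nf3"` contributes one substitution over three reference characters.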

## How to Use

### Direct Inference
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and its processor
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten")
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten")

# Load and preprocess a cropped image of a single move
image = Image.open("chess_move.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate and decode the prediction
generated_ids = model.generate(pixel_values)
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Predicted move: {predicted_text}")
```