---
language: en
license: mit
tags:
- chess
- ocr
- handwritten-text-recognition
- trocr
- transformers
- computer-vision
- image-to-text
datasets:
- handwritten-chess-notation
metrics:
- accuracy
- cer
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/chess_example.png
  example_title: Chess Move Example
pipeline_tag: image-to-text
---
|
|
|
|
|
# Handwritten Chess Notation Recognition with TrOCR |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This is a fine-tuned version of Microsoft's TrOCR-Large model, trained specifically to recognize handwritten chess notation on scoresheet images. Given a cropped image of a single handwritten move, the model transcribes it in Standard Algebraic Notation (SAN).
|
|
|
|
|
**Model Type:** Vision Encoder-Decoder |
|
|
**Architecture:** TrOCR (Transformer-based Optical Character Recognition) |
|
|
**Base Model:** `microsoft/trocr-large-handwritten` |
|
|
|
|
|
## Intended Uses & Limitations |
|
|
|
|
|
### Intended Use |
|
|
- Transcription of handwritten chess moves from scoresheet images |
|
|
- Digitization of historical chess games |
|
|
- Chess notation recognition in mobile apps |
|
|
- Educational tools for chess analysis |
|
|
|
|
|
### Limitations |
|
|
- Works best with clear handwriting |
|
|
- Trained specifically on chess notation (not general text) |
|
|
- May struggle with extremely cursive handwriting |
|
|
- Requires well-lit, focused images |
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation. |
|
|
|
|
|
**Dataset Characteristics:** |
|
|
- Format: PNG images with move text annotations |
|
|
- Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#") |
|
|
- Split: 88% training, 12% validation |
|
|
- Handwriting styles: Multiple variations |
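
For reference, SAN strings like those above follow a narrow grammar, so label files can be sanity-checked with a simple pattern. This regex is purely illustrative (it is not part of the model or its tokenizer, and it does not cover every edge case such as pawn-promotion underlining styles):

```python
import re

# Illustrative only: matches common SAN moves such as
# "e4", "Nf3", "O-O", "O-O-O", "exd5", "Qxf7#", "e8=Q+".
SAN_PATTERN = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def looks_like_san(move: str) -> bool:
    """Return True if the string resembles a SAN chess move."""
    return SAN_PATTERN.match(move) is not None
```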
|
|
|
|
|
## Training Procedure |
|
|
|
|
|
### Preprocessing |
|
|
Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens. |
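
A minimal sketch of what this preprocessing amounts to, using only Pillow for the image side and plain Python for label padding (in practice the `TrOCRProcessor` handles the image resizing and normalization; the `-100` pad value below follows the standard PyTorch convention of ignoring those positions in the cross-entropy loss):

```python
from PIL import Image

def preprocess_image(path: str) -> Image.Image:
    """Resize a scoresheet crop to the model's 384x384 input and force RGB."""
    return Image.open(path).convert("RGB").resize((384, 384))

def pad_or_truncate(token_ids: list, max_length: int = 16, pad_id: int = -100) -> list:
    """Fix label token ids to the 16-token length used during training."""
    ids = token_ids[:max_length]
    return ids + [pad_id] * (max_length - len(ids))
```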
|
|
|
|
|
### Training Hyperparameters |
|
|
- **Epochs:** 6 |
|
|
- **Batch Size:** 1 (with gradient accumulation over 8 steps; effective batch size 8)
|
|
- **Learning Rate:** 3e-5 |
|
|
- **Optimizer:** AdamW |
|
|
- **Weight Decay:** 0.01 |
|
|
- **Warmup Steps:** 200 |
|
|
- **Mixed Precision:** FP16 |
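
The hyperparameters above translate roughly into the following `transformers` configuration. This is a reconstruction, not the exact training script; argument names follow `Seq2SeqTrainingArguments`, and the output directory is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the training setup described above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./trocr-chess",        # placeholder path
    num_train_epochs=6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # effective batch size 8
    learning_rate=3e-5,                # AdamW is the default optimizer
    weight_decay=0.01,
    warmup_steps=200,
    fp16=True,
)
```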
|
|
|
|
|
### Hardware |
|
|
- **GPU:** NVIDIA (8+ GB VRAM recommended) |
|
|
- **Training Time:** ~6 hours |
|
|
|
|
|
## Evaluation Results |
|
|
|
|
|
| Metric | Value | |
|
|
|--------|-------| |
|
|
| Accuracy | ~92% | |
|
|
| Character Error Rate (CER) | ~3% | |
|
|
| Inference Speed | ~100 ms/image | |
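
Character Error Rate can be recomputed from predictions and references with a standard Levenshtein edit distance; the following is a self-contained sketch (the figures reported above were not necessarily produced by this exact snippet):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def cer(predictions, references) -> float:
    """Character Error Rate: total edit distance over total reference characters."""
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return edits / chars
```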
|
|
|
|
|
## How to Use |
|
|
|
|
|
### Direct Inference |
|
|
```python |
|
|
from transformers import TrOCRProcessor, VisionEncoderDecoderModel |
|
|
from PIL import Image |
|
|
|
|
|
|
|
# Load model and processor |
|
|
model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten") |
|
|
processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten") |
|
|
|
|
|
# Load and process image |
|
|
image = Image.open("chess_move.png").convert("RGB") |
|
|
pixel_values = processor(image, return_tensors="pt").pixel_values |
|
|
|
|
|
# Generate prediction |
|
|
generated_ids = model.generate(pixel_values) |
|
|
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0] |
|
|
print(f"Predicted move: {predicted_text}") |
```