uchihamadara1816 committed on
Commit
2cf5b72
·
verified ·
1 Parent(s): 2a1c2fb

Update README.md

Files changed (1)
  1. README.md +102 -3
README.md CHANGED
@@ -1,3 +1,102 @@
- ---
- license: mit
- ---
+ ---
+ language: en
+ license: mit
+ tags:
+ - chess
+ - ocr
+ - handwritten-text-recognition
+ - trocr
+ - transformers
+ - computer-vision
+ - image-to-text
+ datasets:
+ - handwritten-chess-notation
+ metrics:
+ - accuracy
+ - cer
+ widget:
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/chess_example.png
+   example_title: Chess Move Example
+ pipeline_tag: image-to-text
+ ---
+
+ # Handwritten Chess Notation Recognition with TrOCR
+
+ ## Model Description
+
+ This model is a fine-tuned version of Microsoft's TrOCR-Large handwritten model, trained to recognize handwritten chess notation in images of chess scoresheets. It transcribes handwritten moves into Standard Algebraic Notation (SAN).
+
+ **Model Type:** Vision Encoder-Decoder
+ **Architecture:** TrOCR (Transformer-based Optical Character Recognition)
+ **Base Model:** `microsoft/trocr-large-handwritten`
+
+ ## Intended Uses & Limitations
+
+ ### Intended Use
+ - Transcription of handwritten chess moves from scoresheet images
+ - Digitization of historical chess games
+ - Chess notation recognition in mobile apps
+ - Educational tools for chess analysis
+
+ ### Limitations
+ - Works best with clear handwriting
+ - Trained specifically on chess notation (not general text)
+ - May struggle with extremely cursive handwriting
+ - Requires well-lit, focused images
+
+ ## Training Data
+
+ The model was trained on a custom dataset of 13,731 handwritten chess move images with corresponding text labels in Standard Algebraic Notation.
+
+ **Dataset Characteristics:**
+ - Format: PNG images with move text annotations
+ - Content: Chess moves in SAN format (e.g., "e4", "Nf3", "O-O", "Qxf7#")
+ - Split: 88% training, 12% validation
+ - Handwriting styles: Multiple variations
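To make the split figures concrete, here is a minimal sketch of an 88/12 split over 13,731 samples. The function name `split_dataset`, the fixed seed, and the use of integer ids as stand-ins for image files are illustrative assumptions; the card does not say how the split was actually produced.

```python
import random


def split_dataset(items, train_fraction=0.88, seed=0):
    """Shuffle items reproducibly and split them into train/validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical sample identifiers standing in for the 13,731 labelled images.
sample_ids = range(13_731)
train_ids, val_ids = split_dataset(sample_ids)
print(len(train_ids), len(val_ids))
```

Under these assumptions the split yields roughly 12,083 training and 1,648 validation samples; the exact counts depend on how the fraction was rounded.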
+
+ ## Training Procedure
+
+ ### Preprocessing
+ Images were resized to 384x384 pixels and converted to RGB format. Text was tokenized with chess-specific vocabulary and padded/truncated to a maximum length of 16 tokens.
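The label side of this preprocessing can be sketched in plain Python. The sketch below pads or truncates a list of token ids to 16 entries. The pad id of `1` and the optional masking of padding with `-100` (the target value that PyTorch's cross-entropy loss ignores by default) are assumptions for illustration, not details confirmed by this card; in practice the pad id would come from the tokenizer.

```python
MAX_LABEL_LENGTH = 16  # maximum label length stated in the preprocessing notes
PAD_TOKEN_ID = 1       # assumed pad id; the real value comes from the tokenizer
IGNORE_INDEX = -100    # target value ignored by PyTorch's cross-entropy loss


def pad_or_truncate(token_ids, max_length=MAX_LABEL_LENGTH, pad_id=PAD_TOKEN_ID):
    """Clip token_ids to max_length, then right-pad with pad_id up to max_length."""
    clipped = list(token_ids[:max_length])
    return clipped + [pad_id] * (max_length - len(clipped))


def mask_padding(label_ids, pad_id=PAD_TOKEN_ID):
    """Replace pad positions with IGNORE_INDEX so they do not contribute to the loss."""
    return [IGNORE_INDEX if token == pad_id else token for token in label_ids]


# Hypothetical token ids for a short move such as "Nf3".
labels = pad_or_truncate([0, 523, 871, 2])
masked_labels = mask_padding(labels)
```

A fixed label length keeps every batch the same shape; 16 tokens is ample for SAN moves, which are only a few characters long.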
+
+ ### Training Hyperparameters
+ - **Epochs:** 6
+ - **Batch Size:** 1 per step, with 8 gradient accumulation steps (effective batch size of 8)
+ - **Learning Rate:** 3e-5
+ - **Optimizer:** AdamW
+ - **Weight Decay:** 0.01
+ - **Warmup Steps:** 200
+ - **Mixed Precision:** FP16
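As a rough sanity check on these settings, the sketch below derives the effective batch size and the approximate number of optimizer steps, and shows how a 200-step warmup would shape the learning rate. The linear-warmup-then-linear-decay schedule is an assumption (the card names only the warmup length), and the training-set size is derived from the stated 88% split.

```python
import math

TRAIN_SAMPLES = round(13_731 * 0.88)  # training share of the dataset (88% split)
PER_STEP_BATCH = 1
GRAD_ACCUM_STEPS = 8
EPOCHS = 6
BASE_LR = 3e-5
WARMUP_STEPS = 200

# One optimizer update happens after every GRAD_ACCUM_STEPS forward/backward passes.
effective_batch = PER_STEP_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step, total=total_steps, base_lr=BASE_LR, warmup=WARMUP_STEPS):
    """Linear warmup to base_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup:
        return base_lr * step / warmup
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup)


print(effective_batch, steps_per_epoch, total_steps)
```

Under these assumptions, warmup covers only about 2% of training, so most updates run at or below the peak rate of 3e-5.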
+
+ ### Hardware
+ - **GPU:** NVIDIA (8+ GB VRAM recommended)
+ - **Training Time:** ~6 hours
+
+ ## Evaluation Results
+
+ | Metric | Value |
+ |--------|-------|
+ | Accuracy | ~92% |
+ | Character Error Rate (CER) | ~3% |
+ | Inference Speed | ~100 ms/image |
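For readers who want to reproduce metrics of this kind, here is a minimal sketch of how exact-match accuracy and character error rate are commonly computed. It assumes accuracy means a whole-move exact match and CER means total Levenshtein edits divided by total reference characters; the card does not state the exact definitions used, and the sample predictions are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(predictions, references):
    """Total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference exactly."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


# Made-up example: one of four moves has a single wrong character.
references = ["e4", "Nf3", "O-O", "Qxf7#"]
predictions = ["e4", "Nf3", "O-O", "Qxf7+"]
cer = character_error_rate(predictions, references)
accuracy = exact_match_accuracy(predictions, references)
```

Because chess moves are short, a single wrong character makes the whole move wrong, which is why exact-match accuracy can sit well below what a low CER might suggest.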
+
+ ## How to Use
+
+ ### Direct Inference
+ ```python
+ from transformers import TrOCRProcessor, VisionEncoderDecoderModel
+ from PIL import Image
+
+ # Load model and processor ("username/..." is a placeholder repository id)
+ model = VisionEncoderDecoderModel.from_pretrained("username/trocr-chess-handwritten")
+ processor = TrOCRProcessor.from_pretrained("username/trocr-chess-handwritten")
+
+ # Load the image of a single handwritten move and convert it to model inputs
+ image = Image.open("chess_move.png").convert("RGB")
+ pixel_values = processor(image, return_tensors="pt").pixel_values
+
+ # Generate token ids and decode them into the predicted move text
+ generated_ids = model.generate(pixel_values)
+ predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(f"Predicted move: {predicted_text}")
+ ```