daavidhauser committed c9648e6 (verified) · Parent: 0ce5083

Update README.md

Files changed (1): README.md (+126 −3)
---
license: apache-2.0
language:
- en
tags:
- chess
- game-ai
- uci
- nanotron
---

# Chess-Bot-3000 100M

A 100M-parameter language model trained from scratch on chess games in UCI notation. The model learns to predict chess moves from game context and can adapt its play style based on player Elo ratings.

## Model Details

### Model Description

This model is a transformer-based language model trained on millions of chess games from the Lichess database. It uses UCI (Universal Chess Interface) notation and includes special tokens for player Elo ratings and game outcomes, allowing it to generate moves appropriate for different skill levels.

- **Developed by:** [Your name/organization]
- **Model type:** Causal language model (decoder-only transformer)
- **Language(s):** Chess UCI notation
- **License:** Apache 2.0
- **Architecture:** Qwen2-style (SmolLM3 base)
- **Parameters:** ~100M

### Model Sources

- **Repository:** [Your repository URL]

## Uses

### Direct Use

The model can be used for:
- Chess move prediction and game continuation
- Generating chess games at specific skill levels (by conditioning on Elo tokens)
- Chess position evaluation through next-move probabilities
- Chess education and analysis tools

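Position evaluation through next-move probabilities amounts to a softmax over the logits the model assigns to candidate move tokens. A minimal, model-free sketch of that idea — the move names and logit values below are illustrative assumptions, not actual model outputs:

```python
import math

def move_probabilities(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw next-token logits for candidate moves into a probability
    distribution via softmax."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits.values())
    exp = {move: math.exp(v - m) for move, v in logits.items()}
    total = sum(exp.values())
    return {move: e / total for move, e in exp.items()}

# Hypothetical logits after "<BOG> ... e2e4 e7e5" (illustrative numbers):
candidate_logits = {"g1f3": 2.1, "f1c4": 1.3, "d2d4": 0.8}
probs = move_probabilities(candidate_logits)
best = max(probs, key=probs.get)  # highest-probability candidate
```

In practice the logits would come from a forward pass at the last position of the encoded game prefix, restricted to the token ids of the legal moves.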
## Training Details

### Training Data

The model was trained on approximately 25 million chess games (~2B tokens) from the Lichess open database (January 2024). Games were converted to UCI notation and augmented with:
- Player Elo ratings (rounded to nearest 100, range 0-3500)
- Game outcomes (white win, black win, draw)
- Special tokens for game boundaries

Each training example follows the format:
```
<BOG> <WHITE:1600> <BLACK:1550> <WHITE_WIN> e2e4 e7e5 g1f3 ... <EOG>
```

In this representation, each chess move (half-move) corresponds to one token.

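Because tokens in this format are whitespace-separated, whole-game tokenization reduces to a split; a rough sketch (the helper below is an illustrative assumption, not the released tokenizer):

```python
def tokenize_game(game: str) -> list[str]:
    """Split a formatted game string into tokens: special tokens
    (<BOG>, Elo, outcome, <EOG>) plus one token per half-move."""
    return game.split()

example = "<BOG> <WHITE:1600> <BLACK:1550> <WHITE_WIN> e2e4 e7e5 g1f3 <EOG>"
tokens = tokenize_game(example)
# <BOG>, two Elo tokens, outcome, three half-moves, <EOG> -> 8 tokens
```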
### Training Procedure

**Model Architecture:**
- 12 transformer layers
- 768 hidden dimensions
- 8 attention heads (2 KV heads with Grouped Query Attention)
- 3072 intermediate size
- 4687 vocabulary size
- 256 max sequence length
- Flash Attention 2 for efficient training

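The listed dimensions roughly account for the ~100M figure. A back-of-the-envelope check, assuming a Qwen2-style layout (GQA attention, SwiGLU MLP) and ignoring norms, biases, and an untied LM head:

```python
hidden, inter, vocab, layers = 768, 3072, 4687, 12
heads, kv_heads = 8, 2
head_dim = hidden // heads  # 96

# Attention: Q and O projections are full-width; K and V shrink under GQA.
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
# SwiGLU MLP: gate and up (hidden -> inter) plus down (inter -> hidden).
mlp = 3 * hidden * inter
per_layer = attn + mlp

embed = vocab * hidden  # input embedding table
total = layers * per_layer + embed
print(f"~{total / 1e6:.0f}M parameters")  # ~106M, consistent with "~100M"
```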
**Training Hyperparameters:**
- Training regime: bfloat16 mixed precision
- Batch size: 1024 (64 per GPU × 4 gradient accumulation steps)
- Learning rate: 6e-4 (cosine decay to 6e-5)
- Warmup steps: 520
- Total training steps: 26,400
- Total tokens: ~2B
- Optimizer: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- Gradient clipping: 1.0

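The schedule above (520-step linear warmup to 6e-4, then cosine decay to 6e-5 over 26,400 total steps) can be sketched as a plain function; this is a standard warmup-plus-cosine shape, not Nanotron's exact implementation:

```python
import math

PEAK, FLOOR = 6e-4, 6e-5
WARMUP, TOTAL = 520, 26_400

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to the floor."""
    if step < WARMUP:
        return PEAK * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)  # goes 0 -> 1
    return FLOOR + 0.5 * (PEAK - FLOOR) * (1 + math.cos(math.pi * progress))

# Sanity checks against the card's numbers:
# lr_at(0) == 0, lr_at(520) == 6e-4, lr_at(26_400) == 6e-5
```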
**Training Infrastructure:**
- Framework: Nanotron (PyTorch)
- Hardware: 1x NVIDIA A100 GPU
- Total training time: ~3 hours

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("your-username/chess-bot-3000-100m")
tokenizer = AutoTokenizer.from_pretrained("your-username/chess-bot-3000-100m")

# Generate a game at 1500 Elo level
prompt = "<BOG> <WHITE:1500> <BLACK:1500> <DRAW> e2e4"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
game = tokenizer.decode(outputs[0])
print(game)
```

## Limitations

- The model is trained only on standard chess games and may not handle unusual positions well
- Performance has not been systematically evaluated against chess engines
- The model generates UCI notation strings but does not validate move legality
- Elo-conditioned generation is approximate and based on statistical patterns, not true playing strength

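Because the model emits raw UCI strings without legality checks, downstream code should validate generated moves against the actual position (e.g. with a chess library such as python-chess). The cheapest first-pass filter is purely syntactic; a minimal sketch:

```python
import re

# Matches coordinate moves like "e2e4" and promotions like "e7e8q".
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")

def looks_like_uci(move: str) -> bool:
    """Syntactic check only: a string can match this pattern and still be
    illegal in the current position (full legality needs a chess library)."""
    return UCI_MOVE.fullmatch(move) is not None
```

This catches malformed output early; castling is already covered since UCI encodes it as a king move (e.g. "e1g1").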
## Technical Specifications

### Compute Infrastructure

**Hardware:**
- NVIDIA A100 GPU (40GB VRAM)
- Leonardo supercomputer (CINECA)

**Software:**
- Nanotron training framework
- PyTorch 2.x
- CUDA 12.4
- Flash Attention 2