---
license: apache-2.0
language:
- en
tags:
- chess
- game-ai
- uci
- nanotron
---

# Chess-Bot-3000 100M

A 100M parameter language model trained from scratch on chess games in UCI notation. The model learns to predict chess moves from game context and can adapt its play style based on player Elo ratings.

## Model Details

### Model Description

This model is a transformer-based language model trained on millions of chess games from the Lichess database. It uses UCI (Universal Chess Interface) notation and includes special tokens for player Elo ratings and game outcomes, allowing it to generate moves appropriate for different skill levels.

- **Developed by:** [Your name/organization]
- **Model type:** Causal language model (decoder-only transformer)
- **Language(s):** Chess UCI notation
- **License:** Apache 2.0
- **Architecture:** Qwen2-style (SmolLM3 base)
- **Parameters:** ~100M

### Model Sources

- **Repository:** [Your repository URL]

## Uses

### Direct Use

The model can be used for:
- Chess move prediction and game continuation
- Generating chess games at specific skill levels (by conditioning on Elo tokens)
- Chess position evaluation through next-move probabilities
- Chess education and analysis tools

## Training Details

### Training Data

The model was trained on approximately 25 million chess games (~2B tokens) from the Lichess open database (January 2024). Games were converted to UCI notation and augmented with:
- Player Elo ratings (rounded to nearest 100, range 0-3500)
- Game outcomes (white win, black win, draw)
- Special tokens for game boundaries

Each training example follows the format:

```
<BOG> <WHITE:1600> <BLACK:1550> <WHITE_WIN> e2e4 e7e5 g1f3 ... <EOG>
```

In this representation, each chess move (half-move) corresponds to one token.
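Because every marker and half-move is a single whitespace-delimited token, training examples can be inspected with a plain string split. A minimal sketch, using a shortened game in the format above:

```python
# Each whitespace-delimited item in a training example is one token:
# boundary/Elo/outcome markers plus one token per half-move.
example = "<BOG> <WHITE:1600> <BLACK:1550> <WHITE_WIN> e2e4 e7e5 g1f3 <EOG>"
tokens = example.split()

special = [t for t in tokens if t.startswith("<")]     # metadata tokens
moves = [t for t in tokens if not t.startswith("<")]   # half-moves

print(len(tokens))  # 8
print(special)      # ['<BOG>', '<WHITE:1600>', '<BLACK:1550>', '<WHITE_WIN>', '<EOG>']
print(moves)        # ['e2e4', 'e7e5', 'g1f3']
```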

### Training Procedure

**Model Architecture:**
- 12 transformer layers
- 768 hidden dimensions
- 8 attention heads (2 KV heads with Grouped Query Attention)
- 3072 intermediate size
- 4687 vocabulary size
- 256 max sequence length
- Flash Attention 2 for efficient training

**Training Hyperparameters:**
- Training regime: bfloat16 mixed precision
- Batch size: 1024 (64 per GPU × 4 gradient accumulation steps)
- Learning rate: 6e-4 (cosine decay to 6e-5)
- Warmup steps: 520
- Total training steps: 26,400
- Total tokens: ~2B
- Optimizer: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- Gradient clipping: 1.0

**Training Infrastructure:**
- Framework: Nanotron (PyTorch)
- Hardware: 1x NVIDIA A100 GPU
- Total training time: ~3 hours
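The learning-rate schedule above (warmup for 520 steps, then cosine decay from 6e-4 to 6e-5 over the remaining steps) can be sketched as follows; the linear warmup shape is an assumption, and the exact Nanotron implementation may differ:

```python
import math

# Illustrative LR schedule using the hyperparameters listed above.
# Linear warmup is an assumption; Nanotron's exact schedule may differ.
PEAK_LR, MIN_LR = 6e-4, 6e-5
WARMUP_STEPS, TOTAL_STEPS = 520, 26_400

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # linear warmup to the peak
    # cosine decay from PEAK_LR down to MIN_LR over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(520), lr_at(26_400))  # 0.0, 6e-4, 6e-5
```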

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("your-username/chess-bot-3000-100m")
tokenizer = AutoTokenizer.from_pretrained("your-username/chess-bot-3000-100m")

# Generate a game at 1500 Elo level
prompt = "<BOG> <WHITE:1500> <BLACK:1500> <DRAW> e2e4"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
game = tokenizer.decode(outputs[0])
print(game)
```
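The next-move probabilities mentioned under Direct Use come from softmaxing the logits at the last position. A self-contained sketch of that post-processing, using random stand-in logits so it runs without the model (in practice they come from `model(**inputs).logits[0, -1]`):

```python
import math
import random

# Stand-in logits over the 4687-token vocabulary (random values, so this
# snippet runs standalone); real logits come from a model forward pass.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(4687)]

# Numerically stabilized softmax turns logits into next-token probabilities.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The five most probable next-token ids; mapping ids back to move strings
# is done with the tokenizer (e.g. tokenizer.convert_ids_to_tokens).
top5 = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:5]
print(top5)
```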

## Limitations

- The model is trained only on standard chess games and may not handle unusual positions well
- Performance has not been systematically evaluated against chess engines
- The model generates UCI notation strings but does not validate move legality
- Elo-conditioned generation is approximate and based on statistical patterns, not true playing strength
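Because the model does not validate legality, generated moves should be checked externally. A minimal sketch using the third-party python-chess library (`pip install chess`; not part of this model):

```python
import chess  # third-party: pip install chess

def is_legal_continuation(moves_uci: list[str]) -> bool:
    """Check that a sequence of UCI moves is legal from the starting position."""
    board = chess.Board()
    for uci in moves_uci:
        try:
            move = chess.Move.from_uci(uci)
        except ValueError:
            return False  # not even syntactically valid UCI
        if move not in board.legal_moves:
            return False  # syntactically valid but illegal in this position
        board.push(move)
    return True

print(is_legal_continuation(["e2e4", "e7e5", "g1f3"]))  # True
print(is_legal_continuation(["e2e5"]))                  # False: pawn cannot reach e5 in one move
```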

## Technical Specifications

### Compute Infrastructure

**Hardware:**
- NVIDIA A100 GPU (40GB VRAM)
- Leonardo supercomputer (CINECA)

**Software:**
- Nanotron training framework
- PyTorch 2.x
- CUDA 12.4
- Flash Attention 2