Update README.md

---
license: apache-2.0
language:
- en
tags:
- chess
- game-ai
- uci
- nanotron
---

# Chess-Bot-3000 100M

A 100M parameter language model trained from scratch on chess games in UCI notation. The model learns to predict chess moves from game context and can adapt its play style based on player Elo ratings.

## Model Details

### Model Description

This model is a transformer-based language model trained on millions of chess games from the Lichess database. It uses UCI (Universal Chess Interface) notation and includes special tokens for player Elo ratings and game outcomes, allowing it to generate moves appropriate for different skill levels.

- **Developed by:** [Your name/organization]
- **Model type:** Causal language model (decoder-only transformer)
- **Language(s):** Chess UCI notation
- **License:** Apache 2.0
- **Architecture:** Qwen2-style (SmolLM3 base)
- **Parameters:** ~100M

### Model Sources

- **Repository:** [Your repository URL]

## Uses

### Direct Use

The model can be used for:
- Chess move prediction and game continuation
- Generating chess games at specific skill levels (by conditioning on Elo tokens)
- Chess position evaluation through next-move probabilities (see the sketch after this list)
- Chess education and analysis tools
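
To illustrate the position-evaluation use case, the sketch below ranks candidate moves by the model's next-token probabilities. This is a minimal illustration, not released code: the repository id is the placeholder from the usage section below, and it assumes each UCI move maps to a single vocabulary token, as described under Training Data.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

REPO = "your-username/chess-bot-3000-100m"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(REPO)
model.eval()

def score_moves(prefix: str, candidates: list[str]) -> dict[str, float]:
    """Rank candidate UCI moves by the model's next-token probability."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    # Assumes one vocabulary token per half-move (see Training Data).
    return {
        m: probs[tokenizer(m, add_special_tokens=False)["input_ids"][0]].item()
        for m in candidates
    }

prefix = "<BOG> <WHITE:1800> <BLACK:1800> <WHITE_WIN> e2e4 e7e5"
print(score_moves(prefix, ["g1f3", "f1c4", "d2d4"]))
```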

## Training Details

### Training Data

The model was trained on approximately 25 million chess games (~2B tokens) from the Lichess open database (January 2024). Games were converted to UCI notation and augmented with:
- Player Elo ratings (rounded to the nearest 100, range 0-3500)
- Game outcomes (white win, black win, draw)
- Special tokens for game boundaries

Each training example follows the format:
```
<BOG> <WHITE:1600> <BLACK:1550> <WHITE_WIN> e2e4 e7e5 g1f3 ... <EOG>
```

In this representation, each chess move (half-move) corresponds to one token.
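
For concreteness, the sketch below converts one raw PGN game into this format. It is a minimal illustration, assuming the third-party `python-chess` package; the `<BLACK_WIN>` token name is inferred by analogy with the `<WHITE_WIN>` and `<DRAW>` tokens shown elsewhere on this card, and the actual preprocessing pipeline may differ.

```python
import io

import chess.pgn  # third-party "python-chess" package

# <BLACK_WIN> is an assumed name, by analogy with the other outcome tokens.
OUTCOME = {"1-0": "<WHITE_WIN>", "0-1": "<BLACK_WIN>", "1/2-1/2": "<DRAW>"}

def game_to_example(pgn_text: str) -> str:
    """Convert one PGN game into the <BOG> ... <EOG> training format."""
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    # Round Elo ratings to the nearest 100, as described above.
    white = round(int(game.headers["WhiteElo"]) / 100) * 100
    black = round(int(game.headers["BlackElo"]) / 100) * 100
    moves = " ".join(move.uci() for move in game.mainline_moves())
    return (f"<BOG> <WHITE:{white}> <BLACK:{black}> "
            f"{OUTCOME[game.headers['Result']]} {moves} <EOG>")
```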

### Training Procedure

**Model Architecture** (see the config sketch below):
- 12 transformer layers
- 768 hidden dimensions
- 8 attention heads (2 KV heads with Grouped Query Attention)
- 3072 intermediate size
- 4687 vocabulary size
- 256 max sequence length
- Flash Attention 2 for efficient training
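
For reference, the same dimensions can be written down as a Hugging Face `Qwen2Config`. This is a sketch for orientation only: the actual training configuration lives in Nanotron's YAML, and any field not listed is left at the `Qwen2Config` default, which may not match what was used.

```python
from transformers import Qwen2Config

# Hyperparameters copied from the list above; everything else stays at
# Qwen2 defaults, which may differ from the actual Nanotron config.
config = Qwen2Config(
    num_hidden_layers=12,
    hidden_size=768,
    num_attention_heads=8,      # head_dim = 768 / 8 = 96
    num_key_value_heads=2,      # grouped-query attention
    intermediate_size=3072,
    vocab_size=4687,
    max_position_embeddings=256,
)
```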

**Training Hyperparameters:**
- Training regime: bfloat16 mixed precision
- Batch size: 256 (64 per GPU × 4 gradient accumulation steps)
- Learning rate: 6e-4 (cosine decay to 6e-5; see the schedule sketch below)
- Warmup steps: 520
- Total training steps: 26,400
- Total tokens: ~2B
- Optimizer: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- Gradient clipping: 1.0
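
Spelled out, the schedule is linear warmup for 520 steps to the 6e-4 peak, then cosine decay to the 6e-5 floor at step 26,400. A small sketch of the step-to-rate mapping implied by those numbers (the exact Nanotron implementation may differ in details such as behavior past the final step):

```python
import math

PEAK_LR, FINAL_LR = 6e-4, 6e-5
WARMUP, TOTAL = 520, 26_400

def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(520), lr_at(26_400))  # peak 6e-4, floor 6e-5
```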

**Training Infrastructure:**
- Framework: Nanotron (PyTorch)
- Hardware: 1x NVIDIA A100 GPU
- Total training time: ~3 hours
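
At ~2B tokens in ~3 hours on a single A100, throughput works out to roughly 2×10⁹ / 10,800 s ≈ 185K tokens/second; this is a back-of-the-envelope figure derived from the numbers above, not a measured benchmark.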

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("your-username/chess-bot-3000-100m")
tokenizer = AutoTokenizer.from_pretrained("your-username/chess-bot-3000-100m")

# Generate a game between two 1500-rated players, conditioned on a draw
prompt = "<BOG> <WHITE:1500> <BLACK:1500> <DRAW> e2e4"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
game = tokenizer.decode(outputs[0])
print(game)
```

## Limitations

- The model is trained only on standard chess games and may not handle unusual positions well
- Performance has not been systematically evaluated against chess engines
- The model generates UCI notation strings but does not validate move legality (see the filtering sketch below)
- Elo-conditioned generation is approximate and based on statistical patterns, not true playing strength
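
Since move legality is not guaranteed, a practical driver can restrict the model's choice to the legal moves of the current position. A minimal sketch, assuming the third-party `python-chess` package, the placeholder repository id from above, and one vocabulary token per UCI move:

```python
import chess
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

REPO = "your-username/chess-bot-3000-100m"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(REPO)
model.eval()

def pick_legal_move(prefix: str, board: chess.Board) -> str:
    """Return the legal move to which the model assigns the highest logit."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    # Score only moves that are legal in the current position.
    legal = [m.uci() for m in board.legal_moves]
    ids = [tokenizer(m, add_special_tokens=False)["input_ids"][0] for m in legal]
    best_move, _ = max(zip(legal, ids), key=lambda pair: logits[pair[1]].item())
    return best_move

board = chess.Board()
board.push_uci("e2e4")
print(pick_legal_move("<BOG> <WHITE:1500> <BLACK:1500> <DRAW> e2e4", board))
```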

## Technical Specifications

### Compute Infrastructure

**Hardware:**
- NVIDIA A100 GPU (40GB VRAM)
- Leonardo supercomputer (CINECA)

**Software:**
- Nanotron training framework
- PyTorch 2.x
- CUDA 12.4
- Flash Attention 2