daavidhauser committed c9648e6 (verified) · Parent: 0ce5083

Update README.md

Files changed (1): README.md (+126 −3)
---
license: apache-2.0
language:
- en
tags:
- chess
- game-ai
- uci
- nanotron
---

# Chess-Bot-3000 100M

A 100M-parameter language model trained from scratch on chess games in UCI notation. The model learns to predict chess moves from game context and can adapt its play style based on player Elo ratings.

## Model Details

### Model Description

This model is a transformer-based language model trained on millions of chess games from the Lichess database. It uses UCI (Universal Chess Interface) notation and includes special tokens for player Elo ratings and game outcomes, allowing it to generate moves appropriate for different skill levels.

- **Developed by:** [Your name/organization]
- **Model type:** Causal language model (decoder-only transformer)
- **Language(s):** Chess UCI notation
- **License:** Apache 2.0
- **Architecture:** Qwen2-style (SmolLM3 base)
- **Parameters:** ~100M

### Model Sources

- **Repository:** [Your repository URL]

## Uses

### Direct Use

The model can be used for:
- Chess move prediction and game continuation
- Generating chess games at specific skill levels (by conditioning on Elo tokens)
- Chess position evaluation through next-move probabilities
- Chess education and analysis tools

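Position evaluation through next-move probabilities amounts to a softmax over the logits the model assigns to candidate move tokens. A minimal, model-free sketch of that idea — the move names and logit values below are illustrative assumptions, not actual model outputs:

```python
import math

def move_probabilities(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw next-token logits for candidate moves into a probability
    distribution via softmax."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits.values())
    exp = {move: math.exp(v - m) for move, v in logits.items()}
    total = sum(exp.values())
    return {move: e / total for move, e in exp.items()}

# Hypothetical logits after "<BOG> ... e2e4 e7e5" (illustrative numbers):
candidate_logits = {"g1f3": 2.1, "f1c4": 1.3, "d2d4": 0.8}
probs = move_probabilities(candidate_logits)
best = max(probs, key=probs.get)  # highest-probability candidate
```

In practice the logits would come from a forward pass at the last position of the encoded game prefix, restricted to the token ids of the legal moves.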
## Training Details

### Training Data

The model was trained on approximately 25 million chess games (~2B tokens) from the Lichess open database (January 2024). Games were converted to UCI notation and augmented with:
- Player Elo ratings (rounded to nearest 100, range 0-3500)
- Game outcomes (white win, black win, draw)
- Special tokens for game boundaries

Each training example follows the format:
```
<BOG> <WHITE:1600> <BLACK:1550> <WHITE_WIN> e2e4 e7e5 g1f3 ... <EOG>
```

In this representation, each chess move (half-move) corresponds to one token.

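Because tokens in this format are whitespace-separated, whole-game tokenization reduces to a split; a rough sketch (the helper below is an illustrative assumption, not the released tokenizer):

```python
def tokenize_game(game: str) -> list[str]:
    """Split a formatted game string into tokens: special tokens
    (<BOG>, Elo, outcome, <EOG>) plus one token per half-move."""
    return game.split()

example = "<BOG> <WHITE:1600> <BLACK:1550> <WHITE_WIN> e2e4 e7e5 g1f3 <EOG>"
tokens = tokenize_game(example)
# <BOG>, two Elo tokens, outcome, three half-moves, <EOG> -> 8 tokens
```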
### Training Procedure

**Model Architecture:**
- 12 transformer layers
- 768 hidden dimensions
- 8 attention heads (2 KV heads with Grouped Query Attention)
- 3072 intermediate size
- 4687 vocabulary size
- 256 max sequence length
- Flash Attention 2 for efficient training

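The listed dimensions roughly account for the ~100M figure. A back-of-the-envelope check, assuming a Qwen2-style layout (GQA attention, SwiGLU MLP) and ignoring norms, biases, and an untied LM head:

```python
hidden, inter, vocab, layers = 768, 3072, 4687, 12
heads, kv_heads = 8, 2
head_dim = hidden // heads  # 96

# Attention: Q and O projections are full-width; K and V shrink under GQA.
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
# SwiGLU MLP: gate and up (hidden -> inter) plus down (inter -> hidden).
mlp = 3 * hidden * inter
per_layer = attn + mlp

embed = vocab * hidden  # input embedding table
total = layers * per_layer + embed
print(f"~{total / 1e6:.0f}M parameters")  # ~106M, consistent with "~100M"
```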
**Training Hyperparameters:**
- Training regime: bfloat16 mixed precision
- Batch size: 1024 (64 per GPU × 4 gradient accumulation steps)
- Learning rate: 6e-4 (cosine decay to 6e-5)
- Warmup steps: 520
- Total training steps: 26,400
- Total tokens: ~2B
- Optimizer: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- Gradient clipping: 1.0

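The schedule above (520-step linear warmup to 6e-4, then cosine decay to 6e-5 over 26,400 total steps) can be sketched as a plain function; this is a standard warmup-plus-cosine shape, not Nanotron's exact implementation:

```python
import math

PEAK, FLOOR = 6e-4, 6e-5
WARMUP, TOTAL = 520, 26_400

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to the floor."""
    if step < WARMUP:
        return PEAK * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)  # goes 0 -> 1
    return FLOOR + 0.5 * (PEAK - FLOOR) * (1 + math.cos(math.pi * progress))

# Sanity checks against the card's numbers:
# lr_at(0) == 0, lr_at(520) == 6e-4, lr_at(26_400) == 6e-5
```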
**Training Infrastructure:**
- Framework: Nanotron (PyTorch)
- Hardware: 1x NVIDIA A100 GPU
- Total training time: ~3 hours

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("your-username/chess-bot-3000-100m")
tokenizer = AutoTokenizer.from_pretrained("your-username/chess-bot-3000-100m")

# Generate a game at 1500 Elo level
prompt = "<BOG> <WHITE:1500> <BLACK:1500> <DRAW> e2e4"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
game = tokenizer.decode(outputs[0])
print(game)
```

## Limitations

- The model is trained only on standard chess games and may not handle unusual positions well
- Performance has not been systematically evaluated against chess engines
- The model generates UCI notation strings but does not validate move legality
- Elo-conditioned generation is approximate and based on statistical patterns, not true playing strength

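Because the model emits raw UCI strings without legality checks, downstream code should validate generated moves against the actual position (e.g. with a chess library such as python-chess). The cheapest first-pass filter is purely syntactic; a minimal sketch:

```python
import re

# Matches coordinate moves like "e2e4" and promotions like "e7e8q".
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")

def looks_like_uci(move: str) -> bool:
    """Syntactic check only: a string can match this pattern and still be
    illegal in the current position (full legality needs a chess library)."""
    return UCI_MOVE.fullmatch(move) is not None
```

This catches malformed output early; castling is already covered since UCI encodes it as a king move (e.g. "e1g1").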
## Technical Specifications

### Compute Infrastructure

**Hardware:**
- NVIDIA A100 GPU (40GB VRAM)
- Leonardo supercomputer (CINECA)

**Software:**
- Nanotron training framework
- PyTorch 2.x
- CUDA 12.4
- Flash Attention 2