Metadata-Version: 2.4
Name: chess-challenge
Version: 0.1.0
Summary: LLM Chess Challenge - Train a 1M parameter model to play chess
Author-email: Nathanaël Fijalkow <nathanael.fijalkow@gmail.com>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: accelerate>=0.26.0
Requires-Dist: datasets>=2.14.0
Requires-Dist: python-chess>=1.999
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: wandb>=0.15.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: eval
Requires-Dist: stockfish>=3.28.0; extra == "eval"
# Chess Challenge
Train a 1M parameter LLM to play chess!
## Objective
Design and train a transformer-based language model to predict chess moves. Your model must:
1. **Stay under 1M parameters** - This is the hard constraint!
2. **Use a custom tokenizer** - Design an efficient move-level tokenizer
3. **Play legal chess** - The model should learn to generate valid moves
4. **Beat Stockfish** - Your Elo rating will be measured against Stockfish Level 1
## Dataset
We use the Lichess dataset: [`dlouapre/lichess_2025-01_1M`](https://huggingface.co/datasets/dlouapre/lichess_2025-01_1M)
The dataset uses an extended UCI notation:
- `W`/`B` prefix for White/Black
- Piece letter: `P`=Pawn, `N`=Knight, `B`=Bishop, `R`=Rook, `Q`=Queen, `K`=King
- Source and destination squares (e.g., `e2e4`)
- Special suffixes: `(x)`=capture, `(+)`=check, `(+*)`=checkmate, `(o)`/`(O)`=castling
Example game:
```
WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x) ...
```
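The extended tokens strip down to standard UCI by removing the color/piece prefix and the parenthesized suffix. A minimal parsing sketch (assumption: tokens follow exactly the pattern described above; promotion encoding is not shown in the example, so it is not handled here):

```python
import re

# One token = color + piece + source square + destination square + an
# optional suffix such as (x), (+), (+*), (o), (O).
TOKEN_RE = re.compile(r"^([WB])([PNBRQK])([a-h][1-8])([a-h][1-8])(\([^)]*\))?$")

def to_uci(token: str) -> str:
    """Strip the color/piece prefix and any suffix, leaving plain UCI."""
    m = TOKEN_RE.match(token)
    if m is None:
        raise ValueError(f"unrecognized token: {token!r}")
    _color, _piece, src, dst, _suffix = m.groups()
    return src + dst

game = "WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x)"
print([to_uci(t) for t in game.split()])
# → ['e2e4', 'e7e5', 'g1f3', 'b8c6', 'f1b5', 'a7a6', 'b5c6', 'd7c6']
```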
## Quick Start
### Installation
```bash
# Clone the template
git clone https://github.com/nathanael-fijalkow/ChessChallengeTemplate.git
cd ChessChallengeTemplate
# Create virtual environment
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
uv pip install -e .
```
### Train a Model
```bash
# Basic training
python -m src.train \
    --output_dir ./my_model \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32
```
### Evaluate Your Model
Evaluation happens in two phases:
```bash
# Phase 1: Legal Move Evaluation (quick sanity check)
python -m src.evaluate \
    --model_path ./my_model/final_model \
    --mode legal \
    --n_positions 500

# Phase 2: Win Rate Evaluation (full games against Stockfish)
python -m src.evaluate \
    --model_path ./my_model/final_model \
    --mode winrate \
    --n_games 100 \
    --stockfish_level 1

# Or run both phases:
python -m src.evaluate \
    --model_path ./my_model/final_model \
    --mode both
```
## Parameter Budget
Use the utility function to check your budget:
```python
from src import ChessConfig, print_parameter_budget
config = ChessConfig(
    vocab_size=1200,
    n_embd=128,
    n_layer=4,
    n_head=4,
)
print_parameter_budget(config)
```
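The budget can also be sanity-checked by hand. A back-of-envelope count for a GPT-2-style block, assuming weight tying and a context length of 512 (both the layer layout and `block_size` are assumptions here, so treat this as a rough estimate, not the template's `print_parameter_budget`):

```python
def estimate_params(vocab_size=1200, n_embd=128, n_layer=4,
                    n_inner=None, block_size=512, tie_weights=True):
    """Rough GPT-style parameter count (weights + biases)."""
    n_inner = n_inner or 3 * n_embd
    emb = vocab_size * n_embd + block_size * n_embd   # token + position embeddings
    attn = 4 * n_embd * n_embd + 4 * n_embd           # qkv (3x) + output projection
    mlp = 2 * n_embd * n_inner + n_inner + n_embd     # two linear layers
    ln = 2 * (2 * n_embd)                             # two layernorms per block
    head = 0 if tie_weights else vocab_size * n_embd  # tied head costs nothing extra
    final_ln = 2 * n_embd
    return emb + n_layer * (attn + mlp + ln) + final_ln + head

print(estimate_params())  # → 880896, comfortably under the 1M budget
```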
### Pro Tips
1. **Weight Tying**: The default config ties the embedding and output layer weights, saving ~154k parameters
2. **Vocabulary Size**: Keep it small! ~1200 tokens covers all moves
3. **Depth vs Width**: With limited parameters, experiment with shallow-but-wide vs deep-but-narrow
## Customization
### Custom Tokenizer
The template provides a move-level tokenizer that builds its vocabulary from the dataset itself.
Feel free to try different approaches!
### Custom Architecture
Modify the model in `src/model.py`:
```python
from src import ChessConfig, ChessForCausalLM
# Customize configuration
config = ChessConfig(
    vocab_size=1200,
    n_embd=128,      # Try 96, 128, or 192
    n_layer=4,       # Try 3, 4, or 6
    n_head=4,        # Try 4 or 8
    n_inner=384,     # Feed-forward dimension (default: 3*n_embd)
    dropout=0.1,
    tie_weights=True,
)
model = ChessForCausalLM(config)
```
## Evaluation Metrics
### Phase 1: Legal Move Evaluation
Tests if your model generates valid chess moves:
| Metric | Description |
|--------|-------------|
| **Legal Rate (1st try)** | % of legal moves on first attempt |
| **Legal Rate (with retry)** | % of legal moves within 3 attempts |
> **Target**: >90% legal rate before proceeding to Phase 2
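The "with retry" metric amounts to sampling up to three candidates and accepting the first legal one. A minimal sketch of that loop (the `propose` callback is a hypothetical stand-in for model sampling, and legal moves are shown as plain UCI strings rather than the evaluator's actual representation):

```python
def pick_legal_move(legal_moves, propose, max_attempts=3):
    """Sample up to max_attempts candidate moves; accept the first legal one.

    Returns (move, attempts_used), or (None, max_attempts) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = propose()
        if candidate in legal_moves:
            return candidate, attempt
    return None, max_attempts

legal = {"e2e4", "d2d4", "g1f3"}
proposals = iter(["e9e9", "d2d4"])  # first try illegal, second legal
move, tries = pick_legal_move(legal, lambda: next(proposals))
print(move, tries)  # → d2d4 2
```

A move that only succeeds on the second or third attempt counts toward the retry rate but not the first-try rate, which is why the two numbers can differ substantially.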
### Phase 2: Win Rate Evaluation
Full games against Stockfish to measure playing strength:
| Metric | Description |
|--------|-------------|
| **Win Rate** | % of games won against Stockfish |
| **ELO Rating** | Estimated rating based on game results |
| **Avg Game Length** | Average number of moves per game |
| **Illegal Move Rate** | % of illegal moves during games |
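The Elo estimate follows from inverting the standard expected-score formula: an average score of 0.5 per game implies equal strength, while 0.75 implies roughly a 191-point advantage. A sketch of that inversion (whether the evaluation script anchors the result to a fixed rating for Stockfish Level 1 is an assumption not stated here):

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by an average score per game
    (win = 1, draw = 0.5, loss = 0)."""
    score = min(max(score, 1e-6), 1 - 1e-6)  # clamp to avoid log of 0
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_diff_from_score(0.5)))   # → 0 (equal strength)
print(round(elo_diff_from_score(0.75)))  # → 191
```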
## Submission
1. Train your model
2. Log in to Hugging Face: `hf auth login`
3. Submit your model using the submission script:
```bash
python submit.py --model_path ./my_model/final_model --model_name your-model-name
```
The script will:
- Upload your model to the LLM-course organization
- Include your HF username in the model card for tracking