---
license: mit
tags:
- chess
- transformer
- policy-value
datasets:
- avewright/chess-positions-lichess-sf
---

# ChessTransformer200M

A 204M-parameter chess-native transformer trained on Stockfish-labeled positions.

## Architecture
- **Encoder**: FusedBoardEncoder (256d), with learned piece-color + square + context embeddings
- **Backbone**: 16-layer Transformer (1024d, 16 heads, FFN 4096, GELU, pre-norm via `norm_first`)
- **Policy Head**: SpatialPolicyHead (from×to square features, 512d)
- **Value Head**: WDL (win/draw/loss) classification

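As a rough sanity check on the 204M figure, the backbone's parameter count can be estimated from the dimensions listed above. The sketch below assumes a standard encoder layer with biases, laid out like PyTorch's `nn.TransformerEncoderLayer`; the function name is illustrative and the actual implementation may differ.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameters in one encoder layer, assuming the nn.TransformerEncoderLayer layout."""
    # Q, K, V and output projections, each d_model x d_model plus a bias.
    attention = 4 * d_model * d_model + 4 * d_model
    # Two linear layers (d_model -> d_ff -> d_model), each with a bias.
    feed_forward = 2 * d_model * d_ff + d_ff + d_model
    # Two LayerNorms, each with a weight and a bias vector.
    layer_norms = 2 * 2 * d_model
    return attention + feed_forward + layer_norms


# Values taken from the Architecture list; the head count does not change the total.
D_MODEL, D_FF, N_LAYERS = 1024, 4096, 16

per_layer = encoder_layer_params(D_MODEL, D_FF)
backbone = N_LAYERS * per_layer
print(f"per layer: {per_layer:,}")  # 12,596,224
print(f"backbone:  {backbone:,}")   # 201,539,584
```

Under these assumptions the backbone alone accounts for roughly 201.5M parameters, which would leave a few million for the encoder and the two heads.
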
## Training
- **Dataset**: avewright/chess-positions-lichess-sf (about 10.2M of the 48M available positions seen)
- **Steps**: 10,000 optimizer steps (effective batch size 1024)
- **Final Policy Loss**: ~2.5 (estimated from the loss curve)
- **Top-1 Accuracy**: 18.4% (agreement with Stockfish's best move on 5K eval positions)
- **GPU**: NVIDIA A40 46GB, FP16 + `torch.compile`
- **Training time**: ~6 hours to reach step 10,000

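The figures above are related by simple arithmetic. The short sketch below recomputes the number of positions seen, the fraction of one epoch covered, and an approximate throughput; the constant names are illustrative, and the dataset size and training time are the rounded values quoted in the list.

```python
STEPS = 10_000             # optimizer steps
EFFECTIVE_BATCH = 1024     # positions per optimizer step
DATASET_SIZE = 48_000_000  # approximate number of available positions
TRAIN_HOURS = 6            # approximate wall-clock time to step 10,000

positions_seen = STEPS * EFFECTIVE_BATCH
epoch_fraction = positions_seen / DATASET_SIZE
positions_per_second = positions_seen / (TRAIN_HOURS * 3600)

print(f"positions seen: {positions_seen:,}")                   # 10,240,000
print(f"epoch fraction: {epoch_fraction:.1%}")                 # 21.3%
print(f"throughput: ~{positions_per_second:.0f} positions/s")  # ~474
```
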
## Usage

```python
import chess
import torch

# The `play` module must be importable from the current directory or PYTHONPATH.
from play import ChessTransformer200M, load_model, encode_board, get_model_move

# Load the weights once and reuse the same device for inference.
device = torch.device("cpu")
model = load_model("best_model.pt", device)

# Ask the model for a move from the starting position.
board = chess.Board()
move, info = get_model_move(model, board, device)
print(f"Best move: {move.uci()}, Top 5: {info['top_moves']}")
```

## Files
- `best_model.pt`: model weights only (816 MB)
- `training_log.json`: loss curve data
- `config.json`: architecture config

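The 816 MB checkpoint size is consistent with about 204M parameters stored as 32-bit floats. The storage dtype is an inference from the numbers rather than something stated here, and the sketch below takes 1 MB as 10^6 bytes and ignores any file-format overhead; the helper name is illustrative.

```python
def checkpoint_size_mb(n_params: int, bytes_per_param: int) -> float:
    """Approximate size of a weights-only checkpoint in MB (10^6 bytes)."""
    return n_params * bytes_per_param / 1e6


N_PARAMS = 204_000_000  # approximate parameter count from the model description

for name, width in {"fp32": 4, "fp16": 2}.items():
    print(f"{name}: ~{checkpoint_size_mb(N_PARAMS, width):.0f} MB")
# fp32: ~816 MB
# fp16: ~408 MB
```
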
## Known Issues
- Training diverged with FP16 NaNs at around step 13,800, so the best checkpoint is from step 10,000.
- The model has seen only ~21% of one epoch of the 48M-position subset.
- As White, it opens with 1.d4. It plays reasonable chess but is still early in training.

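One common mitigation for this kind of FP16 divergence is to skip any update whose loss or gradient norm is non-finite. Nothing here indicates whether the original training run did this, so the following is only a framework-free sketch with simulated values and a hypothetical helper name. In PyTorch, `torch.cuda.amp.GradScaler` performs a comparable check on the gradients and skips the optimizer step when it finds inf or NaN.

```python
import math


def should_apply_step(loss: float, grad_norm: float) -> bool:
    """Return True only when both the loss and the gradient norm are finite."""
    return math.isfinite(loss) and math.isfinite(grad_norm)


# Simulated (loss, grad_norm) pairs; the third one mimics an FP16 overflow.
history = [(2.61, 0.9), (2.58, 1.1), (float("nan"), float("inf")), (2.57, 1.0)]

applied, skipped = 0, 0
for loss, grad_norm in history:
    if should_apply_step(loss, grad_norm):
        applied += 1  # a real loop would call optimizer.step() here
    else:
        skipped += 1  # drop this update instead of writing NaNs into the weights

print(applied, skipped)  # 3 1
```

A guard like this only prevents a bad batch from corrupting the weights; it does not address the underlying cause of the overflow.
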