---
license: mit
tags:
- chess
- transformer
- reinforcement-learning
- game-playing
library_name: pytorch
---
# ChessFormer-SL
ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. This model explores training chess engines without Monte Carlo Tree Search (MCTS), using only neural networks.
## Model Description
- **Model type**: Transformer for chess position evaluation and move prediction
- **Language(s)**: Chess (FEN notation)
- **License**: MIT
- **Parameters**: 100.7M
## Architecture
ChessFormer uses a custom transformer architecture optimized for chess:
- **Blocks**: 20 transformer layers
- **Hidden size**: 640
- **Attention heads**: 8
- **Intermediate size**: 1728
- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer
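As a sanity check, the listed dimensions roughly reproduce the stated parameter count. This is a back-of-envelope estimate assuming bias-free projections and a standard three-matrix SwiGLU MLP; embeddings, norms, and the value head account for the remainder:

```python
blocks, hidden, intermediate, moves = 20, 640, 1728, 1969

attn_params = 4 * hidden * hidden        # Q, K, V, and output projections
mlp_params = 3 * hidden * intermediate   # SwiGLU uses three weight matrices
per_block = attn_params + mlp_params     # ~4.96M per transformer block

policy_head = hidden * moves             # logits over the move vocabulary
total = blocks * per_block + policy_head # ~100.4M, close to the stated 100.7M
```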
### Input Format
The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:
- 64 board square tokens (pieces + positional embeddings)
- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
- 2 special tokens (action, value)
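The layout can be illustrated with a simplified tokenizer. The real vocabulary and token IDs live in `model.py`; this sketch only shows how a FEN decomposes into the 75 slots:

```python
def tokenize_fen(fen: str, repetitions: int) -> list[str]:
    """Illustrative 75-token layout (simplified; the shipped tokenizer may differ)."""
    placement, turn, castling, ep, halfmove, fullmove = fen.split()

    # 64 board square tokens, "." for empty squares
    squares = []
    for rank in placement.split("/"):
        for ch in rank:
            squares.extend(["."] * int(ch) if ch.isdigit() else [ch])

    # 9 metadata tokens: turn, 4 castling rights, en passant, clocks, repetitions
    castling_tokens = [c if c in castling else "-" for c in "KQkq"]
    meta = [turn] + castling_tokens + [ep, halfmove, fullmove, str(repetitions)]

    # 2 special tokens read out by the policy and value heads
    return squares + meta + ["<action>", "<value>"]
```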
### Output Format
- **Policy head**: Logits over 1,969 structurally valid chess moves
- **Value head**: Position evaluation from current player's perspective
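At inference time the policy logits are typically renormalized over the legal moves only. A minimal sketch (the shipped `engine.py` may handle this differently):

```python
import math

def masked_policy(logits: list[float], legal_idx: list[int]) -> dict[int, float]:
    """Softmax over the legal subset of move indices only."""
    exps = {i: math.exp(logits[i]) for i in legal_idx}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

logits = [2.0, 0.5, -1.0, 3.0]          # toy 4-move policy head output
probs = masked_policy(logits, [0, 2])   # suppose only moves 0 and 2 are legal
best = max(probs, key=probs.get)        # move 3 is ignored despite its high logit
```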
## Training Details
### Training Data
- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- **Size**: 56M positions with Stockfish evaluations
- **Validation**: depth27 split
### Training Procedure
- **Method**: Supervised learning on Stockfish move recommendations and evaluations
- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty
- **Hardware**: RTX 4060Ti 16GB
- **Duration**: ~2 weeks
- **Checkpoints**: 20 total; this model is the final checkpoint
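The objective above can be sketched as follows. The actual loss weighting in the training code is not documented here, so equal weights are an assumption:

```python
import math

def combined_loss(logits, target_idx, pred_value, target_value, legal_idx):
    """Sketch of the SL objective: CE on the target move + MSE on the value
    + probability mass leaked onto structurally valid but illegal moves."""
    z = sum(math.exp(l) for l in logits)
    probs = [math.exp(l) / z for l in logits]

    action_loss = -math.log(probs[target_idx])                       # cross-entropy
    value_loss = (pred_value - target_value) ** 2                    # MSE
    invalid_loss = sum(p for i, p in enumerate(probs)
                       if i not in legal_idx)                        # invalid penalty
    return action_loss + value_loss + invalid_loss
```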
### Training Metrics
- **Action Loss**: 1.6985
- **Value Loss**: 0.0407
- **Invalid Loss**: 0.0303
## Performance
### Capabilities
- ✅ Reasonable opening and endgame play
- ✅ Fast inference without search
- ✅ Stronger than chess models trained on next-token prediction
- ✅ Occasionally defeats Stockfish when combined with search enhancement
### Limitations
- ❌ Frequent tactical blunders in midgame
- ❌ Estimated Elo ~1500 (informal assessment)
- ❌ Struggles with complex tactical combinations
- ❌ Tends to give away pieces ("free captures")
## Usage
### Installation
```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```
### Basic Usage
```python
import torch
from model import ChessFormerModel
# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
model.eval()
# Analyze position
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
repetitions = torch.tensor([1])
with torch.no_grad():
    move_logits, position_value = model(fens, repetitions)
# Get best move (requires additional processing for legal moves)
print(f"Position value: {position_value.item():.3f}")
```
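The legal-move processing noted above can be filled in with `python-chess`. The `move_to_index` mapping below is hypothetical; the real UCI-to-logit index table ships with the repository's model/engine code:

```python
import chess

def pick_legal_move(board: chess.Board, move_logits, move_to_index) -> chess.Move:
    """Return the legal move with the highest policy logit (sketch)."""
    best, best_score = None, float("-inf")
    for move in board.legal_moves:
        idx = move_to_index.get(move.uci())
        if idx is not None and move_logits[idx] > best_score:
            best, best_score = move, move_logits[idx]
    return best

# Toy demonstration with a made-up index mapping over the starting position
board = chess.Board()
move_to_index = {m.uci(): i for i, m in enumerate(board.legal_moves)}
logits = [0.0] * len(move_to_index)
logits[move_to_index["e2e4"]] = 5.0
chosen = pick_legal_move(board, logits, move_to_index)  # picks e2e4
```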
### With Chess Engine Interface
```python
from engine import Engine, ChessformerConfig
import chess
# Create engine
config = ChessformerConfig(
    chessformer=model,
    temperature=0.5,
    depth=2,  # enable search enhancement
)
engine = Engine(type="chessformer", chessformer_config=config)
# Play move
board = chess.Board()
move_uci, value = engine.move(board)
print(f"Suggested move: {move_uci}, Value: {value:.3f}")
```
## Limitations and Bias
### Technical Limitations
- **Tactical weakness**: Prone to hanging pieces and missing simple tactics
- **Computational inefficiency**: FEN tokenization creates a training bottleneck; preprocessing the entire dataset before training should be beneficial
### Potential Biases
- Trained exclusively on Stockfish evaluations, may inherit engine biases
- May not generalize to unconventional openings or endgames
### Known Issues
- Piece embeddings have consistently lower norms than positional embeddings
- Model sometimes assigns small probability mass (~3%) to invalid moves despite the training penalty
- Performance degrades without search enhancement
## Ethical Considerations
This model is intended for:
- ✅ Educational purposes and chess learning
- ✅ Research into neural chess architectures
- ✅ Developing chess training tools
Not recommended for:
- ❌ Competitive chess tournaments
- ❌ Production chess engines without extensive testing
- ❌ Applications requiring reliable tactical calculation
## Additional Information
- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)