|
|
--- |
|
|
license: mit |
|
|
tags: |
|
|
- chess |
|
|
- transformer |
|
|
- reinforcement-learning |
|
|
- game-playing |
|
|
library_name: pytorch |
|
|
--- |
|
|
|
|
|
# ChessFormer-SL |
|
|
|
|
|
ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. It explores training a chess engine without Monte Carlo Tree Search (MCTS), relying on the network's policy and value predictions alone.
|
|
|
|
|
## Model Description |
|
|
|
|
|
- **Model type**: Transformer for chess position evaluation and move prediction |
|
|
- **Language(s)**: Chess (FEN notation) |
|
|
- **License**: MIT |
|
|
- **Parameters**: 100.7M |
|
|
|
|
|
## Architecture |
|
|
|
|
|
ChessFormer uses a custom transformer architecture optimized for chess: |
|
|
|
|
|
- **Blocks**: 20 transformer layers |
|
|
- **Hidden size**: 640 |
|
|
- **Attention heads**: 8 |
|
|
- **Intermediate size**: 1728 |
|
|
- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer |
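For reference, a SwiGLU feed-forward block with the sizes listed above can be sketched as follows. This is an illustrative sketch only: the module names and layout in ChessFormer's actual code are assumptions, not taken from the repository.

```python
import torch
import torch.nn as nn

class SwiGLU(nn.Module):
    """Sketch of a SwiGLU feed-forward: silu(x @ W_gate) * (x @ W_up), then W_down.
    Sizes follow the card (hidden 640, intermediate 1728); names are assumptions."""
    def __init__(self, hidden=640, intermediate=1728):
        super().__init__()
        self.gate = nn.Linear(hidden, intermediate, bias=False)
        self.up = nn.Linear(hidden, intermediate, bias=False)
        self.down = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x):
        # Gated activation: SiLU-gated branch multiplied elementwise by the up branch
        return self.down(nn.functional.silu(self.gate(x)) * self.up(x))
```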
|
|
|
|
|
### Input Format |
|
|
|
|
|
The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing: |
|
|
|
|
|
- 64 board square tokens (pieces + positional embeddings) |
|
|
- 9 metadata tokens (turn, castling, en passant, clocks, repetitions) |
|
|
- 2 special tokens (action, value) |
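As a rough illustration, the 75-token layout above can be sketched in plain Python. The real custom tokenizer's vocabulary, token IDs, and metadata encoding are assumptions here; only the 64 + 9 + 2 shape is taken from this card.

```python
def tokenize_fen(fen: str, repetitions: int = 1) -> list[str]:
    """Sketch of the 75-token input layout: 64 board-square tokens,
    9 metadata tokens, 2 special tokens. The actual tokenizer's
    vocabulary and metadata encoding differ; this only shows the shape."""
    board, turn, castling, en_passant, halfmove, fullmove = fen.split()
    squares = []
    for ch in board.replace("/", ""):
        if ch.isdigit():
            squares.extend(["."] * int(ch))  # run of empty squares
        else:
            squares.append(ch)               # piece letter, e.g. 'K' or 'p'
    meta = [turn, castling, en_passant, halfmove, fullmove, str(repetitions)]
    meta += ["<pad>"] * (9 - len(meta))      # pad metadata slots to 9 tokens
    return squares + meta + ["<action>", "<value>"]

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
tokens = tokenize_fen(start)
```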
|
|
|
|
|
### Output Format |
|
|
|
|
|
- **Policy head**: Logits over 1,969 structurally valid chess moves |
|
|
- **Value head**: Position evaluation from current player's perspective |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split) |
|
|
- **Size**: 56M positions with Stockfish evaluations |
|
|
- **Validation**: depth27 split |
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
- **Method**: Supervised learning on Stockfish move recommendations and evaluations |
|
|
- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty |
|
|
- **Hardware**: RTX 4060Ti 16GB |
|
|
- **Duration**: ~2 weeks |
|
|
- **Checkpoints**: 20 total; this model is the final checkpoint
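The combined objective above can be sketched in plain Python. The relative weighting and the exact form of the invalid-move penalty are assumptions for illustration, not the actual training code:

```python
import math

def chessformer_loss(move_probs, target_idx, value_pred, value_target,
                     valid_mask, invalid_weight=0.1):
    """Sketch of the combined objective: cross-entropy on the target move,
    MSE on the value, plus a penalty on probability mass assigned to
    invalid moves. Weights and penalty form are illustrative assumptions."""
    action_loss = -math.log(move_probs[target_idx])          # cross-entropy term
    value_loss = (value_pred - value_target) ** 2            # MSE term
    invalid_mass = sum(p for p, ok in zip(move_probs, valid_mask) if not ok)
    return action_loss + value_loss + invalid_weight * invalid_mass
```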
|
|
|
|
|
### Training Metrics |
|
|
|
|
|
- **Action Loss**: 1.6985 |
|
|
- **Value Loss**: 0.0407 |
|
|
- **Invalid Loss**: 0.0303 |
|
|
|
|
|
## Performance |
|
|
|
|
|
### Capabilities |
|
|
|
|
|
- ✅ Reasonable opening and endgame play


- ✅ Fast inference without search


- ✅ Stronger than next-token-prediction chess models


- ✅ Can occasionally defeat Stockfish when search enhancement is enabled
|
|
|
|
|
### Limitations |
|
|
|
|
|
- ❌ Frequent tactical blunders in the middlegame


- ❌ Estimated Elo ~1500 (informal assessment)


- ❌ Struggles with complex tactical combinations


- ❌ Tends to give away pieces ("free captures")
|
|
|
|
|
## Usage |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install torch transformers huggingface_hub chess |
|
|
# Download model.py from this repository |
|
|
``` |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from model import ChessFormerModel |
|
|
|
|
|
# Load model |
|
|
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL") |
|
|
model.eval() |
|
|
|
|
|
# Analyze position |
|
|
fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"] |
|
|
repetitions = torch.tensor([1]) |
|
|
|
|
|
with torch.no_grad(): |
|
|
move_logits, position_value = model(fens, repetitions) |
|
|
|
|
|
# Get best move (requires additional processing for legal moves) |
|
|
print(f"Position value: {position_value.item():.3f}") |
|
|
``` |
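To turn the raw policy logits into a playable move, one option is to mask them to the current board's legal moves and take the argmax. A minimal sketch follows; the `move_to_index` mapping over the 1,969-move action space is an assumed external object, not part of this repository's documented API:

```python
def pick_legal_move(legal_ucis, move_logits, move_to_index):
    """Return the legal move (UCI string) with the highest logit.
    `move_to_index` maps UCI strings to indices in the 1,969-dim
    policy output; here it is an assumed external mapping."""
    best_uci, best_score = None, float("-inf")
    for uci in legal_ucis:
        idx = move_to_index.get(uci)
        if idx is not None and move_logits[idx] > best_score:
            best_uci, best_score = uci, move_logits[idx]
    return best_uci
```

With python-chess, `legal_ucis` can be obtained as `[m.uci() for m in board.legal_moves]`, and `move_logits` as the first row of the policy output (e.g. `move_logits[0].tolist()`).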
|
|
|
|
|
### With Chess Engine Interface |
|
|
|
|
|
```python |
|
|
from engine import Engine, ChessformerConfig |
|
|
import chess |
|
|
|
|
|
# Create engine |
|
|
config = ChessformerConfig( |
|
|
chessformer=model, |
|
|
temperature=0.5, |
|
|
depth=2 # Enable search enhancement |
|
|
) |
|
|
engine = Engine(type="chessformer", chessformer_config=config) |
|
|
|
|
|
# Play move |
|
|
board = chess.Board() |
|
|
move_uci, value = engine.move(board) |
|
|
print(f"Suggested move: {move_uci}, Value: {value:.3f}") |
|
|
``` |
|
|
|
|
|
## Limitations and Bias |
|
|
|
|
|
### Technical Limitations |
|
|
|
|
|
- **Tactical weakness**: Prone to hanging pieces and missing simple tactics |
|
|
- **Computational inefficiency**: on-the-fly FEN tokenization is a training bottleneck; pre-tokenizing the entire dataset before training should be beneficial
|
|
|
|
|
### Potential Biases |
|
|
|
|
|
- Trained exclusively on Stockfish evaluations, so it may inherit the engine's biases
|
|
- May not generalize to unconventional openings or endgames |
|
|
|
|
|
### Known Issues |
|
|
|
|
|
- Piece embeddings have consistently lower norms than positional embeddings |
|
|
- Despite the invalid-move penalty, the model still assigns a small amount of probability mass (~3%) to invalid moves
|
|
- Performance degrades without search enhancement |
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
This model is intended for: |
|
|
|
|
|
- ✅ Educational purposes and chess learning


- ✅ Research into neural chess architectures


- ✅ Developing chess training tools
|
|
|
|
|
Not recommended for: |
|
|
|
|
|
- β Competitive chess tournaments |
|
|
- β Production chess engines without extensive testing |
|
|
- β Applications requiring reliable tactical calculation |
|
|
|
|
|
## Additional Information |
|
|
|
|
|
- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer) |
|
|
- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo) |
|
|
- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment) |