kaupane
/

ChessFormer-SL

@@ -1,10 +1,169 @@
 ---
 tags:
-- model_hub_mixin
-- pytorch_model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: [More Information Needed]
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]

 ---
+license: mit
 tags:
+- chess
+- transformer
+- reinforcement-learning
+- game-playing
+library_name: pytorch
 ---
+# ChessFormer-SL
+ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. This model explores training chess engines without Monte Carlo Tree Search (MCTS), using only neural networks.
+## Model Description
+- **Model type**: Transformer for chess position evaluation and move prediction
+- **Language(s)**: Chess (FEN notation)
+- **License**: MIT
+- **Parameters**: 100.7M
+## Architecture
+ChessFormer uses a custom transformer architecture optimized for chess:
+- **Blocks**: 20 transformer layers
+- **Hidden size**: 640
+- **Attention heads**: 8
+- **Intermediate size**: 1728
+- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer
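As a rough sanity check on the 100.7M figure, the sizes above can be plugged into the usual transformer parameter formulas. This is only a sketch: it assumes four bias-free attention projections and a three-matrix SwiGLU feed-forward per block, takes the policy size from the Output Format section, and ignores norms, embeddings, and the value head, none of which are detailed in this card.

```python
# Back-of-the-envelope parameter count from the sizes in this card.
# Assumptions (not confirmed by the model card): bias-free linear layers,
# 4 attention projections (Q, K, V, output) and 3 SwiGLU matrices per block,
# and a policy head that is a single linear layer.
HIDDEN = 640
INTERMEDIATE = 1728
BLOCKS = 20
POLICY_MOVES = 1969

attention_params = 4 * HIDDEN * HIDDEN        # Q, K, V and output projections
swiglu_params = 3 * HIDDEN * INTERMEDIATE     # gate, up and down projections
block_params = attention_params + swiglu_params
stack_params = BLOCKS * block_params
policy_head_params = HIDDEN * POLICY_MOVES

print(f"per block:        {block_params:,}")
print(f"all blocks:       {stack_params:,}")
print(f"with policy head: {stack_params + policy_head_params:,}")
```

Under these assumptions the blocks and policy head account for roughly 100.4M parameters, which is in the same range as the reported 100.7M.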
+### Input Format
+The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:
+- 64 board square tokens (pieces + positional embeddings)
+- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
+- 2 special tokens (action, value)
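The 64 + 9 + 2 layout above can be illustrated with a small standalone sketch. It is not the repository's tokenizer: the function names, the token strings, and the split of the 9 metadata tokens (1 turn, 4 castling flags, 1 en passant, 2 clocks, 1 repetition count) are assumptions chosen only so the counts add up to 75.

```python
def expand_board(board_field):
    """Expand the FEN piece-placement field into 64 per-square symbols."""
    squares = []
    for rank in board_field.split("/"):
        for ch in rank:
            if ch.isdigit():
                squares.extend(["."] * int(ch))  # a digit is a run of empty squares
            else:
                squares.append(ch)               # a letter is a piece
    if len(squares) != 64:
        raise ValueError(f"expected 64 squares, got {len(squares)}")
    return squares


def fen_to_tokens(fen, repetitions):
    """Hypothetical 75-token layout: 64 squares + 9 metadata + 2 special."""
    board, turn, castling, en_passant, halfmove, fullmove = fen.split()
    metadata = [
        f"turn:{turn}",
        # One flag per castling right (assumed split, 4 tokens).
        *(f"castle:{flag}:{int(flag in castling)}" for flag in "KQkq"),
        f"ep:{en_passant}",
        f"halfmove:{halfmove}",
        f"fullmove:{fullmove}",
        f"rep:{repetitions}",
    ]
    return expand_board(board) + metadata + ["<action>", "<value>"]


tokens = fen_to_tokens(
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", 1
)
print(len(tokens))
```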
+### Output Format
+- **Policy head**: Logits over 1,969 structurally valid chess moves
+- **Value head**: Position evaluation from current player's perspective
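Because the policy head scores every structurally valid move, the logits usually need to be restricted to the moves that are legal in the current position before a move is chosen. The sketch below shows that masking step on a toy three-move vocabulary; the mapping between the 1,969 policy indices and actual moves is defined by the repository's code and is not reproduced here, and `masked_softmax` is a hypothetical helper name.

```python
import math


def masked_softmax(logits, legal):
    """Softmax over logits, with illegal entries forced to probability 0."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, legal)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("no legal moves to choose from")
    # Subtract the peak for numerical stability; exp(-inf) evaluates to 0.0.
    weights = [math.exp(x - peak) for x in masked]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: the second move is illegal in this position.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
best_index = max(range(len(probs)), key=probs.__getitem__)
print(best_index, probs)
```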
+## Training Details
+### Training Data
+- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
+- **Size**: 56M positions with Stockfish evaluations
+- **Validation**: depth27 split
+### Training Procedure
+- **Method**: Supervised learning on Stockfish move recommendations and evaluations
+- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty
+- **Hardware**: RTX 4060Ti 16GB
+- **Duration**: ~2 weeks
+- **Checkpoints**: 20 total; this model is the final checkpoint
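The three-part objective listed above can be sketched for a single position as follows. The exact form and weighting of the invalid move penalty are not stated in this card, so the sketch assumes it is the probability mass placed on illegal moves with a weight of 1.0; `combined_loss` and its parameters are hypothetical names.

```python
import math


def softmax(logits):
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def combined_loss(logits, target_index, legal, value_pred, value_target,
                  invalid_weight=1.0):
    """Cross-entropy on the move + MSE on the value + assumed invalid penalty."""
    probs = softmax(logits)
    action_loss = -math.log(probs[target_index])       # cross-entropy vs. target move
    value_loss = (value_pred - value_target) ** 2      # squared error on the evaluation
    # Assumption: the penalty is the total probability given to illegal moves.
    invalid_loss = sum(p for p, ok in zip(probs, legal) if not ok)
    total = action_loss + value_loss + invalid_weight * invalid_loss
    return total, action_loss, value_loss, invalid_loss


total, action_loss, value_loss, invalid_loss = combined_loss(
    logits=[0.0, 0.0, 0.0, 0.0],
    target_index=0,
    legal=[True, True, False, False],
    value_pred=0.5,
    value_target=0.0,
)
print(total, action_loss, value_loss, invalid_loss)
```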
+### Training Metrics
+- **Action Loss**: 1.6985
+- **Value Loss**: 0.0407
+- **Invalid Loss**: 0.0303
+## Performance
+### Capabilities
+- ✅ Reasonable opening and endgame play
+- ✅ Fast inference without search
+- ✅ Better than next-token prediction chess models
+- ✅ Can defeat Stockfish occasionally with search enhancement
+### Limitations
+- ❌ Frequent tactical blunders in midgame
+- ❌ Estimated Elo ~1500 (informal assessment)
+- ❌ Struggles with complex tactical combinations
+- ❌ Tends to give away pieces ("free captures")
+## Usage
+### Installation
+```bash
+pip install torch transformers huggingface_hub chess
+# Download model.py from this repository
+```
+### Basic Usage
+```python
+import torch
+from model import ChessFormerModel
+# Load model
+model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
+model.eval()
+# Analyze position
+fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
+repetitions = torch.tensor([1])
+with torch.no_grad():
+    move_logits, position_value = model(fens, repetitions)
+# move_logits must still be filtered to legal moves before picking a best move
+# Print the value-head output (from the current player's perspective)
+print(f"Position value: {position_value.item():.3f}")
+```
+### With Chess Engine Interface
+```python
+from engine import Engine, ChessformerConfig
+import chess
+# Create engine (reuses the `model` loaded in Basic Usage)
+config = ChessformerConfig(
+    chessformer=model,
+    temperature=0.5,
+    depth=2  # Enable search enhancement
+)
+engine = Engine(type="chessformer", chessformer_config=config)
+# Play move
+board = chess.Board()
+move_uci, value = engine.move(board)
+print(f"Suggested move: {move_uci}, Value: {value:.3f}")
+```
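The `temperature=0.5` setting in the example above is commonly applied by dividing the logits by the temperature before the softmax, so values below 1 make sampling favor the top-scored moves more strongly. How the `Engine` class applies it is not shown in this card, so the following is a generic sketch with a hypothetical helper name.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Sample an index from softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


# Lower temperature concentrates probability on the highest logit.
_, probs_sharp = sample_with_temperature([2.0, 1.0], 0.5, random.Random(0))
_, probs_plain = sample_with_temperature([2.0, 1.0], 1.0, random.Random(0))
print(probs_sharp, probs_plain)
```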
+## Limitations and Bias
+### Technical Limitations
+- **Tactical weakness**: Prone to hanging pieces and missing simple tactics
+- **Computational inefficiency**: FEN tokenization creates a training bottleneck; preprocessing the entire dataset before training should be beneficial
+### Potential Biases
+- Trained exclusively on Stockfish evaluations, so it may inherit the engine's biases
+- May not generalize to unconventional openings or endgames
+### Known Issues
+- Piece embeddings have consistently lower norms than positional embeddings
+- Model sometimes assigns a small probability (~3%) to invalid moves despite the training penalty
+- Performance degrades without search enhancement
+## Ethical Considerations
+This model is intended for:
+- ✅ Educational purposes and chess learning
+- ✅ Research into neural chess architectures
+- ✅ Developing chess training tools
+Not recommended for:
+- ❌ Competitive chess tournaments
+- ❌ Production chess engines without extensive testing
+- ❌ Applications requiring reliable tactical calculation
+## Additional Information
+- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
+- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
+- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)