Commit 56dec42 by kaupane (verified) · 1 Parent(s): 2779e76

Push model using huggingface_hub.

Files changed (2)
  1. README.md +6 -165
  2. model.safetensors +1 -1
README.md CHANGED
@@ -1,169 +1,10 @@
  ---
- license: mit
  tags:
- - chess
- - transformer
- - reinforcement-learning
- - game-playing
- library_name: pytorch
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
  ---

- # ChessFormer-SL
-
- ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. This model explores training chess engines without Monte Carlo Tree Search (MCTS), using only neural networks.
-
- ## Model Description
-
- - **Model type**: Transformer for chess position evaluation and move prediction
- - **Language(s)**: Chess (FEN notation)
- - **License**: MIT
- - **Parameters**: 100.7M
-
- ## Architecture
-
- ChessFormer uses a custom transformer architecture optimized for chess:
-
- - **Blocks**: 20 transformer layers
- - **Hidden size**: 640
- - **Attention heads**: 8
- - **Intermediate size**: 1728
- - **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer
-
- ### Input Format
-
- The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:
-
- - 64 board square tokens (pieces + positional embeddings)
- - 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
- - 2 special tokens (action, value)
-
- ### Output Format
-
- - **Policy head**: Logits over 1,969 structurally valid chess moves
- - **Value head**: Position evaluation from current player's perspective
-
- ## Training Details
-
- ### Training Data
-
- - **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- - **Size**: 56M positions with Stockfish evaluations
- - **Validation**: depth27 split
-
- ### Training Procedure
-
- - **Method**: Supervised learning on Stockfish move recommendations and evaluations
- - **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty
- - **Hardware**: RTX 4060Ti 16GB
- - **Duration**: ~2 weeks
- - **Checkpoints**: 20 total, this model is the final checkpoint
-
- ### Training Metrics
-
- - **Action Loss**: /
- - **Value Loss**: /
- - **Invalid Loss**: /
-
- ## Performance
-
- ### Capabilities
-
- - ✅ Reasonable opening and endgame play
- - ✅ Fast inference without search
- - ✅ Better than next-token prediction chess models
- - ✅ Can defeat Stockfish occasionally with search enhancement
-
- ### Limitations
-
- - ❌ Frequent tactical blunders in midgame
- - ❌ Estimated ELO ~1500 (informal assessment)
- - ❌ Struggles with complex tactical combinations
- - ❌ Tends to give away pieces ("free captures")
-
- ## Usage
-
- ### Installation
-
- ```bash
- pip install torch transformers huggingface_hub chess
- # Download model.py from this repository
- ```
-
- ### Basic Usage
-
- ```python
- import torch
- from model import ChessFormerModel
-
- # Load model
- model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
- model.eval()
-
- # Analyze position
- fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
- repetitions = torch.tensor([1])
-
- with torch.no_grad():
-     move_logits, position_value = model(fens, repetitions)
-
- # Get best move (requires additional processing for legal moves)
- print(f"Position value: {position_value.item():.3f}")
- ```
-
- ### With Chess Engine Interface
-
- ```python
- from engine import Engine, ChessformerConfig
- import chess
-
- # Create engine
- config = ChessformerConfig(
-     chessformer=model,
-     temperature=0.5,
-     depth=2 # Enable search enhancement
- )
- engine = Engine(type="chessformer", chessformer_config=config)
-
- # Play move
- board = chess.Board()
- move_uci, value = engine.move(board)
- print(f"Suggested move: {move_uci}, Value: {value:.3f}")
- ```
-
- ## Limitations and Bias
-
- ### Technical Limitations
-
- - **Tactical weakness**: Prone to hanging pieces and missing simple tactics
- - **Computational inefficiency**: FEN tokenization creates training bottlenecks, preprocess the entire dataset before training should be benefical
-
- ### Potential Biases
-
- - Trained exclusively on Stockfish evaluations, may inherit engine biases
- - May not generalize to unconventional openings or endgames
-
- ### Known Issues
-
- - Piece embeddings have consistently lower norms than positional embeddings
- - Model sometimes assigns probability (though unlikely, ~3%) to invalid moves despite training penalty
- - Performance degrades without search enhancement
-
- ## Ethical Considerations
-
- This model is intended for:
-
- - ✅ Educational purposes and chess learning
- - ✅ Research into neural chess architectures
- - ✅ Developing chess training tools
-
- Not recommended for:
-
- - ❌ Competitive chess tournaments
- - ❌ Production chess engines without extensive testing
- - ❌ Applications requiring reliable tactical calculation
-
- ## Additional Information
-
- - **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
- - **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
- - **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)
+ This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+ - Code: [More Information Needed]
+ - Paper: [More Information Needed]
+ - Docs: [More Information Needed]
 
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:07655925064152222a4958baf270adfbe1e31b7eb1baa35e03db75bed1238714
+ oid sha256:fb0cb41c159c82f1c8cf5fb9747bce35c639940dc4f78a999d7060d451450e48
  size 402931432
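
The `model.safetensors` entry above is a Git LFS pointer, not the weights themselves: the `oid` changed while `size` stayed at 402931432 bytes, which suggests the weights were replaced by a file of identical size but different content. A downloaded copy can be checked against such a pointer with a short script. The sketch below uses only the Python standard library; `parse_lfs_pointer` and `matches_pointer` are illustrative helper names, not part of Git LFS or `huggingface_hub`.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash-algorithm>:<hex-digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# The new pointer recorded by this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fb0cb41c159c82f1c8cf5fb9747bce35c639940dc4f78a999d7060d451450e48
size 402931432
"""
pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local download should return `True` only if the file corresponds to this commit's weights and not the parent's.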