kaupane committed (verified)
Commit d50f141 · 1 Parent(s): 4dd6f0e

Update README.md

Files changed (1): README.md (+160 −10)
---
license: mit
tags:
- chess
- transformer
- reinforcement-learning
- game-playing
- research
library_name: pytorch
---

# ChessFormer-RL

ChessFormer-RL is an experimental checkpoint from training chess models with reinforcement learning. **Note**: this model is actually the 8th supervised-learning checkpoint (49,152 steps), which was intended as the initialization for RL training; the full RL training run itself encountered instabilities and was not completed.

## Model Description

- **Model type**: Transformer for chess (RL training initialization)
- **Language(s)**: Chess (FEN notation)
- **License**: MIT
- **Parameters**: 100.7M

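Since the model consumes positions in FEN notation, a quick look at the format's structure may help (standard starting position; plain Python, independent of the model code):

```python
# FEN encodes a position as six space-separated fields:
# piece placement, side to move, castling rights, en-passant square,
# halfmove clock, and fullmove number.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

fields = start.split(" ")
print(len(fields))              # 6 fields
print(fields[1])                # "w": White to move

ranks = fields[0].split("/")    # piece placement, one chunk per rank
print(len(ranks))               # 8 ranks
```

The custom FEN tokenizer mentioned below operates on strings of this shape.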
## Important Notice

⚠️ **This model represents a research checkpoint rather than a completed RL-trained model.** The reinforcement learning training encountered:

- Gradient norm explosion
- Noisy reward signals
- Performance degradation from this initialization point

This checkpoint is provided for researchers interested in:

- RL training initialization strategies
- Comparative analysis with the final SL model
- Continuing RL experiments with improved methods

## Architecture

Identical to ChessFormer-SL:

- **Blocks**: 20 transformer layers
- **Hidden size**: 640
- **Attention heads**: 8
- **Intermediate size**: 1728
- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer

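As a sanity check on the stated 100.7M parameters, here is back-of-envelope arithmetic from the numbers above. It assumes bias-free, untied Q/K/V/O attention projections and a three-matrix SwiGLU feed-forward block, and ignores embeddings, norms, and output heads (none of which is verified against `model.py`):

```python
# Rough per-block parameter count for the architecture above.
# Assumptions (not verified): no biases, four separate attention
# projections, three SwiGLU feed-forward matrices.
d_model, d_ff, n_blocks = 640, 1728, 20

attn_params = 4 * d_model * d_model   # Q, K, V, O projections
ffn_params = 3 * d_model * d_ff       # gate, up, and down matrices
per_block = attn_params + ffn_params

total = n_blocks * per_block
print(per_block)   # 4956160
print(total)       # 99123200
```

Twenty such blocks account for roughly 99.1M parameters; the remaining ~1.6M of the stated 100.7M is consistent with embeddings, norms, and heads.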
## Training Details

### Phase 1: Supervised Learning (This Checkpoint)

- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
- **Training**: 49,152 steps of supervised learning on Stockfish evaluations
- **Purpose**: Initialization for subsequent RL training

### Phase 2: Reinforcement Learning (Attempted)

- **Method**: Self-play with Proximal Policy Optimization (PPO)
- **Environment**: Batch chess environment with sparse terminal rewards
- **Outcome**: Training instabilities led to performance degradation
- **Current Status**: Requires further research and improved RL methodology

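To see why sparse terminal rewards produce a noisy learning signal, consider a minimal, illustrative return computation (not the repository's actual training code): only the final ply of a game carries reward, so early moves receive a heavily discounted, high-variance signal.

```python
# Discounted returns for a game rewarded only at the terminal ply.
def discounted_returns(rewards, gamma=0.99):
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

# A 60-ply game ending in a win: a single +1 at the very end.
rewards = [0.0] * 59 + [1.0]
returns = discounted_returns(rewards)
print(returns[-1])             # 1.0 at the terminal step
print(round(returns[0], 3))    # 0.553: the opening move's signal
```

Every intermediate position's credit assignment depends entirely on one terminal outcome, which is what makes the signal noisy in self-play.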
### Training Metrics (This Checkpoint)

- **Action Loss**: 1.8329
- **Value Loss**: 0.0501
- **Invalid Loss**: 0.0484

## Performance

As an intermediate SL checkpoint, this model:

- Shows capabilities similar to early ChessFormer-SL training
- Is less refined than the final SL model
- Is suitable for RL initialization experiments

### Comparison with ChessFormer-SL

| Metric | ChessFormer-RL (8th ckpt) | ChessFormer-SL (20th ckpt) |
|--------|---------------------------|----------------------------|
| Action Loss | 1.8329 | / |
| Value Loss | 0.0501 | / |
| Invalid Loss | 0.0484 | / |

## Research Context

### RL Training Challenges Encountered

1. **Gradient Instability**: Explosive gradient norms during PPO updates
2. **Sparse Rewards**: Terminal-only rewards created noisy learning signals
3. **Action Space Complexity**: 1,969 possible moves created exploration challenges
4. **Self-Play Dynamics**: Unstable opponent strength during training

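The standard mitigation for challenge 1 is global gradient-norm clipping (in PyTorch, `torch.nn.utils.clip_grad_norm_`). A dependency-free sketch of the idea, not the project's actual training loop:

```python
import math

# Clip a set of gradients so their combined (global) L2 norm is at most
# max_norm. The returned pre-clip norm is the quantity that grows
# without bound during "gradient norm explosion".
def clip_global_norm(grads, max_norm):
    total = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total > max_norm:
        scale = max_norm / (total + 1e-6)
        grads = [[g * scale for g in tensor] for tensor in grads]
    return grads, total

grads = [[3.0, 4.0], [12.0]]    # global norm: sqrt(9 + 16 + 144) = 13
clipped, norm = clip_global_norm(grads, max_norm=1.0)
print(norm)                     # 13.0 before clipping
```

Monitoring the pre-clip norm over training is a cheap way to detect the instability described above before it degrades the policy.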
## Usage

### Installation

```bash
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
```

### Loading the Model

```python
import torch
from model import ChessFormerModel  # model.py from this repository

# Load the checkpoint from the Hub
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-RL")
model.eval()

# This is an intermediate checkpoint - performance will be lower than ChessFormer-SL
```

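Since the policy head covers all 1,969 possible moves but only a handful are legal in any given position, decoding typically masks illegal moves before sampling. A self-contained sketch of that step (the function and mask names are hypothetical, not the actual ChessFormer API):

```python
import math
import random

def sample_legal_move(logits, legal_mask, temperature=1.0):
    # Mask illegal moves with -inf so they get zero probability.
    masked = [l / temperature if ok else float("-inf")
              for l, ok in zip(logits, legal_mask)]
    m = max(masked)                       # subtract max for stability
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [random.gauss(0.0, 1.0) for _ in range(1969)]
legal = [i < 30 for i in range(1969)]     # pretend the first 30 moves are legal
move = sample_legal_move(logits, legal)
print(move)                               # always an index below 30 here
```

The model's "Invalid Loss" metric above suggests move validity is also learned, but masking at decode time guarantees legality regardless.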
### For RL Research

```python
# This checkpoint can serve as initialization for RL experiments
from train_rl import RLTrainer

# Load the checkpoint for RL training continuation
trainer = RLTrainer(
    model=model,
    # ... other hyperparameters
)
trainer.resume("path/to/checkpoint", from_sl_checkpoint=True)
```

## Limitations

### Technical Limitations

- **Incomplete Training**: Represents an intermediate rather than a final model
- **RL Instabilities**: Subsequent RL training was unsuccessful
- **Performance**: Lower quality than the final ChessFormer-SL checkpoint

### Research Limitations

- Demonstrates challenges rather than solutions for chess RL
- Requires significant additional work for competitive performance
- Not suitable for production use

## Intended Use

This model is specifically intended for:

- ✅ RL research and experimentation
- ✅ Studying initialization strategies for chess RL
- ✅ Comparative analysis of SL vs. RL training trajectories
- ✅ Educational purposes in understanding RL challenges

**Not intended for:**

- ❌ Practical chess-playing applications
- ❌ Production chess engines
- ❌ Competitive chess analysis

## Additional Information

- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
- **Related**: [ChessFormer-SL](https://huggingface.co/kaupane/ChessFormer-SL) (completed SL training)

*This model represents ongoing research into chess RL training. While the full RL run was unsuccessful, this checkpoint may serve as a starting point for future research directions.*