Lespleiades committed on
Commit 4c53197 · verified · 1 Parent(s): 577f58b

Update README.md

Files changed (1):
  1. README.md +69 -76
README.md CHANGED
@@ -8,117 +8,111 @@ tags:
  - ResNet
  ---

- # **GChess**

- ## **Model Description:**
- GChess model is a powerful deep neural network designed specifically for the game of chess. Its architecture is heavily inspired by the principles of AlphaZero, utilizing a single neural network to simultaneously predict the optimal move and evaluate the position.

- ## **Architecture Details:**
- The core of the network is a Deep Residual Network (ResNet), a structure well-suited for processing the spatial data of an 8x8 chessboard.

- ## **Torso:**
- The network employs a robust torso composed of 20 Residual Blocks. Each block contains convolutional layers with skip connections, allowing for the effective learning of deep, hierarchical features and maintaining stable training.

- ## **Feature Processing:**
- The entire network processes data using a high number of channels, specifically 512 filters in its main convolutional layers.
-
- * **Input Representation:** The current board state and history are encoded into a multi-plane tensor with 128 input channels, which typically includes information about piece locations, the player to move, castling rights, and repetition history, a common input format for state-of-the-art chess AI.
-
- ## **Dual Output Heads:**
- The shared ResNet torso branches into two specialized heads:
-
- * **Policy Head (p_logits):** Predicts a probability distribution over the 4672 possible moves (actions) that can be taken. This output is crucial for guiding the Monte Carlo Tree Search (MCTS).
-
- * **Value Head (v):** Outputs a single scalar value, typically between -1.0 (Black is winning) and +1.0 (White is winning). This score represents the network's prediction of the final game outcome from the current position.
-
- ## **Training:**
- The model is trained on small dataset from high-quality PGN of games. The model is trained on 50.000 games during 50 hours on RTX4060. This model acctually is evaluate around 1250 Elo. Training uses the PyTorch framework with advanced optimization techniques, including a OneCycleLR learning rate scheduler for accelerated convergence and a large batch size of 1024.
-
- ## **Metrics & Training Loss Analysis**
-
- * **Training Loss Curve:**
- ![Training Loss](training_loss.png)
- The graph shows a very sharp initial drop followed by a smooth, gradual decline before stabilizing at a low point.
-
- ## **Interpretation:**
- The rapid initial drop signifies highly efficient learning of fundamental chess concepts. The smooth convergence indicates stable training with no major signs of oscillation or instability.
-
- * **Key Metrics:** The deep residual network (20 blocks) optimizes a combined total loss from two primary components:
-
- * **Policy Loss:** Measures the accuracy of the model's move predictions (the most crucial metric for move quality).
-
- * **Value Loss:** Measures how accurately the model evaluates the position (the score, ranging from -1 to 1).
-
- ## **Training Efficiency:**
-
- The efficient convergence is largely due to the use of the OneCycleLR learning rate scheduler, which accelerates training by strategically cycling the learning rate up to a high maximum value before annealing it (cooling it down).
-
- ## **Detailed Loss Convergence:**
- ![Detailed Loss convergences](detailed_training_loss.png)
-
- ## **Key Observations**
- Rapid Initial Drop: The loss shows an immediate, steep decline, indicating the model quickly learned fundamental concepts and patterns. This is a great sign of an effective learning setup.

- * **Wider Fluctuations:** Compared to a smoother curve, this detailed view reveals more short-term oscillations (ups and downs) in the loss, particularly around the 50,000 to 100,000 steps mark.

- * **Interpretation:** These fluctuations are common, especially when using an aggressive learning rate schedule like OneCycleLR, as seen in our Training.py. The high learning rate peaks allow the model to escape shallow minimums but also cause the loss to momentarily rise.

- * **Consistent Convergence:** Despite the fluctuations, the overall long-term trend is clearly downward. The loss is consistently driven lower, stabilizing at a much lower point towards the end of the shown steps.

- * **Stable Final Phase:** In the later steps, the magnitude of the fluctuations seems to decrease, and the loss settles into a low, stable range, suggesting the model has largely converged.

- ![Accuracy Evaluations](accuracy_evals.png)

- ## **Accuracy Evaluations**

- The detailed curve confirms a stable and aggressive training process. The fluctuations are expected with our dynamic learning rate strategy but are offset by the continuous decrease in overall loss, indicating successful learning and convergence of our chess network.

- ## **Key Observations**

- * **Positive Progress:** Both Top 1 Accuracy and Top 5 Accuracy show a clear and consistent upward trend across the training steps. This is the most crucial takeaway: the network is successfully learning to predict moves and is not overfitting to the limited data.

- * **Top 5 Strength:** The Top 5 Accuracy is considerably higher, indicating that the correct expert move is frequently included in the model's top five choices. This is highly promising.

- ## **Conclusions**

- * **Current Performance:** The model has achieved an estimated 1300 Elo rating. While this is not yet grandmaster level, it represents a respectable baseline performance, particularly for a policy network without Monte Carlo Tree Search (MCTS) enhancement.

- * **Resource Constraints:** This 1300 Elo was reached using a small dataset of only 50,000 PGN games across 25 training epochs. This resource limitation means the model's knowledge depth is restricted, preventing it from tackling the highest-level strategies.

- * **Stable and Efficient Learning:** The loss curves demonstrate stable and aggressive convergence, thanks to the OneCycleLR scheduler. The network effectively maximized the learning potential of the limited data.

- * **Strong Predictive Foundation:** The Accuracy Evaluations show consistent improvement. Crucially, the Top 5 Accuracy is high, confirming that the model reliably generates a small list of strong candidate moves.

- * **Future Potential:** The established architecture is highly potent. The current performance is a strong proof of concept. With future iterations involving a larger, more diverse dataset (e.g., millions of games) and a deeper training run, this model has the necessary structural foundation to climb significantly higher into the expert and master Elo ranges.

- ## **Usage:**

  ```python
-
  import chess
  import torch

  # Define Input State (FEN)
- # Example: The initial position of a game
  fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
  board = chess.Board(fen)

- # Preprocess Input
- # This function converts the board object into the 128-channel input tensor expected by the model
- # (Implementation of board_to_tensor is required separately)
- input_tensor = board_to_tensor(board, history_depth=8).unsqueeze(0).to(DEVICE)

  # Run Inference
  with torch.no_grad():
      # policy_logits is a tensor of size 4672, value_output is a scalar tensor
-     policy_logits, value_output = model(input_tensor)

  # Post-process Output
- # Convert logits to a probability distribution over all possible moves (actions)
  policy_probabilities = F.softmax(policy_logits, dim=1).squeeze(0)

  # Find the move with the highest predicted probability
@@ -131,10 +125,9 @@ expected_value = value_output.item()
  # Print Results
  print(f"FEN: {fen}")
  print(f"--- Model Prediction ---")
- print(f"Move Probability: {best_probability:.4f}")
  print(f"Position Evaluation (Value): {expected_value:.4f}")
- print("\nInterpretation: Value close to +1.0 means White is winning, -1.0 means Black is winning.")
-
  ```

- Developer: Vanhans, PENEAUX Benjamin
 
  - ResNet
  ---

+ # **GChess: A Deep Residual Network for Chess**

+ ## Model Description
+ The **GChess** model is a deep neural network designed for the game of chess, inspired by the **AlphaZero** architecture. It uses a single network to perform both move prediction (Policy) and position evaluation (Value).

+ This release is a **proof-of-concept** version. The model's current estimated playing strength is **~1300 Elo**, placing it at a beginner to intermediate level. It demonstrates a robust foundation for an AlphaZero-style chess AI.

+ ---

+ ## Architecture Details

+ GChess is built on a **Deep Residual Network (ResNet)**, which is highly effective for processing the spatial features of an 8x8 board.

+ ### **Core Network (Torso)**
+ * **Architecture Type:** Deep Residual Network (ResNet).
+ * **Residual Blocks:** **20** blocks, ensuring deep, hierarchical feature learning.
+ * **Filter Count:** **512** convolutional filters (channels) in its main layers for high feature complexity (see the illustrative block sketch below).
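
+ The exact layer layout is not published in this card. Purely as an illustration, a standard two-convolution residual block with 512 filters, as commonly used in AlphaZero-style networks, might look like this:

+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ class ResidualBlock(nn.Module):
+     """Illustrative sketch of one of the 20 torso blocks; the real GChess
+     block layout may differ. Two 3x3 convolutions plus a skip connection."""
+     def __init__(self, channels: int = 512):
+         super().__init__()
+         self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
+         self.bn1 = nn.BatchNorm2d(channels)
+         self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
+         self.bn2 = nn.BatchNorm2d(channels)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         out = F.relu(self.bn1(self.conv1(x)))
+         out = self.bn2(self.conv2(out))
+         return F.relu(out + x)  # the skip connection keeps deep training stable
+ ```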

+ ### **Input Representation**
+ The network accepts a multi-plane tensor encoding the board state and history:
+ * **Input Channels:** **128** input channels.
+ * **Data Included:** Piece locations, the player to move, castling rights, and an **8-ply history** to handle repetition and context (an illustrative encoding sketch follows this list).
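
+ The precise plane layout is not specified in this card. To illustrate the idea only, the twelve piece planes for a single position could be filled as below; the real `board_to_tensor` must stack the history steps and auxiliary planes in exactly the layout used during training, which is assumed here:

+ ```python
+ import chess
+ import torch
+
+ def piece_planes(board: chess.Board) -> torch.Tensor:
+     """Encode one position as 12 binary 8x8 planes (6 piece types x 2 colors).
+     The full 128-channel input additionally stacks history and auxiliary
+     planes (side to move, castling rights, ...) -- layout assumed, not verified."""
+     planes = torch.zeros(12, 8, 8)
+     for square, piece in board.piece_map().items():
+         idx = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
+         planes[idx, square // 8, square % 8] = 1.0
+     return planes
+ ```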

+ ### **Dual Output Heads**
+ The shared ResNet torso branches into two specialized output heads:

+ | Head | Function | Output Format |
+ | :--- | :--- | :--- |
+ | **Policy Head (p\_logits)** | **Move Prediction** | Logits over **4672** possible moves/actions. |
+ | **Value Head (v)** | **Position Evaluation** | Single scalar value in [-1.0, +1.0]. |

+ | Value Interpretation | Score |
+ | :--- | :--- |
+ | **White Winning** | Close to +1.0 |
+ | **Black Winning** | Close to -1.0 |
+ | **Equal Position** | Close to 0.0 |
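
+ As a sketch of how such heads are commonly attached to the torso output (the intermediate layer sizes are assumptions, not confirmed for GChess), note that 4672 matches the AlphaZero chess convention of 64 squares x 73 move planes:

+ ```python
+ import torch
+ import torch.nn as nn
+
+ class DualHeads(nn.Module):
+     """Illustrative policy/value heads over a (N, 512, 8, 8) torso feature map."""
+     def __init__(self, channels: int = 512):
+         super().__init__()
+         self.policy_conv = nn.Conv2d(channels, 73, kernel_size=1)  # 73 x 8 x 8 = 4672 logits
+         self.value_conv = nn.Conv2d(channels, 1, kernel_size=1)
+         self.value_fc = nn.Sequential(
+             nn.Linear(64, 256), nn.ReLU(),
+             nn.Linear(256, 1), nn.Tanh(),  # Tanh bounds v to [-1.0, +1.0]
+         )
+
+     def forward(self, torso_out: torch.Tensor):
+         p_logits = self.policy_conv(torso_out).flatten(1)         # (N, 4672)
+         v = self.value_fc(self.value_conv(torso_out).flatten(1))  # (N, 1)
+         return p_logits, v
+ ```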

+ ---

+ ## Training Summary

+ The model was trained on a small dataset of **50,000 high-quality PGN games** across **25 epochs**.

+ ### **Convergence Analysis**
+ The training process was stable and highly efficient, utilizing an aggressive learning rate strategy: PyTorch's **OneCycleLR** scheduler with a batch size of **1024**, as described in the previous version of this card. A minimal wiring sketch follows.
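
+ The sketch below illustrates the schedule only; the optimizer type, peak learning rate, and step count are assumptions, and the network is a stand-in:

+ ```python
+ import torch
+
+ net = torch.nn.Linear(8, 8)  # stand-in for the GChess network
+ optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
+ scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.2, total_steps=10_000)
+
+ for step in range(10_000):
+     optimizer.zero_grad()
+     loss = net(torch.randn(1024, 8)).pow(2).mean()  # dummy loss over a 1024-sample batch
+     loss.backward()
+     optimizer.step()
+     scheduler.step()  # OneCycleLR advances once per batch: LR ramps up, then anneals
+ ```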

+ * **Training Loss Curve:**
+ ![Training Loss](training_loss.png)
+ The loss shows a rapid initial drop, signifying quick learning of fundamental concepts, followed by smooth convergence.

+ * **Detailed Loss Convergence:**
+ ![Detailed Loss Convergence](detailed_training_loss.png)
+ A detailed view reveals short-term oscillations, expected with dynamic learning rate scheduling; the overall trend nevertheless remains consistently downward, settling at a low, stable loss.

+ * **Accuracy Evaluations:**
+ ![Accuracy Evaluations](accuracy_evals.png)
+ Both Top-1 and Top-5 Accuracy showed clear, consistent upward trends, confirming that the network successfully learned to predict expert moves without overfitting to the limited data. The high Top-5 Accuracy indicates that the model reliably generates a strong list of candidate moves (see the metric sketch below).
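
+ For reference, a minimal sketch of the Top-k metric used above, assuming integer action indices as targets:

+ ```python
+ import torch
+
+ def top_k_accuracy(policy_logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
+     """Fraction of positions whose expert move index appears among the k highest logits.
+     policy_logits: (N, 4672); targets: (N,) integer move indices."""
+     top_k = policy_logits.topk(k, dim=1).indices        # (N, k)
+     hits = (top_k == targets.unsqueeze(1)).any(dim=1)   # (N,)
+     return hits.float().mean().item()
+ ```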

+ ---

+ ## Conclusion and Future Outlook

+ * **Current Performance:** The model achieved an estimated **1300 Elo**. While this is an entry-level playing strength, it is a strong result given the resource constraints.
+ * **Strong Foundation:** The architecture is structurally sound, and the training process demonstrated effective learning.
+ * **Future Potential:** The established architecture is well-suited for scaling. With a significantly larger, more diverse dataset (e.g., millions of games) and extended training, this model has the foundation to reach expert and master Elo levels.

+ ---

+ ## Usage

+ To use the GChess model for inference, you must convert a `chess.Board` object and its history into the required **128-channel input tensor**.

  ```python
  import chess
  import torch
+ import torch.nn.functional as F
+
+ # NOTE: The 'model' object must be loaded from a checkpoint, and the
+ # 'board_to_tensor' function must be implemented separately to generate
+ # the 128-channel input.
+ # DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  # Define Input State (FEN)
+ # Example: Initial position
  fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
  board = chess.Board(fen)

+ # --- Preprocess Input (requires the custom function) ---
+ # input_tensor = board_to_tensor(board, history_depth=8).unsqueeze(0).to(DEVICE)
+ # Placeholder tensor so the snippet runs as-is:
+ input_tensor = torch.randn(1, 128, 8, 8)
+ model = torch.nn.Module()  # placeholder model for execution
+ model.eval()  # set model to evaluation mode

  # Run Inference
  with torch.no_grad():
      # policy_logits is a tensor of size 4672, value_output is a scalar tensor
+     # policy_logits, value_output = model(input_tensor)
+
+     # Placeholder outputs for demonstration:
+     policy_logits = torch.randn(1, 4672)
+     value_output = torch.tensor([[0.25]])

  # Post-process Output
  policy_probabilities = F.softmax(policy_logits, dim=1).squeeze(0)

  # Find the move with the highest predicted probability

  # Print Results
  print(f"FEN: {fen}")
  print(f"--- Model Prediction ---")
+ print(f"Predicted Probability of Top Move: {best_probability:.4f}")
  print(f"Position Evaluation (Value): {expected_value:.4f}")
+ print("Interpretation: Value close to +1.0 means White is winning, -1.0 means Black is winning.")
  ```
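
+ Note that the raw 4672-way distribution also assigns mass to illegal moves. A common post-processing step, not shown above, is to restrict the distribution to the legal moves before picking one. A sketch, where `move_to_index` is a hypothetical helper that must mirror the action encoding used in training:

+ ```python
+ import chess
+ import torch
+
+ def legal_move_probabilities(policy_logits: torch.Tensor, board: chess.Board) -> torch.Tensor:
+     """Zero out illegal actions and renormalize over board.legal_moves."""
+     mask = torch.zeros_like(policy_logits)
+     for move in board.legal_moves:
+         mask[move_to_index(move)] = 1.0  # move_to_index: hypothetical, training-specific
+     probs = torch.softmax(policy_logits, dim=-1) * mask
+     return probs / probs.sum()
+ ```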

+ Developer: PENEAUX Benjamin