brandonlanexyz committed on
Commit 9deb5ea · verified · 1 Parent(s): cfd438f

Initial upload of Dualist Othello AI (Iteration 652)

Files changed (2):
  1. README.md +70 -85
  2. mcts.py +257 -0
README.md CHANGED
@@ -1,85 +1,70 @@
- ---
- license: apache-2.0
- language:
- - en
- - sv
- metrics:
- - accuracy
- tags:
- - othello
- - reinforcement-learning
- - alphazero
- - edax
- - board-games
- ---
- # Dualist Othello AI
-
- Dualist is a high-performance Othello (Reversi) AI model trained using a **Deep Residual Neural Network** architecture. It was developed as part of a hybrid learning project where a bitboard-based engine (Edax) acted as the "Grandmaster Teacher" to train the neural network via curriculum learning.
-
- ## Features
- - **Architecture**: 10 Residual Blocks with 256 channels.
- - **Input**: 3x8x8 planes (Player bits, Opponent bits, Turn/Constant).
- - **Heuristics**: Trained to emulate professional-level Othello gameplay and strategic positioning.
- - **Teacher**: Supervised and Reinforcement Learning against the Edax engine (Depth 1-30).
-
- ## Model Details
- - **Model File**: `dualist_model.pth`
- - **Total Parameters**: Optimized for balancing speed and strategic depth.
- - **Architecture Class**: `OthelloNet` in `model.py`.
-
- ## Installation & Usage
-
- ### Prerequisites
- - Python 3.8+
- - PyTorch
- - NumPy
-
- ### Quick Start (Inference)
- The model can be loaded and used for move prediction. Make sure `model.py`, `bitboard.py`, and `dualist_model.pth` are in your working directory.
-
- ```python
- import torch
- from model import OthelloNet
- from bitboard import get_bit, make_input_planes
-
- # Load model
- model = OthelloNet(num_res_blocks=10, num_channels=256)
- checkpoint = torch.load("dualist_model.pth", map_location="cpu")
- model.load_state_dict(checkpoint["model_state_dict"])
- model.eval()
-
- # Example input (Bitboards)
- black_bb = 0x0000000810000000
- white_bb = 0x0000001008000000
-
- # Get prediction
- input_planes = make_input_planes(black_bb, white_bb)
- with torch.no_grad():
-     policy, value = model(input_planes)
-
- # 'policy' contains move probabilities (log_softmax)
- # 'value' is the predicted game outcome [-1, 1]
- ```
-
- ### Files Description
- - `dualist_model.pth`: Pre-trained weights for the OthelloNet.
- - `model.py`: Neural Network architecture definition.
- - `game.py`: Core Othello logic and move generation.
- - `bitboard.py`: Bit manipulation and input plane processing.
- - `inference.py`: Example script to run the model on a board state.
-
- ## Hugging Face Integration
- To push this to your Hugging Face account:
- 1. Install `huggingface_hub`: `pip install huggingface_hub`
- 2. Login: `huggingface-cli login`
- 3. Push files to `brandonlanexyz/dualist`.
-
- ![unnamed (8)](https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/MFhYlx5TTBlt8S7Kj2ZYO.png)
-
- ![unnamed (9)](https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/6pznzjJcaZUKIEBtQfWll.png)
-
- ---
- *Created by Brandon | Part of the AntiGravity AI-LAB Othello Project*
 
+ # Dualist Othello AI
+
+ Dualist is a high-performance Othello (Reversi) AI model trained using a **Deep Residual Neural Network** architecture. It was developed as part of a hybrid learning project where a bitboard-based engine (Edax) acted as the "Grandmaster Teacher" to train the neural network via curriculum learning.
+
+ ## Features
+ - **Architecture**: 10 Residual Blocks with 256 channels.
+ - **Input**: 3x8x8 planes (Player bits, Opponent bits, Turn/Constant).
+ - **Heuristics**: Trained to emulate professional-level Othello gameplay and strategic positioning.
+ - **Teacher**: Supervised and Reinforcement Learning against the Edax engine (Depth 1-30).
+
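The 3x8x8 input encoding can be sketched in plain Python. This is a guess at the layout (plane 0 = current player's discs, plane 1 = opponent's discs, plane 2 = a constant plane), since the actual `make_input_planes` lives in `bitboard.py` and is not shown here; the `r * 8 + c` bit-index convention matches the policy indexing used in `mcts.py`.

```python
def planes_from_bitboards(player_bb, opponent_bb):
    """Hypothetical sketch of the 3x8x8 input planes described above."""
    def plane(bb):
        # Bit r*8 + c of the 64-bit board maps to cell (r, c).
        return [[(bb >> (r * 8 + c)) & 1 for c in range(8)] for r in range(8)]
    return [plane(player_bb), plane(opponent_bb), [[1] * 8 for _ in range(8)]]

# Standard Othello starting position (same bitboards as the Quick Start below):
planes = planes_from_bitboards(0x0000000810000000, 0x0000001008000000)
# Each of the first two planes holds exactly two discs in the opening position.
```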
+ ## Model Details
+ - **Model File**: `dualist_model.pth`
+ - **Total Parameters**: Sized to balance inference speed and strategic depth.
+ - **Architecture Class**: `OthelloNet` in `model.py`.
+
+ ## Installation & Usage
+
+ ### Prerequisites
+ - Python 3.8+
+ - PyTorch
+ - NumPy
+
+ ### Quick Start (Inference)
+ The model can be loaded and used for move prediction. Make sure `model.py`, `bitboard.py`, and `dualist_model.pth` are in your working directory.
+
+ ```python
+ import torch
+ from model import OthelloNet
+ from bitboard import make_input_planes
+
+ # Load model
+ model = OthelloNet(num_res_blocks=10, num_channels=256)
+ checkpoint = torch.load("dualist_model.pth", map_location="cpu")
+ model.load_state_dict(checkpoint["model_state_dict"])
+ model.eval()
+
+ # Example input: the standard starting position as bitboards
+ black_bb = 0x0000000810000000
+ white_bb = 0x0000001008000000
+
+ # Get prediction
+ input_planes = make_input_planes(black_bb, white_bb)
+ with torch.no_grad():
+     policy, value = model(input_planes)
+
+ # 'policy' contains log-probabilities over the 65 move slots (64 squares + pass)
+ # 'value' is the predicted game outcome in [-1, 1]
+ ```
+
+ ### Optimal Performance: MCTS Integration
+ While the policy head alone provides strong "intuitive" moves, the model is designed to be combined with **Monte Carlo Tree Search (MCTS)** to reach its full potential. Using the policy to guide the search and the value head to evaluate leaf positions, the agent can look ahead many turns, making it significantly sharper and more strategically sound.
+
+ In the provided `inference.py` and the associated Space, we use 400-800 simulations per move to reach Expert/Master-level play.
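After `MCTS.search` returns, play typically proceeds by picking the root child with the most visits. A minimal, self-contained sketch of that selection step (no model or engine required; the visit numbers below are made up), using the `r * 8 + c` square indexing and `0`-for-pass convention from `mcts.py`:

```python
def pick_move(children):
    """Select the move bit with the highest visit count (temperature -> 0)."""
    return max(children, key=children.get)

def bit_to_square(move_bit):
    """Convert a single-bit move mask to (row, col); 0 encodes a pass."""
    if move_bit == 0:
        return None
    idx = move_bit.bit_length() - 1
    return divmod(idx, 8)

# Hypothetical visit counts, shaped like what MCTS.search accumulates:
visits = {1 << 19: 412, 1 << 26: 301, 0: 7}  # 0 = pass
best = pick_move(visits)       # 1 << 19
row_col = bit_to_square(best)  # (2, 3)
```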
+
+ ### Files Description
+ - `dualist_model.pth`: Pre-trained weights for the OthelloNet.
+ - `model.py`: Neural Network architecture definition.
+ - `game.py`: Core Othello logic and move generation.
+ - `bitboard.py`: Bit manipulation and input plane processing.
+ - `mcts.py`: Monte Carlo Tree Search implementation (recommended for play).
+ - `inference.py`: Example script to run the model on a board state.
+
+ ## Hugging Face Integration
+ To push this to your Hugging Face account:
+ 1. Install `huggingface_hub`: `pip install huggingface_hub`
+ 2. Login: `huggingface-cli login`
+ 3. Push files to `brandonlanexyz/dualist`.
+
+ ---
+ *Created by Brandon | Part of the AntiGravity AI-LAB Othello Project*
mcts.py ADDED
@@ -0,0 +1,257 @@
+ import math
+ import numpy as np
+ import torch
+ from src.game import OthelloGame
+ from src.bitboard import make_input_planes, bit_to_row_col, popcount
+
+ class MCTSNode:
+     def __init__(self, prior, to_play):
+         self.prior = prior
+         self.visit_count = 0
+         self.value_sum = 0
+         self.children = {}  # move_bit -> MCTSNode
+         self.to_play = to_play  # Whose turn it is at this node
+
+     def value(self):
+         if self.visit_count == 0:
+             return 0
+         return self.value_sum / self.visit_count
+
+     def expand(self, policy_logits, valid_moves, next_to_play):
+         """Expand the node using the policy from the neural network."""
+         # Numerically stable softmax over all 65 outputs (64 squares + pass).
+         policy = np.exp(policy_logits - np.max(policy_logits))
+         policy /= np.sum(policy)
+
+         # Collect the probability mass the policy assigns to the valid moves.
+         valid_probs_sum = 0
+         temp_children = {}
+         for move_bit in valid_moves:
+             if move_bit == 0:  # Pass uses the extra index 64.
+                 idx = 64
+             else:
+                 r, c = bit_to_row_col(move_bit)
+                 # bit_to_row_col signals failure with r == -1; treat it as a
+                 # pass defensively (should not happen when move_bit != 0).
+                 idx = 64 if r == -1 else r * 8 + c
+             prob = policy[idx]
+             valid_probs_sum += prob
+             temp_children[move_bit] = prob
+
+         # Renormalize the priors over the valid moves.
+         if valid_probs_sum > 0:
+             for move, prob in temp_children.items():
+                 self.children[move] = MCTSNode(prior=prob / valid_probs_sum, to_play=next_to_play)
+         else:
+             # Degenerate case: zero mass on every valid move; fall back to uniform.
+             prob = 1.0 / len(valid_moves)
+             for move in valid_moves:
+                 self.children[move] = MCTSNode(prior=prob, to_play=next_to_play)
+
+ class MCTS:
+     def __init__(self, model, cpuct=1.0, num_simulations=800):
+         self.model = model
+         self.cpuct = cpuct
+         self.num_simulations = num_simulations
+
+     def search(self, game: OthelloGame):
+         """Run the simulations and return the root node; its children's visit
+         counts drive move selection. Returns None if the game is over."""
+         valid_moves_bb = game.get_valid_moves(game.player_bb, game.opponent_bb)
+         valid_moves_list = self._get_moves_list(valid_moves_bb)
+
+         # A player with no legal move must pass (encoded as move 0); the
+         # position is terminal only when BOTH players are stuck.
+         if valid_moves_bb == 0:
+             if game.is_terminal():
+                 return None  # Game over
+             valid_moves_list = [0]
+
+         root = MCTSNode(prior=0, to_play=game.turn)
+
+         # The network always sees the canonical (player, opponent) encoding.
+         device = next(self.model.parameters()).device
+         state_tensor = make_input_planes(game.player_bb, game.opponent_bb).to(device)
+
+         self.model.eval()
+         with torch.no_grad():
+             policy_logits, _ = self.model(state_tensor)
+
+         # After any move the turn always swaps: even if the opponent must
+         # immediately pass, it is their turn to pass. So the children's
+         # to_play is always -game.turn.
+         root.expand(policy_logits.cpu().numpy().flatten(), valid_moves_list, -game.turn)
+
+         # Add exploration noise at the root.
+         self._add_dirichlet_noise(root)
+
+         for _ in range(self.num_simulations):
+             node = root
+             sim_game = self._clone_game(game)
+             search_path = [node]
+             last_value = 0
+
+             # 1. Selection: descend until we reach a leaf.
+             while node.children:
+                 move_bit, node = self._select_child(node)
+                 search_path.append(node)
+                 sim_game.play_move(move_bit)
+
+             # 2. Evaluation & Expansion
+             if sim_game.is_terminal():
+                 # sim_game.player_bb always belongs to the side to move, so
+                 # map the two bitboards back to Black/White before scoring.
+                 if sim_game.turn == 1:
+                     black_score = popcount(sim_game.player_bb)
+                     white_score = popcount(sim_game.opponent_bb)
+                 else:
+                     white_score = popcount(sim_game.player_bb)
+                     black_score = popcount(sim_game.opponent_bb)
+
+                 # Terminal value standardized to Black's perspective.
+                 diff = black_score - white_score
+                 if diff > 0:
+                     last_value = 1.0   # Black wins
+                 elif diff < 0:
+                     last_value = -1.0  # White wins
+                 else:
+                     last_value = 0.0
+             else:
+                 state_tensor = make_input_planes(sim_game.player_bb, sim_game.opponent_bb).to(device)
+                 with torch.no_grad():
+                     policy_logits, v = self.model(state_tensor)
+
+                 # The value head scores the position for the side to move;
+                 # flip the sign when White is to move so every backed-up
+                 # value is expressed from Black's perspective.
+                 val_for_current = v.item()
+                 last_value = val_for_current if sim_game.turn == 1 else -val_for_current
+
+                 valid_bb = sim_game.get_valid_moves(sim_game.player_bb, sim_game.opponent_bb)
+                 valid_list = self._get_moves_list(valid_bb)
+                 if valid_bb == 0:
+                     valid_list = [0]
+
+                 node.expand(policy_logits.cpu().numpy().flatten(), valid_list, -sim_game.turn)
+
+             # 3. Backup
+             self._backpropagate(search_path, last_value)
+
+         return root
+
+     def _select_child(self, node):
+         best_score = -float('inf')
+         best_action = None
+         best_child = None
+
+         for action, child in node.children.items():
+             # PUCT: Q must be read from the mover's point of view. Values are
+             # accumulated from Black's perspective, so negate for White.
+             mean_val = child.value()
+             q = mean_val if node.to_play == 1 else -mean_val
+
+             # The tanh value head keeps q in [-1, 1], so q and the
+             # exploration term u are on comparable scales.
+             u = self.cpuct * child.prior * math.sqrt(node.visit_count) / (1 + child.visit_count)
+
+             score = q + u
+             if score > best_score:
+                 best_score = score
+                 best_action = action
+                 best_child = child
+
+         return best_action, best_child
+
+     def _backpropagate(self, search_path, value):
+         """value: evaluation of the leaf node from BLACK's perspective
+         (1 = Black wins, -1 = White wins)."""
+         for node in search_path:
+             node.value_sum += value
+             node.visit_count += 1
+
+     def _add_dirichlet_noise(self, node):
+         eps = 0.25
+         alpha = 0.3
+         moves = list(node.children.keys())
+         noise = np.random.dirichlet([alpha] * len(moves))
+         for i, move in enumerate(moves):
+             node.children[move].prior = (1 - eps) * node.children[move].prior + eps * noise[i]
+
+     def _get_moves_list(self, moves_bb):
+         """Split a moves bitboard into a list of single-bit move masks."""
+         moves = []
+         temp = moves_bb
+         while temp:
+             lsb = temp & -temp  # Isolate the least-significant set bit.
+             moves.append(lsb)
+             temp ^= lsb         # Clear it.
+         return moves
+
+     def _clone_game(self, game):
+         new_game = OthelloGame()
+         new_game.player_bb = game.player_bb
+         new_game.opponent_bb = game.opponent_bb
+         new_game.turn = game.turn
+         return new_game