timmyburn committed on
Commit 22de51c · verified · 1 Parent(s): 2c2ba7b

add model card

Files changed (1)
  1. README.md +87 -0
README.md ADDED
@@ -0,0 +1,87 @@
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - reinforcement-learning
+ - alphazero
+ - board-games
+ - hex-tic-tac-toe
+ - mcts
+ pipeline_tag: other
+ ---
+
+ # HeXO Bootstrap Model
+
+ Pretrained policy/value network for [Hex Tic-Tac-Toe](https://hex-tic-tac-toe.github.io/),
+ a two-player game on an infinite hexagonal grid where six in a row wins. It serves as
+ the starting point for AlphaZero-style self-play training in
+ [`seeligto/hexo_rl`](https://github.com/seeligto/hexo_rl).
+
+ ## Architecture
+
+ - Input: **18 × 19 × 19** float tensor (AlphaZero-style history + scalar planes)
+ - ResNet-12 trunk with squeeze-and-excitation blocks
+ - GroupNorm(8) throughout (BN-free, stable under small batch sizes)
+ - Dual-pool value head with BCE loss
+ - Auxiliary heads: ownership prediction + winning-line prediction
+ - Saved as a `state_dict` inside a standard `torch.save` checkpoint
+
+ The board is genuinely infinite: the accompanying Rust engine uses a sparse
+ coordinate hashmap. The network receives a 19×19 window assembled around
+ the active stone cluster, so the model itself has no board-size prior.
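
The windowing idea can be illustrated with a small standalone sketch. It is not the `hexo_rl` implementation: the `(q, r)` coordinate convention, the `extract_window` name, and centering on the bounding-box midpoint of all stones are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

WINDOW = 19  # side length of the network's input window


def extract_window(
    stones: Dict[Tuple[int, int], int],
) -> Tuple[List[List[int]], Tuple[int, int]]:
    """Crop a sparse board to a dense WINDOW x WINDOW grid.

    `stones` maps (q, r) coordinates to a player id (1 or 2); empty cells
    are simply absent. Returns the grid and the board coordinate that maps
    to grid[0][0]. Centering on the bounding-box midpoint is an assumption
    of this sketch, not the documented behavior.
    """
    if stones:
        qs = [q for q, _ in stones]
        rs = [r for _, r in stones]
        center_q = (min(qs) + max(qs)) // 2
        center_r = (min(rs) + max(rs)) // 2
    else:
        center_q = center_r = 0
    origin = (center_q - WINDOW // 2, center_r - WINDOW // 2)

    grid = [[0] * WINDOW for _ in range(WINDOW)]
    for (q, r), player in stones.items():
        row, col = r - origin[1], q - origin[0]
        # Stones that fall outside the window are dropped in this sketch.
        if 0 <= row < WINDOW and 0 <= col < WINDOW:
            grid[row][col] = player
    return grid, origin
```

Returning the origin alongside the grid lets a caller translate a chosen window cell back to absolute board coordinates.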
+
+ ## Training
+
+ Supervised bootstrap only: no self-play was used to produce this artifact.
+ Trained on a mixed corpus of:
+
+ - **SealBot self-play games** (community minimax engine, mixed time limits)
+ - **Anonymized public human games** (visibility=public, PII-stripped at ingestion)
+ - **Hybrid human-seed + bot-continuation games**
+
+ See the companion dataset (access-restricted):
+ [`timmyburn/hexo-bootstrap-corpus`](https://huggingface.co/datasets/timmyburn/hexo-bootstrap-corpus).
+
+ ## Usage
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ path = hf_hub_download(
+ repo_id="timmyburn/hexo-bootstrap-models",
+ filename="bootstrap_model.pt",
+ )
+ # weights_only=False unpickles arbitrary Python objects, so load only
+ # checkpoints you trust.
+ ckpt = torch.load(path, map_location="cpu", weights_only=False)
+ # Load into the network defined in seeligto/hexo_rl:
+ # from hexo_rl.model.network import HexTacToeNet
+ # model = HexTacToeNet(in_channels=18)
+ # model.load_state_dict(ckpt["model"])
+ # model.eval()
+ ```
+
+ The full inference path (windowing, legal-move masking, policy projection
+ over the 362-dim action space) lives in the
+ [`hexo_rl`](https://github.com/seeligto/hexo_rl) repo.
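
For readers unfamiliar with legal-move masking, the following is a minimal sketch of the general technique, not the code in `hexo_rl`. Note that 362 equals 19·19 + 1; the meaning of the extra entry is not stated here, so the sketch only assumes a flat logit vector and a boolean legality mask of the same length. The `masked_softmax` name is hypothetical.

```python
import math
from typing import List, Sequence

ACTION_DIM = 362  # policy size quoted in the model card


def masked_softmax(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax over legal actions only; illegal actions get probability 0."""
    if len(logits) != len(legal):
        raise ValueError("logits and legal mask must have the same length")
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal action")
    peak = max(legal_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]
```

Masking before normalization means the returned probabilities sum to 1 over legal moves, which is what a search procedure such as MCTS typically expects as priors.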
+
+ ## Evaluation
+
+ Calibrated against a threat-detection probe on 18-plane fixtures:
+
+ | Metric | Pass threshold | Notes |
+ |---|---|---|
+ | C2: extension cell in policy top-5 | ≥ 25% | baseline for bootstrap-v4 |
+ | C3: extension cell in policy top-10 | ≥ 40% | baseline for bootstrap-v4 |
+
+ These thresholds are a minimum bar: later self-play checkpoints should clear
+ them comfortably and will be released as a separate model variant.
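
A top-k hit rate of the kind listed in the table could be computed roughly as follows. This is a sketch under assumptions: the fixture format (a policy vector paired with the index of the expected extension cell) and the `top_k_hit_rate` name are hypothetical, and the actual probe in `hexo_rl` may differ.

```python
from typing import Sequence, Tuple


def top_k_hit_rate(
    fixtures: Sequence[Tuple[Sequence[float], int]], k: int
) -> float:
    """Fraction of fixtures whose target action is among the k highest-scored."""
    if not fixtures:
        raise ValueError("need at least one fixture")
    hits = 0
    for policy, target in fixtures:
        # Rank action indices from highest to lowest policy score.
        ranked = sorted(range(len(policy)), key=lambda i: policy[i], reverse=True)
        hits += target in ranked[:k]
    return hits / len(fixtures)
```

With such a helper, the C2 check would read `top_k_hit_rate(fixtures, 5) >= 0.25` and the C3 check `top_k_hit_rate(fixtures, 10) >= 0.40`.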
+
+ ## Files
+
+ | File | Size | Description |
+ |---|---|---|
+ | `bootstrap_model.pt` | ~17 MB | PyTorch checkpoint (state dict + optimizer + metadata) |
+
+ ## License
+
+ MIT. See the [repository LICENSE](https://github.com/seeligto/hexo_rl/blob/master/LICENSE).