timmyburn/hexo-bootstrap-models

reinforcement-learning

hex-tic-tac-toe


timmyburn committed 27 days ago

Commit 22de51c · verified · 1 parent: 2c2ba7b

add model card

Files changed (1)

README.md ADDED +87 -0

	@@ -0,0 +1,87 @@

+---
+license: mit
+library_name: pytorch
+tags:
+  - reinforcement-learning
+  - alphazero
+  - board-games
+  - hex-tic-tac-toe
+  - mcts
+pipeline_tag: other
+---
+# HeXO Bootstrap Model
+Pretrained policy/value network for [Hex Tic-Tac-Toe](https://hex-tic-tac-toe.github.io/) —
+a two-player game on an infinite hexagonal grid, 6-in-a-row to win. Used as
+the starting point for AlphaZero-style self-play training in
+[`seeligto/hexo_rl`](https://github.com/seeligto/hexo_rl).
+## Architecture
+- Input: **18 × 19 × 19** float tensor (AlphaZero-style history + scalar planes)
+- ResNet-12 trunk with squeeze-and-excitation blocks
+- GroupNorm(8) throughout (BN-free, stable under small batch sizes)
+- Dual-pool value head with BCE loss
+- Auxiliary heads: ownership prediction + winning-line prediction
+- Saved as a `state_dict` inside a standard `torch.save` checkpoint
+The board is genuinely infinite: the accompanying Rust engine uses a sparse
+coordinate hashmap. The network receives a 19×19 window assembled around
+the active stone cluster, so the model itself has no board-size prior.
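The windowing step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the model card does not say how the "active stone cluster" is chosen, so the hypothetical `crop_window` helper below simply centres the window on the bounding box of all stones, uses axial `(q, r)` coordinates, and encodes players as 1 and 2. The real logic lives in the Rust engine and the `hexo_rl` repo and may differ.

```python
from typing import Dict, List, Tuple

WINDOW = 19  # window side length stated in the model card

Coord = Tuple[int, int]  # axial (q, r) coordinates; an assumed convention


def crop_window(
    stones: Dict[Coord, int], size: int = WINDOW
) -> Tuple[List[List[int]], Coord]:
    """Return a size x size grid centred on the stones' bounding box.

    Cells hold 0 (empty), 1 or 2 (player id). Stones that fall outside
    the window are dropped. The second return value is the global
    coordinate of the window's top-left cell.
    """
    if stones:
        qs = [q for q, _ in stones]
        rs = [r for _, r in stones]
        centre_q = (min(qs) + max(qs)) // 2
        centre_r = (min(rs) + max(rs)) // 2
    else:
        centre_q = centre_r = 0  # empty board: centre on the origin

    origin = (centre_q - size // 2, centre_r - size // 2)
    grid = [[0] * size for _ in range(size)]
    for (q, r), player in stones.items():
        row, col = r - origin[1], q - origin[0]
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = player
    return grid, origin


# Sparse board: stones may sit anywhere on the infinite grid.
stones = {(100, -40): 1, (101, -40): 2, (102, -41): 1}
grid, origin = crop_window(stones)
```

Keeping `origin` alongside the grid lets a caller translate a window cell back to a global coordinate by adding the offsets, which is what makes a fixed-size window usable on an unbounded board.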
+## Training
+Supervised bootstrap only — no self-play was used to produce this artifact.
+Trained on a mixed corpus of:
+- **SealBot self-play games** (community minimax engine, mixed time limits)
+- **Anonymized public human games** (visibility=public, PII-stripped at ingestion)
+- **Hybrid human-seed + bot-continuation games**
+See the companion dataset (access-restricted):
+[`timmyburn/hexo-bootstrap-corpus`](https://huggingface.co/datasets/timmyburn/hexo-bootstrap-corpus).
+## Usage
+```python
+import torch
+from huggingface_hub import hf_hub_download
+path = hf_hub_download(
+    repo_id="timmyburn/hexo-bootstrap-models",
+    filename="bootstrap_model.pt",
+)
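+# The checkpoint bundles optimizer state and metadata with the weights.
+# weights_only=False uses the full pickle loader, which can execute arbitrary
+# code, so only load checkpoints you trust.
+ckpt = torch.load(path, map_location="cpu", weights_only=False)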
+# Load into the network defined in seeligto/hexo_rl:
+#   from hexo_rl.model.network import HexTacToeNet
+#   model = HexTacToeNet(in_channels=18)
+#   model.load_state_dict(ckpt["model"])
+#   model.eval()
+```
+The full inference path (windowing, legal-move masking, policy projection
+over the 362-dim action space) lives in the
+[`hexo_rl`](https://github.com/seeligto/hexo_rl) repo.
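As a rough illustration of the legal-move masking step mentioned above, the sketch below applies a softmax over legal entries only. It is not the `hexo_rl` implementation: `masked_softmax` is a hypothetical helper, and it makes no assumption about how the 362 policy entries are laid out, working on any logits vector with a matching boolean mask.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax over legal entries only; illegal entries get probability 0."""
    if len(logits) != len(legal):
        raise ValueError("logits and legal mask must have the same length")
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal moves")
    peak = max(legal_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example with four actions; index 2 has the highest logit but is illegal.
logits = [2.0, 0.5, 3.0, 1.0]
legal = [True, True, False, True]
probs = masked_softmax(logits, legal)
best = max(range(len(probs)), key=probs.__getitem__)
```

Masking before normalisation, rather than zeroing probabilities afterwards, keeps the remaining probabilities summing to one without a second renormalisation pass.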
+## Evaluation
+Calibrated against a threat-detection probe on 18-plane fixtures:
+| Metric | Pass threshold | Notes |
+|---|---|---|
+| C2: extension cell in policy top-5 | ≥ 25% | baseline for bootstrap-v4 |
+| C3: extension cell in policy top-10 | ≥ 40% | baseline for bootstrap-v4 |
+Thresholds are minimum-viable — later self-play checkpoints should clear
+these comfortably and will be released as a separate model variant.
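The top-k criteria in the table can be read as a hit rate over probe fixtures. The sketch below shows one way to compute such a rate. It is an assumption-laden illustration: the model card does not show the probe's fixture format or how an "extension cell" is identified, so each case here is just a policy vector paired with a target index, and `top_k_hit_rate` is a hypothetical name.

```python
from typing import Sequence, Tuple

Case = Tuple[Sequence[float], int]  # (policy scores, target action index)


def top_k_hit_rate(cases: Sequence[Case], k: int) -> float:
    """Fraction of cases whose target index is among the k highest scores."""
    if not cases:
        raise ValueError("need at least one case")
    hits = 0
    for policy, target in cases:
        ranked = sorted(range(len(policy)), key=lambda i: policy[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(cases)


# Toy fixtures over a 3-action space (real fixtures would use the full policy).
cases = [
    ([0.10, 0.70, 0.20], 1),
    ([0.50, 0.30, 0.20], 2),
    ([0.25, 0.15, 0.60], 0),
]
rate_top1 = top_k_hit_rate(cases, k=1)
rate_top2 = top_k_hit_rate(cases, k=2)
```

Under this reading, the C2 check would compare `top_k_hit_rate(fixtures, k=5)` against 0.25 and the C3 check would compare `top_k_hit_rate(fixtures, k=10)` against 0.40.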
+## Files
+| File | Size | Description |
+|---|---|---|
+| `bootstrap_model.pt` | ~17 MB | PyTorch checkpoint (state dict + optimizer + metadata) |
+## License
+MIT — see the [repository LICENSE](https://github.com/seeligto/hexo_rl/blob/master/LICENSE).