---
license: mit
library_name: pytorch
tags:
- reinforcement-learning
- alphazero
- board-games
- hex-tic-tac-toe
- mcts
pipeline_tag: other
---

# HeXO Bootstrap Model

Pretrained policy/value network for [Hex Tic-Tac-Toe](https://hex-tic-tac-toe.github.io/),
a two-player game on an infinite hexagonal grid in which six in a row wins. It is
the starting point for AlphaZero-style self-play training in
[`seeligto/hexo_rl`](https://github.com/seeligto/hexo_rl).

## Architecture

- Input: **18 × 19 × 19** float tensor (AlphaZero-style history + scalar planes)
- ResNet-12 trunk with squeeze-and-excitation blocks
- GroupNorm(8) throughout (BN-free, stable under small batch sizes)
- Dual-pool value head with BCE loss
- Auxiliary heads: ownership prediction + winning-line prediction
- Saved as a `state_dict` inside a standard `torch.save` checkpoint

The board is genuinely infinite: the accompanying Rust engine uses a sparse
coordinate hashmap. The network receives a 19×19 window assembled around
the active stone cluster, so the model itself has no board-size prior.
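
The windowing idea can be illustrated with a minimal pure-Python sketch. It is not the `hexo_rl` implementation: the axial `(q, r)` coordinate convention, the bounding-box centring rule, and the names `window_origin` and `crop` are assumptions made for illustration. It produces a single occupancy grid, not the 18 input planes.

```python
from typing import Dict, List, Tuple

WINDOW = 19  # window side length stated in this card

Coord = Tuple[int, int]  # axial (q, r) hex coordinates (assumed convention)


def window_origin(stones: Dict[Coord, int], size: int = WINDOW) -> Coord:
    """Top-left (q, r) of a size x size window centred on the stones' bounding box."""
    if not stones:
        # Empty board: centre the window on (0, 0).
        return (-(size // 2), -(size // 2))
    qs = [q for q, _ in stones]
    rs = [r for _, r in stones]
    centre_q = (min(qs) + max(qs)) // 2
    centre_r = (min(rs) + max(rs)) // 2
    return (centre_q - size // 2, centre_r - size // 2)


def crop(stones: Dict[Coord, int], size: int = WINDOW) -> List[List[int]]:
    """Dense size x size grid: 0 = empty, otherwise the stored player id."""
    q0, r0 = window_origin(stones, size)
    grid = [[0] * size for _ in range(size)]
    for (q, r), player in stones.items():
        row, col = r - r0, q - q0
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = player  # stones outside the window are dropped
    return grid


# A sparse position far from the origin still lands near the window centre.
stones = {(1000, -500): 1, (1001, -500): 2, (1002, -499): 1}
grid = crop(stones)
```

Because the sparse map is keyed by absolute coordinates and the crop is relative to the cluster, the dense grid looks the same wherever on the infinite board the stones sit.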

## Training

Supervised bootstrap only: no self-play by the network itself was used to produce this artifact.
It was trained on a mixed corpus of:

- **SealBot self-play games** (community minimax engine, mixed time limits)
- **Anonymized public human games** (visibility=public, PII-stripped at ingestion)
- **Hybrid human-seed + bot-continuation games**

See the companion dataset (access-restricted):
[`timmyburn/hexo-bootstrap-corpus`](https://huggingface.co/datasets/timmyburn/hexo-bootstrap-corpus).

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="timmyburn/hexo-bootstrap-models",
    filename="bootstrap_model.pt",
)
# The checkpoint bundles optimizer state and metadata with the weights, so it is
# loaded with full unpickling (weights_only=False). Only do this for files you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# Load into the network defined in seeligto/hexo_rl:
# from hexo_rl.model.network import HexTacToeNet
# model = HexTacToeNet(in_channels=18)
# model.load_state_dict(ckpt["model"])
# model.eval()
```

The full inference path (windowing, legal-move masking, policy projection
over the 362-dim action space) lives in the
[`hexo_rl`](https://github.com/seeligto/hexo_rl) repo.
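
As a rough illustration of the legal-move masking step, the sketch below renormalises a 362-entry policy over legal actions only. It is a minimal stand-in, not the `hexo_rl` code: the row-major cell layout and the helper names `masked_policy` and `cell_index` are assumptions. It makes no claim about what the one action beyond the 361 window cells means.

```python
import math
from typing import List, Sequence

BOARD = 19
NUM_ACTIONS = 362  # action-space size stated in this card


def masked_policy(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax over legal actions only; illegal actions get probability 0."""
    if len(logits) != NUM_ACTIONS or len(legal) != NUM_ACTIONS:
        raise ValueError("expected 362 logits and 362 mask entries")
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal action")
    peak = max(legal_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def cell_index(row: int, col: int) -> int:
    """Row-major index of a window cell (assumed layout)."""
    return row * BOARD + col


logits = [0.0] * NUM_ACTIONS
logits[cell_index(9, 9)] = 2.0      # the network prefers the centre cell
legal = [True] * NUM_ACTIONS
legal[cell_index(0, 0)] = False     # pretend the corner cell is occupied
probs = masked_policy(logits, legal)
```

Masking before the softmax, not after it, keeps the result a proper probability distribution over the moves that can actually be played.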

## Evaluation

Calibrated against a threat-detection probe on 18-plane fixtures:

| Metric | Pass threshold | Notes |
|---|---|---|
| C2: extension cell in policy top-5 | ≥ 25% | baseline for bootstrap-v4 |
| C3: extension cell in policy top-10 | ≥ 40% | baseline for bootstrap-v4 |

Thresholds are minimum-viable. Later self-play checkpoints should clear
them comfortably and will be released as a separate model variant.
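
One plausible reading of the C2/C3 metrics is the fraction of probe fixtures in which the extension cell ranks among the policy's top-k actions. The sketch below computes that rate on toy data. The fixture format (a policy vector paired with a target index) and the names `in_top_k` and `hit_rate` are assumptions for illustration, not the probe's actual interface.

```python
from typing import List, Sequence, Tuple


def in_top_k(policy: Sequence[float], target: int, k: int) -> bool:
    """True if `target` is among the k highest-probability action indices."""
    ranked = sorted(range(len(policy)), key=lambda i: policy[i], reverse=True)
    return target in ranked[:k]


def hit_rate(cases: List[Tuple[Sequence[float], int]], k: int) -> float:
    """Fraction of (policy, target) cases whose target lands in the top k."""
    hits = sum(in_top_k(policy, target, k) for policy, target in cases)
    return hits / len(cases)


# Toy fixtures over a 4-action space; real fixtures would use 362-dim policies.
cases = [
    ([0.1, 0.5, 0.2, 0.2], 1),  # target is the top action: hit
    ([0.4, 0.3, 0.2, 0.1], 3),  # target is ranked last: miss for k=2
]
rate = hit_rate(cases, k=2)
passes_c2_style = rate >= 0.25  # same comparison shape as the C2 threshold
```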

## Files

| File | Size | Description |
|---|---|---|
| `bootstrap_model.pt` | ~17 MB | PyTorch checkpoint (state dict + optimizer + metadata) |

## License

MIT. See the [repository LICENSE](https://github.com/seeligto/hexo_rl/blob/master/LICENSE).