---
license: mit
library_name: pytorch
tags:
  - reinforcement-learning
  - alphazero
  - board-games
  - hex-tic-tac-toe
  - mcts
pipeline_tag: other
---

# HeXO Bootstrap Model

Pretrained policy/value network for [Hex Tic-Tac-Toe](https://hex-tic-tac-toe.github.io/) —
a two-player game on an infinite hexagonal grid, 6-in-a-row to win. Used as
the starting point for AlphaZero-style self-play training in
[`seeligto/hexo_rl`](https://github.com/seeligto/hexo_rl).

## Architecture

- Input: **18 × 19 × 19** float tensor (AlphaZero-style history + scalar planes)
- ResNet-12 trunk with squeeze-and-excitation blocks
- GroupNorm(8) throughout (BN-free, stable under small batch sizes)
- Dual-pool value head with BCE loss
- Auxiliary heads: ownership prediction + winning-line prediction
- Saved as a `state_dict` inside a standard `torch.save` checkpoint
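
To make the trunk description concrete, here is a minimal sketch of one residual block that combines GroupNorm with a squeeze-and-excitation gate. It is not the actual `hexo_rl` implementation. The class name `SEResBlock`, the channel width of 64, the reduction ratio of 4, the layer ordering, and the reading of `GroupNorm(8)` as 8 groups are all assumptions made for illustration.

```python
import torch
from torch import nn


class SEResBlock(nn.Module):
    """Illustrative residual block: GroupNorm plus a squeeze-and-excitation gate."""

    def __init__(self, channels: int = 64, groups: int = 8, reduction: int = 4) -> None:
        super().__init__()
        # channels must be divisible by groups for nn.GroupNorm.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.norm1 = nn.GroupNorm(groups, channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.norm2 = nn.GroupNorm(groups, channels)
        # Squeeze: global average pool. Excite: two-layer bottleneck ending in a sigmoid.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        # Per-channel scale in (0, 1), reshaped to (N, C, 1, 1) for broadcasting.
        scale = self.gate(out).unsqueeze(-1).unsqueeze(-1)
        return torch.relu(x + out * scale)
```

GroupNorm computes its statistics per sample, so a block like this behaves the same at batch size 1 as at larger batch sizes, which is the property the list above refers to.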

The board is genuinely infinite: the accompanying Rust engine uses a sparse
coordinate hashmap. The network receives a 19×19 window assembled around
the active stone cluster, so the model itself has no board-size prior.
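
As a rough illustration of the windowing idea, the sketch below crops a 19×19 grid out of a sparse stone map. The axial `(q, r)` coordinates, the bounding-box centring rule, the player ids, and the function name `crop_window` are assumptions for illustration only. The real windowing logic lives in the `hexo_rl` repo and may differ.

```python
from typing import Dict, List, Tuple

WINDOW = 19  # side length of the network's input window

Coord = Tuple[int, int]  # assumed axial (q, r) hex coordinates


def crop_window(
    stones: Dict[Coord, int], size: int = WINDOW
) -> Tuple[List[List[int]], Coord]:
    """Return a size x size grid centred on the stones' bounding box, plus its origin.

    stones maps (q, r) -> player id (1 or 2); empty cells are 0.
    Stones that fall outside the window are simply dropped in this sketch.
    """
    grid = [[0] * size for _ in range(size)]
    if not stones:
        return grid, (-(size // 2), -(size // 2))

    qs = [q for q, _ in stones]
    rs = [r for _, r in stones]
    # Centre of the bounding box; floor division keeps it on the lattice.
    cq = (min(qs) + max(qs)) // 2
    cr = (min(rs) + max(rs)) // 2
    origin = (cq - size // 2, cr - size // 2)

    for (q, r), player in stones.items():
        col, row = q - origin[0], r - origin[1]
        if 0 <= col < size and 0 <= row < size:
            grid[row][col] = player
    return grid, origin
```

Returning the window origin alongside the grid lets a caller map a chosen window cell back to absolute board coordinates.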

## Training

Supervised bootstrap only: no AlphaZero-style self-play training was used to
produce this artifact. Trained on a mixed corpus of:

- **SealBot self-play games** (community minimax engine, mixed time limits)
- **Anonymized public human games** (visibility=public, PII-stripped at ingestion)
- **Hybrid human-seed + bot-continuation games**

See the companion dataset (access-restricted):
[`timmyburn/hexo-bootstrap-corpus`](https://huggingface.co/datasets/timmyburn/hexo-bootstrap-corpus).

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="timmyburn/hexo-bootstrap-models",
    filename="bootstrap_model.pt",
)
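# weights_only=False lets torch.load unpickle arbitrary Python objects,
# so only load checkpoints from a source you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# Load into the network defined in seeligto/hexo_rl: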
#   from hexo_rl.model.network import HexTacToeNet
#   model = HexTacToeNet(in_channels=18)
#   model.load_state_dict(ckpt["model"])
#   model.eval()
```

The full inference path (windowing, legal-move masking, policy projection
over the 362-dim action space) lives in the
[`hexo_rl`](https://github.com/seeligto/hexo_rl) repo.
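
For orientation, legal-move masking can be sketched as a softmax restricted to legal entries. This is a generic illustration, not the `hexo_rl` code. The function name `masked_softmax` is hypothetical, and the sketch makes no assumption about how the 362 action indices map to board cells.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax over legal entries only; illegal entries get probability 0."""
    if len(logits) != len(legal):
        raise ValueError("logits and legal mask must have the same length")
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal moves")
    peak = max(legal_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]
```

Masking before normalising means the probability mass an unmasked softmax would give to illegal moves is redistributed over the legal ones.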

## Evaluation

Calibrated against a threat-detection probe on 18-plane fixtures:

| Metric | Pass threshold | Notes |
|---|---|---|
| C2: extension cell in policy top-5 | ≥ 25% | baseline for bootstrap-v4 |
| C3: extension cell in policy top-10 | ≥ 40% | baseline for bootstrap-v4 |

These thresholds are a minimum bar. Later self-play checkpoints should clear
them comfortably and will be released as a separate model variant.
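
A top-k metric of this kind could be computed roughly as follows. The fixture format (a flat policy vector paired with the index of the expected extension cell) and the function name `top_k_hit_rate` are assumptions for illustration. The actual probe is defined in the `hexo_rl` repo.

```python
from typing import Sequence, Tuple


def top_k_hit_rate(fixtures: Sequence[Tuple[Sequence[float], int]], k: int) -> float:
    """Fraction of fixtures whose target index is among the k highest policy entries."""
    if not fixtures:
        raise ValueError("need at least one fixture")
    hits = 0
    for policy, target in fixtures:
        # Indices sorted by descending policy score; ties go to the lower index.
        ranked = sorted(range(len(policy)), key=lambda i: (-policy[i], i))
        if target in ranked[:k]:
            hits += 1
    return hits / len(fixtures)
```

Under these assumptions, the C2 check would correspond to `top_k_hit_rate(fixtures, 5) >= 0.25` and the C3 check to `top_k_hit_rate(fixtures, 10) >= 0.40`.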

## Files

| File | Size | Description |
|---|---|---|
| `bootstrap_model.pt` | ~17 MB | PyTorch checkpoint (state dict + optimizer + metadata) |

## License

MIT — see the [repository LICENSE](https://github.com/seeligto/hexo_rl/blob/master/LICENSE).