---
license: mit
library_name: pytorch
tags:
- reinforcement-learning
- alphazero
- board-games
- hex-tic-tac-toe
- mcts
pipeline_tag: other
---

# HeXO Bootstrap Model

Pretrained policy/value network for [Hex Tic-Tac-Toe](https://hex-tic-tac-toe.github.io/), a two-player game on an infinite hexagonal grid where 6-in-a-row wins. Used as the starting point for AlphaZero-style self-play training in [`seeligto/hexo_rl`](https://github.com/seeligto/hexo_rl).

## Architecture

- Input: **18 × 19 × 19** float tensor (AlphaZero-style history + scalar planes)
- ResNet-12 trunk with squeeze-and-excitation blocks
- GroupNorm(8) throughout (BN-free, stable under small batch sizes)
- Dual-pool value head with BCE loss
- Auxiliary heads: ownership prediction + winning-line prediction
- Saved as a `state_dict` inside a standard `torch.save` checkpoint

The board is genuinely infinite: the accompanying Rust engine uses a sparse coordinate hashmap. The network receives a 19×19 window assembled around the active stone cluster, so the model itself has no board-size prior.

## Training

Supervised bootstrap only: no self-play was used to produce this artifact. Trained on a mixed corpus of:

- **SealBot self-play games** (community minimax engine, mixed time limits)
- **Anonymized public human games** (visibility=public, PII-stripped at ingestion)
- **Hybrid human-seed + bot-continuation games**

See the companion dataset (access-restricted): [`timmyburn/hexo-bootstrap-corpus`](https://huggingface.co/datasets/timmyburn/hexo-bootstrap-corpus).
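
The windowing described under Architecture (a 19×19 view assembled around the active stone cluster from a sparse coordinate map) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `(q, r)` axial coordinates, the bounding-box centring rule, the two occupancy planes, and the function names `window_origin` and `extract_window` are hypothetical and are not taken from the `hexo_rl` engine, which builds the full 18-plane input.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # assumed axial (q, r) hex coordinates
WINDOW = 19              # window edge length, matching the 19x19 input


def window_origin(stones: Dict[Coord, int], size: int = WINDOW) -> Coord:
    """Return the (q, r) corner of a size x size window centred on the
    bounding box of all stones (a hypothetical centring rule)."""
    if not stones:
        return (-(size // 2), -(size // 2))
    qs = [q for q, _ in stones]
    rs = [r for _, r in stones]
    centre_q = (min(qs) + max(qs)) // 2
    centre_r = (min(rs) + max(rs)) // 2
    return (centre_q - size // 2, centre_r - size // 2)


def extract_window(
    stones: Dict[Coord, int], size: int = WINDOW
) -> Tuple[List[List[List[float]]], Coord]:
    """Project a sparse {(q, r): player} map onto two dense occupancy
    planes (one per player, players numbered 0 and 1).
    Stones that fall outside the window are dropped."""
    q0, r0 = window_origin(stones, size)
    planes = [[[0.0] * size for _ in range(size)] for _ in range(2)]
    for (q, r), player in stones.items():
        col, row = q - q0, r - r0
        if 0 <= row < size and 0 <= col < size:
            planes[player][row][col] = 1.0
    return planes, (q0, r0)


# Stones far from the coordinate origin still land mid-window.
stones = {(100, -50): 0, (101, -50): 1, (102, -49): 0}
planes, origin = extract_window(stones)
```

Because the window is positioned relative to the stones and not to any fixed board origin, the same local pattern yields the same dense planes wherever it occurs on the infinite grid. The returned origin is what a caller would need in order to map a window cell back to absolute coordinates.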

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="timmyburn/hexo-bootstrap-models",
    filename="bootstrap_model.pt",
)
# The checkpoint bundles optimizer state and metadata alongside the weights,
# hence weights_only=False. This unpickles arbitrary objects, so only load
# files from a source you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)

# Load into the network defined in seeligto/hexo_rl:
# from hexo_rl.model.network import HexTacToeNet
# model = HexTacToeNet(in_channels=18)
# model.load_state_dict(ckpt["model"])
# model.eval()
```

The full inference path (windowing, legal-move masking, policy projection over the 362-dim action space) lives in the [`hexo_rl`](https://github.com/seeligto/hexo_rl) repo.

## Evaluation

Calibrated against a threat-detection probe on 18-plane fixtures:

| Metric | Pass threshold | Notes |
|---|---|---|
| C2: extension cell in policy top-5 | ≥ 25% | baseline for bootstrap-v4 |
| C3: extension cell in policy top-10 | ≥ 40% | baseline for bootstrap-v4 |

Thresholds are minimum-viable: later self-play checkpoints should clear them comfortably and will be released as a separate model variant.

## Files

| File | Size | Description |
|---|---|---|
| `bootstrap_model.pt` | ~17 MB | PyTorch checkpoint (state dict + optimizer + metadata) |

## License

MIT. See the [repository LICENSE](https://github.com/seeligto/hexo_rl/blob/master/LICENSE).