	---
	license: mit
	library_name: pytorch
	tags:
	- reinforcement-learning
	- alphazero
	- board-games
	- hex-tic-tac-toe
	- mcts
	pipeline_tag: other
	---

	# HeXO Bootstrap Model

	Pretrained policy/value network for [Hex Tic-Tac-Toe](https://hex-tic-tac-toe.github.io/) —
	a two-player game on an infinite hexagonal grid, 6-in-a-row to win. Used as
	the starting point for AlphaZero-style self-play training in
	[`seeligto/hexo_rl`](https://github.com/seeligto/hexo_rl).

	## Architecture

	- Input: 18 × 19 × 19 float tensor (AlphaZero-style history + scalar planes)
	- ResNet-12 trunk with squeeze-and-excitation blocks
	- GroupNorm(8) throughout (BN-free, stable under small batch sizes)
	- Dual-pool value head with BCE loss
	- Auxiliary heads: ownership prediction + winning-line prediction
	- Saved as a `state_dict` inside a standard `torch.save` checkpoint

	The board is genuinely infinite: the accompanying Rust engine uses a sparse
	coordinate hashmap. The network receives a 19×19 window assembled around
	the active stone cluster, so the model itself has no board-size prior.
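
The windowing idea described above can be sketched in a few lines of plain Python. This is an illustration only, not the `hexo_rl` implementation: the axial `(q, r)` coordinates, the bounding-box centring rule, and the name `extract_window` are all assumptions made for the example, and the real engine may choose its window differently.

```python
from typing import Dict, List, Tuple

WINDOW = 19  # window side length, taken from the model card

Coord = Tuple[int, int]  # assumed axial (q, r) hex coordinates


def extract_window(
    stones: Dict[Coord, int], size: int = WINDOW
) -> Tuple[List[List[int]], Coord]:
    """Return a size x size grid of stone values plus the window origin.

    `stones` is a sparse map (q, r) -> player id (1 or 2); empty cells are
    simply absent. The window is centred on the midpoint of the stones'
    bounding box (an assumed rule, chosen for simplicity).
    """
    if stones:
        qs = [q for q, _ in stones]
        rs = [r for _, r in stones]
        centre_q = (min(qs) + max(qs)) // 2
        centre_r = (min(rs) + max(rs)) // 2
    else:
        centre_q = centre_r = 0
    origin = (centre_q - size // 2, centre_r - size // 2)
    grid = [[0] * size for _ in range(size)]
    for (q, r), player in stones.items():
        row, col = r - origin[1], q - origin[0]
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = player  # stones outside the window are dropped
    return grid, origin
```

Returning the origin alongside the grid lets a caller translate a window cell chosen by the network back into absolute board coordinates.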

	## Training

	Supervised bootstrap only: this network generated no self-play data of its
	own. It was trained on a mixed corpus of:

	- SealBot self-play games (community minimax engine, mixed time limits)
	- Anonymized public human games (visibility=public, PII-stripped at ingestion)
	- Hybrid human-seed + bot-continuation games

	See the companion dataset (access-restricted):
	[`timmyburn/hexo-bootstrap-corpus`](https://huggingface.co/datasets/timmyburn/hexo-bootstrap-corpus).

	## Usage

	```python
	import torch
	from huggingface_hub import hf_hub_download

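	path = hf_hub_download(
	    repo_id="timmyburn/hexo-bootstrap-models",
	    filename="bootstrap_model.pt",
	)
	# weights_only=False unpickles arbitrary Python objects (the checkpoint also
	# stores optimizer state and metadata), so only load files you trust.
	ckpt = torch.load(path, map_location="cpu", weights_only=False)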
	# Load into the network defined in seeligto/hexo_rl:
	# from hexo_rl.model.network import HexTacToeNet
	# model = HexTacToeNet(in_channels=18)
	# model.load_state_dict(ckpt["model"])
	# model.eval()
	```

	The full inference path (windowing, legal-move masking, policy projection
	over the 362-dim action space) lives in the
	[`hexo_rl`](https://github.com/seeligto/hexo_rl) repo.
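
As a rough illustration of the legal-move masking step, the sketch below renormalises a policy vector over legal actions only. It is a minimal stdlib example, not the code in `hexo_rl`: the function name `masked_softmax` is hypothetical, and the card does not say what the one index beyond the 19 × 19 = 361 board cells represents, so the sketch treats every index as an ordinary maskable action.

```python
import math
from typing import List, Sequence

NUM_ACTIONS = 362  # action-space size stated in the model card


def masked_softmax(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax restricted to legal actions; illegal actions get probability 0."""
    if len(logits) != len(legal):
        raise ValueError("logits and legal mask must have the same length")
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal actions to normalise over")
    peak = max(legal_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]
```

Masking before normalisation, rather than zeroing probabilities afterwards, keeps the result a valid distribution without a second renormalisation pass.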

	## Evaluation

	Calibrated against a threat-detection probe on 18-plane fixtures:

	| Metric | Pass threshold | Notes |
	|---|---|---|
	| C2: extension cell in policy top-5 | ≥ 25% | baseline for bootstrap-v4 |
	| C3: extension cell in policy top-10 | ≥ 40% | baseline for bootstrap-v4 |

	Thresholds are minimum-viable — later self-play checkpoints should clear
	these comfortably and will be released as a separate model variant.
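
A top-k metric of this kind could be computed roughly as follows. The sketch assumes each fixture has exactly one target ("extension") cell, given as an index into the policy vector; the card does not describe the fixture format, so `top_k_hit_rate` and its inputs are hypothetical.

```python
from typing import Sequence


def top_k_hit_rate(
    policies: Sequence[Sequence[float]], targets: Sequence[int], k: int
) -> float:
    """Fraction of fixtures whose target index ranks in the policy's top k."""
    if len(policies) != len(targets) or not policies:
        raise ValueError("need one target per policy and at least one fixture")
    hits = 0
    for policy, target in zip(policies, targets):
        # Indices sorted by descending probability; keep the first k.
        ranked = sorted(range(len(policy)), key=lambda i: policy[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(policies)
```

Under these assumptions, a checkpoint would pass C2 when `top_k_hit_rate(policies, targets, 5) >= 0.25` and C3 when `top_k_hit_rate(policies, targets, 10) >= 0.40`.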

	## Files

	| File | Size | Description |
	|---|---|---|
	| `bootstrap_model.pt` | ~17 MB | PyTorch checkpoint (state dict + optimizer + metadata) |

	## License

	MIT — see the [repository LICENSE](https://github.com/seeligto/hexo_rl/blob/master/LICENSE).