	---
	license: mit
	library_name: pytorch
	tags:
	- gomoku
	- alphazero
	- mcts
	- board-game-ai
	- pytorch
	pipeline_tag: reinforcement-learning
	---

	# GomokuZeroAI

	GomokuZeroAI is an AlphaZero-style Gomoku checkpoint trained with self-play, a PyTorch policy-value network, and Monte Carlo Tree Search.

	This repository hosts the model weights used by the companion project:

	```text
	https://github.com/maojh15/GomokuZeroAI
	```

	The current checkpoint is:

	```text
	iter_0150_15x15.pt
	```

	It is intended for local human-vs-AI play through the project's web UI.

	## Quick Start

	Clone the code repository:

	```bash
	git clone https://github.com/maojh15/GomokuZeroAI.git
	cd GomokuZeroAI
	```

	Install dependencies:

	```bash
	pip install numpy torch pyyaml huggingface_hub
	```

	Download the checkpoint:

	```bash
	hf download maojh15/GomokuZeroAI iter_0150_15x15.pt --local-dir result_15x15/checkpoints
	```
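The `hf` command ships with recent `huggingface_hub` releases. If it is not available in your environment, the same file can be fetched from Python. The following is a minimal sketch using `hf_hub_download`; the helper name `download_checkpoint` is hypothetical and not part of the GomokuZeroAI codebase.

```python
from pathlib import Path

REPO_ID = "maojh15/GomokuZeroAI"
FILENAME = "iter_0150_15x15.pt"
LOCAL_DIR = Path("result_15x15/checkpoints")


def download_checkpoint(local_dir: Path = LOCAL_DIR) -> Path:
    """Fetch the checkpoint into local_dir and return the path of the downloaded file.

    Hypothetical helper for illustration; it mirrors the `hf download` command above.
    """
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_dir.mkdir(parents=True, exist_ok=True)
    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=str(local_dir))
    return Path(path)
```

Calling `download_checkpoint()` from the repository root should place the file where the web UI's checkpoint dropdown looks for it, assuming the default `result_15x15/checkpoints` directory is used.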

	Start the local human-vs-AI server:

	```bash
	python play_human.py --host 127.0.0.1 --port 8765
	```

	Open the web UI:

	```text
	http://127.0.0.1:8765
	```

	Select `iter_0150_15x15.pt` in the checkpoint dropdown and click the new-game button to start playing.

	## Model Details

	- Game: Gomoku / Five in a Row
	- Board size: 15x15
	- Checkpoint: `iter_0150_15x15.pt`
	- Training iteration: 150
	- Framework: PyTorch
	- Architecture: convolutional policy-value network
	- Input channels: 2
	- Network width: 128 channels
	- Player encodings: `1` and `-1`
	- MCTS backend used during training: C++ PyTorch extension
	- MCTS playouts during training: 2000
	- Opening self-play temperature: 1.0 for the first 12 moves
	- Evaluation temperature: 0.001 after the opening
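To illustrate how the two temperature settings affect move selection, here is a minimal sketch of the usual AlphaZero-style rule, where a move is chosen with probability proportional to `N(a)^(1/T)` over the root visit counts. It assumes the two listed temperatures form a single schedule and that "first 12 moves" means move indices 0 to 11; the function names are hypothetical and the project's own implementation may differ.

```python
import random


def visit_counts_to_probs(visit_counts, temperature):
    """Turn MCTS root visit counts into move probabilities proportional to N^(1/T)."""
    if not visit_counts or max(visit_counts) <= 0:
        raise ValueError("need at least one visited move")
    # Divide by the maximum first so the exponent 1/T cannot overflow for tiny T.
    top = max(visit_counts)
    scaled = [(n / top) ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]


def pick_move(visit_counts, move_index, opening_moves=12, rng=random):
    """Sample with T = 1.0 during the opening, then act almost greedily with T = 0.001.

    The single-schedule interpretation and the zero-based move_index are assumptions.
    """
    temperature = 1.0 if move_index < opening_moves else 0.001
    probs = visit_counts_to_probs(visit_counts, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `T = 1.0` the sampled move follows the visit distribution, which adds variety to openings. With `T = 0.001` nearly all probability mass lands on the most-visited move.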

	The network predicts:

	- a policy distribution over legal board moves
	- a value estimate in `[0, 1]` from the current player's perspective

	The local web UI can display both the raw network value and the MCTS root value.
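As a rough illustration of how these two outputs are typically post-processed, the sketch below restricts a policy vector to legal moves and flips a `[0, 1]` value to the other player's perspective. Both helpers are hypothetical. The `1 - v` flip assumes the value behaves like a win probability with a draw at 0.5, which this model card does not state.

```python
def mask_and_normalize(policy, legal):
    """Zero out illegal moves and renormalize so the remaining probabilities sum to 1."""
    masked = [p if ok else 0.0 for p, ok in zip(policy, legal)]
    total = sum(masked)
    if total <= 0.0:
        # Fall back to a uniform distribution over the legal moves.
        n_legal = sum(1 for ok in legal if ok)
        if n_legal == 0:
            raise ValueError("no legal moves")
        return [1.0 / n_legal if ok else 0.0 for ok in legal]
    return [p / total for p in masked]


def value_for_opponent(value):
    """Flip a [0, 1] value to the other player's perspective.

    Assumes the value acts like a win probability; this is not confirmed by the source.
    """
    return 1.0 - value
```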

	## Intended Use

	This checkpoint is meant for:

	- playing Gomoku against the AI locally
	- inspecting policy and visit overlays in the web UI
	- comparing future GomokuZeroAI checkpoints
	- experimenting with AlphaZero-style self-play training code

	This is not a Transformers model and is not intended for use through the Hugging Face `pipeline()` API.

	## Limitations

	- The model was trained for 15x15 Gomoku only.
	- It requires the GomokuZeroAI codebase to load and run correctly.
	- Playing strength depends heavily on the MCTS playout setting used at inference time.
	- More playouts usually improve move quality but increase per-move latency.
	- The checkpoint is an experimental game AI model, not a benchmarked tournament engine.

	## Recommended Inference Settings

	For interactive human-vs-AI play, start with:

	- `MCTS playouts`: 2000
	- `c_puct`: 5.0
	- `candidate distance`: leave empty to consider all legal moves
	- `mcts_tactical_shortcuts`: enabled for faster tactical responses in the web UI

	If moves are too slow on your machine, reduce `MCTS playouts` to 400-1000.
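For context on what `c_puct` controls, the sketch below shows the standard AlphaZero-style PUCT selection score, `Q + c_puct * P * sqrt(N_parent) / (1 + N_child)`. This is the textbook form, not necessarily the exact formula in the project's C++ backend, and the `(q_value, prior, visits)` child layout is hypothetical.

```python
import math


def puct_score(q_value, prior, parent_visits, child_visits, c_puct=5.0):
    """AlphaZero-style selection score: Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration


def select_child(children, parent_visits, c_puct=5.0):
    """Return the index of the child with the highest PUCT score.

    Each child is a (q_value, prior, visits) tuple; this layout is hypothetical.
    """
    scores = [puct_score(q, p, parent_visits, n, c_puct) for q, p, n in children]
    return max(range(len(scores)), key=scores.__getitem__)
```

A larger `c_puct` weights the network prior and under-visited moves more heavily, so the search explores more broadly. A smaller value makes the search commit sooner to moves that already have a high average value.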

	## Files

	```text
	iter_0150_15x15.pt
	```

	This file contains the model weights and training configuration payload used by the GomokuZeroAI checkpoint loader.

	## Citation

	If you use this checkpoint or codebase in your own experiments, please reference the project repository:

	```text
	https://github.com/maojh15/GomokuZeroAI
	```