Gomoku MaskablePPO Stage3 H6

This repository contains a compact release of a Gomoku (9x9, connect-5) agent trained with MaskablePPO from sb3-contrib. It includes the following files:

best_model/best_model.zip: best checkpoint selected by evaluation callback
best_model/evaluations.npz: raw evaluation callback output
gomoku_maskable_ppo_final.zip: final checkpoint at the end of training
gomoku_rl/: environment, opponents, and custom CNN feature extractor required to load the model
play.py: local browser game against the agent
evaluate.py: evaluation script
metrics_summary.json: compact summary of tracked metrics
upload_to_hf.py: helper for uploading this prepared folder

Training Setup

Board size: 9
Win length: 5
Algorithm: MaskablePPO
Policy: custom CNN (GomokuCNN)
Training variant: resumed from the best checkpoint of models_stage3_h5 and continued in models_stage3_h6

Training command used for this stage:

python train.py --resume-from models_stage3_h5/best_model/best_model.zip --opponent heuristic --heuristic-search-depth 1 --heuristic-max-candidates 4 --heuristic-early-max-candidates 6 --vec-env subproc --n-envs 8 --total-timesteps 5000000 --models-dir models_stage3_h6 --log-dir logs_stage3_h6 --eval-opponent heuristic --eval-freq 500000 --eval-games 100

Checkpoint Summary

Best checkpoint by evaluation callback: 13350000 timesteps
Last evaluated checkpoint: 13850000 timesteps
Best callback mean reward: 1.5249
Last callback mean reward: 1.4882
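
The figures above can in principle be recomputed from best_model/evaluations.npz. The following is a minimal sketch, assuming the file follows the layout written by the standard Stable-Baselines3 evaluation callback (a `timesteps` array and a `results` array with one row of per-episode rewards per evaluation round). The helper names `summarize_evaluations` and `load_and_summarize` are hypothetical and not part of this repository.

```python
from statistics import mean


def summarize_evaluations(timesteps, results):
    """Return (best, last), each a (timestep, mean_reward) pair.

    timesteps: sequence of ints, one entry per evaluation round.
    results:   sequence of per-episode reward sequences, one per round.
    """
    means = [
        (int(t), mean(float(r) for r in row))
        for t, row in zip(timesteps, results)
    ]
    if not means:
        raise ValueError("no evaluation rounds found")
    # On ties, max() keeps the earliest round with the highest mean reward.
    best = max(means, key=lambda pair: pair[1])
    return best, means[-1]


def load_and_summarize(path="best_model/evaluations.npz"):
    """Load the npz file and summarize it.

    Assumption: the file contains 'timesteps' and 'results' arrays.
    """
    import numpy as np  # imported lazily so the helper above stays stdlib-only

    data = np.load(path)
    return summarize_evaluations(
        data["timesteps"].tolist(), data["results"].tolist()
    )
```

Calling `load_and_summarize()` from the repository root should return the best and last evaluation rounds, which can then be compared against the values listed in this section.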

Quick Local Benchmarks

The following checks were run locally after packaging:

Best checkpoint

Opponent: heuristic
Opponent config: depth=1, radius=2, max_candidates=4, early_max_candidates=6
Games: 50
Wins / Losses / Draws: 47 / 3 / 0
Win rate: 94%

Final checkpoint

Opponent: heuristic
Opponent config: depth=1, radius=2, max_candidates=4, early_max_candidates=6
Games: 50
Wins / Losses / Draws: 38 / 10 / 2
Win rate: 76%

In these 50-game checks the best checkpoint outperformed the final checkpoint (94% vs. 76% win rate), so best_model/best_model.zip is the recommended file.
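
Because each benchmark uses only 50 games, the win rates carry noticeable sampling uncertainty. The sketch below shows how the win rate is derived from the win/loss/draw counts and adds a Wilson score interval as one possible way to gauge that uncertainty. The interval is an illustration added here, not something the original benchmark reported, and the function names are hypothetical.

```python
from math import sqrt


def win_rate(wins, losses, draws):
    """Fraction of games won; draws and losses both count as non-wins."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("at least one game is required")
    return wins / games


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


# Counts taken from the two benchmark runs in this section.
best_rate = win_rate(47, 3, 0)
final_rate = win_rate(38, 10, 2)
best_low, best_high = wilson_interval(47, 50)
final_low, final_high = wilson_interval(38, 50)
```

Running evaluate.py with more games narrows these intervals and gives a firmer comparison between the two checkpoints.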

Install

pip install -r requirements.txt

Load The Model

from sb3_contrib import MaskablePPO

model = MaskablePPO.load("best_model/best_model.zip")

Because the policy uses a custom feature extractor (GomokuCNN), the gomoku_rl/ package must be importable when the model is loaded: keep it next to the model files or add it to your Python path.
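
Once loaded, a MaskablePPO model accepts a mask of legal actions through the `action_masks` argument of `predict`. The sketch below assumes the board is a 9x9 nested list with 0 marking an empty cell and that actions are indexed row-major (`row * 9 + col`). Both are assumptions made for illustration; the actual encoding used by gomoku_rl may differ. The helpers `legal_action_mask` and `choose_move` are hypothetical.

```python
BOARD_SIZE = 9  # board size stated in the training setup


def legal_action_mask(board):
    """Return a flat list of booleans, True where a stone may be placed.

    Assumptions: 0 marks an empty cell, and actions are row-major indices.
    """
    if len(board) != BOARD_SIZE or any(len(row) != BOARD_SIZE for row in board):
        raise ValueError("board must be 9x9")
    return [cell == 0 for row in board for cell in row]


def choose_move(model, obs, board):
    """Ask a loaded MaskablePPO model for a legal move as (row, col).

    obs must already be in the observation format the environment produces;
    that format is not specified here.
    """
    import numpy as np  # imported lazily; only needed when a model is used

    mask = np.asarray(legal_action_mask(board), dtype=bool)
    action, _state = model.predict(obs, action_masks=mask, deterministic=True)
    return divmod(int(action), BOARD_SIZE)
```

In practice the environment in gomoku_rl/ is the authoritative source for both the observation and the mask, so prefer whatever mask it exposes over this hand-built one.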

Evaluate

python evaluate.py --model-path best_model/best_model.zip --opponent heuristic --games 100 --opponent-search-depth 1 --opponent-max-candidates 4 --opponent-early-max-candidates 6

Play In Browser

python play.py --model-path best_model/best_model.zip --host 127.0.0.1 --port 8000 --human-first

Upload This Folder

If you cloned or copied this release locally and want to publish it under your own Hugging Face account:

python upload_to_hf.py

Or specify a target repository explicitly:

python upload_to_hf.py --repo-id your-name/gomoku-maskable-ppo-stage3-h6
