Gomoku MaskablePPO Stage3 H6

This repository contains a compact release of a Gomoku (9x9, connect-5) agent trained with MaskablePPO from sb3-contrib.

Contents

  • best_model/best_model.zip: best checkpoint selected by evaluation callback
  • best_model/evaluations.npz: raw evaluation callback output
  • gomoku_maskable_ppo_final.zip: final checkpoint at the end of training
  • gomoku_rl/: environment, opponents, and custom CNN feature extractor required to load the model
  • play.py: local browser game against the agent
  • evaluate.py: evaluation script
  • metrics_summary.json: compact summary of tracked metrics
  • upload_to_hf.py: helper for uploading this prepared folder

Training Setup

  • Board size: 9
  • Win length: 5
  • Algorithm: MaskablePPO
  • Policy: custom CNN (GomokuCNN)
  • Training variant: resumed from models_stage3_h5 and continued in models_stage3_h6

Training command used for this stage:

python train.py --resume-from models_stage3_h5/best_model/best_model.zip --opponent heuristic --heuristic-search-depth 1 --heuristic-max-candidates 4 --heuristic-early-max-candidates 6 --vec-env subproc --n-envs 8 --total-timesteps 5000000 --models-dir models_stage3_h6 --log-dir logs_stage3_h6 --eval-opponent heuristic --eval-freq 500000 --eval-games 100

Checkpoint Summary

  • Best checkpoint by evaluation callback: 13350000 timesteps
  • Last evaluated checkpoint: 13850000 timesteps
  • Best callback mean reward: 1.5249
  • Last callback mean reward: 1.4882
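
The numbers above can be re-derived from best_model/evaluations.npz. The following is a minimal sketch, assuming the file uses the standard layout written by the Stable-Baselines3 evaluation callback (a "timesteps" array and a "results" array with one row of episode rewards per evaluation). The function name summarize_evaluations is illustrative and not part of this repository.

```python
import numpy as np


def summarize_evaluations(path):
    """Return (best_timestep, best_mean, last_timestep, last_mean).

    Assumes the .npz file contains "timesteps" with shape (n_evals,) and
    "results" with shape (n_evals, n_eval_episodes).
    """
    data = np.load(path)
    timesteps = data["timesteps"]
    # Mean episode reward for each evaluation round.
    mean_rewards = data["results"].mean(axis=1)
    # argmax returns the first maximum, i.e. the earliest best evaluation.
    best = int(np.argmax(mean_rewards))
    return (
        int(timesteps[best]),
        float(mean_rewards[best]),
        int(timesteps[-1]),
        float(mean_rewards[-1]),
    )
```

Calling summarize_evaluations("best_model/evaluations.npz") from the release folder should return the best and last evaluation entries, provided the layout assumption holds.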

Quick Local Benchmarks

The following checks were run locally after packaging:

Best checkpoint

  • Opponent: heuristic
  • Opponent config: depth=1, radius=2, max_candidates=4, early_max_candidates=6
  • Games: 50
  • Wins / Losses / Draws: 47 / 3 / 0
  • Win rate: 94%

Final checkpoint

  • Opponent: heuristic
  • Opponent config: depth=1, radius=2, max_candidates=4, early_max_candidates=6
  • Games: 50
  • Wins / Losses / Draws: 38 / 10 / 2
  • Win rate: 76%

In these 50-game checks the best checkpoint won 94% of its games against 76% for the final checkpoint, so best_model/best_model.zip is the recommended file.
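
With only 50 games per checkpoint, the win rates carry noticeable sampling uncertainty. The sketch below recomputes the win rates from the win counts listed above and attaches a 95% Wilson score interval to each. The interval method is an illustrative choice and was not part of the original benchmark; draws are counted as non-wins, matching the percentages reported above.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 gives roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


# Win counts taken from the benchmark sections above.
for label, wins, games in [("best", 47, 50), ("final", 38, 50)]:
    low, high = wilson_interval(wins, games)
    print(f"{label}: win rate {wins / games:.0%}, 95% interval {low:.1%} to {high:.1%}")
```

Running evaluate.py with more games narrows these intervals if a tighter comparison is needed.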

Install

pip install -r requirements.txt

Load The Model

from sb3_contrib import MaskablePPO

model = MaskablePPO.load("best_model/best_model.zip")

Because the policy uses a custom feature extractor (GomokuCNN), the gomoku_rl/ package must be importable when the model is loaded. Keep it next to the model files or add it to your Python path.
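
Once loaded, a MaskablePPO model expects a mask of legal actions at prediction time. The following is a rough sketch of how a move could be chosen, under assumptions that are not confirmed by this README: the board is a 9x9 array with 0 marking empty cells, and actions are indexed as row * 9 + col. The helpers legal_action_mask and choose_move are hypothetical; the environment in gomoku_rl/ may provide its own mask and observation format, which should be preferred.

```python
import numpy as np

BOARD_SIZE = 9  # this release uses a 9x9 board


def legal_action_mask(board):
    """Boolean mask over all 81 cells: True where the cell is empty.

    Assumption: empty cells are encoded as 0.
    """
    board = np.asarray(board)
    if board.shape != (BOARD_SIZE, BOARD_SIZE):
        raise ValueError(f"expected a {BOARD_SIZE}x{BOARD_SIZE} board")
    return (board == 0).reshape(-1)


def choose_move(model, observation, board):
    """Ask the model for a legal move and return it as (row, col).

    `model` is expected to behave like a loaded MaskablePPO instance,
    whose predict() accepts an `action_masks` keyword argument.
    Assumption: actions are flat indices, row * BOARD_SIZE + col.
    """
    mask = legal_action_mask(board)
    action, _state = model.predict(
        observation, action_masks=mask, deterministic=True
    )
    return divmod(int(np.asarray(action).item()), BOARD_SIZE)
```

Passing deterministic=True makes the agent pick its highest-probability legal move instead of sampling from the masked distribution.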

Evaluate

python evaluate.py --model-path best_model/best_model.zip --opponent heuristic --games 100 --opponent-search-depth 1 --opponent-max-candidates 4 --opponent-early-max-candidates 6

Play In Browser

python play.py --model-path best_model/best_model.zip --host 127.0.0.1 --port 8000 --human-first

Upload This Folder

If you cloned or copied this release locally and want to publish it under your own Hugging Face account:

python upload_to_hf.py

Or specify a target repository explicitly:

python upload_to_hf.py --repo-id your-name/gomoku-maskable-ppo-stage3-h6