Gomoku MaskablePPO Stage3 H6
This repository contains a compact release of a Gomoku (9x9, connect-5) agent trained with MaskablePPO from sb3-contrib.
Contents
- best_model/best_model.zip: best checkpoint selected by evaluation callback
- best_model/evaluations.npz: raw evaluation callback output
- gomoku_maskable_ppo_final.zip: final checkpoint at the end of training
- gomoku_rl/: environment, opponents, and custom CNN feature extractor required to load the model
- play.py: local browser game against the agent
- evaluate.py: evaluation script
- metrics_summary.json: compact summary of tracked metrics
- upload_to_hf.py: helper for uploading this prepared folder
Training Setup
- Board size: 9
- Win length: 5
- Algorithm: MaskablePPO
- Policy: custom CNN (GomokuCNN)
- Training variant: resumed from models_stage3_h5 and continued in models_stage3_h6
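To make the board size and win length concrete, here is a minimal sketch of a win check for a 9x9 board with a win length of 5. It is not the code in gomoku_rl/: the board encoding (a list of rows with 0 for empty and 1 or 2 for the players), the function name `has_won`, and the choice to count five or more in a row as a win are all assumptions made for illustration.

```python
BOARD_SIZE = 9   # board size from the training setup
WIN_LENGTH = 5   # win length from the training setup

# Scan directions: right, down, down-right, down-left.
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))


def has_won(board, player, size=BOARD_SIZE, win_length=WIN_LENGTH):
    """Return True if `player` has `win_length` stones in a line.

    `board` is a list of `size` rows, each a list of `size` ints
    (hypothetical encoding: 0 = empty, 1 and 2 = the two players).
    """
    for row in range(size):
        for col in range(size):
            if board[row][col] != player:
                continue
            for d_row, d_col in DIRECTIONS:
                end_row = row + (win_length - 1) * d_row
                end_col = col + (win_length - 1) * d_col
                # Skip lines that would run off the board.
                if not (0 <= end_row < size and 0 <= end_col < size):
                    continue
                if all(
                    board[row + k * d_row][col + k * d_col] == player
                    for k in range(win_length)
                ):
                    return True
    return False
```

The environment shipped in gomoku_rl/ remains the authority on the actual rules and board representation.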
Training command used for this stage:
python train.py --resume-from models_stage3_h5/best_model/best_model.zip --opponent heuristic --heuristic-search-depth 1 --heuristic-max-candidates 4 --heuristic-early-max-candidates 6 --vec-env subproc --n-envs 8 --total-timesteps 5000000 --models-dir models_stage3_h6 --log-dir logs_stage3_h6 --eval-opponent heuristic --eval-freq 500000 --eval-games 100
Checkpoint Summary
- Best checkpoint by evaluation callback: 13350000 timesteps
- Last evaluated checkpoint: 13850000 timesteps
- Best callback mean reward: 1.5249
- Last callback mean reward: 1.4882
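The numbers above can in principle be recomputed from best_model/evaluations.npz. The sketch below assumes the file follows the layout written by the Stable-Baselines3 evaluation callback, with a `timesteps` array and a `results` array of shape (evaluations, episodes). The helper name `summarize_evaluations` is hypothetical, and the file may cover only part of the training history if earlier stages logged elsewhere.

```python
import numpy as np


def summarize_evaluations(npz_path):
    """Summarize an evaluation log (assumed EvalCallback-style .npz)."""
    with np.load(npz_path) as data:
        timesteps = data["timesteps"]
        # One row per evaluation, one column per evaluation episode.
        mean_rewards = data["results"].mean(axis=1)

    best = int(mean_rewards.argmax())  # first evaluation with the highest mean
    return {
        "best_timesteps": int(timesteps[best]),
        "best_mean_reward": float(mean_rewards[best]),
        "last_timesteps": int(timesteps[-1]),
        "last_mean_reward": float(mean_rewards[-1]),
    }
```

Calling `summarize_evaluations("best_model/evaluations.npz")` from the release folder should return a dictionary comparable to the summary above, provided the layout assumption holds.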
Quick Local Benchmarks
The following checks were run locally after packaging:
Best checkpoint
- Opponent: heuristic
- Opponent config: depth=1, radius=2, max_candidates=4, early_max_candidates=6
- Games: 50
- Wins / Losses / Draws: 47/3/0
- Win rate: 94%
Final checkpoint
- Opponent: heuristic
- Opponent config: depth=1, radius=2, max_candidates=4, early_max_candidates=6
- Games: 50
- Wins / Losses / Draws: 38/10/2
- Win rate: 76%
In these 50-game checks the best checkpoint won 94% of games against 76% for the final checkpoint, so best_model/best_model.zip is the recommended file.
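Fifty games is a small sample, so it can help to put an interval around each win rate. The sketch below computes a 95% Wilson score interval from the win counts reported above, counting draws as non-wins as the win rates above do. The function name `wilson_interval` is chosen here for illustration and is not part of evaluate.py.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    center = (p + z2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return center - half_width, center + half_width


# Win counts from the local benchmarks above.
best_low, best_high = wilson_interval(47, 50)
final_low, final_high = wilson_interval(38, 50)
print(f"best checkpoint:  {best_low:.3f} to {best_high:.3f}")
print(f"final checkpoint: {final_low:.3f} to {final_high:.3f}")
```

Both intervals are fairly wide at this sample size, so running evaluate.py with more games gives a tighter comparison between the two checkpoints.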
Install
pip install -r requirements.txt
Load The Model
from sb3_contrib import MaskablePPO
model = MaskablePPO.load("best_model/best_model.zip")
Because the policy uses a custom feature extractor, keep the gomoku_rl/ package next to the model files or in your Python path.
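As a rough sketch of how this fits together, the helpers below put the release folder on the Python path, load the best checkpoint, and ask for one move with an action mask. `MaskablePPO.load` and the `action_masks` argument of `predict` come from sb3-contrib. Everything else is an assumption for illustration: the helper names, the board encoding (0 marks an empty cell), and the action indexing (row * 9 + column). The environment in gomoku_rl/ defines the real observation and mask.

```python
import sys
from pathlib import Path

import numpy as np

BOARD_SIZE = 9  # board size from the training setup


def add_release_to_path(release_dir):
    """Make gomoku_rl/ importable by adding the release folder to sys.path."""
    root = str(Path(release_dir).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


def legal_move_mask(board):
    """Boolean mask over flattened cells, True where a move is legal.

    Assumption: `board` is a BOARD_SIZE x BOARD_SIZE array where 0 marks
    an empty cell and action index = row * BOARD_SIZE + column.
    """
    return np.asarray(board).reshape(-1) == 0


def load_and_pick_move(release_dir, observation, board):
    """Load the best checkpoint and return one masked, deterministic action."""
    add_release_to_path(release_dir)
    # Imported here so the helpers above work without sb3-contrib installed.
    from sb3_contrib import MaskablePPO

    model_path = Path(release_dir) / "best_model" / "best_model.zip"
    model = MaskablePPO.load(str(model_path))
    action, _state = model.predict(
        observation,
        action_masks=legal_move_mask(board),
        deterministic=True,
    )
    return int(action)
```

In practice the observation and the mask should both come from the environment in gomoku_rl/, so that they match what the policy saw during training.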
Evaluate
python evaluate.py --model-path best_model/best_model.zip --opponent heuristic --games 100 --opponent-search-depth 1 --opponent-max-candidates 4 --opponent-early-max-candidates 6
Play In Browser
python play.py --model-path best_model/best_model.zip --host 127.0.0.1 --port 8000 --human-first
Upload This Folder
If you cloned or copied this release locally and want to publish it under your own Hugging Face account:
python upload_to_hf.py
Or specify a target repository explicitly:
python upload_to_hf.py --repo-id your-name/gomoku-maskable-ppo-stage3-h6