Battleship PPO Agent: Hanks1234/battleship-ppo-dagger

A MaskablePPO agent (from sb3-contrib) trained on a 10x20 Battleship board with custom T-shaped and Z-shaped ships.

Environment

  • Board: 10 columns x 20 rows
  • Ships: 10 ships including T-shaped Battleships and Z-shaped Carriers
  • Observation: 5-channel binary image (5, 20, 10)
  • Action: Discrete(200) with action masking (no repeat shots)
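The action space and mask described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the row-major mapping between action index and board cell, and the helper names `action_to_cell`, `cell_to_action`, and `action_mask`, are assumptions.

```python
# Board geometry taken from the Environment list above.
ROWS, COLS = 20, 10
N_ACTIONS = ROWS * COLS  # Discrete(200)


def action_to_cell(action):
    """Map a flat action index to (row, col); row-major order is an assumption."""
    if not 0 <= action < N_ACTIONS:
        raise ValueError(f"action {action} outside Discrete({N_ACTIONS})")
    return divmod(action, COLS)


def cell_to_action(row, col):
    """Inverse of action_to_cell under the same row-major assumption."""
    return row * COLS + col


def action_mask(fired_cells):
    """Return a list of 200 booleans: True where a shot is still allowed."""
    mask = [True] * N_ACTIONS
    for row, col in fired_cells:
        mask[cell_to_action(row, col)] = False
    return mask


fired = {(0, 0), (3, 7)}
mask = action_mask(fired)
print(sum(mask))  # number of legal actions left
```

Masking already-fired cells means the policy never has to learn that repeat shots are wasted; that constraint is enforced outside the network.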

Training Config

Parameter             Value
method                DAgger (Dataset Aggregation)
base_model            BC-pretrained MaskablePPO
observation_channels  15
board_size            10x20
expert                Monte Carlo solver (1000 samples)
disagree_only         True
confidence_threshold  0.3
freeze_cnn            True
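The card does not define how the expert labels states or what `disagree_only` and `confidence_threshold` do, so the sketch below shows only one plausible reading. It assumes the Monte Carlo expert votes over sampled ship layouts and that a state is aggregated only when the expert is confident enough and the learner disagrees with it. The functions `expert_shot` and `keep_for_dagger` are hypothetical names, and the sampling of layouts is not shown.

```python
from collections import Counter


def expert_shot(sampled_layouts, fired):
    """Pick the unfired cell occupied in the most sampled layouts.

    sampled_layouts: iterable of sets of (row, col) cells, each one a ship
    layout consistent with the shots so far (sampling itself is not shown).
    Returns (cell, estimated hit probability).
    """
    counts = Counter()
    n = 0
    for layout in sampled_layouts:
        n += 1
        counts.update(cell for cell in layout if cell not in fired)
    if not counts:
        raise ValueError("no unfired ship cells in the samples")
    # Ties are broken by the smallest (row, col) so the result is deterministic.
    cell, hits = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return cell, hits / n


def keep_for_dagger(learner_cell, expert_cell, expert_prob,
                    disagree_only=True, confidence_threshold=0.3):
    """Decide whether a visited state joins the aggregated dataset.

    Assumed semantics (not confirmed by the source): drop states where the
    expert's hit estimate is below the threshold and, when disagree_only is
    set, drop states where the learner already matches the expert.
    """
    if expert_prob < confidence_threshold:
        return False
    if disagree_only and learner_cell == expert_cell:
        return False
    return True


layouts = [{(0, 0), (0, 1)}, {(0, 1), (0, 2)}, {(0, 1), (1, 1)}, {(5, 5), (5, 6)}]
cell, prob = expert_shot(layouts, fired={(0, 0)})
print(cell, prob)
print(keep_for_dagger(learner_cell=(5, 5), expert_cell=cell, expert_prob=prob))
```

Under this reading, filtering to disagreements concentrates the new labels on states where the learner's policy is currently wrong, which is the part of the state distribution DAgger is meant to correct.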

Evaluation Results

Metric                Value
mean_shots_500_games  100.90
verified_games        500
seed                  20000
notes                 15-channel DAgger; first model to break 100-shot barrier
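A metric such as mean_shots_500_games could be computed with a harness along these lines. This is a sketch under stated assumptions: it treats the listed seed as the first of 500 consecutive game seeds, which the card does not confirm, and `random_policy_game` is a hypothetical stand-in that fires at random on a simplified board rather than running the trained agent on the real fleet.

```python
import random
from statistics import mean

ROWS, COLS = 20, 10


def random_policy_game(seed, n_ship_cells=40):
    """Stand-in for a real game: shots needed by a uniformly random shooter.

    The layout is n_ship_cells random cells (an arbitrary illustrative value),
    not the T- and Z-shaped fleet of the real environment.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(ROWS) for c in range(COLS)]
    ships = set(rng.sample(cells, n_ship_cells))
    rng.shuffle(cells)  # random firing order with no repeats
    remaining = len(ships)
    for shots, cell in enumerate(cells, start=1):
        if cell in ships:
            remaining -= 1
            if remaining == 0:
                return shots
    raise RuntimeError("unreachable: every ship cell is on the board")


def mean_shots(play_game, n_games=500, first_seed=20000):
    """Average shots to win over games seeded first_seed, first_seed + 1, ..."""
    return mean(play_game(first_seed + i) for i in range(n_games))


baseline = mean_shots(random_policy_game)
print(f"random baseline: {baseline:.2f} shots")
```

To evaluate a trained agent instead, `play_game` would be replaced by a function that plays one seeded game with the model and returns the shot count.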

Usage

from training.hub import load_model_from_hub

model = load_model_from_hub("Hanks1234/battleship-ppo-dagger")
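Once loaded, the model can be queried for a shot. The sketch below assumes `load_model_from_hub` returns an sb3-contrib MaskablePPO instance, whose `predict` method accepts an `action_masks` argument. `choose_shot` and `StubModel` are hypothetical names; the stub stands in for the real model so the example runs without the repository.

```python
def choose_shot(model, obs, fired_actions, n_actions=200):
    """Ask the policy for the next shot, masking already-fired action indices."""
    mask = [i not in fired_actions for i in range(n_actions)]
    # MaskablePPO.predict takes action_masks and returns (action, state).
    action, _state = model.predict(obs, action_masks=mask, deterministic=True)
    return int(action)


class StubModel:
    """Stand-in with the same predict() shape as the real model."""

    def predict(self, obs, action_masks=None, deterministic=False):
        # Always picks the first legal action; the real policy uses obs.
        return action_masks.index(True), None


print(choose_shot(StubModel(), obs=None, fired_actions={0, 1}))
```

With the real model, `obs` would be the observation array returned by the environment, and `fired_actions` the set of action indices already played in the current game.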