Battleship PPO Agent — Hanks1234/battleship-ppo-dagger

A MaskablePPO agent trained with sb3-contrib on a 10x20 Battleship board with custom T-shaped and Z-shaped ships.

Environment

Board: 10 columns x 20 rows
Ships: 10 ships, including T-shaped Battleships and Z-shaped Carriers
Observation: 5-channel binary image (5, 20, 10)
Action: Discrete(200) with action masking (no repeat shots)
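
As a rough illustration of the Discrete(200) action space with "no repeat shots" masking, here is a minimal sketch in plain Python. It assumes a row-major layout (action = row * 10 + col), which matches the (20, 10) board shape but is not confirmed by this card; the helper names are hypothetical and are not part of the repository.

```python
ROWS, COLS = 20, 10  # board shape taken from the (5, 20, 10) observation


def cell_to_action(row, col):
    """Map a board cell to a flat action index (assumed row-major layout)."""
    return row * COLS + col


def action_to_cell(action):
    """Inverse of cell_to_action: flat index -> (row, col)."""
    return divmod(action, COLS)


def action_mask(fired_cells):
    """Return a 200-entry list: True = cell still legal, False = already shot."""
    mask = [True] * (ROWS * COLS)
    for row, col in fired_cells:
        mask[cell_to_action(row, col)] = False
    return mask


# Example: two shots already taken, so 198 actions remain legal.
mask = action_mask([(0, 0), (19, 9)])
legal_count = sum(mask)
```

A mask of this kind is what a maskable policy consumes at each step, so that already-targeted cells can never be selected again.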

Metric	Value
mean_shots_500_games	100.90
verified_games	500
seed	20000
notes	15-channel DAgger; first model to break 100-shot barrier

from training.hub import load_model_from_hub

model = load_model_from_hub("Hanks1234/battleship-ppo-dagger")
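
Once the model is loaded, picking a shot might look like the sketch below. It assumes the returned object is an sb3-contrib MaskablePPO model, whose predict method accepts an action_masks argument. The helper choose_shot and the row-major decoding of the action index are illustrative assumptions, not part of this repository.

```python
COLS = 10  # board width from the card; row-major action layout is an assumption


def choose_shot(model, obs, mask):
    """Ask the policy for one masked, deterministic action and decode it.

    obs:  observation with shape (5, 20, 10)
    mask: 200 booleans (typically a NumPy array), True = cell not yet shot
    """
    action, _state = model.predict(obs, action_masks=mask, deterministic=True)
    action = int(action)  # predict returns an array-like; convert to a plain int
    row, col = divmod(action, COLS)
    return action, row, col
```

Calling choose_shot(model, obs, mask) returns the flat action index together with the assumed (row, col) target; the mask has to be rebuilt after every shot so that repeat shots stay excluded.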
