# Beasty Bar PPO Agent
A neural network trained with PPO (Proximal Policy Optimization) to play Beasty Bar, a strategic card game where animals compete to enter Heaven (the bar) while avoiding Hell.
GitHub: [diegooprime/beastybar](https://github.com/diegooprime/beastybar)
## Latest Model

Recommended: `v4/final.pt` - trained against a diverse opponent pool for robust play.
## Performance

Evaluated over 500 games per opponent (playing both sides), with greedy action selection.
| Opponent | Win Rate | 95% CI |
|---|---|---|
| Random | 93.4% | [0.91, 0.95] |
| Defensive | 81.0% | [0.77, 0.84] |
| Heuristic | 76.8% | [0.73, 0.80] |
| Queue | 75.6% | [0.72, 0.79] |
| Skunk | 75.6% | [0.72, 0.79] |
| Noisy | 75.6% | [0.72, 0.79] |
| Aggressive | 75.0% | [0.71, 0.79] |
| Online | 70.2% | [0.66, 0.74] |
| Distilled Outcome | 67.4% | [0.63, 0.71] |
| Outcome Heuristic | 66.0% | [0.62, 0.70] |
Overall: 75.7% win rate across 5,000 games, ~1379 Elo.
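The intervals above are consistent with a Wilson score interval on each win proportion; the exact method isn't stated, so treat the sketch below as an assumption. It reproduces the Random row:

```python
import math

def wilson_ci(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win proportion (95% for z = 1.96)."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return (round(center - half, 2), round(center + half, 2))

print(wilson_ci(467, 500))  # 467/500 = 93.4% vs Random -> (0.91, 0.95)
```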
## Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

# Download the latest model
checkpoint_path = hf_hub_download(
    repo_id="shiptoday101/beastybar-ppo",
    filename="v4/final.pt",
)

# Load the checkpoint (weights_only=False: it stores config dicts, not just tensors)
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)

# Access the network weights and training config
state_dict = checkpoint["model_state_dict"]
config = checkpoint["config"]
print(f"Iteration: {checkpoint['iteration']}")
print(f"Network config: {config['network_config']}")
```
## Full Integration (with game repo)

```python
# Clone the game repo first: git clone https://github.com/diegooprime/beastybar
from _02_agents.neural.network import BeastyBarNetwork
from _02_agents.neural.utils import NetworkConfig
from _03_training.checkpoint_manager import load_for_inference

# Load for inference (smaller footprint than the full training checkpoint)
state_dict, config = load_for_inference("path/to/v4_final.pt")
network = BeastyBarNetwork(NetworkConfig.from_dict(config))
network.load_state_dict(state_dict)
network.eval()
```
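With the network loaded, greedy play (as used in the evaluation above) reduces to an argmax over legal-action logits. A minimal sketch, assuming the network returns a `(logits, value)` pair and that the game engine supplies a 988-dim observation tensor and a boolean legal-action mask (both assumptions, not the repo's actual API):

```python
import torch

@torch.no_grad()
def select_action(network, observation: torch.Tensor, legal_mask: torch.Tensor) -> int:
    """Greedy action selection: mask illegal moves, then argmax the policy logits."""
    logits, _value = network(observation.unsqueeze(0))  # add batch dimension
    logits = logits.squeeze(0).masked_fill(~legal_mask, float("-inf"))
    return int(logits.argmax().item())
```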
## Architecture

- Type: Transformer policy-value network
- Parameters: ~1.3M
- Input: 988-dimensional observation vector
- Output: 124-dim action logits plus a scalar value in [-1, 1]
### Network Details
| Component | Specification |
|---|---|
| Hidden dimension | 256 |
| Attention heads | 8 |
| Transformer layers | 4 |
| Species embedding | 64-dim |
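Combining the two lists above, the model has roughly the following shape. This is a minimal sketch, not the actual `BeastyBarNetwork`; the layer composition, the tokenization of the observation (treated here as a single token), and the use of the 64-dim species embedding are all assumptions:

```python
import torch
import torch.nn as nn

class PolicyValueSketch(nn.Module):
    """Transformer policy-value network with the dimensions listed above."""

    def __init__(self, obs_dim=988, hidden=256, heads=8, layers=4, actions=124):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.policy_head = nn.Linear(hidden, actions)  # 124 action logits
        self.value_head = nn.Linear(hidden, 1)         # scalar value

    def forward(self, obs: torch.Tensor):
        x = self.embed(obs).unsqueeze(1)  # whole observation as one token (simplification)
        x = self.encoder(x).squeeze(1)
        return self.policy_head(x), torch.tanh(self.value_head(x))  # value in [-1, 1]
```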
## Training Details

- Algorithm: PPO with GAE (advantage computation sketched below)
- Method: Self-play with opponent diversity
- Hardware: RunPod A100/H200 GPU
- Games: ~5M games across 600 training iterations
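GAE accumulates discounted TD errors into an advantage estimate. A minimal sketch using the λ = 0.95 from the hyperparameter table below; the discount γ is not listed, so 0.99 here is an assumption:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one episode.

    `values` holds V(s_0)..V(s_T), i.e. one trailing bootstrap value
    (0.0 for terminal states) beyond the last reward.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```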
### Opponent Pool
| Type | Weight |
|---|---|
| Current policy (self-play) | 60% |
| Historical checkpoints | 20% |
| Random agent | 10% |
| Heuristic variants | 10% |
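Per-game opponent sampling with these weights could look like the hypothetical sketch below; the opponent names and the sampling code are illustrative, not the repo's actual implementation:

```python
import random

OPPONENT_WEIGHTS = {
    "current_policy": 0.60,  # self-play against the latest weights
    "historical": 0.20,      # past checkpoints
    "random": 0.10,
    "heuristic": 0.10,
}

def sample_opponent() -> str:
    kinds, weights = zip(*OPPONENT_WEIGHTS.items())
    return random.choices(kinds, weights=weights, k=1)[0]
```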
### PPO Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 0.0001 (cosine decay) |
| Clip epsilon | 0.2 |
| Value coefficient | 0.5 |
| Entropy coefficient | 0.04 -> 0.01 |
| GAE lambda | 0.95 |
| PPO epochs | 4 |
| Minibatch size | 2048 |
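These values plug into the standard PPO clipped surrogate objective. A minimal sketch of the per-minibatch loss; the entropy coefficient is shown at its initial 0.04 and is annealed to 0.01 over training:

```python
import torch
import torch.nn.functional as F

def ppo_loss(new_logp, old_logp, advantages, values, returns, entropy,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.04):
    """Clipped policy loss + value loss - entropy bonus."""
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
```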
## Available Checkpoints

| Model | Description |
|---|---|
| `v4/final.pt` | Latest - 600 iterations, opponent pool diversity |
| `v4/iter_*.pt` | Intermediate v4 checkpoints |
| `v3/final.pt` | Previous training run |
| `v2/iter_*.pt` | V2 checkpoints |
| `v1/iter_074.pt` | Early experiment |
## Related
- Tablebase: shiptoday101/beastybar-tablebase - Endgame lookup table for perfect play in 4-card positions
- Game Rules: Beasty Bar PDF
## License
MIT