UTDG MaskablePPO Agent

License: MIT

A trained reinforcement learning agent for the Untitled Tower Defense Game using MaskablePPO.

Model Details

Description

This model is a MaskablePPO (Proximal Policy Optimization with invalid action masking) agent trained on the UTDG (Untitled Tower Defense Game) environment. The agent learns to strategically place and upgrade towers to defend against waves of enemies.

Model Architecture

  • Algorithm: MaskablePPO from sb3-contrib
  • Policy Network: MlpPolicy (Multi-layer Perceptron)
  • Framework: Stable-Baselines3
  • Environment: Custom UTDG Gymnasium environment with action masking

Training Hyperparameters

Parameter | Value
--- | ---
Total Timesteps | 0
Learning Rate | 0.0003
N Steps | 2048
Batch Size | 64
N Epochs | 10
Gamma (γ) | 0.99
GAE Lambda (λ) | 0.95
Clip Range | 0.2
Entropy Coefficient | 0.001
Value Function Coefficient | 0.5
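For reference, the parameters above correspond to the standard Stable-Baselines3 constructor arguments as sketched below (the mapping of names is an inference from the table, not taken from the training script):

```python
# Training hyperparameters from the table above, keyed by the
# Stable-Baselines3 constructor argument names they correspond to.
HYPERPARAMS = {
    "learning_rate": 3e-4,   # optimizer learning rate
    "n_steps": 2048,         # rollout length per environment
    "batch_size": 64,        # minibatch size per gradient step
    "n_epochs": 10,          # optimization epochs per rollout
    "gamma": 0.99,           # discount factor γ
    "gae_lambda": 0.95,      # GAE λ
    "clip_range": 0.2,       # PPO clipping parameter
    "ent_coef": 0.001,       # entropy bonus coefficient
    "vf_coef": 0.5,          # value-function loss coefficient
}

# These would be passed as, e.g.:
# model = MaskablePPO("MlpPolicy", env, **HYPERPARAMS)
```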

Usage

Quick Start

from huggingface_hub import hf_hub_download
from sb3_contrib import MaskablePPO

# Download the model from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="chrisjcc/utdg-maskableppo-policy",
    filename="model_policy_v0.3.5.zip"
)

# Load the trained model
model = MaskablePPO.load(model_path)

Inference with Action Masking

import gymnasium as gym
from huggingface_hub import hf_hub_download
from sb3_contrib import MaskablePPO

# Assuming you have the UTDG environment installed
# from utdg_env import UTDGEnv

# Download and load the model
model_path = hf_hub_download(
    repo_id="chrisjcc/utdg-maskableppo-policy",
    filename="model_policy_v0.3.5.zip"
)
model = MaskablePPO.load(model_path)

# Create environment
env = gym.make("UTDGEnv-v0")
obs, info = env.reset()

# Run inference loop
done = False
total_reward = 0

while not done:
    # Get action mask from environment info
    action_masks = info.get("action_mask", None)

    # Predict action with masking
    action, _states = model.predict(
        obs,
        action_masks=action_masks,
        deterministic=True  # Set False for stochastic behavior
    )

    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    total_reward += reward

print(f"Episode reward: {total_reward}")
env.close()

Load Specific Revision

from huggingface_hub import hf_hub_download
from sb3_contrib import MaskablePPO

# Download the checkpoint from a specific branch/revision, then load it
# (MaskablePPO.load expects a local path, so the revision is selected
# at download time)
model_path = hf_hub_download(
    repo_id="chrisjcc/utdg-maskableppo-policy",
    filename="model_policy_v0.3.5.zip",
    revision="production"  # or "main", a specific commit hash, etc.
)
model = MaskablePPO.load(model_path)

Environment

UTDG (Untitled Tower Defense Game)

The agent is trained on a custom tower defense environment with the following characteristics:

Observation Space

  • Grid-based game state representation
  • Tower positions and types
  • Enemy positions and health
  • Player resources (gold, lives)
  • Wave information

Action Space

  • Discrete action space with invalid action masking
  • Actions include: place tower, upgrade tower, sell tower, skip turn
  • Action masking prevents invalid actions (e.g., placing towers on occupied tiles)
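A mask over this kind of action space is just a boolean vector with one entry per discrete action. The sketch below is hypothetical: the action ordering and the validity rules are illustrative, not the actual UTDG encoding.

```python
import numpy as np

# Illustrative action ids: 0 = place tower, 1 = upgrade tower,
# 2 = sell tower, 3 = skip turn (not the actual UTDG encoding).
NUM_ACTIONS = 4

def build_action_mask(tile_occupied: bool, has_tower_selected: bool) -> np.ndarray:
    """Return True for currently legal actions, False for invalid ones."""
    mask = np.ones(NUM_ACTIONS, dtype=bool)
    if tile_occupied:
        mask[0] = False  # cannot place a tower on an occupied tile
    if not has_tower_selected:
        mask[1] = False  # cannot upgrade without a selected tower
        mask[2] = False  # cannot sell without a selected tower
    return mask          # skip turn (action 3) is always legal
```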

Reward Structure

  • Positive rewards for defeating enemies
  • Negative rewards for losing lives
  • Bonus rewards for completing waves
  • Efficiency bonuses for resource management
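The reward terms above might combine along the following lines. The weights and helper names here are illustrative placeholders, not the actual UTDG reward function.

```python
# Hypothetical sketch of the per-step reward combining the terms listed
# above; all coefficients are illustrative, not the trained values.
def step_reward(enemies_killed: int, lives_lost: int,
                wave_completed: bool, gold_unspent: int) -> float:
    reward = 1.0 * enemies_killed      # positive reward for defeated enemies
    reward -= 5.0 * lives_lost         # penalty for losing lives
    if wave_completed:
        reward += 10.0                 # bonus for completing the wave
        reward += 0.01 * gold_unspent  # small resource-efficiency bonus
    return reward
```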

Training

Methodology

The model was trained using the MaskablePPO algorithm, which extends standard PPO with support for invalid action masking. This is crucial for the tower defense domain where many actions are contextually invalid (e.g., placing a tower on an occupied cell).
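The core idea behind invalid action masking can be shown in a few lines: the logits of invalid actions are pushed to negative infinity before the softmax, so the policy assigns them zero probability. This is a minimal NumPy sketch of that mechanism, not MaskablePPO's internal implementation.

```python
import numpy as np

def masked_softmax(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over logits where masked-out (False) actions get zero probability."""
    masked_logits = np.where(mask, logits, -np.inf)
    z = masked_logits - masked_logits.max()  # shift for numerical stability
    probs = np.exp(z)                        # exp(-inf) -> 0 for invalid actions
    return probs / probs.sum()
```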

Key Features

  1. Action Masking: Prevents the agent from selecting invalid actions, improving sample efficiency
  2. Curriculum Learning: Progressive difficulty increase through wave complexity
  3. Reward Shaping: Carefully designed reward function to encourage strategic play

Repository Contents

File | Description
--- | ---
model_policy_v0.3.5.zip | Trained MaskablePPO model checkpoint (SB3 format)
README.md | This model card with full documentation
config.yaml | Hydra configuration snapshot (if included)

Limitations and Intended Use

Intended Use

  • Research and experimentation with RL agents in game environments
  • Baseline comparisons for tower defense AI development
  • Educational purposes for understanding action-masked RL

Limitations

  • Trained on a specific map configuration; may not generalize to significantly different layouts
  • Performance may vary with different enemy compositions not seen during training
  • Requires the UTDG environment to be installed for inference

Ethical Considerations

This model is designed for entertainment and research purposes in a game simulation context.

Citation

If you use this model in your research, please cite:

@misc{utdg-maskableppo,
  author = {Chris Cadonic},
  title = {UTDG MaskablePPO Agent},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/chrisjcc/utdg-maskableppo-policy}}
}


Generated on 2025-12-27T18:33:32.663706 UTC
