# UTDG MaskablePPO Agent
A trained reinforcement learning agent for the Untitled Tower Defense Game using MaskablePPO.
## Model Details

### Description
This model is a MaskablePPO (Proximal Policy Optimization with invalid action masking) agent trained on the UTDG (Untitled Tower Defense Game) environment. The agent learns to strategically place and upgrade towers to defend against waves of enemies.
### Model Architecture
- Algorithm: MaskablePPO from sb3-contrib
- Policy Network: MlpPolicy (Multi-layer Perceptron)
- Framework: Stable-Baselines3
- Environment: Custom UTDG Gymnasium environment with action masking
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Total Timesteps | 0 |
| Learning Rate | 0.0003 |
| N Steps | 2048 |
| Batch Size | 64 |
| N Epochs | 10 |
| Gamma (γ) | 0.99 |
| GAE Lambda (λ) | 0.95 |
| Clip Range | 0.2 |
| Entropy Coefficient | 0.001 |
| Value Function Coefficient | 0.5 |
## Usage

### Quick Start
```python
from huggingface_hub import hf_hub_download
from sb3_contrib import MaskablePPO

# Download the model checkpoint from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="chrisjcc/utdg-maskableppo-policy",
    filename="model_policy_v0.3.5.zip",
)

# Load the trained model
model = MaskablePPO.load(model_path)
```
### Inference with Action Masking
```python
import gymnasium as gym
from sb3_contrib import MaskablePPO

# Assuming you have the UTDG environment installed and registered
# from utdg_env import UTDGEnv

# Load the model downloaded in the Quick Start
model = MaskablePPO.load(model_path)

# Create environment
env = gym.make("UTDGEnv-v0")
obs, info = env.reset()

# Run inference loop
done = False
total_reward = 0
while not done:
    # Get the action mask the environment exposes via its info dict
    action_masks = info.get("action_mask", None)

    # Predict action with masking
    action, _states = model.predict(
        obs,
        action_masks=action_masks,
        deterministic=True,  # set False for stochastic behavior
    )

    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    total_reward += reward

print(f"Episode reward: {total_reward}")
env.close()
```
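If the environment is wrapped with sb3-contrib's `ActionMasker` (as sketched in the Training section below), the mask can instead be queried with the library's `get_action_masks` helper rather than read from `info`:

```python
from sb3_contrib.common.maskable.utils import get_action_masks

# Alternative: pull the current mask from an ActionMasker-wrapped environment
action_masks = get_action_masks(env)
action, _states = model.predict(obs, action_masks=action_masks, deterministic=True)
```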
### Load Specific Revision

```python
from huggingface_hub import hf_hub_download
from sb3_contrib import MaskablePPO

# Download the checkpoint from a specific branch/revision, then load it.
# (MaskablePPO.load expects a local path, so the Hub download happens first.)
model_path = hf_hub_download(
    repo_id="chrisjcc/utdg-maskableppo-policy",
    filename="model_policy_v0.3.5.zip",
    revision="production",  # or "main", a specific commit hash, etc.
)
model = MaskablePPO.load(model_path)
```
## Environment

### UTDG (Untitled Tower Defense Game)
The agent is trained on a custom tower defense environment with the following characteristics:
#### Observation Space
- Grid-based game state representation
- Tower positions and types
- Enemy positions and health
- Player resources (gold, lives)
- Wave information
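Since the policy is an `MlpPolicy`, these features are presumably flattened into a single vector. The following is a rough sketch only; the sizes and layout are assumptions, not the actual UTDG definition:

```python
import numpy as np
from gymnasium import spaces

# Hypothetical observation layout; all dimensions below are assumed
GRID_CELLS = 20 * 20   # one entry per tile (tower type / empty)
MAX_ENEMIES = 50       # fixed-size slots for enemy (x, y, health)
OBS_DIM = GRID_CELLS + MAX_ENEMIES * 3 + 2 + 1  # + gold, lives, wave index

observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(OBS_DIM,), dtype=np.float32)
```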
#### Action Space
- Discrete action space with invalid action masking
- Actions include: place tower, upgrade tower, sell tower, skip turn
- Action masking prevents invalid actions (e.g., placing towers on occupied tiles)
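As a hedged sketch of how such a mask might be computed, the helper names below (`decode_action`, `grid`, `gold`, `tower_cost`) are hypothetical, not the real UTDG API:

```python
import numpy as np

def action_mask(env) -> np.ndarray:
    """Return one validity flag per discrete action (hypothetical sketch)."""
    mask = np.zeros(env.action_space.n, dtype=bool)
    for action_id in range(env.action_space.n):
        kind, cell = env.decode_action(action_id)  # assumed decoder: (action type, target cell)
        if kind == "place":
            # Placing requires a free tile and enough gold
            mask[action_id] = env.grid.is_free(cell) and env.gold >= env.tower_cost(cell)
        elif kind in ("upgrade", "sell"):
            # Upgrading/selling requires an existing tower on the tile
            mask[action_id] = env.grid.has_tower(cell)
        else:
            # Skipping the turn is always legal
            mask[action_id] = True
    return mask
```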
#### Reward Structure
- Positive rewards for defeating enemies
- Negative rewards for losing lives
- Bonus rewards for completing waves
- Efficiency bonuses for resource management
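Purely as an illustration of the structure above (the actual coefficients are not published here), a shaped reward of this kind might look like:

```python
# Hypothetical reward shaping; every constant below is assumed for illustration
def compute_reward(kills: int, lives_lost: int, wave_cleared: bool, gold_unspent: int) -> float:
    reward = 1.0 * kills           # positive reward per defeated enemy
    reward -= 5.0 * lives_lost     # penalty per life lost
    if wave_cleared:
        reward += 10.0             # bonus for completing the wave
    reward += 0.01 * gold_unspent  # small efficiency bonus for resource management
    return reward
```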
## Training

### Methodology
The model was trained using the MaskablePPO algorithm, which extends standard PPO with support for invalid action masking. This is crucial for the tower defense domain where many actions are contextually invalid (e.g., placing a tower on an occupied cell).
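A minimal sketch of such a training setup, using the hyperparameters from the table above; `UTDGEnv` and `mask_fn` are placeholders for the project's actual environment class and masking function:

```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

# mask_fn(env) must return a boolean array of currently valid actions;
# UTDGEnv and mask_fn stand in for the project's own definitions.
env = ActionMasker(UTDGEnv(), mask_fn)

model = MaskablePPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.001,
    vf_coef=0.5,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)  # illustrative budget, not the reported value
```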
### Key Features
- Action Masking: Prevents the agent from selecting invalid actions, improving sample efficiency
- Curriculum Learning: Progressive difficulty increase through wave complexity
- Reward Shaping: Carefully designed reward function to encourage strategic play
### Training Infrastructure
- Trained using Stable-Baselines3 and sb3-contrib
- Configuration managed via Hydra
- Experiment tracking and model versioning via Hugging Face Hub
## Repository Contents

| File | Description |
|---|---|
| `model_policy_v0.3.5.zip` | Trained MaskablePPO model checkpoint (SB3 format) |
| `README.md` | This model card with full documentation |
| `config.yaml` | Hydra configuration snapshot (if included) |
## Limitations and Intended Use

### Intended Use
- Research and experimentation with RL agents in game environments
- Baseline comparisons for tower defense AI development
- Educational purposes for understanding action-masked RL
### Limitations
- Trained on a specific map configuration; may not generalize to significantly different layouts
- Performance may vary with different enemy compositions not seen during training
- Requires the UTDG environment to be installed for inference
## Ethical Considerations
This model is designed for entertainment and research purposes in a game simulation context.
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{utdg-maskableppo,
  author       = {Chris Cadonic},
  title        = {UTDG MaskablePPO Agent},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/chrisjcc/utdg-maskableppo-policy}}
}
```
## Acknowledgments
- Stable-Baselines3 team for the RL framework
- sb3-contrib for MaskablePPO implementation
- Hugging Face for model hosting infrastructure
Generated on 2025-12-27T18:33:32.663706 UTC