PPO Expert Agent for ALE/SpaceInvaders-v5 (10M Steps)
This is a Proximal Policy Optimization (PPO) agent trained on Atari Space Invaders (v5) using a vectorized environment setup.
Training Details
- Algorithm: PPO
- Environment: ALE/SpaceInvaders-v5 (with sticky actions)
- Total Timesteps: 10,000,000
- Frame Stacking: 4 frames
- Terminal on Life Loss: True (during training)
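The two preprocessing settings above, 4-frame stacking and ending the episode on a life loss during training, can be illustrated with a small, framework-free sketch. It assumes each frame is a 2-D grayscale NumPy array and that the current lives count is available at every step (e.g. from `info["lives"]`). `FrameStacker` and `training_done` are hypothetical names that are not part of any library. The actual training script may have relied on library wrappers instead.

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Hypothetical helper: keeps the most recent frames as one (num_frames, H, W) array."""

    def __init__(self, num_frames: int = 4):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # At episode start, fill the stack with copies of the first frame.
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        # Once the deque is full, appending drops the oldest frame automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self) -> np.ndarray:
        # Channel-first layout: oldest frame at index 0, newest at index -1.
        return np.stack(list(self.frames), axis=0)


def training_done(terminated: bool, truncated: bool, prev_lives: int, lives: int) -> bool:
    """Hypothetical helper: during training, losing a life also ends the episode."""
    return terminated or truncated or lives < prev_lives


if __name__ == "__main__":
    stacker = FrameStacker(num_frames=4)
    obs = stacker.reset(np.zeros((84, 84), dtype=np.uint8))  # 84x84 is an assumed frame size
    obs = stacker.step(np.ones((84, 84), dtype=np.uint8))
    print(obs.shape)  # (4, 84, 84)
    print(training_done(False, False, prev_lives=3, lives=2))  # True
```

At evaluation time, the life-loss check would normally be dropped, so that an episode runs until the game actually ends.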
Performance
- Peak Score Observed: 615.0
- Average Reward: roughly 300 to 450 at 10M steps
- Behavior: The agent learned to clear multiple waves, use the shields for cover, and target the Mystery Ship.
Usage
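```python
import torch

# Assumes the ActorCritic class from your training script is defined or imported here;
# its architecture must match the one that produced the checkpoint.
# config = Config()  # only needed if your own setup reads settings from a Config class
model = ActorCritic(input_channels=4, action_dim=6)  # 4 stacked frames; Space Invaders has 6 actions
model.load_state_dict(torch.load('ppo_final_10M.pt', map_location='cpu'))
model.eval()  # switch to inference mode before selecting actions
```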
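Once the model is loaded, it can be queried for actions. So that the sketch below runs on its own, it defines a stand-in `ActorCritic`: a Nature-DQN-style CNN on 84x84 inputs that returns `(logits, value)`. That architecture, the input size, and the `select_action` helper are all assumptions and should not be read as the class used for training. The stand-in's weights are random rather than loaded from `ppo_final_10M.pt`.

```python
import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    """Hypothetical stand-in with policy and value heads. The real training class may differ,
    and only a matching class can load the released checkpoint."""

    def __init__(self, input_channels: int = 4, action_dim: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(input_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 feature map assumes 84x84 inputs
        )
        self.policy_head = nn.Linear(512, action_dim)
        self.value_head = nn.Linear(512, 1)

    def forward(self, x: torch.Tensor):
        # Scale uint8 pixels to [0, 1] before the conv layers.
        h = self.features(x.float() / 255.0)
        return self.policy_head(h), self.value_head(h)


@torch.no_grad()
def select_action(model: nn.Module, stacked_obs: torch.Tensor, greedy: bool = True) -> int:
    """Hypothetical helper: pick an action for one (4, 84, 84) stacked observation."""
    logits, _ = model(stacked_obs.unsqueeze(0))  # add a batch dimension
    if greedy:
        return int(logits.argmax(dim=-1).item())
    # Sample from the policy distribution instead of taking the most likely action.
    return int(torch.distributions.Categorical(logits=logits).sample().item())


if __name__ == "__main__":
    model = ActorCritic(input_channels=4, action_dim=6)
    model.eval()
    obs = torch.zeros(4, 84, 84, dtype=torch.uint8)
    print(select_action(model, obs))  # an integer in [0, 5]
```

Greedy (argmax) selection is a common choice for evaluation, while sampling reproduces the stochastic policy that PPO optimizes. In either case, the v5 environment's sticky actions can still repeat the previous action, regardless of what the agent selects.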