mischievers/openfront-rl-agent

@@ -14,23 +14,23 @@ PPO-trained agent for [OpenFront.io](https://openfront.io), a multiplayer territ
 ## Training Details
 - **Algorithm:** PPO (Proximal Policy Optimization)
-- **Architecture:** Actor-Critic with shared backbone (256→256→128)
 - **Observation dim:** 80
 - **Max neighbors:** 16
 - **Maps:** plains, big_plains, world, giantworldmap, ocean_and_land, half_land_half_ocean (random per episode)
 - **Opponents:** 2 Easy bots
 - **Parallel envs:** 8
-- **Learning rate:** 0.0002
 - **Rollout steps:** 512
-- **Updates trained:** 1650
-- **Global steps:** 6758400
-- **Best mean reward:** 468.54246531009676
 ## Final Training Metrics
-- **Mean reward:** 231.36178754091262
-- **Mean episode length:** 6722.13
-- **Loss:** 1.217943549156189
 ## Usage
@@ -38,7 +38,7 @@ PPO-trained agent for [OpenFront.io](https://openfront.io), a multiplayer territ
 from train import ActorCritic
 import torch
-model = ActorCritic(obs_dim=80, max_neighbors=16)
 model.load_state_dict(torch.load("best_model.pt", weights_only=True))
 model.eval()
 ```

 ## Training Details
 - **Algorithm:** PPO (Proximal Policy Optimization)
+- **Architecture:** Actor-Critic with shared backbone (512→512→256)
 - **Observation dim:** 80
 - **Max neighbors:** 16
 - **Maps:** plains, big_plains, world, giantworldmap, ocean_and_land, half_land_half_ocean (random per episode)
 - **Opponents:** 2 Easy bots
 - **Parallel envs:** 8
+- **Learning rate:** 0.00015
 - **Rollout steps:** 512
+- **Updates trained:** 330
+- **Global steps:** 1351680
+- **Best mean reward:** 591.3189961528778
 ## Final Training Metrics
+- **Mean reward:** 591.3189961528778
+- **Mean episode length:** 3142.3
+- **Loss:** 1779.034423828125
 ## Usage
 from train import ActorCritic
 import torch
+model = ActorCritic(obs_dim=80, max_neighbors=16, hidden_sizes=[512, 512, 256])
 model.load_state_dict(torch.load("best_model.pt", weights_only=True))
 model.eval()
 ```
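
The step counts on both sides of this diff are consistent with the usual PPO bookkeeping, where each update consumes one rollout from every parallel environment. The sketch below checks that relation against the listed values. It assumes the training script counts global steps this way, and the helper name `global_steps` is hypothetical, not something taken from `train.py`.

```python
# Assumption: global steps = updates * parallel envs * rollout steps.
# `global_steps` is an illustrative helper, not part of the repository.

NUM_ENVS = 8          # "Parallel envs" in the model card
ROLLOUT_STEPS = 512   # "Rollout steps" in the model card


def global_steps(updates: int, num_envs: int = NUM_ENVS,
                 rollout_steps: int = ROLLOUT_STEPS) -> int:
    """Return the number of environment steps collected after `updates` PPO updates."""
    return updates * num_envs * rollout_steps


# Values from the removed (-) and added (+) lines of the diff.
old_run = global_steps(1650)
new_run = global_steps(330)

print(old_run, new_run)
```

Under that assumption, the two results match the "Global steps" values on the removed and added lines respectively, so the new checkpoint was trained on about one fifth as many environment steps as the old one.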