mischievers/openfront-rl-agent

@@ -15,22 +15,22 @@ PPO-trained agent for [OpenFront.io](https://openfront.io), a multiplayer territ
 - **Algorithm:** PPO (Proximal Policy Optimization)
 - **Architecture:** Actor-Critic with shared backbone (256→256→128)
-- **Observation dim:** N/A
-- **Max neighbors:** N/A
-- **Maps:** N/A (random per episode)
-- **Opponents:** N/A N/A bots
-- **Parallel envs:** N/A
-- **Learning rate:** N/A
-- **Rollout steps:** N/A
-- **Updates trained:** N/A
-- **Global steps:** N/A
-- **Best mean reward:** N/A
 ## Final Training Metrics
-- **Mean reward:** 29.898164215087892
-- **Mean episode length:** 3657.29
-- **Loss:** 0.8671517372131348
 ## Usage
@@ -38,7 +38,7 @@ PPO-trained agent for [OpenFront.io](https://openfront.io), a multiplayer territ
 from train import ActorCritic
 import torch
-model = ActorCritic(obs_dim=N/A, max_neighbors=N/A)
 model.load_state_dict(torch.load("best_model.pt", weights_only=True))
 model.eval()
 ```

 - **Algorithm:** PPO (Proximal Policy Optimization)
 - **Architecture:** Actor-Critic with shared backbone (256→256→128)
+- **Observation dim:** 80
+- **Max neighbors:** 16
+- **Maps:** plains, big_plains, world, giantworldmap, ocean_and_land, half_land_half_ocean (random per episode)
+- **Opponents:** 2 Easy bots
+- **Parallel envs:** 8
+- **Learning rate:** 0.0002
+- **Rollout steps:** 512
+- **Updates trained:** 1400
+- **Global steps:** 5734400
+- **Best mean reward:** 468.54246531009676
 ## Final Training Metrics
+- **Mean reward:** 178.37687824249267
+- **Mean episode length:** 6926.31
+- **Loss:** 0.08463311195373535
 ## Usage
 from train import ActorCritic
 import torch
+model = ActorCritic(obs_dim=80, max_neighbors=16)
 model.load_state_dict(torch.load("best_model.pt", weights_only=True))
 model.eval()
 ```
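
The figures filled in by this change are consistent with each other. Assuming each PPO update consumes one rollout of "Rollout steps" transitions from every parallel environment (the usual layout, though the training script is not shown here), the reported global step count follows from the other three numbers. A minimal sketch of that check, with illustrative helper names that are not taken from `train.py`:

```python
# Values copied from the updated model card.
PARALLEL_ENVS = 8
ROLLOUT_STEPS = 512
UPDATES_TRAINED = 1400
REPORTED_GLOBAL_STEPS = 5_734_400


def steps_per_update(num_envs: int, rollout_steps: int) -> int:
    """Transitions collected per update (assumes one full rollout per env)."""
    return num_envs * rollout_steps


def total_env_steps(num_envs: int, rollout_steps: int, updates: int) -> int:
    """Total environment steps after the given number of updates."""
    return steps_per_update(num_envs, rollout_steps) * updates


batch_size = steps_per_update(PARALLEL_ENVS, ROLLOUT_STEPS)
computed = total_env_steps(PARALLEL_ENVS, ROLLOUT_STEPS, UPDATES_TRAINED)

print(f"steps per update: {batch_size}")
print(f"computed global steps: {computed}")
print(f"matches reported value: {computed == REPORTED_GLOBAL_STEPS}")
```

Under that assumption each update sees 8 × 512 = 4096 transitions, and 1400 updates give 5,734,400 steps, which matches the "Global steps" entry.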