---
license: mit
tags:
- reinforcement-learning
- ppo
- openfront
- game-ai
---
# OpenFront RL Agent
PPO-trained agent for [OpenFront.io](https://openfront.io), a multiplayer territory control game.
## Model Version: v13b
Current best model, trained with the normalized elimination reward and winner bonus described below.
## Training Details
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Architecture:** Actor-Critic with a shared backbone (512→512→256); see the sketch after this list
- **Observation dim:** 80 (16 player stats + 16 neighbors × 4 features)
- **Action space:** MultiDiscrete [17 action types, 16 targets, 5 troop fractions]
- **Maps:** plains, big_plains, world, giantworldmap, ocean_and_land, half_land_half_ocean (random per episode)
- **Parallel envs:** 16
- **Learning rate:** 1.5e-4 (constant)
- **Rollout steps:** 1024
- **Batch size:** 16,384
- **Value function coefficient:** 0.5
- **Updates trained:** 1550 (ongoing)
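The `ActorCritic` class itself lives in `train.py` in the training repo. As a reference for the shapes above, here is a minimal sketch of what a network with this layout could look like; the layer names, Tanh activations, and head structure are assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Illustrative stand-in for train.ActorCritic; details are assumptions."""

    def __init__(self, obs_dim=80, max_neighbors=16, hidden_sizes=(512, 512, 256)):
        super().__init__()
        # Shared backbone: 80 -> 512 -> 512 -> 256.
        layers, in_dim = [], obs_dim
        for h in hidden_sizes:
            layers += [nn.Linear(in_dim, h), nn.Tanh()]
            in_dim = h
        self.backbone = nn.Sequential(*layers)
        # One categorical head per MultiDiscrete component:
        # 17 action types, 16 targets, 5 troop fractions.
        self.action_type_head = nn.Linear(in_dim, 17)
        self.target_head = nn.Linear(in_dim, max_neighbors)
        self.troop_frac_head = nn.Linear(in_dim, 5)
        # Scalar state-value head for the critic.
        self.value_head = nn.Linear(in_dim, 1)

    def forward(self, obs):
        z = self.backbone(obs)
        logits = (self.action_type_head(z),
                  self.target_head(z),
                  self.troop_frac_head(z))
        return logits, self.value_head(z).squeeze(-1)
```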
## Reward Design (v13)
Normalized elimination reward, where the episode total sums to +1.0 on a full win regardless of opponent count (sketched after this list):
- **Per-kill:** `+1/N` per opponent eliminated (N = starting opponents)
- **Winner bonus:** opponents still alive when `game.getWinner()` fires are credited as `aliveCount/N`
- **Death penalty:** -1.0
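Put together, the per-step bookkeeping could look like this minimal sketch (the function and argument names are illustrative, not the training repo's API):

```python
def step_reward(n_start_opponents: int,
                kills_this_step: int,
                agent_died: bool,
                agent_won: bool,
                opponents_alive: int) -> float:
    """v13 reward: episode total is +1.0 on a full win, -1.0 on death."""
    reward = kills_this_step / n_start_opponents      # +1/N per elimination
    if agent_won:
        # Opponents still alive when game.getWinner() fires are credited
        # as aliveCount/N, so kills + winner bonus always sum to +1.0.
        reward += opponents_alive / n_start_opponents
    if agent_died:
        reward -= 1.0                                 # death penalty
    return reward
```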
## Curriculum
Win-rate-gated 12-stage curriculum advancing through Easy → Medium → Hard difficulty and from 2 to 15 opponents. Stages advance only when the rolling win rate over the last 200 episodes exceeds a per-stage threshold (75% down to 45%).
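A minimal sketch of that gating logic, assuming a fixed rolling window that resets on each stage advance (the class and its behavior are illustrative, not the repo's implementation):

```python
from collections import deque

class CurriculumGate:
    """Advance a stage once the rolling win rate clears that stage's bar."""

    def __init__(self, thresholds, window=200):
        self.thresholds = thresholds   # e.g. 12 values stepping 0.75 -> 0.45
        self.results = deque(maxlen=window)
        self.stage = 0

    def record(self, won: bool) -> int:
        self.results.append(won)
        window_full = len(self.results) == self.results.maxlen
        if (window_full
                and self.stage < len(self.thresholds) - 1
                and sum(self.results) / len(self.results) > self.thresholds[self.stage]):
            self.stage += 1
            self.results.clear()  # assumption: fresh window per stage
        return self.stage
```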
## Eval Results
- **Easy/2 opponents:** 100% win rate (20/20 games)
## Usage
```python
from train import ActorCritic
import torch

# Rebuild the network with the training-time dimensions:
# 80-dim observations, up to 16 neighbor targets, 512->512->256 backbone.
model = ActorCritic(obs_dim=80, max_neighbors=16, hidden_sizes=[512, 512, 256])

# The checkpoint is a dict holding more than raw tensors, hence weights_only=False.
checkpoint = torch.load("best_model.pt", map_location="cpu", weights_only=False)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
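After loading, a greedy inference step might look like the following. This assumes `ActorCritic.forward` returns per-component logits plus a value estimate; check `train.py` for the actual signature.

```python
# Placeholder 80-dim observation (batch of 1); a real observation comes
# from the OpenFront environment wrapper.
obs = torch.zeros(1, 80)
with torch.no_grad():
    (type_logits, target_logits, frac_logits), value = model(obs)
# Greedy MultiDiscrete action: [action type, target index, troop fraction].
action = [type_logits.argmax(-1).item(),
          target_logits.argmax(-1).item(),
          frac_logits.argmax(-1).item()]
```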
## Repository
Trained with the code at [josh-freeman/openfront-rl](https://github.com/josh-freeman/openfront-rl).