metadata
title: OpenRA-Bench
emoji: 🎮
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: true
license: gpl-3.0
OpenRA-Bench
Standardized benchmark and leaderboard for AI agents playing Red Alert through OpenRA-RL.
Features
- Leaderboard: Ranked agent comparison with composite scoring
- Filtering: By agent type (Scripted/LLM/RL) and opponent difficulty
- Evaluation harness: Automated N-game benchmarking with metrics collection
- OpenEnv rubrics: Composable scoring (win/loss, military efficiency, economy)
- Replay verification: Replay files linked to leaderboard entries
Quick Start
View the leaderboard
pip install -r requirements.txt
python app.py
# Opens at http://localhost:7860
Run an evaluation
# Against the HuggingFace-hosted environment (no Docker needed)
python evaluate.py \
  --agent scripted \
  --agent-name "MyBot-v1" \
  --opponent Normal \
  --games 10 \
  --server https://openra-rl-openra-rl.hf.space
# Or against a local Docker server
python evaluate.py \
  --agent scripted \
  --agent-name "MyBot-v1" \
  --opponent Normal \
  --games 10 \
  --server http://localhost:8000
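If you want to launch evaluations from a script rather than the shell, the command above can be assembled as an argument list. The following is a minimal sketch that uses only the flags shown in this section; the helper name `build_eval_command` is hypothetical and is not part of this repository.

```python
import shlex
import sys


def build_eval_command(agent_name, server, agent="scripted", opponent="Normal", games=10):
    """Return the argv list for one evaluate.py run (hypothetical helper)."""
    return [
        sys.executable, "evaluate.py",
        "--agent", agent,
        "--agent-name", agent_name,
        "--opponent", opponent,
        "--games", str(games),
        "--server", server,
    ]


cmd = build_eval_command("MyBot-v1", "http://localhost:8000")
# Print the command (without the interpreter path) for inspection.
print(shlex.join(cmd[1:]))

# To actually start the run from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with agent names that contain spaces.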
Submit results
Via CLI (recommended):
pip install openra-rl
openra-rl bench submit result.json
openra-rl bench submit result.json --replay game.orarep --agent-name "MyBot" --agent-url "https://github.com/user/mybot"
Results from openra-rl play are auto-submitted after each game.
Via PR:
- Fork this repo
- Run evaluation (appends to data/results.csv)
- Open a PR with your results
Agent identity
Customize your leaderboard entry:
| Field | Description |
|---|---|
| agent_name | Display name (e.g. "DeathBot-9000") |
| agent_type | Scripted, LLM, or RL |
| agent_url | GitHub/project URL; renders as a clickable link on the leaderboard |
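Before submitting, it can help to sanity-check these fields locally. The sketch below checks the three fields from the table above. It is an illustration only: the function `validate_identity` is hypothetical, treating `agent_url` as optional is an assumption, and the checks the leaderboard itself performs may differ.

```python
from urllib.parse import urlparse

# Agent types listed in the table above.
ALLOWED_AGENT_TYPES = {"Scripted", "LLM", "RL"}


def validate_identity(entry):
    """Return a list of problems found in an identity dict; empty means OK."""
    problems = []
    name = entry.get("agent_name", "")
    if not isinstance(name, str) or not name.strip():
        problems.append("agent_name must be a non-empty string")
    if entry.get("agent_type") not in ALLOWED_AGENT_TYPES:
        problems.append("agent_type must be one of Scripted, LLM, RL")
    url = entry.get("agent_url")
    if url:  # assumption: the URL is optional
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("agent_url must be an http(s) URL")
    return problems


entry = {
    "agent_name": "DeathBot-9000",
    "agent_type": "RL",
    "agent_url": "https://github.com/user/mybot",
}
print(validate_identity(entry))
```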
Replay downloads
Entries submitted with a .orarep replay file show a download link in the Replay column. Replays are stored on the Space and served at /replays/<filename>.
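As a small illustration of the path scheme described above, a download URL can be built from the Space's base URL and the replay filename. The base URL below is a placeholder, not the real address of this Space, and `replay_url` is a hypothetical helper.

```python
from urllib.parse import quote


def replay_url(space_base_url, filename):
    """Join the Space base URL with /replays/<filename>, percent-encoding the name."""
    return f"{space_base_url.rstrip('/')}/replays/{quote(filename)}"


# Placeholder host; substitute the Space's actual URL.
print(replay_url("https://example-space.hf.space", "game.orarep"))
```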
API endpoints
The Gradio app exposes these API endpoints (Gradio 5+ SSE protocol):
| Endpoint | Description |
|---|---|
| submit | Submit JSON results (no replay) |
| submit_with_replay | Submit JSON + replay file |
| filter_leaderboard | Query/filter leaderboard data |
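The sketch below shows how one of these endpoints might be called with only the standard library, following the two-step flow documented for Gradio 5: a POST to `/gradio_api/call/<endpoint>` returns an event ID, and a GET on that ID streams the result as server-sent events. The base URL is a placeholder, the helper names are hypothetical, and the input list each endpoint expects is not documented here, so check the Space's "Use via API" page for the real parameters.

```python
import json
import urllib.request


def build_post(base_url, api_name, inputs):
    """Build the POST request that starts a Gradio 5 API call."""
    url = f"{base_url.rstrip('/')}/gradio_api/call/{api_name}"
    body = json.dumps({"data": list(inputs)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(base_url, api_name, inputs, timeout=30):
    """Start a call, then read the raw SSE text for its event ID."""
    request = build_post(base_url, api_name, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        event_id = json.load(resp)["event_id"]
    with urllib.request.urlopen(f"{request.full_url}/{event_id}", timeout=timeout) as stream:
        return stream.read().decode("utf-8")


# Inspect the request without sending it (placeholder host, empty inputs).
req = build_post("https://example-space.hf.space", "filter_leaderboard", [])
print(req.get_method(), req.full_url)
```

The `gradio_client` package wraps the same protocol and is usually the more convenient choice; the raw form is shown only to make the request shape visible.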
Scoring
| Component | Weight | Description |
|---|---|---|
| Win Rate | 50% | Games won / total games |
| Military Efficiency | 25% | Kill/death cost ratio (normalized) |
| Economy | 25% | Final asset value (normalized) |
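To make the weighting concrete, here is a minimal sketch of the composite score as a weighted sum. It assumes the military-efficiency and economy components have already been normalized to the range 0 to 1; the normalization method and the scale used on the leaderboard are not specified here, and `composite_score` is a hypothetical name.

```python
# Weights from the scoring table above.
WEIGHTS = {"win_rate": 0.50, "military_efficiency": 0.25, "economy": 0.25}


def composite_score(win_rate, military_efficiency, economy):
    """Weighted sum of the three components, each assumed to lie in [0, 1]."""
    components = {
        "win_rate": win_rate,
        "military_efficiency": military_efficiency,
        "economy": economy,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Example: 7 wins in 10 games, with illustrative normalized component values.
score = composite_score(win_rate=7 / 10, military_efficiency=0.6, economy=0.4)
print(round(score, 3))  # 0.6
```

In this example the win rate contributes 0.35, military efficiency 0.15, and economy 0.10.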