OpenRA-Bench / openra_bench / run_eval.py

Commit History

run_eval: add --bedrock-region flag and route bedrock provider config
a3ba9ba

Xiaochuang Yuan committed on

Bucket D: discover archived packs + load adversarial-siege from _archive
0f4480c

Xiaochuang Yuan committed on

Phase 4 fix: run_eval double-nest in full_playback dir
d608892

yxc20098 committed on

Quality drive: schema fix, 5 new/revised packs, 4 engine tests, scenario audit
6d71d3b

yxc20098 committed on

Paper-experiment prep: human-study harness + pass^k + paper plan
07dfe2e

yxc20098 committed on

Add handoff ablation (recover-from-deficit / capitalize-on-advantage)
cb15568

yxc20098 committed on

Add perception ablation grid (observation channel × fog of war)
4a5b0dd

yxc20098 committed on

feat(providers): add together.ai preset (Qwen3.6-Plus and friends)
90c4b50

yxc20098 committed on

win-speed bonus: record + reward how fast a model wins
35d7d47

yxc20098 committed on

action-multiunit-coordination hard: spatial-grounding via relative-direction objective
51d66ad

yxc20098 committed on

Scenario configs (level+fog per cell) + adversarial-duel duel/interrupts + clearer names
f244b78

yxc20098 committed on

run_eval: --or-provider parses provider/quant correctly
fa727f0

yxc20098 committed on

run_eval: --or-provider (pin OpenRouter provider/quant, no fallback) + --fog-mode CLI
cf788d9

yxc20098 committed on

Structured-fog text mode, premium routing, codex descriptions, minimap colour-by-difficulty
93ee9dd

yxc20098 committed on

Wire bench to vendored training prompt v2 (system/briefing/minimap)
8e88074

yxc20098 committed on

Training-parity minimap (real terrain + legend) + viewer (system/thinking/debrief)
39fba02

yxc20098 committed on

Live-smoke fixes: tool-call wire 400, episode resilience, real PNG minimap
247ff7a

yxc20098 committed on

run_eval: dependency-free .env autoload for the API key
82e69a9

yxc20098 committed on

Deterministic scenario-scoped game knowledge + explicit objective
049448a

yxc20098 committed on

Eval resilience layer for real OpenRouter runs
424da31

yxc20098 committed on

Scenario hygiene: quarantine redundant cat-* + harvest packs
8bb4b88

yxc20098 committed on

Adversarial 1v1 spotlight: ladder family + rating + Elo wiring
f5e23f8

yxc20098 committed on

Unified Battle Viewer in app.py + run/model playback identity
0a488d3

yxc20098 committed on

Wire goal tracker into scoring + leaderboard
eeedfdf

yxc20098 committed on

Step 2: concurrent scenario execution (--concurrency N)
3771d77

yxc20098 committed on

Pipeline step 7: per-episode playback persistence
28c736f

yxc20098 committed on

Generalization-gap metric: held-out split in run_eval + leaderboard
03e4efa

yxc20098 committed on

#6 leaderboard: data layer + run_eval publish + Gradio tab
b98ab1a

yxc20098 committed on

Add scoring + P/R/A diagnostics + run_eval CLI
5b68a55

yxc20098 committed on