OpenRA-Bench / openra_bench / eval_core.py

Commit History

Defensive fix: strip agent.cash=0 when starting_cash>0 in tmp YAML
b77e43d

yxc20098 committed on

Quality drive: schema fix, 5 new/revised packs, 4 engine tests, scenario audit
6d71d3b

yxc20098 committed on

Engine-feature integration: 4 commands + 9 scenario packs + 4 test suites
20960c1

yxc20098 committed on

Add perception ablation grid (observation channel × fog of war)
4a5b0dd

yxc20098 committed on

Phase 1: unified Controller interface for the eval stack
c68e036

yxc20098 committed on

feat(scenario): scout-cycle-keep-info-fresh — re-scout to detect mid-episode reinforcement
0090040

yxc20098 committed on

feat(bench): forbidden_tools + tool_violations_gte — strict-toolban / procedural-compliance primitive (BFCL V4 / τ²-bench / IFBench anchor)
e3e91b7

yxc20098 committed on

Default spawn_mcvs:false — stop engine auto-seeding phantom MCVs
f1ea367

yxc20098 committed on

playback: append a terminal 'episode end' frame (the resolved win/loss board)
7dbeb97

yxc20098 committed on

Structured-fog text mode, premium routing, codex descriptions, minimap colour-by-difficulty
93ee9dd

yxc20098 committed on

playback: save the SAME _minimap_v2 the model receives (fog-accumulating) — viewer now shows exactly what the model saw, not the legacy matplotlib render
d7ba62a

yxc20098 committed on

Training-parity minimap (real terrain + legend) + viewer (system/thinking/debrief)
39fba02

yxc20098 committed on

Unified Battle Viewer in app.py + run/model playback identity
0a488d3

yxc20098 committed on

Wire goal tracker into scoring + leaderboard
eeedfdf

yxc20098 committed on

Playback: capture model reasoning + per-turn goal tracker + viewer
f77eea7

yxc20098 committed on

S7 bench: surrender tool + loss outcome (tool schema 1:1, 15==15)
09ac234

yxc20098 committed on

Step 4 (bench): interrupt-driven loop + playback/trace capture
03c65ab

yxc20098 committed on

Pipeline step 7: per-episode playback persistence
28c736f

yxc20098 committed on

#12: real custom-map terrain via dynamic .oramap registry
83c6b8f

yxc20098 committed on

Bench: economy scenario pack + full-loop integ test + starting_cash constraint
dc028b6

yxc20098 committed on

Add scoring + P/R/A diagnostics + run_eval CLI
5b68a55

yxc20098 committed on

Add Rust-backed eval stack: scenario packs, adapter, spine, integration tests
098c6e0

yxc20098 committed on