Space: qpluslab/OpenRA-Bench

Commit History

Defensive fix: strip agent.cash=0 when starting_cash>0 in tmp YAML
b77e43d (yxc20098 committed on May 23)

Quality drive: schema fix, 5 new/revised packs, 4 engine tests, scenario audit
6d71d3b (yxc20098 committed on May 23)

Engine-feature integration: 4 commands + 9 scenario packs + 4 test suites
20960c1 (yxc20098 committed on May 23)

Add perception ablation grid (observation channel × fog of war)
4a5b0dd (yxc20098 committed on May 22)

Phase 1: unified Controller interface for the eval stack
c68e036 (yxc20098 committed on May 21)

feat(scenario): scout-cycle-keep-info-fresh — re-scout to detect mid-episode reinforcement
0090040 (yxc20098 committed on May 21)

feat(bench): forbidden_tools + tool_violations_gte — strict-toolban / procedural-compliance primitive (BFCL V4 / τ²-bench / IFBench anchor)
e3e91b7 (yxc20098 committed on May 20)

Default spawn_mcvs:false — stop engine auto-seeding phantom MCVs
f1ea367 (yxc20098 committed on May 19)

playback: append a terminal 'episode end' frame (the resolved win/loss board)
7dbeb97 (yxc20098 committed on May 19)

Structured-fog text mode, premium routing, codex descriptions, minimap colour-by-difficulty
93ee9dd (yxc20098 committed on May 19)

playback: save the SAME _minimap_v2 the model receives (fog-accumulating) — viewer now shows exactly what the model saw, not the legacy matplotlib render
d7ba62a (yxc20098 committed on May 19)

Training-parity minimap (real terrain + legend) + viewer (system/thinking/debrief)
39fba02 (yxc20098 committed on May 18)

Unified Battle Viewer in app.py + run/model playback identity
0a488d3 (yxc20098 committed on May 18)

Wire goal tracker into scoring + leaderboard
eeedfdf (yxc20098 committed on May 18)

Playback: capture model reasoning + per-turn goal tracker + viewer
f77eea7 (yxc20098 committed on May 18)

S7 bench: surrender tool + loss outcome (tool schema 1:1, 15==15)
09ac234 (yxc20098 committed on May 18)

Step 4 (bench): interrupt-driven loop + playback/trace capture
03c65ab (yxc20098 committed on May 18)

Pipeline step 7: per-episode playback persistence
28c736f (yxc20098 committed on May 18)

#12: real custom-map terrain via dynamic .oramap registry
83c6b8f (yxc20098 committed on May 17)

Bench: economy scenario pack + full-loop integ test + starting_cash constraint
dc028b6 (yxc20098 committed on May 17)

Add scoring + P/R/A diagnostics + run_eval CLI
5b68a55 (yxc20098 committed on May 17)

Add Rust-backed eval stack: scenario packs, adapter, spine, integration tests
098c6e0 (yxc20098 committed on May 17)