yxc20098 committed on
Commit ec99be9 · 1 Parent(s): c211b47

fix(scenario): combat-vehicle-vs-infantry-counter — restore no-cheat bar after armor-class engine fix


The OpenRA-Rust armor-class engine fix (4d91fe0) made pre-placed agent
combat units auto-fire effectively. The starter scout jeep then racked
up kills on its own, so a pure-observe `stall` policy reached the kill
bar and WON — violating the no-cheat bar.

Restore the bar:
- Starter jeep set to `stance: 0` (HoldFire) on every level / spawn
group — it scouts, it never auto-fires, so a stall policy scores
zero kills.
- Win predicate gains `unit_type_count_gte 2tnk:3` — the agent must
ACTUALLY field the 3-tank fist. Stall and wrong-counter (e3 / e1)
policies never build 2tnk → win clause structurally unmet.
- Fail clause gains `not own_units_gte:1` so a stalled-and-overrun
episode is a real LOSS, not an engine auto-`done` DRAW (the agent
starts with the jeep so the unit-less turn-1 mis-fire footgun does
not apply).
- Add `powr` + `fix` to every base — the war-factory vehicle queue
needs power online and a service depot for `2tnk` to clear its
prerequisites, so the tank counter is producible from turn 1.
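
As a rough illustration of the restored bar, the win and fail clauses listed above can be modelled as plain Python predicates. This is a minimal sketch under stated assumptions, not the openra_bench evaluator: `Snapshot`, `is_win` and `is_fail` are hypothetical names, and treating `within_ticks` / `after_ticks` as inclusive bounds is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical end-of-turn state (not the real WinContext)."""
    tick: int
    kills: int
    own_units: list = field(default_factory=list)  # unit type codes, e.g. "jeep", "2tnk"
    has_fact: bool = True


def is_win(s: Snapshot, kill_bar: int) -> bool:
    # all_of: 3 fielded 2tnk, kill bar met, fact standing, inside the tick window
    tanks = s.own_units.count("2tnk")
    return tanks >= 3 and s.kills >= kill_bar and s.has_fact and s.tick <= 5400


def is_fail(s: Snapshot) -> bool:
    # any_of: timeout, fact destroyed, or force wiped (no own units left)
    return s.tick >= 5401 or not s.has_fact or len(s.own_units) < 1


# The pre-fix exploit: the jeep alone reaches the easy kill bar of 6.
jeep_only = Snapshot(tick=2000, kills=6, own_units=["jeep"])
# Wrong counter: eight rocket soldiers, kill bar met, no tanks.
rockets = Snapshot(tick=2000, kills=6, own_units=["jeep"] + ["e3"] * 8)
# Intended play: three medium tanks plus the scout.
tanks = Snapshot(tick=2000, kills=6, own_units=["jeep"] + ["2tnk"] * 3)

print(is_win(jeep_only, kill_bar=6))  # False
print(is_win(rockets, kill_bar=6))    # False
print(is_win(tanks, kill_bar=6))      # True
```

With the `2tnk` count in the win clause, a kill total reached without building tanks no longer satisfies the bar, which is the property the bullets above describe.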

Validated via scripted policies on 3 levels x seeds 1..4:
stall / build-e3 / build-e1 LOSE everywhere (real LOSS, no DRAW);
intended build-2tnk WINS everywhere.
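
The validation matrix above (4 scripted policies, 3 levels, seeds 1..4) can be sketched as a small sweep. This is only an illustrative harness: `sweep`, `EXPECTED` and `fake_run` are hypothetical names, and the `run(policy, level, seed)` callable stands in for whatever actually drives the engine; it is not the project's API.

```python
from itertools import product

# Expected outcome per scripted policy, taken from the commit message.
EXPECTED = {
    "stall": "loss",
    "build-e3": "loss",
    "build-e1": "loss",
    "build-2tnk": "win",
}


def sweep(run, levels=("easy", "medium", "hard"), seeds=(1, 2, 3, 4)):
    """Run every policy on every level and seed; return the mismatches.

    `run` is an assumed callable returning "win", "loss" or "draw".
    """
    mismatches = []
    for policy, level, seed in product(EXPECTED, levels, seeds):
        outcome = run(policy, level, seed)
        if outcome != EXPECTED[policy]:
            mismatches.append((policy, level, seed, outcome))
    return mismatches


def fake_run(policy, level, seed):
    """Stand-in runner that behaves the way the commit message reports."""
    return "win" if policy == "build-2tnk" else "loss"


print(sweep(fake_run))  # []
```

An empty list means every cell matched; a stall episode that ended in a DRAW would show up as a `("stall", level, seed, "draw")` entry, which is the regression this commit guards against.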

openra_bench/scenarios/packs/combat-vehicle-vs-infantry-counter.yaml CHANGED
@@ -21,33 +21,43 @@
21
  # Pre-placed (each spawn group): agent `fact` + `tent` (infantry
22
  # trainer; enables e1 and e3) + `weap` (vehicle trainer; enables
23
  # 2tnk) + a single starter `jeep` (allies scout vehicle; visibility
24
- # over the enemy composition + satisfies own_units_gte:1 from turn 0
25
- # so the unit-less misfire footgun in CLAUDE.md doesn't trip on the
26
- # fail clause). The tent+weap pair makes BOTH counter compositions
27
- # buildable from turn 1: the decision the model faces is
28
- # composition, not tech-up.
29
  #
30
  # Discrimination on EASY / MEDIUM (single enemy cluster, both
31
- # compositions are buildable from t=1):
32
- # • stall (only observe): never reaches the kill bar → after_ticks
33
- # LOSS.
34
- # • build-only-e1 (match enemy 1:1 with cheap rifles): 25× e1 vs
35
- # the enemy e1 mass is a 1:1 attrition; the enemy holds entrenched
36
- # while the agent has to MOVE INTO range; moving infantry get
37
- # shot first; attrition LOSES, the kill bar is unmet OR the
38
- # starter jeep + survivors evaporate before the cap.
39
- # • build-only-e3 (anti-armour rockets against infantry, the wrong
40
- # counter): 8× e3 ($2400) costs $300/unit to put out the same
41
- # anti-infantry DPS that 24× e1 ($2400) would; the rocket squad
42
- # is OUT-SHOT by the e1 mass on raw infantry-vs-infantry numbers
43
- # (e3 hp45 < e1 hp50; e3 rng4 < tank rng4.75; e3 anti-armour
44
- # warhead is overkill on infantry but the same slow projectiles
45
- # under-perform vs small targets) → kill bar unmet OR survivors
46
- # under N → LOSS.
47
- # • intended build-2tnk (3× medium tanks @ $850 = $2550): the right
48
- # counter: heavy armour soaks small-arms fire, tank dps22 + rng
49
- # 4.75 + sight 6c walks through a static e1 mass without losing
50
- # a tank → kill bar met, all 3 tanks alive, fact intact → WIN.
 
51
  #
52
  # Discrimination on HARD (+1 axis: 2 agent spawn_point groups):
53
  # • The agent base seed-rotates between NORTH (y=12) and SOUTH
@@ -70,15 +80,31 @@
70
  # • `after_ticks` fail clauses reachable within max_turns
71
  # (within_ticks 5400 ≤ 5403 at max_turns 60): a staller hits
72
  # after_ticks 5401 and LOSES, never draws.
73
- # • Starting `jeep` ensures own_units_gte:1 is met from t=0 so
74
- # the unit-less mis-fire (CLAUDE.md) doesn't trip the fail
75
- # clause before the model has built anything.
76
  # • Spawn-group footgun (CLAUDE.md oramap): on hard, ANY agent
77
  # actor with `spawn_point` filters OUT every agent actor without
78
- # one — so BOTH bases (fact + tent + weap + jeep) are
79
- # duplicated under BOTH spawn_point groups at their respective
80
- # coords. The single far enemy fact at (124,20) and the centre
81
- # enemy cluster have NO spawn_point and place on every seed.
 
82
  # • starting_cash $2550 = exactly 3× 2tnk; 8× e3 = $2400 (cash
83
  # unspent); 25× e1 = $2500. Neither rocket nor rifle mass is
84
  # dominant against the entrenched enemy.
@@ -130,13 +156,15 @@ base:
130
  planning: true
131
  termination: {max_ticks: 8000}
132
  # Default base (overridden on hard). The starter jeep gives turn-1
133
- # scouting (sight 7c) AND satisfies own_units_gte:1 from t=0 so the
134
- # fail clause's not own_units_gte:1 doesn't mis-fire on turn 1.
135
  actors:
136
  - {type: fact, owner: agent, position: [10, 20]}
 
137
  - {type: tent, owner: agent, position: [14, 18]}
138
  - {type: weap, owner: agent, position: [14, 22]}
139
- - {type: jeep, owner: agent, position: [12, 20]}
 
140
  # Far persistent enemy marker — prevents engine auto-done when
141
  # the live infantry cluster falls so the win/fail evaluator sees
142
  # the terminal frame.
@@ -146,9 +174,9 @@ levels:
146
  # ── EASY ─────────────────────────────────────────────────────────
147
  # Bare counter-selection skill: a small visible enemy infantry
148
  # cluster (8× e1) on the centre lane. Cash $2550 funds 3× 2tnk
149
- # (the right counter) cleanly. Loss cap is generous via the
150
- # own_units_gte:1 floor only (any survivor counts). Stall: kill bar
151
- # unmet → after_ticks LOSS.
152
  easy:
153
  description: >
154
  Cash $2550. The enemy is a small cluster of 8 rifle infantry
@@ -160,16 +188,18 @@ levels:
160
  armoured fist (2tnk @ $850, exactly 3 for $2550) OR rifle
161
  infantry (e1 @ $100, up to 25). Scout the enemy with the jeep,
162
  pick the correct hard counter, and commit the whole budget.
163
- Win when 6 enemy units are killed AND ≥1 of your units are
164
- alive AND your construction yard still stands, before tick
165
- 5400. Stalling, picking the wrong counter, or letting your
166
- force evaporate fails the bar.
167
  overrides:
168
  actors:
169
  - {type: fact, owner: agent, position: [10, 20]}
 
170
  - {type: tent, owner: agent, position: [14, 18]}
171
  - {type: weap, owner: agent, position: [14, 22]}
172
- - {type: jeep, owner: agent, position: [12, 20]}
 
173
  # 8× e1 entrenched cluster — stance:3 (attack anything in
174
  # range). Spread over a 3×3 grid centred on (70,20) so the
175
  # mass can be engaged from any approach axis.
@@ -184,15 +214,15 @@ levels:
184
  - {type: fact, owner: enemy, position: [124, 20]}
185
  win_condition:
186
  all_of:
 
187
  - {units_killed_gte: 6}
188
- - {own_units_gte: 1}
189
  - {has_building: fact}
190
  - {within_ticks: 5400}
191
  fail_condition:
192
  any_of:
193
  - {after_ticks: 5401}
194
- - {not: {own_units_gte: 1}}
195
  - {not: {has_building: fact}}
 
196
  max_turns: 60
197
 
198
  # ── MEDIUM ───────────────────────────────────────────────────────
@@ -214,15 +244,17 @@ levels:
214
  through small-arms fire. Mass rockets (e3 @ $300) waste cost-
215
  per-effect against soft targets and get out-DPSed by the rifle
216
  mass on attrition; matching with own rifles (e1 @ $100) is a
217
- 1:1 trade with no advantage. Win when 8 enemy units are killed
218
- AND ≥1 of your units are alive AND your fact still stands,
219
- before tick 5400.
220
  overrides:
221
  actors:
222
  - {type: fact, owner: agent, position: [10, 20]}
 
223
  - {type: tent, owner: agent, position: [14, 18]}
224
  - {type: weap, owner: agent, position: [14, 22]}
225
- - {type: jeep, owner: agent, position: [12, 20]}
 
226
  # 12× e1 entrenched cluster — deeper centre mass.
227
  - {type: e1, owner: enemy, position: [70, 17], stance: 3}
228
  - {type: e1, owner: enemy, position: [70, 18], stance: 3}
@@ -239,15 +271,15 @@ levels:
239
  - {type: fact, owner: enemy, position: [124, 20]}
240
  win_condition:
241
  all_of:
 
242
  - {units_killed_gte: 8}
243
- - {own_units_gte: 1}
244
  - {has_building: fact}
245
  - {within_ticks: 5400}
246
  fail_condition:
247
  any_of:
248
  - {after_ticks: 5401}
249
- - {not: {own_units_gte: 1}}
250
  - {not: {has_building: fact}}
 
251
  max_turns: 60
252
 
253
  # ── HARD ─────────────────────────────────────────────────────────
@@ -271,21 +303,25 @@ levels:
271
  medium tanks (2tnk) walk through small-arms fire; mass rockets
272
  (e3) waste cost-per-effect against soft targets and get out-
273
  DPSed on attrition; matching with own rifles is a 1:1 trade
274
- with no advantage. Win when 8 enemy units are killed AND ≥1 of
275
- your units are alive AND your fact still stands, before tick
276
- 5400.
277
  overrides:
278
  actors:
279
  # ── AGENT spawn 0 — NORTH base (y=12) ─────────────────────
280
  - {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
 
281
  - {type: tent, owner: agent, position: [14, 10], spawn_point: 0}
282
  - {type: weap, owner: agent, position: [14, 14], spawn_point: 0}
283
- - {type: jeep, owner: agent, position: [12, 12], spawn_point: 0}
 
284
  # ── AGENT spawn 1 — SOUTH base (y=28) ─────────────────────
285
  - {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
 
286
  - {type: tent, owner: agent, position: [14, 26], spawn_point: 1}
287
  - {type: weap, owner: agent, position: [14, 30], spawn_point: 1}
288
- - {type: jeep, owner: agent, position: [12, 28], spawn_point: 1}
 
289
  # ── CENTRE ENEMY CLUSTER — pure infantry (always places) ──
290
  # 12× e1 entrenched on the central lane. Per CLAUDE.md, enemy
291
  # actors do not honour spawn_point — this cluster lands on
@@ -307,13 +343,13 @@ levels:
307
  - {type: fact, owner: enemy, position: [124, 20]}
308
  win_condition:
309
  all_of:
 
310
  - {units_killed_gte: 8}
311
- - {own_units_gte: 1}
312
  - {has_building: fact}
313
  - {within_ticks: 5400}
314
  fail_condition:
315
  any_of:
316
  - {after_ticks: 5401}
317
- - {not: {own_units_gte: 1}}
318
  - {not: {has_building: fact}}
 
319
  max_turns: 60
 
21
  # Pre-placed (each spawn group): agent `fact` + `tent` (infantry
22
  # trainer; enables e1 and e3) + `weap` (vehicle trainer; enables
23
  # 2tnk) + a single starter `jeep` (allies scout vehicle; visibility
24
+ # over the enemy composition). The starter jeep is `stance: 0`
25
+ # (HoldFire) so it NEVER auto-fires: it scouts, it does not fight.
26
+ # This is the load-bearing anti-stall guard (CLAUDE.md: an armour-
27
+ # class engine fix made pre-placed agent combat units auto-fire
28
+ # effectively, so a HoldFire jeep cannot rack up kills on its own).
29
+ # The tent+weap pair makes BOTH counter compositions buildable from
30
+ # turn 1 — the decision the model faces is composition, not tech-up.
31
+ #
32
+ # The win predicate requires `unit_type_count_gte: 2tnk:3` — the
33
+ # agent must have ACTUALLY FIELDED 3 medium tanks (the right
34
+ # counter). A stall policy never builds tanks; a wrong-counter
35
+ # policy (e3 / e1) never builds 2tnk — neither can satisfy the win
36
+ # regardless of how many kills the entrenched enemy concedes. Only
37
+ # building + commanding the 3-tank fist clears the bar.
38
  #
39
  # Discrimination on EASY / MEDIUM (single enemy cluster, both
40
+ # compositions are buildable from t=1). The win predicate is
41
+ # `unit_type_count_gte 2tnk:3 AND units_killed_gte K AND
42
+ # has_building fact` — the 2tnk:3 clause means ONLY a policy that
43
+ # actually builds the 3-tank fist can clear the bar:
44
+ # • stall (only observe): builds nothing; the HoldFire jeep never
45
+ # fires → 0 kills, 0 tanks. The entrenched e1 swarm eventually
46
+ # hunts down the idle jeep → force-wipe (not own_units_gte:1)
47
+ # → LOSS (or after_ticks LOSS if the jeep is never reached).
48
+ # • build-only-e1 (match enemy 1:1 with cheap rifles): never
49
+ # builds 2tnk → the 2tnk:3 clause is structurally unmet; even
50
+ # a full kill count cannot win → after_ticks / force-wipe LOSS.
51
+ # • build-only-e3 (anti-armour rockets against infantry — the
52
+ # wrong counter): never builds 2tnk → the 2tnk:3 clause is
53
+ # structurally unmet; the rocket squad is also OUT-SHOT by the
54
+ # e1 mass on raw infantry-vs-infantry numbers (e3 hp45 < e1
55
+ # hp50; slow anti-armour projectiles under-perform vs small
56
+ # targets) → after_ticks / force-wipe LOSS.
57
+ # • intended build-2tnk (3× medium tanks @ $850 = $2550): the
58
+ # right counter: heavy armour soaks small-arms fire, tank
59
+ # dps22 + rng 4.75 + sight 6c walks through a static e1 mass.
60
+ # 2tnk:3 clause met, kill bar met, fact intact → WIN.
61
  #
62
  # Discrimination on HARD (+1 axis: 2 agent spawn_point groups):
63
  # • The agent base seed-rotates between NORTH (y=12) and SOUTH
 
80
  # • `after_ticks` fail clauses reachable within max_turns
81
  # (within_ticks 5400 ≤ 5403 at max_turns 60): a staller hits
82
  # after_ticks 5401 and LOSES, never draws.
83
+ # • Starting `jeep` is `stance: 0` (HoldFire): it scouts but
84
+ # never auto-fires, so a pure-observe stall policy cannot score
85
+ # kills with it. The win predicate's `unit_type_count_gte`
86
+ # 2tnk:3 is the real anti-cheat: only the agent that builds the
87
+ # 3-tank fist can clear the bar. The fail clause is
88
+ # `after_ticks | not has_building:fact | not own_units_gte:1` —
89
+ # the agent starts WITH the jeep so `own_units_gte:1` is
90
+ # satisfied from t=0 (the unit-less turn-1 mis-fire footgun in
91
+ # CLAUDE.md does not apply); the force-wipe clause turns a
92
+ # stalled-and-overrun episode into a real LOSS instead of an
93
+ # engine auto-`done` DRAW when the entrenched e1 swarm hunts
94
+ # down the idle HoldFire jeep.
95
+ # • The vehicle queue (`weap` war factory) needs `powr` online
96
+ # AND a `fix` service depot present for `2tnk` to clear its
97
+ # prerequisites — both are pre-placed in every base so the
98
+ # tank counter is producible from turn 1 (the decision is
99
+ # composition, not tech-up). `tent` produces e1/e3 with `powr`
100
+ # alone.
101
  # • Spawn-group footgun (CLAUDE.md oramap): on hard, ANY agent
102
  # actor with `spawn_point` filters OUT every agent actor without
103
+ # one — so BOTH bases (fact + powr + tent + weap + fix + jeep)
104
+ # are duplicated under BOTH spawn_point groups at their
105
+ # respective coords. The single far enemy fact at (124,20) and
106
+ # the centre enemy cluster have NO spawn_point and place on
107
+ # every seed.
108
  # • starting_cash $2550 = exactly 3× 2tnk; 8× e3 = $2400 (cash
109
  # unspent); 25× e1 = $2500. Neither rocket nor rifle mass is
110
  # dominant against the entrenched enemy.
 
156
  planning: true
157
  termination: {max_ticks: 8000}
158
  # Default base (overridden on hard). The starter jeep gives turn-1
159
+ # scouting (sight 7c). It is `stance: 0` (HoldFire) so it never
160
+ # auto-fires, so a stall policy cannot score kills with it.
161
  actors:
162
  - {type: fact, owner: agent, position: [10, 20]}
163
+ - {type: powr, owner: agent, position: [10, 16]}
164
  - {type: tent, owner: agent, position: [14, 18]}
165
  - {type: weap, owner: agent, position: [14, 22]}
166
+ - {type: fix, owner: agent, position: [18, 20]}
167
+ - {type: jeep, owner: agent, position: [12, 20], stance: 0}
168
  # Far persistent enemy marker — prevents engine auto-done when
169
  # the live infantry cluster falls so the win/fail evaluator sees
170
  # the terminal frame.
 
174
  # ── EASY ─────────────────────────────────────────────────────────
175
  # Bare counter-selection skill: a small visible enemy infantry
176
  # cluster (8× e1) on the centre lane. Cash $2550 funds 3× 2tnk
177
+ # (the right counter) cleanly. The win requires fielding 2tnk
178
+ # AND 6 kills; stall / wrong-counter never field the tanks → kill
179
+ # bar + 2tnk:3 clause unmet → after_ticks LOSS.
180
  easy:
181
  description: >
182
  Cash $2550. The enemy is a small cluster of 8 rifle infantry
 
188
  armoured fist (2tnk @ $850, exactly 3 for $2550) OR rifle
189
  infantry (e1 @ $100, up to 25). Scout the enemy with the jeep,
190
  pick the correct hard counter, and commit the whole budget.
191
+ Win when you have fielded 3 medium tanks (2tnk) AND 6 enemy
192
+ units are killed AND your construction yard still stands,
193
+ before tick 5400. Stalling or picking the wrong counter never
194
+ fields the 3-tank fist and fails the bar.
195
  overrides:
196
  actors:
197
  - {type: fact, owner: agent, position: [10, 20]}
198
+ - {type: powr, owner: agent, position: [10, 16]}
199
  - {type: tent, owner: agent, position: [14, 18]}
200
  - {type: weap, owner: agent, position: [14, 22]}
201
+ - {type: fix, owner: agent, position: [18, 20]}
202
+ - {type: jeep, owner: agent, position: [12, 20], stance: 0}
203
  # 8× e1 entrenched cluster — stance:3 (attack anything in
204
  # range). Spread over a 3×3 grid centred on (70,20) so the
205
  # mass can be engaged from any approach axis.
 
214
  - {type: fact, owner: enemy, position: [124, 20]}
215
  win_condition:
216
  all_of:
217
+ - {unit_type_count_gte: {type: 2tnk, n: 3}}
218
  - {units_killed_gte: 6}
 
219
  - {has_building: fact}
220
  - {within_ticks: 5400}
221
  fail_condition:
222
  any_of:
223
  - {after_ticks: 5401}
 
224
  - {not: {has_building: fact}}
225
+ - {not: {own_units_gte: 1}}
226
  max_turns: 60
227
 
228
  # ── MEDIUM ───────────────────────────────────────────────────────
 
244
  through small-arms fire. Mass rockets (e3 @ $300) waste cost-
245
  per-effect against soft targets and get out-DPSed by the rifle
246
  mass on attrition; matching with own rifles (e1 @ $100) is a
247
+ 1:1 trade with no advantage. Win when you have fielded 3
248
+ medium tanks (2tnk) AND 8 enemy units are killed AND your fact
249
+ still stands, before tick 5400.
250
  overrides:
251
  actors:
252
  - {type: fact, owner: agent, position: [10, 20]}
253
+ - {type: powr, owner: agent, position: [10, 16]}
254
  - {type: tent, owner: agent, position: [14, 18]}
255
  - {type: weap, owner: agent, position: [14, 22]}
256
+ - {type: fix, owner: agent, position: [18, 20]}
257
+ - {type: jeep, owner: agent, position: [12, 20], stance: 0}
258
  # 12× e1 entrenched cluster — deeper centre mass.
259
  - {type: e1, owner: enemy, position: [70, 17], stance: 3}
260
  - {type: e1, owner: enemy, position: [70, 18], stance: 3}
 
271
  - {type: fact, owner: enemy, position: [124, 20]}
272
  win_condition:
273
  all_of:
274
+ - {unit_type_count_gte: {type: 2tnk, n: 3}}
275
  - {units_killed_gte: 8}
 
276
  - {has_building: fact}
277
  - {within_ticks: 5400}
278
  fail_condition:
279
  any_of:
280
  - {after_ticks: 5401}
 
281
  - {not: {has_building: fact}}
282
+ - {not: {own_units_gte: 1}}
283
  max_turns: 60
284
 
285
  # ── HARD ─────────────────────────────────────────────────────────
 
303
  medium tanks (2tnk) walk through small-arms fire; mass rockets
304
  (e3) waste cost-per-effect against soft targets and get out-
305
  DPSed on attrition; matching with own rifles is a 1:1 trade
306
+ with no advantage. Win when you have fielded 3 medium tanks
307
+ (2tnk) AND 8 enemy units are killed AND your fact still
308
+ stands, before tick 5400.
309
  overrides:
310
  actors:
311
  # ── AGENT spawn 0 — NORTH base (y=12) ─────────────────────
312
  - {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
313
+ - {type: powr, owner: agent, position: [10, 8], spawn_point: 0}
314
  - {type: tent, owner: agent, position: [14, 10], spawn_point: 0}
315
  - {type: weap, owner: agent, position: [14, 14], spawn_point: 0}
316
+ - {type: fix, owner: agent, position: [18, 12], spawn_point: 0}
317
+ - {type: jeep, owner: agent, position: [12, 12], stance: 0, spawn_point: 0}
318
  # ── AGENT spawn 1 — SOUTH base (y=28) ─────────────────────
319
  - {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
320
+ - {type: powr, owner: agent, position: [10, 32], spawn_point: 1}
321
  - {type: tent, owner: agent, position: [14, 26], spawn_point: 1}
322
  - {type: weap, owner: agent, position: [14, 30], spawn_point: 1}
323
+ - {type: fix, owner: agent, position: [18, 28], spawn_point: 1}
324
+ - {type: jeep, owner: agent, position: [12, 28], stance: 0, spawn_point: 1}
325
  # ── CENTRE ENEMY CLUSTER — pure infantry (always places) ──
326
  # 12× e1 entrenched on the central lane. Per CLAUDE.md, enemy
327
  # actors do not honour spawn_point — this cluster lands on
 
343
  - {type: fact, owner: enemy, position: [124, 20]}
344
  win_condition:
345
  all_of:
346
+ - {unit_type_count_gte: {type: 2tnk, n: 3}}
347
  - {units_killed_gte: 8}
 
348
  - {has_building: fact}
349
  - {within_ticks: 5400}
350
  fail_condition:
351
  any_of:
352
  - {after_ticks: 5401}
 
353
  - {not: {has_building: fact}}
354
+ - {not: {own_units_gte: 1}}
355
  max_turns: 60
tests/test_combat_vehicle_vs_infantry_counter.py CHANGED
@@ -10,16 +10,27 @@ tank ordnance against soft targets — cost-per-effect waste + the
10
  rocket squad's short stand-off + low HP gets out-DPSed by the rifle
11
  mass); matching with own rifles is a 1:1 attrition that loses.
12
 
 
 
 
 
 
 
 
 
13
  The bar (per the spec):
14
- • stall (only observe) → LOSS (kill bar unmet → after_ticks)
15
- • build-only-e1 (match 1:1) → LOSS (attrition; movers shot first)
16
- • build-only-e3 (wrong counter) → LOSS (cost-per-effect + close-range)
17
- • intended build-2tnk → WIN (heavy armour walks through e1)
18
-
19
- Validation is split between unit-level predicate checks (no engine)
20
- and engine-driven scripted policies. The unit-level checks are the
21
- load-bearing assertions for this commit (the engine-driven policies
22
- are documented as smoke-only and parametrised over the hard seeds).
 
 
 
23
  """
24
  from __future__ import annotations
25
 
@@ -67,8 +78,13 @@ def test_pack_compiles_and_meta_fields_populated():
67
  )
68
 
69
 
70
- def _ctx(*, units=(), tick=1000, kills=0, lost=0, has_fact=True):
71
- """Synthesize a WinContext for predicate-level checks."""
 
 
 
 
 
72
  import types
73
 
74
  sig = types.SimpleNamespace(
@@ -78,72 +94,105 @@ def _ctx(*, units=(), tick=1000, kills=0, lost=0, has_fact=True):
78
  cash=0,
79
  resources=0,
80
  own_buildings=[],
81
- own_building_types={"fact", "tent", "weap"} if has_fact else {"tent", "weap"},
 
 
 
 
82
  enemies_seen_ids=set(),
83
  enemy_buildings_seen_ids=set(),
84
  )
 
 
 
 
 
85
  return WinContext(
86
  signals=sig,
87
  render_state={"units_summary": list(units)},
88
  )
89
 
90
 
91
- def _alive(n, unit_type="2tnk"):
92
- return [
93
- {"cell_x": 30, "cell_y": 20, "type": unit_type, "id": str(1000 + i)}
94
- for i in range(n)
95
- ]
96
-
97
-
98
  def test_easy_predicates():
99
  c = compile_level(load_pack(PACK_PATH), "easy")
100
- # Intended: 6 kills, 3 tanks alive, fact still up, in time → WIN
101
- assert evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=6, lost=0))
 
 
102
  # Kill bar unmet (only 5 kills) → not a win
103
- assert not evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=5, lost=0))
 
 
 
 
 
 
104
  # Force wipe (all units dead) → fail via not own_units_gte:1
105
  assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=6, lost=4))
106
  # Fact destroyed → fail via not has_building:fact
107
  assert evaluate(
108
  c.fail_condition,
109
- _ctx(units=_alive(3), tick=2000, kills=6, lost=0, has_fact=False),
110
  )
111
  # Timeout with bar unmet → fail (after_ticks 5401 reachable)
112
- assert evaluate(c.fail_condition, _ctx(units=_alive(3), tick=5402, kills=5, lost=0))
113
 
114
 
115
  def test_medium_predicates():
116
  c = compile_level(load_pack(PACK_PATH), "medium")
117
- # Intended: 8 kills, 3 tanks alive, fact still up → WIN
118
- assert evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=8, lost=0))
 
 
119
  # Bar unmet (only 7 kills) → not a win
120
- assert not evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=7, lost=0))
121
  # Force wipe → fail
122
  assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
123
  # Fact destroyed → fail
124
  assert evaluate(
125
  c.fail_condition,
126
- _ctx(units=_alive(3), tick=2000, kills=8, lost=0, has_fact=False),
127
  )
128
  # Timeout → fail
129
- assert evaluate(c.fail_condition, _ctx(units=_alive(3), tick=5402, kills=7, lost=0))
130
 
131
 
132
  def test_hard_predicates():
133
  c = compile_level(load_pack(PACK_PATH), "hard")
134
- # Intended: 8 kills, 3 tanks alive, fact up → WIN
135
- assert evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=8, lost=0))
 
 
136
  # Bar unmet → not a win
137
- assert not evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=7, lost=0))
138
  # Force wipe → fail
139
  assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
140
  # Fact destroyed → fail
141
  assert evaluate(
142
  c.fail_condition,
143
- _ctx(units=_alive(3), tick=2000, kills=8, lost=0, has_fact=False),
144
  )
145
  # Timeout → fail
146
- assert evaluate(c.fail_condition, _ctx(units=_alive(3), tick=5402, kills=7, lost=0))
147
 
148
 
149
  def test_timeout_reachable_inside_max_turns():
@@ -198,17 +247,37 @@ def test_enemy_is_pure_infantry_no_anti_armour():
198
  assert n_e1 >= 6, f"{lvl}: needs ≥6 e1 in the enemy cluster; got {n_e1}"
199
 
200
 
201
- def test_agent_base_has_both_production_queues():
202
  """The composition decision is COMPOSITION, not tech-up. Each
203
- spawn group on every level must have BOTH a barracks (tent
204
- enables e1/e3) and a war factory (weap — enables 2tnk) so both
205
- counters are buildable from turn 1. The starter jeep must be
206
- present so own_units_gte:1 is satisfied from t=0 (avoiding the
207
- unit-less misfire footgun documented in CLAUDE.md)."""
208
  pack = load_pack(PACK_PATH)
209
  for lvl in ("easy", "medium", "hard"):
210
  c = compile_level(pack, lvl)
211
- # Per spawn group, the agent must have tent + weap + fact + jeep.
212
  # On non-hard levels there is exactly one (default) spawn
213
  # group (spawn_point None → 0); on hard there are two.
214
  groups: dict[int, list] = {}
@@ -219,7 +288,7 @@ def test_agent_base_has_both_production_queues():
219
  groups.setdefault(g, []).append(a.type)
220
  assert groups, f"{lvl}: no agent actors found"
221
  for g, ts in groups.items():
222
- for need in ("fact", "tent", "weap", "jeep"):
223
  assert need in ts, (
224
  f"{lvl}: spawn group {g} missing {need}; got {ts}"
225
  )
@@ -244,33 +313,137 @@ def test_starting_cash_funds_exactly_one_pure_composition():
244
  assert 25 * 100 == 2500
245
 
246
 
247
- # ── engine-driven scripted policy: intended build-2tnk smoke ────────
248
  #
249
- # The full RPS-counter bar (build-e3 LOSES / build-e1 LOSES / build-
250
- # 2tnk WINS) needs each pure-build policy to be exercised against the
251
- # live engine. The engine production / build-placement timing is
252
- # touchy enough that we keep these as smoke tests (one tier each) —
253
- # the unit-level predicate teeth above are the strict invariants.
254
 
255
 
256
  def _stall(rs, Command):
257
- """Pure observe — kill bar never met → after_ticks LOSS."""
 
 
258
  return [Command.observe()]
259
 
260
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
261
  @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
262
  def test_stall_loses(level):
263
- """Stall must be a real timeout LOSS on every level (no draw):
264
- the kill bar (units_killed_gte:6 / 8 / 8) is structurally
265
- unreachable from a pure-observe policy, so after_ticks 5401
266
- fires."""
267
  pytest.importorskip("openra_train")
268
  from openra_bench.eval_core import run_level
269
 
270
  c = compile_level(load_pack(PACK_PATH), level)
271
- r = run_level(c, _stall, seed=1)
272
- assert r.outcome == "loss", (
273
- f"{level}: stall must LOSE (kill bar unmet → after_ticks); "
274
- f"got {r.outcome} after {r.turns} turns "
275
- f"(kills={r.signals.units_killed}, losses={r.signals.units_lost})"
276
- )

10
  rocket squad's short stand-off + low HP gets out-DPSed by the rifle
11
  mass); matching with own rifles is a 1:1 attrition that loses.
12
 
13
+ The win predicate is `unit_type_count_gte 2tnk:3 AND units_killed_gte
14
+ K AND has_building fact` — the 2tnk:3 clause is the load-bearing
15
+ anti-cheat: only a policy that ACTUALLY BUILDS the 3-tank fist can
16
+ clear the bar. (The armour-class engine fix on OpenRA-Rust main made
17
+ pre-placed agent combat units auto-fire effectively, so the starter
18
+ jeep is `stance: 0` HoldFire — it scouts, it cannot rack up kills on
19
+ its own.)
20
+
21
  The bar (per the spec):
22
+ • stall (only observe) → LOSS (no 2tnk, no kills; the
23
+ idle HoldFire jeep is hunted down → force-wipe / after_ticks)
24
+ • build-only-e1 (match 1:1) → LOSS (never builds 2tnk → the
25
+ 2tnk:3 clause is structurally unmet)
26
+ • build-only-e3 (wrong counter) → LOSS (never builds 2tnk → the
27
+ 2tnk:3 clause is structurally unmet)
28
+ • intended build-2tnk → WIN (3 medium tanks walk
29
+ through the e1 mass; 2tnk:3 + kill bar both latch)
30
+
31
+ Validation is scripted (no model / network) — every policy is
32
+ exercised against the live engine on every level and every hard
33
+ seed 1..4.
34
  """
35
  from __future__ import annotations
36
 
 
78
  )
79
 
80
 
81
+ def _ctx(*, tanks=0, tick=1000, kills=0, lost=0, has_fact=True, units=None):
82
+ """Synthesize a WinContext for predicate-level checks.
83
+
84
+ `tanks` synthesizes that many 2tnk units in `units_summary`;
85
+ pass `units` explicitly to model a different composition (or an
86
+ empty force).
87
+ """
88
  import types
89
 
90
  sig = types.SimpleNamespace(
 
94
  cash=0,
95
  resources=0,
96
  own_buildings=[],
97
+ own_building_types=(
98
+ {"fact", "powr", "tent", "weap", "fix"}
99
+ if has_fact
100
+ else {"powr", "tent", "weap", "fix"}
101
+ ),
102
  enemies_seen_ids=set(),
103
  enemy_buildings_seen_ids=set(),
104
  )
105
+ if units is None:
106
+ units = [
107
+ {"cell_x": 30, "cell_y": 20, "type": "2tnk", "id": str(1000 + i)}
108
+ for i in range(tanks)
109
+ ]
110
  return WinContext(
111
  signals=sig,
112
  render_state={"units_summary": list(units)},
113
  )
114
 
115
 
 
 
 
 
 
 
 
116
  def test_easy_predicates():
117
  c = compile_level(load_pack(PACK_PATH), "easy")
118
+ # Intended: 3 tanks fielded, 6 kills, fact still up, in time → WIN
119
+ assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=6))
120
+ # Only 2 tanks fielded → 2tnk:3 clause unmet → not a win
121
+ assert not evaluate(c.win_condition, _ctx(tanks=2, tick=2000, kills=6))
122
  # Kill bar unmet (only 5 kills) → not a win
123
+ assert not evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=5))
124
+ # Wrong counter: 8 e3 fielded, kill bar met, but 0 tanks → not a win
125
+ e3s = [
126
+ {"cell_x": 30, "cell_y": 20, "type": "e3", "id": str(2000 + i)}
127
+ for i in range(8)
128
+ ]
129
+ assert not evaluate(c.win_condition, _ctx(units=e3s, tick=2000, kills=6))
130
  # Force wipe (all units dead) → fail via not own_units_gte:1
131
  assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=6, lost=4))
132
  # Fact destroyed → fail via not has_building:fact
133
  assert evaluate(
134
  c.fail_condition,
135
+ _ctx(tanks=3, tick=2000, kills=6, has_fact=False),
136
  )
137
  # Timeout with bar unmet → fail (after_ticks 5401 reachable)
138
+ assert evaluate(c.fail_condition, _ctx(tanks=3, tick=5402, kills=5))
139
 
140
 
141
 def test_medium_predicates():
     c = compile_level(load_pack(PACK_PATH), "medium")
+    # Intended: 3 tanks, 8 kills, fact still up → WIN
+    assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=8))
+    # Only 2 tanks → 2tnk:3 clause unmet → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=2, tick=2000, kills=8))
     # Bar unmet (only 7 kills) → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=7))
     # Force wipe → fail
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
     # Fact destroyed → fail
     assert evaluate(
         c.fail_condition,
+        _ctx(tanks=3, tick=2000, kills=8, has_fact=False),
     )
     # Timeout → fail
+    assert evaluate(c.fail_condition, _ctx(tanks=3, tick=5402, kills=7))
 
 
 def test_hard_predicates():
     c = compile_level(load_pack(PACK_PATH), "hard")
+    # Intended: 3 tanks, 8 kills, fact up → WIN
+    assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=8))
+    # Only 2 tanks → 2tnk:3 clause unmet → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=2, tick=2000, kills=8))
     # Bar unmet → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=7))
     # Force wipe → fail
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
     # Fact destroyed → fail
     assert evaluate(
         c.fail_condition,
+        _ctx(tanks=3, tick=2000, kills=8, has_fact=False),
     )
     # Timeout → fail
+    assert evaluate(c.fail_condition, _ctx(tanks=3, tick=5402, kills=7))
+
+
+def test_win_requires_three_medium_tanks():
+    """The load-bearing anti-cheat: every level's win predicate must
+    require `unit_type_count_gte 2tnk:3` — a stall / wrong-counter
+    policy that never builds the medium-tank fist can never win
+    regardless of how many kills the entrenched enemy concedes."""
+    pack = load_pack(PACK_PATH)
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        # 0 tanks, kill bar trivially exceeded, fact up, in time → NOT a win.
+        assert not evaluate(c.win_condition, _ctx(tanks=0, tick=2000, kills=99)), (
+            f"{lvl}: win must require 3 fielded 2tnk — a 0-tank policy "
+            f"with the kill bar met must NOT win (anti-cheat clause)"
+        )
+        # 3 tanks + kill bar met + fact up + in time → WIN.
+        assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=99)), (
+            f"{lvl}: 3 fielded 2tnk + kill bar met must WIN"
+        )
 
 
 def test_timeout_reachable_inside_max_turns():
 
         assert n_e1 >= 6, f"{lvl}: needs ≥6 e1 in the enemy cluster; got {n_e1}"
 
 
+def test_starter_jeep_is_hold_fire():
+    """The armour-class engine fix made pre-placed agent combat units
+    auto-fire effectively. The starter jeep must be `stance: 0`
+    (HoldFire) on every spawn group of every level so a pure-observe
+    stall policy cannot rack up kills with it for free."""
+    pack = load_pack(PACK_PATH)
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        jeeps = [
+            a for a in c.scenario.actors
+            if a.owner == "agent" and a.type == "jeep"
+        ]
+        assert jeeps, f"{lvl}: needs a starter jeep"
+        for j in jeeps:
+            assert j.stance == 0, (
+                f"{lvl}: starter jeep must be stance:0 (HoldFire) so a "
+                f"stall policy cannot score free kills; got stance={j.stance}"
+            )
+
+
+def test_agent_base_can_build_both_counters():
     """The composition decision is COMPOSITION, not tech-up. Each
+    spawn group on every level must have the buildings that make BOTH
+    counters producible from turn 1: tent (e1/e3), weap+powr+fix
+    (2tnk: the war-factory vehicle queue needs power online AND a
+    service depot for the medium tank to clear its prerequisites).
+    The starter jeep must also be present."""
     pack = load_pack(PACK_PATH)
     for lvl in ("easy", "medium", "hard"):
         c = compile_level(pack, lvl)
+        # Per spawn group, the agent must have the full base + jeep.
         # On non-hard levels there is exactly one (default) spawn
         # group (spawn_point None → 0); on hard there are two.
         groups: dict[int, list] = {}
 
             groups.setdefault(g, []).append(a.type)
         assert groups, f"{lvl}: no agent actors found"
         for g, ts in groups.items():
+            for need in ("fact", "powr", "tent", "weap", "fix", "jeep"):
                 assert need in ts, (
                     f"{lvl}: spawn group {g} missing {need}; got {ts}"
                 )
 
     assert 25 * 100 == 2500
 
 
+# ── engine-driven scripted policies ─────────────────────────────────
 #
+# The full RPS-counter bar (stall LOSES / build-e3 LOSES / build-e1
+# LOSES / build-2tnk WINS) is exercised against the live engine on
+# every level and every hard seed 1..4.
+
+
+def _own_units(rs, *, type_filter=None):
+    out = []
+    for u in (rs.get("units_summary", []) or []):
+        if type_filter and (u.get("type") or "").lower() not in type_filter:
+            continue
+        out.append(u)
+    return out
+
+
+def _enemy_infantry(rs):
+    return [
+        e for e in (rs.get("enemy_summary") or [])
+        if (e.get("type") or "").lower() == "e1" and not e.get("is_building")
+    ]
 
 
 def _stall(rs, Command):
+    """Pure observe — no production. The HoldFire jeep never fires →
+    0 kills, 0 tanks; the 2tnk:3 win clause never latches → LOSS
+    (force-wipe when the e1 swarm hunts the jeep, or after_ticks)."""
     return [Command.observe()]
 
 
+def _make_build_policy(unit_type, cost):
+    """Queue `unit_type` every turn the budget allows and send each
+    produced unit at the enemy infantry cluster."""
+
+    def policy(rs, Command):
+        cmds = []
+        if rs.get("cash", 0) >= cost:
+            cmds.append(Command.build(unit_type))
+        fighters = _own_units(rs, type_filter={unit_type})
+        targets = _enemy_infantry(rs)
+        for u in fighters:
+            if targets:
+                cmds.append(
+                    Command.attack_unit([str(u["id"])], str(targets[0]["id"]))
+                )
+            else:
+                cmds.append(Command.attack_move([str(u["id"])], 70, 20))
+        return cmds if cmds else [Command.observe()]
+
+    return policy
+
+
+_build_e3 = _make_build_policy("e3", 300)
+_build_e1 = _make_build_policy("e1", 100)
+_build_2tnk = _make_build_policy("2tnk", 850)
+
+
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_stall_loses(level):
+    """Stall must be a real LOSS on every level and every hard seed
+    (no draw): the win predicate requires `unit_type_count_gte
+    2tnk:3` which a pure-observe policy can never satisfy, and the
+    idle HoldFire jeep is hunted down → force-wipe / after_ticks."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
 
     c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _stall, seed=s)
+        assert r.outcome == "loss", (
+            f"{level} seed={s}: stall must be a real LOSS (no 2tnk → "
+            f"win clause unmet); got {r.outcome} after {r.turns} turns "
+            f"(kills={r.signals.units_killed}, lost={r.signals.units_lost})"
+        )
+
+
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_build_e3_wrong_counter_loses(level):
+    """Mass anti-tank rockets are the WRONG counter — and crucially
+    the policy never builds 2tnk, so the `unit_type_count_gte 2tnk:3`
+    win clause is structurally unmet → real LOSS on every level and
+    every hard seed."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _build_e3, seed=s)
+        assert r.outcome == "loss", (
+            f"{level} seed={s}: build-e3 wrong-counter must LOSE (no "
+            f"2tnk → win clause unmet); got {r.outcome} "
+            f"(kills={r.signals.units_killed}, lost={r.signals.units_lost})"
+        )
+
+
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_build_e1_wrong_counter_loses(level):
+    """Matching the enemy 1:1 with own rifles never builds 2tnk, so
+    the `unit_type_count_gte 2tnk:3` win clause is structurally unmet
+    → real LOSS on every level and every hard seed."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _build_e1, seed=s)
+        assert r.outcome == "loss", (
+            f"{level} seed={s}: build-e1 wrong-counter must LOSE (no "
+            f"2tnk → win clause unmet); got {r.outcome} "
+            f"(kills={r.signals.units_killed}, lost={r.signals.units_lost})"
+        )
+
+
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_intended_build_2tnk_wins(level):
+    """The RPS counter pick: build 3× 2tnk (medium tanks) and engage.
+    Heavy armour walks through the e1 mass — the `2tnk:3` clause and
+    the kill bar both latch. Wins on every level and every hard
+    seed 1..4."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _build_2tnk, seed=s)
+        assert r.outcome == "win", (
+            f"{level} seed={s}: intended build-2tnk must WIN; got "
+            f"{r.outcome} (kills={r.signals.units_killed}, "
+            f"lost={r.signals.units_lost})"
+        )
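
Reviewer note: the predicate tests above can be read as a small truth table. The sketch below restates it in plain Python. It is not part of the commit and does not use the bench's real `WinContext`, `evaluate`, or YAML predicate engine. `Snapshot`, `is_win`, and `is_fail` are hypothetical names, the AND/OR composition is inferred from the test comments, and the exact `after_ticks` boundary (`>=` versus `>`) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical, simplified stand-in for the context the tests build."""
    unit_types: list[str] = field(default_factory=list)  # one entry per living own unit
    kills: int = 0
    tick: int = 0
    has_fact: bool = True


def is_win(s: Snapshot, kill_bar: int, deadline: int = 5401) -> bool:
    """Assumed AND-composition: every clause must hold at once."""
    return (
        s.unit_types.count("2tnk") >= 3  # unit_type_count_gte 2tnk:3
        and s.kills >= kill_bar          # kill bar (6 on easy, 8 on medium/hard)
        and s.has_fact                   # construction yard still standing
        and s.tick < deadline            # still in time (boundary is assumed)
    )


def is_fail(s: Snapshot, deadline: int = 5401) -> bool:
    """Assumed OR-composition: any one clause is a loss."""
    return (
        not s.unit_types                 # not own_units_gte:1 (force wipe)
        or not s.has_fact                # not has_building:fact
        or s.tick >= deadline            # after_ticks (boundary is assumed)
    )


# Wrong counter on easy: 8 rocket soldiers, kill bar met, zero tanks.
rockets = Snapshot(unit_types=["e3"] * 8, kills=6, tick=2000)
print(is_win(rockets, kill_bar=6))
```

Under these assumptions the wrong-counter snapshot cannot win however many kills it scores, which is the property `test_win_requires_three_medium_tanks` pins down against the real predicates.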