Spaces:

qpluslab
/

OpenRA-Bench

Running

yxc20098 commited on May 21

Commit

ec99be9

1 Parent(s): c211b47

fix(scenario): combat-vehicle-vs-infantry-counter — restore no-cheat bar after armor-class engine fix

The OpenRA-Rust armor-class engine fix (4d91fe0) made pre-placed agent
combat units auto-fire effectively. The starter scout jeep then racked
up kills on its own, so a pure-observe `stall` policy reached the kill
bar and WON — violating the no-cheat bar.

Restore the bar:
- Starter jeep set to `stance: 0` (HoldFire) on every level / spawn
group — it scouts, it never auto-fires, so a stall policy scores
zero kills.
- Win predicate gains `unit_type_count_gte 2tnk:3` — the agent must
ACTUALLY field the 3-tank fist. Stall and wrong-counter (e3 / e1)
policies never build 2tnk → win clause structurally unmet.
- Fail clause gains `not own_units_gte:1` so a stalled-and-overrun
episode is a real LOSS, not an engine auto-`done` DRAW (the agent
starts with the jeep so the unit-less turn-1 mis-fire footgun does
not apply).
- Add `powr` + `fix` to every base — the war-factory vehicle queue
needs power online and a service depot for `2tnk` to clear its
prerequisites, so the tank counter is producible from turn 1.

Validated via scripted policies on 3 levels x seeds 1..4:
stall / build-e3 / build-e1 LOSE everywhere (real LOSS, no DRAW);
intended build-2tnk WINS everywhere.

Files changed (2) hide show

openra_bench/scenarios/packs/combat-vehicle-vs-infantry-counter.yaml +94 -58
tests/test_combat_vehicle_vs_infantry_counter.py +232 -59

openra_bench/scenarios/packs/combat-vehicle-vs-infantry-counter.yaml CHANGED Viewed

@@ -21,33 +21,43 @@
 # Pre-placed (each spawn group): agent `fact` + `tent` (infantry
 # trainer; enables e1 and e3) + `weap` (vehicle trainer; enables
 # 2tnk) + a single starter `jeep` (allies scout vehicle; visibility
-# over the enemy composition + satisfies own_units_gte:1 from turn 0
-# so the unit-less misfire footgun in CLAUDE.md doesn't trip on the
-# fail clause). The tent+weap pair makes BOTH counter compositions
-# buildable from turn 1 — the decision the model faces is
-# composition, not tech-up.
 #
 # Discrimination on EASY / MEDIUM (single enemy cluster, both
-# compositions are buildable from t=1):
-#   • stall (only observe): never reaches the kill bar → after_ticks
-#     LOSS.
-#   • build-only-e1 (match enemy 1:1 with cheap rifles): 25× e1 vs
-#     the enemy e1 mass is a 1:1 attrition; the enemy holds entrenched
-#     while the agent has to MOVE INTO range — moving infantry get
-#     shot first; attrition LOSES, the kill bar is unmet OR the
-#     starter jeep + survivors evaporate before the cap.
-#   • build-only-e3 (anti-armour rockets against infantry — the wrong
-#     counter): 8× e3 ($2400) costs $300/unit to put out the same
-#     anti-infantry DPS that 24× e1 ($2400) would; the rocket squad
-#     is OUT-SHOT by the e1 mass on raw infantry-vs-infantry numbers
-#     (e3 hp45 < e1 hp50; e3 rng4 < tank rng4.75; e3 anti-armour
-#     warhead is overkill on infantry but the same slow projectiles
-#     under-perform vs small targets) → kill bar unmet OR survivors
-#     under N → LOSS.
-#   • intended build-2tnk (3× medium tanks @ $850 = $2550): the right
-#     counter — heavy armour soaks small-arms fire, tank dps22 + rng
-#     4.75 + sight 6c walks through a static e1 mass without losing
-#     a tank → kill bar met, all 3 tanks alive, fact intact → WIN.
 #
 # Discrimination on HARD (+1 axis: 2 agent spawn_point groups):
 #   • The agent base seed-rotates between NORTH (y=12) and SOUTH
@@ -70,15 +80,31 @@
 #   • `after_ticks` fail clauses reachable within max_turns
 #     (within_ticks 5400 ≤ 5403 at max_turns 60): a staller hits
 #     after_ticks 5401 and LOSES, never draws.
-#   • Starting `jeep` ensures own_units_gte:1 is met from t=0 so
-#     the unit-less mis-fire (CLAUDE.md) doesn't trip the fail
-#     clause before the model has built anything.
 #   • Spawn-group footgun (CLAUDE.md oramap): on hard, ANY agent
 #     actor with `spawn_point` filters OUT every agent actor without
-#     one — so BOTH bases (fact + tent + weap + jeep) are
-#     duplicated under BOTH spawn_point groups at their respective
-#     coords. The single far enemy fact at (124,20) and the centre
-#     enemy cluster have NO spawn_point and place on every seed.
 #   • starting_cash $2550 = exactly 3× 2tnk; 8× e3 = $2400 (cash
 #     unspent); 25× e1 = $2500. Neither rocket nor rifle mass is
 #     dominant against the entrenched enemy.
@@ -130,13 +156,15 @@ base:
   planning: true
   termination: {max_ticks: 8000}
   # Default base (overridden on hard). The starter jeep gives turn-1
-  # scouting (sight 7c) AND satisfies own_units_gte:1 from t=0 so the
-  # fail clause's not own_units_gte:1 doesn't mis-fire on turn 1.
   actors:
     - {type: fact, owner: agent, position: [10, 20]}
     - {type: tent, owner: agent, position: [14, 18]}
     - {type: weap, owner: agent, position: [14, 22]}
-    - {type: jeep, owner: agent, position: [12, 20]}
     # Far persistent enemy marker — prevents engine auto-done when
     # the live infantry cluster falls so the win/fail evaluator sees
     # the terminal frame.
@@ -146,9 +174,9 @@ levels:
   # ── EASY ─────────────────────────────────────────────────────────
   # Bare counter-selection skill: a small visible enemy infantry
   # cluster (8× e1) on the centre lane. Cash $2550 funds 3× 2tnk
-  # (the right counter) cleanly. Loss cap is generous via the
-  # own_units_gte:1 floor only (any survivor counts). Stall: kill bar
-  # unmet → after_ticks LOSS.
   easy:
     description: >
       Cash $2550. The enemy is a small cluster of 8 rifle infantry
@@ -160,16 +188,18 @@ levels:
       armoured fist (2tnk @ $850, exactly 3 for $2550) OR rifle
       infantry (e1 @ $100, up to 25). Scout the enemy with the jeep,
       pick the correct hard counter, and commit the whole budget.
-      Win when 6 enemy units are killed AND ≥1 of your units are
-      alive AND your construction yard still stands, before tick
-      5400. Stalling, picking the wrong counter, or letting your
-      force evaporate fails the bar.
     overrides:
       actors:
         - {type: fact, owner: agent, position: [10, 20]}
         - {type: tent, owner: agent, position: [14, 18]}
         - {type: weap, owner: agent, position: [14, 22]}
-        - {type: jeep, owner: agent, position: [12, 20]}
         # 8× e1 entrenched cluster — stance:3 (attack anything in
         # range). Spread over a 3×3 grid centred on (70,20) so the
         # mass can be engaged from any approach axis.
@@ -184,15 +214,15 @@ levels:
         - {type: fact, owner: enemy, position: [124, 20]}
     win_condition:
       all_of:
         - {units_killed_gte: 6}
-        - {own_units_gte: 1}
         - {has_building: fact}
         - {within_ticks: 5400}
     fail_condition:
       any_of:
         - {after_ticks: 5401}
-        - {not: {own_units_gte: 1}}
         - {not: {has_building: fact}}
     max_turns: 60
   # ── MEDIUM ───────────────────────────────────────────────────────
@@ -214,15 +244,17 @@ levels:
       through small-arms fire. Mass rockets (e3 @ $300) waste cost-
       per-effect against soft targets and get out-DPSed by the rifle
       mass on attrition; matching with own rifles (e1 @ $100) is a
-      1:1 trade with no advantage. Win when 8 enemy units are killed
-      AND ≥1 of your units are alive AND your fact still stands,
-      before tick 5400.
     overrides:
       actors:
         - {type: fact, owner: agent, position: [10, 20]}
         - {type: tent, owner: agent, position: [14, 18]}
         - {type: weap, owner: agent, position: [14, 22]}
-        - {type: jeep, owner: agent, position: [12, 20]}
         # 12× e1 entrenched cluster — deeper centre mass.
         - {type: e1, owner: enemy, position: [70, 17], stance: 3}
         - {type: e1, owner: enemy, position: [70, 18], stance: 3}
@@ -239,15 +271,15 @@ levels:
         - {type: fact, owner: enemy, position: [124, 20]}
     win_condition:
       all_of:
         - {units_killed_gte: 8}
-        - {own_units_gte: 1}
         - {has_building: fact}
         - {within_ticks: 5400}
     fail_condition:
       any_of:
         - {after_ticks: 5401}
-        - {not: {own_units_gte: 1}}
         - {not: {has_building: fact}}
     max_turns: 60
   # ── HARD ─────────────────────────────────────────────────────────
@@ -271,21 +303,25 @@ levels:
       medium tanks (2tnk) walk through small-arms fire; mass rockets
       (e3) waste cost-per-effect against soft targets and get out-
       DPSed on attrition; matching with own rifles is a 1:1 trade
-      with no advantage. Win when 8 enemy units are killed AND ≥1 of
-      your units are alive AND your fact still stands, before tick
-      5400.
     overrides:
       actors:
         # ── AGENT spawn 0 — NORTH base (y=12) ─────────────────────
         - {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
         - {type: tent, owner: agent, position: [14, 10], spawn_point: 0}
         - {type: weap, owner: agent, position: [14, 14], spawn_point: 0}
-        - {type: jeep, owner: agent, position: [12, 12], spawn_point: 0}
         # ── AGENT spawn 1 — SOUTH base (y=28) ─────────────────────
         - {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
         - {type: tent, owner: agent, position: [14, 26], spawn_point: 1}
         - {type: weap, owner: agent, position: [14, 30], spawn_point: 1}
-        - {type: jeep, owner: agent, position: [12, 28], spawn_point: 1}
         # ── CENTRE ENEMY CLUSTER — pure infantry (always places) ──
         # 12× e1 entrenched on the central lane. Per CLAUDE.md, enemy
         # actors do not honour spawn_point — this cluster lands on
@@ -307,13 +343,13 @@ levels:
         - {type: fact, owner: enemy, position: [124, 20]}
     win_condition:
       all_of:
         - {units_killed_gte: 8}
-        - {own_units_gte: 1}
         - {has_building: fact}
         - {within_ticks: 5400}
     fail_condition:
       any_of:
         - {after_ticks: 5401}
-        - {not: {own_units_gte: 1}}
         - {not: {has_building: fact}}
     max_turns: 60

 # Pre-placed (each spawn group): agent `fact` + `tent` (infantry
 # trainer; enables e1 and e3) + `weap` (vehicle trainer; enables
 # 2tnk) + a single starter `jeep` (allies scout vehicle; visibility
+# over the enemy composition). The starter jeep is `stance: 0`
+# (HoldFire) so it NEVER auto-fires — it scouts, it does not fight.
+# This is the load-bearing anti-stall guard (CLAUDE.md: an armour-
+# class engine fix made pre-placed agent combat units auto-fire
+# effectively, so a HoldFire jeep cannot rack up kills on its own).
+# The tent+weap pair makes BOTH counter compositions buildable from
+# turn 1 — the decision the model faces is composition, not tech-up.
+#
+# The win predicate requires `unit_type_count_gte: 2tnk:3` — the
+# agent must have ACTUALLY FIELDED 3 medium tanks (the right
+# counter). A stall policy never builds tanks; a wrong-counter
+# policy (e3 / e1) never builds 2tnk — neither can satisfy the win
+# regardless of how many kills the entrenched enemy concedes. Only
+# building + commanding the 3-tank fist clears the bar.
 #
 # Discrimination on EASY / MEDIUM (single enemy cluster, both
+# compositions are buildable from t=1). The win predicate is
+# `unit_type_count_gte 2tnk:3 AND units_killed_gte K AND
+# has_building fact` — the 2tnk:3 clause means ONLY a policy that
+# actually builds the 3-tank fist can clear the bar:
+#   • stall (only observe): builds nothing; the HoldFire jeep never
+#     fires → 0 kills, 0 tanks. The entrenched e1 swarm eventually
+#     hunts down the idle jeep → force-wipe (not own_units_gte:1)
+#     LOSS (or after_ticks LOSS if the jeep is never reached).
+#   • build-only-e1 (match enemy 1:1 with cheap rifles): never
+#     builds 2tnk → the 2tnk:3 clause is structurally unmet; even
+#     a full kill count cannot win → after_ticks / force-wipe LOSS.
+#   • build-only-e3 (anti-armour rockets against infantry — the
+#     wrong counter): never builds 2tnk → the 2tnk:3 clause is
+#     structurally unmet; the rocket squad is also OUT-SHOT by the
+#     e1 mass on raw infantry-vs-infantry numbers (e3 hp45 < e1
+#     hp50; slow anti-armour projectiles under-perform vs small
+#     targets) → after_ticks / force-wipe LOSS.
+#   • intended build-2tnk (3× medium tanks @ $850 = $2550): the
+#     right counter — heavy armour soaks small-arms fire, tank
+#     dps22 + rng 4.75 + sight 6c walks through a static e1 mass.
+#     2tnk:3 clause met, kill bar met, fact intact → WIN.
 #
 # Discrimination on HARD (+1 axis: 2 agent spawn_point groups):
 #   • The agent base seed-rotates between NORTH (y=12) and SOUTH
 #   • `after_ticks` fail clauses reachable within max_turns
 #     (within_ticks 5400 ≤ 5403 at max_turns 60): a staller hits
 #     after_ticks 5401 and LOSES, never draws.
+#   • Starting `jeep` is `stance: 0` (HoldFire) — it scouts but
+#     never auto-fires, so a pure-observe stall policy cannot score
+#     kills with it. The win predicate's `unit_type_count_gte`
+#     2tnk:3 is the real anti-cheat: only the agent that builds the
+#     3-tank fist can clear the bar. The fail clause is
+#     `after_ticks | not has_building:fact | not own_units_gte:1` —
+#     the agent starts WITH the jeep so `own_units_gte:1` is
+#     satisfied from t=0 (the unit-less turn-1 mis-fire footgun in
+#     CLAUDE.md does not apply); the force-wipe clause turns a
+#     stalled-and-overrun episode into a real LOSS instead of an
+#     engine auto-`done` DRAW when the entrenched e1 swarm hunts
+#     down the idle HoldFire jeep.
+#   • The vehicle queue (`weap` war factory) needs `powr` online
+#     AND a `fix` service depot present for `2tnk` to clear its
+#     prerequisites — both are pre-placed in every base so the
+#     tank counter is producible from turn 1 (the decision is
+#     composition, not tech-up). `tent` produces e1/e3 with `powr`
+#     alone.
 #   • Spawn-group footgun (CLAUDE.md oramap): on hard, ANY agent
 #     actor with `spawn_point` filters OUT every agent actor without
+#     one — so BOTH bases (fact + powr + tent + weap + fix + jeep)
+#     are duplicated under BOTH spawn_point groups at their
+#     respective coords. The single far enemy fact at (124,20) and
+#     the centre enemy cluster have NO spawn_point and place on
+#     every seed.
 #   • starting_cash $2550 = exactly 3× 2tnk; 8× e3 = $2400 (cash
 #     unspent); 25× e1 = $2500. Neither rocket nor rifle mass is
 #     dominant against the entrenched enemy.
   planning: true
   termination: {max_ticks: 8000}
   # Default base (overridden on hard). The starter jeep gives turn-1
+  # scouting (sight 7c). It is `stance: 0` (HoldFire) so it never
+  # auto-fires — a stall policy cannot score kills with it.
   actors:
     - {type: fact, owner: agent, position: [10, 20]}
+    - {type: powr, owner: agent, position: [10, 16]}
     - {type: tent, owner: agent, position: [14, 18]}
     - {type: weap, owner: agent, position: [14, 22]}
+    - {type: fix,  owner: agent, position: [18, 20]}
+    - {type: jeep, owner: agent, position: [12, 20], stance: 0}
     # Far persistent enemy marker — prevents engine auto-done when
     # the live infantry cluster falls so the win/fail evaluator sees
     # the terminal frame.
   # ── EASY ─────────────────────────────────────────────────────────
   # Bare counter-selection skill: a small visible enemy infantry
   # cluster (8× e1) on the centre lane. Cash $2550 funds 3× 2tnk
+  # (the right counter) cleanly. The win requires fielding 3× 2tnk
+  # AND 6 kills; stall / wrong-counter never field the tanks → kill
+  # bar + 2tnk:3 clause unmet → after_ticks LOSS.
   easy:
     description: >
       Cash $2550. The enemy is a small cluster of 8 rifle infantry
       armoured fist (2tnk @ $850, exactly 3 for $2550) OR rifle
       infantry (e1 @ $100, up to 25). Scout the enemy with the jeep,
       pick the correct hard counter, and commit the whole budget.
+      Win when you have fielded 3 medium tanks (2tnk) AND 6 enemy
+      units are killed AND your construction yard still stands,
+      before tick 5400. Stalling or picking the wrong counter never
+      fields the 3-tank fist and fails the bar.
     overrides:
       actors:
         - {type: fact, owner: agent, position: [10, 20]}
+        - {type: powr, owner: agent, position: [10, 16]}
         - {type: tent, owner: agent, position: [14, 18]}
         - {type: weap, owner: agent, position: [14, 22]}
+        - {type: fix,  owner: agent, position: [18, 20]}
+        - {type: jeep, owner: agent, position: [12, 20], stance: 0}
         # 8× e1 entrenched cluster — stance:3 (attack anything in
         # range). Spread over a 3×3 grid centred on (70,20) so the
         # mass can be engaged from any approach axis.
         - {type: fact, owner: enemy, position: [124, 20]}
     win_condition:
       all_of:
+        - {unit_type_count_gte: {type: 2tnk, n: 3}}
         - {units_killed_gte: 6}
         - {has_building: fact}
         - {within_ticks: 5400}
     fail_condition:
       any_of:
         - {after_ticks: 5401}
         - {not: {has_building: fact}}
+        - {not: {own_units_gte: 1}}
     max_turns: 60
   # ── MEDIUM ───────────────────────────────────────────────────────
       through small-arms fire. Mass rockets (e3 @ $300) waste cost-
       per-effect against soft targets and get out-DPSed by the rifle
       mass on attrition; matching with own rifles (e1 @ $100) is a
+      1:1 trade with no advantage. Win when you have fielded 3
+      medium tanks (2tnk) AND 8 enemy units are killed AND your fact
+      still stands, before tick 5400.
     overrides:
       actors:
         - {type: fact, owner: agent, position: [10, 20]}
+        - {type: powr, owner: agent, position: [10, 16]}
         - {type: tent, owner: agent, position: [14, 18]}
         - {type: weap, owner: agent, position: [14, 22]}
+        - {type: fix,  owner: agent, position: [18, 20]}
+        - {type: jeep, owner: agent, position: [12, 20], stance: 0}
         # 12× e1 entrenched cluster — deeper centre mass.
         - {type: e1, owner: enemy, position: [70, 17], stance: 3}
         - {type: e1, owner: enemy, position: [70, 18], stance: 3}
         - {type: fact, owner: enemy, position: [124, 20]}
     win_condition:
       all_of:
+        - {unit_type_count_gte: {type: 2tnk, n: 3}}
         - {units_killed_gte: 8}
         - {has_building: fact}
         - {within_ticks: 5400}
     fail_condition:
       any_of:
         - {after_ticks: 5401}
         - {not: {has_building: fact}}
+        - {not: {own_units_gte: 1}}
     max_turns: 60
   # ── HARD ─────────────────────────────────────────────────────────
       medium tanks (2tnk) walk through small-arms fire; mass rockets
       (e3) waste cost-per-effect against soft targets and get out-
       DPSed on attrition; matching with own rifles is a 1:1 trade
+      with no advantage. Win when you have fielded 3 medium tanks
+      (2tnk) AND 8 enemy units are killed AND your fact still
+      stands, before tick 5400.
     overrides:
       actors:
         # ── AGENT spawn 0 — NORTH base (y=12) ─────────────────────
         - {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
+        - {type: powr, owner: agent, position: [10, 8], spawn_point: 0}
         - {type: tent, owner: agent, position: [14, 10], spawn_point: 0}
         - {type: weap, owner: agent, position: [14, 14], spawn_point: 0}
+        - {type: fix,  owner: agent, position: [18, 12], spawn_point: 0}
+        - {type: jeep, owner: agent, position: [12, 12], stance: 0, spawn_point: 0}
         # ── AGENT spawn 1 — SOUTH base (y=28) ─────────────────────
         - {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
+        - {type: powr, owner: agent, position: [10, 32], spawn_point: 1}
         - {type: tent, owner: agent, position: [14, 26], spawn_point: 1}
         - {type: weap, owner: agent, position: [14, 30], spawn_point: 1}
+        - {type: fix,  owner: agent, position: [18, 28], spawn_point: 1}
+        - {type: jeep, owner: agent, position: [12, 28], stance: 0, spawn_point: 1}
         # ── CENTRE ENEMY CLUSTER — pure infantry (always places) ──
         # 12× e1 entrenched on the central lane. Per CLAUDE.md, enemy
         # actors do not honour spawn_point — this cluster lands on
         - {type: fact, owner: enemy, position: [124, 20]}
     win_condition:
       all_of:
+        - {unit_type_count_gte: {type: 2tnk, n: 3}}
         - {units_killed_gte: 8}
         - {has_building: fact}
         - {within_ticks: 5400}
     fail_condition:
       any_of:
         - {after_ticks: 5401}
         - {not: {has_building: fact}}
+        - {not: {own_units_gte: 1}}
     max_turns: 60

tests/test_combat_vehicle_vs_infantry_counter.py CHANGED Viewed

@@ -10,16 +10,27 @@ tank ordnance against soft targets — cost-per-effect waste + the
 rocket squad's short stand-off + low HP gets out-DPSed by the rifle
 mass); matching with own rifles is a 1:1 attrition that loses.
 The bar (per the spec):
-  • stall (only observe)            → LOSS (kill bar unmet → after_ticks)
-  • build-only-e1 (match 1:1)       → LOSS (attrition; movers shot first)
-  • build-only-e3 (wrong counter)   → LOSS (cost-per-effect + close-range)
-  • intended build-2tnk             → WIN (heavy armour walks through e1)
-Validation is split between unit-level predicate checks (no engine)
-and engine-driven scripted policies. The unit-level checks are the
-load-bearing assertions for this commit (the engine-driven policies
-are documented as smoke-only and parametrised over the hard seeds).
 """
 from __future__ import annotations
@@ -67,8 +78,13 @@ def test_pack_compiles_and_meta_fields_populated():
         )
-def _ctx(*, units=(), tick=1000, kills=0, lost=0, has_fact=True):
-    """Synthesize a WinContext for predicate-level checks."""
     import types
     sig = types.SimpleNamespace(
@@ -78,72 +94,105 @@ def _ctx(*, units=(), tick=1000, kills=0, lost=0, has_fact=True):
         cash=0,
         resources=0,
         own_buildings=[],
-        own_building_types={"fact", "tent", "weap"} if has_fact else {"tent", "weap"},
         enemies_seen_ids=set(),
         enemy_buildings_seen_ids=set(),
     )
     return WinContext(
         signals=sig,
         render_state={"units_summary": list(units)},
     )
-def _alive(n, unit_type="2tnk"):
-    return [
-        {"cell_x": 30, "cell_y": 20, "type": unit_type, "id": str(1000 + i)}
-        for i in range(n)
-    ]
 def test_easy_predicates():
     c = compile_level(load_pack(PACK_PATH), "easy")
-    # Intended: 6 kills, 3 tanks alive, fact still up, in time → WIN
-    assert evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=6, lost=0))
     # Kill bar unmet (only 5 kills) → not a win
-    assert not evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=5, lost=0))
     # Force wipe (all units dead) → fail via not own_units_gte:1
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=6, lost=4))
     # Fact destroyed → fail via not has_building:fact
     assert evaluate(
         c.fail_condition,
-        _ctx(units=_alive(3), tick=2000, kills=6, lost=0, has_fact=False),
     )
     # Timeout with bar unmet → fail (after_ticks 5401 reachable)
-    assert evaluate(c.fail_condition, _ctx(units=_alive(3), tick=5402, kills=5, lost=0))
 def test_medium_predicates():
     c = compile_level(load_pack(PACK_PATH), "medium")
-    # Intended: 8 kills, 3 tanks alive, fact still up → WIN
-    assert evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=8, lost=0))
     # Bar unmet (only 7 kills) → not a win
-    assert not evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=7, lost=0))
     # Force wipe → fail
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
     # Fact destroyed → fail
     assert evaluate(
         c.fail_condition,
-        _ctx(units=_alive(3), tick=2000, kills=8, lost=0, has_fact=False),
     )
     # Timeout → fail
-    assert evaluate(c.fail_condition, _ctx(units=_alive(3), tick=5402, kills=7, lost=0))
 def test_hard_predicates():
     c = compile_level(load_pack(PACK_PATH), "hard")
-    # Intended: 8 kills, 3 tanks alive, fact up → WIN
-    assert evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=8, lost=0))
     # Bar unmet → not a win
-    assert not evaluate(c.win_condition, _ctx(units=_alive(3), tick=2000, kills=7, lost=0))
     # Force wipe → fail
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
     # Fact destroyed → fail
     assert evaluate(
         c.fail_condition,
-        _ctx(units=_alive(3), tick=2000, kills=8, lost=0, has_fact=False),
     )
     # Timeout → fail
-    assert evaluate(c.fail_condition, _ctx(units=_alive(3), tick=5402, kills=7, lost=0))
 def test_timeout_reachable_inside_max_turns():
@@ -198,17 +247,37 @@ def test_enemy_is_pure_infantry_no_anti_armour():
         assert n_e1 >= 6, f"{lvl}: needs ≥6 e1 in the enemy cluster; got {n_e1}"
-def test_agent_base_has_both_production_queues():
     """The composition decision is COMPOSITION, not tech-up. Each
-    spawn group on every level must have BOTH a barracks (tent —
-    enables e1/e3) and a war factory (weap — enables 2tnk) so both
-    counters are buildable from turn 1. The starter jeep must be
-    present so own_units_gte:1 is satisfied from t=0 (avoiding the
-    unit-less misfire footgun documented in CLAUDE.md)."""
     pack = load_pack(PACK_PATH)
     for lvl in ("easy", "medium", "hard"):
         c = compile_level(pack, lvl)
-        # Per spawn group, the agent must have tent + weap + fact + jeep.
         # On non-hard levels there is exactly one (default) spawn
         # group (spawn_point None → 0); on hard there are two.
         groups: dict[int, list] = {}
@@ -219,7 +288,7 @@ def test_agent_base_has_both_production_queues():
             groups.setdefault(g, []).append(a.type)
         assert groups, f"{lvl}: no agent actors found"
         for g, ts in groups.items():
-            for need in ("fact", "tent", "weap", "jeep"):
                 assert need in ts, (
                     f"{lvl}: spawn group {g} missing {need}; got {ts}"
                 )
@@ -244,33 +313,137 @@ def test_starting_cash_funds_exactly_one_pure_composition():
     assert 25 * 100 == 2500
-# ── engine-driven scripted policy: intended build-2tnk smoke ────────
 #
-# The full RPS-counter bar (build-e3 LOSES / build-e1 LOSES / build-
-# 2tnk WINS) needs each pure-build policy to be exercised against the
-# live engine. The engine production / build-placement timing is
-# touchy enough that we keep these as smoke tests (one tier each) —
-# the unit-level predicate teeth above are the strict invariants.
 def _stall(rs, Command):
-    """Pure observe — kill bar never met → after_ticks LOSS."""
     return [Command.observe()]
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_stall_loses(level):
-    """Stall must be a real timeout LOSS on every level (no draw):
-    the kill bar (units_killed_gte:6 / 8 / 8) is structurally
-    unreachable from a pure-observe policy, so after_ticks 5401
-    fires."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
-    r = run_level(c, _stall, seed=1)
-    assert r.outcome == "loss", (
-        f"{level}: stall must LOSE (kill bar unmet → after_ticks); "
-        f"got {r.outcome} after {r.turns} turns "
-        f"(kills={r.signals.units_killed}, losses={r.signals.units_lost})"
-    )

 rocket squad's short stand-off + low HP gets out-DPSed by the rifle
 mass); matching with own rifles is a 1:1 attrition that loses.
+The win predicate is `unit_type_count_gte 2tnk:3 AND units_killed_gte
+K AND has_building fact` — the 2tnk:3 clause is the load-bearing
+anti-cheat: only a policy that ACTUALLY BUILDS the 3-tank fist can
+clear the bar. (The armour-class engine fix on OpenRA-Rust main made
+pre-placed agent combat units auto-fire effectively, so the starter
+jeep is `stance: 0` HoldFire — it scouts, it cannot rack up kills on
+its own.)
 The bar (per the spec):
+  • stall (only observe)            → LOSS (no 2tnk, no kills; the
+    idle HoldFire jeep is hunted down → force-wipe / after_ticks)
+  • build-only-e1 (match 1:1)       → LOSS (never builds 2tnk → the
+    2tnk:3 clause is structurally unmet)
+  • build-only-e3 (wrong counter)   → LOSS (never builds 2tnk → the
+    2tnk:3 clause is structurally unmet)
+  • intended build-2tnk             → WIN (3 medium tanks walk
+    through the e1 mass; 2tnk:3 + kill bar both latch)
+Validation is scripted (no model / network) — every policy is
+exercised against the live engine on every level and every hard
+seed 1..4.
 """
 from __future__ import annotations
         )
+def _ctx(*, tanks=0, tick=1000, kills=0, lost=0, has_fact=True, units=None):
+    """Synthesize a WinContext for predicate-level checks.
+    `tanks` synthesizes that many 2tnk units in `units_summary`;
+    pass `units` explicitly to model a different composition (or an
+    empty force).
+    """
     import types
     sig = types.SimpleNamespace(
         cash=0,
         resources=0,
         own_buildings=[],
+        own_building_types=(
+            {"fact", "powr", "tent", "weap", "fix"}
+            if has_fact
+            else {"powr", "tent", "weap", "fix"}
+        ),
         enemies_seen_ids=set(),
         enemy_buildings_seen_ids=set(),
     )
+    if units is None:
+        units = [
+            {"cell_x": 30, "cell_y": 20, "type": "2tnk", "id": str(1000 + i)}
+            for i in range(tanks)
+        ]
     return WinContext(
         signals=sig,
         render_state={"units_summary": list(units)},
     )
 def test_easy_predicates():
     c = compile_level(load_pack(PACK_PATH), "easy")
+    # Intended: 3 tanks fielded, 6 kills, fact still up, in time → WIN
+    assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=6))
+    # Only 2 tanks fielded → 2tnk:3 clause unmet → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=2, tick=2000, kills=6))
     # Kill bar unmet (only 5 kills) → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=5))
+    # Wrong counter: 8 e3 fielded, kill bar met, but 0 tanks → not a win
+    e3s = [
+        {"cell_x": 30, "cell_y": 20, "type": "e3", "id": str(2000 + i)}
+        for i in range(8)
+    ]
+    assert not evaluate(c.win_condition, _ctx(units=e3s, tick=2000, kills=6))
     # Force wipe (all units dead) → fail via not own_units_gte:1
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=6, lost=4))
     # Fact destroyed → fail via not has_building:fact
     assert evaluate(
         c.fail_condition,
+        _ctx(tanks=3, tick=2000, kills=6, has_fact=False),
     )
     # Timeout with bar unmet → fail (after_ticks 5401 reachable)
+    assert evaluate(c.fail_condition, _ctx(tanks=3, tick=5402, kills=5))
 def test_medium_predicates():
     c = compile_level(load_pack(PACK_PATH), "medium")
+    # Intended: 3 tanks, 8 kills, fact still up → WIN
+    assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=8))
+    # Only 2 tanks → 2tnk:3 clause unmet → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=2, tick=2000, kills=8))
     # Bar unmet (only 7 kills) → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=7))
     # Force wipe → fail
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
     # Fact destroyed → fail
     assert evaluate(
         c.fail_condition,
+        _ctx(tanks=3, tick=2000, kills=8, has_fact=False),
     )
     # Timeout → fail
+    assert evaluate(c.fail_condition, _ctx(tanks=3, tick=5402, kills=7))
 def test_hard_predicates():
     c = compile_level(load_pack(PACK_PATH), "hard")
+    # Intended: 3 tanks, 8 kills, fact up → WIN
+    assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=8))
+    # Only 2 tanks → 2tnk:3 clause unmet → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=2, tick=2000, kills=8))
     # Bar unmet → not a win
+    assert not evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=7))
     # Force wipe → fail
     assert evaluate(c.fail_condition, _ctx(units=[], tick=2000, kills=8, lost=4))
     # Fact destroyed → fail
     assert evaluate(
         c.fail_condition,
+        _ctx(tanks=3, tick=2000, kills=8, has_fact=False),
     )
     # Timeout → fail
+    assert evaluate(c.fail_condition, _ctx(tanks=3, tick=5402, kills=7))
+def test_win_requires_three_medium_tanks():
+    """The load-bearing anti-cheat: every level's win predicate must
+    require `unit_type_count_gte 2tnk:3` — a stall / wrong-counter
+    policy that never builds the medium-tank fist can never win
+    regardless of how many kills the entrenched enemy concedes."""
+    pack = load_pack(PACK_PATH)
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        # 0 tanks, kill bar trivially exceeded, fact up, in time → NOT a win.
+        assert not evaluate(c.win_condition, _ctx(tanks=0, tick=2000, kills=99)), (
+            f"{lvl}: win must require 3 fielded 2tnk — a 0-tank policy "
+            f"with the kill bar met must NOT win (anti-cheat clause)"
+        )
+        # 3 tanks + kill bar met + fact up + in time → WIN.
+        assert evaluate(c.win_condition, _ctx(tanks=3, tick=2000, kills=99)), (
+            f"{lvl}: 3 fielded 2tnk + kill bar met must WIN"
+        )
 def test_timeout_reachable_inside_max_turns():
         assert n_e1 >= 6, f"{lvl}: needs ≥6 e1 in the enemy cluster; got {n_e1}"
+def test_starter_jeep_is_hold_fire():
+    """The armour-class engine fix made pre-placed agent combat units
+    auto-fire effectively. The starter jeep must be `stance: 0`
+    (HoldFire) on every spawn group of every level so a pure-observe
+    stall policy cannot rack up kills with it for free."""
+    pack = load_pack(PACK_PATH)
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        jeeps = [
+            a for a in c.scenario.actors
+            if a.owner == "agent" and a.type == "jeep"
+        ]
+        assert jeeps, f"{lvl}: needs a starter jeep"
+        for j in jeeps:
+            assert j.stance == 0, (
+                f"{lvl}: starter jeep must be stance:0 (HoldFire) so a "
+                f"stall policy cannot score free kills; got stance={j.stance}"
+            )
+def test_agent_base_can_build_both_counters():
     """The composition decision is COMPOSITION, not tech-up. Each
+    spawn group on every level must have the buildings that make BOTH
+    counters producible from turn 1: tent (e1/e3), weap+powr+fix
+    (2tnk — the war-factory vehicle queue needs power online AND a
+    service depot for the medium tank to clear its prerequisites).
+    The starter jeep must also be present."""
     pack = load_pack(PACK_PATH)
     for lvl in ("easy", "medium", "hard"):
         c = compile_level(pack, lvl)
+        # Per spawn group, the agent must have the full base + jeep.
         # On non-hard levels there is exactly one (default) spawn
         # group (spawn_point None → 0); on hard there are two.
         groups: dict[int, list] = {}
             groups.setdefault(g, []).append(a.type)
         assert groups, f"{lvl}: no agent actors found"
         for g, ts in groups.items():
+            for need in ("fact", "powr", "tent", "weap", "fix", "jeep"):
                 assert need in ts, (
                     f"{lvl}: spawn group {g} missing {need}; got {ts}"
                 )
     assert 25 * 100 == 2500
+# ── engine-driven scripted policies ─────────────────────────────────
 #
+# The full RPS-counter bar (stall LOSES / build-e3 LOSES / build-e1
+# LOSES / build-2tnk WINS) is exercised against the live engine on
+# every level and every hard seed 1..4.
+def _own_units(rs, *, type_filter=None):
+    out = []
+    for u in (rs.get("units_summary", []) or []):
+        if type_filter and (u.get("type") or "").lower() not in type_filter:
+            continue
+        out.append(u)
+    return out
+def _enemy_infantry(rs):
+    return [
+        e for e in (rs.get("enemy_summary") or [])
+        if (e.get("type") or "").lower() == "e1" and not e.get("is_building")
+    ]
 def _stall(rs, Command):
+    """Pure observe — no production. The HoldFire jeep never fires →
+    0 kills, 0 tanks; the 2tnk:3 win clause never latches → LOSS
+    (force-wipe when the e1 swarm hunts the jeep, or after_ticks)."""
     return [Command.observe()]
+def _make_build_policy(unit_type, cost):
+    """Queue `unit_type` every turn the budget allows and send each
+    produced unit at the enemy infantry cluster."""
+    def policy(rs, Command):
+        cmds = []
+        if rs.get("cash", 0) >= cost:
+            cmds.append(Command.build(unit_type))
+        fighters = _own_units(rs, type_filter={unit_type})
+        targets = _enemy_infantry(rs)
+        for u in fighters:
+            if targets:
+                cmds.append(
+                    Command.attack_unit([str(u["id"])], str(targets[0]["id"]))
+                )
+            else:
+                cmds.append(Command.attack_move([str(u["id"])], 70, 20))
+        return cmds if cmds else [Command.observe()]
+    return policy
+_build_e3 = _make_build_policy("e3", 300)
+_build_e1 = _make_build_policy("e1", 100)
+_build_2tnk = _make_build_policy("2tnk", 850)
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_stall_loses(level):
+    """Stall must be a real LOSS on every level and every hard seed
+    (no draw): the win predicate requires `unit_type_count_gte
+    2tnk:3` which a pure-observe policy can never satisfy, and the
+    idle HoldFire jeep is hunted down → force-wipe / after_ticks."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _stall, seed=s)
+        assert r.outcome == "loss", (
+            f"{level} seed={s}: stall must be a real LOSS (no 2tnk → "
+            f"win clause unmet); got {r.outcome} after {r.turns} turns "
+            f"(kills={r.signals.units_killed}, lost={r.signals.units_lost})"
+        )
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_build_e3_wrong_counter_loses(level):
+    """Mass anti-tank rockets are the WRONG counter — and crucially
+    the policy never builds 2tnk, so the `unit_type_count_gte 2tnk:3`
+    win clause is structurally unmet → real LOSS on every level and
+    every hard seed."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _build_e3, seed=s)
+        assert r.outcome == "loss", (
+            f"{level} seed={s}: build-e3 wrong-counter must LOSE (no "
+            f"2tnk → win clause unmet); got {r.outcome} "
+            f"(kills={r.signals.units_killed}, lost={r.signals.units_lost})"
+        )
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_build_e1_wrong_counter_loses(level):
+    """Matching the enemy 1:1 with own rifles never builds 2tnk, so
+    the `unit_type_count_gte 2tnk:3` win clause is structurally unmet
+    → real LOSS on every level and every hard seed."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _build_e1, seed=s)
+        assert r.outcome == "loss", (
+            f"{level} seed={s}: build-e1 wrong-counter must LOSE (no "
+            f"2tnk → win clause unmet); got {r.outcome} "
+            f"(kills={r.signals.units_killed}, lost={r.signals.units_lost})"
+        )
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_intended_build_2tnk_wins(level):
+    """The RPS counter pick: build 3× 2tnk (medium tanks) and engage.
+    Heavy armour walks through the e1 mass — the `2tnk:3` clause and
+    the kill bar both latch. Wins on every level and every hard
+    seed 1..4."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        r = run_level(c, _build_2tnk, seed=s)
+        assert r.outcome == "win", (
+            f"{level} seed={s}: intended build-2tnk must WIN; got "
+            f"{r.outcome} (kills={r.signals.units_killed}, "
+            f"lost={r.signals.units_lost})"
+        )