qpluslab/OpenRA-Bench

Commit b77f5b2 (1 parent: 0fb13a4), yxc20098 committed on May 21

fix(scenario): combat-retreat-after-engagement — recalibrate after engine movement fixes

The OpenRA-Rust movement fixes (moving units fire and take fire en
route; attack_unit on out-of-sight targets paths normally) regressed
this pack on every tier:
- the close-range trade became far more lethal: meeting the kill
quota required losing more tanks than the survival cap allowed, so
the intended engage-then-retreat could not win (medium lost 2 tanks
for 4 kills; the fight was unsolvable inside the loss cap);
- in interrupt mode an action-heavy episode advances FEWER ticks per
turn than a pure stall, so the old after_ticks:4501 deadline was
inert for the intended policy — a non-winning run DREW (tick ~4428
at max_turns) instead of LOSING (draw degeneracy).
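
A rough sketch of the deadline arithmetic behind the draw degeneracy, in
Python. The stall cadence (first observation at tick 93, then 90 ticks per
turn) comes from the pack's old reachability comment, and 4428 is the final
tick quoted above. The function names and the `>=` comparison for after_ticks
are assumptions for illustration, not the engine's actual evaluator.

```python
# Hypothetical helpers: names and the >= deadline semantics are assumptions.
MAX_TURNS = 51
ACTION_HEAVY_FINAL_TICK = 4428  # approximate value quoted in the commit message


def stall_tick_at_turn(turn: int, first_tick: int = 93, ticks_per_turn: int = 90) -> int:
    """Tick reached by a pure stall at the given 1-based turn."""
    return first_tick + ticks_per_turn * (turn - 1)


def timeout_outcome(final_tick: int, after_ticks: int) -> str:
    """LOSS if the run's last tick reaches the deadline, otherwise DRAW."""
    return "LOSS" if final_tick >= after_ticks else "DRAW"


def first_stall_turn_reaching(deadline: int) -> int:
    """Smallest turn at which a pure stall has reached the deadline."""
    turn = 1
    while stall_tick_at_turn(turn) < deadline:
        turn += 1
    return turn


if __name__ == "__main__":
    # Old deadline: the stall loses, the action-heavy run only draws.
    print(timeout_outcome(stall_tick_at_turn(MAX_TURNS), 4501))
    print(timeout_outcome(ACTION_HEAVY_FINAL_TICK, 4501))
    # A deadline of 4000 is reached by both kinds of run.
    print(timeout_outcome(ACTION_HEAVY_FINAL_TICK, 4000))
    print(first_stall_turn_reaching(4000))
```

Under these assumptions a stall ends at tick 4593 and loses against either
deadline, while a run that ends near tick 4428 only loses once the deadline
drops to 4000.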

Recalibration:
- lighter enemy: easy 2× e3 + 1× 3tnk (kill bar 2), medium 3× e3 +
1× 3tnk (kill bar 3), hard 2× e3 + 1× 3tnk (kill bar 2). The e3 are
set to stance:2 (Defend) so they hold the firing line and do not
chase the retreating column (a stance:3 e3 hunts the tanks home and
confounds the retreat).
- deadline pulled down to after_ticks/within_ticks 4000 — crossed by
every policy inside max_turns=51 (a stall crosses ~turn 45, an
action policy before turn 51), so a non-winning run is a real LOSS,
never a draw.
- the intended test policy was rewritten: the old kill-count
inference (peak_visible - visible) misread enemies leaving vision
as kills; the new policy is a clean three-phase approach / engage /
retreat driven by an HP-floor + tank-lost disengage trigger.
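
The difference between the old kill-count inference and the new disengage
trigger can be sketched as follows. This is a minimal Python illustration, not
the code in the test file: the function names, the HP representation
(fractions between 0 and 1), and the 0.5 floor are all hypothetical.

```python
from typing import Sequence


def inferred_kills_old(peak_visible: int, visible: int) -> int:
    """Old heuristic: every enemy no longer visible is counted as a kill.

    It cannot tell a dead enemy from one that simply left vision.
    """
    return peak_visible - visible


def should_disengage(
    tank_hp: Sequence[float],
    tanks_at_start: int,
    hp_floor: float = 0.5,  # hypothetical floor, not the value used in the tests
) -> bool:
    """New trigger: retreat once a tank is lost or any tank is below the floor."""
    tank_lost = len(tank_hp) < tanks_at_start
    below_floor = any(hp < hp_floor for hp in tank_hp)
    return tank_lost or below_floor


def next_phase(phase: str, in_range: bool, disengage: bool) -> str:
    """Three-phase policy: approach -> engage -> retreat (retreat is final)."""
    if phase == "approach" and in_range:
        return "engage"
    if phase == "engage" and disengage:
        return "retreat"
    return phase


if __name__ == "__main__":
    # Three enemies seen at peak, two walk out of vision, none has died:
    print(inferred_kills_old(peak_visible=3, visible=1))  # over-counts kills
    print(should_disengage([1.0, 0.9, 0.8, 0.4], tanks_at_start=4))
    print(next_phase("engage", in_range=True, disengage=True))
```

Because the trigger reads only the agent's own tanks, enemies moving in and
out of vision no longer affect when the retreat starts.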

Bar verified on every level x seeds 1-4: stall, brute-attack-until-death,
and never-engage all LOSE (real timeout LOSS, no draws); the intended
engage-then-retreat WINS.
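
The verification grid described above (3 levels x 4 seeds x 4 policies) can be
written down as a small expectation table. The sketch below is an assumption
about how such a check might be organised, not the repository's test harness:
the policy keys and the `results` mapping are hypothetical, and no engine is
run here.

```python
from itertools import product

LEVELS = ("easy", "medium", "hard")
SEEDS = (1, 2, 3, 4)

# Expected outcome per policy, as stated in the commit message.
EXPECTED = {
    "stall": "LOSS",
    "brute_attack_until_death": "LOSS",
    "never_engage": "LOSS",
    "intended_engage_then_retreat": "WIN",
}


def expected_grid() -> dict:
    """Map every (level, seed, policy) cell to its expected outcome."""
    return {
        (level, seed, policy): outcome
        for level, seed in product(LEVELS, SEEDS)
        for policy, outcome in EXPECTED.items()
    }


def mismatches(results: dict) -> list:
    """Cells whose observed outcome differs from the bar; a DRAW never matches."""
    grid = expected_grid()
    return sorted(key for key, want in grid.items() if results.get(key) != want)


if __name__ == "__main__":
    observed = expected_grid()  # stand-in for real episode outcomes
    observed[("medium", 2, "stall")] = "DRAW"  # simulate the old degeneracy
    print(len(expected_grid()))
    print(mismatches(observed))
```

Treating DRAW as a mismatch rather than a soft pass is what would make the old
inert-deadline behaviour visible in a check like this.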

Files changed (2)

openra_bench/scenarios/packs/combat-retreat-after-engagement.yaml +151 -143
tests/test_combat_retreat_after_engagement.py +103 -75

openra_bench/scenarios/packs/combat-retreat-after-engagement.yaml

@@ -11,62 +11,68 @@
 #     trades hits for intel/attrition, then breaks contact before the
 #     enemy can encircle and destroy.
 #
 # Idiom (the four-policy bar — same shape used by combat-kite and
 # combat-focus-fire):
-#   • stall (only observe)                    → LOSS. The kill bar
-#     (units_killed_gte:3) is never met → after_ticks LOSS.
-#   • brute attack-until-death (commit fully) → LOSS. The 2× heavy
-#     3tnk + 5× e3 rocket-infantry mass out-trades 4× 2tnk in a
-#     stand-up fight at the enemy line; tanks die two-by-two and
-#     `own_units_gte:3` flips to fail before the survivors can pull
-#     back.
-#   • never-engage (sit at safe zone)         → LOSS. Three tanks are
-#     in the safe-zone region the whole episode, but the kill bar is
-#     unmet → after_ticks LOSS.
 #   • intended engage-then-retreat            → WIN. March east into
-#     fire range, attack_unit the soft e3 rocket infantry (the kill
-#     bar is units_killed_gte:3, and 5× e3 are the cheap kills), then
-#     the moment the first tank is gone (or the second is hurt) issue
-#     a move_units order WEST back to the safe-zone region (x≈5,
-#     y≈20, r=6). All four tanks need not survive — the bar is
-#     own_units_gte:3 (lose at most ONE tank) AND ≥3 tanks back in
 #     the safe-zone radius. The decision under test is DISENGAGE
 #     TIMING: bail BEFORE attrition busts the force-preservation cap.
 #
-# Why "battle is unwinnable" (verified vs engine combat sheet):
-#   • 4× 2tnk vs 5× e3 + 2× 3tnk: head-on, the two 3tnks alone
-#     (Soviet heavy cannon, ~range 5, high anti-armour DPS) trade
-#     2tnk-for-3tnk roughly 1:1 and the rocket infantry (e3 Dragon
-#     launcher, anti-armour) finishes the surviving 2tnks. A brute
-#     attack_move column dies before clearing the squad — the loss
-#     cap (lose at most 1) flips on tank #2 going down, while only
-#     ~2 e3 have died.
-#   • The "engage 3 e3 → break contact" play kills 3 cheap targets
-#     fast (4-vs-1 focus on each e3 ends it in 1-2 decision turns)
-#     then bails BEFORE the 3tnks have ground enough HP off the
-#     column to crater the survival cap.
 #
 # Engine guardrails:
 #   • Map: rush-hour-arena (128 × 40, playable x in [2..126],
 #     y in [2..38]). Tanks stage at the west edge (x=5..7), enemies
-#     at (60,20). Cells verified placeable per CLAUDE.md (60,28 was
-#     called out as silently-failing — we use (60,20) and adjacent
-#     y=17..23).
 #   • Enemy bot is `guard` (not `hunt`): guard holds post and lunges
-#     within GUARD_AGGRO~16, snaps back past leash 18. This lets the
-#     enemy COMMIT during the engagement but NOT pursue the retreating
-#     tanks all the way home — the retreat is genuinely safe once the
-#     tanks break leash range. Critically, `guard` also means the
-#     never-engage policy is not punished by enemy approach (the
-#     enemies stay east), which is exactly what makes never-engage a
-#     CLEAN kill-bar LOSS rather than a confounded contact loss.
 #   • Persistent unarmed `fact` far east (x=120, y=20) prevents engine
 #     auto-done on enemy-elimination collapsing the run to DRAW — if
 #     the agent accidentally kills everything, the run still evaluates
 #     the in-region predicate at the safe zone (and the agent likely
 #     hasn't retreated → fail clause fires on the clock or attrition).
-#   • after_ticks 4501 ≤ 93 + 90·(51-1) = 4593 → the stall LOSS is
-#     reachable inside max_turns; no draw degeneracy.
 #   • units_lost / units_killed are signal-level integers (combat-
 #     focus-fire idiom); units_in_region_gte is the geometry predicate
 #     established by action-multiunit-coordination / artofwar-lure-
@@ -123,27 +129,34 @@ base:
 levels:
   # ── EASY ─────────────────────────────────────────────────────────
-  # Bare retreat skill: 4 tanks vs LESS enemy (4× e3 + 1× 3tnk).
-  # Loss cap forgiving (own_units_gte:2 ⇒ lose up to 2 tanks), kill
-  # bar only 2. The engage-then-retreat play is comfortable; the
-  # brute may LOSE on the heavy + rocket attrition trade; stall and
-  # never-engage LOSE on the kill bar.
   easy:
     description: >
       Four medium tanks (2tnk) stage at the safe zone in the west
-      (x=5, y=18..21). A smaller enemy squad of FOUR rocket
-      infantry (e3, anti-tank Dragon launcher) on the firing line
-      at x=60, escorting ONE heavy tank (3tnk) set back at x=64,
-      holds the centre. The fight is unwinnable head-on — the
-      rocket infantry kill armour fast and the 3tnk cannon
-      out-trades the column once it overshoots into close range.
-      The intended play: march east into engagement range, kill TWO
-      enemy units (the cheap e3s die in 1-2 turns of focused tank
-      fire), then move your tanks WEST BACK to the safe zone before
-      attrition busts the loss cap. Win when ≥2 enemies are killed
-      AND ≥3 of your tanks are within radius 6 of (5,20) AND you
-      have ≥2 tanks alive, before tick 4500. Stall, brute attack
-      until-death, and never-engage all LOSE.
     overrides:
       actors:
         # Strike force — 4 medium tanks at the safe-zone start.
@@ -151,21 +164,18 @@ levels:
         - {type: 2tnk, owner: agent, position: [5, 19], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 20], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 21], stance: 1}
-        # Enemy squad — 4× e3 anti-tank rockets ON the firing line
-        # at x=60 (the kill bar's soft targets), with the 1× 3tnk
-        # heavy escort set 4 cells BACK at x=64. The set-back heavy
-        # only weighs in if the agent commits PAST the e3 line —
-        # i.e. the brute attack_move policy that overshoots — so it
-        # provides the attrition pressure that makes the brute LOSE
-        # without immediately crushing the intended engage-the-line
-        # play. (Engine balance fixes made a heavy ON the line too
-        # lethal for the bare-skill easy tier; setting it back
-        # mirrors the medium/hard geometry.)
-        - {type: e3,   owner: enemy, position: [60, 18], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 19], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 21], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 22], stance: 3}
-        - {type: 3tnk, owner: enemy, position: [64, 20], stance: 3}
         # Persistent far-east enemy fact — prevents engine auto-done
         # on enemy-elimination so the safe-zone predicate is
         # evaluated rather than the run collapsing to DRAW.
@@ -175,74 +185,66 @@ levels:
         - {units_killed_gte: 2}
         - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
         - {own_units_gte: 2}
-        - {within_ticks: 4500}
     fail_condition:
       any_of:
-        - {after_ticks: 4501}
         - {not: {own_units_gte: 2}}
     max_turns: 51
   # ── MEDIUM ───────────────────────────────────────────────────────
-  # +1 controlled variable vs easy: FULL enemy squad (5× e3 + 2× 3tnk)
-  # AND tighter survival bar (own_units_gte:3 ⇒ lose AT MOST ONE
-  # tank). The kill bar tightens to 3. The engage window is shorter:
-  # the second 3tnk doubles the close-range damage and the brute now
-  # loses tanks ~twice as fast. The intended engage-then-retreat
-  # still wins — focus-fire 3 e3s (1-2 turns each) and pull back
-  # before the heavies grind ≥2 tanks down.
   medium:
     description: >
       Four medium tanks (2tnk) stage at the safe zone in the west
-      (x=5, y=18..21). The enemy squad at (60, 20) is FIVE rocket
-      infantry (e3, anti-tank Dragon launcher) escorting TWO heavy
-      tanks (3tnk). The fight is unwinnable head-on — the rockets
-      and heavy cannons together collapse the column at close
-      range. The intended play: march east into engagement range,
-      focus-fire THREE enemy units (the cheap e3s die in 1-2 turns
-      of concentrated tank fire each), then move your tanks WEST
-      BACK to the safe zone before attrition takes a second tank.
-      Win when ≥3 enemies are killed AND ≥3 of your tanks are
-      within radius 6 of (5,20) AND you have ≥3 tanks alive,
-      before tick 4500. Stall, brute attack-until-death, and
-      never-engage all LOSE.
     overrides:
       actors:
         - {type: 2tnk, owner: agent, position: [5, 18], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 19], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 20], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 21], stance: 1}
-        # FULL enemy squad — 5× e3 (kill-bar fodder; anti-tank
-        # Dragon launcher) ON THE LINE at x=60, with the 2× 3tnk
-        # heavy escort placed 4 cells back at x=64. The e3 line is
-        # what the attacker must close on to score kills (Dragon
-        # range 5); from the e3 firing line (~x=55) the rear-rank
-        # 3tnks (range ~5) cannot yet engage, so they only weigh in
-        # if the agent commits PAST the e3 line — i.e. the brute
-        # attack_move policy that overshoots. Spread the e3 line
-        # across y=17..23 so all four tanks face fire.
-        - {type: e3,   owner: enemy, position: [60, 17], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 18], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 19], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 21], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 22], stance: 3}
-        # Heavy tanks placed 4 cells BEHIND the rocket line —
-        # within engagement leash for the guard bot to pursue when
-        # contact starts (GUARD_AGGRO ~16), but outside their own
-        # ~5-cell weapon range from the e3 firing line. They are
-        # the attrition trigger for the brute attack_move policy
-        # that closes past x=60 into 3tnk cannon range.
-        - {type: 3tnk, owner: enemy, position: [64, 19], stance: 3}
-        - {type: 3tnk, owner: enemy, position: [64, 21], stance: 3}
         - {type: fact, owner: enemy, position: [120, 20]}
     win_condition:
       all_of:
         - {units_killed_gte: 3}
         - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
         - {own_units_gte: 3}
-        - {within_ticks: 4500}
     fail_condition:
       any_of:
-        - {after_ticks: 4501}
         - {not: {own_units_gte: 3}}
     max_turns: 51
@@ -257,6 +259,15 @@ levels:
   # is symmetric across y=20 mid-latitude so both spawns face the
   # same engagement geometry.
   #
   # Per the CLAUDE.md `spawn_point` contract: ALL agent actors
   # carry an explicit spawn_point (the filter applies only to AGENT
   # actors); the enemy actors are unchanged and always place.
@@ -264,17 +275,17 @@ levels:
     description: >
       Four medium tanks (2tnk) stage at ONE of two safe-zone
       corridors (NORTH at x=5, y=8..11 OR SOUTH at x=5, y=28..31,
-      chosen by seed — anti-memorisation). The enemy squad of FIVE
-      rocket infantry (e3) escorting TWO heavy tanks (3tnk) holds
-      the centre at (60, 20). The fight is unwinnable head-on.
       The intended play: march east-and-toward-centre into
-      engagement range, focus-fire THREE enemy units (the cheap
-      e3s die fast under concentrated tank fire), then move your
-      tanks BACK to YOUR safe zone (the one you started in — read
-      your start cell from obs) before attrition takes a second
-      tank. Win when ≥3 enemies are killed AND ≥3 of your tanks
-      are within radius 6 of YOUR safe zone (north (5,10) OR south
-      (5,30)) AND you have ≥3 tanks alive, before tick 4500.
       Stall, brute attack-until-death, never-engage, and retreating
       to the WRONG safe zone all LOSE.
     overrides:
@@ -289,29 +300,26 @@ levels:
         - {type: 2tnk, owner: agent, position: [5, 29], stance: 1, spawn_point: 1}
         - {type: 2tnk, owner: agent, position: [5, 30], stance: 1, spawn_point: 1}
         - {type: 2tnk, owner: agent, position: [5, 31], stance: 1, spawn_point: 1}
-        # FULL enemy squad — symmetric across y=20 so both spawns
-        # face the same engagement geometry. e3 line forward at
-        # x=60 (Dragon range 5); 3tnk escort 4 cells back at x=64
-        # (out of weapon range from the e3 firing line so the heavy
-        # only weighs in on a brute overshoot past the rocket line).
-        - {type: e3,   owner: enemy, position: [60, 17], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 18], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 19], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 21], stance: 3}
-        - {type: e3,   owner: enemy, position: [60, 22], stance: 3}
-        - {type: 3tnk, owner: enemy, position: [64, 19], stance: 3}
-        - {type: 3tnk, owner: enemy, position: [64, 21], stance: 3}
         - {type: fact, owner: enemy, position: [120, 20]}
     win_condition:
       all_of:
-        - {units_killed_gte: 3}
         - any_of:
             - {units_in_region_gte: {x: 5, y: 10, radius: 6, n: 3}}
             - {units_in_region_gte: {x: 5, y: 30, radius: 6, n: 3}}
         - {own_units_gte: 3}
-        - {within_ticks: 4500}
     fail_condition:
       any_of:
-        - {after_ticks: 4501}
         - {not: {own_units_gte: 3}}
     max_turns: 51

 #     trades hits for intel/attrition, then breaks contact before the
 #     enemy can encircle and destroy.
 #
+# Recalibrated 2026-05-20 after the OpenRA-Rust movement fixes
+# (moving units fire AND take fire en route; attack_unit on an
+# out-of-sight target paths normally). Those fixes made the close-
+# range trade far more lethal, and the interrupt-mode tick cadence
+# means an action-heavy episode advances FEWER ticks per turn than a
+# pure stall — so the old after_ticks:4501 deadline was inert for the
+# intended policy (a non-winning run DREW instead of LOSING). The
+# enemy is now lighter and the deadline pulled down to 4000 (reached
+# by every policy inside max_turns=51; verified no draw).
+#
 # Idiom (the four-policy bar — same shape used by combat-kite and
 # combat-focus-fire):
+#   • stall (only observe)                    → LOSS. The kill bar is
+#     never met; the after_ticks:4000 deadline (reached ~turn 45)
+#     fires → real LOSS, never a draw.
+#   • brute attack-until-death (commit fully) → LOSS. An attack_move
+#     column overshoots the e3 line into the set-back 3tnk heavy and
+#     is out-traded; `own_units_gte:N` flips to fail before the
+#     survivors can pull back.
+#   • never-engage (sit at safe zone)         → LOSS. The tanks stay
+#     in the safe zone the whole episode but the kill bar is unmet →
+#     after_ticks LOSS.
 #   • intended engage-then-retreat            → WIN. March east into
+#     fire range, attack_unit the soft e3 rocket infantry (the kill-
+#     bar fodder), then the instant a tank is lost OR any tank's HP
+#     drops below a floor, issue a move_units order WEST back to the
+#     safe-zone region. The bar is own_units_gte:N (lose at most one
+#     tank on medium/hard, up to two on easy) AND ≥3 tanks back in
 #     the safe-zone radius. The decision under test is DISENGAGE
 #     TIMING: bail BEFORE attrition busts the force-preservation cap.
 #
+# Why "battle is lethal head-on" (verified vs engine combat sheet):
+#   • The e3 rocket infantry (Dragon launcher, anti-armour) alpha-
+#     strike the column hard — a brute attack_move that closes past
+#     the e3 line into the set-back 3tnk's cannon range is out-traded
+#     and the loss cap flips before the kill bar is met.
+#   • The "engage the e3 line → break contact" play kills the cheap
+#     targets under concentrated tank fire then bails the moment the
+#     trade turns — keeping ≥3 tanks in the loss cap.
 #
 # Engine guardrails:
 #   • Map: rush-hour-arena (128 × 40, playable x in [2..126],
 #     y in [2..38]). Tanks stage at the west edge (x=5..7), enemies
+#     at (60,20). Cells verified placeable per CLAUDE.md.
 #   • Enemy bot is `guard` (not `hunt`): guard holds post and lunges
+#     within GUARD_AGGRO~16, snaps back past leash 18. The e3 actors
+#     additionally carry stance:2 Defend so they auto-fire in range
+#     but never advance — they HOLD the firing line and do NOT chase
+#     the retreating column (a stance:3 e3 would hunt the tanks all
+#     the way home and confound the retreat). This also means the
+#     never-engage policy is not punished by enemy approach, so
+#     never-engage is a CLEAN kill-bar LOSS, not a confounded loss.
 #   • Persistent unarmed `fact` far east (x=120, y=20) prevents engine
 #     auto-done on enemy-elimination collapsing the run to DRAW — if
 #     the agent accidentally kills everything, the run still evaluates
 #     the in-region predicate at the safe zone (and the agent likely
 #     hasn't retreated → fail clause fires on the clock or attrition).
+#   • after_ticks 4000 is reached by every policy inside max_turns=51
+#     (a pure stall crosses tick 4000 at ~turn 45; an action-heavy
+#     policy, which advances fewer ticks per turn in interrupt mode,
+#     still crosses it before turn 51) → a non-winning run is a real
+#     LOSS, never a draw.
 #   • units_lost / units_killed are signal-level integers (combat-
 #     focus-fire idiom); units_in_region_gte is the geometry predicate
 #     established by action-multiunit-coordination / artofwar-lure-
 levels:
   # ── EASY ─────────────────────────────────────────────────────────
+  # Bare retreat skill. Recalibrated 2026-05-20 after the OpenRA-Rust
+  # movement fixes (moving units fire AND take fire en route; attack_
+  # unit on out-of-sight targets paths normally). Those fixes made the
+  # close-range trade far more lethal — the old 4×e3+1×3tnk enemy was
+  # unwinnable WITHIN the loss cap (killing the quota required losing
+  # too many tanks), and the inert after_ticks deadline let a
+  # non-winning run DRAW instead of LOSE.
+  # New shape: enemy is 2× e3 (anti-tank rockets, stance:2 Defend so
+  # they HOLD the line, not chase the retreat) on the firing line plus
+  # ONE 3tnk heavy escort set back at x=64. Loss cap forgiving
+  # (own_units_gte:2 ⇒ lose up to 2 tanks); kill bar 2 (kill both
+  # e3s). The engage-then-retreat play kills the e3 line and pulls
+  # back losing ~1 tank; the brute overcommits past the line into the
+  # heavy and is wiped; stall / never-engage never meet the kill bar.
   easy:
     description: >
       Four medium tanks (2tnk) stage at the safe zone in the west
+      (x=5, y=18..21). An enemy squad of TWO rocket infantry (e3,
+      anti-tank Dragon launcher) holds the firing line at x=60,
+      escorting ONE heavy tank (3tnk) set back at x=64. The fight is
+      lethal head-on — the rocket infantry shred armour and the 3tnk
+      cannon out-trades the column once it overshoots into close
+      range. The intended play: march east into engagement range,
+      focus-fire and kill the TWO e3s, then move your tanks WEST BACK
+      to the safe zone before attrition busts the loss cap. Win when
+      ≥2 enemies are killed AND ≥3 of your tanks are within radius 6
+      of (5,20) AND you have ≥2 tanks alive, before tick 4000. Stall,
+      brute attack-until-death, and never-engage all LOSE.
     overrides:
       actors:
         # Strike force — 4 medium tanks at the safe-zone start.
         - {type: 2tnk, owner: agent, position: [5, 19], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 20], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 21], stance: 1}
+        # Enemy squad — 2× e3 anti-tank rockets ON the firing line at
+        # x=60 (the kill bar's soft targets), stance:2 Defend so they
+        # HOLD the line and auto-fire in range but do NOT chase the
+        # retreating column (a stance:3 e3 would hunt the tanks all
+        # the way home and confound the retreat). The 1× 3tnk heavy
+        # escort sits 4 cells BACK at x=64 — out of weapon range from
+        # the e3 firing line, so it only weighs in when the agent
+        # commits PAST the e3 line (the brute overshoot), supplying
+        # the attrition that makes the brute LOSE.
+        - {type: e3,   owner: enemy, position: [60, 19], stance: 2}
+        - {type: e3,   owner: enemy, position: [60, 21], stance: 2}
+        - {type: 3tnk, owner: enemy, position: [64, 20], stance: 2}
         # Persistent far-east enemy fact — prevents engine auto-done
         # on enemy-elimination so the safe-zone predicate is
         # evaluated rather than the run collapsing to DRAW.
         - {units_killed_gte: 2}
         - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
         - {own_units_gte: 2}
+        - {within_ticks: 4000}
     fail_condition:
       any_of:
+        - {after_ticks: 4000}
         - {not: {own_units_gte: 2}}
     max_turns: 51
   # ── MEDIUM ───────────────────────────────────────────────────────
+  # +1 controlled variable vs easy: a bigger e3 line (3× e3 instead of
+  # 2×, kill bar 3 instead of 2) AND a tighter survival bar
+  # (own_units_gte:3 ⇒ lose AT MOST ONE tank). The engage window is
+  # shorter — three e3s alpha-strike the column harder, so the agent
+  # must focus-fire efficiently and break contact a turn sooner. The
+  # intended engage-then-retreat still wins (focus-fire the 3 e3s,
+  # pull back losing one tank); the brute overcommits past the line
+  # into the heavy and loses the force.
   medium:
     description: >
       Four medium tanks (2tnk) stage at the safe zone in the west
+      (x=5, y=18..21). The enemy squad is THREE rocket infantry (e3,
+      anti-tank Dragon launcher) holding the firing line at x=60,
+      escorting ONE heavy tank (3tnk) set back at x=64. The fight is
+      lethal head-on — the rockets shred armour and the 3tnk cannon
+      out-trades the column once it overshoots. The intended play:
+      march east into engagement range, focus-fire and kill the
+      THREE e3s, then move your tanks WEST BACK to the safe zone
+      before attrition takes a second tank. Win when ≥3 enemies are
+      killed AND ≥3 of your tanks are within radius 6 of (5,20) AND
+      you have ≥3 tanks alive, before tick 4000. Stall, brute
+      attack-until-death, and never-engage all LOSE.
     overrides:
       actors:
         - {type: 2tnk, owner: agent, position: [5, 18], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 19], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 20], stance: 1}
         - {type: 2tnk, owner: agent, position: [5, 21], stance: 1}
+        # Enemy squad — 3× e3 (kill-bar fodder; anti-tank Dragon
+        # launcher) ON THE LINE at x=60, stance:2 Defend so they hold
+        # the line and do not chase the retreat. The 1× 3tnk heavy
+        # escort sits 4 cells back at x=64, out of weapon range from
+        # the e3 firing line — it only weighs in when the agent
+        # commits PAST the e3 line (the brute overshoot). Spread the
+        # e3 line across y=18..22 so the squad faces fire.
+        - {type: e3,   owner: enemy, position: [60, 18], stance: 2}
+        - {type: e3,   owner: enemy, position: [60, 20], stance: 2}
+        - {type: e3,   owner: enemy, position: [60, 22], stance: 2}
+        # Heavy tank 4 cells BEHIND the rocket line — the attrition
+        # trigger for the brute attack_move policy that closes past
+        # x=60 into 3tnk cannon range.
+        - {type: 3tnk, owner: enemy, position: [64, 20], stance: 2}
         - {type: fact, owner: enemy, position: [120, 20]}
     win_condition:
       all_of:
         - {units_killed_gte: 3}
         - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
         - {own_units_gte: 3}
+        - {within_ticks: 4000}
     fail_condition:
       any_of:
+        - {after_ticks: 4000}
         - {not: {own_units_gte: 3}}
     max_turns: 51
   # is symmetric across y=20 mid-latitude so both spawns face the
   # same engagement geometry.
   #
+  # The corner spawn is the hard challenge: the squad must close on
+  # the e3 line along a long DIAGONAL, taking fire the whole approach
+  # (the engine movement fix means a moving column is a live target).
+  # That diagonal makes the medium-tier 3-e3 line genuinely
+  # unsolvable inside the loss cap, so hard trades raw enemy count
+  # for positional difficulty — 2× e3 + 1× 3tnk, kill bar 2 — while
+  # the seed-flipped corridor and the read-your-own-safe-zone
+  # requirement supply the discrimination.
+  #
   # Per the CLAUDE.md `spawn_point` contract: ALL agent actors
   # carry an explicit spawn_point (the filter applies only to AGENT
   # actors); the enemy actors are unchanged and always place.
     description: >
       Four medium tanks (2tnk) stage at ONE of two safe-zone
       corridors (NORTH at x=5, y=8..11 OR SOUTH at x=5, y=28..31,
+      chosen by seed — anti-memorisation). An enemy squad of TWO
+      rocket infantry (e3) escorting ONE heavy tank (3tnk) holds
+      the centre at (60, 20). The fight is lethal head-on, and the
+      corner spawn means a long diagonal approach under fire.
       The intended play: march east-and-toward-centre into
+      engagement range, focus-fire and kill the TWO e3s, then move
+      your tanks BACK to YOUR safe zone (the one you started in —
+      read your start cell from obs) before attrition takes a
+      second tank. Win when ≥2 enemies are killed AND ≥3 of your
+      tanks are within radius 6 of YOUR safe zone (north (5,10) OR
+      south (5,30)) AND you have ≥3 tanks alive, before tick 4000.
       Stall, brute attack-until-death, never-engage, and retreating
       to the WRONG safe zone all LOSE.
     overrides:
         - {type: 2tnk, owner: agent, position: [5, 29], stance: 1, spawn_point: 1}
         - {type: 2tnk, owner: agent, position: [5, 30], stance: 1, spawn_point: 1}
         - {type: 2tnk, owner: agent, position: [5, 31], stance: 1, spawn_point: 1}
+        # Enemy squad — symmetric across y=20 so both spawns face the
+        # same engagement geometry. e3 line forward at x=60 (Dragon
+        # range 5), stance:2 Defend so the e3 hold the line and do
+        # not chase the retreat. The 1× 3tnk escort sits 4 cells back
+        # at x=64, out of weapon range from the e3 firing line so the
+        # heavy only weighs in on a brute overshoot past the line.
+        - {type: e3,   owner: enemy, position: [60, 19], stance: 2}
+        - {type: e3,   owner: enemy, position: [60, 21], stance: 2}
+        - {type: 3tnk, owner: enemy, position: [64, 20], stance: 2}
         - {type: fact, owner: enemy, position: [120, 20]}
     win_condition:
       all_of:
+        - {units_killed_gte: 2}
         - any_of:
             - {units_in_region_gte: {x: 5, y: 10, radius: 6, n: 3}}
             - {units_in_region_gte: {x: 5, y: 30, radius: 6, n: 3}}
         - {own_units_gte: 3}
+        - {within_ticks: 4000}
     fail_condition:
       any_of:
+        - {after_ticks: 4000}
         - {not: {own_units_gte: 3}}
     max_turns: 51

tests/test_combat_retreat_after_engagement.py

@@ -1,17 +1,38 @@
 """combat-retreat-after-engagement — disengage to preserve the force.
-Bar (four script-policy proxies):
-  • stall (observe only)                    → LOSS (kill bar unmet)
-  • brute attack-until-death                → LOSS (loses too many tanks)
-  • never-engage (sit at safe zone)         → LOSS (kill bar unmet)
-  • intended engage-then-retreat            → WIN
-The "intended" policy is the spec's load-bearing decision: march east
-into engagement range, focus-fire e3 rocket infantry (the cheap kill-
-bar targets), and the instant the kill bar is met OR a tank is lost
-pull back to the safe-zone radius. The retreat trigger is the
-capability under test — too early ⇒ kill bar fails; too late ⇒
-attrition busts the survival bar.
 """
 from __future__ import annotations
@@ -56,7 +77,8 @@ def _ctx(units_xy=(), tick=1000, killed=0, lost=0):
 def test_predicates_easy():
     c = compile_level(load_pack(PACK_PATH), "easy")
-    # 3 tanks back in safe zone (5,20,r=6), killed 2 enemies, 1 lost, in time → WIN
     home3 = [(5, 18), (5, 20), (5, 21)]
     assert evaluate(c.win_condition, _ctx(home3, tick=3000, killed=2, lost=1))
     # Kill bar unmet (only 1 killed) → not WIN
@@ -68,9 +90,9 @@ def test_predicates_easy():
     # 3 tanks lost (only 1 alive) → fail clause own_units_gte:2 fires
     assert evaluate(c.fail_condition, _ctx([(5, 20)], tick=3000, killed=3, lost=3))
     # Past deadline → real LOSS reachable within max_turns
-    assert evaluate(c.fail_condition, _ctx(home3, tick=4502, killed=0, lost=0))
-    assert 4501 <= 93 + 90 * (c.max_turns - 1), (
-        "easy after_ticks 4501 must be reachable within max_turns"
     )
@@ -90,27 +112,27 @@ def test_predicates_medium_force_preservation_bar():
     # 2 tanks alive ⇒ fail clause fires (preservation cap)
     assert evaluate(c.fail_condition, _ctx(home2, tick=3000, killed=3, lost=2))
     # Past deadline ⇒ real LOSS reachable
-    assert evaluate(c.fail_condition, _ctx(home3, tick=4502, killed=0, lost=0))
-    assert 4501 <= 93 + 90 * (c.max_turns - 1)
 def test_predicates_hard_two_safe_zones():
     c = compile_level(load_pack(PACK_PATH), "hard")
     # NORTH safe zone (5,10) satisfies the any_of geometry
     home_north = [(5, 9), (5, 10), (5, 11)]
-    assert evaluate(c.win_condition, _ctx(home_north, tick=3000, killed=3, lost=1))
     # SOUTH safe zone (5,30) also satisfies the any_of geometry
     home_south = [(5, 29), (5, 30), (5, 31)]
-    assert evaluate(c.win_condition, _ctx(home_south, tick=3000, killed=3, lost=1))
     # Tanks at the WRONG centre (5,20) — outside BOTH safe zones at r=6
     # ((5,20)-(5,10)=10>6 and (5,20)-(5,30)=10>6) → fails the geometry
     assert not evaluate(
         c.win_condition,
-        _ctx([(5, 20), (5, 19), (5, 21)], tick=3000, killed=3, lost=1),
     )
-    # Past tighter deadline → real LOSS reachable
-    assert evaluate(c.fail_condition, _ctx(home_north, tick=4502, killed=0, lost=0))
-    assert 4501 <= 93 + 90 * (c.max_turns - 1)
 def test_hard_has_two_spawn_point_groups():
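Review note on the hard-tier geometry check above: the comment's arithmetic (distance 10 > radius 6) can be reproduced outside the engine. The sketch below is only an illustration. `within_radius` and `in_any_safe_zone` are hypothetical helpers, not the pack's real predicate, and the Euclidean metric is an assumption (the test cells share x=5 with both zone centres, so any common metric gives the same answer here).

```python
import math

# Hypothetical helper: the real predicate lives in openra_bench and may use a
# different distance metric. Euclidean distance is assumed for illustration.
def within_radius(cell, centre, radius):
    """True when `cell` lies within `radius` cells of `centre`."""
    return math.dist(cell, centre) <= radius

SAFE_ZONES = [(5, 10), (5, 30)]  # north and south centres quoted in the test
RADIUS = 6                       # r=6 as quoted in the test comment

def in_any_safe_zone(cell):
    """any_of geometry: a cell counts if it is inside either safe zone."""
    return any(within_radius(cell, zone, RADIUS) for zone in SAFE_ZONES)

wrong_centre = [(5, 20), (5, 19), (5, 21)]
home_north = [(5, 9), (5, 10), (5, 11)]
home_south = [(5, 29), (5, 30), (5, 31)]

print([in_any_safe_zone(c) for c in wrong_centre])  # [False, False, False]
print([in_any_safe_zone(c) for c in home_north])    # [True, True, True]
print([in_any_safe_zone(c) for c in home_south])    # [True, True, True]
```

The nearest wrong-centre cell is still 9 cells from a zone centre, so the negative assertion holds with margin.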
@@ -143,13 +165,16 @@ def test_pack_compiles_and_meta_fields_populated():
 def test_timeout_loss_is_reachable_on_every_level():
-    """No draw degeneracy: the after_ticks deadline fits inside
-    max_turns on every level (∼90 ticks/turn ⇒ 93 + 90·(max_turns-1))."""
     pack = load_pack(PACK_PATH)
     for lvl in ("easy", "medium", "hard"):
         c = compile_level(pack, lvl)
-        assert 4501 <= 93 + 90 * (c.max_turns - 1), (
-            f"{lvl}: after_ticks 4501 not reachable within max_turns"
         )
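Review note on the reachability bound: the static check `deadline <= 93 + 90 * (max_turns - 1)` passes for both the old 4501 and the new 4000 deadline, so on its own it cannot tell the two apart. A minimal sketch of the arithmetic, assuming the stall tick model implied by the bound (first turn ends at tick 93, each later turn adds about 90 ticks) and the `max_turns=51` and "tick ~4428 at max_turns" figures quoted in the commit message. `tick_after_turn` and `first_turn_reaching` are hypothetical names.

```python
import math

# Assumed stall tick model, read off the bound used in the test.
FIRST_TURN_TICK = 93
TICKS_PER_TURN = 90
MAX_TURNS = 51            # quoted in the commit message
ACTION_FINAL_TICK = 4428  # approximate final tick of an action-heavy run (commit message)

def tick_after_turn(turn):
    """Tick count at the end of `turn` for a pure-stall episode."""
    return FIRST_TURN_TICK + TICKS_PER_TURN * (turn - 1)

def first_turn_reaching(deadline):
    """Smallest turn whose end tick is >= deadline under the stall model."""
    return 1 + math.ceil((deadline - FIRST_TURN_TICK) / TICKS_PER_TURN)

print(tick_after_turn(MAX_TURNS))   # 4593
print(first_turn_reaching(4501))    # 50
print(first_turn_reaching(4000))    # 45

# The action-heavy run ends between the two deadlines: it passes 4000 but
# never reaches 4501, which is the draw degeneracy described in the commit.
print(4000 <= ACTION_FINAL_TICK < 4501)  # True
```

Under this model a stall reaches 4000 around turn 45, which matches the commit message, while the old 4501 deadline left only one turn of slack before `max_turns`.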
@@ -158,18 +183,22 @@ def test_timeout_loss_is_reachable_on_every_level():
 # The four-policy bar. All engine-driven tests guard on the Rust env
 # wheel; predicate-level tests above run without it.
 def _stall_policy(rs, Command):
-    """Stall: only observe. Kill bar never met → after_ticks LOSS."""
     return [Command.observe()]
 def _brute_attack_until_death_policy(rs, Command):
     """Brute: attack_move toward the enemy centre and never retreat.
     The column overshoots the e3 firing line into the set-back 3tnk
-    heavy escort (x=64 on every tier — easy 1× 3tnk, medium/hard
-    2× 3tnk); the heavy + rocket mass alpha out-trades 4× 2tnk and
-    the column dies before clearing the squad → own_units_gte:N
     fails on every level."""
     units = rs.get("units_summary", []) or []
     if not units:
@@ -197,24 +226,28 @@ def _never_engage_policy(rs, Command):
 def _make_intended_engage_then_retreat():
-    """Intended policy (the spec's load-bearing decision): march to
-    the engagement axis, focus-fire e3 rocket infantry, and the
-    instant a tank is lost OR ≥3 enemies are observed killed pull
-    back to the safe-zone radius (detected from the agent's spawn
-    median-y latched on first observation). Stateful — uses a
-    closure to track the peak number of visible killables (so we
-    can infer kills from the shrink without reading
-    signals.units_killed)."""
-    state = {"peak_visible": 0, "retreat_latched": False, "home_y": None}
     def pol(rs, Command):
         units = rs.get("units_summary", []) or []
         enemies = rs.get("enemy_summary", []) or []
         if not units:
             return [Command.observe()]
-        # Latch the home Y on first observation. The agent's spawn
-        # cell median y resolves to one of the three safe-zone
-        # corridors (north y=10, centre y=20, south y=30).
         if state["home_y"] is None:
             ys = sorted(u["cell_y"] for u in units)
             hy_med = ys[len(ys) // 2]
@@ -226,43 +259,38 @@ def _make_intended_engage_then_retreat():
                 state["home_y"] = 20
         hy = state["home_y"]
         n_alive = len(units)
         killable = [
             e
             for e in enemies
             if not e.get("is_building")
             and (e.get("type") or "").lower() != "fact"
         ]
-        visible = len(killable)
-        if visible > state["peak_visible"]:
-            state["peak_visible"] = visible
-        killed_observed = state["peak_visible"] - visible
-        # RETREAT TRIGGER: latched, or any tank lost, or ≥3 enemies
-        # observed killed. Once retreating, stay retreating (a re-
-        # engagement would re-expose the survivors to attrition).
-        if state["retreat_latched"] or n_alive < 4 or killed_observed >= 3:
-            state["retreat_latched"] = True
             return [
                 Command.move_units([str(u["id"])], target_x=5, target_y=hy)
                 for u in units
             ]
-        # ENGAGE: pick the closest e3 to home and focus-fire it with
-        # ALL tanks (4-vs-1 ends a Dragon-soldier in 1-2 decision turns).
-        e3s = [e for e in killable if (e.get("type") or "").lower() == "e3"]
-        if e3s:
             e3s.sort(
-                key=lambda e: (e["cell_x"] - 5) ** 2 + (e["cell_y"] - hy) ** 2
             )
             t = e3s[0]
             return [
                 Command.attack_unit([str(u["id"])], str(t["id"])) for u in units
             ]
-        # APPROACH: advance toward the engagement axis (50, 20) so
-        # the spawn corridor (y=10 or y=30 on hard) closes onto the
-        # mid-latitude line where the e3s will come into view.
         return [
             Command.move_units(
                 [str(u["id"])],
-                target_x=min(50, u["cell_x"] + 12),
                 target_y=20,
             )
             for u in units
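Review note on the removed `peak_visible - visible` estimator: it counts any shrink in visible enemies as kills, so enemies that merely leave vision are miscounted. The sketch below replays that estimator on a made-up observation trace. `inferred_kills` is a hypothetical stand-in for the closure state in the old policy, and the trace values are invented for illustration.

```python
# Hypothetical replay of the removed estimator: kills are inferred as
# peak_visible - visible, with no access to a real kill counter.
def inferred_kills(visible_counts):
    peak = 0
    estimates = []
    for visible in visible_counts:
        peak = max(peak, visible)          # track the largest count ever seen
        estimates.append(peak - visible)   # any shrink is read as kills
    return estimates

# Invented trace: 4 enemies come into view, then 3 drop out of vision as the
# squad repositions. No enemy is actually destroyed at any point.
trace = [2, 4, 4, 1]
print(inferred_kills(trace))  # [0, 0, 0, 3]

# The old trigger retreated on killed_observed >= 3, so this trace would
# latch a retreat with zero real kills.
print(inferred_kills(trace)[-1] >= 3)  # True
```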
@@ -273,13 +301,13 @@ def _make_intended_engage_then_retreat():
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_stall_policy_loses(level):
-    """Stall must LOSE on every level — kill bar unmet → after_ticks LOSS."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
-    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
-    for s in seeds:
         res = run_level(c, _stall_policy, seed=s)
         assert res.outcome == "loss", (
             f"{level} seed={s}: stall must LOSE; got {res.outcome} "
@@ -289,14 +317,14 @@ def test_stall_policy_loses(level):
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_brute_attack_until_death_loses(level):
-    """Brute attack-until-death must LOSE — the mass alpha at the
-    enemy line out-trades the column before the bar is met."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
-    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
-    for s in seeds:
         res = run_level(c, _brute_attack_until_death_policy, seed=s)
         assert res.outcome == "loss", (
             f"{level} seed={s}: brute must LOSE; got {res.outcome} "
@@ -312,8 +340,7 @@ def test_never_engage_policy_loses(level):
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
-    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
-    for s in seeds:
         res = run_level(c, _never_engage_policy, seed=s)
         assert res.outcome == "loss", (
             f"{level} seed={s}: never-engage must LOSE; got {res.outcome} "
@@ -324,14 +351,15 @@ def test_never_engage_policy_loses(level):
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_intended_engage_then_retreat_wins(level):
     """Intended engage-then-retreat must WIN on every level and every
-    hard seed (1..4): focus-fire e3s, retreat the instant a tank is
-    lost or ≥3 kills observed, end with ≥3 tanks in the safe zone."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
-    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
-    for s in seeds:
         pol = _make_intended_engage_then_retreat()
         res = run_level(c, pol, seed=s)
         assert res.outcome == "win", (

 """combat-retreat-after-engagement — disengage to preserve the force.
+Bar (recalibrated 2026-05-20 after the OpenRA-Rust engine movement
+fixes — moving units fire AND take fire en route, and attack_unit on
+an out-of-sight target paths normally). Those fixes made the close-
+range trade far more lethal: the old 4-5 e3 + 1-2 3tnk enemy was
+unwinnable inside the loss cap (killing the quota required losing too
+many tanks), and — because interrupt mode advances FEWER ticks per
+turn for an action-heavy policy than for a pure stall — the old
+after_ticks:4501 deadline was inert for the intended policy, so a
+non-winning run DREW instead of LOSING. Recalibration: lighter enemy
+(easy 2 e3 + 1 3tnk, medium 3 e3 + 1 3tnk, hard 2 e3 + 1 3tnk; e3 at
+stance:2 Defend so they hold the line and do not chase the retreat),
+deadline pulled down to 4000 (reached by every policy inside
+max_turns).
+The four script-policy proxies, every level, seeds 1-4:
+  • stall (observe only)            → LOSS — kill bar never met; the
+    after_ticks:4000 deadline (reached ~turn 45) fires → real LOSS.
+  • brute attack-until-death        → LOSS — the attack_move column
+    overshoots the e3 line into the set-back 3tnk and is out-traded;
+    loses too many tanks before the bar is met.
+  • never-engage (sit at safe zone) → LOSS — ≥3 tanks survive in the
+    safe zone but the kill bar is never met → after_ticks LOSS.
+  • intended engage-then-retreat    → WIN — march to the engagement
+    line, focus-fire the e3 rocket infantry, and the instant a tank
+    is lost OR any tank's HP drops below a floor pull back to the
+    safe zone. End with ≥3 tanks in the safe zone and the kill bar
+    met.
+The "intended" policy is the spec's load-bearing decision: the
+retreat trigger (HP-floor / tank-lost) is the capability under test —
+too late ⇒ attrition busts the survival bar; never engaging ⇒ the
+kill bar fails.
 """
 from __future__ import annotations
 def test_predicates_easy():
     c = compile_level(load_pack(PACK_PATH), "easy")
+    # 3 tanks back in safe zone (5,20,r=6), killed 2 enemies, 1 lost,
+    # in time → WIN
     home3 = [(5, 18), (5, 20), (5, 21)]
     assert evaluate(c.win_condition, _ctx(home3, tick=3000, killed=2, lost=1))
     # Kill bar unmet (only 1 killed) → not WIN
     # 3 tanks lost (only 1 alive) → fail clause own_units_gte:2 fires
     assert evaluate(c.fail_condition, _ctx([(5, 20)], tick=3000, killed=3, lost=3))
     # Past deadline → real LOSS reachable within max_turns
+    assert evaluate(c.fail_condition, _ctx(home3, tick=4002, killed=0, lost=0))
+    assert 4000 <= 93 + 90 * (c.max_turns - 1), (
+        "easy after_ticks 4000 must be reachable within max_turns"
     )
     # 2 tanks alive ⇒ fail clause fires (preservation cap)
     assert evaluate(c.fail_condition, _ctx(home2, tick=3000, killed=3, lost=2))
     # Past deadline ⇒ real LOSS reachable
+    assert evaluate(c.fail_condition, _ctx(home3, tick=4002, killed=0, lost=0))
+    assert 4000 <= 93 + 90 * (c.max_turns - 1)
 def test_predicates_hard_two_safe_zones():
     c = compile_level(load_pack(PACK_PATH), "hard")
     # NORTH safe zone (5,10) satisfies the any_of geometry
     home_north = [(5, 9), (5, 10), (5, 11)]
+    assert evaluate(c.win_condition, _ctx(home_north, tick=3000, killed=2, lost=1))
     # SOUTH safe zone (5,30) also satisfies the any_of geometry
     home_south = [(5, 29), (5, 30), (5, 31)]
+    assert evaluate(c.win_condition, _ctx(home_south, tick=3000, killed=2, lost=1))
     # Tanks at the WRONG centre (5,20) — outside BOTH safe zones at r=6
     # ((5,20)-(5,10)=10>6 and (5,20)-(5,30)=10>6) → fails the geometry
     assert not evaluate(
         c.win_condition,
+        _ctx([(5, 20), (5, 19), (5, 21)], tick=3000, killed=2, lost=1),
     )
+    # Past the deadline → real LOSS reachable
+    assert evaluate(c.fail_condition, _ctx(home_north, tick=4002, killed=0, lost=0))
+    assert 4000 <= 93 + 90 * (c.max_turns - 1)
 def test_hard_has_two_spawn_point_groups():
 def test_timeout_loss_is_reachable_on_every_level():
+    """No draw degeneracy: the after_ticks:4000 deadline fits inside
+    max_turns on every level (∼90 ticks/turn ⇒ 93 + 90·(max_turns-1)),
+    and — verified by the engine-driven tests below — is actually
+    crossed by every policy, including an action-heavy one running in
+    interrupt mode."""
     pack = load_pack(PACK_PATH)
     for lvl in ("easy", "medium", "hard"):
         c = compile_level(pack, lvl)
+        assert 4000 <= 93 + 90 * (c.max_turns - 1), (
+            f"{lvl}: after_ticks 4000 not reachable within max_turns"
         )
 # The four-policy bar. All engine-driven tests guard on the Rust env
 # wheel; predicate-level tests above run without it.
+# Retreat the instant any tank's HP is at or below this floor (the
+# policy tests min_hp <= RETREAT_HP_FLOOR); ENGAGE_X is the x-line the
+# squad closes to before opening fire.
+RETREAT_HP_FLOOR = 0.5
+ENGAGE_X = 54
 def _stall_policy(rs, Command):
+    """Stall: only observe. Kill bar never met → after_ticks:4000 LOSS."""
     return [Command.observe()]
 def _brute_attack_until_death_policy(rs, Command):
     """Brute: attack_move toward the enemy centre and never retreat.
     The column overshoots the e3 firing line into the set-back 3tnk
+    heavy escort (x=64); the heavy + rocket fire out-trades the column
+    and it loses too many tanks before the bar is met → own_units_gte:N
     fails on every level."""
     units = rs.get("units_summary", []) or []
     if not units:
 def _make_intended_engage_then_retreat():
+    """Intended policy (the spec's load-bearing decision), in three
+    phases driven purely by the per-turn observation — no fragile
+    kill-count inference:
+      1. APPROACH — march all tanks to the engagement line
+         (ENGAGE_X, 20). The home safe-zone latitude is latched from
+         the spawn cell median y on the first observation (north y=10,
+         centre y=20, south y=30).
+      2. ENGAGE — once the squad is at the line, focus-fire the
+         nearest e3 rocket soldier with ALL tanks.
+      3. RETREAT (latched) — the instant a tank is lost OR any tank's
+         HP drops below RETREAT_HP_FLOOR, pull every tank back to the
+         home safe zone and stay there. The HP-floor / tank-lost
+         trigger is the disengage-timing decision under test."""
+    state = {"latched": False, "home_y": None}
     def pol(rs, Command):
         units = rs.get("units_summary", []) or []
         enemies = rs.get("enemy_summary", []) or []
         if not units:
             return [Command.observe()]
+        # Latch the home Y on first observation.
         if state["home_y"] is None:
             ys = sorted(u["cell_y"] for u in units)
             hy_med = ys[len(ys) // 2]
                 state["home_y"] = 20
         hy = state["home_y"]
         n_alive = len(units)
+        min_hp = min((u.get("hp", 1.0) for u in units), default=1.0)
         killable = [
             e
             for e in enemies
             if not e.get("is_building")
             and (e.get("type") or "").lower() != "fact"
         ]
+        e3s = [e for e in killable if (e.get("type") or "").lower() == "e3"]
+        # RETREAT TRIGGER: latched, a tank lost, or any tank at or below
+        # the HP floor. Once retreating, stay retreating.
+        if state["latched"] or n_alive < 4 or min_hp <= RETREAT_HP_FLOOR:
+            state["latched"] = True
             return [
                 Command.move_units([str(u["id"])], target_x=5, target_y=hy)
                 for u in units
             ]
+        # ENGAGE: once the whole squad is within 4 cells of ENGAGE_X,
+        # focus-fire ONE e3 with ALL tanks: the e3 closest to (5, 20).
+        at_line = all(u["cell_x"] >= ENGAGE_X - 4 for u in units)
+        if e3s and at_line:
             e3s.sort(
+                key=lambda e: (e["cell_x"] - 5) ** 2 + (e["cell_y"] - 20) ** 2
             )
             t = e3s[0]
             return [
                 Command.attack_unit([str(u["id"])], str(t["id"])) for u in units
             ]
+        # APPROACH: advance toward the engagement line at y=20.
         return [
             Command.move_units(
                 [str(u["id"])],
+                target_x=min(ENGAGE_X, u["cell_x"] + 12),
                 target_y=20,
             )
             for u in units
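Review note on the three-phase policy: the order of checks (retreat first, then engage, then approach) can be exercised without the engine by pulling the decision into a pure function. This is a sketch under assumptions, not the code in the commit. `choose_phase` and `SQUAD_SIZE` are hypothetical names, and the squad size of 4 is inferred from the policy's `n_alive < 4` check.

```python
RETREAT_HP_FLOOR = 0.5  # same value as the constant in the diff
SQUAD_SIZE = 4          # assumption: inferred from the `n_alive < 4` check

def choose_phase(latched, n_alive, min_hp, at_line, sees_e3):
    """Return (phase, latched), following the order of checks in the policy."""
    # Retreat wins over everything else and stays latched once set.
    if latched or n_alive < SQUAD_SIZE or min_hp <= RETREAT_HP_FLOOR:
        return "retreat", True
    # Engage only when an e3 is visible and the whole squad is at the line.
    if sees_e3 and at_line:
        return "engage", False
    return "approach", False

print(choose_phase(False, 4, 1.0, False, True))  # ('approach', False)
print(choose_phase(False, 4, 0.9, True, True))   # ('engage', False)
print(choose_phase(False, 4, 0.5, True, True))   # ('retreat', True)
print(choose_phase(False, 3, 1.0, True, True))   # ('retreat', True)
print(choose_phase(True, 4, 1.0, True, True))    # ('retreat', True)
```

The third call shows the boundary case: HP exactly at the floor already triggers the retreat, because the comparison is `<=`.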
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_stall_policy_loses(level):
+    """Stall must LOSE on every level — kill bar unmet → the
+    after_ticks:4000 deadline fires (real LOSS, never a draw)."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
+    for s in (1, 2, 3, 4):
         res = run_level(c, _stall_policy, seed=s)
         assert res.outcome == "loss", (
             f"{level} seed={s}: stall must LOSE; got {res.outcome} "
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_brute_attack_until_death_loses(level):
+    """Brute attack-until-death must LOSE — the column overshoots the
+    e3 line into the set-back 3tnk and is out-traded before the bar
+    is met."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
+    for s in (1, 2, 3, 4):
         res = run_level(c, _brute_attack_until_death_policy, seed=s)
         assert res.outcome == "loss", (
             f"{level} seed={s}: brute must LOSE; got {res.outcome} "
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
+    for s in (1, 2, 3, 4):
         res = run_level(c, _never_engage_policy, seed=s)
         assert res.outcome == "loss", (
             f"{level} seed={s}: never-engage must LOSE; got {res.outcome} "
 @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
 def test_intended_engage_then_retreat_wins(level):
     """Intended engage-then-retreat must WIN on every level and every
+    seed (1..4): march to the engagement line, focus-fire the e3s,
+    retreat the instant a tank is lost or any tank's HP drops below
+    the floor, end with ≥3 tanks in the safe zone and the kill bar
+    met (recalibrated 2026-05-20: killed 2-3, lost 1)."""
     pytest.importorskip("openra_train")
     from openra_bench.eval_core import run_level
     c = compile_level(load_pack(PACK_PATH), level)
+    for s in (1, 2, 3, 4):
         pol = _make_intended_engage_then_retreat()
         res = run_level(c, pol, seed=s)
         assert res.outcome == "win", (