qpluslab/OpenRA-Bench

yxc20098 committed on May 20

Commit

7450894

1 Parent(s): 5cd2d8e

feat(scenario): combat-skirmish-then-disengage — strike then disengage (SC2 skirmisher / military recon-by-fire anchor)

Wave-6 combat-micro pack: ONE coordinated engagement done well — drive
east, score >=3 kills against a slow infantry cluster, then PULL BACK
to the spawn-corner recovery zone before the deadline. Distinct from
combat-harass-balanced-hit-and-run (which is the CYCLIC pulsed variant
with a zero-attrition bar): this pack is one big engagement with a
positional/temporal recovery bar.

Bar (all four-policy proxies, every level + every hard seed 1..4):
* stall (only observe) -> LOSS (kill bar unmet;
jeeps stance:0 so no auto-return-fire; on hard the hunt-bot e1
wipe the idle stack)
* never-engage (park at start) -> LOSS (recovery clause
trivially satisfied but kill bar unmet)
* commit-until-overwhelmed (charge & stay) -> LOSS (kill bar IS met
but jeeps end at the kill site x~50, not in the recovery region
around the spawn corner; region clause fails -> after_ticks LOSS)
* intended skirmish-then-disengage -> WIN on every seed
(kill bar met inside ~14 turns, then disengage to spawn corner
finishes inside the 4500-tick budget)

Win predicate (all levels):
units_killed_gte:3 AND own_units_gte:3 AND
units_in_region_gte:{x:5,y:<spawn>,radius:6,n:3} AND within_ticks:4500
Hard recovery clause is any_of over the two spawn-corner regions
(NORTH (5,10) or SOUTH (5,30)) — agent must return to its OWN start
corner.
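
The win predicate and the four-policy bar above can be sketched as a small truth-table check. This is an illustration only: `EndState`, `in_region` and `wins` are hypothetical names, not the `openra_bench` API. The Euclidean distance metric and the inclusive `<=` deadline are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass
class EndState:
    """Hypothetical terminal-frame snapshot (not the real WinContext)."""
    kills: int
    own_units: list[tuple[int, int]]  # cells (x, y) of surviving jeeps
    tick: int


def in_region(cell, centre, radius=6):
    # Assumption: Euclidean distance in cells; the engine's metric may differ.
    return hypot(cell[0] - centre[0], cell[1] - centre[1]) <= radius


def wins(state, recovery_centres, deadline=4500):
    """all_of(kills >= 3, own units >= 3, any_of(region clauses), within deadline)."""
    recovered = any(
        sum(in_region(c, centre) for c in state.own_units) >= 3
        for centre in recovery_centres
    )
    return (
        state.kills >= 3
        and len(state.own_units) >= 3
        and recovered
        and state.tick <= deadline  # assumption: within_ticks is inclusive
    )


HOME, CLUSTER = (5, 20), (50, 20)

# Idealised end states for the four scripted policies on easy/medium.
policies = {
    "stall": EndState(0, [HOME] * 4, 4500),
    "never-engage": EndState(0, [HOME] * 4, 4500),
    "commit-until-overwhelmed": EndState(4, [CLUSTER] * 4, 4500),
    "skirmish-then-disengage": EndState(3, [HOME] * 4, 3000),
}
for name, state in policies.items():
    print(f"{name}: {'WIN' if wins(state, [HOME]) else 'LOSS'}")
```

Passing two centres, `[(5, 10), (5, 30)]`, models the hard-tier `any_of` recovery clause with the same function.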

Difficulty axis:
easy -> 4x e1 cluster at (50,20), no bot
medium -> 6x e1 cluster (same kill bar; the extra rifles tighten the
commit-and-stay failure mode by stretching the mop-up
window past the disengage budget)
hard -> 6x e1 cluster + bot_type:hunt (active pursuit) + 2 agent
spawn_point groups round-robined by seed (anti-memorisation)
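
A minimal sketch of the seed-driven spawn rotation on hard, to show why seeds 1..4 exercise both corners. The modulo mapping and the name `spawn_group_for_seed` are assumptions for illustration; the source only states that the groups are round-robined by seed, not how.

```python
# Hard-tier recovery centres keyed by spawn_point group (from the pack).
RECOVERY_CENTRE = {0: (5, 10), 1: (5, 30)}  # 0 = NORTH, 1 = SOUTH


def spawn_group_for_seed(seed: int, n_groups: int = 2) -> int:
    # Assumption: plain modulo round-robin; the real loader may use an offset.
    return seed % n_groups


hard_seeds = (1, 2, 3, 4)
assignment = {s: spawn_group_for_seed(s) for s in hard_seeds}

for seed, group in assignment.items():
    print(f"seed={seed} -> spawn_point {group}, recover at {RECOVERY_CENTRE[group]}")
```

Under any round-robin over two groups, four consecutive seeds land on each corner twice, so a fixed "always retreat to one corner" opening cannot win on every seed.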

Added to UPGRADED in tests/test_hard_tier.py (>=2 distinct seed-driven spawn
groups verified). 18 scripted-policy tests pass (predicate teeth +
4-policy bar on every level / every hard seed).

Model smoke (Together/Qwen3.6-Plus, medium, seed=1): runs end-to-end,
loss outcome (model played a perception-failure variant — composite
0.2628, action=1.0, weakest=perception). Bar is on scripted policies,
not the model.

benchmark_anchor:
- SC2 skirmisher tactics
- military reconnaissance-by-fire
- harass-and-disengage doctrine
- armoured cavalry doctrine

Files changed (3)

openra_bench/scenarios/packs/combat-skirmish-then-disengage.yaml +292 -0
tests/test_combat_skirmish_then_disengage.py +370 -0
tests/test_hard_tier.py +13 -0

openra_bench/scenarios/packs/combat-skirmish-then-disengage.yaml ADDED

	@@ -0,0 +1,292 @@

+# combat-skirmish-then-disengage — ONE coordinated strike-then-pull-back
+# (Wave-6 combat-micro pack; complement to combat-harass-balanced-hit-
+# and-run which is the CYCLIC pulsed variant).
+#
+# The capability under test is SKIRMISHER DOCTRINE: a single big
+# engagement done well — drive forward, score the kills, then pull the
+# force back to a recovery zone before being overwhelmed. Unlike
+# combat-harass-balanced (a cycle of small pulses with zero attrition),
+# this pack is ONE coordinated engagement: kills are easy to score; the
+# DISCRIMINATING decision is whether the agent stops fighting and
+# disengages before attrition mounts past the survival bar — and gets
+# the force HOME (recovery region around the start) before the clock.
+#
+# Real-world anchors:
+#   - SC2 skirmisher tactics: a controlled "pull back to base" call
+#     after scoring damage, vs the "commit until dead" anti-pattern.
+#   - Military reconnaissance-by-fire / armoured cavalry doctrine:
+#     probe, score, withdraw to friendly lines for re-supply.
+#   - Fire-and-maneuver doctrine in the SINGLE-engagement frame
+#     (the pulsed/cyclic frame is combat-harass-balanced-hit-and-run).
+#
+# Idiom (the four-policy bar — every level):
+#   • stall (only observe)            → LOSS. Kill bar unmet (≥3 kills
+#     required); the jeeps' default stance does not let them rack up
+#     return-fire kills sitting at home — they must drive east to
+#     find foes inside MG range.
+#   • never-engage (park at start)    → LOSS. units_in_region clause
+#     is satisfied (jeeps already at the recovery zone), but the kill
+#     bar (units_killed_gte:3) is unmet → after_ticks LOSS.
+#   • commit-until-overwhelmed (charge east, never retreat)
+#                                     → LOSS. Kill bar IS met (the
+#     enemy infantry trade poorly vs 4× jeep MG), but the jeeps end
+#     the run at the enemy cluster (~x=50), NOT in the recovery
+#     region around the start. units_in_region_gte:{x:5,y:..,r:6,n:3}
+#     is unmet → after_ticks LOSS. On hard tier, additionally, the
+#     pursuing hunt-bot e1 grind down the un-retreating force below
+#     own_units_gte:3.
+#   • intended skirmish-then-disengage (drive east, kill ≥3, then
+#     move_units back to the start) → WIN. All three clauses met
+#     inside the 4500-tick budget.
+#
+# Distinct from combat-harass-balanced-hit-and-run: the BALANCED pack
+# enforces zero attrition across a multi-pulse cycle (the "no loss"
+# bar), and the win is "kill workers without losing raiders". HERE the
+# win is "finish the kills BEFORE you get overwhelmed AND get the force
+# back HOME" — a positional/temporal recovery bar, not an attrition
+# bar. units_in_region_gte is the load-bearing clause that makes
+# disengage required.
+#
+# Engine notes (load-bearing for the bar):
+#   - Jeeps start `stance: 0` (HoldFire). With stance:0 they do NOT
+#     auto-return-fire on approaching enemies — sitting idle while
+#     hunt-bot e1 close in DOES NOT score kills (kill bar unmet).
+#     The only way to score is to explicitly `attack_unit` (or
+#     `attack_move`), which makes the agent's strike decision
+#     load-bearing.
+#   - Enemy `e1` at the mid-x cluster are placed at y=19/y=21 cells
+#     (verified-placement rows per CLAUDE.md — `e1` at some mid-x
+#     cells silently fails to surface; (50,19)/(50,21) are confirmed
+#     working).
+#   - Persistent unarmed enemy `fact` at far east (x=124) prevents the
+#     engine from auto-`done`ing on enemy unit wipe (which would
+#     collapse the run to DRAW before the within_ticks + region
+#     predicates evaluate cleanly on the terminal frame).
+meta:
+  id: combat-skirmish-then-disengage
+  title: 'Combat Skirmish — Strike, Score the Kills, Pull Back to Recovery'
+  capability: action
+  real_world_meaning: >
+    SKIRMISHER doctrine in the single-engagement frame: four fast
+    raiders (jeeps) must drive east into a slow infantry cluster,
+    score AT LEAST 3 kills, and then PULL BACK to the recovery zone
+    around the western start before the clock expires AND while
+    keeping at least 3 raiders alive. The skill under test is the
+    decision to STOP FIGHTING and disengage — committing until the
+    enemy is wiped or until the strike force is destroyed both LOSE
+    (commit leaves the raiders at the kill site instead of the
+    recovery zone; over-commit on hard loses raiders to the
+    pursuing hunt-bot infantry). Distinct from the BALANCED pulsed
+    harass-retreat cycle (combat-harass-balanced-hit-and-run, which
+    is many small pulses with zero attrition): this pack is ONE big
+    engagement done well, with a positional recovery bar.
+  robotics_analogue: >
+    Mission-with-egress: a mobile manipulator must complete a
+    threshold of reward-bearing actions in a contested workspace,
+    then return to a safe staging region before a time or attrition
+    budget expires. Knowing WHEN to stop the productive sub-task
+    and start the egress is the decision under test — a
+    productivity-only policy (greedy accumulation) leaves the agent
+    far from the staging region at deadline and fails the egress
+    clause.
+  benchmark_anchor:
+    - "SC2 skirmisher tactics"
+    - "military reconnaissance-by-fire"
+    - "harass-and-disengage doctrine"
+    - "armoured cavalry doctrine"
+  author: openra-bench
+base_map: rush-hour-arena
+base:
+  agent: {faction: allies, cash: 0}
+  enemy: {faction: soviet, cash: 0}
+  tools: [move_units, attack_unit, attack_move, stop]
+  planning: true
+  termination: {max_ticks: 6500}
+  actors: []
+levels:
+  # ── EASY ────────────────────────────────────────────────────────────
+  # Bare skirmish-then-disengage skill: 4 jeeps (stance:0, no
+  # auto-fire — kills require explicit attack_unit) vs 4× slow rifle
+  # infantry (e1, stance:0 — no auto-fire either, so a never-engage
+  # policy never scores). The strike force easily scores ≥3 kills
+  # under attack_unit (4× jeep MG vs unanswered rifles); the load-
+  # bearing decision is whether the agent then ORDERS THE RETREAT
+  # back to the recovery zone instead of mopping up the last enemy in
+  # place. stall / never-engage LOSE on the kill bar; commit-until-
+  # wiped LOSES because the jeeps end at x≈50 instead of the
+  # (5,20,r=6) recovery region.
+  easy:
+    description: >
+      Four jeeps stage at the west base (5,20). Four enemy
+      RIFLE INFANTRY (e1, slow) hold a cluster around (50,20). Drive
+      east, kill AT LEAST THREE rifle infantry, then PULL BACK so
+      AT LEAST THREE of your jeeps end inside the recovery zone (a
+      6-cell radius around (5,20) — i.e. your starting region). Keep
+      at least three jeeps alive. Finish before tick 4500. Stalling
+      LOSES (kill bar unmet); never engaging LOSES (kill bar unmet);
+      committing east and staying at the cluster LOSES (your jeeps
+      are at the kill site, not the recovery zone). The discriminator
+      is the DISENGAGE order — stop attacking and move_units back
+      to (5,20) once you have your 3 kills.
+    overrides:
+      actors:
+        # Strike force: 4 jeeps at the western staging point.
+        # stance:0 (HoldFire) — no auto-return-fire, so kills require
+        # an explicit attack_unit / attack_move order (the load-
+        # bearing decision under test).
+        - {type: jeep, owner: agent, position: [5, 19], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 20], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 21], stance: 0}
+        - {type: jeep, owner: agent, position: [6, 20], stance: 0}
+        # Enemy infantry cluster — 4× e1 spread across rows y=19/y=21
+        # (CLAUDE.md confirms y=19/y=21 mid-x cells place reliably).
+        # stance:0 so they sit on post — fair "discoverable cluster"
+        # for the test (a never-engage agent never gets attacked into
+        # an accidental kill).
+        - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+        # Persistent far-east enemy fact — prevents engine auto-done
+        # on enemy wipe collapsing the run to DRAW before the
+        # within_ticks + region predicates evaluate.
+        - {type: fact, owner: enemy, position: [124, 20]}
+    win_condition:
+      all_of:
+        - {units_killed_gte: 3}
+        - {own_units_gte: 3}
+        - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 1}}
+    max_turns: 52
+  # ── MEDIUM ──────────────────────────────────────────────────────────
+  # +1 controlled variable: the enemy cluster grows to 6× e1 (vs 4 on
+  # easy). The kill bar (≥3) is unchanged, so the strike is still
+  # easily achievable — but the larger cluster means a commit-until-
+  # wiped policy spends MORE turns mopping up (more enemies = more
+  # rounds at the cluster), which leaves it even further from being
+  # able to RETREAT before the within_ticks deadline. The discriminator
+  # — "stop attacking after 3 kills and order the disengage" — is
+  # sharper.
+  medium:
+    description: >
+      Four jeeps stage at the west base (5,20). SIX enemy rifle
+      infantry hold a cluster around (50,20). Drive east, kill AT
+      LEAST THREE rifle infantry, then PULL BACK so AT LEAST THREE
+      of your jeeps end inside the recovery zone (6-cell radius
+      around (5,20)). Keep at least three jeeps alive. Finish
+      before tick 4500. With six enemies in the cluster a "commit
+      until everything is dead" policy spends most of the budget
+      mopping up — by the deadline your jeeps are still at the
+      kill site, not the recovery zone, and the run fails on the
+      region clause. Order the DISENGAGE after the third kill and
+      drive west to the recovery zone.
+    overrides:
+      actors:
+        - {type: jeep, owner: agent, position: [5, 19], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 20], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 21], stance: 0}
+        - {type: jeep, owner: agent, position: [6, 20], stance: 0}
+        # 6× e1 cluster around (50,20). Verified-placement rows
+        # (y=19/y=21 mid-x).
+        - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [48, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+        - {type: fact, owner: enemy, position: [124, 20]}
+    win_condition:
+      all_of:
+        - {units_killed_gte: 3}
+        - {own_units_gte: 3}
+        - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 1}}
+    max_turns: 52
+  # ── HARD ────────────────────────────────────────────────────────────
+  # +2 controlled variables vs medium:
+  #   1. bot_type: hunt — the e1 cluster actively PURSUES the jeeps
+  #      (jeeps remain stance:0 so they only score on explicit
+  #      attack orders; the hunt bot turns the engagement into a
+  #      tightening window — a slow retreat or commit-and-stay loses
+  #      jeeps past own_units_gte:3). Spec's "hunt-bot pursues".
+  #   2. Two agent spawn_point groups (NORTH y=10 or SOUTH y=30)
+  #      round-robined by seed; the recovery zone is `any_of` over the
+  #      two spawn corners so the agent must return to ITS OWN start
+  #      corner (no "always retreat to (5,20)" memorisation). Spec's
+  #      "2 spawn groups".
+  # Enemy actors do NOT honour spawn_point (CLAUDE.md), so the e1
+  # cluster sits symmetrically at the mid-latitude (y=20) — both
+  # spawn corridors face the same eastern threat geometry. The
+  # cluster size stays at 6 (matching medium); the hunt bot is the
+  # threat-axis upgrade, not raw enemy count — extra waves would
+  # overwhelm 4 jeeps before any disengage could complete (verified
+  # 2026-05-20: +4 extra e1 at x≈90 + hunt drops the intended-policy
+  # win rate to ~0% as the swarm closes inside 5 turns).
+  hard:
+    description: >
+      Four jeeps stage at ONE of two western staging points (NORTH
+      (5,10) or SOUTH (5,30), chosen by seed — anti-memorisation).
+      Six enemy RIFLE INFANTRY (e1) sit at a cluster around
+      (50,20). The enemy side is HUNTING — surviving e1 actively
+      pursue your jeeps. Kill AT LEAST THREE rifle infantry, keep
+      at least three jeeps alive, AND end with at least three
+      jeeps inside the recovery zone (6-cell radius around YOUR
+      spawn corner, either (5,10) or (5,30)). Finish before tick
+      4500. Stalling, never engaging, and commit-and-stay all
+      LOSE; the hunt bot ensures that a slow disengage also fails
+      on the survival or region clause.
+    overrides:
+      actors:
+        # spawn_point 0 — NORTH staging (y=10)
+        - {type: jeep, owner: agent, position: [5,  9], stance: 0, spawn_point: 0}
+        - {type: jeep, owner: agent, position: [5, 10], stance: 0, spawn_point: 0}
+        - {type: jeep, owner: agent, position: [5, 11], stance: 0, spawn_point: 0}
+        - {type: jeep, owner: agent, position: [6, 10], stance: 0, spawn_point: 0}
+        # spawn_point 1 — SOUTH staging (y=30)
+        - {type: jeep, owner: agent, position: [5, 29], stance: 0, spawn_point: 1}
+        - {type: jeep, owner: agent, position: [5, 30], stance: 0, spawn_point: 1}
+        - {type: jeep, owner: agent, position: [5, 31], stance: 0, spawn_point: 1}
+        - {type: jeep, owner: agent, position: [6, 30], stance: 0, spawn_point: 1}
+        # 6× e1 cluster at (50,20). Hunt bot gives them stance:3 on
+        # init and issues Attack orders that drive them west toward
+        # the jeeps; the infantry walk to contact takes ~6-8 turns.
+        - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [48, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+        # Persistent far-east enemy fact.
+        - {type: fact, owner: enemy, position: [124, 20]}
+      enemy: {faction: soviet, cash: 0, bot_type: hunt}
+    # Hard win: recovery zone is `any_of` over the two spawn corners
+    # — the agent must return to ITS OWN start corner. (A wrong-corner
+    # return is geometrically infeasible inside the tick budget, but
+    # encoded for predicate clarity.)
+    win_condition:
+      all_of:
+        - {units_killed_gte: 3}
+        - {own_units_gte: 3}
+        - any_of:
+            - {units_in_region_gte: {x: 5, y: 10, radius: 6, n: 3}}
+            - {units_in_region_gte: {x: 5, y: 30, radius: 6, n: 3}}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 1}}
+    max_turns: 52
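
The hard-tier recovery geometry can be checked with a few lines of arithmetic. This sketch assumes a Euclidean cell distance for `units_in_region_gte` (the engine's metric is not shown in the diff); `count_inside` is a hypothetical helper. The coordinates are copied from the pack above.

```python
from math import dist

RADIUS = 6
NORTH, SOUTH = (5, 10), (5, 30)

# Jeep spawn cells from the hard-tier actor list.
north_spawn = [(5, 9), (5, 10), (5, 11), (6, 10)]
south_spawn = [(5, 29), (5, 30), (5, 31), (6, 30)]


def count_inside(cells, centre, radius=RADIUS):
    # Assumption: a unit counts when its Euclidean distance is <= radius.
    return sum(dist(cell, centre) <= radius for cell in cells)


# Each spawn group starts fully inside its own recovery zone...
print(count_inside(north_spawn, NORTH), count_inside(south_spawn, SOUTH))
# ...and fully outside the other corner's zone.
print(count_inside(north_spawn, SOUTH), count_inside(south_spawn, NORTH))
# The two zones are disjoint: centres are 20 cells apart, more than 2 * 6.
print(dist(NORTH, SOUTH) > 2 * RADIUS)
# The easy/medium home (5, 20) sits 10 cells from both corners, outside both.
print(count_inside([(5, 20)], NORTH), count_inside([(5, 20)], SOUTH))
```

Because the zones do not overlap and exclude (5, 20), a retreat memorised from easy or medium cannot satisfy the hard recovery clause.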

tests/test_combat_skirmish_then_disengage.py ADDED

	@@ -0,0 +1,370 @@

+"""combat-skirmish-then-disengage — ONE coordinated strike-then-pull-back.
+Bar: the intended skirmish-then-disengage policy WINS on every level
+and every hard seed; stall (only observe), never-engage (park at
+start), and commit-until-overwhelmed (charge east and never retreat)
+LOSE on every level. Non-win is a real reachable timeout LOSS (not a
+draw).
+Validation is scripted (no model / network): the four policies below
+are the exhaustive proxies for the four real strategies and exercise
+the predicate teeth directly. The load-bearing decision under test is
+"stop attacking after the kill bar is met and order the disengage
+back to the recovery zone before the deadline".
+"""
+from __future__ import annotations
+from pathlib import Path
+import pytest
+pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+from openra_bench.scenarios import load_pack
+from openra_bench.scenarios.loader import compile_level
+from openra_bench.scenarios.win_conditions import WinContext, evaluate
+PACKS = Path(__file__).parent.parent / "openra_bench" / "scenarios" / "packs"
+PACK_PATH = PACKS / "combat-skirmish-then-disengage.yaml"
+# ── unit-level predicate checks ──────────────────────────────────────
+def _ctx(units_xy=(), tick=1000, killed=0, lost=0):
+    """Synthesize a WinContext for predicate-level checks."""
+    import types
+    sig = types.SimpleNamespace(
+        game_tick=tick,
+        units_killed=killed,
+        units_lost=lost,
+        own_buildings=[],
+        own_building_types=set(),
+        enemies_seen_ids=set(),
+        enemy_buildings_seen_ids=set(),
+    )
+    return WinContext(
+        signals=sig,
+        render_state={
+            "units_summary": [
+                {"cell_x": x, "cell_y": y} for x, y in units_xy
+            ]
+        },
+    )
+def test_predicates_easy_recovery_clause():
+    c = compile_level(load_pack(PACK_PATH), "easy")
+    home = [(5, 20), (5, 20), (5, 20), (5, 20)]
+    cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+    mixed_3_home = [(5, 20), (5, 20), (5, 20), (50, 20)]
+    # Intended: 3+ kills, ≥3 alive, ≥3 in recovery → WIN
+    assert evaluate(c.win_condition, _ctx(home, tick=2000, killed=3, lost=0))
+    assert evaluate(c.win_condition, _ctx(mixed_3_home, tick=2000, killed=4, lost=0))
+    # Kill bar met but all units still at the kill site → fail region clause
+    assert not evaluate(c.win_condition, _ctx(cluster, tick=2000, killed=4, lost=0))
+    # 3 kills but only 2 own_units → predicate fails
+    assert not evaluate(c.win_condition, _ctx(home[:2], tick=2000, killed=3, lost=2))
+    # 0 kills → predicate fails even if everyone is at home
+    assert not evaluate(c.win_condition, _ctx(home, tick=2000, killed=0, lost=0))
+    # Past deadline → real loss, reachable within max_turns
+    assert evaluate(c.fail_condition, _ctx(home, tick=4502, killed=0, lost=0))
+    assert 4501 <= 93 + 90 * (c.max_turns - 1), (
+        "after_ticks 4501 must be reachable within max_turns"
+    )
+def test_predicates_medium_same_bar_six_enemies():
+    c = compile_level(load_pack(PACK_PATH), "medium")
+    home = [(5, 20), (5, 20), (5, 20), (5, 20)]
+    cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+    # Intended: 3+ kills, ≥3 alive, ≥3 in recovery → WIN
+    assert evaluate(c.win_condition, _ctx(home, tick=3000, killed=3, lost=0))
+    # Commit-and-stay: kill bar met but jeeps at cluster, not home → fail
+    assert not evaluate(c.win_condition, _ctx(cluster, tick=3000, killed=6, lost=0))
+    # Past deadline → real loss, reachable
+    assert evaluate(c.fail_condition, _ctx(home, tick=4502, killed=0, lost=0))
+    assert 4501 <= 93 + 90 * (c.max_turns - 1)
+def test_predicates_hard_any_of_spawn_corner_recovery():
+    c = compile_level(load_pack(PACK_PATH), "hard")
+    home_north = [(5, 10), (5, 10), (5, 10), (5, 10)]
+    home_south = [(5, 30), (5, 30), (5, 30), (5, 30)]
+    mid_lat = [(5, 20), (5, 20), (5, 20), (5, 20)]  # neither corner
+    cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+    # Either spawn corner satisfies the any_of recovery clause.
+    assert evaluate(c.win_condition, _ctx(home_north, tick=3000, killed=3, lost=0))
+    assert evaluate(c.win_condition, _ctx(home_south, tick=3000, killed=3, lost=0))
+    # Mid-latitude (y=20) is outside BOTH spawn-corner radii (radius=6
+    # from (5,10) ⇒ y=20 is 10 cells away; same from (5,30)) → fail.
+    assert not evaluate(c.win_condition, _ctx(mid_lat, tick=3000, killed=3, lost=0))
+    # Commit-and-stay at cluster → fail region clause.
+    assert not evaluate(c.win_condition, _ctx(cluster, tick=3000, killed=6, lost=0))
+    # Past deadline → real loss, reachable
+    assert evaluate(c.fail_condition, _ctx(home_north, tick=4502, killed=0, lost=0))
+    assert 4501 <= 93 + 90 * (c.max_turns - 1)
+def test_hard_has_two_spawn_point_groups():
+    """Hard-tier curation contract: ≥2 distinct agent spawn_point
+    groups so the seed round-robins the raider start corner."""
+    c = compile_level(load_pack(PACK_PATH), "hard")
+    groups = {
+        (a.spawn_point if a.spawn_point is not None else 0)
+        for a in c.scenario.actors
+        if a.owner == "agent"
+    }
+    assert len(groups) >= 2, f"hard needs ≥2 spawn_point groups, got {groups}"
+def test_pack_compiles_and_meta_fields_populated():
+    pack = load_pack(PACK_PATH)
+    assert pack.meta.capability == "action"
+    assert pack.meta.id == "combat-skirmish-then-disengage"
+    anchors = pack.meta.benchmark_anchor
+    assert isinstance(anchors, list) and anchors, "benchmark_anchor required"
+    joined = " ".join(anchors).lower()
+    # Anchored to the doctrines the brief calls out: SC2 skirmisher +
+    # military reconnaissance-by-fire / cavalry doctrine.
+    assert "skirmish" in joined
+    assert "recon" in joined or "cavalry" in joined or "disengage" in joined
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        assert c.map_supported
+        assert c.win_condition is not None and c.fail_condition is not None
+def test_timeout_loss_is_reachable_on_every_level():
+    """No draw degeneracy: after_ticks 4501 fits inside max_turns on
+    every level (∼90 ticks/turn ⇒ 93 + 90·(max_turns-1))."""
+    pack = load_pack(PACK_PATH)
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        assert 4501 <= 93 + 90 * (c.max_turns - 1), lvl
+# ── engine-driven scripted policies ──────────────────────────────────
+#
+# Stage thresholds for the intended policy:
+#   PHASE_STRIKE_UNTIL_TICK = 1300 — drive east, attack_unit any
+#     visible e1 until this tick (~14 turns @ ~90 ticks/turn). At
+#     this point the kill bar (≥3) is comfortably met and the
+#     disengage budget (4500 - 1300 = 3200 ticks ≈ 35 turns) is
+#     more than enough to march back to (5,20)-ish.
+#   APPROACH_STEP = 15  cells / turn east toward the cluster axis
+#     (jeep ~50% faster than infantry).
+PHASE_STRIKE_UNTIL_TICK = 1300
+APPROACH_STEP = 15
+APPROACH_LIMIT_X = 46  # don't overrun the cluster (~x=50)
+def _e1_foes(enemies):
+    return [
+        e for e in enemies
+        if (e.get("type") or "").lower() == "e1"
+        and not e.get("is_building")
+    ]
+def _stall_policy(rs, Command):
+    """Stall: only observe. Kill bar never met (jeeps are stance:0;
+    no auto-return-fire) → LOSS on the clock; on hard the hunt-bot
+    e1 close on the idle stack and wipe it → LOSS on
+    `not own_units_gte:1`."""
+    return [Command.observe()]
+def _never_engage_policy(rs, Command):
+    """Park at the start; never move east, never fire. Recovery
+    region clause is trivially satisfied but the kill bar is unmet
+    → LOSS on the clock (easy/medium) or LOSS on hard when hunt-bot
+    e1 wipe the idle stack."""
+    units = rs.get("units_summary", []) or []
+    if not units:
+        return [Command.observe()]
+    cmds = []
+    for u in units:
+        cmds.append(
+            Command.move_units(
+                [str(u["id"])], target_x=u["cell_x"], target_y=u["cell_y"]
+            )
+        )
+    return cmds
+def _commit_until_overwhelmed_policy(rs, Command):
+    """Charge east; attack_unit any visible foe; never retreat. The
+    kill bar IS met (4× jeep MG vs stance:0 rifles), but the jeeps
+    end the run sitting at the kill site (~x=50), not in the
+    recovery region. The region clause fails → after_ticks LOSS.
+    """
+    units = rs.get("units_summary", []) or []
+    enemies = rs.get("enemy_summary", []) or []
+    if not units:
+        return [Command.observe()]
+    foes = _e1_foes(enemies)
+    cmds = []
+    for u in units:
+        ux, uy = u["cell_x"], u["cell_y"]
+        if foes:
+            t = min(
+                foes,
+                key=lambda e: (e["cell_x"] - ux) ** 2 + (e["cell_y"] - uy) ** 2,
+            )
+            cmds.append(Command.attack_unit([str(u["id"])], str(t["id"])))
+        else:
+            # March east to the cluster axis but STOP at the cluster
+            # (don't overrun to the far-east fact and trip auto-done).
+            cmds.append(
+                Command.move_units(
+                    [str(u["id"])], target_x=min(50, ux + 12), target_y=uy
+                )
+            )
+    return cmds
+def _intended_skirmish_then_disengage_policy(rs, Command):
+    """Intended skirmisher sequence (one strike, one pull-back):
+      - PHASE 1 (tick < PHASE_STRIKE_UNTIL_TICK): drive east, attack_unit
+        any visible e1.
+      - PHASE 2 (tick >= PHASE_STRIKE_UNTIL_TICK): stop attacking; order
+        move_units back to the nearest spawn corner — the RECOVERY zone.
+    The phase switch is the spec's load-bearing decision: "stop
+    fighting and pull back" before the deadline.
+    """
+    units = rs.get("units_summary", []) or []
+    enemies = rs.get("enemy_summary", []) or []
+    tick = rs.get("game_tick") or 0
+    if not units:
+        return [Command.observe()]
+    foes = _e1_foes(enemies)
+    # Pick the nearest spawn-corner candidate as the recovery target
+    # (stateless — works for both single-corner and any_of-corner
+    # recovery clauses).
+    candidates = [(5, 20), (5, 10), (5, 30)]
+    cx = sum(u["cell_x"] for u in units) / len(units)
+    cy = sum(u["cell_y"] for u in units) / len(units)
+    home = min(
+        candidates, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2
+    )
+    cmds = []
+    if tick < PHASE_STRIKE_UNTIL_TICK:
+        if foes:
+            for u in units:
+                ux, uy = u["cell_x"], u["cell_y"]
+                t = min(
+                    foes,
+                    key=lambda e: (e["cell_x"] - ux) ** 2
+                    + (e["cell_y"] - uy) ** 2,
+                )
+                cmds.append(
+                    Command.attack_unit([str(u["id"])], str(t["id"]))
+                )
+        else:
+            # No foes in sight yet — drive east toward the cluster
+            # axis. Cap at APPROACH_LIMIT_X so the strike force
+            # doesn't overrun past the cluster.
+            for u in units:
+                ux, uy = u["cell_x"], u["cell_y"]
+                cmds.append(
+                    Command.move_units(
+                        [str(u["id"])],
+                        target_x=min(APPROACH_LIMIT_X, ux + APPROACH_STEP),
+                        target_y=uy,
+                    )
+                )
+    else:
+        # PHASE 2: PULL BACK. Stop fighting; drive home.
+        for u in units:
+            cmds.append(
+                Command.move_units(
+                    [str(u["id"])], target_x=home[0], target_y=home[1]
+                )
+            )
+    return cmds
+# ── policy bar tests ────────────────────────────────────────────────
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_stall_loses(level):
+    """Stall must LOSE on every level: jeeps are stance:0 so they
+    never return fire (kill bar unmet); on hard the hunt-bot e1
+    close on the idle stack and trip `not own_units_gte:1`."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        res = run_level(c, _stall_policy, seed=s)
+        assert res.outcome == "loss", (
+            f"{level} seed={s}: stall must LOSE, got {res.outcome} "
+            f"killed={res.signals.units_killed} lost={res.signals.units_lost}"
+        )
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_never_engage_loses(level):
+    """Park-at-start must LOSE: kill bar unmet; on hard hunt-bot e1
+    wipe the idle stack."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        res = run_level(c, _never_engage_policy, seed=s)
+        assert res.outcome == "loss", (
+            f"{level} seed={s}: never-engage must LOSE, got {res.outcome} "
+            f"killed={res.signals.units_killed} lost={res.signals.units_lost}"
+        )
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_commit_until_overwhelmed_loses(level):
+    """Commit-and-stay at the cluster must LOSE on every level: the
+    kill bar IS met but the jeeps end the run at the kill site
+    (~x=50), not the recovery region around the start. The region
+    clause fails → after_ticks LOSS."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        res = run_level(c, _commit_until_overwhelmed_policy, seed=s)
+        assert res.outcome == "loss", (
+            f"{level} seed={s}: commit-and-stay must LOSE, got "
+            f"{res.outcome} killed={res.signals.units_killed} "
+            f"lost={res.signals.units_lost}"
+        )
+@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+def test_intended_skirmish_then_disengage_wins(level):
+    """Intended skirmisher (strike phase → disengage phase) must
+    WIN on every level and every hard seed: kill bar met, ≥3 jeeps
+    alive, ≥3 jeeps inside the spawn-corner recovery region, all
+    inside the 4500-tick budget."""
+    pytest.importorskip("openra_train")
+    from openra_bench.eval_core import run_level
+    c = compile_level(load_pack(PACK_PATH), level)
+    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+    for s in seeds:
+        res = run_level(
+            c, _intended_skirmish_then_disengage_policy, seed=s
+        )
+        assert res.outcome == "win", (
+            f"{level} seed={s}: intended skirmish-then-disengage should "
+            f"WIN, got {res.outcome} after {res.turns} turns "
+            f"(killed={res.signals.units_killed}, "
+            f"lost={res.signals.units_lost})"
+        )
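
The turn/tick arithmetic behind the reachability assertions and the strike-phase threshold can be worked through as follows. It uses the tick model quoted in the tests (first turn ends near tick 93, then about 90 ticks per turn). Real runs may drift from this approximation, and the helper names are hypothetical.

```python
import math

FIRST_TURN_TICK = 93   # tick reached after the first turn (from the tests)
TICKS_PER_TURN = 90    # approximate ticks per subsequent turn
MAX_TURNS = 52
FAIL_AFTER_TICKS = 4501
DEADLINE = 4500
PHASE_STRIKE_UNTIL_TICK = 1300


def tick_at_turn(turn: int) -> int:
    """Approximate game tick observed on the given 1-based turn."""
    return FIRST_TURN_TICK + TICKS_PER_TURN * (turn - 1)


def min_turns_to_reach(tick: int) -> int:
    """Smallest turn count whose final tick is at least `tick`."""
    return 1 + max(0, math.ceil((tick - FIRST_TURN_TICK) / TICKS_PER_TURN))


# The timeout LOSS is reachable: the last turn lands past after_ticks.
print(tick_at_turn(MAX_TURNS))               # 4683
print(min_turns_to_reach(FAIL_AFTER_TICKS))  # 50

# Turns spent striking (tick < 1300) and turns left before the deadline.
strike_turns = sum(
    1 for t in range(1, MAX_TURNS + 1) if tick_at_turn(t) < PHASE_STRIKE_UNTIL_TICK
)
turns_by_deadline = sum(
    1 for t in range(1, MAX_TURNS + 1) if tick_at_turn(t) <= DEADLINE
)
print(strike_turns, turns_by_deadline - strike_turns)  # 14 35
```

Under this model `max_turns: 52` leaves two turns of slack beyond the 50 needed to trip `after_ticks: 4501`, so a non-winning run ends as a LOSS rather than running out of turns first.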

tests/test_hard_tier.py CHANGED

@@ -200,6 +200,19 @@ UPGRADED = [
     # flips per seed and no memorised "retreat west on y=20" opening
     # generalises.
     "combat-kite-jeep-vs-tank",
+    # Wave-6 combat-micro skirmish pack (SC2 skirmisher tactics /
+    # military reconnaissance-by-fire anchor). One coordinated
+    # strike-then-pull-back; the load-bearing decision is "stop
+    # attacking after the kill bar is met and order the disengage
+    # back to the spawn-corner recovery zone before the deadline".
+    # Hard tier defines two agent spawn_point groups (NORTH (5,10)
+    # vs SOUTH (5,30)) round-robined by seed; the recovery clause is
+    # `any_of` over the two spawn-corner regions so the agent must
+    # return to ITS OWN start corner. Hunt-bot pursuit (e1 cluster
+    # attacks-anything) makes a slow-disengage policy also LOSE on
+    # the survival bar — the "stop fighting and pull back" call is
+    # mandatory on every seed.
+    "combat-skirmish-then-disengage",
     # Wave-4 Group B TURTLE node of the expansion triple (SC2 fortress
     # macro / 1-base mass-defence; military fortress doctrine; risk-
     # averse single-market deep-investment anchor). Hard tier defines