qpluslab/OpenRA-Bench

yxc20098 committed on May 21

Commit 7604738

1 Parent(s): f493054

feat(scenario): combat-kite-and-pull — hit-and-pull kiting micro vs a slow heavy (SC2 kiting micro)

ACTION pack: fast 2tnk raiders must strike-then-PULL a hunting 3tnk
heavy — fire at range, retreat out of the heavy's lethal close-range
window, repeat. Stand-and-fight, brute, and stall all LOSE; only the
move-away + attack_unit kite cycle WINS. Medium/hard tighten the bar
to a perfect pull (all three raiders survive). Hard defines two
seed-driven spawn_point corridors.
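
The strike-then-pull rule described above can be sketched as a pure per-turn decision. This is a minimal illustration under stated assumptions, not the pack's code: `kite_decision`, its tuple return shape, and its default values are hypothetical, and measuring range as Manhattan distance in cells is borrowed from the scripted test policy in this commit.

```python
from typing import Tuple

Cell = Tuple[int, int]


def kite_decision(
    raider: Cell,
    heavy: Cell,
    pull_radius: int = 7,   # hypothetical default, matches the test policy
    pull_step: int = 10,    # cells to retreat per pull (assumption)
    min_x: int = 4,         # keep the retreat target inside the map (assumption)
) -> Tuple[str, Cell]:
    """Return ("move", target_cell) to pull, or ("attack", heavy_cell) to strike."""
    dist = abs(raider[0] - heavy[0]) + abs(raider[1] - heavy[1])
    if dist <= pull_radius:
        # PULL: step west along the raider's own lane, away from the heavy.
        return ("move", (max(min_x, raider[0] - pull_step), raider[1]))
    # STRIKE: the heavy is outside the pull radius, so fire at it.
    return ("attack", heavy)


print(kite_decision((30, 10), (34, 12)))  # heavy 6 cells away: pull
print(kite_decision((30, 10), (80, 20)))  # heavy far away: strike
```

Because the decision depends only on the current positions, it carries no memory between turns, which is what lets the same rule work from either hard-tier corridor.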

Files changed (3)

openra_bench/scenarios/packs/combat-kite-and-pull.yaml +254 -0
tests/test_combat_kite_and_pull.py +319 -0
tests/test_hard_tier.py +1 -0

openra_bench/scenarios/packs/combat-kite-and-pull.yaml ADDED

	@@ -0,0 +1,254 @@

+# combat-kite-and-pull — ACTION, kiting micro: hit-and-PULL a slow
+# heavy enemy with a fast light strike force (Wave-12).
+#
+# The capability under test is the STRIKE-then-PULL cycle: a fast
+# light unit closes to weapon range, FIRES, then retreats out of the
+# heavy's lethal close-range window BEFORE the heavy can fire back —
+# and repeats. The heavy out-trades the light force head-on, so
+# standing and fighting LOSES; the light force's speed advantage is
+# the only edge, and it only pays off if the agent strings together
+# the move-away + attack_unit cycle every turn.
+#
+# Real-world anchors:
+#   • SC2 kiting micro (vulture/muta vs marines, stalker-vs-zealot):
+#     fast unit fires, steps back before the slower foe closes,
+#     re-engages. The "blink-back" / micro-dance.
+#   • Cavalry skirmish doctrine: light cavalry charges to contact,
+#     looses a volley, then WHEELS AWAY before the heavier line can
+#     engage — fire-and-maneuver, never a sustained melee.
+#   • Economy-of-force: a small mobile force defeats a heavier,
+#     concentrated foe by exploiting a mobility asymmetry rather
+#     than by mass.
+#
+# Distinct from `combat-kite-jeep-vs-tank` (Wave-4): that pack frames
+# the trade as "preserve ≥2 of 3"; this pack tightens the bar to a
+# perfect PULL — medium and hard require ALL THREE raiders to survive
+# (`own_units_gte:3` — a sloppy kite that trades even one raider for
+# the heavy LOSES) and carries an explicit "no-disengage" brute /
+# wrong-path policy in its four-policy bar. The shared idiom —
+# diagonal-lag geometry, hunt-bot heavy, the move-away + attack_unit
+# cycle — is the proven engine-realised kite test
+# (`combat-kite-jeep-vs-tank` medium/hard).
+#
+# Engine-realised pairing note: in OpenRA-Rust the literal jeep MG is
+# anti-infantry and does not dent heavy armour (engine weapons
+# table), so the fast "raider" is the allied medium tank 2tnk
+# (faster than the soviet 3tnk heavy, and its cannon CAN damage heavy
+# armour). The capability under test is the kite-and-pull cycle —
+# the unit pairing is the vehicle for that test, not the point.
+#
+# The four-policy bar (CLAUDE.md "no defect, no cheat, no draw"):
+#   • stall (observe only)            → LOSS. The raiders are
+#     stance:1 (ReturnFire) — they auto-return fire ONLY after the
+#     heavy shoots them, but a passive stack that never kites is
+#     out-traded and overrun by the closing heavy force → the
+#     survival bar fails and/or the `after_ticks` deadline bites.
+#   • stand-and-fight (attack_move onto the heavy, never retreat)
+#     → LOSS. The heavy cannon out-trades the raider stack head-on;
+#     the raiders die before the heavy's HP runs out → the survival
+#     bar (own_units_gte) fails.
+#   • brute / wrong-path (one attack_move order far east, chase the
+#     heavy with no disengage) → LOSS. Same close-range trade as
+#     stand-and-fight; no kite cycle ⇒ the raiders are overrun.
+#   • intended kite-and-pull (each turn: if the heavy is within ~6
+#     cells, MOVE the raiders AWAY along the lane; else attack_unit
+#     the heavy; repeat) → WIN. The speed advantage keeps the heavy
+#     at the edge of the raiders' fire envelope, whittling it down
+#     across fire-then-retreat cycles while preserving the survival
+#     bar.
+#
+# Topology (rush-hour-arena, 128×40, playable x 2..126, y 2..38):
+#   • Raiders stage centre-west, spread across three cells (not
+#     stacked — a stack pin-piles in retreat).
+#   • The heavy starts centre-east on the MID latitude (y=20) under
+#     the hunt bot so it pursues the raiders' centroid — the agent
+#     must out-pace it, and the hunt advance is what brings the
+#     heavy into vision so the kite cycle has a target.
+#   • Raiders stage OFF the heavy's latitude (a north corridor on
+#     easy/medium) so the kite cycle has a reactive y-axis window.
+#   • Persistent unarmed enemy `fact` far east keeps the episode
+#     alive past the heavy's death so the win/fail evaluator runs
+#     (CLAUDE.md auto-done footgun).
+#
+# Validate (no model / no network):
+#   python3 -m pytest tests/test_combat_kite_and_pull.py -q
+meta:
+  id: combat-kite-and-pull
+  title: 'Combat Micro — Kite and Pull a Slow Heavy Force'
+  capability: action
+  real_world_meaning: >
+    A fast light strike force must destroy a slower, heavier enemy
+    that out-trades it head-on. The only winning play is the
+    hit-and-PULL cycle: each turn, strike the heavy at weapon range,
+    then RETREAT the strike force out of the heavy's lethal
+    close-range window before it can fire back — and repeat. Standing
+    and fighting LOSES: the heavy cannon collapses the light force's
+    HP before its own runs out. The skill being measured is combat
+    micro under a mobility asymmetry — exploit the speed edge by
+    stringing together move-away + attack cycles instead of issuing
+    one beeline charge.
+  robotics_analogue: >
+    A fast/light agent team defeating a slow/heavy adversary by
+    exploiting a mobility asymmetry: a closed-loop evade-then-engage
+    policy rather than a one-shot commit. The per-turn decision is
+    proximity control — stay outside the adversary's lethal radius
+    while delivering effect at standoff range.
+  benchmark_anchor:
+    - "SC2 kiting micro"
+    - "cavalry skirmish doctrine"
+    - "military fire-and-maneuver doctrine"
+    - "economy-of-force"
+  author: openra-bench
+base_map: rush-hour-arena
+base:
+  agent: {faction: allies, cash: 0}
+  # `hunt` bot: the heavy actively PURSUES the raiders' centroid so
+  # the engagement starts on contact (no fog-blind opening) and the
+  # kite cycle has a moving target to pull. A stance:2 heavy left
+  # idle in the fog would never be discoverable; the hunt advance is
+  # what makes the heavy visible AND what the agent must out-pace.
+  enemy: {faction: soviet, cash: 0, bot_type: hunt}
+  tools: [observe, move_units, attack_unit, attack_move, stop]
+  planning: true
+  termination: {max_ticks: 7000}
+  actors: []
+levels:
+  # ── EASY ────────────────────────────────────────────────────────
+  # Bare kite-and-pull skill: 3 medium-tank raiders vs ONE heavy
+  # (3tnk). Raiders stage off the heavy's latitude (north corridor
+  # y=10) so the kite cycle has a reactive window. Survival bar ≥2
+  # raiders. Stall LOSES (ReturnFire raiders never kite → overrun
+  # or deadline). Stand-and-fight / brute LOSE (the heavy
+  # cannon out-trades the stack head-on).
+  easy:
+    description: >
+      Three fast medium-tank raiders (2tnk) stage at the centre-west
+      north corridor (y=10). ONE enemy heavy tank (3tnk) sits
+      centre-east on the mid latitude (x≈80, y=20). The heavy
+      out-trades your raiders at close range — standing and fighting
+      LOSES. The only winning play is the kite-and-PULL cycle: each
+      turn, if the heavy has closed within ~6 cells, MOVE your
+      raiders a few cells AWAY along the lane; otherwise attack_unit
+      the heavy from range; repeat. Kill the heavy and keep at least
+      TWO raiders alive before tick 4500. Stall (observe only) LOSES:
+      your raiders only return fire and never open an engagement.
+      Standing still or bruting east with no disengage LOSES: the
+      heavy cannon collapses the stack.
+    overrides:
+      actors:
+        # RAIDERS — 3 medium tanks, stance:1 (ReturnFire): they
+        # return fire when shot but never open an engagement or
+        # advance on their own, so the agent must drive the kite
+        # cycle. Spread across three cells (a stack pin-piles in
+        # retreat).
+        - {type: 2tnk, owner: agent, position: [28,  9], stance: 1}
+        - {type: 2tnk, owner: agent, position: [30, 10], stance: 1}
+        - {type: 2tnk, owner: agent, position: [28, 11], stance: 1}
+        # THE HEAVY — soviet 3tnk under the hunt bot: it pursues
+        # the raiders' centroid. Mid latitude staging.
+        - {type: 3tnk, owner: enemy, position: [80, 20], stance: 2}
+        # Persistent unarmed far-east enemy marker — anti-DRAW.
+        - {type: fact, owner: enemy, position: [124, 20]}
+    win_condition:
+      all_of:
+        - {units_killed_gte: 1}
+        - {own_units_gte: 2}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 2}}
+    max_turns: 51
+  # ── MEDIUM ──────────────────────────────────────────────────────
+  # +1 controlled variable: the survival bar tightens to ALL THREE
+  # raiders (`own_units_gte:3`). The kite-and-pull must be PERFECT —
+  # a sloppy cycle that lets the heavy land even one cannon shot
+  # trades a raider and busts the bar. Same single-heavy diagonal-lag
+  # geometry as easy (two heavies are unkiteable by a 3-raider force
+  # in the engine combat sheet — verified — so the medium escalation
+  # is bar tightness, not enemy count).
+  medium:
+    description: >
+      Three medium-tank raiders (2tnk) stage at the centre-west
+      north corridor (y=10). ONE enemy heavy tank (3tnk) starts
+      centre-east on the mid latitude (x≈80, y=20) and HUNTS your
+      raiders. The heavy out-trades your raiders head-on. Win by
+      kiting: each turn, if the heavy is within ~6 cells MOVE your
+      raiders AWAY along the lane, else attack_unit the heavy;
+      repeat. Kill the heavy and keep ALL THREE raiders alive
+      before tick 4500 — a kite that lets the heavy land even one
+      cannon shot trades a raider and LOSES. Stall, stand-and-fight,
+      and brute attack_move all LOSE.
+    overrides:
+      actors:
+        - {type: 2tnk, owner: agent, position: [28,  9], stance: 1}
+        - {type: 2tnk, owner: agent, position: [30, 10], stance: 1}
+        - {type: 2tnk, owner: agent, position: [28, 11], stance: 1}
+        # ONE heavy on the mid latitude.
+        - {type: 3tnk, owner: enemy, position: [80, 20], stance: 2}
+        - {type: fact, owner: enemy, position: [124, 20]}
+    win_condition:
+      all_of:
+        - {units_killed_gte: 1}
+        - {own_units_gte: 3}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 3}}
+    max_turns: 51
+  # ── HARD ────────────────────────────────────────────────────────
+  # +2 controlled variables vs medium:
+  #   1. Tighter deadline (~3600 ticks) — the kite cadence must be
+  #      efficient: dawdle and the clock LOSES.
+  #   2. TWO agent spawn_point groups (NORTH y=10 / SOUTH y=30
+  #      corridor) round-robined by seed, so the pull vector flips
+  #      per seed and a memorised "always retreat on y=10" opening
+  #      cannot generalise. The heavy sits on the mid latitude
+  #      (y=20) between the corridors so both spawns face a
+  #      symmetric engagement geometry. The all-three survival bar
+  #      carries over from medium.
+  hard:
+    description: >
+      Three medium-tank raiders (2tnk) stage at ONE of two
+      centre-west corridors (NORTH y=10 OR SOUTH y=30, chosen by
+      seed). ONE enemy heavy tank (3tnk) starts centre-east on the
+      mid latitude (y=20) between the two corridors and HUNTS your
+      raiders. The heavy out-trades your raiders head-on; the only
+      winning play is the kite-and-PULL cycle — when the heavy
+      closes within ~6 cells MOVE your raiders AWAY along your lane,
+      else attack_unit the heavy; repeat. Kill the heavy and keep
+      ALL THREE raiders alive before tick 3600. Stall,
+      stand-and-fight, and brute attack_move all LOSE. The start
+      corridor varies by seed so a memorised opening cannot
+      generalise.
+    overrides:
+      actors:
+        # spawn_point 0 — NORTH corridor (y=10)
+        - {type: 2tnk, owner: agent, position: [28,  9], stance: 1, spawn_point: 0}
+        - {type: 2tnk, owner: agent, position: [30, 10], stance: 1, spawn_point: 0}
+        - {type: 2tnk, owner: agent, position: [28, 11], stance: 1, spawn_point: 0}
+        # spawn_point 1 — SOUTH corridor (y=30)
+        - {type: 2tnk, owner: agent, position: [28, 29], stance: 1, spawn_point: 1}
+        - {type: 2tnk, owner: agent, position: [30, 30], stance: 1, spawn_point: 1}
+        - {type: 2tnk, owner: agent, position: [28, 31], stance: 1, spawn_point: 1}
+        # One heavy centred on the mid latitude — symmetric
+        # engagement geometry from either spawn corridor.
+        - {type: 3tnk, owner: enemy, position: [80, 20], stance: 2}
+        - {type: fact, owner: enemy, position: [124, 20]}
+    win_condition:
+      all_of:
+        - {units_killed_gte: 1}
+        - {own_units_gte: 3}
+        - {within_ticks: 3600}
+    fail_condition:
+      any_of:
+        - {after_ticks: 3601}
+        - {not: {own_units_gte: 3}}
+    max_turns: 41
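
The `win_condition` / `fail_condition` trees in this pack can be read as nested boolean predicates. The sketch below is a hypothetical stand-in for the real evaluator in `openra_bench.scenarios.win_conditions`, which this diff does not show: `holds` is an invented name, and treating `within_ticks` as `tick <= N` and `after_ticks` as `tick >= N` is an assumption consistent with the `within_ticks + 1 == after_ticks` check in the tests.

```python
from typing import Any, Dict


def holds(cond: Dict[str, Any], tick: int, killed: int, own_units: int) -> bool:
    """Evaluate one condition node against a snapshot of the episode."""
    (key, arg), = cond.items()  # each node is a single-key mapping
    if key == "all_of":
        return all(holds(c, tick, killed, own_units) for c in arg)
    if key == "any_of":
        return any(holds(c, tick, killed, own_units) for c in arg)
    if key == "not":
        return not holds(arg, tick, killed, own_units)
    if key == "units_killed_gte":
        return killed >= arg
    if key == "own_units_gte":
        return own_units >= arg
    if key == "within_ticks":
        return tick <= arg  # assumed inclusive deadline
    if key == "after_ticks":
        return tick >= arg  # assumed to fire from this tick onward
    raise ValueError(f"unknown predicate: {key}")


# The medium-tier conditions, copied from the pack above.
MEDIUM_WIN = {"all_of": [{"units_killed_gte": 1},
                         {"own_units_gte": 3},
                         {"within_ticks": 4500}]}
MEDIUM_FAIL = {"any_of": [{"after_ticks": 4501},
                          {"not": {"own_units_gte": 3}}]}

print(holds(MEDIUM_WIN, tick=1000, killed=1, own_units=3))   # perfect pull
print(holds(MEDIUM_FAIL, tick=1000, killed=1, own_units=2))  # one raider traded
```

Under these assumed semantics, every tick up to 4500 can still win and tick 4501 onward fails, so there is no tick at which a non-finisher is neither winning nor losing.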

tests/test_combat_kite_and_pull.py ADDED

	@@ -0,0 +1,319 @@

+"""combat-kite-and-pull — ACTION capability validation.
+Kiting micro: a fast light strike force must hit-and-PULL a slow
+heavy enemy — strike at weapon range, retreat out of the heavy's
+lethal close-range window before it can fire back, repeat. Standing
+and fighting LOSES (the heavy cannon out-trades the raider stack
+head-on); only the move-away + attack_unit kite cycle WINS.
+Bar (CLAUDE.md "no defect, no cheat, no draw"):
+  * stall (observe-only) LOSES every tier / every hard seed — a
+    passive ReturnFire stack that never kites is overrun by the
+    hunting heavy → the survival bar fails / the deadline bites.
+  * stand-and-fight (attack_move onto the heavy, never retreat)
+    LOSES every tier / seed — the heavy cannon collapses the stack
+    head-on.
+  * brute / wrong-path (one attack_move far east, no disengage)
+    LOSES every tier / seed — same close-range trade.
+  * intended kite-and-pull (retreat when the heavy closes within
+    ~7 cells, else attack_unit) WINS every tier / every hard seed,
+    preserving ALL THREE raiders (own_units_gte:3 on medium/hard).
+  * hard tier defines ≥2 agent spawn_point groups (NORTH y=10 /
+    SOUTH y=30 corridor) round-robined by seed so a memorised
+    opening cannot generalise.
+"""
+from __future__ import annotations
+from pathlib import Path
+import pytest
+pytest.importorskip("openra_train", reason="Rust env wheel not installed")
+pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+from openra_bench.eval_core import run_level
+from openra_bench.scenarios import load_pack
+from openra_bench.scenarios.loader import PACKS_DIR, compile_level
+from openra_bench.scenarios.win_conditions import WinContext, evaluate
+PACK = PACKS_DIR / "combat-kite-and-pull.yaml"
+LEVELS = ("easy", "medium", "hard")
+SEEDS = (1, 2, 3, 4)
+# ── scripted policies ───────────────────────────────────────────────
+def _raiders(rs):
+    return [u for u in rs.get("units_summary", []) if u.get("type") == "2tnk"]
+def _stall(rs, C):
+    """Observe-only. A passive ReturnFire stack that never kites is
+    overrun by the hunting heavy → LOSS."""
+    return [C.observe()]
+def _stand(rs, C):
+    """Stand-and-fight: attack_move straight onto the heavy and never
+    retreat. The heavy cannon out-trades the stack head-on → LOSS."""
+    own = _raiders(rs)
+    if not own:
+        return [C.observe()]
+    return [C.attack_move([str(u["id"]) for u in own], target_x=81, target_y=20)]
+def _brute(rs, C):
+    """Brute / wrong-path: one attack_move far east, no disengage.
+    Same close-range trade as stand-and-fight → LOSS."""
+    own = _raiders(rs)
+    if not own:
+        return [C.observe()]
+    return [
+        C.attack_move(
+            [str(u["id"]) for u in own], target_x=120, target_y=own[0]["cell_y"]
+        )
+    ]
+def _kite(rs, C):
+    """Intended kite-and-pull: each turn, if the heavy has closed
+    within ~7 cells of a raider, MOVE that raider ~10 cells AWAY
+    along its lane (the PULL); otherwise attack_unit the heavy from
+    range (the STRIKE). The cycle is purely reactive — derived each
+    turn from geometry, no memory."""
+    own = _raiders(rs)
+    if not own:
+        return [C.observe()]
+    enemies = rs.get("enemy_summary") or []
+    heavies = [e for e in enemies if (e.get("type") or "").lower() == "3tnk"]
+    cmds = []
+    if heavies:
+        for u in own:
+            t = min(
+                heavies,
+                key=lambda e: abs(e["cell_x"] - u["cell_x"])
+                + abs(e["cell_y"] - u["cell_y"]),
+            )
+            d = abs(u["cell_x"] - t["cell_x"]) + abs(u["cell_y"] - t["cell_y"])
+            if d <= 7:
+                cmds.append(
+                    C.move_units(
+                        [str(u["id"])],
+                        target_x=max(4, u["cell_x"] - 10),
+                        target_y=u["cell_y"],
+                    )
+                )
+            else:
+                cmds.append(C.attack_unit([str(u["id"])], str(t["id"])))
+    else:
+        # No vision yet — march east on the staging lane until the
+        # hunting heavy comes into sight.
+        cmds.append(
+            C.move_units(
+                [str(u["id"]) for u in own],
+                target_x=min(70, own[0]["cell_x"] + 10),
+                target_y=own[0]["cell_y"],
+            )
+        )
+    return cmds
+# ── structural tests ────────────────────────────────────────────────
+def test_pack_loads_and_meta_action():
+    pack = load_pack(PACK)
+    assert pack.meta.id == "combat-kite-and-pull"
+    assert pack.meta.capability == "action"
+    assert pack.meta.real_world_meaning
+    assert pack.meta.robotics_analogue
+    anchors = " ".join(pack.meta.benchmark_anchor).lower()
+    assert "sc2 kiting micro" in anchors, anchors
+    assert "cavalry skirmish doctrine" in anchors, anchors
+def test_enemy_uses_hunt_bot_on_every_level():
+    """The heavy must HUNT — a stance:2 heavy idle in fog would never
+    be discoverable; the hunt advance brings it into vision."""
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        c = compile_level(pack, lvl)
+        assert c.map_supported, f"{lvl}: rush-hour-arena terrain required"
+        enemy = c.scenario.enemy
+        bot = getattr(enemy, "bot_type", None) or getattr(enemy, "bot", None)
+        assert str(bot).lower() == "hunt", f"{lvl}: enemy bot must be 'hunt'; got {bot}"
+def test_tools_are_combat_only():
+    pack = load_pack(PACK)
+    tools = set(pack.base.get("tools", []) if isinstance(pack.base, dict) else [])
+    for required in ("move_units", "attack_unit", "attack_move", "stop"):
+        assert required in tools, f"missing tool: {required!r}"
+    assert "build" not in tools, "this is a combat-micro pack — no build tool"
+def test_every_level_has_reachable_timeout_fail():
+    """`after_ticks` fail must bite within max_turns; within_ticks+1
+    == after_ticks so a boundary non-finisher LOSES, not draws."""
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        L = pack.levels[lvl]
+        ceiling = 93 + 90 * (L.max_turns - 1)  # highest tick reachable within max_turns
+        wt = next(
+            int(c["within_ticks"])
+            for c in L.win_condition.model_dump()["all_of"]
+            if "within_ticks" in c
+        )
+        ft = next(
+            int(c["after_ticks"])
+            for c in L.fail_condition.model_dump()["any_of"]
+            if "after_ticks" in c
+        )
+        assert wt < ceiling, f"{lvl}: within_ticks {wt} >= ceiling {ceiling}"
+        assert ft <= ceiling, f"{lvl}: after_ticks {ft} > ceiling {ceiling}"
+        assert wt + 1 == ft, f"{lvl}: within/after mismatch {wt}/{ft}"
+def test_every_level_has_a_fail_condition():
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        c = compile_level(pack, lvl)
+        assert c.fail_condition is not None, f"{lvl} needs a fail_condition"
+def test_medium_and_hard_require_all_three_raiders():
+    """The tightened pull bar: medium/hard win only if ALL THREE
+    raiders survive (own_units_gte:3)."""
+    pack = load_pack(PACK)
+    for lvl in ("medium", "hard"):
+        L = pack.levels[lvl]
+        bar = next(
+            int(c["own_units_gte"])
+            for c in L.win_condition.model_dump()["all_of"]
+            if "own_units_gte" in c
+        )
+        assert bar == 3, f"{lvl}: survival bar must be 3; got {bar}"
+def test_hard_has_two_seed_driven_spawn_groups():
+    c = compile_level(load_pack(PACK), "hard")
+    sp = {
+        (a.spawn_point if a.spawn_point is not None else 0)
+        for a in c.scenario.actors
+        if a.owner == "agent"
+    }
+    assert sp == {0, 1}, f"hard must define spawn_point groups {{0,1}}; got {sorted(sp)}"
+def test_in_bounds_actors_on_every_level():
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        c = compile_level(pack, lvl)
+        for a in c.scenario.actors:
+            x, y = a.position
+            assert 2 <= x <= 126 and 2 <= y <= 38, (
+                f"{lvl}: actor {a.type} at ({x},{y}) out of bounds"
+            )
+# ── predicate-level (no engine) ─────────────────────────────────────
+def _ctx(*, tick=0, killed=0, n_units=3):
+    import types
+    sig = types.SimpleNamespace(
+        game_tick=tick,
+        units_killed=killed,
+        units_lost=3 - n_units,
+        own_buildings=[],
+        own_building_types=set(),
+        enemies_seen_ids=set(),
+        enemy_buildings_seen_ids=set(),
+    )
+    return WinContext(
+        signals=sig,
+        render_state={
+            "units_summary": [
+                {"cell_x": 28, "cell_y": 10} for _ in range(n_units)
+            ]
+        },
+    )
+def test_predicates_enforce_kill_and_survival():
+    pe = compile_level(load_pack(PACK), "easy")
+    # easy: kill 1, ≥2 alive, in time → WIN
+    assert evaluate(pe.win_condition, _ctx(tick=1000, killed=1, n_units=2))
+    # easy: kill 0 → not win
+    assert not evaluate(pe.win_condition, _ctx(tick=1000, killed=0, n_units=3))
+    # easy: 1 raider left → fail (need ≥2)
+    assert evaluate(pe.fail_condition, _ctx(tick=1000, killed=1, n_units=1))
+    pm = compile_level(load_pack(PACK), "medium")
+    # medium: all 3 alive + kill → WIN
+    assert evaluate(pm.win_condition, _ctx(tick=1000, killed=1, n_units=3))
+    # medium: only 2 alive → not win, and fail fires
+    assert not evaluate(pm.win_condition, _ctx(tick=1000, killed=1, n_units=2))
+    assert evaluate(pm.fail_condition, _ctx(tick=1000, killed=1, n_units=2))
+    # medium: past deadline → fail
+    assert evaluate(pm.fail_condition, _ctx(tick=4502, killed=0, n_units=3))
+# ── engine-driven: every lazy/wrong policy LOSES, intended WINS ──────
+@pytest.mark.parametrize("level", LEVELS)
+@pytest.mark.parametrize("seed", SEEDS)
+def test_stall_loses_every_tier_and_seed(level, seed):
+    c = compile_level(load_pack(PACK), level)
+    r = run_level(c, _stall, seed=seed)
+    assert r.outcome == "loss", (
+        f"{level}/seed{seed}: stall must LOSE; got {r.outcome} "
+        f"killed={r.signals.units_killed} lost={r.signals.units_lost}"
+    )
+@pytest.mark.parametrize("level", LEVELS)
+@pytest.mark.parametrize("seed", SEEDS)
+def test_stand_and_fight_loses_every_tier_and_seed(level, seed):
+    c = compile_level(load_pack(PACK), level)
+    r = run_level(c, _stand, seed=seed)
+    assert r.outcome == "loss", (
+        f"{level}/seed{seed}: stand-and-fight must LOSE; got {r.outcome} "
+        f"killed={r.signals.units_killed} lost={r.signals.units_lost}"
+    )
+@pytest.mark.parametrize("level", LEVELS)
+@pytest.mark.parametrize("seed", SEEDS)
+def test_brute_loses_every_tier_and_seed(level, seed):
+    c = compile_level(load_pack(PACK), level)
+    r = run_level(c, _brute, seed=seed)
+    assert r.outcome == "loss", (
+        f"{level}/seed{seed}: brute attack_move must LOSE; got {r.outcome} "
+        f"killed={r.signals.units_killed} lost={r.signals.units_lost}"
+    )
+@pytest.mark.parametrize("level", LEVELS)
+@pytest.mark.parametrize("seed", SEEDS)
+def test_kite_wins_every_tier_and_seed(level, seed):
+    c = compile_level(load_pack(PACK), level)
+    r = run_level(c, _kite, seed=seed)
+    assert r.outcome == "win", (
+        f"{level}/seed{seed}: kite-and-pull must WIN; got {r.outcome} "
+        f"killed={r.signals.units_killed} lost={r.signals.units_lost}"
+    )
+def test_kite_run_is_deterministic_per_seed():
+    c = compile_level(load_pack(PACK), "medium")
+    a = run_level(c, _kite, seed=2)
+    b = run_level(c, _kite, seed=2)
+    assert (a.outcome, a.turns, a.signals.units_killed) == (
+        b.outcome, b.turns, b.signals.units_killed
+    )
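
The timeout-reachability check in `test_every_level_has_reachable_timeout_fail` reduces to simple arithmetic, worked through below for the three tiers. The constants 93 and 90 are taken from that test; reading them as "ticks elapsed at the first turn" and "ticks per subsequent turn" is an assumption, and `tick_ceiling` is a hypothetical helper name.

```python
def tick_ceiling(max_turns: int, first_turn_ticks: int = 93,
                 ticks_per_turn: int = 90) -> int:
    """Highest game tick an episode can reach within max_turns (assumed model)."""
    return first_turn_ticks + ticks_per_turn * (max_turns - 1)


# (level, max_turns, within_ticks, after_ticks) as declared in the pack.
TIERS = [("easy", 51, 4500, 4501),
         ("medium", 51, 4500, 4501),
         ("hard", 41, 3600, 3601)]

for level, max_turns, within, after in TIERS:
    ceiling = tick_ceiling(max_turns)
    # The deadline must be reachable, and win/fail must meet with no gap.
    assert within < ceiling and after <= ceiling and within + 1 == after
    print(level, ceiling, ceiling - after)
```

With these numbers the ceilings come out to 4593 (easy, medium) and 3693 (hard), leaving 92 ticks of slack past each `after_ticks` deadline, roughly one turn.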

tests/test_hard_tier.py CHANGED

@@ -1448,6 +1448,7 @@ UPGRADED = [
     "econ-quantitative-vs-qualitative-spend",  # hard: 2 agent spawn_point groups
     "def-tower-line-vs-cluster",  # hard: 2 agent spawn_point groups
     "coord-cover-and-move",  # hard: 2 agent spawn_point groups
+    "combat-kite-and-pull",  # hard: 2 agent spawn_point groups (Wave-12)
 ]
 # Consciously NOT spawn-varied, with the reason (keeps the curation
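
The "2 agent spawn_point groups" noted for this pack are described as round-robined by seed. The diff does not show how the loader maps a seed to a group, so the sketch below assumes a plain `seed % n_groups` rule; `select_actors` and the dict-based actor shape are hypothetical, with positions copied from the hard tier.

```python
from typing import Any, Dict, List

Actor = Dict[str, Any]


def select_actors(actors: List[Actor], seed: int) -> List[Actor]:
    """Keep one agent spawn group (chosen by seed) plus all ungrouped actors."""
    groups = sorted({a["spawn_point"] for a in actors
                     if a["owner"] == "agent" and a.get("spawn_point") is not None})
    if not groups:
        return list(actors)
    chosen = groups[seed % len(groups)]  # assumed round-robin rule
    return [a for a in actors if a.get("spawn_point") in (None, chosen)]


NORTH = [(28, 9), (30, 10), (28, 11)]
SOUTH = [(28, 29), (30, 30), (28, 31)]
HARD_ACTORS: List[Actor] = (
    [{"type": "2tnk", "owner": "agent", "position": [x, y], "spawn_point": 0}
     for x, y in NORTH]
    + [{"type": "2tnk", "owner": "agent", "position": [x, y], "spawn_point": 1}
       for x, y in SOUTH]
    + [{"type": "3tnk", "owner": "enemy", "position": [80, 20]},
       {"type": "fact", "owner": "enemy", "position": [124, 20]}]
)

for seed in (1, 2, 3, 4):
    rows = sorted(a["position"][1] for a in select_actors(HARD_ACTORS, seed)
                  if a["owner"] == "agent")
    print(seed, rows)
```

Under this assumed rule, the four test seeds alternate between the south and north corridors, so a retreat vector memorised for one lane is exercised against the other.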