Spaces: qpluslab/OpenRA-Bench

yxc20098 committed on May 20

Commit 34333cc (1 parent: ee008d0)

feat(scenario): build-sequence-tech-fastest — fastest weap-tech BO (PlanBench cost-optimal anchor)

Files changed (3)

openra_bench/scenarios/packs/build-sequence-tech-fastest.yaml +235 -0
tests/test_build_sequence_tech_fastest.py +342 -0
tests/test_hard_tier.py +123 -0

openra_bench/scenarios/packs/build-sequence-tech-fastest.yaml ADDED

	@@ -0,0 +1,235 @@

+# build-sequence-tech-fastest.yaml
+#
+# REASONING capability — Wave-7 build-order optimization (cost-OPTIMAL
+# planning). The agent must reach `weap` (war factory) within the
+# tightest possible tick budget by choosing the correct prerequisite
+# path: powr → proc → weap. Any extra structure (a barracks/tent
+# detour, an unneeded second power plant, idle stalling) overruns the
+# deadline.
+#
+# Engine-verified tech tree (vendor/OpenRA/mods/ra/rules/structures.yaml):
+#   - POWR  cost 300, Prerequisites: <none>      (provides anypower)
+#   - PROC  cost 1400, Prerequisites: anypower   (needs powr)
+#   - WEAP  cost 2000, Prerequisites: proc       (needs proc)
+# Total credits on the optimal path = 3700; starting_cash 5000 leaves
+# 1300 credits of slack. That covers a wasted tent (500) in credits,
+# but the detour's extra build time misses every level's deadline.
+# The Wave-2 `then:` happened-before composite is the load-bearing
+# teeth: clauses [powr, proc, weap] latch in order; a policy that
+# places weap before proc cannot satisfy the chain (the engine
+# refuses too, since weap's prereq is proc).
+#
+# Measured optimal timing (rush-hour-arena, fact pre-placed at
+# (10,18), seed 1, scripted intended policy):
+#   - powr completes  ≈ tick  273  (turn  3)
+#   - proc completes  ≈ tick 1263  (turn 14)
+#   - weap completes  ≈ tick 2613  (turn 29)
+# Measured WRONG-PATH timing (powr → tent → proc → weap):
+#   - weap completes  ≈ tick 3063  (turn 34) — 5 turns / 450 ticks
+#     slower than optimal. The deadline must fall INSIDE this gap.
+#
+# Bar (CLAUDE.md):
+#   - stall (observe-only)          ⇒ LOSS on every level + seed
+#   - build-tent-first wrong path   ⇒ LOSS on every level + seed
+#   - intended powr→proc→weap path  ⇒ WIN  on every level + seed
+# Real LOSS not DRAW: fail_condition `after_ticks: T+1` reachable
+# inside max_turns (engine ~90 ticks/turn ⇒ tick ≤ 93+90·(N-1)). The
+# pre-placed enemy `fact` at the far east is a MustBeDestroyed
+# landmark that keeps the episode alive (no premature engine
+# auto-done from eliminating a stray sentry).
+#
+# Real-world anchor:
+#   - PlanBench cost-optimal planning (find the minimum-cost plan
+#     that achieves the goal, not just A plan)
+#   - Manufacturing BOM-optimal ramp / critical-path scheduling
+#     (build only what the next stage requires; do not bloat the
+#     bill of materials)
+#
+# Validate:
+#   cd /Users/berta/Projects/OpenRA-Bench && \
+#   python3 -m pytest tests/test_build_sequence_tech_fastest.py -q
+meta:
+  id: build-sequence-tech-fastest
+  title: 'Fastest War Factory — Cost-Optimal powr → proc → weap Build Order'
+  capability: reasoning
+  real_world_meaning: >
+    Cost-optimal build-order planning under a tight deadline: the agent
+    must reach the war factory (`weap`) on the shortest prerequisite
+    path (powr → proc → weap). Any detour through unneeded structures
+    (a barracks, a second power plant, an early infantry training
+    queue) bloats the bill-of-materials and overruns the budget. Tests
+    that the model can plan the minimum-cost prerequisite chain — not
+    merely SOME plan that eventually arrives — under a deadline that
+    only the optimal plan satisfies.
+  robotics_analogue: >
+    Critical-path planning in autonomous manufacturing: a cell must
+    bring a target machine online by a fixed cycle-time, choosing the
+    minimum set of upstream stations to commission first (power →
+    feedstock → assembly). Adding non-load-bearing stations to the
+    ramp-up plan (a non-required quality station before assembly)
+    blows the deadline; only the cost-optimal precedence chain meets
+    spec.
+  benchmark_anchor:
+    - "PlanBench cost-optimal"
+    - "BOM manufacturing"
+  author: openra-bench
+base_map: rush-hour-arena
+base:
+  agent:
+    faction: allies
+  enemy:
+    faction: soviet
+    bot_type: ''
+  tools:
+    - observe
+    - build
+    - place_building
+  planning: true
+  termination:
+    max_ticks: 40000
+levels:
+  # ── EASY ─────────────────────────────────────────────────────────
+  # Bare cost-optimal skill. Generous T = 3000 ticks (max_turns 40 →
+  # reachable 3603). Optimal path lands at ~tick 2613 (387-tick /
+  # ~4-turn buffer). The wrong-path detour through tent (+500 cost,
+  # +5 turns) finishes at ~tick 3063, 63 ticks past T ⇒ LOSS. A stall
+  # never builds weap ⇒ LOSS on the after_ticks fail clause.
+  easy:
+    description: >
+      Build a war factory (weap) as fast as possible by following the
+      ONLY cost-optimal prerequisite chain: powr → proc → weap. Any
+      detour (a barracks/tent, a redundant power plant, an early
+      infantry training queue) wastes the budget and you LOSE on the
+      clock. The `then:` chain enforces the exact order — placing
+      weap before proc cannot satisfy it (and the engine refuses too:
+      weap's prerequisite is proc). Optimal play finishes by tick
+      ~2613; the deadline is 3000.
+    starting_cash: 5000
+    overrides:
+      actors:
+        # Agent base seed — ONE construction yard. Nothing else
+        # pre-placed (no power, no refinery). The optimal chain MUST
+        # be executed by the agent.
+        - {type: fact, owner: agent, position: [10, 18]}
+        # Two ore patches in the near-base build radius. A built
+        # proc auto-spawns a harvester, so the ore adds income on top
+        # of starting_cash (which already covers the 3700-credit chain).
+        - {type: mine, owner: neutral, position: [22, 18]}
+        - {type: mine, owner: neutral, position: [22, 22]}
+        # Far-east enemy `fact` landmark: MustBeDestroyed and unarmed.
+        # It keeps the episode alive so a stall really times out
+        # (not engine auto-done from a stray sentry kill).
+        - {type: fact, owner: enemy, position: [115, 30]}
+    win_condition:
+      all_of:
+        - then:
+            id: bo-easy
+            clauses:
+              - {has_building: powr}
+              - {has_building: proc}
+              - {has_building: weap}
+        - {within_ticks: 3000}
+    fail_condition:
+      any_of:
+        - {after_ticks: 3001}
+        - {not: {building_count_gte: {type: fact, n: 1}}}
+    max_turns: 40
+  # ── MEDIUM ───────────────────────────────────────────────────────
+  # +1 controlled variable: TIGHTER deadline. T = 2800 ticks
+  # (max_turns 35 → reachable 3153). Optimal play lands at ~tick
+  # 2613 (187-tick / ~2-turn buffer, still feasible). The wrong-path
+  # detour through tent overruns hard (3063 > 2800 by 263 ticks,
+  # ~3 turns). No additional pieces: the SAME cost-optimal chain,
+  # executed with less slack.
+  medium:
+    description: >
+      Build a war factory (weap) on the cost-optimal prerequisite
+      chain: powr → proc → weap. Tighter deadline (2800 ticks) — any
+      detour (tent / second powr / infantry queue) makes you miss.
+      The `then:` chain enforces the exact order; weap before proc
+      cannot satisfy it. Optimal play finishes by tick ~2613.
+    starting_cash: 5000
+    overrides:
+      actors:
+        - {type: fact, owner: agent, position: [10, 18]}
+        - {type: mine, owner: neutral, position: [22, 18]}
+        - {type: mine, owner: neutral, position: [22, 22]}
+        - {type: fact, owner: enemy, position: [115, 30]}
+    win_condition:
+      all_of:
+        - then:
+            id: bo-medium
+            clauses:
+              - {has_building: powr}
+              - {has_building: proc}
+              - {has_building: weap}
+        - {within_ticks: 2800}
+    fail_condition:
+      any_of:
+        - {after_ticks: 2801}
+        - {not: {building_count_gte: {type: fact, n: 1}}}
+    max_turns: 35
+  # ── HARD ─────────────────────────────────────────────────────────
+  # +1 controlled variable: ≥2 spawn_point groups (NORTH y=14 vs
+  # SOUTH y=26 base). Same cost-optimal chain, same tight T = 2800.
+  # The seed-varied spawn means a memorised "place powr at (14,22)"
+  # opening cannot generalise — the agent must compute placement
+  # relative to its actual fact each seed. Ore patches duplicated
+  # at both latitudes so harv income is symmetric per spawn. Enemy
+  # actors do NOT honour spawn_point (CLAUDE.md), so the lone
+  # enemy `fact` always places.
+  hard:
+    description: >
+      Build a war factory (weap) on the cost-optimal prerequisite
+      chain: powr → proc → weap, from a seed-chosen base (NORTH or
+      SOUTH). Tight 2800-tick deadline — detours (tent / extra
+      powr / infantry queue) lose on the clock. Placement that
+      memorises one spawn's geometry cannot generalise; compute
+      placement relative to your actual fact each run.
+    starting_cash: 5000
+    overrides:
+      actors:
+        # NORTH spawn (spawn_point 0): fact at y=14, with adjacent
+        # ore patches at y=14/y=18.
+        - {type: fact, owner: agent, position: [10, 14], spawn_point: 0}
+        # An inert rifleman per spawn group (passive: stance 2 = Defend,
+        # no `move_units` / `attack_unit` exposed so the unit cannot act
+        # — the agent's tool surface is build-only). Establishes a
+        # seed-varying AGENT UNIT in `units_summary` so the hard-tier
+        # spawn-variation contract (tests/test_hard_tier.py::
+        # test_curated_hard_still_compiles_and_runs, which inspects
+        # units not buildings) is satisfied with real per-spawn data.
+        - {type: e1,   owner: agent, position: [12, 14], spawn_point: 0, stance: 2}
+        # SOUTH spawn (spawn_point 1): fact at y=26, with adjacent
+        # ore patches at y=22/y=26.
+        - {type: fact, owner: agent, position: [10, 26], spawn_point: 1}
+        - {type: e1,   owner: agent, position: [12, 26], spawn_point: 1, stance: 2}
+        # Ore patches duplicated at BOTH latitudes so harv income is
+        # symmetric whichever spawn is chosen. (Neutral actors have
+        # no spawn_point and always place — that's fine: the unused
+        # patches are simply ignored.)
+        - {type: mine, owner: neutral, position: [22, 14]}
+        - {type: mine, owner: neutral, position: [22, 18]}
+        - {type: mine, owner: neutral, position: [22, 22]}
+        - {type: mine, owner: neutral, position: [22, 26]}
+        # Far-east enemy fact landmark — keeps the episode alive.
+        - {type: fact, owner: enemy, position: [115, 30]}
+    win_condition:
+      all_of:
+        - then:
+            id: bo-hard
+            clauses:
+              - {has_building: powr}
+              - {has_building: proc}
+              - {has_building: weap}
+        - {within_ticks: 2800}
+    fail_condition:
+      any_of:
+        - {after_ticks: 2801}
+        - {not: {building_count_gte: {type: fact, n: 1}}}
+    max_turns: 35
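
The win rule shared by all three levels (the `then:` chain plus `within_ticks`) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's evaluator: it assumes the chain latches on building-completion events in order, and that `within_ticks` is inclusive (the fail clause fires at T+1). The names `then_chain_latched` and `wins` are hypothetical; the tick values are the measured ones quoted in the pack header.

```python
from typing import Iterable, Sequence

CHAIN = ("powr", "proc", "weap")  # clause order from the pack's `then:` block


def then_chain_latched(completions: Iterable[str],
                       clauses: Sequence[str] = CHAIN) -> bool:
    """Return True if the clauses were completed in order.

    Assumption: a clause can only latch once every earlier clause has
    latched, and it latches on a completion event (hypothetical model).
    """
    idx = 0
    for building in completions:
        if idx < len(clauses) and building == clauses[idx]:
            idx += 1
    return idx == len(clauses)


def wins(completions: Iterable[str], weap_tick: int, deadline: int) -> bool:
    """Win = ordered chain latched AND weap finished by the deadline."""
    return then_chain_latched(completions) and weap_tick <= deadline


# Optimal path: chain latches and weap lands at tick 2613.
print(wins(["powr", "proc", "weap"], 2613, 2800))
# Tent detour: the ORDER is still satisfied, but the clock is not.
print(then_chain_latched(["powr", "tent", "proc", "weap"]))
print(wins(["powr", "tent", "proc", "weap"], 3063, 3000))
# weap before proc never latches the final clause under this model.
print(then_chain_latched(["powr", "weap", "proc"]))
```

Under this model the tent-first policy fails only on the deadline, which matches the pack's framing of `within_ticks` as the cost-optimality check and the `then:` chain as the ordering check.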

tests/test_build_sequence_tech_fastest.py ADDED

	@@ -0,0 +1,342 @@

+"""build-sequence-tech-fastest pack: full no-cheat validation on Rust.
+
+Wave-7 REASONING: cost-optimal build-order planning. The agent must
+reach the war factory (`weap`) on the SHORTEST prerequisite chain:
+
+    powr → proc → weap
+
+Any detour (build a barracks/tent first, or a redundant power plant,
+or an early infantry queue) overruns the tight tick budget and loses.
+The chain is enforced by the Wave-2 `then:` happened-before composite;
+the deadline (`within_ticks`) is the cost-optimality teeth: slack is
+tuned so the OPTIMAL plan fits and the tent-detour plan does NOT.
+
+Bar (CLAUDE.md): the intended cost-optimal policy WINS on every
+(level, seed); stall and the tent-first wrong-path policy LOSE on
+every (level, seed). Real LOSS not DRAW: `fail after_ticks:T+1`
+inside max_turns is the bite.
+
+Scenario shape:
+  - rush-hour-arena, allies vs soviet (bot disabled).
+  - easy:   T=3000, max_turns=40, generous (4-turn buffer).
+  - medium: T=2800, max_turns=35, tight (≈2-turn buffer).
+  - hard:   T=2800, max_turns=35, same tight T + ≥2 spawn_point
+            groups (NORTH y=14 / SOUTH y=26 base, round-robined).
+
+Measured optimal timing (seed 1, scripted intended policy):
+  powr completes ≈ tick  273 (turn  3)
+  proc completes ≈ tick 1263 (turn 14)
+  weap completes ≈ tick 2613 (turn 29)
+
+Measured tent-first wrong-path timing:
+  weap completes ≈ tick 3063 (turn 34), beyond every level's T.
+"""
+from __future__ import annotations
+
+import pytest
+
+pytest.importorskip("openra_train", reason="Rust env wheel not installed")
+pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+
+from openra_bench.eval_core import run_level
+from openra_bench.scenarios import load_pack
+from openra_bench.scenarios.loader import PACKS_DIR, compile_level
+
+PACK = PACKS_DIR / "build-sequence-tech-fastest.yaml"
+LEVELS = ("easy", "medium", "hard")
+SEEDS = (1, 2, 3, 4)
+
+
+# ── Policies ──────────────────────────────────────────────────────
+def _stall_policy():
+    """Do nothing — must LOSE on the clock on every level/seed."""
+    def pol(obs, Cmd):
+        return [Cmd.observe()]
+    return pol
+def _intended_policy():
+    """Cost-optimal play: build powr → proc → weap, each one placed
+    relative to the agent's actual fact (so the policy generalises
+    across the hard-tier spawn variation). This is the policy the
+    pack is solvable by — must WIN on every (level, seed)."""
+    milestone = {"powr": False, "proc": False, "weap": False}
+    def pol(obs, Cmd):
+        ob = obs.get("own_buildings", []) or []
+        own_b = {b["type"] for b in ob}
+        prod = obs.get("production", []) or []
+        for b in ("powr", "proc", "weap"):
+            if b in own_b:
+                milestone[b] = True
+        cmds = []
+        base = [b for b in ob if b["type"] == "fact"]
+        if not milestone["powr"]:
+            if "powr" not in prod:
+                cmds.append(Cmd.build("powr"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "powr", base[0]["cell_x"] + 4, base[0]["cell_y"]
+                ))
+        elif not milestone["proc"]:
+            if "proc" not in prod:
+                cmds.append(Cmd.build("proc"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "proc", base[0]["cell_x"] + 6, base[0]["cell_y"] + 3
+                ))
+        elif not milestone["weap"]:
+            if "weap" not in prod:
+                cmds.append(Cmd.build("weap"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "weap", base[0]["cell_x"] + 8, base[0]["cell_y"]
+                ))
+        if not cmds:
+            cmds.append(Cmd.observe())
+        return cmds
+    return pol
+def _tent_first_policy():
+    """Wrong cost-non-optimal play: powr → tent → proc → weap. The
+    tent is not on the prerequisite chain for weap (only proc is); it
+    bloats the BOM by 500 credits and ~5 turns. Must LOSE on the
+    clock on every level/seed."""
+    milestone = {"powr": False, "tent": False, "proc": False, "weap": False}
+    def pol(obs, Cmd):
+        ob = obs.get("own_buildings", []) or []
+        own_b = {b["type"] for b in ob}
+        prod = obs.get("production", []) or []
+        for b in ("powr", "tent", "proc", "weap"):
+            if b in own_b:
+                milestone[b] = True
+        cmds = []
+        base = [b for b in ob if b["type"] == "fact"]
+        if not milestone["powr"]:
+            if "powr" not in prod:
+                cmds.append(Cmd.build("powr"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "powr", base[0]["cell_x"] + 4, base[0]["cell_y"]
+                ))
+        elif not milestone["tent"]:
+            if "tent" not in prod:
+                cmds.append(Cmd.build("tent"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "tent", base[0]["cell_x"] + 4, base[0]["cell_y"] + 3
+                ))
+        elif not milestone["proc"]:
+            if "proc" not in prod:
+                cmds.append(Cmd.build("proc"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "proc", base[0]["cell_x"] + 6, base[0]["cell_y"] + 3
+                ))
+        elif not milestone["weap"]:
+            if "weap" not in prod:
+                cmds.append(Cmd.build("weap"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "weap", base[0]["cell_x"] + 8, base[0]["cell_y"]
+                ))
+        if not cmds:
+            cmds.append(Cmd.observe())
+        return cmds
+    return pol
+# ── Pack-shape tests (cheap; do not run the engine) ──────────────
+def test_pack_compiles_with_three_levels():
+    pack = load_pack(PACK)
+    assert pack.meta.id == "build-sequence-tech-fastest"
+    assert pack.meta.capability == "reasoning"
+    assert set(pack.levels) == {"easy", "medium", "hard"}
+def test_meta_benchmark_anchor_set():
+    """Required by the seed taxonomy: PlanBench cost-optimal +
+    BOM manufacturing critical-path planning."""
+    pack = load_pack(PACK)
+    anchors = pack.meta.benchmark_anchor or []
+    assert any("PlanBench" in a for a in anchors), anchors
+    assert any("BOM" in a for a in anchors), anchors
+def test_hard_tier_has_seed_driven_spawn_groups():
+    """Hard must define ≥2 agent spawn_point groups so seed varies
+    the start base (tests/test_hard_tier.py::UPGRADED contract)."""
+    c = compile_level(load_pack(PACK), "hard")
+    sp = {a.spawn_point for a in c.scenario.actors if a.owner == "agent"}
+    assert len(sp) >= 2, f"hard needs ≥2 spawn groups, got {sp}"
+def test_every_level_has_fail_condition():
+    """No silent draws — every level must be able to emit a LOSS."""
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        c = compile_level(pack, lvl)
+        assert c.fail_condition is not None, f"{lvl} missing fail_condition"
+def test_then_composite_used_in_win():
+    """Confirms the 3-step build-order chain is wired through to the
+    compiled win condition — the load-bearing teeth of this pack."""
+    for lvl in LEVELS:
+        c = compile_level(load_pack(PACK), lvl)
+        win = c.win_condition.model_dump(exclude_none=True)
+        inner = win.get("all_of") or []
+        assert any("then" in cl for cl in inner), (
+            f"{lvl} win missing then-chain: {win}"
+        )
+        for cl in inner:
+            if "then" in cl:
+                clauses = (cl["then"] or {}).get("clauses") or []
+                assert len(clauses) == 3, (
+                    f"{lvl} then-chain must be powr→proc→weap (3 clauses); "
+                    f"got {clauses}"
+                )
+                # And in the exact engine-enforced prereq order.
+                assert clauses[0].get("has_building") == "powr"
+                assert clauses[1].get("has_building") == "proc"
+                assert clauses[2].get("has_building") == "weap"
+def test_tick_budget_aligned_with_max_turns():
+    """within_ticks must be reachable inside max_turns. Engine
+    advances ~90 ticks/turn → reachable max = 93 + 90·(N-1)."""
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        level_def = pack.levels[lvl]
+        max_turns = level_def.max_turns
+        reachable = 93 + 90 * (max_turns - 1)
+        win = compile_level(pack, lvl).win_condition.model_dump(exclude_none=True)
+        def _collect(node, key, out):
+            if isinstance(node, dict):
+                if key in node:
+                    out.append(node[key])
+                for v in node.values():
+                    _collect(v, key, out)
+            elif isinstance(node, list):
+                for v in node:
+                    _collect(v, key, out)
+        wts = []
+        _collect(win, "within_ticks", wts)
+        assert wts, f"{lvl} has no within_ticks leaf (no clock teeth)"
+        for wt in wts:
+            assert wt <= reachable, (
+                f"{lvl} within_ticks={wt} > reachable={reachable} "
+                f"(max_turns={max_turns}) — deadline never bites ⇒ draw"
+            )
+# ── Engine-bound tests (parameterised over seeds 1..4) ────────────
+@pytest.mark.parametrize("seed", SEEDS)
+@pytest.mark.parametrize("level", LEVELS)
+def test_intended_cost_optimal_policy_wins(level, seed):
+    """The intended cost-optimal play (powr → proc → weap) must WIN
+    on every (level, seed). This is the load-bearing test that the
+    pack is solvable inside the budget by the advertised capability."""
+    c = compile_level(load_pack(PACK), level)
+    res = run_level(c, _intended_policy(), seed=seed)
+    tp = getattr(res.signals, "then_progress", {}) or {}
+    assert res.outcome == "win", (
+        f"intended cost-optimal must WIN on {level} s={seed}; "
+        f"got {res.outcome} (tick={res.signals.game_tick}, "
+        f"then_progress={tp}, "
+        f"own_buildings={res.signals.own_building_types})"
+    )
+@pytest.mark.parametrize("seed", SEEDS)
+@pytest.mark.parametrize("level", LEVELS)
+def test_stall_loses(level, seed):
+    """A do-nothing policy must LOSE on every (level, seed). The
+    fail_condition's after_ticks clause bites at the budget; never
+    a draw."""
+    c = compile_level(load_pack(PACK), level)
+    res = run_level(c, _stall_policy(), seed=seed)
+    assert res.outcome == "loss", (
+        f"stall must LOSE on {level} s={seed}; got {res.outcome} "
+        f"(tick={res.signals.game_tick})"
+    )
+@pytest.mark.parametrize("seed", SEEDS)
+@pytest.mark.parametrize("level", LEVELS)
+def test_tent_first_wrong_path_loses(level, seed):
+    """The cost-non-optimal tent-first play must LOSE on every
+    (level, seed). The tent detour adds ~500 credits + ~5 turns,
+    pushing weap completion to ~tick 3063 — beyond every level's
+    deadline. The capability being measured is COST-OPTIMAL
+    planning; a 'some plan that arrives' policy must not win."""
+    c = compile_level(load_pack(PACK), level)
+    res = run_level(c, _tent_first_policy(), seed=seed)
+    tp = getattr(res.signals, "then_progress", {}) or {}
+    assert res.outcome == "loss", (
+        f"tent-first wrong-path must LOSE on {level} s={seed}; got "
+        f"{res.outcome} (tick={res.signals.game_tick}, "
+        f"then_progress={tp}, own_buildings={res.signals.own_building_types})"
+    )
+@pytest.mark.parametrize("seed", SEEDS)
+def test_hard_seeds_produce_distinct_starts(seed):
+    """Smoke test for hard's spawn groups: every seed must surface an
+    agent fact in the first observation, and the stalling probe must
+    lose. That different seeds start at different cells is asserted by
+    test_hard_spawns_round_robin_across_seeds; the spawn-variation
+    contract is also enforced by tests/test_hard_tier.py."""
+    c = compile_level(load_pack(PACK), "hard")
+    captured = {"first_obs": None}
+    def probe(obs, Cmd):
+        if captured["first_obs"] is None:
+            captured["first_obs"] = list(obs.get("own_buildings", []) or [])
+        return [Cmd.observe()]
+    res = run_level(c, probe, seed=seed)
+    assert res.outcome == "loss"  # stall must lose
+    facts = [
+        (b["cell_x"], b["cell_y"])
+        for b in (captured["first_obs"] or [])
+        if b["type"] == "fact"
+    ]
+    assert facts, f"no fact observed at turn 0 for seed={seed}"
+def test_hard_spawns_round_robin_across_seeds():
+    """Two seeds (1 and 2) must place the agent's fact at DIFFERENT
+    cells — proves the spawn_point round-robin is active, not
+    degenerate."""
+    c = compile_level(load_pack(PACK), "hard")
+    def probe():
+        captured = {}
+        def pol(obs, Cmd):
+            if "fact_pos" not in captured:
+                bs = obs.get("own_buildings", []) or []
+                facts = [(b["cell_x"], b["cell_y"]) for b in bs if b["type"] == "fact"]
+                if facts:
+                    captured["fact_pos"] = facts[0]
+            return [Cmd.observe()]
+        pol.captured = captured
+        return pol
+    p1 = probe()
+    run_level(c, p1, seed=1)
+    p2 = probe()
+    run_level(c, p2, seed=2)
+    pos1 = p1.captured.get("fact_pos")
+    pos2 = p2.captured.get("fact_pos")
+    assert pos1 and pos2, f"missing fact obs: s1={pos1} s2={pos2}"
+    assert pos1 != pos2, (
+        f"hard spawn round-robin is degenerate: seed 1 and 2 both "
+        f"started at {pos1}"
+    )
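
The tick arithmetic behind the three deadlines can be checked by hand. The sketch below does so using only numbers quoted in this commit (the `93 + 90·(N-1)` reachability formula, the measured weap ticks, and each level's `within_ticks` / `max_turns`). The helper names `tick_at_turn` and `check_level` are hypothetical and exist only for this illustration.

```python
# Constants taken from the formula quoted in the pack: 93 + 90*(N-1).
FIRST_TURN_TICK = 93
TICKS_PER_TURN = 90

# Measured weap completion ticks quoted in the pack header (seed 1).
OPTIMAL_WEAP_TICK = 2613
TENT_FIRST_WEAP_TICK = 3063

# level -> (within_ticks, max_turns), copied from the YAML.
LEVELS = {"easy": (3000, 40), "medium": (2800, 35), "hard": (2800, 35)}


def tick_at_turn(turn: int) -> int:
    """Approximate game tick reached at the end of a given turn."""
    return FIRST_TURN_TICK + TICKS_PER_TURN * (turn - 1)


def check_level(deadline: int, max_turns: int) -> dict:
    """Check that the deadline sits inside the optimal/detour gap."""
    return {
        # after_ticks: T+1 must be reachable, otherwise a stall draws.
        "fail_reachable": deadline + 1 <= tick_at_turn(max_turns),
        "optimal_fits": OPTIMAL_WEAP_TICK <= deadline,
        "detour_misses": TENT_FIRST_WEAP_TICK > deadline,
        "slack_ticks": deadline - OPTIMAL_WEAP_TICK,
        "detour_overrun_ticks": TENT_FIRST_WEAP_TICK - deadline,
    }


for name, (deadline, max_turns) in LEVELS.items():
    print(name, tick_at_turn(max_turns), check_level(deadline, max_turns))
```

On these numbers the detour overruns easy by 63 ticks and medium/hard by 263 ticks, so the easy margin against the wrong path is less than one turn.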

tests/test_hard_tier.py CHANGED

@@ -171,6 +171,16 @@ UPGRADED = [
@@ -409,6 +419,20 @@ UPGRADED = [
@@ -420,6 +444,105 @@ UPGRADED = [

     # flips per seed (an off-axis diagonal busts the tick budget
     # and brushes the wrong-corner patrol).
     "mfb-base-1-defend-base-2-build",
+    # Wave-7 Group B reasoning pack — greedy 3-base macro against a
+    # deadline (SC2 3-base macro / MicroRTS expansion / industrial
+    # site expansion anchor). Hard tier defines two agent spawn_point
+    # groups (NORTH base layout y≈20 / SOUTH base layout y≈50)
+    # round-robined by seed; the win clause accepts EITHER candidate
+    # far-east region ((90,20) or (90,50)) so the agent must place
+    # the 3rd proc in line with its actual base latitude. A
+    # memorised "place at (90,20)" generalises to NORTH but mis-places
+    # on SOUTH.
+    "mfb-third-base-against-clock",
     # Wave-4 TURTLE node of the tech triple (SC2 turtle macro /
     # military fortify-before-research doctrine anchor). Hard defines
     # two agent spawn_point groups (NORTH base / SOUTH base) so the
     # y=20 so either spawn faces the same flank-vs-frontal decision
     # from a flipped bearing, and no memorised opening generalises.
     "combat-flanking-attack",
+    # Wave-7 combat-formation pack: military tank-wedge doctrine /
+    # SC2 formation micro / combined-arms anchor. The agent commands
+    # 5× 2tnk and must arrange them in a WEDGE (apex + 2 flankers
+    # per side spread across y=18..22) before contacting an eastern
+    # cluster (4-5× e3 + 1-2× 1tnk at x=84..86). A COLUMN (single-
+    # file east on y=20) concentrates incoming Dragon fire on the
+    # lead tank and bleeds the survival bar (own_units_gte:4 fails
+    # when 2+ tanks lost); the WEDGE spreads return fire across the
+    # formation and clears the cluster intact. Hard defines two agent
+    # spawn_point groups (NORTH staging y=12..16 / SOUTH staging
+    # y=24..28) round-robined by seed; the central cluster is
+    # symmetric across y=20 so either spawn faces an equivalent
+    # column-vs-wedge decision and no memorised opening generalises.
+    "combat-formation-tank-wedge",
     # Wave-6 perception pack — early-warning intrusion detection
     # paired with targeted intercept (SC2 early-warn scout /
     # NORAD early-warning / IDS / military reconnaissance-in-force
     # generalises. A memorised "send scout to (40,10) + tanks to
     # (45,10)" opening cannot generalise across seeds.
     "scout-detect-incoming-army",
+    # Wave-7 ACTION econ-defense pack — convoy / supply-line protection
+    # (SC2 harass defense / military convoy protection / supply-line
+    # doctrine anchor). A single harv commutes proc↔mine on a long
+    # exposed route; raider 2tnks specifically target the harv.
+    # Defenders at base never engage (raider intercepts harv beyond
+    # base sight); intended play is to move escorts east to intercept
+    # on the route. Hard tier defines two agent spawn_point groups
+    # (NORTH route y=14 / SOUTH route y=26) round-robined by seed;
+    # symmetric north + south raider waves always place (enemy actors
+    # don't honour spawn_point — CLAUDE.md), so each spawn defends
+    # its OWN supply lane and a memorised opening cannot generalise.
+    "econ-protect-harvester-route",
+    # Wave-7 Group D reasoning pack — rock-paper-scissors hard-counter
+    # selection (SC2 hard-counter doctrine / military RPS counter /
+    # capability-based defense procurement anchor). Cash $2550 funds
+    # EITHER 3× 2tnk (the right counter to pure-infantry enemy) OR
+    # 8× e3 (wrong counter — anti-tank rockets vs soft targets) OR
+    # 25× e1 (1:1 attrition match). Hard tier defines two agent
+    # spawn_point groups (NORTH base y=12 / SOUTH base y=28) round-
+    # robined by seed; the centre infantry cluster always places at
+    # (70,20) (enemy actors don't honour spawn_point — CLAUDE.md),
+    # so the composition decision is the same per seed but the lane
+    # the agent commits to flips per seed and a memorised opening
+    # cannot generalise.
+    "combat-vehicle-vs-infantry-counter",
+    # Wave-7 REASONING temporal-sequencing pack — SC2 timing-push
+    # window / PlanBench temporally-extended goal / cyber attack
+    # timing-window anchor. The `then:` happened-before composite
+    # enforces a SURVIVAL gate (own_units_gte:4 at T1) latching
+    # BEFORE the STRIKE gate (units_killed_gte:K within T2), so
+    # premature engagement and stalling both lose. Hard tier defines
+    # two agent spawn_point groups (NORTH staging y=12 / SOUTH
+    # staging y=28) round-robined by seed; the central enemy turtle
+    # cluster + tsla place every seed (enemy actors don't honour
+    # spawn_point — CLAUDE.md) and is symmetric across y=20, so
+    # both staging latitudes face the same survive-then-strike
+    # decision from a flipped approach axis.
+    "tp-survive-and-strike-at-window",
+    # Wave-7 REASONING pack: concentrated-defense topology — build a
+    # TIGHT CLUSTER of pillboxes around the high-value building (the
+    # agent fact). Hard tier defines 2 agent spawn_point groups
+    # (NORTH fact at y=14 / SOUTH fact at y=26) round-robined by seed;
+    # the cluster centre flips with the fact, so a memorised "cluster
+    # at (10,20)" plan cannot generalise. Enemies don't honour
+    # spawn_point (CLAUDE.md), so the rush band is staged at BOTH
+    # candidate latitudes — only the on-latitude band converges on
+    # the active fact, but it is heavy enough to overwhelm any
+    # defence that isn't a CLUSTER around the correct fact.
+    "build-defensive-tower-cluster",
+    # Wave-7 REASONING / RPS hard-counter pack (INVERSE of combat-
+    # vehicle-vs-infantry-counter) — SC2 hard-counter / anti-armor
+    # procurement / military RPS anchor. Starting cash ($1800) funds
+    # exactly ONE composition vs a pre-placed band of HEAVY tanks
+    # (3tnk on easy/medium, 4tnk Mammoths on hard); the agent must
+    # build e3 (rocket soldiers, anti-vehicle Dragon launcher) — not
+    # 1tnk (light tanks lose attrition to heavy armour, budget buys
+    # only ~2) and not e1 (no anti-armour weapon, kill bar fails).
+    # Hard tier defines two agent spawn_point groups (NORTH base
+    # y=12 / SOUTH base y=28) round-robined by seed; the heavy band
+    # is centred mid-latitude (y=20) so both spawns face symmetric
+    # pursuit geometry (enemy actors don't honour spawn_point —
+    # CLAUDE.md) and a memorised "build e3 at y=20" opening cannot
+    # generalise across seeds.
+    "combat-rocket-soldier-anti-vehicle",
+    # Wave-7 perimeter/firewall reasoning pack — ERQA spatial commit /
+    # MicroRTS defense placement / military perimeter (firewall rule
+    # placement) anchor. Sibling/inverse of def-tower-line-vs-cluster:
+    # that pack enforces CLUSTER at a single bottleneck cell (graph
+    # min-cut doctrine); this pack enforces a LINE across the corridor
+    # (one pbox per row spanning y=18..22 at x=60, radius 0.5 so only
+    # the exact rung cell counts). Hard tier defines two agent
+    # spawn_point groups (NORTH base y=12 / SOUTH base y=28) round-
+    # robined by seed; the rusher band is centred at y=20 and ALWAYS
+    # places (enemy actors don't honour spawn_point — CLAUDE.md), so
+    # the corridor LINE is identical across seeds but the agent's base
+    # bearing flips per seed and a memorised relative-to-base placement
+    # cannot generalise.
+    "build-defensive-tower-line",
+    # Wave-7 Group I REASONING — opening-phase build-order / power-grid
+    # bring-up sequencing (PlanBench task-ordering / SOP compliance /
+    # electrical-grid bring-up anchor). Hard tier defines two agent
+    # spawn_point groups (NORTH y=12 / SOUTH y=28) round-robined by
+    # seed; the pre-placed `fact` (and therefore the build radius and
+    # the placement coords for powr/proc) flips per seed, so a
+    # memorised "(20,20) opening" cannot generalise. An inert HoldFire
+    # `e1` per group surfaces the variation via units_summary (the
+    # pack would otherwise be building-only); no `move_units`/
+    # `attack_unit` tool is exposed so the e1 is functionally inert
+    # and does not interact with the SOP test.
+    "build-power-online-first",
+    # Wave-7 REASONING pack — cost-optimal build-order (powr → proc →
+    # weap) under a tight deadline (PlanBench cost-optimal / BOM-
+    # manufacturing critical-path anchor). Hard tier defines two agent
+    # spawn_point groups (NORTH base y=14 / SOUTH base y=26) round-
+    # robined by seed; ore patches are duplicated at both latitudes so
+    # harv income is symmetric per spawn. A memorised "place powr at
+    # (14,22)" opening cannot generalise — placement must be computed
+    # relative to the actual fact each seed.
+    "build-sequence-tech-fastest",
 ]
 # Consciously NOT spawn-varied, with the reason (keeps the curation
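
For the `build-sequence-tech-fastest` entry, "placement computed relative to the actual fact" can be made concrete with a small sketch. The offsets are the ones used by `_intended_policy` in tests/test_build_sequence_tech_fastest.py, and the fact positions are the two hard-tier spawns from the YAML. The `OFFSETS` table and the `placement` helper are hypothetical names for illustration, and the sketch does not check whether the engine accepts a given cell.

```python
# (dx, dy) offsets from the construction yard, as used by the test policy.
OFFSETS = {"powr": (4, 0), "proc": (6, 3), "weap": (8, 0)}

# Hard-tier agent fact positions from the scenario pack.
SPAWNS = {"NORTH": (10, 14), "SOUTH": (10, 26)}


def placement(fact_xy: tuple[int, int], building: str) -> tuple[int, int]:
    """Return the target cell for a building, relative to the fact."""
    dx, dy = OFFSETS[building]
    return (fact_xy[0] + dx, fact_xy[1] + dy)


for spawn, fact_xy in SPAWNS.items():
    plan = {b: placement(fact_xy, b) for b in ("powr", "proc", "weap")}
    print(spawn, plan)
```

A fixed absolute cell would match at most one of the two spawns, which is the property the hard tier relies on.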