feat(scenario): build-sequence-tech-fastest — fastest weap-tech BO (PlanBench cost-optimal anchor)
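The cost-optimal anchor in this commit comes down to a small prerequisite-closure calculation. The sketch below is illustrative only: the costs for `powr`, `proc` and `weap` and the 500-credit `tent` are the figures quoted in the pack's header comment, while the `TECH` table, the helper names, and the assumption that `tent` needs only `powr` are hypothetical and not part of the benchmark code.

```python
from typing import Dict, List, Tuple

# name -> (cost in credits, direct prerequisites).
# Costs are the ones quoted in the pack header; the tent's prerequisite
# is an assumption made for this illustration.
TECH: Dict[str, Tuple[int, List[str]]] = {
    "powr": (300, []),
    "proc": (1400, ["powr"]),
    "weap": (2000, ["proc"]),
    "tent": (500, ["powr"]),
}


def prerequisite_chain(target: str) -> List[str]:
    """Return the target and everything it transitively needs, in build order."""
    order: List[str] = []

    def visit(name: str) -> None:
        if name in order:
            return
        for dep in TECH[name][1]:
            visit(dep)
        order.append(name)

    visit(target)
    return order


def plan_cost(plan: List[str]) -> int:
    """Sum the credit cost of every structure in a plan."""
    return sum(TECH[name][0] for name in plan)


if __name__ == "__main__":
    optimal = prerequisite_chain("weap")
    detour = ["powr", "tent", "proc", "weap"]
    print(optimal, plan_cost(optimal))
    print(detour, plan_cost(detour))
```

Both plans fit inside the 5000 starting credits, which suggests that the deadline, not the cash, is what separates them in this pack.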
openra_bench/scenarios/packs/build-sequence-tech-fastest.yaml
ADDED
|
@@ -0,0 +1,235 @@
+# build-sequence-tech-fastest.yaml
+#
+# REASONING capability — Wave-7 build-order optimization (cost-OPTIMAL
+# planning). The agent must reach `weap` (war factory) within the
+# tightest possible tick budget by choosing the correct prerequisite
+# path: powr → proc → weap. Any extra structure (a barracks/tent
+# detour, an unneeded second power plant, idle stalling) overruns the
+# deadline.
+#
+# Engine-verified tech tree (vendor/OpenRA/mods/ra/rules/structures.yaml):
+#   - POWR cost 300,  Prerequisites: <none>   (provides anypower)
+#   - PROC cost 1400, Prerequisites: anypower (needs powr)
+#   - WEAP cost 2000, Prerequisites: proc     (needs proc)
+# Total credits on the optimal path = 3700; starting_cash 5000 leaves
+# 1300 spare, enough to pay for a wasted tent (500), but that detour
+# misses the deadline. The Wave-2 `then:` happened-before composite is
+# the load-bearing teeth — clauses [powr, proc, weap] latch in order;
+# a policy that places weap before proc cannot satisfy the chain
+# (engine refuses too — weap's prereq is proc).
+#
+# Measured optimal timing (rush-hour-arena, fact pre-placed at
+# (10,18), seed 1, scripted intended policy):
+#   - powr completes ≈ tick 273  (turn 3)
+#   - proc completes ≈ tick 1263 (turn 14)
+#   - weap completes ≈ tick 2613 (turn 29)
+# Measured WRONG-PATH timing (powr → tent → proc → weap):
+#   - weap completes ≈ tick 3063 (turn 34) — 5 turns / 450 ticks
+#     slower than optimal. The deadline must fall INSIDE this gap.
+#
+# Bar (CLAUDE.md):
+#   - stall (observe-only) ⇒ LOSS on every level + seed
+#   - build-tent-first wrong path ⇒ LOSS on every level + seed
+#   - intended powr→proc→weap path ⇒ WIN on every level + seed
+# Real LOSS not DRAW: fail_condition `after_ticks: T+1` reachable
+# inside max_turns (engine ~90 ticks/turn ⇒ tick ≤ 93+90·(N-1)). The
+# pre-placed enemy `fact` at the far east is a MustBeDestroyed
+# landmark that keeps the episode alive (no premature engine
+# auto-done from eliminating a stray sentry).
+#
+# Real-world anchor:
+#   - PlanBench cost-optimal planning (find the minimum-cost plan
+#     that achieves the goal, not just A plan)
+#   - Manufacturing BOM-optimal ramp / critical-path scheduling
+#     (build only what the next stage requires; do not bloat the
+#     bill of materials)
+#
+# Validate:
+#   cd /Users/berta/Projects/OpenRA-Bench && \
+#   python3 -m pytest tests/test_build_sequence_tech_fastest.py -q
+
+meta:
+  id: build-sequence-tech-fastest
+  title: 'Fastest War Factory — Cost-Optimal powr → proc → weap Build Order'
+  capability: reasoning
+  real_world_meaning: >
+    Cost-optimal build-order planning under a tight deadline: the agent
+    must reach the war factory (`weap`) on the shortest prerequisite
+    path (powr → proc → weap). Any detour through unneeded structures
+    (a barracks, a second power plant, an early infantry training
+    queue) bloats the bill-of-materials and overruns the budget. Tests
+    that the model can plan the minimum-cost prerequisite chain — not
+    merely SOME plan that eventually arrives — under a deadline that
+    only the optimal plan satisfies.
+  robotics_analogue: >
+    Critical-path planning in autonomous manufacturing: a cell must
+    bring a target machine online by a fixed cycle-time, choosing the
+    minimum set of upstream stations to commission first (power →
+    feedstock → assembly). Adding non-load-bearing stations to the
+    ramp-up plan (a non-required quality station before assembly)
+    blows the deadline; only the cost-optimal precedence chain meets
+    spec.
+  benchmark_anchor:
+    - "PlanBench cost-optimal"
+    - "BOM manufacturing"
+  author: openra-bench
+
+base_map: rush-hour-arena
+
+base:
+  agent:
+    faction: allies
+  enemy:
+    faction: soviet
+    bot_type: ''
+  tools:
+    - observe
+    - build
+    - place_building
+  planning: true
+  termination:
+    max_ticks: 40000
+
+levels:
+  # ── EASY ─────────────────────────────────────────────────────────
+  # Bare cost-optimal skill. Generous T = 3000 ticks (max_turns 40 →
+  # reachable 3603). Optimal path lands at ~tick 2613 (387-tick /
+  # 4-turn buffer). The wrong-path detour through tent (+500 cost, +5
+  # turns) finishes at ~tick 3063, beyond T ⇒ LOSS. A stall never
+  # finishes ⇒ LOSS on the after_ticks fail clause.
+  easy:
+    description: >
+      Build a war factory (weap) as fast as possible by following the
+      ONLY cost-optimal prerequisite chain: powr → proc → weap. Any
+      detour (a barracks/tent, a redundant power plant, an early
+      infantry training queue) wastes the budget and you LOSE on the
+      clock. The `then:` chain enforces the exact order — placing
+      weap before proc cannot satisfy it (and the engine refuses too:
+      weap's prerequisite is proc). Optimal play finishes by tick
+      ~2613; the deadline is 3000.
+    starting_cash: 5000
+    overrides:
+      actors:
+        # Agent base seed — ONE construction yard. Nothing else
+        # pre-placed (no power, no refinery). The optimal chain MUST
+        # be executed by the agent.
+        - {type: fact, owner: agent, position: [10, 18]}
+        # Two ore patches in the near-base build radius — a built
+        # proc auto-spawns a harvester that needs ore to fund the
+        # weap purchase inside the tick budget.
+        - {type: mine, owner: neutral, position: [22, 18]}
+        - {type: mine, owner: neutral, position: [22, 22]}
+        # Far-east enemy `fact` landmark — MustBeDestroyed, unarmed
+        # and unescorted. Keeps the episode alive so a stall really
+        # times out (not engine auto-done from a stray sentry kill).
+        - {type: fact, owner: enemy, position: [115, 30]}
+    win_condition:
+      all_of:
+        - then:
+            id: bo-easy
+            clauses:
+              - {has_building: powr}
+              - {has_building: proc}
+              - {has_building: weap}
+        - {within_ticks: 3000}
+    fail_condition:
+      any_of:
+        - {after_ticks: 3001}
+        - {not: {building_count_gte: {type: fact, n: 1}}}
+    max_turns: 40
+
+  # ── MEDIUM ───────────────────────────────────────────────────────
+  # +1 controlled variable: TIGHTER deadline. T = 2800 ticks
+  # (max_turns 35 → reachable 3153). Optimal play lands at ~tick
+  # 2613 (187-tick / ~2-turn buffer — feasible). The wrong-path
+  # detour through tent overruns hard (3063 > 2800 by 263 ticks, ~3 turns).
+  # No additional pieces — the SAME cost-optimal chain, executed
+  # with less slack.
+  medium:
+    description: >
+      Build a war factory (weap) on the cost-optimal prerequisite
+      chain: powr → proc → weap. Tighter deadline (2800 ticks) — any
+      detour (tent / second powr / infantry queue) makes you miss.
+      The `then:` chain enforces the exact order; weap before proc
+      cannot satisfy it. Optimal play finishes by tick ~2613.
+    starting_cash: 5000
+    overrides:
+      actors:
+        - {type: fact, owner: agent, position: [10, 18]}
+        - {type: mine, owner: neutral, position: [22, 18]}
+        - {type: mine, owner: neutral, position: [22, 22]}
+        - {type: fact, owner: enemy, position: [115, 30]}
+    win_condition:
+      all_of:
+        - then:
+            id: bo-medium
+            clauses:
+              - {has_building: powr}
+              - {has_building: proc}
+              - {has_building: weap}
+        - {within_ticks: 2800}
+    fail_condition:
+      any_of:
+        - {after_ticks: 2801}
+        - {not: {building_count_gte: {type: fact, n: 1}}}
+    max_turns: 35
+
+  # ── HARD ─────────────────────────────────────────────────────────
+  # +1 controlled variable: ≥2 spawn_point groups (NORTH y=14 vs
+  # SOUTH y=26 base). Same cost-optimal chain, same tight T = 2800.
+  # The seed-varied spawn means a memorised "place powr at (14,22)"
+  # opening cannot generalise — the agent must compute placement
+  # relative to its actual fact each seed. Ore patches duplicated
+  # at both latitudes so harv income is symmetric per spawn. Enemy
+  # actors do NOT honour spawn_point (CLAUDE.md), so the lone
+  # enemy `fact` always places.
+  hard:
+    description: >
+      Build a war factory (weap) on the cost-optimal prerequisite
+      chain: powr → proc → weap, from a seed-chosen base (NORTH or
+      SOUTH). Tight 2800-tick deadline — detours (tent / extra
+      powr / infantry queue) lose on the clock. Placement that
+      memorises one spawn's geometry cannot generalise; compute
+      placement relative to your actual fact each run.
+    starting_cash: 5000
+    overrides:
+      actors:
+        # NORTH spawn (spawn_point 0): fact at y=14, with adjacent
+        # ore patches at y=14/y=18.
+        - {type: fact, owner: agent, position: [10, 14], spawn_point: 0}
+        # An inert rifleman per spawn group (passive: stance 2 = Defend,
+        # no `move_units` / `attack_unit` exposed so the unit cannot act
+        # — the agent's tool surface is build-only). Establishes a
+        # seed-varying AGENT UNIT in `units_summary` so the hard-tier
+        # spawn-variation contract (tests/test_hard_tier.py::
+        # test_curated_hard_still_compiles_and_runs, which inspects
+        # units not buildings) is satisfied with real per-spawn data.
+        - {type: e1, owner: agent, position: [12, 14], spawn_point: 0, stance: 2}
+        # SOUTH spawn (spawn_point 1): fact at y=26, with adjacent
+        # ore patches at y=22/y=26.
+        - {type: fact, owner: agent, position: [10, 26], spawn_point: 1}
+        - {type: e1, owner: agent, position: [12, 26], spawn_point: 1, stance: 2}
+        # Ore patches duplicated at BOTH latitudes so harv income is
+        # symmetric whichever spawn is chosen. (Neutral actors have
+        # no spawn_point and always place — that's fine: the unused
+        # patches are simply ignored.)
+        - {type: mine, owner: neutral, position: [22, 14]}
+        - {type: mine, owner: neutral, position: [22, 18]}
+        - {type: mine, owner: neutral, position: [22, 22]}
+        - {type: mine, owner: neutral, position: [22, 26]}
+        # Far-east enemy fact landmark — keeps the episode alive.
+        - {type: fact, owner: enemy, position: [115, 30]}
+    win_condition:
+      all_of:
+        - then:
+            id: bo-hard
+            clauses:
+              - {has_building: powr}
+              - {has_building: proc}
+              - {has_building: weap}
+        - {within_ticks: 2800}
+    fail_condition:
+      any_of:
+        - {after_ticks: 2801}
+        - {not: {building_count_gte: {type: fact, n: 1}}}
+    max_turns: 35
|
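The deadline arithmetic quoted in the pack's comments can be reproduced with a few lines of Python. This is a minimal sketch, not the benchmark's own tooling: the tick figures (93, 90, 2613, 3063) and the per-level `within_ticks` / `max_turns` values are copied from the YAML above, and the function and dictionary names are hypothetical.

```python
# Figures copied from the pack's comments; helper names are illustrative.
FIRST_TURN_TICK = 93     # tick reached at the end of turn 1
TICKS_PER_TURN = 90      # approximate engine advance per later turn
OPTIMAL_TICK = 2613      # measured weap completion, powr -> proc -> weap
TENT_FIRST_TICK = 3063   # measured weap completion, powr -> tent -> proc -> weap

# level -> (within_ticks deadline T, max_turns)
LEVELS = {"easy": (3000, 40), "medium": (2800, 35), "hard": (2800, 35)}


def reachable_tick(max_turns: int) -> int:
    """Largest tick the episode can reach inside max_turns."""
    return FIRST_TURN_TICK + TICKS_PER_TURN * (max_turns - 1)


def deadline_report(deadline: int, max_turns: int) -> dict:
    """Summarise how a deadline sits between the optimal and detour plans."""
    reachable = reachable_tick(max_turns)
    return {
        "reachable": reachable,
        "optimal_buffer": deadline - OPTIMAL_TICK,
        "detour_overrun": TENT_FIRST_TICK - deadline,
        # The deadline must admit the optimal plan and reject the detour.
        "separates_plans": OPTIMAL_TICK <= deadline < TENT_FIRST_TICK,
        # after_ticks: T+1 must be reachable, otherwise a stall draws.
        "loss_reachable": deadline + 1 <= reachable,
    }


if __name__ == "__main__":
    for level, (deadline, max_turns) in LEVELS.items():
        print(level, deadline_report(deadline, max_turns))
```

On these numbers the easy deadline rejects the tent-first plan by only 63 ticks, less than one turn, so a re-measurement that shifts the detour timing would be worth checking against easy first.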
tests/test_build_sequence_tech_fastest.py
ADDED
|
@@ -0,0 +1,342 @@
+"""build-sequence-tech-fastest pack — full no-cheat validation on Rust.
+
+Wave-7 REASONING — cost-optimal build-order planning. The agent must
+reach the war factory (`weap`) on the SHORTEST prerequisite chain:
+
+    powr → proc → weap
+
+Any detour (build a barracks/tent first, or a redundant power plant,
+or an early infantry queue) overruns the tight tick budget and loses.
+The chain is enforced by the Wave-2 `then:` happened-before composite;
+the deadline (`within_ticks`) is the cost-optimality teeth — slack is
+tuned so the OPTIMAL plan fits and the tent-detour plan does NOT.
+
+Bar (CLAUDE.md): the intended cost-optimal policy WINS on every
+(level, seed); stall and the tent-first wrong-path policy LOSE on
+every (level, seed). Real LOSS not DRAW — `fail after_ticks:T+1`
+inside max_turns is the bite.
+
+Scenario shape:
+  - rush-hour-arena, allies vs soviet (bot disabled).
+  - easy:   T=3000, max_turns=40 — generous (4-turn buffer).
+  - medium: T=2800, max_turns=35 — tight (≈2-turn buffer).
+  - hard:   T=2800, max_turns=35 — same tight T + ≥2 spawn_point
+    groups (NORTH y=14 / SOUTH y=26 base, round-robined).
+
+Measured optimal timing (seed 1, scripted intended policy):
+  powr completes ≈ tick 273  (turn 3)
+  proc completes ≈ tick 1263 (turn 14)
+  weap completes ≈ tick 2613 (turn 29)
+Measured tent-first wrong-path timing:
+  weap completes ≈ tick 3063 (turn 34) — beyond every level's T.
+"""
+
+from __future__ import annotations
+
+import pytest
+
+pytest.importorskip("openra_train", reason="Rust env wheel not installed")
+pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+
+from openra_bench.eval_core import run_level
+from openra_bench.scenarios import load_pack
+from openra_bench.scenarios.loader import PACKS_DIR, compile_level
+
+PACK = PACKS_DIR / "build-sequence-tech-fastest.yaml"
+LEVELS = ("easy", "medium", "hard")
+SEEDS = (1, 2, 3, 4)
+
+
+# ── Policies ──────────────────────────────────────────────────────
+
+
+def _stall_policy():
+    """Do nothing — must LOSE on the clock on every level/seed."""
+    def pol(obs, Cmd):
+        return [Cmd.observe()]
+    return pol
+
+
+def _intended_policy():
+    """Cost-optimal play: build powr → proc → weap, each one placed
+    relative to the agent's actual fact (so the policy generalises
+    across the hard-tier spawn variation). This is the policy the
+    pack is solvable by — must WIN on every (level, seed)."""
+    milestone = {"powr": False, "proc": False, "weap": False}
+
+    def pol(obs, Cmd):
+        ob = obs.get("own_buildings", []) or []
+        own_b = {b["type"] for b in ob}
+        prod = obs.get("production", []) or []
+        for b in ("powr", "proc", "weap"):
+            if b in own_b:
+                milestone[b] = True
+        cmds = []
+        base = [b for b in ob if b["type"] == "fact"]
+        if not milestone["powr"]:
+            if "powr" not in prod:
+                cmds.append(Cmd.build("powr"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "powr", base[0]["cell_x"] + 4, base[0]["cell_y"]
+                ))
+        elif not milestone["proc"]:
+            if "proc" not in prod:
+                cmds.append(Cmd.build("proc"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "proc", base[0]["cell_x"] + 6, base[0]["cell_y"] + 3
+                ))
+        elif not milestone["weap"]:
+            if "weap" not in prod:
+                cmds.append(Cmd.build("weap"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "weap", base[0]["cell_x"] + 8, base[0]["cell_y"]
+                ))
+        if not cmds:
+            cmds.append(Cmd.observe())
+        return cmds
+    return pol
+
+
+def _tent_first_policy():
+    """Wrong-path, cost-suboptimal play: powr → tent → proc → weap. The
+    tent is not on the prerequisite chain for weap (only proc is); it
+    bloats the BOM by 500 credits and ~5 turns. Must LOSE on the
+    clock on every level/seed."""
+    milestone = {"powr": False, "tent": False, "proc": False, "weap": False}
+
+    def pol(obs, Cmd):
+        ob = obs.get("own_buildings", []) or []
+        own_b = {b["type"] for b in ob}
+        prod = obs.get("production", []) or []
+        for b in ("powr", "tent", "proc", "weap"):
+            if b in own_b:
+                milestone[b] = True
+        cmds = []
+        base = [b for b in ob if b["type"] == "fact"]
+        if not milestone["powr"]:
+            if "powr" not in prod:
+                cmds.append(Cmd.build("powr"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "powr", base[0]["cell_x"] + 4, base[0]["cell_y"]
+                ))
+        elif not milestone["tent"]:
+            if "tent" not in prod:
+                cmds.append(Cmd.build("tent"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "tent", base[0]["cell_x"] + 4, base[0]["cell_y"] + 3
+                ))
+        elif not milestone["proc"]:
+            if "proc" not in prod:
+                cmds.append(Cmd.build("proc"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "proc", base[0]["cell_x"] + 6, base[0]["cell_y"] + 3
+                ))
+        elif not milestone["weap"]:
+            if "weap" not in prod:
+                cmds.append(Cmd.build("weap"))
+            if base:
+                cmds.append(Cmd.place_building(
+                    "weap", base[0]["cell_x"] + 8, base[0]["cell_y"]
+                ))
+        if not cmds:
+            cmds.append(Cmd.observe())
+        return cmds
+    return pol
+
+
+# ── Pack-shape tests (cheap; do not run the engine) ──────────────
+
+
+def test_pack_compiles_with_three_levels():
+    pack = load_pack(PACK)
+    assert pack.meta.id == "build-sequence-tech-fastest"
+    assert pack.meta.capability == "reasoning"
+    assert set(pack.levels) == {"easy", "medium", "hard"}
+
+
+def test_meta_benchmark_anchor_set():
+    """Required by the seed taxonomy: PlanBench cost-optimal +
+    BOM manufacturing critical-path planning."""
+    pack = load_pack(PACK)
+    anchors = pack.meta.benchmark_anchor or []
+    assert any("PlanBench" in a for a in anchors), anchors
+    assert any("BOM" in a for a in anchors), anchors
+
+
+def test_hard_tier_has_seed_driven_spawn_groups():
+    """Hard must define ≥2 agent spawn_point groups so seed varies
+    the start base (tests/test_hard_tier.py::UPGRADED contract)."""
+    c = compile_level(load_pack(PACK), "hard")
+    sp = {a.spawn_point for a in c.scenario.actors if a.owner == "agent"}
+    assert len(sp) >= 2, f"hard needs ≥2 spawn groups, got {sp}"
+
+
+def test_every_level_has_fail_condition():
+    """No silent draws — every level must be able to emit a LOSS."""
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        c = compile_level(pack, lvl)
+        assert c.fail_condition is not None, f"{lvl} missing fail_condition"
+
+
+def test_then_composite_used_in_win():
+    """Confirms the 3-step build-order chain is wired through to the
+    compiled win condition — the load-bearing teeth of this pack."""
+    for lvl in LEVELS:
+        c = compile_level(load_pack(PACK), lvl)
+        win = c.win_condition.model_dump(exclude_none=True)
+        inner = win.get("all_of") or []
+        assert any("then" in cl for cl in inner), (
+            f"{lvl} win missing then-chain: {win}"
+        )
+        for cl in inner:
+            if "then" in cl:
+                clauses = (cl["then"] or {}).get("clauses") or []
+                assert len(clauses) == 3, (
+                    f"{lvl} then-chain must be powr→proc→weap (3 clauses); "
+                    f"got {clauses}"
+                )
+                # And in the exact engine-enforced prereq order.
+                assert clauses[0].get("has_building") == "powr"
+                assert clauses[1].get("has_building") == "proc"
+                assert clauses[2].get("has_building") == "weap"
+
+
+def test_tick_budget_aligned_with_max_turns():
+    """within_ticks must be reachable inside max_turns. Engine
+    advances ~90 ticks/turn → reachable max = 93 + 90·(N-1)."""
+    pack = load_pack(PACK)
+    for lvl in LEVELS:
+        level_def = pack.levels[lvl]
+        max_turns = level_def.max_turns
+        reachable = 93 + 90 * (max_turns - 1)
+        win = compile_level(pack, lvl).win_condition.model_dump(exclude_none=True)
+
+        def _collect(node, key, out):
+            if isinstance(node, dict):
+                if key in node:
+                    out.append(node[key])
+                for v in node.values():
+                    _collect(v, key, out)
+            elif isinstance(node, list):
+                for v in node:
+                    _collect(v, key, out)
+        wts = []
+        _collect(win, "within_ticks", wts)
+        assert wts, f"{lvl} has no within_ticks leaf (no clock teeth)"
+        for wt in wts:
+            assert wt <= reachable, (
+                f"{lvl} within_ticks={wt} > reachable={reachable} "
+                f"(max_turns={max_turns}) — deadline never bites ⇒ draw"
+            )
+
+
+# ── Engine-bound tests (parameterised over seeds 1..4) ────────────
+
+
+@pytest.mark.parametrize("seed", SEEDS)
+@pytest.mark.parametrize("level", LEVELS)
+def test_intended_cost_optimal_policy_wins(level, seed):
+    """The intended cost-optimal play (powr → proc → weap) must WIN
+    on every (level, seed). This is the load-bearing test that the
+    pack is solvable inside the budget by the advertised capability."""
+    c = compile_level(load_pack(PACK), level)
+    res = run_level(c, _intended_policy(), seed=seed)
+    tp = getattr(res.signals, "then_progress", {}) or {}
+    assert res.outcome == "win", (
+        f"intended cost-optimal must WIN on {level} s={seed}; "
+        f"got {res.outcome} (tick={res.signals.game_tick}, "
+        f"then_progress={tp}, "
+        f"own_buildings={res.signals.own_building_types})"
+    )
+
+
+@pytest.mark.parametrize("seed", SEEDS)
+@pytest.mark.parametrize("level", LEVELS)
+def test_stall_loses(level, seed):
+    """A do-nothing policy must LOSE on every (level, seed). The
+    fail_condition's after_ticks clause bites at the budget; never
+    a draw."""
+    c = compile_level(load_pack(PACK), level)
+    res = run_level(c, _stall_policy(), seed=seed)
+    assert res.outcome == "loss", (
+        f"stall must LOSE on {level} s={seed}; got {res.outcome} "
+        f"(tick={res.signals.game_tick})"
+    )
+
+
+@pytest.mark.parametrize("seed", SEEDS)
+@pytest.mark.parametrize("level", LEVELS)
+def test_tent_first_wrong_path_loses(level, seed):
+    """The cost-non-optimal tent-first play must LOSE on every
+    (level, seed). The tent detour adds ~500 credits + ~5 turns,
+    pushing weap completion to ~tick 3063 — beyond every level's
+    deadline. The capability being measured is COST-OPTIMAL
+    planning; a 'some plan that arrives' policy must not win."""
+    c = compile_level(load_pack(PACK), level)
+    res = run_level(c, _tent_first_policy(), seed=seed)
+    tp = getattr(res.signals, "then_progress", {}) or {}
+    assert res.outcome == "loss", (
+        f"tent-first wrong-path must LOSE on {level} s={seed}; got "
+        f"{res.outcome} (tick={res.signals.game_tick}, "
+        f"then_progress={tp}, own_buildings={res.signals.own_building_types})"
+    )
+
+
+@pytest.mark.parametrize("seed", SEEDS)
+def test_hard_seeds_produce_distinct_starts(seed):
+    """Every seed must expose the agent's fact in the first observation
+    on hard, and the do-nothing probe must still lose. Distinctness of
+    the start cell across seeds is asserted separately, in
+    test_hard_spawns_round_robin_across_seeds."""
+    c = compile_level(load_pack(PACK), "hard")
+    captured = {"first_obs": None}
+
+    def probe(obs, Cmd):
+        if captured["first_obs"] is None:
+            captured["first_obs"] = list(obs.get("own_buildings", []) or [])
+        return [Cmd.observe()]
+
+    res = run_level(c, probe, seed=seed)
+    assert res.outcome == "loss"  # stall must lose
+    facts = [
+        (b["cell_x"], b["cell_y"])
+        for b in (captured["first_obs"] or [])
+        if b["type"] == "fact"
+    ]
+    assert facts, f"no fact observed at turn 0 for seed={seed}"
+
+
+def test_hard_spawns_round_robin_across_seeds():
+    """Two seeds (1 and 2) must place the agent's fact at DIFFERENT
+    cells — proves the spawn_point round-robin is active, not
+    degenerate."""
+    c = compile_level(load_pack(PACK), "hard")
+
+    def probe():
+        captured = {}
+        def pol(obs, Cmd):
+            if "fact_pos" not in captured:
+                bs = obs.get("own_buildings", []) or []
+                facts = [(b["cell_x"], b["cell_y"]) for b in bs if b["type"] == "fact"]
+                if facts:
+                    captured["fact_pos"] = facts[0]
+            return [Cmd.observe()]
+        pol.captured = captured
+        return pol
+
+    p1 = probe(); run_level(c, p1, seed=1)
+    p2 = probe(); run_level(c, p2, seed=2)
+    pos1 = p1.captured.get("fact_pos")
+    pos2 = p2.captured.get("fact_pos")
+    assert pos1 and pos2, f"missing fact obs: s1={pos1} s2={pos2}"
+    assert pos1 != pos2, (
+        f"hard spawn round-robin is degenerate: seed 1 and 2 both "
+        f"started at {pos1}"
+    )
|
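The tests above lean on the `then:` composite to reject out-of-order builds. The real evaluator is not shown in this diff, so the following is only a sketch of one plausible happened-before rule: each building's first-seen tick is recorded, and the chain holds when those ticks are non-decreasing in the declared order. The class name `HappenedBefore` and its methods are hypothetical, and treating two buildings first seen on the same tick as in order is an assumption.

```python
from typing import Dict, Iterable, Sequence


class HappenedBefore:
    """Hypothetical ordered check: building i must first appear no later
    than building i+1, for every adjacent pair in `order`."""

    def __init__(self, order: Sequence[str]) -> None:
        self.order = tuple(order)
        self.first_seen: Dict[str, int] = {}

    def observe(self, tick: int, own_buildings: Iterable[str]) -> None:
        # Record only the FIRST tick at which each tracked building is seen.
        for name in own_buildings:
            if name in self.order:
                self.first_seen.setdefault(name, tick)

    def satisfied(self) -> bool:
        ticks = [self.first_seen.get(name) for name in self.order]
        if any(t is None for t in ticks):
            return False  # some step of the chain has not happened yet
        return all(a <= b for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    chain = HappenedBefore(["powr", "proc", "weap"])
    chain.observe(273, {"fact", "powr"})
    chain.observe(1263, {"fact", "powr", "proc"})
    chain.observe(2613, {"fact", "powr", "proc", "weap"})
    print(chain.satisfied())
```

Under this rule a `weap` that appears before `proc` can never be repaired by building `proc` later, which matches the behaviour the pack's comments describe.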
tests/test_hard_tier.py
CHANGED
|
@@ -171,6 +171,16 @@ UPGRADED = [
|
|
| 171 |
# flips per seed (an off-axis diagonal busts the tick budget
|
| 172 |
# and brushes the wrong-corner patrol).
|
| 173 |
"mfb-base-1-defend-base-2-build",
|
| 174 |
# Wave-4 TURTLE node of the tech triple (SC2 turtle macro /
|
| 175 |
# military fortify-before-research doctrine anchor). Hard defines
|
| 176 |
# two agent spawn_point groups (NORTH base / SOUTH base) so the
|
|
@@ -409,6 +419,20 @@ UPGRADED = [
|
|
| 409 |
# y=20 so either spawn faces the same flank-vs-frontal decision
|
| 410 |
# from a flipped bearing, and no memorised opening generalises.
|
| 411 |
"combat-flanking-attack",
|
| 412 |
# Wave-6 perception pack — early-warning intrusion detection
|
| 413 |
# paired with targeted intercept (SC2 early-warn scout /
|
| 414 |
# NORAD early-warning / IDS / military reconnaissance-in-force
|
|
@@ -420,6 +444,105 @@ UPGRADED = [
|
|
| 420 |
# generalises. A memorised "send scout to (40,10) + tanks to
|
| 421 |
# (45,10)" opening cannot generalise across seeds.
|
| 422 |
"scout-detect-incoming-army",
|
| 423 |
]
|
| 424 |
|
| 425 |
# Consciously NOT spawn-varied, with the reason (keeps the curation
|
|
|
|
| 171 |
# flips per seed (an off-axis diagonal busts the tick budget
|
| 172 |
# and brushes the wrong-corner patrol).
|
| 173 |
"mfb-base-1-defend-base-2-build",
|
| 174 |
+
# Wave-7 Group B reasoning pack — greedy 3-base macro against a
|
| 175 |
+
# deadline (SC2 3-base macro / MicroRTS expansion / industrial
|
| 176 |
+
# site expansion anchor). Hard tier defines two agent spawn_point
|
| 177 |
+
# groups (NORTH base layout y≈20 / SOUTH base layout y≈50)
|
| 178 |
+
# round-robined by seed; the win clause accepts EITHER candidate
|
| 179 |
+
# far-east region ((90,20) or (90,50)) so the agent must place
|
| 180 |
+
# the 3rd proc in line with their actual base latitude. A
|
| 181 |
+
# memorised "place at (90,20)" generalises to NORTH but mis-places
|
| 182 |
+
# on SOUTH.
|
| 183 |
+
"mfb-third-base-against-clock",
|
| 184 |
# Wave-4 TURTLE node of the tech triple (SC2 turtle macro /
|
| 185 |
# military fortify-before-research doctrine anchor). Hard defines
|
| 186 |
# two agent spawn_point groups (NORTH base / SOUTH base) so the
|
|
|
|
| 419 |
# y=20 so either spawn faces the same flank-vs-frontal decision
|
| 420 |
# from a flipped bearing, and no memorised opening generalises.
|
| 421 |
"combat-flanking-attack",
|
| 422 |
+
# Wave-7 combat-formation pack: military tank-wedge doctrine /
|
| 423 |
+
# SC2 formation micro / combined-arms anchor. The agent commands
|
| 424 |
+
# 5× 2tnk and must arrange them in a WEDGE (apex + 2 flankers
|
| 425 |
+
# per side spread across y=18..22) before contacting an eastern
|
| 426 |
+
# cluster (4-5× e3 + 1-2× 1tnk at x=84..86). A COLUMN (single-
|
| 427 |
+
# file east on y=20) concentrates incoming Dragon fire on the
|
| 428 |
+
# lead tank and bleeds the survival bar (own_units_gte:4 fails
|
| 429 |
+
# when 2+ tanks lost); the WEDGE spreads return fire across the
|
| 430 |
+
# formation and clears the cluster intact. Hard defines two agent
|
| 431 |
+
# spawn_point groups (NORTH staging y=12..16 / SOUTH staging
|
| 432 |
+
# y=24..28) round-robined by seed; the central cluster is
|
| 433 |
+
# symmetric across y=20 so either spawn faces an equivalent
|
| 434 |
+
# column-vs-wedge decision and no memorised opening generalises.
|
| 435 |
+
"combat-formation-tank-wedge",
|
| 436 |
# Wave-6 perception pack — early-warning intrusion detection
|
| 437 |
# paired with targeted intercept (SC2 early-warn scout /
|
| 438 |
# NORAD early-warning / IDS / military reconnaissance-in-force
|
|
|
|
| 444 |
# generalises. A memorised "send scout to (40,10) + tanks to
|
| 445 |
# (45,10)" opening cannot generalise across seeds.
|
| 446 |
"scout-detect-incoming-army",
|
| 447 |
+
# Wave-7 ACTION econ-defense pack — convoy / supply-line protection
|
| 448 |
+
# (SC2 harass defense / military convoy protection / supply-line
|
| 449 |
+
# doctrine anchor). A single harv commutes proc↔mine on a long
|
| 450 |
+
# exposed route; raider 2tnks specifically target the harv.
|
| 451 |
+
# Defenders at base never engage (raider intercepts harv beyond
|
| 452 |
+
# base sight); intended play is to move escorts east to intercept
|
| 453 |
+
# on the route. Hard tier defines two agent spawn_point groups
|
| 454 |
+
# (NORTH route y=14 / SOUTH route y=26) round-robined by seed;
|
| 455 |
+
# symmetric north + south raider waves always place (enemy actors
|
| 456 |
+
# don't honour spawn_point — CLAUDE.md), so each spawn defends
|
| 457 |
+
# its OWN supply lane and a memorised opening cannot generalise.
|
| 458 |
+
"econ-protect-harvester-route",
|
| 459 |
+
# Wave-7 Group D reasoning pack — rock-paper-scissors hard-counter
|
| 460 |
+
# selection (SC2 hard-counter doctrine / military RPS counter /
|
| 461 |
+
# capability-based defense procurement anchor). Cash $2550 funds
|
| 462 |
+
# EITHER 3× 2tnk (the right counter to pure-infantry enemy) OR
|
| 463 |
+
# 8× e3 (wrong counter — anti-tank rockets vs soft targets) OR
|
| 464 |
+
# 25× e1 (1:1 attrition match). Hard tier defines two agent
|
| 465 |
+
# spawn_point groups (NORTH base y=12 / SOUTH base y=28) round-
|
| 466 |
+
# robined by seed; the centre infantry cluster always places at
|
| 467 |
+
# (70,20) (enemy actors don't honour spawn_point — CLAUDE.md),
|
| 468 |
+
# so the composition decision is the same per seed but the lane
|
| 469 |
+
# the agent commits to flips per seed and a memorised opening
|
| 470 |
+
# cannot generalise.
|
| 471 |
+
"combat-vehicle-vs-infantry-counter",
|
| 472 |
+
# Wave-7 REASONING temporal-sequencing pack — SC2 timing-push
|
| 473 |
+
# window / PlanBench temporally-extended goal / cyber attack
|
| 474 |
+
# timing-window anchor. The `then:` happened-before composite
|
| 475 |
+
# enforces a SURVIVAL gate (own_units_gte:4 at T1) latching
|
| 476 |
+
# BEFORE the STRIKE gate (units_killed_gte:K within T2), so
|
| 477 |
+
# premature engagement and stalling both lose. Hard tier defines
|
| 478 |
+
# two agent spawn_point groups (NORTH staging y=12 / SOUTH
|
| 479 |
+
# staging y=28) round-robined by seed; the central enemy turtle
|
| 480 |
+
# cluster + tsla place every seed (enemy actors don't honour
|
| 481 |
+
# spawn_point — CLAUDE.md) and is symmetric across y=20, so
|
| 482 |
+
# both staging latitudes face the same survive-then-strike
|
| 483 |
+
# decision from a flipped approach axis.
|
| 484 |
+
"tp-survive-and-strike-at-window",
|
| 485 |
+
# Wave-7 REASONING pack: concentrated-defense topology — build a
|
| 486 |
+
# TIGHT CLUSTER of pillboxes around the high-value building (the
|
| 487 |
+
# agent fact). Hard tier defines 2 agent spawn_point groups
|
| 488 |
+
# (NORTH fact at y=14 / SOUTH fact at y=26) round-robined by seed;
|
| 489 |
+
# the cluster centre flips with the fact, so a memorised "cluster
|
| 490 |
+
# at (10,20)" plan cannot generalise. Enemies don't honour
|
| 491 |
+
# spawn_point (CLAUDE.md), so the rush band is staged at BOTH
|
| 492 |
+
# candidate latitudes — only the on-latitude band converges on
|
| 493 |
+
# the active fact, but it is heavy enough to overwhelm any
|
| 494 |
+
# defence that isn't a CLUSTER around the correct fact.
|
| 495 |
+
"build-defensive-tower-cluster",
|
| 496 |
+
# Wave-7 REASONING / RPS hard-counter pack (INVERSE of combat-
|
| 497 |
+
# vehicle-vs-infantry-counter) — SC2 hard-counter / anti-armor
|
| 498 |
+
# procurement / military RPS anchor. Starting cash ($1800) funds
|
| 499 |
+
# exactly ONE composition vs a pre-placed band of HEAVY tanks
|
| 500 |
+
# (3tnk on easy/medium, 4tnk Mammoths on hard); the agent must
|
| 501 |
+
# build e3 (rocket soldiers, anti-vehicle Dragon launcher) — not
|
| 502 |
+
# 1tnk (light tanks lose attrition to heavy armour, budget buys
|
| 503 |
+
# only ~2) and not e1 (no anti-armour weapon, kill bar fails).
|
| 504 |
+
# Hard tier defines two agent spawn_point groups (NORTH base
|
| 505 |
+
# y=12 / SOUTH base y=28) round-robined by seed; the heavy band
|
| 506 |
+
# is centred mid-latitude (y=20) so both spawns face symmetric
|
| 507 |
+
# pursuit geometry (enemy actors don't honour spawn_point —
|
| 508 |
+
# CLAUDE.md) and a memorised "build e3 at y=20" opening cannot
|
| 509 |
+
# generalise across seeds.
|
| 510 |
+
"combat-rocket-soldier-anti-vehicle",
|
| 511 |
+
# Wave-7 perimeter/firewall reasoning pack — ERQA spatial commit /
|
| 512 |
+
# MicroRTS defense placement / military perimeter (firewall rule
|
| 513 |
+
# placement) anchor. Sibling/inverse of def-tower-line-vs-cluster:
|
| 514 |
+
# that pack enforces CLUSTER at a single bottleneck cell (graph
|
| 515 |
+
# min-cut doctrine); this pack enforces a LINE across the corridor
|
| 516 |
+
# (one pbox per row spanning y=18..22 at x=60, radius 0.5 so only
|
| 517 |
+
# the exact rung cell counts). Hard tier defines two agent
|
| 518 |
+
# spawn_point groups (NORTH base y=12 / SOUTH base y=28) round-
|
| 519 |
+
# robined by seed; the rusher band is centred at y=20 and ALWAYS
|
| 520 |
+
# places (enemy actors don't honour spawn_point — CLAUDE.md), so
|
| 521 |
+
# the corridor LINE is identical across seeds but the agent's base
|
| 522 |
+
# bearing flips per seed and a memorised relative-to-base placement
|
| 523 |
+
# cannot generalise.
|
| 524 |
+
"build-defensive-tower-line",
|
| 525 |
+
# Wave-7 Group I REASONING — opening-phase build-order / power-grid
|
| 526 |
+
# bring-up sequencing (PlanBench task-ordering / SOP compliance /
|
| 527 |
+
# electrical-grid bring-up anchor). Hard tier defines two agent
|
| 528 |
+
# spawn_point groups (NORTH y=12 / SOUTH y=28) round-robined by
|
| 529 |
+
# seed; the pre-placed `fact` (and therefore the build radius and
|
| 530 |
+
# the placement coords for powr/proc) flips per seed, so a
|
| 531 |
+
# memorised "(20,20) opening" cannot generalise. An inert HoldFire
|
| 532 |
+
# `e1` per group surfaces the variation via units_summary (the
|
| 533 |
+
# pack would otherwise be building-only); no `move_units`/
|
| 534 |
+
# `attack_unit` tool is exposed so the e1 is functionally inert
|
| 535 |
+
# and does not interact with the SOP test.
|
| 536 |
+
"build-power-online-first",
|
| 537 |
+
# Wave-7 REASONING pack — cost-optimal build-order (powr → proc →
|
| 538 |
+
# weap) under a tight deadline (PlanBench cost-optimal / BOM-
|
| 539 |
+
# manufacturing critical-path anchor). Hard tier defines two agent
|
| 540 |
+
# spawn_point groups (NORTH base y=14 / SOUTH base y=26) round-
|
| 541 |
+
# robined by seed; ore patches are duplicated at both latitudes so
|
| 542 |
+
# harv income is symmetric per spawn. A memorised "place powr at
|
| 543 |
+
# (14,22)" opening cannot generalise — placement must be computed
|
| 544 |
+
# relative to the actual fact each seed.
|
| 545 |
+
"build-sequence-tech-fastest",
|
| 546 |
]
|
| 547 |
|
| 548 |
# Consciously NOT spawn-varied, with the reason (keeps the curation
|