no-cheat redesign: build-defensive-tower-line — wide-front LINE topology
Tailor the build-defensive-tower-line pack for the WIDE-FRONT rush
geometry the capability advertises. The previous design had attackers
funnelled through a 4-row corridor (y=18..22) and demanded a 4-pbox
line along that corridor; the rush was concentrated, which made
placement-vs-cluster less discriminating than intended.
New design (per-tier):
* easy: 3-pbox LINE (rungs y=8/20/32), $1800 budget, single rush
  wave of 6 e1 (2 per row × 3 rows) at tick 1500. Kill bar 4.
* medium: 5-pbox LINE (rungs y=4/12/20/28/36), $3000 budget, single
  rush wave of 10 e1 (2 per row × 5 rows) at tick 2200. Kill bar 7.
* hard: 6-pbox LINE (rungs y=4/10/16/22/28/34), $4800 budget
  (= 6 rungs + 2 rebuilds), TWO scheduled rush waves (tick 1800 and
  tick 3000), so a rung the first wave razes must be REBUILT before
  wave 2 lands; this is the "attrition over time" mechanic the spec
  asks for. Kill bar 8. Hard also flips the agent base latitude per
  seed (NORTH y=12 / SOUTH y=28 via spawn_point round-robin) so a
  memorised relative-to-base placement plan cannot generalise; the
  LINE rungs themselves stay fixed at x=60 since that is map geometry.
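The per-tier budgets can be cross-checked against the rung counts with a few lines of Python. This is a minimal sketch, assuming the flat 600-credit pillbox price quoted in the medium-tier comment of the pack ("exactly 5 pbox at 600 each"); the `TIERS` table and the `affordable_pboxes` helper are hypothetical names used only for illustration and are not part of the pack.

```python
# Hypothetical illustration of the budget arithmetic; names are not from the pack.
PBOX_COST = 600  # credits per pillbox, as quoted in the medium-tier comment

# Rung rows and budgets copied from the per-tier design list.
TIERS = {
    "easy": {"budget": 1800, "rungs": [8, 20, 32], "rebuilds": 0},
    "medium": {"budget": 3000, "rungs": [4, 12, 20, 28, 36], "rebuilds": 0},
    "hard": {"budget": 4800, "rungs": [4, 10, 16, 22, 28, 34], "rebuilds": 2},
}


def affordable_pboxes(budget: int, cost: int = PBOX_COST) -> int:
    """Number of pillboxes the starting cash can pay for."""
    return budget // cost


def budget_is_exact(tier: dict) -> bool:
    """True when the budget buys exactly the rungs plus the allowed rebuilds."""
    needed = len(tier["rungs"]) + tier["rebuilds"]
    return affordable_pboxes(tier["budget"]) == needed and tier["budget"] % PBOX_COST == 0
```

Under that price assumption every tier's budget is exact: there is no spare cash for an extra pillbox beyond the rungs (plus the two rebuilds on hard).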
Each level spreads attackers across the FULL vertical width of the
playable arena at distinct rows (the spec's WIDE front). The rusher
bot charges the agent fact centroid on the west, so each row's spawn
group walks WEST through x=60 on its starting y — different rows
cross the central column at different y values, forcing a LINE
topology to intercept every row.
The win predicate keeps placement load-bearing: one
`building_in_region` clause per rung at radius 0.5 (cell-exact) so
a cluster on the centre row misses every flank rung, a scatter near
the base misses every rung, and the intended LINE (one pbox per row)
is the only configuration that simultaneously satisfies the count,
all rung clauses, the kill quota, the fact-alive clause and the
within_ticks deadline. `after_ticks` in the fail clause makes
non-winners a real reachable LOSS (no interrupts ⇒ exactly 90 ticks
per step).
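The cell-exact rung clauses and the tick budget can be sketched in Python. This is an illustration under stated assumptions, not the engine's implementation: `building_in_region` below is a hypothetical re-implementation that treats buildings as integer cells and measures Euclidean distance, and `last_reachable_tick` assumes the first step ends at tick 93 (the figure quoted in the pack's `max_turns` comment) with 90 ticks per step after that. The three layouts are made-up placements for the easy tier.

```python
from math import hypot


def building_in_region(buildings, x, y, radius, count=1):
    """Assumed semantics: at least `count` buildings within `radius` of (x, y)."""
    hits = sum(1 for bx, by in buildings if hypot(bx - x, by - y) <= radius)
    return hits >= count


def rungs_satisfied(buildings, rungs, x=60, radius=0.5):
    """True only if every rung row has a building on its exact cell."""
    return all(building_in_region(buildings, x, y, radius) for y in rungs)


def last_reachable_tick(max_turns, first_step_tick=93, ticks_per_step=90):
    """Tick reached after `max_turns` steps when no interrupts shorten a step."""
    return first_step_tick + ticks_per_step * (max_turns - 1)


EASY_RUNGS = [8, 20, 32]
LINE = [(60, 8), (60, 20), (60, 32)]      # one pbox per rung
CLUSTER = [(59, 20), (60, 20), (61, 20)]  # piled on the centre row
SCATTER = [(12, 16), (12, 20), (12, 24)]  # hugging the base
```

With integer cells and radius 0.5 the nearest non-matching cell is a full unit away, so only the exact rung cell counts: `LINE` satisfies all three clauses, `CLUSTER` satisfies only the centre rung, and `SCATTER` satisfies none. `last_reachable_tick(60)` evaluates to 5403, just past the 5400-tick deadline, which is why a 60-turn run can actually reach the `after_ticks` fail check.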
Validation (scripted, no model, four-script no-cheat bar):
* stall (observe-only): LOSS on every level and every seed (fact razed
  by the rush AND clock runs out with no pbox);
* cluster-on-centre (K pboxes piled on the y=20 row): LOSS on every
  level and every seed (count satisfied but flank rungs unmet, flank
  rows leak the rush through to the fact);
* scatter-near-base (K pboxes hugging the fact): LOSS on every level
  and every seed (every rung region unmet, rush reaches fact past
  unguarded front);
* intended LINE (one pbox per rung, with rebuild on hard): WIN on every
  level and every seed (count, all rungs, kill quota, fact alive,
  within_ticks all satisfied before the deadline).
Pre-existing CLAUDE.md footguns honoured: rusher bot charges centroid;
`place_building` works at arbitrary in-bounds coords (no adjacency);
Building/Defense single-stream queue gives the LINE serial build
time; unarmed enemy fact at (120,20) keeps the engine alive past full
rush elimination so the win/fail check fires (auto-`done`
mitigation); fact-alive uses the present-tense
`building_count_gte:{fact,n:1}` not the one-shot `has_building`;
NO pre-placed agent combat screen, only one non-combatant corner e1
per active spawn group so units_summary is non-empty without leaking
kills.
Tests updated: per-tier rung topology, budget, kill bar, scatter-
near-base policy (replaces the previous random-4-pbox policy), and a
new cluster-on-centre wrong-topology policy. Hard tier has 6 rungs
and TWO scheduled waves; the test_rush_arrives_as_a_scheduled_event
contract now checks the per-tier wave count.
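The per-tier wave-count contract can be pictured with a small Python check. The body of `test_rush_arrives_as_a_scheduled_event` is not shown in this commit, so the following is only a sketch of the idea: `SCHEDULED_EVENTS` is a hypothetical stand-in for the parsed pack that mirrors just the `tick` and `type` fields, using the wave ticks quoted in this message, and both helper names are invented for illustration.

```python
# Hypothetical stand-in for the parsed pack; only tick and type are mirrored.
SCHEDULED_EVENTS = {
    "easy": [{"tick": 1500, "type": "spawn_actors"}],
    "medium": [{"tick": 2200, "type": "spawn_actors"}],
    "hard": [
        {"tick": 1800, "type": "spawn_actors"},
        {"tick": 3000, "type": "spawn_actors"},
    ],
}

# Wave counts stated in the commit message: single wave on easy/medium, two on hard.
EXPECTED_WAVE_COUNT = {"easy": 1, "medium": 1, "hard": 2}


def rush_wave_ticks(events):
    """Sorted ticks of every spawn_actors event in one tier."""
    return sorted(e["tick"] for e in events if e.get("type") == "spawn_actors")


def tiers_with_wrong_wave_count(scheduled, expected):
    """Tiers whose number of spawn_actors waves differs from the expectation."""
    return [
        tier
        for tier, n in expected.items()
        if len(rush_wave_ticks(scheduled.get(tier, []))) != n
    ]
```

An empty list from `tiers_with_wrong_wave_count` means every tier schedules the number of waves the design calls for.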
|
@@ -1,64 +1,136 @@
|
|
| 1 |
meta:
|
| 2 |
id: build-defensive-tower-line
|
| 3 |
-
title: 'Build a Defensive Tower LINE Across
|
| 4 |
capability: reasoning
|
| 5 |
real_world_meaning: >
|
| 6 |
-
Where do you commit your defensive
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
|
|
|
| 13 |
The win predicate makes the LINE topology load-bearing — total
|
| 14 |
pillbox count alone is not enough; ≥1 pillbox must sit on EACH of
|
| 15 |
-
the
|
| 16 |
-
|
| 17 |
-
actually KILL the rush funnelled through the corridor.
|
| 18 |
robotics_analogue: >
|
| 19 |
Network firewall / Web Application Firewall rule placement: when
|
| 20 |
-
every
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
|
|
|
| 27 |
benchmark_anchor:
|
| 28 |
- "ERQA"
|
| 29 |
- "MicroRTS defense"
|
| 30 |
- "military perimeter"
|
| 31 |
author: openra-bench
|
| 32 |
|
| 33 |
-
# rush-hour-arena (128×40). The map has a narrow lane around y≈18..22
|
| 34 |
-
# at mid-map (x≈60); the agent fact sits on the WEST (x=10). The rush
|
| 35 |
-
# arrives as a `scheduled_events: spawn_actors` wave injected EAST of
|
| 36 |
-
# the corridor (x≈80) at a fixed tick — AFTER the agent has had time to
|
| 37 |
-
# build all 4 pillboxes serially. The `rusher` bot charges the agent
|
| 38 |
-
# centroid (the fact on the west), so the wave is forced THROUGH the
|
| 39 |
-
# x=60 corridor on its way to the fact on every seed.
|
| 40 |
-
#
|
| 41 |
-
# pbox is the load-bearing weapon. After the engine pbox-weapon fix
|
| 42 |
-
# (`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
|
| 43 |
-
# active direct-fire tower (M60mg anti-infantry MG: one burst one-shots
|
| 44 |
-
# an e1). The win predicate now demands a KILL quota the pbox LINE must
|
| 45 |
-
# deliver — there are NO pre-placed agent defenders, so the only source
|
| 46 |
-
# of kill output is the agent's own pillbox LINE. A mis-placed pbox
|
| 47 |
-
# layout (cluster on the centre row, scatter near the base) leaves the
|
| 48 |
-
# corridor rungs open: the rush slips past on an unguarded row, kills
|
| 49 |
-
# nothing, and razes the fact.
|
| 50 |
-
#
|
| 51 |
-
# An unarmed high-HP enemy `fact` far east keeps the engine alive past
|
| 52 |
-
# rush annihilation so the win/fail check actually runs.
|
| 53 |
-
#
|
| 54 |
-
# SISTER PACK: def-tower-line-vs-cluster inverts the topology bar to
|
| 55 |
-
# enforce CLUSTER (graph min-cut doctrine); this pack enforces LINE
|
| 56 |
-
# (corridor-width perimeter doctrine). The two together discriminate
|
| 57 |
-
# whether the model understands the FORCING GEOMETRY: a chokepoint
|
| 58 |
-
# (single cell on a wide approach → cluster) vs a corridor (full
|
| 59 |
-
# vertical width that any one row can leak through → line).
|
| 60 |
base_map: rush-hour-arena
|
| 61 |
-
starting_cash: 2400
|
| 62 |
|
| 63 |
base:
|
| 64 |
agent:
|
|
@@ -75,91 +147,87 @@ base:
|
|
| 75 |
- attack_move
|
| 76 |
- stop
|
| 77 |
planning: true
|
| 78 |
-
# No interrupts — perimeter design is a STATIC up-front decision
|
| 79 |
-
#
|
| 80 |
-
# Dropping interrupts also makes the tick budget deterministic
|
| 81 |
-
#
|
| 82 |
-
#
|
| 83 |
interrupts: {}
|
| 84 |
termination:
|
| 85 |
max_ticks: 12000
|
| 86 |
actors: [] # every level supplies its own actor list via overrides.
|
| 87 |
|
| 88 |
levels:
|
| 89 |
-
# ── EASY ── bare LINE skill. Budget covers exactly
|
| 90 |
-
# Win requires ≥1 pbox in EACH of the
|
| 91 |
-
#
|
| 92 |
-
#
|
| 93 |
-
#
|
| 94 |
-
#
|
| 95 |
-
#
|
| 96 |
-
#
|
| 97 |
-
#
|
| 98 |
-
# clause AND the fact razed by the rush.
|
| 99 |
# max_turns 60 ⇒ reachable tick 93+90·59 = 5403; deadline 5400.
|
| 100 |
easy:
|
| 101 |
# Original (pre-verbosity-sweep) description preserved for
|
| 102 |
# contributors. The trimmed version below removes scripted-policy
|
| 103 |
# spoilers and cell-coord dumps; load-bearing intent kept.
|
| 104 |
#
|
| 105 |
-
# A
|
| 106 |
-
#
|
| 107 |
-
#
|
| 108 |
-
#
|
| 109 |
-
#
|
| 110 |
-
#
|
| 111 |
-
#
|
| 112 |
-
#
|
| 113 |
-
# pure-army layout (no pbox) fails the count and lets the rush
|
| 114 |
-
# raze the fact. Your pillboxes must kill at least 4 of the rush;
|
| 115 |
-
# your fact must survive.
|
| 116 |
-
# Original (pre-verbosity-sweep) description preserved for
|
| 117 |
-
# contributors. The trimmed version below removes scripted-policy
|
| 118 |
-
# spoilers and cell-coord dumps; load-bearing intent kept.
|
| 119 |
-
#
|
| 120 |
-
# A rush will funnel through the narrow mid-map corridor at x=60, rows
|
| 121 |
-
# y=18 to 22. Budget $2400 — build a pillbox on each of those four
|
| 122 |
-
# rows so nothing slips through. Kill four enemies and keep your
|
| 123 |
-
# construction yard, within about 60 turns.
|
| 124 |
description: >
|
| 125 |
-
A rifle rush
|
| 126 |
-
y=
|
| 127 |
-
|
| 128 |
-
at
|
| 129 |
-
|
| 130 |
-
|
|
|
|
| 131 |
overrides:
|
| 132 |
actors:
|
|
|
|
|
|
|
| 133 |
- {type: fact, owner: agent, position: [10, 20]}
|
| 134 |
- {type: tent, owner: agent, position: [14, 18]}
|
| 135 |
- {type: powr, owner: agent, position: [14, 22]}
|
| 136 |
# ONE non-combatant agent e1 parked in the far SW corner,
|
| 137 |
-
# nowhere near
|
| 138 |
-
#
|
| 139 |
-
#
|
| 140 |
-
#
|
|
|
|
| 141 |
- {type: e1, owner: agent, position: [4, 36], stance: 2}
|
| 142 |
# Unarmed high-HP marker (anti-DRAW): keeps the episode alive
|
| 143 |
# past full rusher elimination so the win/fail check fires.
|
| 144 |
- {type: fact, owner: enemy, position: [120, 20]}
|
| 145 |
-
# Scheduled rush wave —
|
| 146 |
-
#
|
| 147 |
-
#
|
| 148 |
-
# the
|
| 149 |
-
#
|
| 150 |
scheduled_events:
|
| 151 |
-
- tick:
|
| 152 |
type: spawn_actors
|
| 153 |
actors:
|
| 154 |
-
- {type: e1, owner: enemy, position: [
|
| 155 |
-
- {type:
|
|
|
|
| 156 |
win_condition:
|
| 157 |
all_of:
|
| 158 |
-
- building_count_gte: {type: pbox, n:
|
| 159 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 160 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 161 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 162 |
-
- building_in_region: {type: pbox, x: 60, y: 22, radius: 0.5, count: 1}
|
| 163 |
- units_killed_gte: 4
|
| 164 |
- building_count_gte: {type: fact, n: 1}
|
| 165 |
- within_ticks: 5400
|
|
@@ -169,55 +237,67 @@ levels:
|
|
| 169 |
- not: {building_count_gte: {type: fact, n: 1}}
|
| 170 |
max_turns: 60
|
| 171 |
|
| 172 |
-
# ── MEDIUM ── +1 axis:
|
| 173 |
-
#
|
| 174 |
-
#
|
| 175 |
-
#
|
| 176 |
-
#
|
| 177 |
-
#
|
| 178 |
-
#
|
|
|
|
| 179 |
medium:
|
| 180 |
# Original (pre-verbosity-sweep) description preserved for
|
| 181 |
# contributors. The trimmed version below removes scripted-policy
|
| 182 |
# spoilers and cell-coord dumps; load-bearing intent kept.
|
| 183 |
#
|
| 184 |
-
#
|
| 185 |
-
# (budget
|
| 186 |
-
# each of
|
| 187 |
-
#
|
| 188 |
-
#
|
| 189 |
-
#
|
| 190 |
-
# lose; the fact must survive.
|
| 191 |
description: >
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
|
|
|
|
|
|
| 196 |
overrides:
|
| 197 |
actors:
|
| 198 |
- {type: fact, owner: agent, position: [10, 20]}
|
| 199 |
- {type: tent, owner: agent, position: [14, 18]}
|
| 200 |
- {type: powr, owner: agent, position: [14, 22]}
|
| 201 |
-
#
|
| 202 |
-
#
|
|
|
|
|
|
|
|
|
|
| 203 |
- {type: e1, owner: agent, position: [4, 36], stance: 2}
|
| 204 |
# Anti-DRAW marker.
|
| 205 |
- {type: fact, owner: enemy, position: [120, 20]}
|
| 206 |
-
# Heavier rush wave:
|
| 207 |
-
# the
|
|
|
|
|
|
|
|
|
|
| 208 |
scheduled_events:
|
| 209 |
-
- tick:
|
| 210 |
type: spawn_actors
|
| 211 |
actors:
|
| 212 |
-
- {type: e1, owner: enemy, position: [
|
| 213 |
-
- {type:
|
|
|
|
|
|
|
|
|
|
| 214 |
win_condition:
|
| 215 |
all_of:
|
| 216 |
-
- building_count_gte: {type: pbox, n:
|
| 217 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 218 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 219 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 220 |
-
- building_in_region: {type: pbox, x: 60, y:
|
|
|
|
| 221 |
- units_killed_gte: 7
|
| 222 |
- building_count_gte: {type: fact, n: 1}
|
| 223 |
- within_ticks: 5400
|
|
@@ -227,73 +307,102 @@ levels:
|
|
| 227 |
- not: {building_count_gte: {type: fact, n: 1}}
|
| 228 |
max_turns: 60
|
| 229 |
|
| 230 |
-
# ── HARD ── +
|
| 231 |
-
#
|
| 232 |
-
#
|
| 233 |
-
#
|
| 234 |
-
#
|
| 235 |
-
#
|
| 236 |
-
#
|
| 237 |
-
#
|
| 238 |
-
#
|
| 239 |
-
#
|
| 240 |
-
#
|
| 241 |
-
#
|
| 242 |
-
#
|
| 243 |
hard:
|
| 244 |
# Original (pre-verbosity-sweep) description preserved for
|
| 245 |
# contributors. The trimmed version below removes scripted-policy
|
| 246 |
# spoilers and cell-coord dumps; load-bearing intent kept.
|
| 247 |
#
|
| 248 |
-
#
|
| 249 |
-
#
|
| 250 |
-
#
|
| 251 |
-
#
|
| 252 |
-
#
|
| 253 |
-
#
|
| 254 |
-
#
|
| 255 |
-
#
|
| 256 |
description: >
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
|
|
|
|
| 262 |
overrides:
|
| 263 |
actors:
|
| 264 |
-
# spawn_point 0 — NORTH base at y=12. Fact at (10, 12);
|
| 265 |
-
#
|
| 266 |
-
#
|
|
|
|
| 267 |
- {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
|
| 268 |
- {type: tent, owner: agent, position: [6, 12], spawn_point: 0}
|
| 269 |
- {type: powr, owner: agent, position: [6, 14], spawn_point: 0}
|
|
|
|
| 270 |
- {type: e1, owner: agent, position: [4, 36], stance: 2, spawn_point: 0}
|
| 271 |
# spawn_point 1 — SOUTH base at y=28 (mirror across y=20).
|
|
|
|
| 272 |
- {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
|
| 273 |
- {type: tent, owner: agent, position: [6, 28], spawn_point: 1}
|
| 274 |
- {type: powr, owner: agent, position: [6, 26], spawn_point: 1}
|
|
|
|
| 275 |
- {type: e1, owner: agent, position: [4, 4], stance: 2, spawn_point: 1}
|
| 276 |
# Anti-DRAW marker (enemy fact doesn't honour spawn_point).
|
| 277 |
- {type: fact, owner: enemy, position: [120, 20]}
|
| 278 |
-
#
|
| 279 |
-
#
|
| 280 |
-
#
|
| 281 |
-
#
|
| 282 |
-
#
|
| 283 |
scheduled_events:
|
| 284 |
- tick: 1800
|
| 285 |
type: spawn_actors
|
| 286 |
actors:
|
| 287 |
-
- {type: e1, owner: enemy, position: [
|
| 288 |
-
- {type:
|
| 289 |
win_condition:
|
| 290 |
all_of:
|
| 291 |
-
- building_count_gte: {type: pbox, n:
|
| 292 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 293 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 294 |
-
- building_in_region: {type: pbox, x: 60, y:
|
| 295 |
- building_in_region: {type: pbox, x: 60, y: 22, radius: 0.5, count: 1}
|
| 296 |
-
-
|
|
|
|
|
|
|
| 297 |
- building_count_gte: {type: fact, n: 1}
|
| 298 |
- within_ticks: 6300
|
| 299 |
fail_condition:
|
|
|
|
| 1 |
+
# build-defensive-tower-line — Build a Defensive Pillbox LINE Across a WIDE Front
|
| 2 |
+
#
|
| 3 |
+
# REASONING focus: when the threat is funnelled along a WIDE front (a
|
| 4 |
+
# rush spread across the full vertical width of the map, not pinched
|
| 5 |
+
# through a single corridor cell), the right defensive architecture is
|
| 6 |
+
# ONE pillbox per row across the FULL width — a LINE that no enemy unit
|
| 7 |
+
# can slip past on an unguarded row. A dense cluster at the centre row
|
| 8 |
+
# wastes overlapping fire on one cell while the flanks stay open; a
|
| 9 |
+
# scatter near the base never engages the rush at all.
|
| 10 |
+
#
|
| 11 |
+
# This pack is the SISTER / INVERSE of `def-tower-line-vs-cluster` (which
|
| 12 |
+
# enforces CLUSTER topology at a single-cell chokepoint, graph min-cut
|
| 13 |
+
# doctrine). Together the two packs discriminate whether the model
|
| 14 |
+
# understands the FORCING GEOMETRY: a chokepoint (single cell on a wide
|
| 15 |
+
# approach → cluster) vs a wide front (full vertical width where every
|
| 16 |
+
# row carries a rush column → line).
|
| 17 |
+
#
|
| 18 |
+
# Real-world anchor:
|
| 19 |
+
# • military perimeter doctrine — when an attacker can approach across
|
| 20 |
+
# the full width of a sector, perimeter posts cover EVERY lane
|
| 21 |
+
# across the front; concentrating the entire garrison at one post
|
| 22 |
+
# leaves the rest of the front traversable.
|
| 23 |
+
# • firewall / IDS rule placement — one inspection rule per
|
| 24 |
+
# protocol/port across the full inspection surface; duplicated rules
|
| 25 |
+
# on one port leave the rest open.
|
| 26 |
+
# • MicroRTS defense placement — defending a wide approach demands
|
| 27 |
+
# spread coverage; concentrating into one cell of an open arena is
|
| 28 |
+
# known to LOSE to a multi-lane advance.
|
| 29 |
+
#
|
| 30 |
+
# Map: `rush-hour-arena` (128×40, fully open). The agent base sits on
|
| 31 |
+
# the WEST (fact at x=10); the rush wave is injected at the EAST edge
|
| 32 |
+
# (x=100) spread across MULTIPLE distinct y rows that span the full
|
| 33 |
+
# vertical width of the playable area. The `rusher` scripted bot then
|
| 34 |
+
# charges the agent fact centroid, so each row's spawn group walks WEST
|
| 35 |
+
# through the central x=60 column on its way to the fact — different
|
| 36 |
+
# rows cross x=60 at DIFFERENT y values, forcing a LINE topology to
|
| 37 |
+
# intercept every row.
|
| 38 |
+
#
|
| 39 |
+
# pbox is the load-bearing weapon. After the engine pbox-weapon fix
|
| 40 |
+
# (`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
|
| 41 |
+
# active direct-fire anti-infantry tower (M60mg burst-5; one burst
|
| 42 |
+
# one-shots an e1). With NO pre-placed agent combat screen the pbox
|
| 43 |
+
# LINE is the SOLE source of kill output — a stall / wrong-placement
|
| 44 |
+
# layout kills nothing AND the rush razes the fact.
|
| 45 |
+
#
|
| 46 |
+
# Win predicate (load-bearing decomposition):
|
| 47 |
+
# • `building_count_gte:{pbox,n:K}` — built the full budget;
|
| 48 |
+
# • `building_in_region:{pbox, x:60, y:<rung>, radius:0.5, count:1}`
|
| 49 |
+
# for EACH of the K rung rows across the full front (radius 0.5 ⇒
|
| 50 |
+
# cell-exact; a cluster on the middle row misses every flank rung,
|
| 51 |
+
# a scatter near the base misses every rung);
|
| 52 |
+
# • `units_killed_gte:<kill bar>` (4 / 7 / 8 by tier): the pbox LINE must actively KILL the rush,
|
| 53 |
+
# not just stand (a stall / pure-army layout kills 0);
|
| 54 |
+
# • `building_count_gte:{fact,n:1}` (PRESENT-TENSE — `has_building`
|
| 55 |
+
# is the documented CLAUDE.md "ever-seen" footgun);
|
| 56 |
+
# • `within_ticks` + `after_ticks` fail clause ⇒ a non-finisher is a
|
| 57 |
+
# real reachable timeout LOSS (no interrupts ⇒ exactly 90 ticks per
|
| 58 |
+
# step, so `max_turns` is a hard tick budget the `after_ticks`
|
| 59 |
+
# deadline reliably bites in).
|
| 60 |
+
#
|
| 61 |
+
# Discrimination (four-script bar — scripted, no model needed):
|
| 62 |
+
# • stall (observe-only): spends nothing; the rush razes the fact →
|
| 63 |
+
# fact-alive fail clause fires → LOSS. The `after_ticks` deadline is
|
| 64 |
+
# a backstop so a staller who somehow keeps the fact also times out
|
| 65 |
+
# (no draw degeneracy).
|
| 66 |
+
# • cluster-on-centre (K pboxes piled on the y=20 row): satisfies the
|
| 67 |
+
# count but EVERY flank rung region is empty (radius 0.5 ⇒
|
| 68 |
+
# cell-exact, the central pile doesn't touch the flanks); the win
|
| 69 |
+
# never latches and the unguarded flank rows let the rush leak
|
| 70 |
+
# through → LOSS.
|
| 71 |
+
# • scatter-near-base (K pboxes hugging the fact west of x=20): every
|
| 72 |
+
# rung region is empty AND the pboxes are too far west to engage
|
| 73 |
+
# the rush before it reaches the fact → LOSS.
|
| 74 |
+
# • intended LINE (one pbox at each of the K corridor rungs at x=60):
|
| 75 |
+
# every row is covered AND the wave is killed at the central
|
| 76 |
+
# column AND the fact survives → WIN.
|
| 77 |
+
#
|
| 78 |
+
# Engine footguns honoured:
|
| 79 |
+
# • `place_building` does NOT enforce build-adjacency (CLAUDE.md) — the
|
| 80 |
+
# LINE rungs sit deep at x=60 with no nearby agent base; the engine
|
| 81 |
+
# places them anyway.
|
| 82 |
+
# • Building / Defense queues feed from the construction yard
|
| 83 |
+
# (single-stream); the agent must build pboxes serially. The tick
|
| 84 |
+
# budget on every tier gives the LINE time to assemble BEFORE the
|
| 85 |
+
# scheduled rush wave hits.
|
| 86 |
+
# • Hard tier defines TWO agent spawn_point groups (NORTH y=12 / SOUTH
|
| 87 |
+
# y=28) round-robined by seed (CLAUDE.md hard-tier contract). Enemy
|
| 88 |
+
# actors do not honour spawn_point, so the rush wave is fixed on the
|
| 89 |
+
# full-width spawn axis and crosses the LINE rungs identically for
|
| 90 |
+
# both base latitudes — what flips per seed is the BEARING the rush
|
| 91 |
+
# approaches the agent fact from, not the rung topology itself.
|
| 92 |
+
# • An unarmed high-HP enemy `fact` marker at (120,20) keeps the
|
| 93 |
+
# engine alive past full rush annihilation so the win/fail check
|
| 94 |
+
# fires (CLAUDE.md auto-`done` mitigation).
|
| 95 |
+
# • NO pre-placed agent combat screen — one non-combatant e1 in a far
|
| 96 |
+
# corner per spawn group satisfies the hard-tier env-reset
|
| 97 |
+
# non-empty-units check while contributing ZERO kills, so a
|
| 98 |
+
# wrong-placement spend cannot pass the kill clause off it.
|
| 99 |
+
|
| 100 |
meta:
|
| 101 |
id: build-defensive-tower-line
|
| 102 |
+
title: 'Build a Defensive Tower LINE Across a WIDE Front (Not a Cluster, Not a Scatter)'
|
| 103 |
capability: reasoning
|
| 104 |
real_world_meaning: >
|
| 105 |
+
Where do you commit your defensive towers when the threat is a rush
|
| 106 |
+
spread across the FULL WIDTH of the map — not pinched through a
|
| 107 |
+
single corridor cell, but advancing on every row of a wide front?
|
| 108 |
+
Military perimeter doctrine and firewall rule design both say:
|
| 109 |
+
cover EVERY lane across the front, one post per row, so no enemy
|
| 110 |
+
unit can slip past on an unguarded row. A single dense cluster on
|
| 111 |
+
one row wastes overlapping fire on one cell while every other row
|
| 112 |
+
stays open; a scatter near the base never engages the rush at all.
|
| 113 |
The win predicate makes the LINE topology load-bearing — total
|
| 114 |
pillbox count alone is not enough; ≥1 pillbox must sit on EACH of
|
| 115 |
+
the front's rung rows (cell-exact via radius 0.5), AND those
|
| 116 |
+
pillboxes must actually KILL the rush spread across the front.
|
|
|
|
| 117 |
robotics_analogue: >
|
| 118 |
Network firewall / Web Application Firewall rule placement: when
|
| 119 |
+
every protocol/port could be the path of compromise, the right
|
| 120 |
+
architecture is one rule per port across the full inspection
|
| 121 |
+
surface, not three duplicated rules on one port while the rest stay
|
| 122 |
+
open. Likewise a physical perimeter patrol covers EVERY approach
|
| 123 |
+
lane across the front — a cluster at one waypoint or a scatter
|
| 124 |
+
across unrelated nodes both leave the actual front lanes
|
| 125 |
+
traversable. Defense in depth across a wide approach demands one
|
| 126 |
+
responder per lane, not many responders at one waypoint.
|
| 127 |
benchmark_anchor:
|
| 128 |
- "ERQA"
|
| 129 |
- "MicroRTS defense"
|
| 130 |
- "military perimeter"
|
| 131 |
author: openra-bench
|
| 132 |
|
| 133 |
base_map: rush-hour-arena
|
|
|
|
| 134 |
|
| 135 |
base:
|
| 136 |
agent:
|
|
|
|
| 147 |
- attack_move
|
| 148 |
- stop
|
| 149 |
planning: true
|
| 150 |
+
# No interrupts — perimeter design is a STATIC up-front decision (the
|
| 151 |
+
# front geometry is known a priori, the rush composition is fixed).
|
| 152 |
+
# Dropping interrupts also makes the tick budget deterministic (each
|
| 153 |
+
# step is exactly 90 ticks ⇒ max_turns is a hard tick budget that the
|
| 154 |
+
# `after_ticks` fail clause reliably bites in).
|
| 155 |
interrupts: {}
|
| 156 |
termination:
|
| 157 |
max_ticks: 12000
|
| 158 |
actors: [] # every level supplies its own actor list via overrides.
|
| 159 |
|
| 160 |
levels:
|
| 161 |
+
# ── EASY ── bare LINE skill. Budget covers exactly 3 pbox (1800cr).
|
| 162 |
+
# Win requires ≥1 pbox in EACH of the 3 front rungs (y=8, 20, 32 at
|
| 163 |
+
# x=60, radius 0.5 — only the exact rung cell counts, so a cluster on
|
| 164 |
+
# y=20 misses the y=8 and y=32 flank rungs). The rush arrives at
|
| 165 |
+
# tick 1500 — after the LINE has had time to assemble. A cluster at
|
| 166 |
+
# (60,20) satisfies the count clause but FAILS the flank rung clauses
|
| 167 |
+
# AND lets the flank rush leak through the open rows; a random
|
| 168 |
+
# scatter near the base misses every rung AND kills nothing; a stall
|
| 169 |
+
# loses on the count clause AND the fact razed by the rush.
|
|
|
|
| 170 |
# max_turns 60 ⇒ reachable tick 93+90·59 = 5403; deadline 5400.
|
| 171 |
easy:
|
| 172 |
# Original (pre-verbosity-sweep) description preserved for
|
| 173 |
# contributors. The trimmed version below removes scripted-policy
|
| 174 |
# spoilers and cell-coord dumps; load-bearing intent kept.
|
| 175 |
#
|
| 176 |
+
# A rifle rush will charge across the FULL WIDTH of the front (3
|
| 177 |
+
# distinct rows: y=8, y=20, y=32 at the east edge). Budget is
|
| 178 |
+
# exactly $1800 — drop one pillbox on each of those three rows at
|
| 179 |
+
# x=60 so nothing slips past on any row. A cluster on the centre
|
| 180 |
+
# row misses both flank rungs and lets the flank rush leak through;
|
| 181 |
+
# a scatter near the base never engages the rush. Kill at least
|
| 182 |
+
# four and keep your construction yard standing, within about 60
|
| 183 |
+
# turns.
|
| 184 |
description: >
|
| 185 |
+
A rifle rush charges across the full vertical width of the map —
|
| 186 |
+
three lanes at y=8, y=20, and y=32 — toward your yard on the west.
|
| 187 |
+
Budget $1800 — three pillboxes. Drop one on each of those three
|
| 188 |
+
rows at x=60 so no lane is open. A cluster on the centre row
|
| 189 |
+
leaves both flanks unguarded; a scatter near the base never meets
|
| 190 |
+
the rush. Four kills, yard intact, within about 60 turns.
|
| 191 |
+
starting_cash: 1800
|
| 192 |
overrides:
|
| 193 |
actors:
|
| 194 |
+
# Pre-placed agent base on the WEST. NO combat units near the
|
| 195 |
+
# base — the pbox LINE must be the sole source of kill output.
|
| 196 |
- {type: fact, owner: agent, position: [10, 20]}
|
| 197 |
- {type: tent, owner: agent, position: [14, 18]}
|
| 198 |
- {type: powr, owner: agent, position: [14, 22]}
|
| 199 |
# ONE non-combatant agent e1 parked in the far SW corner,
|
| 200 |
+
# nowhere near any rush lane. It exists only so units_summary
|
| 201 |
+
# is non-empty (hard-tier env-reset check); it never reaches
|
| 202 |
+
# combat and contributes ZERO kills — the pbox LINE is the sole
|
| 203 |
+
# source of kill output, so a scatter or stall play cannot pass
|
| 204 |
+
# the kill clause off it.
|
| 205 |
- {type: e1, owner: agent, position: [4, 36], stance: 2}
|
| 206 |
# Unarmed high-HP marker (anti-DRAW): keeps the episode alive
|
| 207 |
# past full rusher elimination so the win/fail check fires.
|
| 208 |
- {type: fact, owner: enemy, position: [120, 20]}
|
| 209 |
+
# Scheduled rush wave — 6 e1 spread across the full vertical
|
| 210 |
+
# width (2 per row × 3 rows: y=8, y=20, y=32), injected at tick
|
| 211 |
+
# 1500 from the east at x=100. By tick 1500 all 3 LINE pillboxes
|
| 212 |
+
# are built (the 3rd pbox completes ~tick 1350 from a fresh-cash
|
| 213 |
+
# tent + serial defense queue). The rusher charges the agent
|
| 214 |
+
# centroid (fact at (10,20)), so each row's spawn group walks
|
| 215 |
+
# WEST through x=60 — the y=8 and y=32 flank groups cross x=60 at
|
| 216 |
+
# their starting y values, demanding a LINE that covers every
|
| 217 |
+
# row.
|
| 218 |
scheduled_events:
|
| 219 |
+
- tick: 1500
|
| 220 |
type: spawn_actors
|
| 221 |
actors:
|
| 222 |
+
- {type: e1, owner: enemy, position: [100, 8], stance: 3, count: 2}
|
| 223 |
+
- {type: e1, owner: enemy, position: [100, 20], stance: 3, count: 2}
|
| 224 |
+
- {type: e1, owner: enemy, position: [100, 32], stance: 3, count: 2}
|
| 225 |
win_condition:
|
| 226 |
all_of:
|
| 227 |
+
- building_count_gte: {type: pbox, n: 3}
|
| 228 |
+
- building_in_region: {type: pbox, x: 60, y: 8, radius: 0.5, count: 1}
|
| 229 |
+
- building_in_region: {type: pbox, x: 60, y: 20, radius: 0.5, count: 1}
|
| 230 |
+
- building_in_region: {type: pbox, x: 60, y: 32, radius: 0.5, count: 1}
|
|
|
|
| 231 |
- units_killed_gte: 4
|
| 232 |
- building_count_gte: {type: fact, n: 1}
|
| 233 |
- within_ticks: 5400
|
|
|
|
| 237 |
- not: {building_count_gte: {type: fact, n: 1}}
|
| 238 |
max_turns: 60
|
| 239 |
|
| 240 |
+
# ── MEDIUM ── +1 axis: 5-pbox LINE across the full front (rungs at
|
| 241 |
+
# y=4, 12, 20, 28, 36 — finer spacing than easy, covering the full
|
| 242 |
+
# 36-cell playable height). Budget covers exactly 5 pbox (3000cr).
|
| 243 |
+
# The rush wave is 10 e1 (2 per row × 5 rows), so a complete LINE
|
| 244 |
+
# must hold every row — a 3-rung easy-style layout leaves 2 rows
|
| 245 |
+
# unguarded and the flanks leak. A cluster on y=20 misses the y=4 /
|
| 246 |
+
# y=12 / y=28 / y=36 rungs (radius 0.5 ⇒ cell-exact). max_turns 60
|
| 247 |
+
# ⇒ reachable tick 5403; deadline 5400.
|
| 248 |
medium:
|
| 249 |
# Original (pre-verbosity-sweep) description preserved for
|
| 250 |
# contributors. The trimmed version below removes scripted-policy
|
| 251 |
# spoilers and cell-coord dumps; load-bearing intent kept.
|
| 252 |
#
|
| 253 |
+
# The rush widens to 5 distinct rows (y=4, 12, 20, 28, 36). Build 5
|
| 254 |
+
# pillboxes (budget 3000cr = exactly 5 pbox at 600 each) AND place
|
| 255 |
+
# ONE on each of those five rows at x=60. The complete 5-rung LINE
|
| 256 |
+
# is required — any open rung lets the rush slip past on that row.
|
| 257 |
+
# Kill at least seven and keep your construction yard, within about
|
| 258 |
+
# 60 turns.
|
|
|
|
| 259 |
description: >
|
| 260 |
+
A wider rush now: five lanes at y=4, y=12, y=20, y=28, y=36 spread
|
| 261 |
+
across the full vertical front. Budget $3000 — five pillboxes.
|
| 262 |
+
One on each of those five rows at x=60; an easy-style three-rung
|
| 263 |
+
line leaves two flank rows open. Seven kills, yard intact, within
|
| 264 |
+
about 60 turns.
|
| 265 |
+
starting_cash: 3000
|
| 266 |
overrides:
|
| 267 |
actors:
|
| 268 |
- {type: fact, owner: agent, position: [10, 20]}
|
| 269 |
- {type: tent, owner: agent, position: [14, 18]}
|
| 270 |
- {type: powr, owner: agent, position: [14, 22]}
|
| 271 |
+
# Two powr to cover the 5-pbox power draw (5×-20 = -100; tent
|
| 272 |
+
# also draws; one powr +100 not enough margin once the rush
|
| 273 |
+
# damages buildings, so use 2 powr for stability).
|
| 274 |
+
- {type: powr, owner: agent, position: [14, 16]}
|
| 275 |
+
# Non-combatant SW-corner e1 (see easy).
|
| 276 |
- {type: e1, owner: agent, position: [4, 36], stance: 2}
|
| 277 |
# Anti-DRAW marker.
|
| 278 |
- {type: fact, owner: enemy, position: [120, 20]}
|
| 279 |
+
# Heavier rush wave: 10 e1 (2 per row × 5 rows), injected at tick
|
| 280 |
+
# 2200 — after the intended 5-pbox LINE has time to assemble (~
|
| 281 |
+
# tick 1700 for the 5th from a fresh-cash tent + serial defense
|
| 282 |
+
# queue). The rusher charges the agent centroid, so each row's
|
| 283 |
+
# spawn group walks WEST through x=60 on its starting y.
|
| 284 |
scheduled_events:
|
| 285 |
+
- tick: 2200
|
| 286 |
type: spawn_actors
|
| 287 |
actors:
|
| 288 |
+
- {type: e1, owner: enemy, position: [100, 4], stance: 3, count: 2}
|
| 289 |
+
- {type: e1, owner: enemy, position: [100, 12], stance: 3, count: 2}
|
| 290 |
+
- {type: e1, owner: enemy, position: [100, 20], stance: 3, count: 2}
|
| 291 |
+
- {type: e1, owner: enemy, position: [100, 28], stance: 3, count: 2}
|
| 292 |
+
- {type: e1, owner: enemy, position: [100, 36], stance: 3, count: 2}
|
| 293 |
win_condition:
|
| 294 |
all_of:
|
| 295 |
+
- building_count_gte: {type: pbox, n: 5}
|
| 296 |
+
- building_in_region: {type: pbox, x: 60, y: 4, radius: 0.5, count: 1}
|
| 297 |
+
- building_in_region: {type: pbox, x: 60, y: 12, radius: 0.5, count: 1}
|
| 298 |
+
- building_in_region: {type: pbox, x: 60, y: 20, radius: 0.5, count: 1}
|
| 299 |
+
- building_in_region: {type: pbox, x: 60, y: 28, radius: 0.5, count: 1}
|
| 300 |
+
- building_in_region: {type: pbox, x: 60, y: 36, radius: 0.5, count: 1}
|
| 301 |
- units_killed_gte: 7
|
| 302 |
- building_count_gte: {type: fact, n: 1}
|
| 303 |
- within_ticks: 5400
|
|
|
|
| 307 |
- not: {building_count_gte: {type: fact, n: 1}}
|
| 308 |
max_turns: 60
|
| 309 |
|
| 310 |
+
# ── HARD ── +2 axes: (1) a 6-rung LINE (y=4, 10, 16, 22, 28, 34 at
|
| 311 |
+
# x=60) covering the full front with TIGHTER spacing AND (2) ATTRITION
|
| 312 |
+
# over time via TWO scheduled waves (tick 1800 and tick 3000), so a
|
| 313 |
+
# pbox damaged in wave 1 may fall to wave 2 — the agent must REBUILD
|
| 314 |
+
# any rung the first wave razes before the second wave hits. Budget
|
| 315 |
+
# 4800cr = exactly 8 pbox (6 rungs + 2 rebuilds), so the cash is
|
| 316 |
+
# tight: a 7th rung OR a pbox parked near the base would spend
|
| 317 |
+
# rebuild cash. Hard tier also flips the agent base latitude per seed (NORTH
|
| 318 |
+
# y=12 / SOUTH y=28 round-robined via spawn_point) — the LINE
|
| 319 |
+
# topology is identical across seeds (the front rungs are fixed map
|
| 320 |
+
# geometry at x=60) but the agent base bearing flips, so a memorised
|
| 321 |
+
# relative-to-base placement plan cannot generalise. Enemies don't
|
| 322 |
+
# honour spawn_point (CLAUDE.md), so the rush waves inject on both
|
| 323 |
+
# bases' candidate latitudes regardless of seed — but the rush
|
| 324 |
+
# geometry is the SAME because the rusher charges the agent centroid,
|
| 325 |
+
# and on either base latitude the LINE rungs at x=60 are what catches
|
| 326 |
+
# the rush before it reaches the fact. max_turns 70 ⇒ reachable tick
|
| 327 |
+
# 93+90·69 = 6303; deadline 6300.
|
| 328 |
hard:
|
| 329 |
# Original (pre-verbosity-sweep) description preserved for
|
| 330 |
# contributors. The trimmed version below removes scripted-policy
|
| 331 |
# spoilers and cell-coord dumps; load-bearing intent kept.
|
| 332 |
#
|
| 333 |
+
# The full front widens to 6 rungs (y=4, 10, 16, 22, 28, 34) and the
|
| 334 |
+
# rush arrives in TWO waves (attrition): wave 1 at tick 1800 plus
|
| 335 |
+
# wave 2 at tick 3000. Budget 4800cr (= 8 pbox) — enough for the 6
|
| 336 |
+
# rungs PLUS 2 rebuilds for any rung the first wave razes. Your
|
| 337 |
+
# base latitude flips between NORTH (y=12) and SOUTH (y=28) by seed,
|
| 338 |
+
# so the bearing the rush approaches the fact from changes per
|
| 339 |
+
# seed; the LINE rungs themselves stay fixed at x=60. Kill at least
|
| 340 |
+
# 8, keep the fact, within about 70 turns.
|
| 341 |
description: >
|
| 342 |
+
The front widens to six lanes (y=4, 10, 16, 22, 28, 34) and the
|
| 343 |
+
rush arrives in TWO waves — pbox attrition is real, you must
|
| 344 |
+
rebuild lost rungs between waves. Budget $4800 = six rungs plus
|
| 345 |
+
two rebuilds. Your base flips NORTH/SOUTH by seed; the rungs at
|
| 346 |
+
x=60 don't. Eight kills, yard intact, within about 70 turns.
|
| 347 |
+
starting_cash: 4800
|
| 348 |
overrides:
|
| 349 |
actors:
|
| 350 |
+
# spawn_point 0 — NORTH base at y=12. Fact at (10, 12); tent/
|
| 351 |
+
# powr offset west so they aren't directly on the rusher path.
|
| 352 |
+
# Two powr for the 8-pbox power draw (8×-20=-160; 2 powr=+200).
|
| 353 |
+
# Non-combatant corner e1 in the far SW.
|
| 354 |
- {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
|
| 355 |
- {type: tent, owner: agent, position: [6, 12], spawn_point: 0}
|
| 356 |
- {type: powr, owner: agent, position: [6, 14], spawn_point: 0}
|
| 357 |
+
- {type: powr, owner: agent, position: [6, 10], spawn_point: 0}
|
| 358 |
- {type: e1, owner: agent, position: [4, 36], stance: 2, spawn_point: 0}
|
| 359 |
# spawn_point 1 — SOUTH base at y=28 (mirror across y=20).
|
| 360 |
+
# Non-combatant corner e1 in the far NW.
|
| 361 |
- {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
|
| 362 |
- {type: tent, owner: agent, position: [6, 28], spawn_point: 1}
|
| 363 |
- {type: powr, owner: agent, position: [6, 26], spawn_point: 1}
|
| 364 |
+
- {type: powr, owner: agent, position: [6, 30], spawn_point: 1}
|
| 365 |
- {type: e1, owner: agent, position: [4, 4], stance: 2, spawn_point: 1}
|
| 366 |
# Anti-DRAW marker (enemy fact doesn't honour spawn_point).
|
| 367 |
- {type: fact, owner: enemy, position: [120, 20]}
|
| 368 |
+
# Two-wave attrition. Each wave is 6 e1 (1 per row × 6 rows) at
|
| 369 |
+
# tick 1800 and tick 3000. The first wave finishes ~tick 2500,
|
| 370 |
+
# giving the agent a ~500-tick window to rebuild any razed rung
|
| 371 |
+
# before wave 2 lands. The serial defense queue can build 1
|
| 372 |
+
# pbox per ~270 ticks, so 2 rebuilds in 500 ticks is the budget
|
| 373 |
+
# the design enforces. A pure-stamp policy that places 6 pboxes
|
| 374 |
+
# and walks away will lose ANY rung wave 1 razed, then the
|
| 375 |
+
# surviving line fails to cover that row in wave 2 → rung clause
|
| 376 |
+
# unmet → LOSS.
|
| 377 |
scheduled_events:
|
| 378 |
- tick: 1800
|
| 379 |
type: spawn_actors
|
| 380 |
actors:
|
| 381 |
+
- {type: e1, owner: enemy, position: [100, 4], stance: 3, count: 1}
|
| 382 |
+
- {type: e1, owner: enemy, position: [100, 10], stance: 3, count: 1}
|
| 383 |
+
- {type: e1, owner: enemy, position: [100, 16], stance: 3, count: 1}
|
| 384 |
+
- {type: e1, owner: enemy, position: [100, 22], stance: 3, count: 1}
|
| 385 |
+
- {type: e1, owner: enemy, position: [100, 28], stance: 3, count: 1}
|
| 386 |
+
- {type: e1, owner: enemy, position: [100, 34], stance: 3, count: 1}
|
| 387 |
+
- tick: 3000
|
| 388 |
+
type: spawn_actors
|
| 389 |
+
actors:
|
| 390 |
+
- {type: e1, owner: enemy, position: [100, 4], stance: 3, count: 1}
|
| 391 |
+
- {type: e1, owner: enemy, position: [100, 10], stance: 3, count: 1}
|
| 392 |
+
- {type: e1, owner: enemy, position: [100, 16], stance: 3, count: 1}
|
| 393 |
+
- {type: e1, owner: enemy, position: [100, 22], stance: 3, count: 1}
|
| 394 |
+
- {type: e1, owner: enemy, position: [100, 28], stance: 3, count: 1}
|
| 395 |
+
- {type: e1, owner: enemy, position: [100, 34], stance: 3, count: 1}
|
| 396 |
win_condition:
|
| 397 |
all_of:
|
| 398 |
+
- building_count_gte: {type: pbox, n: 6}
|
| 399 |
+
- building_in_region: {type: pbox, x: 60, y: 4, radius: 0.5, count: 1}
|
| 400 |
+
- building_in_region: {type: pbox, x: 60, y: 10, radius: 0.5, count: 1}
|
| 401 |
+
- building_in_region: {type: pbox, x: 60, y: 16, radius: 0.5, count: 1}
|
| 402 |
- building_in_region: {type: pbox, x: 60, y: 22, radius: 0.5, count: 1}
|
| 403 |
+
- building_in_region: {type: pbox, x: 60, y: 28, radius: 0.5, count: 1}
|
| 404 |
+
- building_in_region: {type: pbox, x: 60, y: 34, radius: 0.5, count: 1}
|
| 405 |
+
- units_killed_gte: 8
|
| 406 |
- building_count_gte: {type: fact, n: 1}
|
| 407 |
- within_ticks: 6300
|
| 408 |
fail_condition:
|
|
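The budget and deadline figures quoted in the level comments can be checked with a few lines of arithmetic. This is a minimal sketch and not part of the pack: the constants come from the comments (600 credits per pbox, 90 ticks per step), the 93 is read from the comment's `93+90·69` as the tick of the first agent step (an assumption), and the helper names are hypothetical.

```python
# Constants quoted in the pack comments; names are illustrative only.
PBOX_COST = 600        # credits per pillbox ("5 pbox at 600 each")
FIRST_STEP_TICK = 93   # assumed: tick reached by the first agent step
TICKS_PER_STEP = 90    # ticks advanced by every later step


def last_reachable_tick(max_turns: int) -> int:
    """Last game tick an agent can act on within max_turns steps."""
    return FIRST_STEP_TICK + TICKS_PER_STEP * (max_turns - 1)


def affordable_pboxes(cash: int) -> int:
    """Number of pillboxes a starting budget pays for."""
    return cash // PBOX_COST


if __name__ == "__main__":
    # (level, starting_cash, max_turns, within_ticks) as stated in the pack.
    for level, cash, turns, deadline in [
        ("medium", 3000, 60, 5400),
        ("hard", 4800, 70, 6300),
    ]:
        print(level, affordable_pboxes(cash), last_reachable_tick(turns), deadline)
```

Under these assumptions the last reachable tick sits just past the `within_ticks` deadline on both tiers, which is what lets the `after_ticks` fail clause fire before the turn budget runs out.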
@@ -1,13 +1,15 @@
|
|
| 1 |
"""build-defensive-tower-line scenario family, full loop on Rust.
|
| 2 |
|
| 3 |
-
The pack tests DEFENSIVE PERIMETER TOPOLOGY
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
|
| 12 |
Anchors: ERQA spatial commit / MicroRTS defense placement / military
|
| 13 |
perimeter (firewall rule placement).
|
|
@@ -15,23 +17,26 @@ perimeter (firewall rule placement).
|
|
| 15 |
The pbox is the load-bearing weapon. After the engine pbox-weapon fix
|
| 16 |
(`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
|
| 17 |
active direct-fire anti-infantry tower. The rush arrives as a
|
| 18 |
-
`scheduled_events: spawn_actors` wave
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
|
| 24 |
The win predicate makes the LINE topology load-bearing — total pbox
|
| 25 |
count alone is not enough:
|
| 26 |
|
| 27 |
-
* `building_count_gte:{pbox, n:
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
|
|
|
| 33 |
* `units_killed_gte:K` ⇒ the pbox LINE must actively KILL the rush
|
| 34 |
-
|
| 35 |
* `building_count_gte:{fact,n:1}` (present-tense — `has_building` is
|
| 36 |
the one-shot "ever-seen" set, see CLAUDE.md footgun);
|
| 37 |
* `within_ticks` paired with `after_ticks` in the fail clause ⇒ a
|
|
@@ -39,13 +44,23 @@ count alone is not enough:
|
|
| 39 |
pack ⇒ each step is exactly 90 ticks, so max_turns is a hard tick
|
| 40 |
budget that the `after_ticks` deadline reliably bites in).
|
| 41 |
|
| 42 |
The scripted-policy validations prove deterministically that:
|
| 43 |
|
| 44 |
-
* the intended LINE policy (one pbox at each
|
| 45 |
-
|
| 46 |
-
* stall /
|
| 47 |
-
|
| 48 |
-
|
| 49 |
* the hard tier defines ≥2 spawn_point groups (NORTH base y=12 / SOUTH
|
| 50 |
base y=28) so a memorised base-relative placement cannot generalise.
|
| 51 |
"""
|
|
@@ -65,16 +80,36 @@ PACK = PACKS_DIR / "build-defensive-tower-line.yaml"
|
|
| 65 |
LEVELS = ("easy", "medium", "hard")
|
| 66 |
SEEDS = (1, 2, 3, 4)
|
| 67 |
|
| 68 |
-
#
|
| 69 |
-
#
|
| 70 |
-
#
|
| 71 |
-
|
| 72 |
|
| 73 |
-
# Cells used by the "
|
| 74 |
-
# clustered near the base rather than
|
| 75 |
# lie inside ANY rung region (radius 0.5 around the rung cells), so the
|
| 76 |
# region clauses are all unsatisfied.
|
| 77 |
-
| 78 |
|
| 79 |
|
| 80 |
# ── scripted policies ────────────────────────────────────────────────
|
|
@@ -86,55 +121,76 @@ def stall(rs, C):
|
|
| 86 |
return [C.observe()]
|
| 87 |
|
| 88 |
|
| 89 |
-
def make_line():
|
| 90 |
-
"""Intended LINE topology: one pbox at EACH
|
| 91 |
-
rung
|
| 92 |
|
| 93 |
def policy(rs, C):
|
| 94 |
own_b = rs.get("own_buildings") or []
|
| 95 |
-
|
| 96 |
prod = rs.get("production") or []
|
| 97 |
-
prod_items = [
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
|
| 107 |
return policy
|
| 108 |
|
| 109 |
|
| 110 |
-
def
|
| 111 |
-
"""
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
radius-0.5 disk), so the win predicate cannot fire."""
|
| 115 |
|
| 116 |
def policy(rs, C):
|
| 117 |
own_b = rs.get("own_buildings") or []
|
| 118 |
n = sum(1 for b in own_b if b.get("type") == "pbox")
|
| 119 |
prod = rs.get("production") or []
|
| 120 |
-
prod_items = [
|
| 121 |
-
|
|
|
|
|
|
|
| 122 |
return [C.observe()]
|
| 123 |
cmds = []
|
| 124 |
if "pbox" not in prod_items:
|
| 125 |
cmds.append(C.build("pbox"))
|
| 126 |
-
cmds.append(
|
| 127 |
-
C.place_building(
|
| 128 |
-
"pbox",
|
| 129 |
-
RANDOM_CELLS_NEAR_BASE[n][0],
|
| 130 |
-
RANDOM_CELLS_NEAR_BASE[n][1],
|
| 131 |
-
)
|
| 132 |
-
)
|
| 133 |
return cmds
|
| 134 |
|
| 135 |
return policy
|
| 136 |
|
| 138 |
# ── scenario-shape invariants ────────────────────────────────────────
|
| 139 |
|
| 140 |
|
|
@@ -148,8 +204,9 @@ def test_pack_compiles_with_three_levels_and_rusher_bot():
|
|
| 148 |
assert "ERQA" in anchors, anchors
|
| 149 |
assert "MicroRTS defense" in anchors, anchors
|
| 150 |
assert "military perimeter" in anchors, anchors
|
| 151 |
-
# Rusher bot wired through (charges agent centroid → forces
|
| 152 |
-
# rush
|
|
|
|
| 153 |
for lvl in LEVELS:
|
| 154 |
c = compile_level(pack, lvl)
|
| 155 |
assert c.map_supported
|
|
@@ -159,14 +216,18 @@ def test_pack_compiles_with_three_levels_and_rusher_bot():
|
|
| 159 |
assert str(bot).lower() == "rusher", (lvl, bot)
|
| 160 |
|
| 161 |
|
| 162 |
-
def
|
| 163 |
-
"""
|
| 164 |
-
|
| 165 |
-
|
|
|
|
|
|
|
| 166 |
pack = load_pack(PACK)
|
| 167 |
for lvl in LEVELS:
|
| 168 |
c = compile_level(pack, lvl)
|
| 169 |
-
assert c.starting_cash ==
|
|
|
|
|
|
|
| 170 |
|
| 171 |
|
| 172 |
@pytest.mark.parametrize("level", LEVELS)
|
|
@@ -208,28 +269,30 @@ def test_fact_alive_clause_uses_present_tense_predicate():
|
|
| 208 |
assert fact_clauses, f"{lvl}: missing present-tense fact-alive fail clause"
|
| 209 |
|
| 210 |
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
)
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
|
|
|
|
|
|
| 233 |
|
| 234 |
|
| 235 |
def test_win_requires_a_kill_quota():
|
|
@@ -251,7 +314,9 @@ def test_win_requires_a_kill_quota():
|
|
| 251 |
def test_rush_arrives_as_a_scheduled_event():
|
| 252 |
"""The rush is injected via `scheduled_events: spawn_actors` AFTER the
|
| 253 |
LINE has time to assemble — there is no t=0 enemy band racing the
|
| 254 |
-
build. This is what makes the build/rush race fair.
|
|
|
|
|
|
|
| 255 |
for lvl in LEVELS:
|
| 256 |
pack = load_pack(PACK)
|
| 257 |
raw = pack.levels[lvl]
|
|
@@ -260,25 +325,34 @@ def test_rush_arrives_as_a_scheduled_event():
|
|
| 260 |
ov = ov.model_dump(exclude_none=True)
|
| 261 |
evts = ov.get("scheduled_events") or []
|
| 262 |
assert evts, f"{lvl}: expected a scheduled rush wave"
|
| 263 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 264 |
|
| 265 |
|
| 266 |
def test_no_pre_placed_agent_combat_screen():
|
| 267 |
"""The pbox LINE must be the sole kill source — there is no
|
| 268 |
pre-placed agent combat screen ringing the base. Only ONE
|
| 269 |
-
non-combatant agent e1 is parked in a far corner (
|
| 270 |
-
is non-empty for the hard-tier env-reset check
|
|
|
|
| 271 |
for lvl in LEVELS:
|
| 272 |
c = compile_level(load_pack(PACK), lvl)
|
| 273 |
agent_units = [
|
| 274 |
a for a in c.scenario.actors
|
| 275 |
if a.owner == "agent" and a.type == "e1"
|
| 276 |
]
|
| 277 |
-
# At most one non-combatant marker per active spawn group
|
|
|
|
|
|
|
| 278 |
assert len(agent_units) <= 2, (lvl, [a.position for a in agent_units])
|
| 279 |
for a in agent_units:
|
| 280 |
x, y = a.position
|
| 281 |
-
# Parked in a far corner, well clear of the
|
|
|
|
| 282 |
assert x <= 6 and (y <= 6 or y >= 34), (lvl, a.position)
|
| 283 |
|
| 284 |
|
|
@@ -292,7 +366,7 @@ def test_hard_has_two_spawn_point_groups():
|
|
| 292 |
if a.owner == "agent" and a.spawn_point is not None
|
| 293 |
}
|
| 294 |
assert groups == {0, 1}, groups
|
| 295 |
-
# In-bounds check (rush-hour-arena playable y ≈ 2..38, x ≈ 2..126)
|
| 296 |
for a in c.scenario.actors:
|
| 297 |
x, y = a.position
|
| 298 |
assert 2 <= x <= 126 and 2 <= y <= 38, (a.type, a.position)
|
|
@@ -305,7 +379,7 @@ def test_hard_has_two_spawn_point_groups():
|
|
| 305 |
def test_intended_line_wins_every_level_and_seed(level):
|
| 306 |
c = compile_level(load_pack(PACK), level)
|
| 307 |
for seed in SEEDS:
|
| 308 |
-
r = run_level(c, make_line(), seed=seed)
|
| 309 |
assert r.outcome == "win", (
|
| 310 |
f"{level} seed{seed}: intended LINE topology must WIN; "
|
| 311 |
f"got {r.outcome} (tick={r.signals.game_tick}, "
|
|
@@ -322,19 +396,20 @@ def test_intended_line_wins_every_level_and_seed(level):
|
|
| 322 |
@pytest.mark.parametrize(
|
| 323 |
"policy_name,policy_factory",
|
| 324 |
[
|
| 325 |
-
("stall",
|
| 326 |
-
("
|
|
|
|
| 327 |
],
|
| 328 |
)
|
| 329 |
def test_lazy_and_wrong_topology_policies_lose_every_level_and_seed(
|
| 330 |
level, policy_name, policy_factory
|
| 331 |
):
|
| 332 |
-
"""Stall (rush razes fact AND clock runs out with no pbox)
|
| 333 |
-
|
| 334 |
-
|
| 335 |
-
every level + every seed — no draw."""
|
| 336 |
c = compile_level(load_pack(PACK), level)
|
| 337 |
-
fn = policy_factory()
|
| 338 |
for seed in SEEDS:
|
| 339 |
r = run_level(c, fn, seed=seed)
|
| 340 |
assert r.outcome == "loss", (
|
|
@@ -349,8 +424,8 @@ def test_lazy_and_wrong_topology_policies_lose_every_level_and_seed(
|
|
| 349 |
|
| 350 |
def test_intended_run_is_deterministic_on_easy():
|
| 351 |
c = compile_level(load_pack(PACK), "easy")
|
| 352 |
-
a = run_level(c, make_line(), seed=3)
|
| 353 |
-
b = run_level(c, make_line(), seed=3)
|
| 354 |
assert (a.outcome, a.turns, a.signals.units_killed) == (
|
| 355 |
b.outcome,
|
| 356 |
b.turns,
|
|
|
|
| 1 |
"""build-defensive-tower-line scenario family, full loop on Rust.
|
| 2 |
|
| 3 |
+
The pack tests DEFENSIVE PERIMETER TOPOLOGY ACROSS A WIDE FRONT: when
|
| 4 |
+
the threat is a rush spread across the FULL VERTICAL WIDTH of the map
|
| 5 |
+
(multiple distinct rows simultaneously), not pinched through a single
|
| 6 |
+
corridor cell, the right architecture is one pbox per row across the
|
| 7 |
+
FULL width (a LINE). A dense cluster on one row leaves every other row
|
| 8 |
+
unguarded; a scatter near the base never engages the rush. This is the
|
| 9 |
+
sibling/inverse of `def-tower-line-vs-cluster` (which forces a CLUSTER
|
| 10 |
+
at a single bottleneck cell); together the two packs discriminate
|
| 11 |
+
whether the model understands the FORCING GEOMETRY (single-cell
|
| 12 |
+
chokepoint vs wide-front approach).
|
| 13 |
|
| 14 |
Anchors: ERQA spatial commit / MicroRTS defense placement / military
|
| 15 |
perimeter (firewall rule placement).
|
|
|
|
| 17 |
The pbox is the load-bearing weapon. After the engine pbox-weapon fix
|
| 18 |
(`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
|
| 19 |
active direct-fire anti-infantry tower. The rush arrives as a
|
| 20 |
+
`scheduled_events: spawn_actors` wave (or two waves, on hard) EAST of
|
| 21 |
+
the central column spread across multiple distinct rows AFTER the
|
| 22 |
+
agent has had time to build its LINE serially, and the `rusher` bot
|
| 23 |
+
charges the agent fact on the west, so each row's spawn group walks
|
| 24 |
+
WEST through the x=60 column on its starting y. There are NO pre-
|
| 25 |
+
placed agent defenders, so the pbox LINE is the sole source of kill
|
| 26 |
+
output.
|
| 27 |
|
| 28 |
The win predicate makes the LINE topology load-bearing — total pbox
|
| 29 |
count alone is not enough:
|
| 30 |
|
| 31 |
+
* `building_count_gte:{pbox, n:K}` ⇒ the agent holds one pbox per rung
|
| 32 |
+
(K = 3 easy / 5 medium / 6 hard);
|
| 33 |
+
* `building_in_region:{pbox, x:60, y:Y, radius:0.5, count:1}` for
|
| 34 |
+
EACH of the K front rungs ⇒ exactly one pbox per row across the
|
| 35 |
+
front (a tiny radius 0.5 means only the exact cell counts, so a
|
| 36 |
+
cluster on (60,20) misses all flank rungs and a scatter near the
|
| 37 |
+
base misses every rung);
|
| 38 |
* `units_killed_gte:Q` (Q = 4 easy / 7 medium / 8 hard) ⇒ the pbox LINE must actively KILL the rush
|
| 39 |
+
spread across the front (a stall / pure-army layout kills 0);
|
| 40 |
* `building_count_gte:{fact,n:1}` (present-tense — `has_building` is
|
| 41 |
the one-shot "ever-seen" set, see CLAUDE.md footgun);
|
| 42 |
* `within_ticks` paired with `after_ticks` in the fail clause ⇒ a
|
|
|
|
| 44 |
pack ⇒ each step is exactly 90 ticks, so max_turns is a hard tick
|
| 45 |
budget that the `after_ticks` deadline reliably bites in).
|
| 46 |
|
| 47 |
+
Per-tier design:
|
| 48 |
+
* easy — 3-pbox LINE (rungs y=8/20/32), budget $1800, one wave.
|
| 49 |
+
* medium — 5-pbox LINE (rungs y=4/12/20/28/36), budget $3000, one wave.
|
| 50 |
+
* hard — 6-pbox LINE (rungs y=4/10/16/22/28/34), budget $4800
|
| 51 |
+
(= 6 rungs + 2 rebuilds), TWO scheduled waves (tick 1800 + tick
|
| 52 |
+
3000) so a rung the first wave razes must be REBUILT before wave 2.
|
| 53 |
+
Hard also flips the agent base latitude per seed (NORTH y=12 /
|
| 54 |
+
SOUTH y=28 via spawn_point) so a memorised relative-to-base
|
| 55 |
+
placement cannot generalise.
|
| 56 |
+
|
| 57 |
The scripted-policy validations prove deterministically that:
|
| 58 |
|
| 59 |
+
* the intended LINE policy (one pbox at each front rung, with rebuild
|
| 60 |
+
on hard) WINS every level + every hard seed (1..4);
|
| 61 |
+
* stall / cluster-on-centre / scatter-near-base all LOSE every level +
|
| 62 |
+
every hard seed — a real LOSS, not a draw (the rung clauses are
|
| 63 |
+
never satisfied);
|
| 64 |
* the hard tier defines ≥2 spawn_point groups (NORTH base y=12 / SOUTH
|
| 65 |
base y=28) so a memorised base-relative placement cannot generalise.
|
| 66 |
"""
|
|
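The cell-exact effect of `radius: 0.5` described in the docstring can be illustrated with a small standalone check. This is a sketch under a stated assumption: it treats `building_in_region` as a Euclidean distance test between integer cell coordinates, which the source does not confirm. The function names are hypothetical; the cell lists are the hard-tier values from this module.

```python
import math


def in_region(cell, centre, radius=0.5):
    """True when `cell` lies within `radius` of `centre` (assumed Euclidean)."""
    return math.dist(cell, centre) <= radius


def covered_rungs(placed, rungs, radius=0.5):
    """Rungs that have at least one placed pbox inside their region."""
    return [r for r in rungs if any(in_region(c, r, radius) for c in placed)]


# Hard-tier cells copied from RUNGS_BY_LEVEL and the wrong-topology tables.
RUNGS_HARD = [(60, 4), (60, 10), (60, 16), (60, 22), (60, 28), (60, 34)]
CLUSTER_HARD = [(60, 17), (60, 18), (60, 19), (60, 20), (60, 21), (60, 22)]
SCATTER_HARD = [(20, 18), (22, 20), (24, 22), (26, 19), (18, 21), (16, 23)]

if __name__ == "__main__":
    print("line   :", covered_rungs(RUNGS_HARD, RUNGS_HARD))
    print("cluster:", covered_rungs(CLUSTER_HARD, RUNGS_HARD))
    print("scatter:", covered_rungs(SCATTER_HARD, RUNGS_HARD))
```

With integer cells, any neighbouring cell is at distance 1 or more, so a radius of 0.5 only ever matches the rung cell itself. The centre cluster therefore covers a single rung and the near-base scatter covers none.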
|
|
| 80 |
LEVELS = ("easy", "medium", "hard")
|
| 81 |
SEEDS = (1, 2, 3, 4)
|
| 82 |
|
| 83 |
+
# Per-tier rung topology (front rows across the full vertical width at
|
| 84 |
+
# x=60). A cluster on the centre row misses every flank rung because the
|
| 85 |
+
# region predicate uses radius 0.5 (cell-exact).
|
| 86 |
+
RUNGS_BY_LEVEL = {
|
| 87 |
+
"easy": [(60, 8), (60, 20), (60, 32)],
|
| 88 |
+
"medium": [(60, 4), (60, 12), (60, 20), (60, 28), (60, 36)],
|
| 89 |
+
"hard": [(60, 4), (60, 10), (60, 16), (60, 22), (60, 28), (60, 34)],
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
CASH_BY_LEVEL = {"easy": 1800, "medium": 3000, "hard": 4800}
|
| 93 |
|
| 94 |
+
# Cells used by the "scatter near base" wrong-topology policy: pboxes
|
| 95 |
+
# clustered near the base rather than across the front. None of these
|
| 96 |
# lie inside ANY rung region (radius 0.5 around the rung cells), so the
|
| 97 |
# region clauses are all unsatisfied.
|
| 98 |
+
SCATTER_NEAR_BASE_BY_LEVEL = {
|
| 99 |
+
"easy": [(20, 18), (22, 20), (24, 22)],
|
| 100 |
+
"medium": [(20, 18), (22, 20), (24, 22), (26, 19), (18, 21)],
|
| 101 |
+
"hard": [(20, 18), (22, 20), (24, 22), (26, 19), (18, 21), (16, 23)],
|
| 102 |
+
}
|
| 103 |
+
|
| 104 |
+
# Cells used by the "cluster on centre row" wrong-topology policy:
|
| 105 |
+
# pboxes piled on the centre rung (y=20). Satisfies the count clause
|
| 106 |
+
# and one rung (y=20 on easy/medium, y=22 on hard) but misses
|
| 107 |
+
# every other rung because radius 0.5 is cell-exact.
|
| 108 |
+
CLUSTER_ON_CENTRE_BY_LEVEL = {
|
| 109 |
+
"easy": [(60, 19), (60, 20), (60, 21)],
|
| 110 |
+
"medium": [(60, 18), (60, 19), (60, 20), (60, 21), (60, 22)],
|
| 111 |
+
"hard": [(60, 17), (60, 18), (60, 19), (60, 20), (60, 21), (60, 22)],
|
| 112 |
+
}
|
| 113 |
|
| 114 |
|
| 115 |
# ── scripted policies ────────────────────────────────────────────────
|
|
|
|
| 121 |
return [C.observe()]
|
| 122 |
|
| 123 |
|
| 124 |
+
def make_line(level: str):
|
| 125 |
+
"""Intended LINE topology: one pbox at EACH front rung. On hard the
|
| 126 |
+
policy also REBUILDS any rung the first wave razes (the cash budget
|
| 127 |
+
has slack for ≤2 rebuilds across the two-wave attrition)."""
|
| 128 |
+
rungs = RUNGS_BY_LEVEL[level]
|
| 129 |
|
| 130 |
def policy(rs, C):
|
| 131 |
own_b = rs.get("own_buildings") or []
|
| 132 |
+
pboxes = [b for b in own_b if b.get("type") == "pbox"]
|
| 133 |
+
present_cells = {
|
| 134 |
+
(int(b["cell_x"]), int(b["cell_y"])) for b in pboxes
|
| 135 |
+
}
|
| 136 |
prod = rs.get("production") or []
|
| 137 |
+
prod_items = [
|
| 138 |
+
p.get("item") for p in prod if isinstance(p, dict)
|
| 139 |
+
]
|
| 140 |
+
# Find the first rung that is currently uncovered (initial
|
| 141 |
+
# build or post-attrition rebuild) and (build +) place there.
|
| 142 |
+
for cell in rungs:
|
| 143 |
+
if cell not in present_cells:
|
| 144 |
+
cmds = []
|
| 145 |
+
if "pbox" not in prod_items:
|
| 146 |
+
cmds.append(C.build("pbox"))
|
| 147 |
+
cmds.append(C.place_building("pbox", cell[0], cell[1]))
|
| 148 |
+
return cmds
|
| 149 |
+
# All rungs currently covered — idle.
|
| 150 |
+
return [C.observe()]
|
| 151 |
|
| 152 |
return policy
|
| 153 |
|
| 154 |
|
| 155 |
+
def _wrong_topology_policy(cells):
|
| 156 |
+
"""Pile pboxes at a fixed list of cells (count-only, no rung
|
| 157 |
+
rebuilding). Used for cluster-on-centre and scatter-near-base."""
|
| 158 |
+
cells = list(cells)
|
|
|
|
| 159 |
|
| 160 |
def policy(rs, C):
|
| 161 |
own_b = rs.get("own_buildings") or []
|
| 162 |
n = sum(1 for b in own_b if b.get("type") == "pbox")
|
| 163 |
prod = rs.get("production") or []
|
| 164 |
+
prod_items = [
|
| 165 |
+
p.get("item") for p in prod if isinstance(p, dict)
|
| 166 |
+
]
|
| 167 |
+
if n >= len(cells):
|
| 168 |
return [C.observe()]
|
| 169 |
cmds = []
|
| 170 |
if "pbox" not in prod_items:
|
| 171 |
cmds.append(C.build("pbox"))
|
| 172 |
+
cmds.append(C.place_building("pbox", cells[n][0], cells[n][1]))
|
| 173 |
return cmds
|
| 174 |
|
| 175 |
return policy
|
| 176 |
|
| 177 |
|
| 178 |
+
def make_cluster_on_centre(level: str):
|
| 179 |
+
"""WRONG TOPOLOGY: K pboxes piled on the centre row (y≈20).
|
| 180 |
+
Satisfies the count and one centre rung (y=20, or y=22 on hard) but misses
|
| 181 |
+
every flank rung because the rung regions are radius 0.5 (cell-
|
| 182 |
+
exact). The unguarded flank rows let the rush leak past."""
|
| 183 |
+
return _wrong_topology_policy(CLUSTER_ON_CENTRE_BY_LEVEL[level])
|
| 184 |
+
|
| 185 |
+
|
| 186 |
+
def make_scatter_near_base(level: str):
|
| 187 |
+
"""WRONG TOPOLOGY: K pboxes scattered near the fact (x=16..26). Misses
|
| 188 |
+
every front rung AND too far west to engage the rush before it
|
| 189 |
+
reaches the fact (on harder tiers the flank rows reach the fact
|
| 190 |
+
without ever encountering the LINE)."""
|
| 191 |
+
return _wrong_topology_policy(SCATTER_NEAR_BASE_BY_LEVEL[level])
|
| 192 |
+
|
| 193 |
+
|
| 194 |
# ── scenario-shape invariants ────────────────────────────────────────
|
| 195 |
|
| 196 |
|
|
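The rung-selection step inside `make_line` (find the first uncovered rung, whether for the initial build or a post-attrition rebuild) can be isolated as a pure function for reasoning about it without the engine. This is a sketch: `first_uncovered_rung` is a hypothetical helper, and the building dict shape (`type`, `cell_x`, `cell_y`) is taken from the policy code above.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Cell = Tuple[int, int]


def first_uncovered_rung(
    rungs: List[Cell], own_buildings: Iterable[Dict]
) -> Optional[Cell]:
    """Return the first rung with no pbox on it, or None if the LINE is whole."""
    present = {
        (int(b["cell_x"]), int(b["cell_y"]))
        for b in own_buildings
        if b.get("type") == "pbox"
    }
    for cell in rungs:
        if cell not in present:
            return cell
    return None


if __name__ == "__main__":
    rungs = [(60, 4), (60, 10), (60, 16), (60, 22), (60, 28), (60, 34)]
    full = [{"type": "pbox", "cell_x": x, "cell_y": y} for x, y in rungs]
    # Simulate wave 1 razing the y=16 rung: that cell is the next rebuild target.
    after_wave_1 = [b for b in full if b["cell_y"] != 16]
    print(first_uncovered_rung(rungs, after_wave_1))
```

Because the check runs against the current building list on every step, the same loop handles both the initial serial build and any rebuild; no separate "rebuild mode" is needed.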
|
|
| 204 |
assert "ERQA" in anchors, anchors
|
| 205 |
assert "MicroRTS defense" in anchors, anchors
|
| 206 |
assert "military perimeter" in anchors, anchors
|
| 207 |
+
# Rusher bot wired through (charges agent centroid → forces each
|
| 208 |
+
# row's rush column WEST through the central x=60 LINE on every
|
| 209 |
+
# seed).
|
| 210 |
for lvl in LEVELS:
|
| 211 |
c = compile_level(pack, lvl)
|
| 212 |
assert c.map_supported
|
|
|
|
| 216 |
assert str(bot).lower() == "rusher", (lvl, bot)
|
| 217 |
|
| 218 |
|
| 219 |
+
def test_starting_cash_scales_per_tier_for_pbox_budget():
|
| 220 |
+
"""Cash is intentionally tight per tier — exactly K pboxes for
|
| 221 |
+
easy (K=3) and medium (K=5), plus a 2-pbox rebuild margin for hard
|
| 222 |
+
(K=6+2 rebuilds for the two-wave attrition) — so a model that
|
| 223 |
+
spends on units OR extra rebuilds beyond the design cannot pass
|
| 224 |
+
the count clause."""
|
| 225 |
pack = load_pack(PACK)
|
| 226 |
for lvl in LEVELS:
|
| 227 |
c = compile_level(pack, lvl)
|
| 228 |
+
assert c.starting_cash == CASH_BY_LEVEL[lvl], (
|
| 229 |
+
lvl, c.starting_cash, CASH_BY_LEVEL[lvl]
|
| 230 |
+
)
|
| 231 |
|
| 232 |
|
| 233 |
@pytest.mark.parametrize("level", LEVELS)
|
|
|
|
| 269 |
assert fact_clauses, f"{lvl}: missing present-tense fact-alive fail clause"
|
| 270 |
|
| 271 |
|
| 272 |
+
@pytest.mark.parametrize("level", LEVELS)
|
| 273 |
+
def test_win_requires_one_pbox_per_front_rung(level):
|
| 274 |
+
"""The LINE-enforcement contract: each level's win clause requires
|
| 275 |
+
exactly one pbox in EACH of the front rungs at x=60 spanning the
|
| 276 |
+
full vertical width. A cluster on the centre row (y=20) misses
|
| 277 |
+
every flank rung because each rung region has radius 0.5 (cell-
|
| 278 |
+
exact). The rungs grow per tier (3 easy / 5 medium / 6 hard)."""
|
| 279 |
+
expected = {y for (_, y) in RUNGS_BY_LEVEL[level]}
|
| 280 |
+
c = compile_level(load_pack(PACK), level)
|
| 281 |
+
wc = c.win_condition.model_dump(exclude_none=True)
|
| 282 |
+
rungs_seen = set()
|
| 283 |
+
for clause in wc.get("all_of", []) or []:
|
| 284 |
+
br = clause.get("building_in_region")
|
| 285 |
+
if (
|
| 286 |
+
isinstance(br, dict)
|
| 287 |
+
and br.get("type") == "pbox"
|
| 288 |
+
and int(br.get("x", -1)) == 60
|
| 289 |
+
and int(br.get("count", 0)) == 1
|
| 290 |
+
and float(br.get("radius", 0)) <= 1.0
|
| 291 |
+
):
|
| 292 |
+
rungs_seen.add(int(br["y"]))
|
| 293 |
+
assert rungs_seen == expected, (
|
| 294 |
+
f"{level}: front rungs y∈{sorted(expected)} required, got {sorted(rungs_seen)}"
|
| 295 |
+
)
|
| 296 |
|
| 297 |
|
| 298 |
def test_win_requires_a_kill_quota():
|
|
|
|
| 314 |
def test_rush_arrives_as_a_scheduled_event():
|
| 315 |
"""The rush is injected via `scheduled_events: spawn_actors` AFTER the
|
| 316 |
LINE has time to assemble — there is no t=0 enemy band racing the
|
| 317 |
+
build. This is what makes the build/rush race fair. Hard tier has
|
| 318 |
+
TWO scheduled waves (attrition mechanic)."""
|
| 319 |
+
expected_wave_counts = {"easy": 1, "medium": 1, "hard": 2}
|
| 320 |
for lvl in LEVELS:
|
| 321 |
pack = load_pack(PACK)
|
| 322 |
raw = pack.levels[lvl]
|
|
|
|
| 325 |
ov = ov.model_dump(exclude_none=True)
|
| 326 |
evts = ov.get("scheduled_events") or []
|
| 327 |
assert evts, f"{lvl}: expected a scheduled rush wave"
|
| 328 |
+
spawn_waves = [e for e in evts if e.get("type") == "spawn_actors"]
|
| 329 |
+
assert spawn_waves, (lvl, evts)
|
| 330 |
+
assert len(spawn_waves) == expected_wave_counts[lvl], (
|
| 331 |
+
f"{lvl}: expected {expected_wave_counts[lvl]} spawn_actors waves, "
|
| 332 |
+
f"got {len(spawn_waves)} ({evts})"
|
| 333 |
+
)
|
| 334 |
|
| 335 |
|
| 336 |
def test_no_pre_placed_agent_combat_screen():
|
| 337 |
"""The pbox LINE must be the sole kill source — there is no
|
| 338 |
pre-placed agent combat screen ringing the base. Only ONE
|
| 339 |
+
non-combatant agent e1 is parked in a far corner (per spawn group)
|
| 340 |
+
so units_summary is non-empty for the hard-tier env-reset check;
|
| 341 |
+
it never fights."""
|
| 342 |
for lvl in LEVELS:
|
| 343 |
c = compile_level(load_pack(PACK), lvl)
|
| 344 |
agent_units = [
|
| 345 |
a for a in c.scenario.actors
|
| 346 |
if a.owner == "agent" and a.type == "e1"
|
| 347 |
]
|
| 348 |
+
# At most one non-combatant marker per active spawn group
|
| 349 |
+
# (hard has 2 spawn_point groups so up to 2 corner e1s
|
| 350 |
+
# declared; only the active spawn group's e1 is materialised).
|
| 351 |
assert len(agent_units) <= 2, (lvl, [a.position for a in agent_units])
|
| 352 |
for a in agent_units:
|
| 353 |
x, y = a.position
|
| 354 |
+
# Parked in a far corner, well clear of the rush lanes
|
| 355 |
+
# spread across y=4..36 at x=60.
|
| 356 |
assert x <= 6 and (y <= 6 or y >= 34), (lvl, a.position)
|
| 357 |
|
| 358 |
|
|
|
|
| 366 |
if a.owner == "agent" and a.spawn_point is not None
|
| 367 |
}
|
| 368 |
assert groups == {0, 1}, groups
|
| 369 |
+
# In-bounds check (rush-hour-arena playable y ≈ 2..38, x ≈ 2..126).
|
| 370 |
for a in c.scenario.actors:
|
| 371 |
x, y = a.position
|
| 372 |
assert 2 <= x <= 126 and 2 <= y <= 38, (a.type, a.position)
|
|
|
|
| 379 |
def test_intended_line_wins_every_level_and_seed(level):
|
| 380 |
c = compile_level(load_pack(PACK), level)
|
| 381 |
for seed in SEEDS:
|
| 382 |
+
r = run_level(c, make_line(level), seed=seed)
|
| 383 |
assert r.outcome == "win", (
|
| 384 |
f"{level} seed{seed}: intended LINE topology must WIN; "
|
| 385 |
f"got {r.outcome} (tick={r.signals.game_tick}, "
|
|
|
|
| 396 |
@pytest.mark.parametrize(
|
| 397 |
"policy_name,policy_factory",
|
| 398 |
[
|
| 399 |
+
("stall", lambda lvl: stall),
|
| 400 |
+
("cluster_on_centre", lambda lvl: make_cluster_on_centre(lvl)),
|
| 401 |
+
("scatter_near_base", lambda lvl: make_scatter_near_base(lvl)),
|
| 402 |
],
|
| 403 |
)
|
| 404 |
def test_lazy_and_wrong_topology_policies_lose_every_level_and_seed(
|
| 405 |
level, policy_name, policy_factory
|
| 406 |
):
|
| 407 |
+
"""Stall (rush razes fact AND clock runs out with no pbox), cluster-
|
| 408 |
+
on-centre (count satisfied but every flank rung unmet), and scatter-
|
| 409 |
+
near-base (every rung region unmet, rush reaches fact past unguarded
|
| 410 |
+
front) must ALL LOSE on every level + every seed — no draw."""
|
| 411 |
c = compile_level(load_pack(PACK), level)
|
| 412 |
+
fn = policy_factory(level)
|
| 413 |
for seed in SEEDS:
|
| 414 |
r = run_level(c, fn, seed=seed)
|
| 415 |
assert r.outcome == "loss", (
|
|
|
|
| 424 |
|
| 425 |
def test_intended_run_is_deterministic_on_easy():
|
| 426 |
c = compile_level(load_pack(PACK), "easy")
|
| 427 |
+
a = run_level(c, make_line("easy"), seed=3)
|
| 428 |
+
b = run_level(c, make_line("easy"), seed=3)
|
| 429 |
assert (a.outcome, a.turns, a.signals.units_killed) == (
|
| 430 |
b.outcome,
|
| 431 |
b.turns,
|