yxc20098 committed on
Commit
b65a333
·
1 Parent(s): 16b217d

no-cheat redesign: build-defensive-tower-line — wide-front LINE topology

Tailor the build-defensive-tower-line pack for the WIDE-FRONT rush
geometry the capability advertises. The previous design had attackers
funnelled through a 4-row corridor (y=18..22) and demanded a 4-pbox
line along that corridor; the rush was concentrated, which made
placement-vs-cluster less discriminating than intended.

New design (per-tier):
* easy: 3-pbox LINE (rungs y=8/20/32), $1800 budget, single rush
  wave of 6 e1 (2 per row × 3 rows) at tick 1500. Kill bar 4.
* medium: 5-pbox LINE (rungs y=4/12/20/28/36), $3000 budget, single
  rush wave of 10 e1 (2 per row × 5 rows) at tick 2200. Kill bar 7.
* hard: 6-pbox LINE (rungs y=4/10/16/22/28/34), $4800 budget
  (= 6 rungs + 2 rebuilds), TWO scheduled rush waves (tick 1800 and
  tick 3000) so a rung the first wave razes must be REBUILT before
  wave 2 lands; this is the "attrition over time" mechanic the spec
  asks for. Kill bar 8. Hard also flips the agent base latitude per
  seed (NORTH y=12 / SOUTH y=28 via spawn_point round-robin) so a
  memorised relative-to-base placement plan cannot generalise; the
  LINE rungs themselves stay fixed at x=60 since that is map geometry.
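
The per-tier numbers above can be cross-checked with a few lines of
arithmetic. This is an illustrative sketch only: the `Tier` dataclass and
the helper names are hypothetical and not part of `openra_bench`; the
600-credit pbox cost comes from the pack comments, and the 2-per-row wave
size is only stated for easy and medium.

```python
from dataclasses import dataclass

PBOX_COST = 600  # credits per pbox, as quoted in the pack comments


@dataclass(frozen=True)
class Tier:
    """Hypothetical container for the per-tier numbers in this message."""
    rungs: tuple       # y rows of the LINE at x=60
    budget: int        # starting cash in credits
    rebuilds: int      # spare pboxes budgeted for rebuilding razed rungs
    wave_ticks: tuple  # ticks at which a rush wave is scheduled
    kill_bar: int      # units_killed_gte threshold


TIERS = {
    "easy": Tier((8, 20, 32), 1800, 0, (1500,), 4),
    "medium": Tier((4, 12, 20, 28, 36), 3000, 0, (2200,), 7),
    "hard": Tier((4, 10, 16, 22, 28, 34), 4800, 2, (1800, 3000), 8),
}


def budget_is_exact(tier: Tier) -> bool:
    """True when the budget buys one pbox per rung plus the rebuilds, no more."""
    return tier.budget == PBOX_COST * (len(tier.rungs) + tier.rebuilds)


def wave_size(tier: Tier, per_row: int = 2) -> int:
    """Units in one wave, assuming `per_row` attackers on every rung row."""
    return per_row * len(tier.rungs)


for name, tier in TIERS.items():
    print(name, budget_is_exact(tier), len(tier.wave_ticks))
```

Every tier's budget covers exactly its rungs (plus the two rebuilds on
hard), so there is no slack to spend on a misplaced pbox.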

Each level spreads attackers across the FULL vertical width of the
playable arena at distinct rows (the spec's WIDE front). The rusher
bot charges the agent fact centroid on the west, so each row's spawn
group walks WEST through x=60 on its starting y — different rows
cross the central column at different y values, forcing a LINE
topology to intercept every row.

The win predicate keeps placement load-bearing: one
`building_in_region` clause per rung at radius 0.5 (cell-exact) so
a cluster on the centre row misses every flank rung, a scatter near
the base misses every rung, and the intended LINE (one pbox per row)
is the only configuration that simultaneously satisfies the count,
all rung clauses, the kill quota, the fact-alive clause and the
within_ticks deadline. `after_ticks` in the fail clause makes
non-winners a real reachable LOSS (no interrupts ⇒ exactly
90 ticks/step).
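
The deadline arithmetic behind that claim can be sketched as follows. The
93-tick first step and the 90-tick step length are taken from the pack
comments (`93+90·59 = 5403`); the function names are hypothetical, and the
"strictly past the deadline" reading of `after_ticks` is an assumption.

```python
FIRST_STEP_TICK = 93  # tick reached after the first step, per the pack comments
TICKS_PER_STEP = 90   # exact step length when no interrupts are configured


def last_reachable_tick(max_turns: int) -> int:
    """Tick reached on the final turn of an episode with no interrupts."""
    return FIRST_STEP_TICK + TICKS_PER_STEP * (max_turns - 1)


def deadline_can_fire(max_turns: int, deadline_tick: int) -> bool:
    """Assumed semantics: the fail clause needs a tick strictly past the deadline."""
    return last_reachable_tick(max_turns) > deadline_tick


print(last_reachable_tick(60), deadline_can_fire(60, 5400))
```

With `max_turns: 60` the episode reaches tick 5403, just past the 5400
deadline, so a non-winner ends as a LOSS rather than running out of turns
undecided.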

Validation (scripted, no model, four-script no-cheat bar):
* stall (observe-only): LOSS on every level and every seed (fact
  razed by the rush AND clock runs out with no pbox);
* cluster-on-centre (K pboxes piled on the y=20 row): LOSS on every
  level and every seed (count satisfied but flank rungs unmet, flank
  rows leak the rush through to the fact);
* scatter-near-base (K pboxes hugging the fact): LOSS on every level
  and every seed (every rung region unmet, rush reaches fact past
  unguarded front);
* intended LINE (one pbox per rung, with rebuild on hard): WIN on
  every level and every seed (count, all rungs, kill quota, fact
  alive, within_ticks all satisfied before the deadline).
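
The rung clauses alone already separate the three placement policies. The
sketch below is a stand-in for `building_in_region`, not the engine's
implementation: it assumes Euclidean distance on integer cell coordinates
(at radius 0.5 any common metric admits only the exact cell), and the
example placements are hypothetical easy-tier layouts.

```python
import math


def building_in_region(buildings, x, y, radius, count=1):
    """True if at least `count` buildings lie within `radius` of (x, y).

    Assumption: Euclidean distance on integer cell coordinates.
    """
    hits = sum(1 for bx, by in buildings if math.hypot(bx - x, by - y) <= radius)
    return hits >= count


def rungs_met(buildings, rungs, line_x=60, radius=0.5):
    """Return the rung rows that have a pbox on their exact cell."""
    return [y for y in rungs if building_in_region(buildings, line_x, y, radius)]


EASY_RUNGS = (8, 20, 32)

# Hypothetical placements for three of the scripted policies (easy tier).
cluster = [(59, 20), (60, 20), (61, 20)]  # piled on the centre row
scatter = [(12, 16), (12, 24), (16, 20)]  # hugging the fact at (10, 20)
line = [(60, 8), (60, 20), (60, 32)]      # one pbox per rung

for name, placement in [("cluster", cluster), ("scatter", scatter), ("line", line)]:
    print(name, rungs_met(placement, EASY_RUNGS))
```

Only the LINE placement satisfies all three rung clauses; the cluster
meets the centre rung alone and the scatter meets none, before the kill
quota or fact-alive clause is even considered.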

Pre-existing CLAUDE.md footguns honoured: rusher bot charges centroid;
`place_building` works at arbitrary in-bounds coords (no adjacency);
Building/Defense single-stream queue gives the LINE serial build
time; unarmed enemy fact at (120,20) keeps the engine alive past full
rush elimination so the win/fail check fires (auto-`done`
mitigation); fact-alive uses the present-tense
`building_count_gte:{fact,n:1}` not the one-shot `has_building`;
NO pre-placed agent combat screen, only one non-combatant corner e1
per active spawn group so units_summary is non-empty without leaking
kills.

Tests updated: per-tier rung topology, budget, kill bar, scatter-
near-base policy (replaces the previous random-4-pbox policy), and a
new cluster-on-centre wrong-topology policy. Hard tier has 6 rungs
and TWO scheduled waves; the test_rush_arrives_as_a_scheduled_event
contract now checks the per-tier wave count.

openra_bench/scenarios/packs/build-defensive-tower-line.yaml CHANGED
@@ -1,64 +1,136 @@
1
  meta:
2
  id: build-defensive-tower-line
3
- title: 'Build a Defensive Tower LINE Across the Choke (Not a Cluster, Not a Scatter)'
4
  capability: reasoning
5
  real_world_meaning: >
6
- Where do you commit your defensive structures when the threat is
7
- funnelled through a known corridor whose width matters? Military
8
- perimeter doctrine and firewall rule design both say: spread the
9
- coverage across the full width of the corridor so no enemy unit
10
- can slip past on an unguarded row. A single dense cluster wastes
11
- overlapping fire on one cell while the corridor edges stay open;
12
- a scatter across the map leaves the corridor itself uncovered.
 
13
  The win predicate makes the LINE topology load-bearing — total
14
  pillbox count alone is not enough; ≥1 pillbox must sit on EACH of
15
- the corridor's vertical rungs (the four narrow sub-regions that
16
- span y=18..22 at the choke column), AND those pillboxes must
17
- actually KILL the rush funnelled through the corridor.
18
  robotics_analogue: >
19
  Network firewall / Web Application Firewall rule placement: when
20
- every connection MUST traverse a known ingress (the public WAN
21
- edge / the only API gateway), the right architecture is one rule
22
- per protocol/port across the FULL inspection surface, not three
23
- duplicated rules on one port while the rest stay open. Likewise a
24
- physical perimeter patrol covers the WHOLE corridor width; a
25
- cluster at one waypoint or a scatter across unrelated nodes both
26
- leave the actual approach lane traversable.
 
27
  benchmark_anchor:
28
  - "ERQA"
29
  - "MicroRTS defense"
30
  - "military perimeter"
31
  author: openra-bench
32
 
33
- # rush-hour-arena (128×40). The map has a narrow lane around y≈18..22
34
- # at mid-map (x≈60); the agent fact sits on the WEST (x=10). The rush
35
- # arrives as a `scheduled_events: spawn_actors` wave injected EAST of
36
- # the corridor (x≈80) at a fixed tick — AFTER the agent has had time to
37
- # build all 4 pillboxes serially. The `rusher` bot charges the agent
38
- # centroid (the fact on the west), so the wave is forced THROUGH the
39
- # x=60 corridor on its way to the fact on every seed.
40
- #
41
- # pbox is the load-bearing weapon. After the engine pbox-weapon fix
42
- # (`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
43
- # active direct-fire tower (M60mg anti-infantry MG: one burst one-shots
44
- # an e1). The win predicate now demands a KILL quota the pbox LINE must
45
- # deliver — there are NO pre-placed agent defenders, so the only source
46
- # of kill output is the agent's own pillbox LINE. A mis-placed pbox
47
- # layout (cluster on the centre row, scatter near the base) leaves the
48
- # corridor rungs open: the rush slips past on an unguarded row, kills
49
- # nothing, and razes the fact.
50
- #
51
- # An unarmed high-HP enemy `fact` far east keeps the engine alive past
52
- # rush annihilation so the win/fail check actually runs.
53
- #
54
- # SISTER PACK: def-tower-line-vs-cluster inverts the topology bar to
55
- # enforce CLUSTER (graph min-cut doctrine); this pack enforces LINE
56
- # (corridor-width perimeter doctrine). The two together discriminate
57
- # whether the model understands the FORCING GEOMETRY: a chokepoint
58
- # (single cell on a wide approach → cluster) vs a corridor (full
59
- # vertical width that any one row can leak through → line).
60
  base_map: rush-hour-arena
61
- starting_cash: 2400
62
 
63
  base:
64
  agent:
@@ -75,91 +147,87 @@ base:
75
  - attack_move
76
  - stop
77
  planning: true
78
- # No interrupts — perimeter design is a STATIC up-front decision
79
- # (the corridor is known a priori, the rush composition is fixed).
80
- # Dropping interrupts also makes the tick budget deterministic
81
- # (each step is exactly 90 ticks ⇒ max_turns is a hard tick
82
- # budget that the `after_ticks` fail clause reliably bites in).
83
  interrupts: {}
84
  termination:
85
  max_ticks: 12000
86
  actors: [] # every level supplies its own actor list via overrides.
87
 
88
  levels:
89
- # ── EASY ── bare LINE skill. Budget covers exactly 4 pbox (2400cr).
90
- # Win requires ≥1 pbox in EACH of the 4 corridor rungs (y=18,19,21,22
91
- # at x=60, radius 0.5 — only the exact rung cell counts, so a cluster
92
- # on y=20 misses ALL FOUR), the full count (4 pbox), a kill quota
93
- # (≥4 of the 6-unit rush) AND the fact alive. The rush arrives at
94
- # tick 1800 — after the LINE has had time to assemble. A cluster at
95
- # (60,20) satisfies the count clause but FAILS each rung AND lets the
96
- # rush leak past the open rungs; a random scatter near the base
97
- # misses every rung AND kills nothing; a stall loses on the count
98
- # clause AND the fact razed by the rush.
99
  # max_turns 60 ⇒ reachable tick 93+90·59 = 5403; deadline 5400.
100
  easy:
101
  # Original (pre-verbosity-sweep) description preserved for
102
  # contributors. The trimmed version below removes scripted-policy
103
  # spoilers and cell-coord dumps; load-bearing intent kept.
104
  #
105
- # A rusher band will spawn east of the corridor and must traverse
106
- # the narrow corridor at x=60, y=18..22 to reach your base on the
107
- # west. Build 4 pillboxes (pbox 600cr each, budget exactly 2400)
108
- # AND place ONE on each of the four corridor rungs (at (60,18),
109
- # (60,19), (60,21), (60,22)) so the rusher cannot slip past on any
110
- # row. A cluster on the middle of the corridor satisfies the count
111
- # but FAILS every rung and lets the rush leak through; a random
112
- # scatter near the base fails every rung and kills nothing; a
113
- # pure-army layout (no pbox) fails the count and lets the rush
114
- # raze the fact. Your pillboxes must kill at least 4 of the rush;
115
- # your fact must survive.
116
- # Original (pre-verbosity-sweep) description preserved for
117
- # contributors. The trimmed version below removes scripted-policy
118
- # spoilers and cell-coord dumps; load-bearing intent kept.
119
- #
120
- # A rush will funnel through the narrow mid-map corridor at x=60, rows
121
- # y=18 to 22. Budget $2400 — build a pillbox on each of those four
122
- # rows so nothing slips through. Kill four enemies and keep your
123
- # construction yard, within about 60 turns.
124
  description: >
125
- A rifle rush will funnel through the narrow corridor at x=60, rows
126
- y=18 to 22, on its way to your base. Budget is exactly $2400 — drop
127
- one pillbox on each of those four rows so nothing slips past. Kill
128
- at least four and keep your construction yard standing, within about
129
- 60 turns.
130
- starting_cash: 2400
 
131
  overrides:
132
  actors:
 
 
133
  - {type: fact, owner: agent, position: [10, 20]}
134
  - {type: tent, owner: agent, position: [14, 18]}
135
  - {type: powr, owner: agent, position: [14, 22]}
136
  # ONE non-combatant agent e1 parked in the far SW corner,
137
- # nowhere near the y=20 rush lane. It exists only so
138
- # units_summary is non-empty (hard-tier env-reset check); it
139
- # never reaches combat and contributes ZERO kills — the pbox
140
- # LINE is the sole source of kill output.
 
141
  - {type: e1, owner: agent, position: [4, 36], stance: 2}
142
  # Unarmed high-HP marker (anti-DRAW): keeps the episode alive
143
  # past full rusher elimination so the win/fail check fires.
144
  - {type: fact, owner: enemy, position: [120, 20]}
145
- # Scheduled rush wave — injected at tick 1800, EAST of the
146
- # corridor on the y=20 lane. The rusher bot charges the agent
147
- # centroid (the fact at x=10), so the wave is forced WEST through
148
- # the x=60 corridor. By tick 1800 all 4 LINE pillboxes are built;
149
- # a complete LINE shreds the wave at the corridor.
 
 
 
 
150
  scheduled_events:
151
- - tick: 1800
152
  type: spawn_actors
153
  actors:
154
- - {type: e1, owner: enemy, position: [80, 20], stance: 3, count: 5}
155
- - {type: e3, owner: enemy, position: [82, 20], stance: 3, count: 1}
 
156
  win_condition:
157
  all_of:
158
- - building_count_gte: {type: pbox, n: 4}
159
- - building_in_region: {type: pbox, x: 60, y: 18, radius: 0.5, count: 1}
160
- - building_in_region: {type: pbox, x: 60, y: 19, radius: 0.5, count: 1}
161
- - building_in_region: {type: pbox, x: 60, y: 21, radius: 0.5, count: 1}
162
- - building_in_region: {type: pbox, x: 60, y: 22, radius: 0.5, count: 1}
163
  - units_killed_gte: 4
164
  - building_count_gte: {type: fact, n: 1}
165
  - within_ticks: 5400
@@ -169,55 +237,67 @@ levels:
169
  - not: {building_count_gte: {type: fact, n: 1}}
170
  max_turns: 60
171
 
172
- # ── MEDIUM ── +1 axis: HEAVIER wave (rusher band ~doubled, +1 e3)
173
- # with the SAME tight 4-pbox budget and a higher kill bar (≥7). The
174
- # cluster, scatter, and pure-army layouts still all lose the same way
175
- # as easy. The LINE must be COMPLETE (all four rungs covered) so the
176
- # corridor leaks nothing — a single open rung lets enough of the
177
- # heavier wave through to fail the kill bar AND raze the fact.
178
- # max_turns 60 ⇒ reachable tick 5403; deadline 5400.
 
179
  medium:
180
  # Original (pre-verbosity-sweep) description preserved for
181
  # contributors. The trimmed version below removes scripted-policy
182
  # spoilers and cell-coord dumps; load-bearing intent kept.
183
  #
184
- # Same forced rusher corridor at x=60, y=18..22. Build 4 pillboxes
185
- # (budget 2400cr = exactly 4 pbox at 600 each) AND place ONE on
186
- # each of the four corridor rungs (at (60,18), (60,19), (60,21),
187
- # (60,22)). The rush wave is heavier than easy; the complete LINE
188
- # must shred it at the corridor. Your pillboxes must kill at least
189
- # 7 of the rush; a cluster, a scatter, and a pure-army layout all
190
- # lose; the fact must survive.
191
  description: >
192
- Same corridor at x=60, y=18 to 22, heavier rush wave. Budget
193
- $2400: build a pillbox on each of those four rows. Kill seven
194
- enemies and keep your construction yard, within about 60 turns.
195
- starting_cash: 2400
 
 
196
  overrides:
197
  actors:
198
  - {type: fact, owner: agent, position: [10, 20]}
199
  - {type: tent, owner: agent, position: [14, 18]}
200
  - {type: powr, owner: agent, position: [14, 22]}
201
- # Non-combatant SW-corner e1 (see easy) — non-empty
202
- # units_summary, zero kill contribution.
 
 
 
203
  - {type: e1, owner: agent, position: [4, 36], stance: 2}
204
  # Anti-DRAW marker.
205
  - {type: fact, owner: enemy, position: [120, 20]}
206
- # Heavier rush wave: e1 + 2×e3, all on the y=20 lane east of
207
- # the corridor, injected at tick 1800.
 
 
 
208
  scheduled_events:
209
- - tick: 1800
210
  type: spawn_actors
211
  actors:
212
- - {type: e1, owner: enemy, position: [80, 20], stance: 3, count: 8}
213
- - {type: e3, owner: enemy, position: [82, 20], stance: 3, count: 2}
 
 
 
214
  win_condition:
215
  all_of:
216
- - building_count_gte: {type: pbox, n: 4}
217
- - building_in_region: {type: pbox, x: 60, y: 18, radius: 0.5, count: 1}
218
- - building_in_region: {type: pbox, x: 60, y: 19, radius: 0.5, count: 1}
219
- - building_in_region: {type: pbox, x: 60, y: 21, radius: 0.5, count: 1}
220
- - building_in_region: {type: pbox, x: 60, y: 22, radius: 0.5, count: 1}
 
221
  - units_killed_gte: 7
222
  - building_count_gte: {type: fact, n: 1}
223
  - within_ticks: 5400
@@ -227,73 +307,102 @@ levels:
227
  - not: {building_count_gte: {type: fact, n: 1}}
228
  max_turns: 60
229
 
230
- # ── HARD ── +1 axis: TWO spawn_point groups so the agent base
231
- # latitude flips by seed (NORTH (10,12) vs SOUTH (10,28)). The
232
- # rusher band is symmetric across y=20 and ALWAYS places (enemy
233
- # actors don't honour spawn_point; see CLAUDE.md), so the corridor
234
- # column at x=60 remains the choke for both seeds, but the rush
235
- # geometry approaches each base from a different bearing. The
236
- # corridor is still y=18..22 at x=60 (it's a fixed map feature) so
237
- # the LINE topology is identical across seeds; what flips is the
238
- # agent's interpretation of "which corridor" to defend (the NORTH
239
- # spawn could be tempted to cover y=14..18, the SOUTH spawn to
240
- # cover y=22..26; both are WRONG: the corridor itself is at
241
- # y=18..22 regardless of base latitude). max_turns 70 ⇒ reachable
242
- # tick 93+90·69 = 6303; deadline 6300.
 
243
  hard:
244
  # Original (pre-verbosity-sweep) description preserved for
245
  # contributors. The trimmed version below removes scripted-policy
246
  # spoilers and cell-coord dumps; load-bearing intent kept.
247
  #
248
- # Agent base latitude flips between NORTH (y=12) and SOUTH (y=28)
249
- # by seed. Build 4 pillboxes (budget 2400cr = exactly 4 pbox at
250
- # 600 each) AND place ONE on each of the four corridor rungs
251
- # (at (60,18), (60,19), (60,21), (60,22)). The corridor at x=60
252
- # y=18..22 is a fixed map feature; covering the rows next to your
253
- # base instead (y=14..18 for NORTH, y=22..26 for SOUTH) FAILS the
254
- # rung clauses and lets the rush leak through. Your pillboxes must
255
- # kill at least 7 of the rush; the fact must survive.
256
  description: >
257
- Same fixed corridor at x=60, rows y=18 to 22, but your base may sit
258
- to its north or south depending on seed. Don't cover the rows next
259
- to your own base; the rush still funnels through the mid-map gap.
260
- Build one pillbox on each of the four corridor rungs, kill at least
261
- seven, and keep your construction yard within about 70 turns.
 
262
  overrides:
263
  actors:
264
- # spawn_point 0 — NORTH base at y=12. Fact at (10, 12);
265
- # tent/powr offset west so they aren't directly on the
266
- # rusher path. Non-combatant corner e1 in the far SW.
 
267
  - {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
268
  - {type: tent, owner: agent, position: [6, 12], spawn_point: 0}
269
  - {type: powr, owner: agent, position: [6, 14], spawn_point: 0}
 
270
  - {type: e1, owner: agent, position: [4, 36], stance: 2, spawn_point: 0}
271
  # spawn_point 1 — SOUTH base at y=28 (mirror across y=20).
 
272
  - {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
273
  - {type: tent, owner: agent, position: [6, 28], spawn_point: 1}
274
  - {type: powr, owner: agent, position: [6, 26], spawn_point: 1}
 
275
  - {type: e1, owner: agent, position: [4, 4], stance: 2, spawn_point: 1}
276
  # Anti-DRAW marker (enemy fact doesn't honour spawn_point).
277
  - {type: fact, owner: enemy, position: [120, 20]}
278
- # Scheduled rush wave, symmetric on the y=20 lane east of the
279
- # corridor, injected at tick 1800. The rusher charges the agent
280
- # centroid so its path crosses the x=60 corridor on every seed
281
- # regardless of which base latitude was picked. The scheduled
282
- # spawn list is not spawn_point-filtered, so it injects once.
 
 
 
 
283
  scheduled_events:
284
  - tick: 1800
285
  type: spawn_actors
286
  actors:
287
- - {type: e1, owner: enemy, position: [80, 20], stance: 3, count: 8}
288
- - {type: e3, owner: enemy, position: [82, 20], stance: 3, count: 2}
289
  win_condition:
290
  all_of:
291
- - building_count_gte: {type: pbox, n: 4}
292
- - building_in_region: {type: pbox, x: 60, y: 18, radius: 0.5, count: 1}
293
- - building_in_region: {type: pbox, x: 60, y: 19, radius: 0.5, count: 1}
294
- - building_in_region: {type: pbox, x: 60, y: 21, radius: 0.5, count: 1}
295
  - building_in_region: {type: pbox, x: 60, y: 22, radius: 0.5, count: 1}
296
- - units_killed_gte: 7
 
 
297
  - building_count_gte: {type: fact, n: 1}
298
  - within_ticks: 6300
299
  fail_condition:
 
1
+ # build-defensive-tower-line — Build a Defensive Pillbox LINE Across a WIDE Front
2
+ #
3
+ # REASONING focus: when the threat is funnelled along a WIDE front (a
4
+ # rush spread across the full vertical width of the map, not pinched
5
+ # through a single corridor cell), the right defensive architecture is
6
+ # ONE pillbox per row across the FULL width — a LINE that no enemy unit
7
+ # can slip past on an unguarded row. A dense cluster at the centre row
8
+ # wastes overlapping fire on one cell while the flanks stay open; a
9
+ # scatter near the base never engages the rush at all.
10
+ #
11
+ # This pack is the SISTER / INVERSE of `def-tower-line-vs-cluster` (which
12
+ # enforces CLUSTER topology at a single-cell chokepoint, graph min-cut
13
+ # doctrine). Together the two packs discriminate whether the model
14
+ # understands the FORCING GEOMETRY: a chokepoint (single cell on a wide
15
+ # approach → cluster) vs a wide front (full vertical width where every
16
+ # row carries a rush column → line).
17
+ #
18
+ # Real-world anchor:
19
+ # • military perimeter doctrine — when an attacker can approach across
20
+ # the full width of a sector, perimeter posts cover EVERY lane
21
+ # across the front; concentrating the entire garrison at one post
22
+ # leaves the rest of the front traversable.
23
+ # • firewall / IDS rule placement — one inspection rule per
24
+ # protocol/port across the full inspection surface; duplicated rules
25
+ # on one port leave the rest open.
26
+ # • MicroRTS defense placement — defending a wide approach demands
27
+ # spread coverage; concentrating into one cell of an open arena is
28
+ # known to LOSE to a multi-lane advance.
29
+ #
30
+ # Map: `rush-hour-arena` (128×40, fully open). The agent base sits on
31
+ # the WEST (fact at x=10); the rush wave is injected at the EAST edge
32
+ # (x=100) spread across MULTIPLE distinct y rows that span the full
33
+ # vertical width of the playable area. The `rusher` scripted bot then
34
+ # charges the agent fact centroid, so each row's spawn group walks WEST
35
+ # through the central x=60 column on its way to the fact — different
36
+ # rows cross x=60 at DIFFERENT y values, forcing a LINE topology to
37
+ # intercept every row.
38
+ #
39
+ # pbox is the load-bearing weapon. After the engine pbox-weapon fix
40
+ # (`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
41
+ # active direct-fire anti-infantry tower (M60mg burst-5; one burst
42
+ # one-shots an e1). With NO pre-placed agent combat screen the pbox
43
+ # LINE is the SOLE source of kill output — a stall / wrong-placement
44
+ # layout kills nothing AND the rush razes the fact.
45
+ #
46
+ # Win predicate (load-bearing decomposition):
47
+ # • `building_count_gte:{pbox,n:K}` — built the full budget;
48
+ # • `building_in_region:{pbox, x:60, y:<rung>, radius:0.5, count:1}`
49
+ # for EACH of the K rung rows across the full front (radius 0.5 ⇒
50
+ # cell-exact; a cluster on the middle row misses every flank rung,
51
+ # a scatter near the base misses every rung);
52
+ # • `units_killed_gte:K` — the pbox LINE must actively KILL the rush,
53
+ # not just stand (a stall / pure-army layout kills 0);
54
+ # • `building_count_gte:{fact,n:1}` (PRESENT-TENSE — `has_building`
55
+ # is the documented CLAUDE.md "ever-seen" footgun);
56
+ # • `within_ticks` + `after_ticks` fail clause ⇒ a non-finisher is a
57
+ # real reachable timeout LOSS (no interrupts ⇒ exactly 90 ticks per
58
+ # step, so `max_turns` is a hard tick budget the `after_ticks`
59
+ # deadline reliably bites in).
60
+ #
61
+ # Discrimination (four-script bar — scripted, no model needed):
62
+ # • stall (observe-only): spends nothing; the rush razes the fact →
63
+ # fact-alive fail clause fires → LOSS. The `after_ticks` deadline is
64
+ # a backstop so a staller who somehow keeps the fact also times out
65
+ # (no draw degeneracy).
66
+ # • cluster-on-centre (K pboxes piled on the y=20 row): satisfies the
67
+ # count but EVERY flank rung region is empty (radius 0.5 ⇒
68
+ # cell-exact, the central pile doesn't touch the flanks); the win
69
+ # never latches and the unguarded flank rows let the rush leak
70
+ # through → LOSS.
71
+ # • scatter-near-base (K pboxes hugging the fact west of x=20): every
72
+ # rung region is empty AND the pboxes are too far west to engage
73
+ # the rush before it reaches the fact → LOSS.
74
+ # • intended LINE (one pbox at each of the K corridor rungs at x=60):
75
+ # every row is covered AND the wave is killed at the central
76
+ # column AND the fact survives → WIN.
77
+ #
78
+ # Engine footguns honoured:
79
+ # • `place_building` does NOT enforce build-adjacency (CLAUDE.md) — the
80
+ # LINE rungs sit deep at x=60 with no nearby agent base; the engine
81
+ # places them anyway.
82
+ # • Building / Defense queues feed from the construction yard
83
+ # (single-stream); the agent must build pboxes serially. The tick
84
+ # budget on every tier gives the LINE time to assemble BEFORE the
85
+ # scheduled rush wave hits.
86
+ # • Hard tier defines TWO agent spawn_point groups (NORTH y=12 / SOUTH
87
+ # y=28) round-robined by seed (CLAUDE.md hard-tier contract). Enemy
88
+ # actors do not honour spawn_point, so the rush wave is fixed on the
89
+ # full-width spawn axis and crosses the LINE rungs identically for
90
+ # both base latitudes — what flips per seed is the BEARING the rush
91
+ # approaches the agent fact from, not the rung topology itself.
92
+ # • An unarmed high-HP enemy `fact` marker at (120,20) keeps the
93
+ # engine alive past full rush annihilation so the win/fail check
94
+ # fires (CLAUDE.md auto-`done` mitigation).
95
+ # • NO pre-placed agent combat screen — one non-combatant e1 in a far
96
+ # corner per spawn group satisfies the hard-tier env-reset
97
+ # non-empty-units check while contributing ZERO kills, so a
98
+ # wrong-placement spend cannot pass the kill clause off it.
99
+
100
  meta:
101
  id: build-defensive-tower-line
102
+ title: 'Build a Defensive Tower LINE Across a WIDE Front (Not a Cluster, Not a Scatter)'
103
  capability: reasoning
104
  real_world_meaning: >
105
+ Where do you commit your defensive towers when the threat is a rush
106
+ spread across the FULL WIDTH of the map — not pinched through a
107
+ single corridor cell, but advancing on every row of a wide front?
108
+ Military perimeter doctrine and firewall rule design both say:
109
+ cover EVERY lane across the front, one post per row, so no enemy
110
+ unit can slip past on an unguarded row. A single dense cluster on
111
+ one row wastes overlapping fire on one cell while every other row
112
+ stays open; a scatter near the base never engages the rush at all.
113
  The win predicate makes the LINE topology load-bearing — total
114
  pillbox count alone is not enough; ≥1 pillbox must sit on EACH of
115
+ the front's rung rows (cell-exact via radius 0.5), AND those
116
+ pillboxes must actually KILL the rush spread across the front.
 
117
  robotics_analogue: >
118
  Network firewall / Web Application Firewall rule placement: when
119
+ every protocol/port could be the path of compromise, the right
120
+ architecture is one rule per port across the full inspection
121
+ surface, not three duplicated rules on one port while the rest stay
122
+ open. Likewise a physical perimeter patrol covers EVERY approach
123
+ lane across the front; a cluster at one waypoint or a scatter
124
+ across unrelated nodes both leave the actual front lanes
125
+ traversable. Defense in depth across a wide approach demands one
126
+ responder per lane, not many responders at one waypoint.
127
  benchmark_anchor:
128
  - "ERQA"
129
  - "MicroRTS defense"
130
  - "military perimeter"
131
  author: openra-bench
132
 
133
  base_map: rush-hour-arena
 
134
 
135
  base:
136
  agent:
 
147
  - attack_move
148
  - stop
149
  planning: true
150
+ # No interrupts — perimeter design is a STATIC up-front decision (the
151
+ # front geometry is known a priori, the rush composition is fixed).
152
+ # Dropping interrupts also makes the tick budget deterministic (each
153
+ # step is exactly 90 ticks ⇒ max_turns is a hard tick budget that the
154
+ # `after_ticks` fail clause reliably bites in).
155
  interrupts: {}
156
  termination:
157
  max_ticks: 12000
158
  actors: [] # every level supplies its own actor list via overrides.
159
 
160
  levels:
161
+ # ── EASY ── bare LINE skill. Budget covers exactly 3 pbox (1800cr).
162
+ # Win requires ≥1 pbox in EACH of the 3 front rungs (y=8, 20, 32 at
163
+ # x=60, radius 0.5 — only the exact rung cell counts, so a cluster on
164
+ # y=20 misses the y=8 and y=32 flank rungs). The rush arrives at
165
+ # tick 1500 — after the LINE has had time to assemble. A cluster at
166
+ # (60,20) satisfies the count clause but FAILS the flank rung clauses
167
+ # AND lets the flank rush leak through the open rows; a random
168
+ # scatter near the base misses every rung AND kills nothing; a stall
169
+ # loses on the count clause AND the fact razed by the rush.
 
170
  # max_turns 60 ⇒ reachable tick 93+90·59 = 5403; deadline 5400.
171
  easy:
172
  # Original (pre-verbosity-sweep) description preserved for
173
  # contributors. The trimmed version below removes scripted-policy
174
  # spoilers and cell-coord dumps; load-bearing intent kept.
175
  #
176
+ # A rifle rush will charge across the FULL WIDTH of the front (3
177
+ # distinct rows: y=8, y=20, y=32 at the east edge). Budget is
178
+ # exactly $1800: drop one pillbox on each of those three rows at
179
+ # x=60 so nothing slips past on any row. A cluster on the centre
180
+ # row misses both flank rungs and lets the flank rush leak through;
181
+ # a scatter near the base never engages the rush. Kill at least
182
+ # four and keep your construction yard standing, within about 60
183
+ # turns.
184
  description: >
185
+ A rifle rush charges across the full vertical width of the map —
186
+ three lanes at y=8, y=20, and y=32 toward your yard on the west.
187
+ Budget $1800: three pillboxes. Drop one on each of those three
188
+ rows at x=60 so no lane is open. A cluster on the centre row
189
+ leaves both flanks unguarded; a scatter near the base never meets
190
+ the rush. Four kills, yard intact, within about 60 turns.
191
+ starting_cash: 1800
192
  overrides:
193
  actors:
194
+ # Pre-placed agent base on the WEST. NO combat units near the
195
+ # base — the pbox LINE must be the sole source of kill output.
196
  - {type: fact, owner: agent, position: [10, 20]}
197
  - {type: tent, owner: agent, position: [14, 18]}
198
  - {type: powr, owner: agent, position: [14, 22]}
199
  # ONE non-combatant agent e1 parked in the far SW corner,
200
+ # nowhere near any rush lane. It exists only so units_summary
201
+ # is non-empty (hard-tier env-reset check); it never reaches
202
+ # combat and contributes ZERO kills — the pbox LINE is the sole
203
+ # source of kill output, so a scatter or stall play cannot pass
204
+ # the kill clause off it.
205
  - {type: e1, owner: agent, position: [4, 36], stance: 2}
206
  # Unarmed high-HP marker (anti-DRAW): keeps the episode alive
207
  # past full rusher elimination so the win/fail check fires.
208
  - {type: fact, owner: enemy, position: [120, 20]}
209
+ # Scheduled rush wave — 6 e1 spread across the full vertical
210
+ # width (2 per row × 3 rows: y=8, y=20, y=32), injected at tick
211
+ # 1500 from the east at x=100. By tick 1500 all 3 LINE pillboxes
212
+ # are built (the 3rd pbox completes ~tick 1350 from a fresh-cash
213
+ # tent + serial defense queue). The rusher charges the agent
214
+ # centroid (fact at (10,20)), so each row's spawn group walks
215
+ # WEST through x=60 — the y=8 and y=32 flank groups cross x=60 at
216
+ # their starting y values, demanding a LINE that covers every
217
+ # row.
218
  scheduled_events:
219
+ - tick: 1500
220
  type: spawn_actors
221
  actors:
222
+ - {type: e1, owner: enemy, position: [100, 8], stance: 3, count: 2}
223
+ - {type: e1, owner: enemy, position: [100, 20], stance: 3, count: 2}
224
+ - {type: e1, owner: enemy, position: [100, 32], stance: 3, count: 2}
225
  win_condition:
226
  all_of:
227
+ - building_count_gte: {type: pbox, n: 3}
228
+ - building_in_region: {type: pbox, x: 60, y: 8, radius: 0.5, count: 1}
229
+ - building_in_region: {type: pbox, x: 60, y: 20, radius: 0.5, count: 1}
230
+ - building_in_region: {type: pbox, x: 60, y: 32, radius: 0.5, count: 1}
 
231
  - units_killed_gte: 4
232
  - building_count_gte: {type: fact, n: 1}
233
  - within_ticks: 5400
 
237
  - not: {building_count_gte: {type: fact, n: 1}}
238
  max_turns: 60
239
 
240
+ # ── MEDIUM ── +1 axis: 5-pbox LINE across the full front (rungs at
241
+ # y=4, 12, 20, 28, 36: finer spacing than easy, covering the full
242
+ # 36-cell playable height). Budget covers exactly 5 pbox (3000cr).
243
+ # The rush wave is 10 e1 (2 per row × 5 rows), so a complete LINE
244
+ # must hold every row — a 3-rung easy-style layout leaves 2 rows
245
+ # unguarded and the flanks leak. A cluster on y=20 misses the y=4 /
246
+ # y=12 / y=28 / y=36 rungs (radius 0.5 ⇒ cell-exact). max_turns 60
247
+ # ⇒ reachable tick 5403; deadline 5400.
248
  medium:
249
  # Original (pre-verbosity-sweep) description preserved for
250
  # contributors. The trimmed version below removes scripted-policy
251
  # spoilers and cell-coord dumps; load-bearing intent kept.
252
  #
253
+ # The rush widens to 5 distinct rows (y=4, 12, 20, 28, 36). Build 5
254
+ # pillboxes (budget 3000cr = exactly 5 pbox at 600 each) AND place
255
+ # ONE on each of those five rows at x=60. The complete 5-rung LINE
256
+ # is required: any open rung lets the rush slip past on that row.
257
+ # Kill at least seven and keep your construction yard, within about
258
+ # 60 turns.
 
259
  description: >
260
+ A wider rush now: five lanes at y=4, y=12, y=20, y=28, y=36 spread
261
+ across the full vertical front. Budget $3000 = five pillboxes.
262
+ One on each of those five rows at x=60; an easy-style three-rung
263
+ line leaves two flank rows open. Seven kills, yard intact, within
264
+ about 60 turns.
265
+ starting_cash: 3000
266
  overrides:
267
  actors:
268
  - {type: fact, owner: agent, position: [10, 20]}
269
  - {type: tent, owner: agent, position: [14, 18]}
270
  - {type: powr, owner: agent, position: [14, 22]}
271
+ # Two powr to cover the 5-pbox power draw (5×-20 = -100; the tent
+ # also draws, so one powr at +100 leaves no margin once the rush
+ # damages buildings; use 2 powr for stability).
274
+ - {type: powr, owner: agent, position: [14, 16]}
275
+ # Non-combatant SW-corner e1 (see easy).
276
  - {type: e1, owner: agent, position: [4, 36], stance: 2}
277
  # Anti-DRAW marker.
278
  - {type: fact, owner: enemy, position: [120, 20]}
279
+ # Heavier rush wave: 10 e1 (2 per row × 5 rows), injected at tick
280
+ # 2200 — after the intended 5-pbox LINE has time to assemble (~
281
+ # tick 1700 for the 5th from a fresh-cash tent + serial defense
282
+ # queue). The rusher charges the agent centroid, so each row's
283
+ # spawn group walks WEST through x=60 on its starting y.
284
  scheduled_events:
285
+ - tick: 2200
286
  type: spawn_actors
287
  actors:
288
+ - {type: e1, owner: enemy, position: [100, 4], stance: 3, count: 2}
289
+ - {type: e1, owner: enemy, position: [100, 12], stance: 3, count: 2}
290
+ - {type: e1, owner: enemy, position: [100, 20], stance: 3, count: 2}
291
+ - {type: e1, owner: enemy, position: [100, 28], stance: 3, count: 2}
292
+ - {type: e1, owner: enemy, position: [100, 36], stance: 3, count: 2}
293
  win_condition:
294
  all_of:
295
+ - building_count_gte: {type: pbox, n: 5}
296
+ - building_in_region: {type: pbox, x: 60, y: 4, radius: 0.5, count: 1}
297
+ - building_in_region: {type: pbox, x: 60, y: 12, radius: 0.5, count: 1}
298
+ - building_in_region: {type: pbox, x: 60, y: 20, radius: 0.5, count: 1}
299
+ - building_in_region: {type: pbox, x: 60, y: 28, radius: 0.5, count: 1}
300
+ - building_in_region: {type: pbox, x: 60, y: 36, radius: 0.5, count: 1}
301
  - units_killed_gte: 7
302
  - building_count_gte: {type: fact, n: 1}
303
  - within_ticks: 5400
 
307
  - not: {building_count_gte: {type: fact, n: 1}}
308
  max_turns: 60
309
 
310
+ # ── HARD ── +2 axes: (1) a 6-rung LINE (y=4, 10, 16, 22, 28, 34 at
311
+ # x=60) covering the full front with TIGHTER spacing AND (2) ATTRITION
312
+ # over time via TWO scheduled waves (tick 1800 and tick 3000), so a
313
+ # pbox damaged in wave 1 may fall to wave 2 — the agent must REBUILD
314
+ # any rung the first wave razes before the second wave hits. Budget
315
+ # 4800cr = exactly 8 pbox (6 rungs + 2 rebuilds), so the cash is
+ # tight: a 7th rung OR a pbox parked near the base spends part of
+ # the rebuild margin. Hard tier also flips the agent base latitude
+ # per seed (NORTH y=12 / SOUTH y=28 round-robined via spawn_point); the LINE
319
+ # topology is identical across seeds (the front rungs are fixed map
320
+ # geometry at x=60) but the agent base bearing flips, so a memorised
321
+ # relative-to-base placement plan cannot generalise. Enemies don't
322
+ # honour spawn_point (CLAUDE.md), so the rush waves inject on both
323
+ # bases' candidate latitudes regardless of seed — but the rush
324
+ # geometry is the SAME because the rusher charges the agent centroid,
325
+ # and on either base latitude the LINE rungs at x=60 are what catches
326
+ # the rush before it reaches the fact. max_turns 70 ⇒ reachable tick
327
+ # 93+90·69 = 6303; deadline 6300.
328
  hard:
329
  # Original (pre-verbosity-sweep) description preserved for
330
  # contributors. The trimmed version below removes scripted-policy
331
  # spoilers and cell-coord dumps; load-bearing intent kept.
332
  #
333
+ # The full front widens to 6 rungs (y=4, 10, 16, 22, 28, 34) and the
334
+ # rush arrives in TWO waves (attrition): wave 1 at tick 1800 plus
335
+ # wave 2 at tick 3000. Budget 4800cr (= 8 pbox): enough for the 6
336
+ # rungs PLUS 2 rebuilds for any rung the first wave razes. Your
337
+ # base latitude flips between NORTH (y=12) and SOUTH (y=28) by seed,
338
+ # so the bearing the rush approaches the fact from changes per
339
+ # seed; the LINE rungs themselves stay fixed at x=60. Kill at least
340
+ # 8, keep the fact, within about 70 turns.
341
  description: >
342
+ The front widens to six lanes (y=4, 10, 16, 22, 28, 34) and the
343
+ rush arrives in TWO waves: pbox attrition is real, and you must
344
+ rebuild lost rungs between waves. Budget $4800 = six rungs plus
345
+ two rebuilds. Your base flips NORTH/SOUTH by seed; the rungs at
346
+ x=60 don't. Eight kills, yard intact, within about 70 turns.
347
+ starting_cash: 4800
348
  overrides:
349
  actors:
350
+ # spawn_point 0 — NORTH base at y=12. Fact at (10, 12); tent/
351
+ # powr offset west so they aren't directly on the rusher path.
352
+ # Two powr for the 8-pbox power draw (8×-20=-160; 2 powr=+200).
353
+ # Non-combatant corner e1 in the far SW.
354
  - {type: fact, owner: agent, position: [10, 12], spawn_point: 0}
355
  - {type: tent, owner: agent, position: [6, 12], spawn_point: 0}
356
  - {type: powr, owner: agent, position: [6, 14], spawn_point: 0}
357
+ - {type: powr, owner: agent, position: [6, 10], spawn_point: 0}
358
  - {type: e1, owner: agent, position: [4, 36], stance: 2, spawn_point: 0}
359
  # spawn_point 1 — SOUTH base at y=28 (mirror across y=20).
360
+ # Non-combatant corner e1 in the far NW.
361
  - {type: fact, owner: agent, position: [10, 28], spawn_point: 1}
362
  - {type: tent, owner: agent, position: [6, 28], spawn_point: 1}
363
  - {type: powr, owner: agent, position: [6, 26], spawn_point: 1}
364
+ - {type: powr, owner: agent, position: [6, 30], spawn_point: 1}
365
  - {type: e1, owner: agent, position: [4, 4], stance: 2, spawn_point: 1}
366
  # Anti-DRAW marker (enemy fact doesn't honour spawn_point).
367
  - {type: fact, owner: enemy, position: [120, 20]}
368
+ # Two-wave attrition. Each wave is 6 e1 (1 per row × 6 rows) at
369
+ # tick 1800 and tick 3000. The first wave finishes ~tick 2500,
370
+ # giving the agent a ~500-tick window to rebuild any razed rung
371
+ # before wave 2 lands. The serial defense queue can build 1
372
+ # pbox per ~270 ticks, so 2 rebuilds in 500 ticks is the budget
373
+ # the design enforces. A pure-stamp policy that places 6 pboxes
374
+ # and walks away will lose ANY rung wave 1 razed, then the
375
+ # surviving line fails to cover that row in wave 2 → rung clause
376
+ # unmet → LOSS.
377
  scheduled_events:
378
  - tick: 1800
379
  type: spawn_actors
380
  actors:
381
+ - {type: e1, owner: enemy, position: [100, 4], stance: 3, count: 1}
382
+ - {type: e1, owner: enemy, position: [100, 10], stance: 3, count: 1}
383
+ - {type: e1, owner: enemy, position: [100, 16], stance: 3, count: 1}
384
+ - {type: e1, owner: enemy, position: [100, 22], stance: 3, count: 1}
385
+ - {type: e1, owner: enemy, position: [100, 28], stance: 3, count: 1}
386
+ - {type: e1, owner: enemy, position: [100, 34], stance: 3, count: 1}
387
+ - tick: 3000
388
+ type: spawn_actors
389
+ actors:
390
+ - {type: e1, owner: enemy, position: [100, 4], stance: 3, count: 1}
391
+ - {type: e1, owner: enemy, position: [100, 10], stance: 3, count: 1}
392
+ - {type: e1, owner: enemy, position: [100, 16], stance: 3, count: 1}
393
+ - {type: e1, owner: enemy, position: [100, 22], stance: 3, count: 1}
394
+ - {type: e1, owner: enemy, position: [100, 28], stance: 3, count: 1}
395
+ - {type: e1, owner: enemy, position: [100, 34], stance: 3, count: 1}
396
  win_condition:
397
  all_of:
398
+ - building_count_gte: {type: pbox, n: 6}
399
+ - building_in_region: {type: pbox, x: 60, y: 4, radius: 0.5, count: 1}
400
+ - building_in_region: {type: pbox, x: 60, y: 10, radius: 0.5, count: 1}
401
+ - building_in_region: {type: pbox, x: 60, y: 16, radius: 0.5, count: 1}
402
  - building_in_region: {type: pbox, x: 60, y: 22, radius: 0.5, count: 1}
403
+ - building_in_region: {type: pbox, x: 60, y: 28, radius: 0.5, count: 1}
404
+ - building_in_region: {type: pbox, x: 60, y: 34, radius: 0.5, count: 1}
405
+ - units_killed_gte: 8
406
  - building_count_gte: {type: fact, n: 1}
407
  - within_ticks: 6300
408
  fail_condition:
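The budget and deadline comments in the three tiers reduce to simple arithmetic. The sketch below re-derives them, assuming only what the pack comments state: a pbox costs 600 credits, and the reachable tick follows the `93 + 90·(max_turns − 1)` form quoted in the hard-tier comment. The names `PBOX_COST`, `FIRST_STEP_TICK`, `TICKS_PER_STEP`, `affordable_pboxes`, `reachable_tick` and `summarise` are illustrative and not part of the pack or the engine.

```python
# Sketch only: constants are read off the pack comments, not the engine.
PBOX_COST = 600        # "5 pbox at 600 each" (medium-tier comment)
FIRST_STEP_TICK = 93   # offset in "93+90·69 = 6303" (hard-tier comment)
TICKS_PER_STEP = 90    # ticks per agent step, per the same comment

# Cash, LINE rungs, max_turns and within_ticks deadline as stated per tier.
TIERS = {
    "easy":   {"cash": 1800, "rungs": 3, "max_turns": 60, "deadline": 5400},
    "medium": {"cash": 3000, "rungs": 5, "max_turns": 60, "deadline": 5400},
    "hard":   {"cash": 4800, "rungs": 6, "max_turns": 70, "deadline": 6300},
}


def affordable_pboxes(cash: int) -> int:
    """Number of pboxes the starting cash can pay for."""
    return cash // PBOX_COST


def reachable_tick(max_turns: int) -> int:
    """Last tick the agent can reach with the given turn budget."""
    return FIRST_STEP_TICK + TICKS_PER_STEP * (max_turns - 1)


def summarise(tier: str) -> dict:
    t = TIERS[tier]
    return {
        # pboxes left over after one per rung (the rebuild margin)
        "spare_pboxes": affordable_pboxes(t["cash"]) - t["rungs"],
        # how far past the deadline the final turn lands
        "ticks_past_deadline": reachable_tick(t["max_turns"]) - t["deadline"],
    }


if __name__ == "__main__":
    for name in TIERS:
        print(name, summarise(name))
```

Under these assumptions easy and medium have zero spare pboxes, hard has two, and on every tier the final turn lands three ticks past the deadline, so the deadline is always reachable.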
tests/test_build_defensive_tower_line.py CHANGED
@@ -1,13 +1,15 @@
1
  """build-defensive-tower-line scenario family, full loop on Rust.
2
 
3
- The pack tests DEFENSIVE PERIMETER TOPOLOGY: when the threat is funnelled
4
- through a known corridor whose WIDTH matters (y=18..22 at x=60), the right
5
- architecture is one pbox per row across the FULL corridor width (a LINE),
6
- NOT a cluster on the centre row and NOT a scatter near the base. This is
7
- the sibling/inverse of `def-tower-line-vs-cluster` (which forces a
8
- CLUSTER at a single bottleneck cell); together the two packs discriminate
9
- whether the model understands the FORCING GEOMETRY (single-cell chokepoint
10
- vs corridor-width approach).
 
 
11
 
12
  Anchors: ERQA spatial commit / MicroRTS defense placement / military
13
  perimeter (firewall rule placement).
@@ -15,23 +17,26 @@ perimeter (firewall rule placement).
15
  The pbox is the load-bearing weapon. After the engine pbox-weapon fix
16
  (`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
17
  active direct-fire anti-infantry tower. The rush arrives as a
18
- `scheduled_events: spawn_actors` wave EAST of the corridor at tick 1800
19
- — AFTER the agent has had time to build all 4 pillboxes serially — and
20
- the `rusher` bot charges the agent fact on the west, so the wave is
21
- forced WEST through the x=60 corridor. There are NO pre-placed agent
22
- defenders, so the pbox LINE is the sole source of kill output.
 
 
23
 
24
  The win predicate makes the LINE topology load-bearing — total pbox
25
  count alone is not enough:
26
 
27
- * `building_count_gte:{pbox, n:4}` ⇒ the agent built the full budget;
28
- * `building_in_region:{pbox, x:60, y:Y, radius:0.5, count:1}` for each
29
- of the four corridor rungs Y ∈ {18,19,21,22} ⇒ exactly one pbox per
30
- row across the corridor (a tiny radius 0.5 means only the exact cell
31
- counts, so a cluster on (60,20) misses ALL FOUR rungs and a scatter
32
- near the base misses all four);
 
33
  * `units_killed_gte:K` ⇒ the pbox LINE must actively KILL the rush
34
- funnelled through the corridor (a stall / pure-army layout kills 0);
35
  * `building_count_gte:{fact,n:1}` (present-tense — `has_building` is
36
  the one-shot "ever-seen" set, see CLAUDE.md footgun);
37
  * `within_ticks` paired with `after_ticks` in the fail clause ⇒ a
@@ -39,13 +44,23 @@ count alone is not enough:
39
  pack ⇒ each step is exactly 90 ticks, so max_turns is a hard tick
40
  budget that the `after_ticks` deadline reliably bites in).
41
 
 
 
 
 
 
 
 
 
 
 
42
  The scripted-policy validations prove deterministically that:
43
 
44
- * the intended LINE policy (one pbox at each of the four corridor rung
45
- cells) WINS every level + every hard seed (1..4);
46
- * stall / random-4-pbox (4 pboxes placed near the base, away from the
47
- corridor) both LOSE every level + every hard seed — a real LOSS,
48
- not a draw (the rung clauses are never satisfied);
49
  * the hard tier defines ≥2 spawn_point groups (NORTH base y=12 / SOUTH
50
  base y=28) so a memorised base-relative placement cannot generalise.
51
  """
@@ -65,16 +80,36 @@ PACK = PACKS_DIR / "build-defensive-tower-line.yaml"
65
  LEVELS = ("easy", "medium", "hard")
66
  SEEDS = (1, 2, 3, 4)
67
 
68
- # Required corridor rung cells (the four "rungs" spanning y=18..22 at
69
- # the choke column x=60; y=20 is the middle that the LINE topology
70
- # leaves open by construction; placing on y=20 won't satisfy any rung).
71
- RUNGS = [(60, 18), (60, 19), (60, 21), (60, 22)]
 
 
 
 
 
 
72
 
73
- # Cells used by the "random-4-pbox" wrong-topology policy: 4 pboxes
74
- # clustered near the base rather than along the corridor. None of these
75
  # lie inside ANY rung region (radius 0.5 around the rung cells), so the
76
  # region clauses are all unsatisfied.
77
- RANDOM_CELLS_NEAR_BASE = [(20, 18), (22, 20), (24, 22), (26, 19)]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
78
 
79
 
80
  # ── scripted policies ────────────────────────────────────────────────
@@ -86,55 +121,76 @@ def stall(rs, C):
86
  return [C.observe()]
87
 
88
 
89
- def make_line():
90
- """Intended LINE topology: one pbox at EACH of the four corridor
91
- rung cells (60,18) (60,19) (60,21) (60,22)."""
 
 
92
 
93
  def policy(rs, C):
94
  own_b = rs.get("own_buildings") or []
95
- n = sum(1 for b in own_b if b.get("type") == "pbox")
 
 
 
96
  prod = rs.get("production") or []
97
- prod_items = [p.get("item") for p in prod if isinstance(p, dict)]
98
- # Once 4 pboxes are up, idle (the win clause re-evaluates each turn).
99
- if n >= len(RUNGS):
100
- return [C.observe()]
101
- cmds = []
102
- if "pbox" not in prod_items:
103
- cmds.append(C.build("pbox"))
104
- cmds.append(C.place_building("pbox", RUNGS[n][0], RUNGS[n][1]))
105
- return cmds
 
 
 
 
 
106
 
107
  return policy
108
 
109
 
110
- def make_random_4_pbox():
111
- """WRONG TOPOLOGY: 4 pboxes placed near the base (not at the
112
- corridor rungs). Satisfies `building_count_gte:{pbox,n:4}` but
113
- FAILS every rung region (none of the cells lie in any rung's
114
- radius-0.5 disk), so the win predicate cannot fire."""
115
 
116
  def policy(rs, C):
117
  own_b = rs.get("own_buildings") or []
118
  n = sum(1 for b in own_b if b.get("type") == "pbox")
119
  prod = rs.get("production") or []
120
- prod_items = [p.get("item") for p in prod if isinstance(p, dict)]
121
- if n >= len(RANDOM_CELLS_NEAR_BASE):
 
 
122
  return [C.observe()]
123
  cmds = []
124
  if "pbox" not in prod_items:
125
  cmds.append(C.build("pbox"))
126
- cmds.append(
127
- C.place_building(
128
- "pbox",
129
- RANDOM_CELLS_NEAR_BASE[n][0],
130
- RANDOM_CELLS_NEAR_BASE[n][1],
131
- )
132
- )
133
  return cmds
134
 
135
  return policy
136
 
137
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
138
  # ── scenario-shape invariants ────────────────────────────────────────
139
 
140
 
@@ -148,8 +204,9 @@ def test_pack_compiles_with_three_levels_and_rusher_bot():
148
  assert "ERQA" in anchors, anchors
149
  assert "MicroRTS defense" in anchors, anchors
150
  assert "military perimeter" in anchors, anchors
151
- # Rusher bot wired through (charges agent centroid → forces the
152
- # rush path through the corridor on every seed).
 
153
  for lvl in LEVELS:
154
  c = compile_level(pack, lvl)
155
  assert c.map_supported
@@ -159,14 +216,18 @@ def test_pack_compiles_with_three_levels_and_rusher_bot():
159
  assert str(bot).lower() == "rusher", (lvl, bot)
160
 
161
 
162
- def test_starting_cash_is_exact_pbox_budget():
163
- """The cash is intentionally tight (4 pbox at 600 each = 2400 on
164
- every level, zero slack). A model that spends on units OR extra
165
- power runs out before the count clause is satisfied."""
 
 
166
  pack = load_pack(PACK)
167
  for lvl in LEVELS:
168
  c = compile_level(pack, lvl)
169
- assert c.starting_cash == 2400, (lvl, c.starting_cash)
 
 
170
 
171
 
172
  @pytest.mark.parametrize("level", LEVELS)
@@ -208,28 +269,30 @@ def test_fact_alive_clause_uses_present_tense_predicate():
208
  assert fact_clauses, f"{lvl}: missing present-tense fact-alive fail clause"
209
 
210
 
211
- def test_win_requires_one_pbox_per_corridor_rung():
212
- """The LINE-enforcement contract: every level's win clause requires
213
- exactly one pbox in EACH of the four corridor rungs at x=60
214
- y∈{18,19,21,22}. A cluster on the centre row (y=20) misses all four
215
- rungs because each rung region has radius 0.5 (cell-exact)."""
216
- for lvl in LEVELS:
217
- c = compile_level(load_pack(PACK), lvl)
218
- wc = c.win_condition.model_dump(exclude_none=True)
219
- rungs_seen = set()
220
- for clause in wc.get("all_of", []) or []:
221
- br = clause.get("building_in_region")
222
- if (
223
- isinstance(br, dict)
224
- and br.get("type") == "pbox"
225
- and int(br.get("x", -1)) == 60
226
- and int(br.get("count", 0)) == 1
227
- and float(br.get("radius", 0)) <= 1.0
228
- ):
229
- rungs_seen.add(int(br["y"]))
230
- assert rungs_seen == {18, 19, 21, 22}, (
231
- f"{lvl}: corridor rungs y∈{{18,19,21,22}} required, got {sorted(rungs_seen)}"
232
- )
 
 
233
 
234
 
235
  def test_win_requires_a_kill_quota():
@@ -251,7 +314,9 @@ def test_win_requires_a_kill_quota():
251
  def test_rush_arrives_as_a_scheduled_event():
252
  """The rush is injected via `scheduled_events: spawn_actors` AFTER the
253
  LINE has time to assemble — there is no t=0 enemy band racing the
254
- build. This is what makes the build/rush race fair."""
 
 
255
  for lvl in LEVELS:
256
  pack = load_pack(PACK)
257
  raw = pack.levels[lvl]
@@ -260,25 +325,34 @@ def test_rush_arrives_as_a_scheduled_event():
260
  ov = ov.model_dump(exclude_none=True)
261
  evts = ov.get("scheduled_events") or []
262
  assert evts, f"{lvl}: expected a scheduled rush wave"
263
- assert any(e.get("type") == "spawn_actors" for e in evts), (lvl, evts)
 
 
 
 
 
264
 
265
 
266
  def test_no_pre_placed_agent_combat_screen():
267
  """The pbox LINE must be the sole kill source — there is no
268
  pre-placed agent combat screen ringing the base. Only ONE
269
- non-combatant agent e1 is parked in a far corner (so units_summary
270
- is non-empty for the hard-tier env-reset check); it never fights."""
 
271
  for lvl in LEVELS:
272
  c = compile_level(load_pack(PACK), lvl)
273
  agent_units = [
274
  a for a in c.scenario.actors
275
  if a.owner == "agent" and a.type == "e1"
276
  ]
277
- # At most one non-combatant marker per active spawn group.
 
 
278
  assert len(agent_units) <= 2, (lvl, [a.position for a in agent_units])
279
  for a in agent_units:
280
  x, y = a.position
281
- # Parked in a far corner, well clear of the y=18..22 lane.
 
282
  assert x <= 6 and (y <= 6 or y >= 34), (lvl, a.position)
283
 
284
 
@@ -292,7 +366,7 @@ def test_hard_has_two_spawn_point_groups():
292
  if a.owner == "agent" and a.spawn_point is not None
293
  }
294
  assert groups == {0, 1}, groups
295
- # In-bounds check (rush-hour-arena playable y ≈ 2..38, x ≈ 2..126):
296
  for a in c.scenario.actors:
297
  x, y = a.position
298
  assert 2 <= x <= 126 and 2 <= y <= 38, (a.type, a.position)
@@ -305,7 +379,7 @@ def test_hard_has_two_spawn_point_groups():
305
  def test_intended_line_wins_every_level_and_seed(level):
306
  c = compile_level(load_pack(PACK), level)
307
  for seed in SEEDS:
308
- r = run_level(c, make_line(), seed=seed)
309
  assert r.outcome == "win", (
310
  f"{level} seed{seed}: intended LINE topology must WIN; "
311
  f"got {r.outcome} (tick={r.signals.game_tick}, "
@@ -322,19 +396,20 @@ def test_intended_line_wins_every_level_and_seed(level):
322
  @pytest.mark.parametrize(
323
  "policy_name,policy_factory",
324
  [
325
- ("stall", lambda: stall),
326
- ("random_4_pbox", lambda: make_random_4_pbox()),
 
327
  ],
328
  )
329
  def test_lazy_and_wrong_topology_policies_lose_every_level_and_seed(
330
  level, policy_name, policy_factory
331
  ):
332
- """Stall (rush razes fact AND clock runs out with no pbox) and
333
- random-4-pbox (count satisfied but every rung region unsatisfied,
334
- so the win never fires and the clock runs out) must ALL LOSE on
335
- every level + every seed — no draw."""
336
  c = compile_level(load_pack(PACK), level)
337
- fn = policy_factory()
338
  for seed in SEEDS:
339
  r = run_level(c, fn, seed=seed)
340
  assert r.outcome == "loss", (
@@ -349,8 +424,8 @@ def test_lazy_and_wrong_topology_policies_lose_every_level_and_seed(
349
 
350
  def test_intended_run_is_deterministic_on_easy():
351
  c = compile_level(load_pack(PACK), "easy")
352
- a = run_level(c, make_line(), seed=3)
353
- b = run_level(c, make_line(), seed=3)
354
  assert (a.outcome, a.turns, a.signals.units_killed) == (
355
  b.outcome,
356
  b.turns,
 
1
  """build-defensive-tower-line scenario family, full loop on Rust.
2
 
3
+ The pack tests DEFENSIVE PERIMETER TOPOLOGY ACROSS A WIDE FRONT: when
4
+ the threat is a rush spread across the FULL VERTICAL WIDTH of the map
5
+ (multiple distinct rows simultaneously), not pinched through a single
6
+ corridor cell, the right architecture is one pbox per row across the
7
+ FULL width (a LINE). A dense cluster on one row leaves every other row
8
+ unguarded; a scatter near the base never engages the rush. This is the
9
+ sibling/inverse of `def-tower-line-vs-cluster` (which forces a CLUSTER
10
+ at a single bottleneck cell); together the two packs discriminate
11
+ whether the model understands the FORCING GEOMETRY (single-cell
12
+ chokepoint vs wide-front approach).
13
 
14
  Anchors: ERQA spatial commit / MicroRTS defense placement / military
15
  perimeter (firewall rule placement).
 
17
  The pbox is the load-bearing weapon. After the engine pbox-weapon fix
18
  (`fix(engine): pbox gets a direct-fire Armament`) a BUILT pbox is an
19
  active direct-fire anti-infantry tower. The rush arrives as a
20
+ `scheduled_events: spawn_actors` wave (or two waves, on hard) EAST of
21
+ the central column, spread across multiple distinct rows, AFTER the
22
+ agent has had time to build its LINE serially, and the `rusher` bot
23
+ charges the agent fact on the west, so each row's spawn group walks
24
+ WEST through the x=60 column on its starting y. There are NO pre-
25
+ placed agent defenders, so the pbox LINE is the sole source of kill
26
+ output.
27
 
28
  The win predicate makes the LINE topology load-bearing — total pbox
29
  count alone is not enough:
30
 
31
+ * `building_count_gte:{pbox, n:K}` ⇒ the agent built the full budget
32
+ (K = 3 easy / 5 medium / 6 hard);
33
+ * `building_in_region:{pbox, x:60, y:Y, radius:0.5, count:1}` for
34
+ EACH of the K front rungs ⇒ exactly one pbox per row across the
35
+ front (a tiny radius 0.5 means only the exact cell counts, so a
36
+ cluster on (60,20) misses all flank rungs and a scatter near the
37
+ base misses every rung);
38
  * `units_killed_gte:Q` ⇒ the pbox LINE must actively KILL the rush
+ spread across the front (Q = 4 easy / 7 medium / 8 hard; a stall /
+ pure-army layout kills 0);
40
  * `building_count_gte:{fact,n:1}` (present-tense — `has_building` is
41
  the one-shot "ever-seen" set, see CLAUDE.md footgun);
42
  * `within_ticks` paired with `after_ticks` in the fail clause ⇒ a
 
44
  pack ⇒ each step is exactly 90 ticks, so max_turns is a hard tick
45
  budget that the `after_ticks` deadline reliably bites in).
46
 
47
+ Per-tier design:
48
+ * easy — 3-pbox LINE (rungs y=8/20/32), budget $1800, one wave.
49
+ * medium — 5-pbox LINE (rungs y=4/12/20/28/36), budget $3000, one wave.
50
+ * hard — 6-pbox LINE (rungs y=4/10/16/22/28/34), budget $4800
51
+ (= 6 rungs + 2 rebuilds), TWO scheduled waves (tick 1800 + tick
52
+ 3000) so a rung the first wave razes must be REBUILT before wave 2.
53
+ Hard also flips the agent base latitude per seed (NORTH y=12 /
54
+ SOUTH y=28 via spawn_point) so a memorised relative-to-base
55
+ placement cannot generalise.
56
+
57
  The scripted-policy validations prove deterministically that:
58
 
59
+ * the intended LINE policy (one pbox at each front rung, with rebuild
60
+ on hard) WINS every level + every hard seed (1..4);
61
+ * stall / cluster-on-centre / scatter-near-base all LOSE every level +
62
+ every hard seed: a real LOSS, not a draw (the rung clauses are
+ never all satisfied);
64
  * the hard tier defines ≥2 spawn_point groups (NORTH base y=12 / SOUTH
65
  base y=28) so a memorised base-relative placement cannot generalise.
66
  """
 
80
  LEVELS = ("easy", "medium", "hard")
81
  SEEDS = (1, 2, 3, 4)
82
 
83
+ # Per-tier rung topology (front rows across the full vertical width at
84
+ # x=60). A cluster on the centre row misses every flank rung because the
85
+ # region predicate uses radius 0.5 (cell-exact).
86
+ RUNGS_BY_LEVEL = {
87
+ "easy": [(60, 8), (60, 20), (60, 32)],
88
+ "medium": [(60, 4), (60, 12), (60, 20), (60, 28), (60, 36)],
89
+ "hard": [(60, 4), (60, 10), (60, 16), (60, 22), (60, 28), (60, 34)],
90
+ }
91
+
92
+ CASH_BY_LEVEL = {"easy": 1800, "medium": 3000, "hard": 4800}
93
 
94
+ # Cells used by the "scatter near base" wrong-topology policy: pboxes
95
+ # clustered near the base rather than across the front. None of these
96
  # lie inside ANY rung region (radius 0.5 around the rung cells), so the
97
  # region clauses are all unsatisfied.
98
+ SCATTER_NEAR_BASE_BY_LEVEL = {
99
+ "easy": [(20, 18), (22, 20), (24, 22)],
100
+ "medium": [(20, 18), (22, 20), (24, 22), (26, 19), (18, 21)],
101
+ "hard": [(20, 18), (22, 20), (24, 22), (26, 19), (18, 21), (16, 23)],
102
+ }
103
+
104
+ # Cells used by the "cluster on centre row" wrong-topology policy:
+ # pboxes piled on and around the centre row (y=20). Satisfies the count
+ # clause but covers at most one rung (y=20 on easy/medium, y=22 on
+ # hard) and misses every other rung because radius 0.5 is cell-exact.
+ CLUSTER_ON_CENTRE_BY_LEVEL = {
+     "easy": [(60, 19), (60, 20), (60, 21)],
+     "medium": [(60, 18), (60, 19), (60, 20), (60, 21), (60, 22)],
+     "hard": [(60, 17), (60, 18), (60, 19), (60, 20), (60, 21), (60, 22)],
+ }
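A small sketch shows why radius 0.5 behaves as a cell-exact test for the rung, cluster and scatter cells listed above. It assumes `building_in_region` compares the Euclidean distance between integer cell coordinates and the region centre; that semantics is an assumption made for illustration, not taken from the engine, and `in_region` / `rungs_covered` are hypothetical helper names.

```python
import math

RUNGS = {
    "easy": [(60, 8), (60, 20), (60, 32)],
    "medium": [(60, 4), (60, 12), (60, 20), (60, 28), (60, 36)],
    "hard": [(60, 4), (60, 10), (60, 16), (60, 22), (60, 28), (60, 34)],
}
CLUSTER = {
    "easy": [(60, 19), (60, 20), (60, 21)],
    "medium": [(60, 18), (60, 19), (60, 20), (60, 21), (60, 22)],
    "hard": [(60, 17), (60, 18), (60, 19), (60, 20), (60, 21), (60, 22)],
}
SCATTER_HARD = [(20, 18), (22, 20), (24, 22), (26, 19), (18, 21), (16, 23)]


def in_region(cell, centre, radius=0.5):
    # ASSUMPTION: Euclidean distance on integer cell coordinates. Two
    # distinct integer cells are at least 1.0 apart, so radius 0.5 only
    # admits the centre cell itself.
    return math.dist(cell, centre) <= radius


def rungs_covered(pboxes, rungs):
    """Rungs that have at least one pbox inside their radius-0.5 region."""
    return [r for r in rungs if any(in_region(p, r) for p in pboxes)]


if __name__ == "__main__":
    for level in RUNGS:
        print(level, "cluster covers", rungs_covered(CLUSTER[level], RUNGS[level]))
    print("hard scatter covers", rungs_covered(SCATTER_HARD, RUNGS["hard"]))
```

With this reading the cluster never covers more than one rung on any tier, and the scatter covers none, which is what makes the per-rung clauses discriminating.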
113
 
114
 
115
  # ── scripted policies ────────────────────────────────────────────────
 
121
  return [C.observe()]
122
 
123
 
124
+ def make_line(level: str):
+     """Intended LINE topology: one pbox at EACH front rung. On hard the
+     policy also REBUILDS any rung the first wave razes (the cash budget
+     has slack for ≤2 rebuilds across the two-wave attrition)."""
+     rungs = RUNGS_BY_LEVEL[level]
  
      def policy(rs, C):
          own_b = rs.get("own_buildings") or []
+         pboxes = [b for b in own_b if b.get("type") == "pbox"]
+         present_cells = {
+             (int(b["cell_x"]), int(b["cell_y"])) for b in pboxes
+         }
          prod = rs.get("production") or []
+         prod_items = [
+             p.get("item") for p in prod if isinstance(p, dict)
+         ]
+         # Find the first rung that is currently uncovered (initial
+         # build or post-attrition rebuild) and (build +) place there.
+         for cell in rungs:
+             if cell not in present_cells:
+                 cmds = []
+                 if "pbox" not in prod_items:
+                     cmds.append(C.build("pbox"))
+                 cmds.append(C.place_building("pbox", cell[0], cell[1]))
+                 return cmds
+         # All rungs currently covered: idle.
+         return [C.observe()]
  
      return policy
153
 
154
 
155
+ def _wrong_topology_policy(cells):
+     """Pile pboxes at a fixed list of cells (count-only, no rung
+     rebuilding). Used for cluster-on-centre and scatter-near-base."""
+     cells = list(cells)
  
      def policy(rs, C):
          own_b = rs.get("own_buildings") or []
          n = sum(1 for b in own_b if b.get("type") == "pbox")
          prod = rs.get("production") or []
+         prod_items = [
+             p.get("item") for p in prod if isinstance(p, dict)
+         ]
+         if n >= len(cells):
              return [C.observe()]
          cmds = []
          if "pbox" not in prod_items:
              cmds.append(C.build("pbox"))
+         cmds.append(C.place_building("pbox", cells[n][0], cells[n][1]))
          return cmds
  
      return policy
176
 
177
 
178
+ def make_cluster_on_centre(level: str):
+     """WRONG TOPOLOGY: K pboxes piled on the centre row (y≈20).
+     Satisfies the count and at most one rung (y=20 on easy/medium,
+     y=22 on hard) but misses every other rung because the rung
+     regions are radius 0.5 (cell-exact). The unguarded flank rows
+     let the rush leak past."""
+     return _wrong_topology_policy(CLUSTER_ON_CENTRE_BY_LEVEL[level])
184
+
185
+
186
+ def make_scatter_near_base(level: str):
+     """WRONG TOPOLOGY: K pboxes hugging the base (x=16..26, far west
+     of the x=60 front). Misses every front rung AND sits too far west
+     to engage the rush before it reaches the fact (on harder tiers the
+     flank rows reach the fact without ever encountering the LINE)."""
+     return _wrong_topology_policy(SCATTER_NEAR_BASE_BY_LEVEL[level])
192
+
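The rebuild behaviour of the intended LINE policy comes down to one selection step: pick the first rung that currently has no pbox on it. The sketch below isolates that step so it can be exercised without the engine. It reuses the `type` / `cell_x` / `cell_y` keys the policy reads from `own_buildings`; `next_uncovered_rung` and the `pbox` helper are hypothetical names introduced here, and the razed rows are made up for the example.

```python
HARD_RUNGS = [(60, 4), (60, 10), (60, 16), (60, 22), (60, 28), (60, 34)]


def next_uncovered_rung(rungs, own_buildings):
    """Return the first rung cell with no pbox on it, or None if all are covered."""
    present = {
        (int(b["cell_x"]), int(b["cell_y"]))
        for b in own_buildings
        if b.get("type") == "pbox"
    }
    for cell in rungs:
        if cell not in present:
            return cell
    return None


def pbox(x, y):
    # Hypothetical stand-in for one entry of the observed own_buildings list.
    return {"type": "pbox", "cell_x": x, "cell_y": y}


# Full LINE, then a made-up wave-1 outcome that razes the y=16 and y=28 rungs.
full_line = [pbox(x, y) for x, y in HARD_RUNGS]
after_wave_1 = [b for b in full_line if b["cell_y"] not in (16, 28)]

if __name__ == "__main__":
    print(next_uncovered_rung(HARD_RUNGS, full_line))
    print(next_uncovered_rung(HARD_RUNGS, after_wave_1))
```

Because the selection is keyed on cells rather than on a pbox count, the same code path handles both the initial build and a post-attrition rebuild, which a count-based policy cannot do.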
193
+
194
  # ── scenario-shape invariants ────────────────────────────────────────
195
 
196
 
 
204
  assert "ERQA" in anchors, anchors
205
  assert "MicroRTS defense" in anchors, anchors
206
  assert "military perimeter" in anchors, anchors
207
+ # Rusher bot wired through (charges agent centroid → forces each
208
+ # row's rush column WEST through the central x=60 LINE on every
209
+ # seed).
210
  for lvl in LEVELS:
211
  c = compile_level(pack, lvl)
212
  assert c.map_supported
 
216
  assert str(bot).lower() == "rusher", (lvl, bot)
217
 
218
 
219
+ def test_starting_cash_scales_per_tier_for_pbox_budget():
+     """Cash is intentionally tight per tier: exactly K pboxes for
+     easy (K=3) and medium (K=5), plus a 2-pbox rebuild margin for hard
+     (K=6+2 rebuilds for the two-wave attrition), so a model that
+     spends on units OR extra rebuilds beyond the design cannot pass
+     the count clause."""
      pack = load_pack(PACK)
      for lvl in LEVELS:
          c = compile_level(pack, lvl)
+         assert c.starting_cash == CASH_BY_LEVEL[lvl], (
+             lvl, c.starting_cash, CASH_BY_LEVEL[lvl]
+         )
231
 
232
 
233
  @pytest.mark.parametrize("level", LEVELS)
 
269
  assert fact_clauses, f"{lvl}: missing present-tense fact-alive fail clause"
270
 
271
 
+ @pytest.mark.parametrize("level", LEVELS)
+ def test_win_requires_one_pbox_per_front_rung(level):
+ """The LINE-enforcement contract: each level's win clause requires
+ exactly one pbox in EACH of the front rungs at x=60 spanning the
+ full vertical width. A cluster on the centre row (y=20) misses
+ every flank rung because each rung region has radius 0.5 (cell-
+ exact). The rungs grow per tier (3 easy / 5 medium / 6 hard)."""
+ expected = {y for (_, y) in RUNGS_BY_LEVEL[level]}
+ c = compile_level(load_pack(PACK), level)
+ wc = c.win_condition.model_dump(exclude_none=True)
+ rungs_seen = set()
+ for clause in wc.get("all_of", []) or []:
+ br = clause.get("building_in_region")
+ if (
+ isinstance(br, dict)
+ and br.get("type") == "pbox"
+ and int(br.get("x", -1)) == 60
+ and int(br.get("count", 0)) == 1
+ and float(br.get("radius", 0)) <= 1.0
+ ):
+ rungs_seen.add(int(br["y"]))
+ assert rungs_seen == expected, (
+ f"{level}: front rungs y∈{sorted(expected)} required, got {sorted(rungs_seen)}"
+ )
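
Reviewer note: a minimal sketch of why a cell-exact rung region separates the three topologies on the easy tier. `unmet_rungs` is a hypothetical helper, and the Euclidean distance test and the "exactly one" reading of `count` are assumptions. The engine's real `building_in_region` semantics are not shown in this diff. The rung rows (y=8/20/32 at x=60) come from the commit message.

```python
import math

X_FRONT = 60
EASY_RUNGS = [(X_FRONT, 8), (X_FRONT, 20), (X_FRONT, 32)]


def unmet_rungs(placements, rungs, radius=0.5):
    """Return the rungs that do not hold exactly one pbox within `radius` cells.

    Assumption: the region is a disc of the given radius around the rung cell.
    """
    unmet = []
    for rx, ry in rungs:
        hits = sum(
            1 for (px, py) in placements
            if math.hypot(px - rx, py - ry) <= radius
        )
        if hits != 1:
            unmet.append((rx, ry))
    return unmet


line = [(60, 8), (60, 20), (60, 32)]        # one pbox per rush row
cluster = [(59, 20), (60, 20), (61, 20)]    # same count, all on the centre row
scatter = [(8, 18), (10, 20), (8, 22)]      # same count, hugging the base

for name, plan in [("line", line), ("cluster", cluster), ("scatter", scatter)]:
    print(name, unmet_rungs(plan, EASY_RUNGS))
```

With radius 0.5 even a pbox one cell off the rung does not count, so the cluster satisfies the pbox count yet leaves both flank rungs unmet, and the scatter leaves all three unmet.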
 
 
  def test_win_requires_a_kill_quota():
 
  def test_rush_arrives_as_a_scheduled_event():
  """The rush is injected via `scheduled_events: spawn_actors` AFTER the
  LINE has time to assemble — there is no t=0 enemy band racing the
+ build. This is what makes the build/rush race fair. Hard tier has
+ TWO scheduled waves (attrition mechanic)."""
+ expected_wave_counts = {"easy": 1, "medium": 1, "hard": 2}
  for lvl in LEVELS:
  pack = load_pack(PACK)
  raw = pack.levels[lvl]
 
  ov = ov.model_dump(exclude_none=True)
  evts = ov.get("scheduled_events") or []
  assert evts, f"{lvl}: expected a scheduled rush wave"
+ spawn_waves = [e for e in evts if e.get("type") == "spawn_actors"]
+ assert spawn_waves, (lvl, evts)
+ assert len(spawn_waves) == expected_wave_counts[lvl], (
+ f"{lvl}: expected {expected_wave_counts[lvl]} spawn_actors waves, "
+ f"got {len(spawn_waves)} ({evts})"
+ )
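
Reviewer note: a small sketch of the wave-count filter above, run against stand-in event dicts. The wave ticks (1500 / 2200 / 1800 and 3000) come from the commit message. The `"tick"` key and the `make_events` helper are hypothetical, since the real `scheduled_events` schema is not visible in this diff beyond the `type` field.

```python
# Wave ticks per tier, taken from the commit message.
WAVE_TICKS = {"easy": [1500], "medium": [2200], "hard": [1800, 3000]}


def make_events(level):
    """Build stand-in event dicts; the 'tick' key is an assumed field name."""
    return [{"type": "spawn_actors", "tick": t} for t in WAVE_TICKS[level]]


def count_spawn_waves(events):
    """Same filter as the test: count events whose type is 'spawn_actors'."""
    return sum(1 for e in events if e.get("type") == "spawn_actors")


wave_counts = {lvl: count_spawn_waves(make_events(lvl)) for lvl in WAVE_TICKS}
earliest_wave = min(t for ticks in WAVE_TICKS.values() for t in ticks)
print(wave_counts, earliest_wave)
```

Every wave lands well after tick 0, which matches the docstring's point that no enemy band races the build.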
 
 
  def test_no_pre_placed_agent_combat_screen():
  """The pbox LINE must be the sole kill source — there is no
  pre-placed agent combat screen ringing the base. Only ONE
+ non-combatant agent e1 is parked in a far corner (per spawn group)
+ so units_summary is non-empty for the hard-tier env-reset check;
+ it never fights."""
  for lvl in LEVELS:
  c = compile_level(load_pack(PACK), lvl)
  agent_units = [
  a for a in c.scenario.actors
  if a.owner == "agent" and a.type == "e1"
  ]
+ # At most one non-combatant marker per active spawn group
+ # (hard has 2 spawn_point groups, so up to 2 corner e1s are
+ # declared; only the active spawn group's e1 is materialised).
  assert len(agent_units) <= 2, (lvl, [a.position for a in agent_units])
  for a in agent_units:
  x, y = a.position
+ # Parked in a far corner, well clear of the rush lanes
+ # spread across y=4..36 at x=60.
  assert x <= 6 and (y <= 6 or y >= 34), (lvl, a.position)
 
 
 
  if a.owner == "agent" and a.spawn_point is not None
  }
  assert groups == {0, 1}, groups
+ # In-bounds check (rush-hour-arena playable y ≈ 2..38, x ≈ 2..126).
  for a in c.scenario.actors:
  x, y = a.position
  assert 2 <= x <= 126 and 2 <= y <= 38, (a.type, a.position)
 
  def test_intended_line_wins_every_level_and_seed(level):
  c = compile_level(load_pack(PACK), level)
  for seed in SEEDS:
+ r = run_level(c, make_line(level), seed=seed)
  assert r.outcome == "win", (
  f"{level} seed{seed}: intended LINE topology must WIN; "
  f"got {r.outcome} (tick={r.signals.game_tick}, "
 
  @pytest.mark.parametrize(
  "policy_name,policy_factory",
  [
+ ("stall", lambda lvl: stall),
+ ("cluster_on_centre", lambda lvl: make_cluster_on_centre(lvl)),
+ ("scatter_near_base", lambda lvl: make_scatter_near_base(lvl)),
  ],
  )
  def test_lazy_and_wrong_topology_policies_lose_every_level_and_seed(
  level, policy_name, policy_factory
  ):
+ """Stall (rush razes fact AND clock runs out with no pbox), cluster-
+ on-centre (count satisfied but every flank rung unmet), and scatter-
+ near-base (every rung region unmet, rush reaches fact past unguarded
+ front) must ALL LOSE on every level + every seed — no draw."""
  c = compile_level(load_pack(PACK), level)
+ fn = policy_factory(level)
  for seed in SEEDS:
  r = run_level(c, fn, seed=seed)
  assert r.outcome == "loss", (
 
 
  def test_intended_run_is_deterministic_on_easy():
  c = compile_level(load_pack(PACK), "easy")
+ a = run_level(c, make_line("easy"), seed=3)
+ b = run_level(c, make_line("easy"), seed=3)
  assert (a.outcome, a.turns, a.signals.units_killed) == (
  b.outcome,
  b.turns,