yxc20098 committed on
Commit
bb986ee
·
1 Parent(s): 193f926

feat(scenario): build-sequence-tech-most-resilient — redundant-prereq tech path survives a strike (PlanBench robust planning anchor)

openra_bench/scenarios/packs/build-sequence-tech-most-resilient.yaml ADDED
@@ -0,0 +1,385 @@
1
+ # build-sequence-tech-most-resilient.yaml
2
+ #
3
+ # REASONING capability — Wave-11 robust build-order planning. The
4
+ # agent must REACH a tech capability (a powered war factory) AND
5
+ # KEEP it through a mid-episode strike. The classic resilient-design
6
+ # inversion: a build order that provisions only ONE power plant
7
+ # (`powr`) is a single point of failure — a scheduled enemy strike
8
+ # razes that powr mid-episode and the war factory drops to LOW POWER
9
+ # (the engine's `power_drained > power_provided` gate halves
10
+ # production speed), so the army never completes inside the budget.
11
+ # The resilient build order pre-builds a SECOND, redundant `powr`
12
+ # BEFORE the strike: when the strike razes one plant the other keeps
13
+ # the grid in surplus, production runs at full speed, and the army
14
+ # finishes on the clock.
15
+ #
16
+ # This is the PlanBench "robust planning" idiom — find a plan that
17
+ # achieves the goal AND survives a disturbance — and the classic
18
+ # N+1 resilient-design rule: never run a critical capability on a
19
+ # single point of failure; provision the redundant prerequisite
20
+ # AHEAD of the foreseen failure, not after it.
21
+ #
22
+ # ── ENGINE FACTS verified on the live engine (2026-05-20) ─────────
23
+ #
24
+ # 1. Power model (vendor RA rules, loaded via GameRules::from_ruleset):
25
+ # POWR Power +100 PROC Power -30
26
+ # WEAP Power -30 FIX Power -30 FACT Power 0
27
+ # A `World.compute_player_power` recompute sums every live (not
28
+ # powered-down) building each snapshot. Power budget on the
29
+ # INTENDED path (pre-placed proc -30, fix -30, exposed powr +100;
30
+ # agent builds redundant powr +100, weap -30):
31
+ # before strike: provided 200, drained 90 → surplus +110
32
+ # after strike (one powr razed): provided 100, drained 90 → +10
33
+ # so the resilient base stays in SURPLUS through the strike. On
34
+ # the SINGLE-powr path (no redundant powr) the strike razes the
35
+ # only plant: provided 0, drained 90 → drained > provided → LOW
36
+ # POWER for the rest of the episode.
37
+ # 2. LOW-POWER PRODUCTION SLOWDOWN (world.rs ~L3234): when a player's
38
+ # `power_drained > power_provided`, the production queue advances
39
+ # only on odd ticks — 50% speed. This is the load-bearing teeth:
40
+ # after the strike a single-`powr` base goes low-power and its
41
+ # `2tnk` queue crawls; a two-`powr` base stays in surplus and
42
+ # produces at full speed.
43
+ # 3. `2tnk` (medium tank) costs 850; its build prerequisites are
44
+ # `weap` (war factory) AND `fix` (service depot). `weap`'s
45
+ # prerequisite is `proc` (ore refinery). `proc` and `fix` are
46
+ # pre-placed here, so the agent's build task is purely
47
+ # powr-redundancy + weap + the tank army. (`fix` drains -30
48
+ # power too; that drain is folded into the power budget in item 1.)
49
+ # 4. `scheduled_events: destroy_actors` (Wave-9 engine feature,
50
+ # oramap.rs::read_scheduled_events / env.rs::fire_scheduled_events)
51
+ # removes every actor matching `filter:` (owner + optional
52
+ # circular region) when `world_tick >= tick`. Here it razes the
53
+ # ONE pre-placed exposed `powr` at tick 1500 — the mid-episode
54
+ # strike. The region is tight around the exposed powr so it can
55
+ # never catch the agent's redundant powr in the safe west base.
56
+ # 5. `then:` happened-before composite — clause k latches only after
57
+ # clause k-1 has been observed true. `[has_building:powr,
58
+ # has_building:weap]` encodes "power the grid, THEN stand up the
59
+ # factory" (the engine refuses `weap` before its `proc` prereq;
60
+ # `powr` is the grid the whole tech path runs on).
61
+ # 6. `building_count_gte` reads the LIVE `own_buildings` list per
62
+ # frame (NOT the accumulating `own_building_types` set used by
63
+ # `has_building`). `building_count_gte:{powr,1}` toggles FALSE
64
+ # the instant the last live `powr` is razed and only re-satisfies
65
+ # if a redundant `powr` is still standing — this is the redundancy
66
+ # teeth the win predicate hangs on.
67
+ # 7. `place_building` does NOT enforce build-adjacency (CLAUDE.md);
68
+ # a `build('powr') + place_building` chain works at arbitrary
69
+ # in-bounds coords — the redundant powr goes in the safe west
70
+ # base next to the Construction Yard.
71
+ # 8. The persistent unarmed enemy `fact` marker far east keeps the
72
+ # engine all-enemies-eliminated auto-`done` path gated, so a
73
+ # non-winner reaches the deadline as a real LOSS, not a DRAW
74
+ # (CLAUDE.md engine auto-`done` footgun).
75
+ # 9. Tick alignment (CLAUDE.md): max tick ≈ 93 + 90·(max_turns−1).
76
+ # easy max_turns 60 → ceiling 5403 ≥ within_ticks 5400, fail
77
+ # after_ticks 5401 ✓. medium/hard max_turns 50 → ceiling 4503 ≥
78
+ # within_ticks 4500, fail after_ticks 4501 ✓.
79
+ # 10. Hard `spawn_point` rule (CLAUDE.md oramap.rs): ANY agent actor
80
+ # with a spawn_point causes agent actors WITHOUT one to be
81
+ # filtered out — the FULL base (fact + proc + fix + harv +
82
+ # exposed powr) is DUPLICATED across both spawn groups at
83
+ # spawn-matched cells. Enemy / neutral actors do NOT honour
84
+ # spawn_point, and neither do scheduled events, so the
85
+ # exposed-powr strike region is duplicated per latitude
86
+ # (one region per spawn group).
87
+ #
88
+ # ── THE BAR (CLAUDE.md "no defect, no cheat") ─────────────────────
89
+ #
90
+ # • stall (observe only) — builds nothing; the exposed powr is
91
+ # razed at tick 1500, `weap` is never built, `2tnk` count stays
92
+ # 0 → after_ticks LOSS.
93
+ # • single-powr (build weap + spam `2tnk`, never a redundant powr)
94
+ # — relies on the lone exposed powr. The strike razes it at tick
95
+ # 1500; the base drops to low power (drained 90 > provided 0) and
96
+ # the `2tnk` queue runs at 50% for the rest of the episode.
97
+ # `building_count_gte:{powr,1}` is FALSE (no live powr) AND the
98
+ # tank army cannot finish before the deadline → after_ticks
99
+ # LOSS. This is the single-point-of-failure inversion the pack
100
+ # is built to catch.
101
+ # • intended resilient (build a redundant 2nd `powr` in the safe
102
+ # west base BEFORE the strike, build `weap`, produce 3× `2tnk`)
103
+ # — the strike razes the exposed powr but the redundant one
104
+ # survives; the grid stays in surplus, production runs at full
105
+ # speed, the army finishes on the clock. WIN.
106
+ #
107
+ # Real LOSS not DRAW: `fail after_ticks: T+1` is reachable inside
108
+ # max_turns and the enemy `fact` marker blocks the auto-done path.
109
+ #
110
+ # Validate (no model / no network):
111
+ # cd /Users/berta/Projects/OpenRA-Bench && \
112
+ # python3 -m pytest tests/test_build_sequence_tech_most_resilient.py -q
113
+
114
+ meta:
115
+ id: build-sequence-tech-most-resilient
116
+ title: 'Resilient War Factory — Redundant Power Survives a Strike (N+1 Build Order)'
117
+ capability: reasoning
118
+ real_world_meaning: >
119
+ Robust build-order planning: reach a tech capability AND keep it
120
+ through a foreseeable disturbance. The agent must bring a war
121
+ factory online and field an armoured force, but a mid-episode
122
+ enemy strike razes one power plant. A build order that provisions
123
+ only a single power plant is a single point of failure — when the
124
+ strike lands the factory drops to low power and the army never
125
+ completes in time. The resilient build order pre-builds a second,
126
+ redundant power plant before the strike, so the grid stays in
127
+ surplus and production never slows. Tests whether the model plans
128
+ for the disturbance (N+1 redundancy on the critical prerequisite)
129
+ rather than merely planning the shortest path to the goal.
130
+ robotics_analogue: >
131
+ N+1 redundancy on a critical utility. An autonomous production
132
+ cell depends on a power feed to run its assembly machine; a known
133
+ hazard will knock out one feed mid-shift. Resilient planning
134
+ commissions a second, independent feed BEFORE the outage, so the
135
+ assembly machine never drops below rated throughput. Provisioning
136
+ only one feed — the shortest plan to first article — halts the
137
+ line the moment the hazard strikes and blows the delivery
138
+ deadline.
139
+ benchmark_anchor:
140
+ - "PlanBench robust planning"
141
+ - "N+1 resilient design"
142
+ - "redundancy"
143
+ author: openra-bench-wave-11
144
+
145
+ # rush-hour-arena: 128×40, playable bounds (2,2,124,36). Agent base
146
+ # at the WEST (x≈8..20). The inherited exposed `powr` sits forward at
147
+ # the EAST edge of the base; a scheduled strike razes it at tick 1500.
148
+ # The redundant `powr` belongs in the safe west base next to `fact`.
149
+ base_map: rush-hour-arena
150
+
151
+ base:
152
+ agent: {faction: allies}
153
+ # No scripted bot — the only threat is the scripted `destroy_actors`
154
+ # strike on the exposed powr. A hunt bot would turn an N+1 build-
155
+ # order test into a combat-survival test.
156
+ enemy: {faction: soviet, bot_type: ''}
157
+ # Build palette: build + place_building drive the redundant powr
158
+ # and the war factory + tank army; harvest keeps income credible;
159
+ # move_units + stop allow repositioning. No offensive verbs — this
160
+ # is a build-order planning pack.
161
+ tools: [observe, build, place_building, harvest, move_units, stop]
162
+ spawn_mcvs: false
163
+ planning: true
164
+ termination: {max_ticks: 8000}
165
+ actors: []
166
+
167
+ levels:
168
+ # ── EASY ─────────────────────────────────────────────────────────
169
+ # Bare skill: recognise that the inherited exposed power plant is a
170
+ # single point of failure that WILL be razed, pre-build a redundant
171
+ # `powr` in the safe west base, build the war factory, field 3
172
+ # tanks. Generous clock (within_ticks 5400, max_turns 60 → ceiling
173
+ # 5403 ✓). The strike fires at tick 1500.
174
+ easy:
175
+ description: >
176
+ You inherit a partial base — a Construction Yard ('fact'), an
177
+ Ore Refinery ('proc'), a Service Depot ('fix'), an Ore Truck
178
+ ('harv') with an ore patch, and ONE Power Plant ('powr'). That
179
+ power plant sits FORWARD at the east edge of your base and is
180
+ EXPOSED: an enemy strike will RAZE it at tick 1500. It is your
181
+ only power. If it is your only power when the strike lands,
182
+ your grid goes negative, your war factory drops to half
183
+ production speed, and your tank army cannot finish in time. To
184
+ stay resilient: build a SECOND Power Plant ('build' "powr",
185
+ cost 300) and place it next to your Construction Yard in the
186
+ safe west base BEFORE tick 1500, build a War Factory ('build'
187
+ "weap", cost 2000), then produce three medium tanks ('build'
188
+ "2tnk", cost 850 each). WIN = you brought power then a war
189
+ factory online, you still own a Power Plant, you have 3 medium
190
+ tanks, and you still own your Construction Yard, before tick
191
+ 5400. Stalling, or relying on the single exposed power plant
192
+ with no redundant backup, misses the bar.
193
+ starting_cash: 6000
194
+ overrides:
195
+ actors:
196
+ # ── Safe WEST base ─────────────────────────────────────────
197
+ - {type: fact, owner: agent, position: [8, 18]}
198
+ - {type: proc, owner: agent, position: [12, 18]}
199
+ # Service Depot — the `2tnk` build prerequisite (alongside
200
+ # `weap`). Pre-placed so the agent's build task is purely the
201
+ # redundant powr + war factory + tank army.
202
+ - {type: fix, owner: agent, position: [16, 18]}
203
+ # Income (credible, not load-bearing for the win predicate).
204
+ - {type: harv, owner: agent, position: [12, 22]}
205
+ - {type: mine, owner: neutral, position: [20, 22]}
206
+ # ── The EXPOSED single-point-of-failure Power Plant ────────
207
+ # Forward at the east edge of the base. Razed by the tick-1500
208
+ # `destroy_actors` strike. Far enough from the west base that
209
+ # the strike region can never catch a redundant powr placed
210
+ # next to the Construction Yard.
211
+ - {type: powr, owner: agent, position: [40, 18]}
212
+ # Persistent far enemy marker — LOSS-not-DRAW guarantee.
213
+ - {type: fact, owner: enemy, position: [115, 30]}
214
+ scheduled_events:
215
+ # Mid-episode strike: raze the exposed forward Power Plant at
216
+ # tick 1500. The region is a tight circle around (40,18); the
217
+ # safe west base (x≈8..16) is ~24 cells away, well outside
218
+ # radius 6, so a redundant powr there is never caught.
219
+ - tick: 1500
220
+ type: destroy_actors
221
+ filter:
222
+ owner: agent
223
+ region: {x: 40, y: 18, radius: 6}
224
+ win_condition:
225
+ all_of:
226
+ # Reach the tech capability IN ORDER: power the grid, then
227
+ # stand up the war factory.
228
+ - then:
229
+ id: bsr-easy
230
+ clauses:
231
+ - {has_building: powr}
232
+ - {has_building: weap}
233
+ # ≥1 Power Plant ALIVE at end — FALSE after the strike unless
234
+ # a redundant powr was built (the redundancy teeth).
235
+ - building_count_gte: {type: powr, n: 1}
236
+ # 3 medium tanks — only reachable at FULL production speed,
237
+ # i.e. with the grid kept in surplus through the strike.
238
+ - unit_type_count_gte: {type: '2tnk', n: 3}
239
+ # Construction Yard alive (mirrors the fail clause).
240
+ - building_count_gte: {type: fact, n: 1}
241
+ - within_ticks: 5400
242
+ fail_condition:
243
+ any_of:
244
+ - after_ticks: 5401
245
+ - not: {building_count_gte: {type: fact, n: 1}}
246
+ max_turns: 60
247
+
248
+ # ── MEDIUM ───────────────────────────────────────────────────────
249
+ # +1 controlled variable: the clock tightens (within_ticks 4500,
250
+ # max_turns 50 → ceiling 4503 ✓). The resilient N+1 build order
251
+ # still wins comfortably, but a hesitant opening that dallies
252
+ # before committing the redundant powr now risks the deadline. The
253
+ # single-point-of-failure failure modes lose exactly as on easy.
254
+ medium:
255
+ description: >
256
+ Same inherited base as easy — a Construction Yard, an Ore
257
+ Refinery, a Service Depot, an Ore Truck with an ore patch, and
258
+ ONE EXPOSED Power Plant forward at the east edge that an enemy
259
+ strike will RAZE at tick 1500. Build a SECOND Power Plant
260
+ ('build' "powr", 300) in the safe west base next to your
261
+ Construction Yard BEFORE tick 1500, build a War Factory
262
+ ('build' "weap", 2000), then produce three medium tanks
263
+ ('build' "2tnk", 850 each).
264
+ The deadline is tighter — tick 4500 — so commit the redundant
265
+ power plant early; do not wait for the exposed one to fall. If
266
+ the strike leaves you with no power, the war factory halves its
267
+ output and the army misses the clock. WIN = you brought power
268
+ then a war factory online, you still own a Power Plant, you
269
+ have 3 medium tanks, and you still own your Construction Yard,
270
+ before tick 4500. Stalling, or relying on the single exposed
271
+ power plant, misses the bar.
272
+ starting_cash: 6000
273
+ overrides:
274
+ actors:
275
+ - {type: fact, owner: agent, position: [8, 18]}
276
+ - {type: proc, owner: agent, position: [12, 18]}
277
+ - {type: fix, owner: agent, position: [16, 18]}
278
+ - {type: harv, owner: agent, position: [12, 22]}
279
+ - {type: mine, owner: neutral, position: [20, 22]}
280
+ - {type: powr, owner: agent, position: [40, 18]}
281
+ - {type: fact, owner: enemy, position: [115, 30]}
282
+ scheduled_events:
283
+ - tick: 1500
284
+ type: destroy_actors
285
+ filter:
286
+ owner: agent
287
+ region: {x: 40, y: 18, radius: 6}
288
+ win_condition:
289
+ all_of:
290
+ - then:
291
+ id: bsr-medium
292
+ clauses:
293
+ - {has_building: powr}
294
+ - {has_building: weap}
295
+ - building_count_gte: {type: powr, n: 1}
296
+ - unit_type_count_gte: {type: '2tnk', n: 3}
297
+ - building_count_gte: {type: fact, n: 1}
298
+ - within_ticks: 4500
299
+ fail_condition:
300
+ any_of:
301
+ - after_ticks: 4501
302
+ - not: {building_count_gte: {type: fact, n: 1}}
303
+ max_turns: 50
304
+
305
+ # ── HARD ─────────────────────────────────────────────────────────
306
+ # +1 controlled variable on top of medium: TWO seed-driven AGENT
307
+ # spawn_point groups (NORTH base y=12 / SOUTH base y=26) round-
308
+ # robined by seed. Per CLAUDE.md `spawn_point` rules: ANY agent
309
+ # actor with spawn_point ⇒ agent actors WITHOUT one are filtered
310
+ # out, so the FULL base (fact + proc + fix + harv + exposed powr)
311
+ # is DUPLICATED across both spawn groups at spawn-matched cells.
312
+ # Enemy / neutral actors do NOT honour spawn_point; the strike
313
+ # region is duplicated per latitude (a `destroy_actors` whose
314
+ # region misses the active base simply removes nothing). A
315
+ # memorised "place the redundant powr at (11,18)" opening cannot
316
+ # generalise — the agent must read the actual Construction Yard
317
+ # latitude and place the redundant power plant beside it.
318
+ hard:
319
+ description: >
320
+ Same N+1 build-order task as medium (one EXPOSED Power Plant
321
+ forward at the east edge that an enemy strike razes at tick
322
+ 1500, $6000, tick 4500 deadline) but your base may begin in
323
+ the NORTH band (y≈12) OR the SOUTH band (y≈26) of the map
324
+ depending on the seed. Read the Construction Yard's actual
325
+ position from the observation and place the redundant Power
326
+ Plant beside it in the safe west base BEFORE tick 1500; build a
327
+ War Factory; then produce three medium tanks. A memorised
328
+ placement cell will mis-place out of build radius on one of the
329
+ two spawns. WIN = you brought power then a war factory online,
330
+ you still own a Power Plant, you have 3 medium tanks, and you
331
+ still own your Construction Yard, before tick 4500. The same
332
+ single-point-of-failure plays — stalling, or relying on the
333
+ lone exposed power plant — lose as on medium.
334
+ starting_cash: 6000
335
+ overrides:
336
+ actors:
337
+ # ── SPAWN 0 (NORTH base, y=12) ─────────────────────────────
338
+ - {type: fact, owner: agent, position: [8, 12], spawn_point: 0}
339
+ - {type: proc, owner: agent, position: [12, 12], spawn_point: 0}
340
+ - {type: fix, owner: agent, position: [16, 12], spawn_point: 0}
341
+ - {type: harv, owner: agent, position: [12, 16], spawn_point: 0}
342
+ - {type: powr, owner: agent, position: [40, 12], spawn_point: 0}
343
+ # ── SPAWN 1 (SOUTH base, y=26) ─────────────────────────────
344
+ - {type: fact, owner: agent, position: [8, 26], spawn_point: 1}
345
+ - {type: proc, owner: agent, position: [12, 26], spawn_point: 1}
346
+ - {type: fix, owner: agent, position: [16, 26], spawn_point: 1}
347
+ - {type: harv, owner: agent, position: [12, 30], spawn_point: 1}
348
+ - {type: powr, owner: agent, position: [40, 26], spawn_point: 1}
349
+ # Neutral ore patches — one per latitude (neutral actors
350
+ # ignore the spawn_point filter, like enemy actors).
351
+ - {type: mine, owner: neutral, position: [20, 16]}
352
+ - {type: mine, owner: neutral, position: [20, 30]}
353
+ # Persistent far enemy marker — LOSS-not-DRAW guarantee.
354
+ - {type: fact, owner: enemy, position: [115, 33]}
355
+ scheduled_events:
356
+ # Strike regions DUPLICATED per latitude (enemy/neutral and
357
+ # scheduled events do not honour spawn_point). The region
358
+ # matching the dormant latitude removes nothing; the one
359
+ # matching the active base razes its exposed powr.
360
+ - tick: 1500
361
+ type: destroy_actors
362
+ filter:
363
+ owner: agent
364
+ region: {x: 40, y: 12, radius: 6}
365
+ - tick: 1500
366
+ type: destroy_actors
367
+ filter:
368
+ owner: agent
369
+ region: {x: 40, y: 26, radius: 6}
370
+ win_condition:
371
+ all_of:
372
+ - then:
373
+ id: bsr-hard
374
+ clauses:
375
+ - {has_building: powr}
376
+ - {has_building: weap}
377
+ - building_count_gte: {type: powr, n: 1}
378
+ - unit_type_count_gte: {type: '2tnk', n: 3}
379
+ - building_count_gte: {type: fact, n: 1}
380
+ - within_ticks: 4500
381
+ fail_condition:
382
+ any_of:
383
+ - after_ticks: 4501
384
+ - not: {building_count_gte: {type: fact, n: 1}}
385
+ max_turns: 50
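
The power arithmetic quoted in ENGINE FACTS item 1 of the pack header can be sanity-checked outside the engine. The following is a minimal sketch that assumes only the per-building values listed in that comment; `power_balance` and `is_low_power` are hypothetical helper names for illustration, not functions from openra_bench or the engine.

```python
# Per-building power values as quoted in the pack header (ENGINE FACTS item 1).
POWER = {"powr": 100, "proc": -30, "weap": -30, "fix": -30, "fact": 0}


def power_balance(buildings):
    """Return (provided, drained) for a list of live building types."""
    provided = sum(POWER[b] for b in buildings if POWER[b] > 0)
    drained = -sum(POWER[b] for b in buildings if POWER[b] < 0)
    return provided, drained


def is_low_power(buildings):
    """Mirror the gate quoted in the header: low power when drained > provided."""
    provided, drained = power_balance(buildings)
    return drained > provided


# Buildings that consume or provide no new power once the war factory is up.
base = ["fact", "proc", "fix", "weap"]

resilient_before = base + ["powr", "powr"]  # exposed powr + redundant powr
resilient_after = base + ["powr"]           # strike razed the exposed powr
single_after = base                         # strike razed the only powr

for name, buildings in [
    ("resilient, before strike", resilient_before),
    ("resilient, after strike", resilient_after),
    ("single-powr, after strike", single_after),
]:
    provided, drained = power_balance(buildings)
    print(name, provided, drained, "LOW POWER" if is_low_power(buildings) else "surplus")
```

Under these assumed values the resilient base keeps a surplus of 10 after the strike, while the single-powr base has 90 drained against 0 provided, which is the case the header describes as running the production queue at half speed.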
tests/test_build_sequence_tech_most_resilient.py ADDED
@@ -0,0 +1,352 @@
1
+ """build-sequence-tech-most-resilient pack — full no-cheat validation.
2
+
3
+ Wave-11 REASONING — robust build-order planning. The agent must REACH
4
+ a tech capability (a powered war factory) AND KEEP it through a
5
+ mid-episode strike. A scheduled `destroy_actors` event razes the one
6
+ exposed power plant at tick 1500. A build order that provisions only
7
+ ONE `powr` is a single point of failure — when the strike lands the
8
+ grid goes negative, the war factory drops to 50% production speed
9
+ (engine low-power slowdown) and `building_count_gte:{powr,1}` is
10
+ FALSE for the rest of the episode. The resilient build order
11
+ pre-builds a SECOND, redundant `powr` in the safe west base BEFORE
12
+ the strike: one plant survives, the grid stays in surplus, the army
13
+ finishes on the clock.
14
+
15
+ Bar (CLAUDE.md "no defect, no cheat"):
16
+ - stall (observe only) ⇒ LOSS on every (level, seed)
17
+ - single-powr (no redundant powr) ⇒ LOSS on every (level, seed)
18
+ - intended resilient (redundant 2nd powr,
19
+ then weap, then 3×2tnk) ⇒ WIN on every (level, seed)
20
+ Real LOSS not DRAW: `fail after_ticks:T+1` reachable inside
21
+ max_turns; the persistent far enemy `fact` blocks the engine
22
+ auto-done path.
23
+
24
+ Scenario shape:
25
+ - rush-hour-arena, allies vs soviet (bot disabled).
26
+ - easy: within_ticks 5400, max_turns 60 — generous.
27
+ - medium: within_ticks 4500, max_turns 50 — tighter clock.
28
+ - hard: within_ticks 4500, max_turns 50 — +2 spawn_point groups
29
+ (NORTH base y=12 / SOUTH base y=26, round-robined).
30
+
31
+ Measured (seed 1, scripted policies): the intended resilient policy
32
+ WINS at ~tick 3243 on every level; stall and single-powr LOSE on the
33
+ deadline.
34
+ """
35
+
36
+ from __future__ import annotations
37
+
38
+ import pytest
39
+
40
+ pytest.importorskip("openra_train", reason="Rust env wheel not installed")
41
+
42
+ from openra_bench.eval_core import run_level
43
+ from openra_bench.scenarios import load_pack
44
+ from openra_bench.scenarios.loader import PACKS_DIR, compile_level
45
+
46
+ PACK = PACKS_DIR / "build-sequence-tech-most-resilient.yaml"
47
+ LEVELS = ("easy", "medium", "hard")
48
+ SEEDS = (1, 2, 3, 4)
49
+
50
+
51
+ # ── Policies ──────────────────────────────────────────────────────
52
+
53
+
54
+ def _stall_policy():
55
+ """Do nothing — must LOSE on the clock on every level/seed."""
56
+ def pol(obs, Cmd):
57
+ return [Cmd.observe()]
58
+ return pol
59
+
60
+
61
+ def _single_powr_policy():
62
+ """Single-point-of-failure play: build the war factory and spam
63
+ `2tnk`, but NEVER build a redundant power plant. The lone exposed
64
+ `powr` is razed at tick 1500 → the grid goes negative → 50%
65
+ production AND `building_count_gte:{powr,1}` is false → LOSS."""
66
+ ms = {"weap": False}
67
+
68
+ def pol(obs, Cmd):
69
+ ob = obs.get("own_buildings", []) or []
70
+ own = {b["type"] for b in ob}
71
+ prod = obs.get("production", []) or []
72
+ base = [b for b in ob if b["type"] == "fact"]
73
+ cmds = []
74
+ if "weap" in own:
75
+ ms["weap"] = True
76
+ if not ms["weap"]:
77
+ if "weap" not in prod:
78
+ cmds.append(Cmd.build("weap"))
79
+ if base:
80
+ cmds.append(Cmd.place_building(
81
+ "weap", base[0]["cell_x"] + 5, base[0]["cell_y"]
82
+ ))
83
+ else:
84
+ if "2tnk" not in prod:
85
+ cmds.append(Cmd.build("2tnk"))
86
+ return cmds or [Cmd.observe()]
87
+ return pol
88
+
89
+
90
+ def _intended_policy():
91
+ """Resilient N+1 build order: build a redundant `powr` in the safe
92
+ west base (placed relative to the actual Construction Yard so it
93
+ generalises across the hard-tier spawn variation), then the war
94
+ factory, then 3× `2tnk`. Must WIN on every (level, seed)."""
95
+ ms = {"powr2": False, "weap": False}
96
+
97
+ def pol(obs, Cmd):
98
+ ob = obs.get("own_buildings", []) or []
99
+ prod = obs.get("production", []) or []
100
+ base = [b for b in ob if b["type"] == "fact"]
101
+ # The redundant powr lives in the safe west base (x<30);
102
+ # the exposed inherited powr sits forward at x=40.
103
+ safe_powr = [
104
+ b for b in ob if b["type"] == "powr" and b["cell_x"] < 30
105
+ ]
106
+ weap_b = [b for b in ob if b["type"] == "weap"]
107
+ cmds = []
108
+ if safe_powr:
109
+ ms["powr2"] = True
110
+ if weap_b:
111
+ ms["weap"] = True
112
+ if not ms["powr2"]:
113
+ if "powr" not in prod:
114
+ cmds.append(Cmd.build("powr"))
115
+ if base:
116
+ cmds.append(Cmd.place_building(
117
+ "powr", base[0]["cell_x"] + 3, base[0]["cell_y"] + 4
118
+ ))
119
+ elif not ms["weap"]:
120
+ if "weap" not in prod:
121
+ cmds.append(Cmd.build("weap"))
122
+ if base:
123
+ cmds.append(Cmd.place_building(
124
+ "weap", base[0]["cell_x"] + 6, base[0]["cell_y"]
125
+ ))
126
+ else:
127
+ if "2tnk" not in prod:
128
+ cmds.append(Cmd.build("2tnk"))
129
+ return cmds or [Cmd.observe()]
130
+ return pol
131
+
132
+
133
+ # ── Pack-shape tests (cheap; do not run the engine) ──────────────
134
+
135
+
136
+ def test_pack_compiles_with_three_levels():
137
+ pack = load_pack(PACK)
138
+ assert pack.meta.id == "build-sequence-tech-most-resilient"
139
+ assert pack.meta.capability == "reasoning"
140
+ assert set(pack.levels) == {"easy", "medium", "hard"}
141
+
142
+
143
+ def test_meta_benchmark_anchor_set():
144
+ """meta.benchmark_anchor must cite PlanBench robust planning,
145
+ N+1 resilient design and redundancy (the seed taxonomy)."""
146
+ pack = load_pack(PACK)
147
+ anchors = pack.meta.benchmark_anchor or []
148
+ assert any("PlanBench" in a for a in anchors), anchors
149
+ assert any("N+1" in a or "resilient" in a for a in anchors), anchors
150
+ assert any("redundancy" in a.lower() for a in anchors), anchors
151
+
152
+
153
+ def test_every_level_has_fail_condition():
154
+ """No silent draws — every level must be able to emit a LOSS."""
155
+ pack = load_pack(PACK)
156
+ for lvl in LEVELS:
157
+ c = compile_level(pack, lvl)
158
+ assert c.fail_condition is not None, f"{lvl} missing fail_condition"
159
+
160
+
161
+ def test_then_composite_used_in_win():
162
+ """The win must wire the powr→weap happened-before chain — the
163
+ 'reach the tech capability in order' clause."""
164
+ for lvl in LEVELS:
165
+ c = compile_level(load_pack(PACK), lvl)
166
+ win = c.win_condition.model_dump(exclude_none=True)
167
+ inner = win.get("all_of") or []
168
+ then = next((cl["then"] for cl in inner if "then" in cl), None)
169
+ assert then is not None, f"{lvl} win missing then-chain: {win}"
170
+ clauses = then.get("clauses") or []
171
+ assert len(clauses) == 2, (
172
+ f"{lvl} then-chain must be powr→weap (2 clauses); got {clauses}"
173
+ )
174
+ assert clauses[0].get("has_building") == "powr"
175
+ assert clauses[1].get("has_building") == "weap"
176
+
177
+
178
+ def test_win_requires_surviving_powr_three_tanks_and_fact():
179
+ """Structural: the win clause must require a LIVE Power Plant
180
+ (`building_count_gte:{powr,1}` — the redundancy teeth that toggle
181
+ FALSE when the exposed powr is razed), three medium tanks
182
+ (`unit_type_count_gte:{2tnk,3}`), a live Construction Yard, and a
183
+ `within_ticks` deadline. `building_count_gte` (live-list) — NOT
184
+ `has_building` (accumulating set) — is mandatory for the powr
185
+ clause so it toggles false on the strike."""
186
+ for lvl in LEVELS:
187
+ c = compile_level(load_pack(PACK), lvl)
188
+ all_of = c.win_condition.model_dump(exclude_none=True).get("all_of", [])
189
+ powr = next(
190
+ (x["building_count_gte"] for x in all_of
191
+ if "building_count_gte" in x
192
+ and (x["building_count_gte"] or {}).get("type") == "powr"),
193
+ None,
194
+ )
195
+ assert powr is not None and int(powr.get("n", 0)) >= 1, (
196
+ f"{lvl}: win must require building_count_gte powr>=1 "
197
+ f"(a live power plant survives the strike)"
198
+ )
199
+ tanks = next(
200
+ (x["unit_type_count_gte"] for x in all_of
201
+ if "unit_type_count_gte" in x
202
+ and (x["unit_type_count_gte"] or {}).get("type") == "2tnk"),
203
+ None,
204
+ )
205
+ assert tanks is not None and int(tanks.get("n", 0)) >= 3, (
206
+ f"{lvl}: win must require unit_type_count_gte 2tnk>=3"
207
+ )
208
+ fact = next(
209
+ (x["building_count_gte"] for x in all_of
210
+ if "building_count_gte" in x
211
+ and (x["building_count_gte"] or {}).get("type") == "fact"),
212
+ None,
213
+ )
214
+ assert fact is not None and int(fact.get("n", 0)) >= 1, (
215
+ f"{lvl}: win must require building_count_gte fact>=1"
216
+ )
217
+ assert any("within_ticks" in x for x in all_of), (
218
+ f"{lvl}: win must include a within_ticks deadline"
219
+ )
220
+
221
+
222
+ def test_tick_budget_aligned_with_max_turns():
223
+ """within_ticks must be reachable inside max_turns and the fail
224
+ `after_ticks` must equal within_ticks+1 (real LOSS, no draw, no
225
+ overlap). Engine advances ~90 ticks/turn → reachable = 93 +
226
+ 90·(max_turns-1)."""
227
+ pack = load_pack(PACK)
228
+ for lvl in LEVELS:
229
+ c = compile_level(pack, lvl)
230
+ reachable = 93 + 90 * (c.max_turns - 1)
231
+ all_of = c.win_condition.model_dump(exclude_none=True).get("all_of", [])
232
+ wt = next(int(x["within_ticks"]) for x in all_of if "within_ticks" in x)
233
+ assert wt <= reachable, (
234
+ f"{lvl}: within_ticks={wt} > reachable={reachable} "
235
+ f"(max_turns={c.max_turns}) — deadline never bites"
236
+ )
237
+ fail = c.fail_condition.model_dump(exclude_none=True)
238
+ after = next(
239
+ int(x["after_ticks"]) for x in fail["any_of"] if "after_ticks" in x
240
+ )
241
+ assert after <= reachable, (
242
+ f"{lvl}: fail after_ticks {after} unreachable within "
243
+ f"{c.max_turns} turns (max {reachable}) — draw degeneracy"
244
+ )
245
+ assert after == wt + 1, (
246
+ f"{lvl}: after_ticks {after} must equal within_ticks+1 ({wt+1})"
247
+ )
248
+
249
+
+ def test_exactly_one_exposed_powr_pre_placed():
+     """The single-point-of-failure premise: each tier pre-places
+     EXACTLY ONE agent `powr` (the exposed forward plant). The
+     redundant second power plant must be BUILT by the agent — it is
+     not given. Hard duplicates the base across two spawn groups, so
+     each spawn group still ships exactly one exposed powr."""
+     for lvl in LEVELS:
+         c = compile_level(load_pack(PACK), lvl)
+         powrs = [
+             a for a in c.scenario.actors
+             if a.owner == "agent" and a.type == "powr"
+         ]
+         if lvl == "hard":
+             per_spawn = {}
+             for a in powrs:
+                 sp = a.spawn_point if a.spawn_point is not None else 0
+                 per_spawn[sp] = per_spawn.get(sp, 0) + 1
+             assert per_spawn and all(v == 1 for v in per_spawn.values()), (
+                 f"hard: each spawn group must pre-place exactly one "
+                 f"exposed powr; got {per_spawn}"
+             )
+         else:
+             assert len(powrs) == 1, (
+                 f"{lvl}: must pre-place exactly one exposed agent powr; "
+                 f"got {len(powrs)}"
+             )
+
+
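The per-spawn tally used for the hard tier can be expressed with `collections.Counter`. This is a sketch under assumptions: the `Actor` dataclass is a hypothetical stand-in exposing only the three attributes the test reads (`owner`, `type`, `spawn_point`), and the sample actors are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Actor:
    """Hypothetical stand-in for a compiled scenario actor."""
    owner: str
    type: str
    spawn_point: Optional[int] = None


def powr_per_spawn(actors) -> Counter:
    """Count agent power plants per spawn group; a missing spawn_point maps to 0."""
    return Counter(
        a.spawn_point if a.spawn_point is not None else 0
        for a in actors
        if a.owner == "agent" and a.type == "powr"
    )


# Invented layout: two spawn groups, each with one powr, plus an enemy plant.
actors = [
    Actor("agent", "powr", 0),
    Actor("agent", "fact", 0),
    Actor("agent", "powr", 1),
    Actor("agent", "fact", 1),
    Actor("enemy", "powr"),
]

counts = powr_per_spawn(actors)
one_per_group = bool(counts) and all(v == 1 for v in counts.values())
```

The `bool(counts)` guard mirrors the `per_spawn and all(...)` check in the test: `all` over an empty mapping is vacuously true, so a scenario with no agent `powr` would otherwise pass.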
+ def test_scheduled_destroy_event_razes_the_exposed_powr():
+     """Each tier must declare a `scheduled_events: destroy_actors`
+     that fires mid-episode (before the deadline) on the agent, with a
+     region tight around the exposed forward powr (x≈40) so it can
+     never catch a redundant powr placed in the safe west base."""
+     for lvl in LEVELS:
+         c = compile_level(load_pack(PACK), lvl)
+         evs = c.scheduled_events or []
+         destroys = [e for e in evs if e.get("type") == "destroy_actors"]
+         assert destroys, f"{lvl}: needs a destroy_actors scheduled event"
+         for e in destroys:
+             assert e["filter"]["owner"] == "agent"
+             reg = e["filter"]["region"]
+             assert reg["x"] == 40, (
+                 f"{lvl}: strike region must be centred on the exposed "
+                 f"forward powr at x=40; got {reg}"
+             )
+             assert e["tick"] < 4500, (
+                 f"{lvl}: strike must fire mid-episode (before the "
+                 f"deadline); got tick {e['tick']}"
+             )
+
+
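The strike checks above can be collected into one validator that takes the deadline and the exposed plant's x coordinate as arguments. This is a hedged sketch: `strike_problems` is a hypothetical helper, the event dictionaries only copy the keys the test reads (`type`, `tick`, `filter.owner`, `filter.region.x`), and the sample tick values are illustrative.

```python
def strike_problems(events, deadline_tick: int, exposed_x: int) -> list:
    """Return a list of reasons the scheduled strike setup is invalid."""
    destroys = [e for e in (events or []) if e.get("type") == "destroy_actors"]
    if not destroys:
        return ["no destroy_actors event"]
    problems = []
    for e in destroys:
        if e["filter"]["owner"] != "agent":
            problems.append("strike does not target the agent")
        if e["filter"]["region"]["x"] != exposed_x:
            problems.append("strike region is not on the exposed powr")
        if e["tick"] >= deadline_tick:
            problems.append("strike fires at or after the deadline")
    return problems


# Illustrative events; only the key layout follows the test above.
good = [{"type": "destroy_actors", "tick": 1500,
         "filter": {"owner": "agent", "region": {"x": 40}}}]
too_late = [{"type": "destroy_actors", "tick": 5000,
             "filter": {"owner": "agent", "region": {"x": 40}}}]

good_result = strike_problems(good, deadline_tick=4500, exposed_x=40)
late_result = strike_problems(too_late, deadline_tick=4500, exposed_x=40)
none_result = strike_problems(None, deadline_tick=4500, exposed_x=40)
```

Taking `deadline_tick` as a parameter is a design choice of this sketch; the test itself compares against the literal 4500.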
+ def test_hard_tier_has_seed_driven_spawn_groups():
+     """Hard must define >=2 agent spawn_point groups so the seed
+     varies the start base (tests/test_hard_tier.py::UPGRADED)."""
+     c = compile_level(load_pack(PACK), "hard")
+     sp = {a.spawn_point for a in c.scenario.actors if a.owner == "agent"}
+     assert len(sp) >= 2, f"hard needs >=2 spawn groups, got {sp}"
+
+
+ # ── Engine-bound tests (parameterised over seeds 1..4) ────────────
+
+
+ @pytest.mark.parametrize("seed", SEEDS)
+ @pytest.mark.parametrize("level", LEVELS)
+ def test_intended_resilient_policy_wins(level, seed):
+     """The intended resilient play (redundant 2nd powr → weap → 3×
+     2tnk) must WIN on every (level, seed). The load-bearing test that
+     the pack is solvable inside the budget by the advertised
+     robust-planning capability."""
+     c = compile_level(load_pack(PACK), level)
+     res = run_level(c, _intended_policy(), seed=seed)
+     assert res.outcome == "win", (
+         f"intended resilient must WIN on {level} s={seed}; got "
+         f"{res.outcome} (tick={res.signals.game_tick}, "
+         f"buildings={sorted(res.signals.own_building_types)})"
+     )
+
+
+ @pytest.mark.parametrize("seed", SEEDS)
+ @pytest.mark.parametrize("level", LEVELS)
+ def test_stall_policy_loses(level, seed):
+     """A stall (observe-only) builds nothing — the exposed powr is
+     razed, no weap, no tanks → must LOSE on every (level, seed)."""
+     c = compile_level(load_pack(PACK), level)
+     res = run_level(c, _stall_policy(), seed=seed)
+     assert res.outcome == "loss", (
+         f"stall must LOSE on {level} s={seed}; got {res.outcome}"
+     )
+
+
+ @pytest.mark.parametrize("seed", SEEDS)
+ @pytest.mark.parametrize("level", LEVELS)
+ def test_single_powr_policy_loses(level, seed):
+     """The single-point-of-failure play — build the war factory and
+     produce tanks but NEVER a redundant power plant — must LOSE on
+     every (level, seed): the strike razes the lone powr, so
+     `building_count_gte:{powr,1}` is false at the deadline."""
+     c = compile_level(load_pack(PACK), level)
+     res = run_level(c, _single_powr_policy(), seed=seed)
+     assert res.outcome == "loss", (
+         f"single-powr (no redundancy) must LOSE on {level} s={seed}; "
+         f"got {res.outcome} (tick={res.signals.game_tick})"
+     )