yxc20098 committed on
Commit
1fdd42a
·
1 Parent(s): e6644c4

feat(scenario): build-sell-and-rebuild-elsewhere — sell exposed building + rebuild safe (capital reallocation anchor)

openra_bench/scenarios/packs/build-sell-and-rebuild-elsewhere.yaml ADDED
@@ -0,0 +1,353 @@
+# build-sell-and-rebuild-elsewhere.yaml
+#
+# REASONING capability — Wave-8 Group "capital reallocation" pack
+# (SC2 sell mechanic for refund / financial reallocation / business
+# CAPEX redeployment anchor). The agent owns a Construction Yard
+# (`fact`) and a Power Plant (`powr`) at the safe north-west corner
+# and an exposed Refinery (`proc`) at the centre lane. A scripted
+# `hunt` band is incoming on the centre lane and will raze the proc
+# within ~25-30 turns (the proc has no defenders and no inherent
+# weapons). Starting cash alone is NOT enough to build a NEW proc
+# (1400 build cost > starting cash on every tier); the only path
+# to a fresh proc inside the tick budget is:
+#
+#   1. `sell` the EXPOSED proc ⇒ refunds 50% of cost (700)
+#   2. `build('proc')` + `place_building` ⇒ at the safe target region
+#
+# Without selling, the agent has insufficient cash to build a new
+# proc (build cost 1400 > starting cash on every tier and there is
+# no income source); even if the agent attempts to wait, the proc
+# is razed by the hunt band ⇒ refund forfeit ⇒ still no cash for a
+# new proc. Building-without-selling stalls on insufficient funds
+# (`build('proc')` is rejected until cash ≥ 1400 — production
+# gates on cash ≥ cost). The intended SELL-THEN-REBUILD play wins.
+#
+# Real-world anchors:
+#   - SC2 sell mechanic for refund (the classic mechanic this pack
+#     is named for — partial salvage of a structure for partial
+#     refund to redeploy production elsewhere)
+#   - Financial / CAPEX reallocation (liquidate a deteriorating
+#     asset and redeploy the capital to a safer / forward site)
+#   - PlanBench resource reallocation (the planner must reason that
+#     the partial refund + new build is cheaper than any alternative
+#     under the budget — building from cash alone is impossible)
+#
+# DISCRIMINATIONS (no defect, no cheat):
+#   - stall (observe only): LOSS — proc razed; no new proc placed
+#     anywhere (let alone in the safe region) ⇒ building_in_region
+#     clause unmet AND clock runs out.
+#   - build-without-selling: LOSS — `build('proc')` is rejected
+#     until cash ≥ 1400; the agent has no income source within the
+#     tick budget, so the build never starts ⇒ no proc in safe
+#     region.
+#   - sell-but-place-in-wrong-region: LOSS — refunded cash + cash
+#     covers a new proc, but placed in the central lane (or
+#     anywhere outside the safe region) ⇒ building_in_region
+#     clause unmet (the new proc is in the wrong cell band).
+#   - sell-then-rebuild-at-safe-region (intended): WIN — refund
+#     funds the new proc, placed at the safe far-west target
+#     region ⇒ all win clauses satisfied within tick budget.
+#
+# ENGINE FACTS (CLAUDE.md):
+#   - proc cost 1400 → sell refund = 700 (50% of build cost;
+#     world.rs::order_sell / estimate_building_sell_value).
+#   - PROC has `Prerequisites: anypower` → POWR pre-placed at the
+#     safe corner so the new proc can be built.
+#   - `fact` cost 0 ⇒ NOT buildable via StartProduction (engine
+#     gates on cost > 0). Pre-place the agent fact on the SAFE
+#     corner so `has_building:fact` is satisfied turn 1; the
+#     rebuild target is a `proc`, not a fact.
+#   - `place_building` does NOT enforce build-adjacency — orders
+#     work at arbitrary in-bounds coords (CLAUDE.md), so the safe
+#     target region can be far from the surviving fact.
+#   - `building_in_region` is the present-tense region-and-type
+#     predicate (vs `has_building` which is a one-shot "ever
+#     seen" set; the proc must STAND at the safe region at the
+#     win check).
+#   - `hunt` bot: each enemy unit attacks NEAREST foe (CLAUDE.md
+#     scripted_bot — not centroid like rusher). With proc on the
+#     centre lane (front piece) and fact far in the NW corner,
+#     each hunt unit picks the proc first, then walks west for the
+#     fact only after the proc falls. This gives the agent a
+#     ~25-30 turn proc-razing window during which the sell+build
+#     must complete, and the fact survives long enough to satisfy
+#     the win clause.
+#   - within_ticks paired with after_ticks ⇒ a non-finisher is a
+#     real reachable timeout LOSS (CLAUDE.md rule 2 — deadline
+#     must be reachable below max_turns).
+#   - Inert enemy `fact` far east (anti-DRAW: keeps the episode
+#     alive past hunt-band death so the win/fail check fires;
+#     engine auto-`done`s on enemy-elimination otherwise).
+#
+# Validate (no model / no network):
+#   cd /Users/berta/Projects/OpenRA-Bench && \
+#   python3 -m pytest tests/test_build_sell_and_rebuild_elsewhere.py -q
+
+meta:
+  id: build-sell-and-rebuild-elsewhere
+  title: 'Sell and Rebuild Elsewhere — Recoup Capital, Relocate Production'
+  capability: reasoning
+  real_world_meaning: >
+    A forward refinery (proc) sits in the path of an incoming enemy
+    hunt band that will raze it within ~25-30 turns; starting cash
+    alone does not cover building a new proc at the safe target
+    region. The only path to a fresh proc inside the tick budget
+    is to SELL the exposed proc (recouping 50% of its build cost)
+    and use the refund plus starting cash to BUILD a new proc and
+    PLACE it at the safe target region far from the rush. Stalling,
+    building without selling (cash gated), and placing the new
+    proc in the wrong region all lose; only sell-then-rebuild-at-
+    safe-region wins.
+  robotics_analogue: >
+    Liquidate a deteriorating asset to fund a relocation: a forward
+    production node is about to be lost to environmental damage,
+    and the capital reserve alone is insufficient to commission a
+    replacement node elsewhere. The right move is a deliberate
+    salvage of the at-risk node (recovering ~half the build capital
+    in liquid form) which, combined with the on-hand reserve, funds
+    a new node at a safer site BEFORE the original is lost for zero
+    recovery. Letting the asset be destroyed loses 100% of its
+    capital; salvage-and-redeploy preserves 50% + funds the new
+    site.
+  benchmark_anchor:
+    - 'capital reallocation'
+    - 'SC2 sell mechanic'
+    - 'financial reallocation'
+  author: openra-bench
+
+base_map: rush-hour-arena
+
+base:
+  agent:
+    faction: allies
+  # `hunt` bot: each enemy unit picks its OWN nearest agent target
+  # (per-unit nearest, not rusher's all-attack-centroid-nearest).
+  # With the proc on the centre lane (front piece) and the fact +
+  # garrison far in the NW corner, every hunt unit picks the proc
+  # first; only after the proc is razed do they walk west toward
+  # the fact. The agent's sell+build cycle must complete in the
+  # proc-razing window (~25-30 turns at this composition); the
+  # fact then survives long enough for the win clause to fire.
+  enemy:
+    faction: soviet
+    bot_type: hunt
+  # Minimal toolset: observe, sell (load-bearing), build +
+  # place_building (rebuild primitive). No deploy / harvest /
+  # repair — those would side-step the decision by enabling other
+  # paths (deploying a spare MCV, training income from a harv
+  # loop, repairing the exposed proc indefinitely).
+  tools:
+    - observe
+    - sell
+    - build
+    - place_building
+  planning: true
+  # Re-decide the instant the hunt band is spotted, so the agent
+  # gets briefed within seconds of the threat materialising and
+  # has a clean re-plan window.
+  interrupts:
+    enemy_unit_spotted: true
+    own_unit_destroyed: true
+  termination:
+    max_ticks: 12000
+
+starting_cash: 800
+
+levels:
+  # ── EASY ──────────────────────────────────────────────────────────
+  # 800 cash + 700 refund from selling the exposed proc = 1500, just
+  # over the 1400 build cost of a new proc. Light hunt band (2× e1)
+  # so the proc has ~25-30 turns before it falls — generous window
+  # to sell + queue + place. max_turns 60 → reachable tick ≈ 4698 in
+  # interrupt mode (event-shortened steps). within_ticks 4500
+  # paired with after_ticks 4501 in fail ⇒ a non-finisher is a real
+  # reachable timeout LOSS (not a draw).
+  easy:
+    description: >
+      You own a Construction Yard (fact) and a Power Plant (powr)
+      at the safe far north-west corner and a forward Refinery
+      (proc) at the centre lane. A small hunt band (2 rifle
+      infantry) is incoming from the east on the centre lane and
+      will raze the refinery within ~25-30 turns. Your starting
+      cash is 800 — not enough to build a new refinery (cost
+      1400). SELL the exposed refinery (refunds 700) and use the
+      recouped cash + starting cash to BUILD a new refinery at
+      the safe target region around (16, 8) — north of the rush
+      lane. Win by having a refinery at the safe region AND the
+      Construction Yard still alive AND before tick 4500. Stall,
+      build-without-selling (cash gated), or placing the new
+      refinery anywhere outside the safe region all lose.
+    starting_cash: 800
+    overrides:
+      actors:
+        # Agent fact + powr at the SAFE far north-west corner (off
+        # the y=20 lane; powr provides the `anypower` prerequisite
+        # that PROC needs).
+        - {type: fact, owner: agent, position: [4, 4]}
+        - {type: powr, owner: agent, position: [4, 9]}
+        # Exposed proc at the centre lane — WILL be razed by the
+        # hunt band ~tick 2400-2800 unless sold.
+        - {type: proc, owner: agent, position: [60, 20]}
+        # Light hunt band — 2× e1 — at x=110 on the centre lane.
+        # Each hunt unit attacks its own nearest foe; the proc on
+        # the centre lane is the front piece, so both rifles
+        # converge on it first. The fact in the NW corner is OFF
+        # the engagement axis (~108 cells away vs the proc's
+        # ~50 cells), so the proc absorbs the initial salvo and
+        # the fact survives well past the win check.
+        - {type: e1, owner: enemy, position: [110, 20], stance: 3, count: 2}
+        # Anti-DRAW marker: unarmed enemy fact far east keeps the
+        # episode alive past hunt-band death so the win/fail
+        # evaluation actually fires (CLAUDE.md: engine auto-
+        # `done`s on enemy-elim once the last MustBeDestroyed
+        # enemy building falls).
+        - {type: fact, owner: enemy, position: [125, 20]}
+    win_condition:
+      all_of:
+        # New proc at the safe target region (north shoulder
+        # around (16, 8), radius 6 — generous enough to admit
+        # nearby legal cells but tight enough that placing on
+        # the y=20 lane fails the clause).
+        - building_in_region: {type: proc, x: 16, y: 8, radius: 6, count: 1}
+        # Construction Yard still alive (present-tense fact
+        # predicate; `has_building` is one-shot and would stay
+        # true after the fact's destruction — CLAUDE.md footgun).
+        - building_count_gte: {type: fact, n: 1}
+        - within_ticks: 4500
+    # Fail bites at tick 4501 — reachable inside 60 turns in
+    # interrupt mode (empirically the stall path reaches ~tick 4698
+    # at turn 60 due to mid-episode event-shortened steps).
+    fail_condition:
+      any_of:
+        - after_ticks: 4501
+        - not: {building_count_gte: {type: fact, n: 1}}
+    max_turns: 60
+
+  # ── MEDIUM ────────────────────────────────────────────────────────
+  # +1 controlled variable: TIGHTER cash + LARGER hunt band. Starting
+  # cash 700 = exactly the sell refund — the agent MUST sell to fund
+  # the new proc (cash 700 is half the proc cost). The heavier band
+  # (3× e1) raises the urgency: the proc falls in ~20 turns instead
+  # of ~25-30. Same tick budget so the win window is tighter against
+  # the same after_ticks 4501 fail.
+  medium:
+    description: >
+      You own a Construction Yard and a Power Plant at the safe
+      far north-west corner and a forward Refinery at the centre
+      lane. A heavier hunt band (3 rifle infantry) is incoming
+      and will raze the refinery faster (~20 turns). Your
+      starting cash is 700 — exactly the sell refund of a
+      refinery, half the build cost. You MUST sell the exposed
+      refinery to free the second half of the cash, then build a
+      new refinery at the safe target region around (16, 8). Win
+      by having a refinery at the safe region AND the Construction
+      Yard still alive AND before tick 4500. Stalling, building
+      without selling (cash blocks the build), or placing the new
+      refinery in the central lane all lose.
+    starting_cash: 700
+    overrides:
+      actors:
+        - {type: fact, owner: agent, position: [4, 4]}
+        - {type: powr, owner: agent, position: [4, 9]}
+        - {type: proc, owner: agent, position: [60, 20]}
+        # Heavier band: 3× e1 (still no e3 — pure-rifle keeps the
+        # band's eastern walk-time consistent with easy; the
+        # extra rifle just shortens the proc-razing window).
+        - {type: e1, owner: enemy, position: [110, 20], stance: 3, count: 3}
+        - {type: fact, owner: enemy, position: [125, 20]}
+    win_condition:
+      all_of:
+        - building_in_region: {type: proc, x: 16, y: 8, radius: 6, count: 1}
+        - building_count_gte: {type: fact, n: 1}
+        - within_ticks: 4500
+    fail_condition:
+      any_of:
+        - after_ticks: 4501
+        - not: {building_count_gte: {type: fact, n: 1}}
+    max_turns: 60
+
+  # ── HARD ──────────────────────────────────────────────────────────
+  # +1 controlled variable: TWO spawn_point groups (NORTH y=4 vs
+  # SOUTH y=36 base corner) round-robined by seed AND a SECOND hunt
+  # band so each spawn faces an equivalent threat. The safe target
+  # region for each spawn group shifts to the SAME y-band as that
+  # spawn's fact (so the relocate is always "stay on YOUR latitude").
+  # Tight cash 700 still requires the sell. A memorised "build at
+  # (16, 8)" cell loses on the SOUTH spawn (the safe region there
+  # is (16, 36) not (16, 8)).
+  hard:
+    description: >
+      Your base stages from a seed-chosen latitude (NORTH y=4 or
+      SOUTH y=36) — a single memorised target cell cannot
+      generalise. You own a Construction Yard and a Power Plant
+      at the safe corner of your latitude and a forward Refinery
+      at the centre lane. Hunt bands are incoming on the centre
+      lane and will raze the refinery within ~20 turns. Your
+      starting cash is 700 — exactly half the refinery build
+      cost. You MUST sell the exposed refinery and use the
+      recouped cash to build a new refinery at the safe target
+      region of your OWN latitude (around (16, 8) for the NORTH
+      spawn, (16, 36) for the SOUTH spawn). Win by having a
+      refinery at the safe region of your latitude AND the
+      Construction Yard still alive AND before tick 4500.
+    starting_cash: 700
+    overrides:
+      actors:
+        # NORTH spawn (spawn_point 0): fact + powr at the safe
+        # NW corner, proc forward on the central lane.
+        - {type: fact, owner: agent, position: [4, 4], spawn_point: 0}
+        - {type: powr, owner: agent, position: [4, 9], spawn_point: 0}
+        - {type: proc, owner: agent, position: [60, 20], spawn_point: 0}
+        # Inert HoldFire rifle at the matching safe shoulder so the
+        # spawn variation surfaces in `units_summary` (the hard-tier
+        # contract test asserts that different seeds produce
+        # different agent starts; otherwise this pack is buildings-
+        # only and the spawn round-robin is invisible to the
+        # observation channel).
+        - {type: e1, owner: agent, position: [16, 8], stance: 0, spawn_point: 0}
+        # SOUTH spawn (spawn_point 1): fact + powr at the safe
+        # SW corner, proc forward on the SAME central lane (the
+        # threat axis is symmetric across y=20).
+        - {type: fact, owner: agent, position: [4, 36], spawn_point: 1}
+        - {type: powr, owner: agent, position: [4, 31], spawn_point: 1}
+        - {type: proc, owner: agent, position: [60, 20], spawn_point: 1}
+        - {type: e1, owner: agent, position: [16, 36], stance: 0, spawn_point: 1}
+        # Hunt band on the centre lane — enemy actors don't honour
+        # spawn_point (CLAUDE.md oramap.rs footgun) so they always
+        # place regardless of seed. Each hunt unit picks its own
+        # nearest foe; the proc on (60,20) is the front piece for
+        # BOTH spawns (the fact in either NW or SW corner is far
+        # off-axis), so the proc absorbs the salvo first and the
+        # active-spawn fact survives well past the win check.
+        - {type: e1, owner: enemy, position: [110, 20], stance: 3, count: 3}
+        # Anti-DRAW marker at the centre lane far east.
+        - {type: fact, owner: enemy, position: [125, 20]}
+    # Spawn-matched win: the proc must land in the SAFE region of the
+    # active spawn's latitude. The `any_of` pairs each NORTH/SOUTH
+    # safe region with a `building_in_region` check on the active
+    # fact's corner — so the NORTH-disc clause only fires when the
+    # NORTH fact is alive (NORTH spawn) and the SOUTH-disc clause
+    # only fires when the SOUTH fact is alive (SOUTH spawn). A
+    # memorised "always place at (16, 8)" cell satisfies the NORTH
+    # clause on NORTH seeds but FAILS on SOUTH seeds (no NORTH fact
+    # ⇒ NORTH-pair fails; no SOUTH proc ⇒ SOUTH-pair fails).
+    win_condition:
+      all_of:
+        - any_of:
+            # NORTH spawn (fact at (4, 4)) ⇒ matching safe region
+            # for the new proc is the NW shoulder around (16, 8).
+            - all_of:
+                - building_in_region: {type: fact, x: 4, y: 4, radius: 4, count: 1}
+                - building_in_region: {type: proc, x: 16, y: 8, radius: 6, count: 1}
+            # SOUTH spawn (fact at (4, 36)) ⇒ matching safe region
+            # for the new proc is the SW shoulder around (16, 36).
+            - all_of:
+                - building_in_region: {type: fact, x: 4, y: 36, radius: 4, count: 1}
+                - building_in_region: {type: proc, x: 16, y: 36, radius: 6, count: 1}
+        - building_count_gte: {type: fact, n: 1}
+        - within_ticks: 4500
+    fail_condition:
+      any_of:
+        - after_ticks: 4501
+        - not: {building_count_gte: {type: fact, n: 1}}
+    max_turns: 60
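
The budget arithmetic behind the three tiers can be checked with a short sketch. This is an illustration only, not engine code: the constants are copied from the pack comments, `can_afford_proc` and `in_disc` are hypothetical helpers, and the Euclidean disc test is an assumption, since the diff does not show how `building_in_region` measures its radius.

```python
# Hypothetical sketch of the pack's cash and region reasoning.
# Constants come from the pack comments; helper names are illustrative.
PROC_COST = 1400
SELL_REFUND = PROC_COST // 2  # pack header: sell refunds 50% of build cost (700)
STARTING_CASH = {"easy": 800, "medium": 700, "hard": 700}


def can_afford_proc(cash: int, sold: bool) -> bool:
    """True if cash (plus the refund, once the exposed proc is sold) covers a new proc."""
    return cash + (SELL_REFUND if sold else 0) >= PROC_COST


def in_disc(px: int, py: int, cx: int, cy: int, radius: int) -> bool:
    """Assumed Euclidean disc test; the engine's real metric is not shown in the diff."""
    return (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2


for tier, cash in STARTING_CASH.items():
    # Without the sale no tier can pay for the proc; with it every tier can.
    print(tier, can_afford_proc(cash, sold=False), can_afford_proc(cash, sold=True))

# The centre-lane cell (60, 20) is far outside the NORTH safe disc around (16, 8),
# and (16, 8) is far outside the SOUTH safe disc around (16, 36).
print(in_disc(60, 20, 16, 8, 6), in_disc(16, 8, 16, 36, 6))
```

On the medium and hard tiers the sum is exactly 1400, so the intended play only works if the production gate admits cash equal to cost.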
tests/test_build_sell_and_rebuild_elsewhere.py ADDED
@@ -0,0 +1,548 @@
+"""build-sell-and-rebuild-elsewhere pack — no-cheat validation on Rust.
+
+Wave-8 capital reallocation pack. The pack tests SELL-AND-REBUILD as
+a reasoning primitive: the agent's exposed refinery (proc) on the
+centre lane will be razed by a `hunt` band, and starting cash alone
+is NOT enough to build a new proc. The only path to a fresh proc at
+the safe target region inside the tick budget is:
+
+  1. `sell(proc_id)` ⇒ refunds 50% of proc cost (700)
+  2. `build('proc')` + `place_building(proc, x, y)` in the safe region
+
+The win predicate makes the SELL load-bearing:
+
+  * `building_in_region:{type:proc, x:safe_x, y:safe_y, radius:6, count:1}`
+    ⇒ a fresh proc must STAND at the safe target region (not the centre
+    lane; not anywhere outside the radius);
+  * `building_count_gte:{type:fact, n:1}` ⇒ the Construction Yard must
+    still be alive (the PRESENT-TENSE predicate, not `has_building:fact`
+    which is a one-shot ever-seen set — CLAUDE.md footgun);
+  * `within_ticks: 4500` paired with `after_ticks: 4501` in fail ⇒ the
+    episode end is a real reachable timeout LOSS, never a draw.
+
+The scripted-policy validations prove deterministically that:
+
+  * the intended SELL-THEN-REBUILD policy WINS every (level, seed);
+  * stall (observe only), build-without-selling (cash gated), and
+    sell-then-misplace (new proc on the y=20 lane) all LOSE every
+    (level, seed) — real LOSS, not draw;
+  * the hard tier defines ≥2 spawn_point groups (NORTH y=4 / SOUTH
+    y=36) so a memorised "place at (16, 8)" cell that worked on
+    easy/medium FAILS on the SOUTH spawn (the matching safe region
+    there is (16, 36)).
+
+NOTE on building ids: `sell` requires the real engine actor id
+(e.g. `1003`), which the bench's `render_state["own_buildings"]`
+strips. The scripted policies below reach into `_raw["own_buildings"]`
+(via a small custom episode loop) to look up the proc id by cell.
+The model-evaluation path is a separate concern: the model issues
+sell-like reasoning and the win predicate is what actually grades
+the outcome (real proc presence at the safe region).
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+
+pytest.importorskip("openra_train", reason="Rust env wheel not installed")
+pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+
+from openra_bench.eval_core import (
+    RustEnvPool,
+    _scenario_to_tmp_yaml,
+    run_level,
+)
+from openra_bench.rust_adapter import RustObsAdapter
+from openra_bench.scenarios import load_pack
+from openra_bench.scenarios.loader import PACKS_DIR, compile_level
+from openra_bench.scenarios.win_conditions import WinContext, evaluate
+
+PACK = PACKS_DIR / "build-sell-and-rebuild-elsewhere.yaml"
+LEVELS = ("easy", "medium", "hard")
+SEEDS = (1, 2, 3, 4)
+
+
+# ── custom episode loop with raw building-id access ──────────────────
+
+
+def _run_with_id_aware_policy(compiled, policy, seed):
+    """Run an episode where the policy is given (rs, raw, Command) and
+    can read `raw["own_buildings"][i]["id"]` (the real engine actor
+    id). Mirrors `run_level`'s win/fail/draw scoring without changing
+    the standard policy contract.
+    """
+    tmp = _scenario_to_tmp_yaml(compiled)
+    pool = RustEnvPool(size=1, scenario_path=tmp)
+    env = pool.acquire()
+    try:
+        adapter = RustObsAdapter()
+        adapter.observe(env.reset(seed=seed))
+        outcome = "draw"
+        turns = 0
+        for turns in range(1, compiled.max_turns + 1):
+            rs = adapter.render_state()
+            raw = adapter._raw  # for building ids
+            cmds = policy(rs, raw, env.Command) or [env.Command.observe()]
+            obs, _r, done, _info = env.step(cmds)
+            adapter.observe(obs, done=done)
+            ctx = WinContext(
+                signals=adapter.signals,
+                render_state=adapter.render_state(),
+            )
+            if evaluate(compiled.win_condition, ctx):
+                outcome = "win"
+                break
+            if evaluate(compiled.fail_condition, ctx):
+                outcome = "loss"
+                break
+            if done:
+                break
+        return outcome, turns, adapter.signals
+    finally:
+        pool.release(env)
+        pool.shutdown()
+        Path(tmp).unlink(missing_ok=True)
+
+
+def _proc_id_at(raw, y):
+    """Lookup the engine actor id of an own proc at the given y. None
+    if no matching proc is alive."""
+    for b in (raw.get("own_buildings") or []):
+        if b.get("type") == "proc" and int(b.get("cell_y", -1)) == y:
+            return str(b["id"])
+    return None
+
+
+def _own_proc_at(raw, y_band):
+    """Any own proc inside ``y_band`` (a (lo, hi) inclusive interval)."""
+    for b in (raw.get("own_buildings") or []):
+        if (
+            b.get("type") == "proc"
+            and y_band[0] <= int(b.get("cell_y", -1)) <= y_band[1]
+        ):
+            return b
+    return None
+
+
+def _fact_y(raw):
+    """Latitude of the agent's fact (4 on easy/medium and hard NORTH /
+    36 on hard SOUTH). Used by the hard-tier intended policy to pick
+    the matching safe target region. Defaults to 4 if no fact is seen."""
+    for b in (raw.get("own_buildings") or []):
+        if b.get("type") == "fact":
+            return int(b.get("cell_y", 4))
+    return 4
+
+
+# ── scripted policies ───────────────────────────────────────────────
+
+
+def stall(rs, C):
+    """Observe-only — proc razed, no new proc placed. LOSS."""
+    return [C.observe()]
+
+
+def make_build_without_selling(safe_x=16, safe_y=8):
+    """Try to BUILD + PLACE a new proc WITHOUT selling the exposed one.
+    Cash starts at 700/800 — well under the 1400 build cost — so the
+    `build('proc')` queue starts but never completes (no income
+    source). No new proc ⇒ region clause unmet ⇒ LOSS.
+
+    NOTE: a queue request with insufficient cash is silently ignored
+    by the engine (production gates on cash ≥ cost); the build never
+    progresses.
+    """
+
+    def policy(rs, raw, C):
+        cmds = []
+        # Find any safe-region proc to terminate early once present.
+        if any(
+            b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
+            for b in (raw.get("own_buildings") or [])
+        ):
+            return [C.observe()]
+        prod_items = [
+            (p.get("item") if isinstance(p, dict) else p)
+            for p in (rs.get("production") or [])
+        ]
+        if "proc" not in prod_items:
+            cmds.append(C.build("proc"))
+        cmds.append(C.place_building("proc", safe_x, safe_y))
+        return cmds or [C.observe()]
+
+    return policy
+
+
+def make_sell_then_misplace(safe_x_wrong=60, safe_y_wrong=20):
+    """SELL the exposed proc (refund + cash buys a new proc) but
+    PLACE the new proc back IN THE CENTRE LANE — outside the safe
+    target region disc. The fact still satisfies `building_count_gte`,
+    but the new proc fails `building_in_region` — LOSS.
+    """
+    state = {"sold": False}
+
+    def policy(rs, raw, C):
+        cmds = []
+        if not state["sold"]:
+            pid = _proc_id_at(raw, 20)
+            if pid:
+                cmds.append(C.sell([pid]))
+                state["sold"] = True
+        # No existing safe-region proc — but we deliberately place
+        # back on the y=20 lane to demonstrate the misplace cost.
+        prod_items = [
+            (p.get("item") if isinstance(p, dict) else p)
+            for p in (rs.get("production") or [])
+        ]
+        if "proc" not in prod_items:
+            cmds.append(C.build("proc"))
+        cmds.append(C.place_building("proc", safe_x_wrong, safe_y_wrong))
+        return cmds or [C.observe()]
+
+    return policy
+
+
+def make_intended_easy_medium(safe_x=16, safe_y=8):
+    """Intended SELL-THEN-REBUILD play for easy/medium (fact at
+    (4, 4) so safe region is (16, 8)).
+
+    Turn 1: sell the exposed proc (refunds 700, total cash → 1500).
+    Continuously: queue `build('proc')` + `place_building` at the
+    safe target region. The build completes ~1400 ticks after queue;
+    place_building lands at (16, 8). Win clause fires.
+    """
+    state = {"sold": False}
+
+    def policy(rs, raw, C):
+        cmds = []
+        if not state["sold"]:
+            pid = _proc_id_at(raw, 20)
+            if pid:
+                cmds.append(C.sell([pid]))
+                state["sold"] = True
+        # Skip if the safe-region proc already exists.
+        if any(
+            b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
+            for b in (raw.get("own_buildings") or [])
+        ):
+            return cmds or [C.observe()]
+        prod_items = [
+            (p.get("item") if isinstance(p, dict) else p)
+            for p in (rs.get("production") or [])
+        ]
+        if "proc" not in prod_items:
+            cmds.append(C.build("proc"))
+        cmds.append(C.place_building("proc", safe_x, safe_y))
+        return cmds or [C.observe()]
+
+    return policy
+
+
+def make_intended_hard_adaptive():
+    """Intended SELL-THEN-REBUILD play for hard (fact at either y=4 or
+    y=36 by seed). Reads the fact's actual y from the observation on
+    turn 1, then places the new proc at the matching safe region —
+    (16, 8) for NORTH spawn, (16, 36) for SOUTH spawn.
+    """
+    state = {"sold": False, "safe_xy": None}
+
+    def policy(rs, raw, C):
+        if state["safe_xy"] is None:
+            fy = _fact_y(raw)
+            state["safe_xy"] = (16, 8 if fy < 20 else 36)
+        sx, sy = state["safe_xy"]
+        cmds = []
+        if not state["sold"]:
+            pid = _proc_id_at(raw, 20)
+            if pid:
+                cmds.append(C.sell([pid]))
+                state["sold"] = True
+        # Skip if the safe-region proc already exists.
+        if any(
+            b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
+            for b in (raw.get("own_buildings") or [])
+        ):
+            return cmds or [C.observe()]
+        prod_items = [
+            (p.get("item") if isinstance(p, dict) else p)
+            for p in (rs.get("production") or [])
+        ]
+        if "proc" not in prod_items:
+            cmds.append(C.build("proc"))
+        cmds.append(C.place_building("proc", sx, sy))
+        return cmds or [C.observe()]
+
+    return policy
+
+
+def make_memorised_north_only():
+    """Naive: always place at (16, 8) (the easy/medium safe region).
+    On hard SOUTH spawn (fact at y=36), the safe region is (16, 36),
+    so a place at (16, 8) lands outside the matching radius-6 SOUTH
+    disc, and the NORTH clause fails without a NORTH fact: LOSS on
+    SOUTH seeds.
+    """
+    state = {"sold": False}
+
+    def policy(rs, raw, C):
+        cmds = []
+        if not state["sold"]:
+            pid = _proc_id_at(raw, 20)
+            if pid:
+                cmds.append(C.sell([pid]))
+                state["sold"] = True
+        if any(
+            b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
+            for b in (raw.get("own_buildings") or [])
+        ):
+            return cmds or [C.observe()]
+        prod_items = [
+            (p.get("item") if isinstance(p, dict) else p)
+            for p in (rs.get("production") or [])
+        ]
+        if "proc" not in prod_items:
+            cmds.append(C.build("proc"))
+        cmds.append(C.place_building("proc", 16, 8))
+        return cmds or [C.observe()]
+
+    return policy
+
310
+
+ # ── scenario-shape invariants ───────────────────────────────────────
+
+
+ def test_pack_compiles_with_three_levels_and_hunt_bot():
+     pack = load_pack(PACK)
+     assert pack.meta.id == "build-sell-and-rebuild-elsewhere"
+     assert pack.meta.capability == "reasoning"
+     assert set(pack.levels) == {"easy", "medium", "hard"}
+     # Required-by-spec benchmark anchors (capital reallocation idiom).
+     anchors = [a.lower() for a in pack.meta.benchmark_anchor]
+     assert any("capital reallocation" in a for a in anchors), pack.meta.benchmark_anchor
+     assert any("sc2 sell mechanic" in a for a in anchors), pack.meta.benchmark_anchor
+     assert any(
+         "financial reallocation" in a for a in anchors
+     ), pack.meta.benchmark_anchor
+     # Hunt bot is wired through to the engine for every level (per-unit
+     # nearest-foe targeting, so the proc on the centre lane is the
+     # front piece, not the off-axis fact — see pack header).
+     for lvl in LEVELS:
+         c = compile_level(pack, lvl)
+         assert c.map_supported
+         bot = getattr(c.scenario.enemy, "bot_type", None) or getattr(
+             c.scenario.enemy, "bot", None
+         )
+         assert str(bot).lower() == "hunt", (lvl, bot)
+
+
+ def test_sell_is_exposed_in_the_tool_palette():
+     """`sell` is the load-bearing primitive — the pack would be
+     unsolvable without it (build('proc') is cash-gated, the agent has
+     no income source, the exposed proc would just be razed)."""
+     pack = load_pack(PACK)
+     for lvl in LEVELS:
+         c = compile_level(pack, lvl)
+         tools = set(getattr(c.scenario, "tools", None) or [])
+         assert "sell" in tools, (lvl, tools)
+         assert "build" in tools, (lvl, tools)
+         assert "place_building" in tools, (lvl, tools)
+
+
+ def test_starting_cash_is_below_proc_build_cost_on_every_tier():
+     """Cash + sell-refund must just barely cover the proc rebuild
+     (cash 700-800; refund 700; proc cost 1400). Without the refund
+     the cash alone falls short — that gap is the load-bearing
+     discrimination."""
+     pack = load_pack(PACK)
+     for lvl in LEVELS:
+         c = compile_level(pack, lvl)
+         # cash < 1400 (proc cost) ⇒ build-without-selling is impossible.
+         assert c.starting_cash < 1400, (lvl, c.starting_cash)
+         # cash + 700 refund ≥ 1400 ⇒ sell-then-rebuild is feasible.
+         assert c.starting_cash + 700 >= 1400, (lvl, c.starting_cash)
+
+
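The cash arithmetic behind this test can be sketched in a few lines. This is a minimal illustration rather than part of the commit: the 1400 proc cost, the 50% refund, and the 700-800 starting-cash range come from the pack header and the docstring, while `can_rebuild` is a hypothetical helper.

```python
PROC_COST = 1400         # proc build cost quoted in the test docstring
SELL_REFUND_RATE = 0.5   # 50% refund quoted in the pack header


def can_rebuild(starting_cash: int, sold: bool) -> bool:
    """Return True if the agent can afford a fresh proc.

    Hypothetical helper: assumes no income source, so the only cash
    available is the starting cash plus the refund from selling.
    """
    refund = int(PROC_COST * SELL_REFUND_RATE) if sold else 0
    return starting_cash + refund >= PROC_COST


if __name__ == "__main__":
    for cash in (700, 800):
        print(cash, can_rebuild(cash, sold=False), can_rebuild(cash, sold=True))
```

Under these assumptions every starting cash in the 700-800 range fails without the sale and succeeds with it, which is the gap the two assertions pin down.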
+ @pytest.mark.parametrize("level", LEVELS)
+ def test_every_level_has_a_reachable_timeout_fail(level):
+     """Non-win must be a real LOSS: the `after_ticks` fail clause must
+     be strictly below the tick reachable at max_turns (interrupt mode
+     advances ≤90 ticks per step; some steps are shorter due to
+     enemy_unit_spotted events, so the empirical reachable tick is
+     ~4698 at 60 turns)."""
+     c = compile_level(load_pack(PACK), level)
+     assert c.fail_condition is not None
+     fc = c.fail_condition.model_dump(exclude_none=True)
+     deadline = None
+     for clause in fc.get("any_of", []) or []:
+         if "after_ticks" in clause:
+             deadline = int(clause["after_ticks"])
+     assert deadline is not None, f"{level}: no after_ticks fail clause"
+     # 60 turns × ~78 ticks/turn (event-shortened) ≈ 4680; 4501
+     # deadline reliably bites.
+     assert deadline < 4700, (
+         f"{level}: deadline {deadline} unreachable within {c.max_turns} "
+         f"turns (interrupt mode ≈ 4680 max tick) → draw degeneracy"
+     )
+
+
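A rough sketch of the deadline arithmetic this test relies on, using only the figures quoted in its docstring and comment (at most 90 ticks per step, roughly 78 on average once events shorten some steps, 60 turns, a 4501-tick deadline). The helper names are hypothetical and the average is an approximation, not a measured constant.

```python
MAX_TICKS_PER_STEP = 90   # interrupt-mode cap quoted in the docstring
AVG_TICKS_PER_STEP = 78   # approximate event-shortened average from the comment


def reachable_tick(max_turns: int, ticks_per_step: int) -> int:
    """Tick reached after max_turns steps at a fixed ticks-per-step rate."""
    return max_turns * ticks_per_step


def deadline_bites(deadline: int, max_turns: int) -> bool:
    """True if the deadline falls below the typical reachable tick,
    so a non-winning run ends as a loss rather than a draw."""
    return deadline < reachable_tick(max_turns, AVG_TICKS_PER_STEP)


if __name__ == "__main__":
    print(reachable_tick(60, MAX_TICKS_PER_STEP))  # upper bound: 5400
    print(reachable_tick(60, AVG_TICKS_PER_STEP))  # typical: 4680
    print(deadline_bites(4501, 60))                # True
```

The test's `deadline < 4700` bound is the same comparison made against the typical reachable tick rather than the 5400 upper bound.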
+ def test_fact_alive_clause_uses_present_tense_predicate():
+     """The fact-survival clause must use the PRESENT-TENSE predicate
+     (`building_count_gte:{type:fact,n:1}`) rather than `has_building`,
+     which is a one-shot "ever seen" set that stays true after the
+     fact is destroyed (a documented CLAUDE.md footgun)."""
+     for lvl in LEVELS:
+         c = compile_level(load_pack(PACK), lvl)
+         fc = c.fail_condition.model_dump(exclude_none=True)
+         fact_clauses = [
+             clause for clause in fc.get("any_of", []) or []
+             if isinstance(clause, dict)
+             and isinstance(clause.get("not"), dict)
+             and "building_count_gte" in (clause["not"] or {})
+             and (clause["not"]["building_count_gte"] or {}).get("type") == "fact"
+         ]
+         assert fact_clauses, f"{lvl}: missing present-tense fact-alive fail clause"
+
+
+ def test_hard_has_two_spawn_point_groups_and_fact_flips():
+     """Hard-tier contract: ≥2 distinct agent spawn_point groups so the
+     fact (and therefore the safe target region for proc placement)
+     flips by seed. The two groups must define the NORTH (y=4) and
+     SOUTH (y=36) fact pair."""
+     c = compile_level(load_pack(PACK), "hard")
+     groups = {
+         a.spawn_point for a in c.scenario.actors
+         if a.owner == "agent" and a.spawn_point is not None
+     }
+     assert groups == {0, 1}, groups
+     fact_ys = sorted({
+         a.position[1] for a in c.scenario.actors
+         if a.owner == "agent" and a.type == "fact"
+     })
+     assert fact_ys == [4, 36], fact_ys
+     # In-bounds check (rush-hour-arena playable x ≈ 2..126, y ≈ 2..38).
+     for a in c.scenario.actors:
+         x, y = a.position
+         assert 2 <= x <= 126 and 2 <= y <= 38, (a.type, a.position)
+
+
+ # ── solvency: intended SELL-THEN-REBUILD wins every (level, seed) ────
+
+
+ @pytest.mark.parametrize("level", ("easy", "medium"))
+ def test_intended_sell_then_rebuild_wins_easy_medium(level):
+     c = compile_level(load_pack(PACK), level)
+     for seed in SEEDS:
+         outcome, turns, sig = _run_with_id_aware_policy(
+             c, make_intended_easy_medium(), seed
+         )
+         assert outcome == "win", (
+             f"{level} seed{seed}: intended SELL-THEN-REBUILD must WIN; "
+             f"got {outcome} (tick={sig.game_tick}, "
+             f"buildings={sig.own_buildings})"
+         )
+
+
+ def test_intended_hard_adaptive_wins_every_seed():
+     """Hard tier: the intended policy must read the fact's latitude
+     (NORTH y=4 vs SOUTH y=36) and pick the matching safe target
+     region. WINS on every seed."""
+     c = compile_level(load_pack(PACK), "hard")
+     for seed in SEEDS:
+         outcome, turns, sig = _run_with_id_aware_policy(
+             c, make_intended_hard_adaptive(), seed
+         )
+         assert outcome == "win", (
+             f"hard seed{seed}: intended adaptive sell-then-rebuild must "
+             f"WIN; got {outcome} (tick={sig.game_tick}, "
+             f"buildings={sig.own_buildings})"
+         )
+
+
+ # ── no-cheat: every lazy / wrong policy LOSES (not draws) ────────────
+
+
+ @pytest.mark.parametrize("level", LEVELS)
+ def test_stall_loses_every_level_and_seed(level):
+     """STALL: observe only. The hunt band razes the exposed proc and
+     the agent never places a new one ⇒ region clause unmet AND clock
+     runs out ⇒ real LOSS, not draw."""
+     c = compile_level(load_pack(PACK), level)
+     for seed in SEEDS:
+         r = run_level(c, stall, seed=seed)
+         assert r.outcome == "loss", (
+             f"{level} seed{seed} stall: must LOSE (real fail, not draw); "
+             f"got {r.outcome} (tick={r.signals.game_tick}, "
+             f"buildings={r.signals.own_buildings})"
+         )
+
+
+ @pytest.mark.parametrize("level", ("easy", "medium"))
+ def test_build_without_selling_loses_easy_medium(level):
+     """BUILD WITHOUT SELLING: `build('proc')` is rejected until cash
+     ≥ 1400; the agent has no income source, the build never starts,
+     no proc lands at the safe region ⇒ LOSS."""
+     c = compile_level(load_pack(PACK), level)
+     for seed in SEEDS:
+         outcome, turns, sig = _run_with_id_aware_policy(
+             c, make_build_without_selling(), seed
+         )
+         assert outcome == "loss", (
+             f"{level} seed{seed} build-without-selling: must LOSE; "
+             f"got {outcome} (tick={sig.game_tick}, "
+             f"buildings={sig.own_buildings})"
+         )
+
+
+ @pytest.mark.parametrize("level", ("easy", "medium"))
+ def test_sell_then_misplace_loses_easy_medium(level):
+     """SELL-THEN-MISPLACE: sells the exposed proc and uses the
+     refund to build a NEW proc, but places it back in the central
+     lane (y=20) — outside the safe target region disc. Region clause
+     unmet ⇒ LOSS."""
+     c = compile_level(load_pack(PACK), level)
+     for seed in SEEDS:
+         outcome, turns, sig = _run_with_id_aware_policy(
+             c, make_sell_then_misplace(), seed
+         )
+         assert outcome == "loss", (
+             f"{level} seed{seed} sell-then-misplace: must LOSE; "
+             f"got {outcome} (tick={sig.game_tick}, "
+             f"buildings={sig.own_buildings})"
+         )
+
+
+ def test_memorised_north_only_loses_on_hard_south_seeds():
+     """The non-adaptive "always place at (16, 8)" policy WINS hard
+     seeds whose spawn is NORTH (fact at y=4 ⇒ matching safe region is
+     (16, 8)) but FAILS hard seeds whose spawn is SOUTH (fact at y=36
+     ⇒ matching safe region is (16, 36), and (16, 8) is outside the
+     SOUTH disc). The spawn-driven discrimination: at least one of
+     the 4 hard seeds must LOSE."""
+     c = compile_level(load_pack(PACK), "hard")
+     losses = 0
+     for seed in SEEDS:
+         outcome, turns, sig = _run_with_id_aware_policy(
+             c, make_memorised_north_only(), seed
+         )
+         if outcome == "loss":
+             losses += 1
+     assert losses >= 1, (
+         f"hard: memorised-north-only must LOSE on ≥1 of {len(SEEDS)} "
+         f"seeds (spawn-driven discrimination); got {losses} losses"
+     )
+
+
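The spawn-paired win condition that defeats the memorised opening can be sketched as follows. This is an illustrative approximation, not the pack's actual predicate: the region centres (16, 8) and (16, 36), the radius of 6, and the fact latitudes come from the docstrings, while the Euclidean distance metric, the `cell_x` field, and all function names are assumptions.

```python
import math

# Safe-region centres per spawn side, as quoted in the docstrings.
SAFE_REGIONS = {"north": (16, 8), "south": (16, 36)}
RADIUS = 6   # "radius-6 disc" from the policy docstring
MID_Y = 20   # centre lane; facts sit at y=4 (NORTH) or y=36 (SOUTH)


def spawn_side(buildings):
    """Infer the spawn side from the fact's latitude (hypothetical helper)."""
    for b in buildings:
        if b.get("type") == "fact":
            return "north" if int(b["cell_y"]) < MID_Y else "south"
    return None


def in_disc(x, y, centre, radius=RADIUS):
    """Assumed Euclidean disc test; the engine's metric may differ."""
    return math.hypot(x - centre[0], y - centre[1]) <= radius


def wins(buildings):
    """A proc counts only if it sits in the disc paired with the live fact."""
    side = spawn_side(buildings)
    if side is None:
        return False
    return any(
        b.get("type") == "proc"
        and in_disc(int(b["cell_x"]), int(b["cell_y"]), SAFE_REGIONS[side])
        for b in buildings
    )


if __name__ == "__main__":
    south_fact = {"type": "fact", "cell_x": 4, "cell_y": 36}
    memorised = {"type": "proc", "cell_x": 16, "cell_y": 8}
    print(wins([south_fact, memorised]))  # False
```

Pairing each disc with its own fact is what turns a memorised coordinate into a loss on the flipped spawn, since the proc must match the side the agent actually started on.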
+ # ── determinism ──────────────────────────────────────────────────────
+
+
+ def test_intended_run_is_deterministic_on_easy():
+     c = compile_level(load_pack(PACK), "easy")
+     a_outcome, a_turns, a_sig = _run_with_id_aware_policy(
+         c, make_intended_easy_medium(), seed=3
+     )
+     b_outcome, b_turns, b_sig = _run_with_id_aware_policy(
+         c, make_intended_easy_medium(), seed=3
+     )
+     assert (a_outcome, a_turns, a_sig.units_killed) == (
+         b_outcome, b_turns, b_sig.units_killed,
+     ), "same seed must be deterministic"
tests/test_hard_tier.py CHANGED
@@ -926,6 +926,23 @@ UPGRADED = [
      # targets so the agent's infantry strike force racks up kills
      # without being attrited in transit.
      "lh-econ-army-victory",
+     # Wave-8 REASONING capital reallocation pack — SC2 sell mechanic
+     # for refund / financial CAPEX reallocation / business continuity
+     # asset redeployment anchor. Starting cash alone is below the
+     # proc rebuild cost, so the agent MUST sell the exposed central-
+     # lane proc to recoup 50% capital (refund 700) and use the refund
+     # + on-hand cash to build a fresh proc at the safe target region.
+     # Hard tier defines two agent spawn_point groups (NORTH base at
+     # y=4 / SOUTH base at y=36) round-robined by seed; the win
+     # predicate pairs each safe-region clause with the matching
+     # spawn-fact's corner clause, so a memorised "always place at
+     # (16, 8)" opening wins NORTH seeds by coincidence but loses
+     # SOUTH seeds (NORTH-fact-corner clause unmet AND SOUTH-proc-
+     # region clause unmet). The central hunt band is symmetric across
+     # y=20 (enemy actors don't honour spawn_point — CLAUDE.md), so
+     # both spawns face the same sell-then-rebuild discipline from a
+     # flipped base latitude.
+     "build-sell-and-rebuild-elsewhere",
  ]

  # Consciously NOT spawn-varied, with the reason (keeps the curation