feat(scenario): build-sell-and-rebuild-elsewhere — sell exposed building + rebuild safe (capital reallocation anchor)
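The cash arithmetic behind this scenario can be sanity-checked with a small sketch. The figures (proc cost 1400, 50% sell refund, starting cash 800/700/700) are taken from the pack below. The helper name `can_afford_proc` is hypothetical, and the `cash >= cost` gate is an assumption inferred from the medium tier, where 700 + 700 must fund exactly 1400.

```python
# Figures from the pack; the helper itself is illustrative only.
PROC_COST = 1400
SELL_REFUND_RATE = 0.5
STARTING_CASH = {"easy": 800, "medium": 700, "hard": 700}


def can_afford_proc(cash: int, sold: bool) -> bool:
    """True if starting cash (plus the sell refund, if sold) covers a new proc.

    Assumes the build gate is cash >= cost (inferred, not confirmed).
    """
    refund = int(PROC_COST * SELL_REFUND_RATE) if sold else 0
    return cash + refund >= PROC_COST


for level, cash in STARTING_CASH.items():
    # Without the sell no tier can pay; with it every tier can.
    print(level, can_afford_proc(cash, sold=False), can_afford_proc(cash, sold=True))
```

Under these assumptions no tier can fund the rebuild from starting cash alone, which is what makes the sell load-bearing.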
openra_bench/scenarios/packs/build-sell-and-rebuild-elsewhere.yaml
ADDED
|
@@ -0,0 +1,353 @@
|
| 1 |
+
# build-sell-and-rebuild-elsewhere.yaml
|
| 2 |
+
#
|
| 3 |
+
# REASONING capability — Wave-8 Group "capital reallocation" pack
|
| 4 |
+
# (SC2 sell mechanic for refund / financial reallocation / business
|
| 5 |
+
# CAPEX redeployment anchor). The agent owns a Construction Yard
|
| 6 |
+
# (`fact`) and a Power Plant (`powr`) at the safe north-west corner
|
| 7 |
+
# and an exposed Refinery (`proc`) at the centre lane. A scripted
|
| 8 |
+
# `hunt` band is incoming on the centre lane and will raze the proc
|
| 9 |
+
# within ~25-30 turns (the proc has no defenders and no inherent
|
| 10 |
+
# weapons). Starting cash alone is NOT enough to build a NEW proc
|
| 11 |
+
# (1400 build cost > starting cash on every tier); the only path
|
| 12 |
+
# to a fresh proc inside the tick budget is:
|
| 13 |
+
#
|
| 14 |
+
# 1. `sell` the EXPOSED proc ⇒ refunds 50% of cost (700)
|
| 15 |
+
# 2. `build('proc')` + `place_building` ⇒ at the safe target region
|
| 16 |
+
#
|
| 17 |
+
# Without selling, the agent has insufficient cash to build a new
|
| 18 |
+
# proc (build cost 1400 > starting cash on every tier and there is
|
| 19 |
+
# no income source); even if the agent attempts to wait, the proc
|
| 20 |
+
# is razed by the hunt band ⇒ refund forfeit ⇒ still no cash for a
|
| 21 |
+
# new proc. Building-without-selling stalls on insufficient funds
|
| 22 |
+
# (`build('proc')` is rejected until cash ≥ 1400 — production
|
| 23 |
+
# gates on cash ≥ cost). The intended SELL-THEN-REBUILD play wins.
|
| 24 |
+
#
|
| 25 |
+
# Real-world anchors:
|
| 26 |
+
# - SC2 sell mechanic for refund (the classic mechanic this pack
|
| 27 |
+
# is named for — partial salvage of a structure for partial
|
| 28 |
+
# refund to redeploy production elsewhere)
|
| 29 |
+
# - Financial / CAPEX reallocation (liquidate a deteriorating
|
| 30 |
+
# asset and redeploy the capital to a safer / forward site)
|
| 31 |
+
# - PlanBench resource reallocation (the planner must reason that
|
| 32 |
+
# the partial refund + new build is cheaper than any alternative
|
| 33 |
+
# under the budget — building from cash alone is impossible)
|
| 34 |
+
#
|
| 35 |
+
# DISCRIMINATIONS (no defect, no cheat):
|
| 36 |
+
# - stall (observe only): LOSS — proc razed; no new proc placed
|
| 37 |
+
# anywhere (let alone in the safe region) ⇒ building_in_region
|
| 38 |
+
# clause unmet AND clock runs out.
|
| 39 |
+
# - build-without-selling: LOSS — `build('proc')` is rejected
|
| 40 |
+
# until cash ≥ 1400; the agent has no income source within the
|
| 41 |
+
# tick budget, so the build never starts ⇒ no proc in safe
|
| 42 |
+
# region.
|
| 43 |
+
# - sell-but-place-in-wrong-region: LOSS — refund + starting cash
|
| 44 |
+
# cover a new proc, but it is placed in the central lane (or
|
| 45 |
+
# anywhere outside the safe region) ⇒ building_in_region
|
| 46 |
+
# clause unmet (the new proc is in the wrong cell band).
|
| 47 |
+
# - sell-then-rebuild-at-safe-region (intended): WIN — refund
|
| 48 |
+
# funds the new proc, placed at the safe far-west target
|
| 49 |
+
# region ⇒ all win clauses satisfied within tick budget.
|
| 50 |
+
#
|
| 51 |
+
# ENGINE FACTS (CLAUDE.md):
|
| 52 |
+
# - proc cost 1400 → sell refund = 700 (50% of build cost;
|
| 53 |
+
# world.rs::order_sell / estimate_building_sell_value).
|
| 54 |
+
# - PROC has `Prerequisites: anypower` → POWR pre-placed at the
|
| 55 |
+
# safe corner so the new proc can be built.
|
| 56 |
+
# - `fact` cost 0 ⇒ NOT buildable via StartProduction (engine
|
| 57 |
+
# gates on cost > 0). Pre-place the agent fact on the SAFE
|
| 58 |
+
# corner so `has_building:fact` is satisfied turn 1; the
|
| 59 |
+
# rebuild target is a `proc`, not a fact.
|
| 60 |
+
# - `place_building` does NOT enforce build-adjacency — orders
|
| 61 |
+
# work at arbitrary in-bounds coords (CLAUDE.md), so the safe
|
| 62 |
+
# target region can be far from the surviving fact.
|
| 63 |
+
# - `building_in_region` is the present-tense region-and-type
|
| 64 |
+
# predicate (vs `has_building` which is a one-shot "ever
|
| 65 |
+
# seen" set; the proc must STAND at the safe region at the
|
| 66 |
+
# win check).
|
| 67 |
+
# - `hunt` bot: each enemy unit attacks NEAREST foe (CLAUDE.md
|
| 68 |
+
# scripted_bot — not centroid like rusher). With proc on the
|
| 69 |
+
# centre lane (front piece) and fact far in the NW corner,
|
| 70 |
+
# each hunt unit picks the proc first, then walks west for the
|
| 71 |
+
# fact only after the proc falls. This gives the agent a
|
| 72 |
+
# ~25-30 turn proc-razing window during which the sell+build
|
| 73 |
+
# must complete, and the fact survives long enough to satisfy
|
| 74 |
+
# the win clause.
|
| 75 |
+
# - within_ticks paired with after_ticks ⇒ a non-finisher is a
|
| 76 |
+
# real reachable timeout LOSS (CLAUDE.md rule 2 — deadline
|
| 77 |
+
# must be reachable below max_turns).
|
| 78 |
+
# - Inert enemy `fact` far east (anti-DRAW: keeps the episode
|
| 79 |
+
# alive past hunt-band death so the win/fail check fires;
|
| 80 |
+
# engine auto-`done`s on enemy-elimination otherwise).
|
| 81 |
+
#
|
| 82 |
+
# Validate (no model / no network):
|
| 83 |
+
# cd /Users/berta/Projects/OpenRA-Bench && \
|
| 84 |
+
# python3 -m pytest tests/test_build_sell_and_rebuild_elsewhere.py -q
|
| 85 |
+
|
| 86 |
+
meta:
|
| 87 |
+
id: build-sell-and-rebuild-elsewhere
|
| 88 |
+
title: 'Sell and Rebuild Elsewhere — Recoup Capital, Relocate Production'
|
| 89 |
+
capability: reasoning
|
| 90 |
+
real_world_meaning: >
|
| 91 |
+
A forward refinery (proc) sits in the path of an incoming enemy
|
| 92 |
+
hunt band that will raze it within ~25-30 turns; starting cash
|
| 93 |
+
alone does not cover building a new proc at the safe target
|
| 94 |
+
region. The only path to a fresh proc inside the tick budget
|
| 95 |
+
is to SELL the exposed proc (recouping 50% of its build cost)
|
| 96 |
+
and use the refund plus starting cash to BUILD a new proc and
|
| 97 |
+
PLACE it at the safe target region far from the rush. Stalling,
|
| 98 |
+
building without selling (cash gated), and placing the new
|
| 99 |
+
proc in the wrong region all lose; only sell-then-rebuild-at-
|
| 100 |
+
safe-region wins.
|
| 101 |
+
robotics_analogue: >
|
| 102 |
+
Liquidate a deteriorating asset to fund a relocation: a forward
|
| 103 |
+
production node is about to be lost to environmental damage,
|
| 104 |
+
and the capital reserve alone is insufficient to commission a
|
| 105 |
+
replacement node elsewhere. The right move is a deliberate
|
| 106 |
+
salvage of the at-risk node (recovering ~half the build capital
|
| 107 |
+
in liquid form) which, combined with the on-hand reserve, funds
|
| 108 |
+
a new node at a safer site BEFORE the original is lost for zero
|
| 109 |
+
recovery. Letting the asset be destroyed loses 100% of its
|
| 110 |
+
capital; salvage-and-redeploy preserves 50% + funds the new
|
| 111 |
+
site.
|
| 112 |
+
benchmark_anchor:
|
| 113 |
+
- 'capital reallocation'
|
| 114 |
+
- 'SC2 sell mechanic'
|
| 115 |
+
- 'financial reallocation'
|
| 116 |
+
author: openra-bench
|
| 117 |
+
|
| 118 |
+
base_map: rush-hour-arena
|
| 119 |
+
|
| 120 |
+
base:
|
| 121 |
+
agent:
|
| 122 |
+
faction: allies
|
| 123 |
+
# `hunt` bot: each enemy unit picks its OWN nearest agent target
|
| 124 |
+
# (per-unit nearest, not rusher's all-attack-centroid-nearest).
|
| 125 |
+
# With the proc on the centre lane (front piece) and the fact +
|
| 126 |
+
# garrison far in the NW corner, every hunt unit picks the proc
|
| 127 |
+
# first; only after the proc is razed do they walk west toward
|
| 128 |
+
# the fact. The agent's sell+build cycle must complete in the
|
| 129 |
+
# proc-razing window (~25-30 turns at this composition); the
|
| 130 |
+
# fact then survives long enough for the win clause to fire.
|
| 131 |
+
enemy:
|
| 132 |
+
faction: soviet
|
| 133 |
+
bot_type: hunt
|
| 134 |
+
# Minimal toolset: observe, sell (load-bearing), build +
|
| 135 |
+
# place_building (rebuild primitive). No deploy / harvest /
|
| 136 |
+
# repair — those would side-step the decision by enabling other
|
| 137 |
+
# paths (deploying a spare MCV, training income from a harv
|
| 138 |
+
# loop, repairing the exposed proc indefinitely).
|
| 139 |
+
tools:
|
| 140 |
+
- observe
|
| 141 |
+
- sell
|
| 142 |
+
- build
|
| 143 |
+
- place_building
|
| 144 |
+
planning: true
|
| 145 |
+
# Re-decide the instant the hunt band is spotted, so the agent
|
| 146 |
+
# gets debriefed within seconds of the threat materialising and
|
| 147 |
+
# has a clean re-plan window.
|
| 148 |
+
interrupts:
|
| 149 |
+
enemy_unit_spotted: true
|
| 150 |
+
own_unit_destroyed: true
|
| 151 |
+
termination:
|
| 152 |
+
max_ticks: 12000
|
| 153 |
+
|
| 154 |
+
starting_cash: 800
|
| 155 |
+
|
| 156 |
+
levels:
|
| 157 |
+
# ── EASY ──────────────────────────────────────────────────────────
|
| 158 |
+
# 800 cash + 700 refund from selling the exposed proc = 1500, just
|
| 159 |
+
# over the 1400 build cost of a new proc. Light hunt band (2× e1)
|
| 160 |
+
# so the proc has ~25-30 turns before it falls — generous window
|
| 161 |
+
# to sell + queue + place. max_turns 60 → reachable tick ≈ 4698 in
|
| 162 |
+
# interrupt mode (event-shortened steps). within_ticks 4500
|
| 163 |
+
# paired with after_ticks 4501 in fail ⇒ a non-finisher is a real
|
| 164 |
+
# reachable timeout LOSS (not a draw).
|
| 165 |
+
easy:
|
| 166 |
+
description: >
|
| 167 |
+
You own a Construction Yard (fact) and a Power Plant (powr)
|
| 168 |
+
at the safe far north-west corner and a forward Refinery
|
| 169 |
+
(proc) at the centre lane. A small hunt band (2 rifle
|
| 170 |
+
infantry) is incoming from the east on the centre lane and
|
| 171 |
+
will raze the refinery within ~25-30 turns. Your starting
|
| 172 |
+
cash is 800 — not enough to build a new refinery (cost
|
| 173 |
+
1400). SELL the exposed refinery (refunds 700) and use the
|
| 174 |
+
recouped cash + starting cash to BUILD a new refinery at
|
| 175 |
+
the safe target region around (16, 8) — north of the rush
|
| 176 |
+
lane. Win by having a refinery at the safe region AND the
|
| 177 |
+
Construction Yard still alive AND before tick 4500. Stall,
|
| 178 |
+
build-without-selling (cash gated), or placing the new
|
| 179 |
+
refinery anywhere outside the safe region all lose.
|
| 180 |
+
starting_cash: 800
|
| 181 |
+
overrides:
|
| 182 |
+
actors:
|
| 183 |
+
# Agent fact + powr at the SAFE far north-west corner (off
|
| 184 |
+
# the y=20 lane; powr provides the `anypower` prerequisite
|
| 185 |
+
# that PROC needs).
|
| 186 |
+
- {type: fact, owner: agent, position: [4, 4]}
|
| 187 |
+
- {type: powr, owner: agent, position: [4, 9]}
|
| 188 |
+
# Exposed proc at the centre lane — WILL be razed by the
|
| 189 |
+
# hunt band ~tick 2400-2800 unless sold.
|
| 190 |
+
- {type: proc, owner: agent, position: [60, 20]}
|
| 191 |
+
# Light hunt band — 2× e1 — at x=110 on the centre lane.
|
| 192 |
+
# Each hunt unit attacks its own nearest foe; the proc on
|
| 193 |
+
# the centre lane is the front piece, so both rifles
|
| 194 |
+
# converge on it first. The fact in the NW corner is OFF
|
| 195 |
+
# the engagement axis (~108 cells away vs the proc's
|
| 196 |
+
# ~50 cells), so the proc absorbs the initial salvo and
|
| 197 |
+
# the fact survives well past the win check.
|
| 198 |
+
- {type: e1, owner: enemy, position: [110, 20], stance: 3, count: 2}
|
| 199 |
+
# Anti-DRAW marker: unarmed enemy fact far east keeps the
|
| 200 |
+
# episode alive past hunt-band death so the win/fail
|
| 201 |
+
# evaluation actually fires (CLAUDE.md: engine auto-
|
| 202 |
+
# `done`s on enemy-elim once the last MustBeDestroyed
|
| 203 |
+
# enemy building falls).
|
| 204 |
+
- {type: fact, owner: enemy, position: [125, 20]}
|
| 205 |
+
win_condition:
|
| 206 |
+
all_of:
|
| 207 |
+
# New proc at the safe target region (north shoulder
|
| 208 |
+
# around (16, 8), radius 6 — generous enough to admit
|
| 209 |
+
# nearby legal cells but tight enough that placing on
|
| 210 |
+
# the y=20 lane fails the clause).
|
| 211 |
+
- building_in_region: {type: proc, x: 16, y: 8, radius: 6, count: 1}
|
| 212 |
+
# Construction Yard still alive (present-tense fact
|
| 213 |
+
# predicate; `has_building` is one-shot and would stay
|
| 214 |
+
# true after the fact's destruction — CLAUDE.md footgun).
|
| 215 |
+
- building_count_gte: {type: fact, n: 1}
|
| 216 |
+
- within_ticks: 4500
|
| 217 |
+
# Fail bites at tick 4501 — reachable inside 60 turns in
|
| 218 |
+
# interrupt mode (empirically the stall path reaches ~tick 4698
|
| 219 |
+
# at turn 60 due to mid-episode event-shortened steps).
|
| 220 |
+
fail_condition:
|
| 221 |
+
any_of:
|
| 222 |
+
- after_ticks: 4501
|
| 223 |
+
- not: {building_count_gte: {type: fact, n: 1}}
|
| 224 |
+
max_turns: 60
|
| 225 |
+
|
| 226 |
+
# ── MEDIUM ────────────────────────────────────────────────────────
|
| 227 |
+
# +1 controlled variable: TIGHTER cash + LARGER hunt band. Starting
|
| 228 |
+
# cash 700 = exactly the sell refund — the agent MUST sell to fund
|
| 229 |
+
# the new proc (cash 700 is half the proc cost). The heavier band
|
| 230 |
+
# (3× e1) raises the urgency: the proc falls in ~20 turns instead
|
| 231 |
+
# of ~25-30. Same tick budget so the win window is tighter against
|
| 232 |
+
# the same after_ticks 4501 fail.
|
| 233 |
+
medium:
|
| 234 |
+
description: >
|
| 235 |
+
You own a Construction Yard and a Power Plant at the safe
|
| 236 |
+
far north-west corner and a forward Refinery at the centre
|
| 237 |
+
lane. A heavier hunt band (3 rifle infantry) is incoming
|
| 238 |
+
and will raze the refinery faster (~20 turns). Your
|
| 239 |
+
starting cash is 700 — exactly the sell refund of a
|
| 240 |
+
refinery, half the build cost. You MUST sell the exposed
|
| 241 |
+
refinery to free the second half of the cash, then build a
|
| 242 |
+
new refinery at the safe target region around (16, 8). Win
|
| 243 |
+
by having a refinery at the safe region AND the Construction
|
| 244 |
+
Yard still alive AND before tick 4500. Stalling, building
|
| 245 |
+
without selling (cash blocks the build), or placing the new
|
| 246 |
+
refinery in the central lane all lose.
|
| 247 |
+
starting_cash: 700
|
| 248 |
+
overrides:
|
| 249 |
+
actors:
|
| 250 |
+
- {type: fact, owner: agent, position: [4, 4]}
|
| 251 |
+
- {type: powr, owner: agent, position: [4, 9]}
|
| 252 |
+
- {type: proc, owner: agent, position: [60, 20]}
|
| 253 |
+
# Heavier band: 3× e1 (still no e3 — pure-rifle keeps the
|
| 254 |
+
# band's eastern walk-time consistent with easy; the
|
| 255 |
+
# extra rifle just shortens the proc-razing window).
|
| 256 |
+
- {type: e1, owner: enemy, position: [110, 20], stance: 3, count: 3}
|
| 257 |
+
- {type: fact, owner: enemy, position: [125, 20]}
|
| 258 |
+
win_condition:
|
| 259 |
+
all_of:
|
| 260 |
+
- building_in_region: {type: proc, x: 16, y: 8, radius: 6, count: 1}
|
| 261 |
+
- building_count_gte: {type: fact, n: 1}
|
| 262 |
+
- within_ticks: 4500
|
| 263 |
+
fail_condition:
|
| 264 |
+
any_of:
|
| 265 |
+
- after_ticks: 4501
|
| 266 |
+
- not: {building_count_gte: {type: fact, n: 1}}
|
| 267 |
+
max_turns: 60
|
| 268 |
+
|
| 269 |
+
# ── HARD ──────────────────────────────────────────────────────────
|
| 270 |
+
# +1 controlled variable: TWO spawn_point groups (NORTH y=4 vs
|
| 271 |
+
# SOUTH y=36 base corner) round-robined by seed, with the SAME hunt
|
| 272 |
+
# band so each spawn faces an equivalent threat. The safe target
|
| 273 |
+
# region for each spawn group shifts to the SAME y-band as that
|
| 274 |
+
# spawn's fact (so the relocate is always "stay on YOUR latitude").
|
| 275 |
+
# Tight cash 700 still requires the sell. A memorised "build at
|
| 276 |
+
# (16, 8)" cell loses on the SOUTH spawn (the safe region there
|
| 277 |
+
# is (16, 36) not (16, 8)).
|
| 278 |
+
hard:
|
| 279 |
+
description: >
|
| 280 |
+
Your base stages from a seed-chosen latitude (NORTH y=4 or
|
| 281 |
+
SOUTH y=36) — a single memorised target cell cannot
|
| 282 |
+
generalise. You own a Construction Yard and a Power Plant
|
| 283 |
+
at the safe corner of your latitude and a forward Refinery
|
| 284 |
+
at the centre lane. Hunt bands are incoming on the centre
|
| 285 |
+
lane and will raze the refinery within ~20 turns. Your
|
| 286 |
+
starting cash is 700 — exactly half the refinery build
|
| 287 |
+
cost. You MUST sell the exposed refinery and use the
|
| 288 |
+
recouped cash to build a new refinery at the safe target
|
| 289 |
+
region of your OWN latitude (around (16, 8) for the NORTH
|
| 290 |
+
spawn, (16, 36) for the SOUTH spawn). Win by having a
|
| 291 |
+
refinery at the safe region of your latitude AND the
|
| 292 |
+
Construction Yard still alive AND before tick 4500.
|
| 293 |
+
starting_cash: 700
|
| 294 |
+
overrides:
|
| 295 |
+
actors:
|
| 296 |
+
# NORTH spawn (spawn_point 0): fact + powr at the safe
|
| 297 |
+
# NW corner, proc forward on the central lane.
|
| 298 |
+
- {type: fact, owner: agent, position: [4, 4], spawn_point: 0}
|
| 299 |
+
- {type: powr, owner: agent, position: [4, 9], spawn_point: 0}
|
| 300 |
+
- {type: proc, owner: agent, position: [60, 20], spawn_point: 0}
|
| 301 |
+
# Inert HoldFire rifle at the matching safe shoulder so the
|
| 302 |
+
# spawn variation surfaces in `units_summary` (the hard-tier
|
| 303 |
+
# contract test asserts that different seeds produce
|
| 304 |
+
# different agent starts; otherwise this pack is buildings-
|
| 305 |
+
# only and the spawn round-robin is invisible to the
|
| 306 |
+
# observation channel).
|
| 307 |
+
- {type: e1, owner: agent, position: [16, 8], stance: 0, spawn_point: 0}
|
| 308 |
+
# SOUTH spawn (spawn_point 1): fact + powr at the safe
|
| 309 |
+
# SW corner, proc forward on the SAME central lane (the
|
| 310 |
+
# threat axis is symmetric across y=20).
|
| 311 |
+
- {type: fact, owner: agent, position: [4, 36], spawn_point: 1}
|
| 312 |
+
- {type: powr, owner: agent, position: [4, 31], spawn_point: 1}
|
| 313 |
+
- {type: proc, owner: agent, position: [60, 20], spawn_point: 1}
|
| 314 |
+
- {type: e1, owner: agent, position: [16, 36], stance: 0, spawn_point: 1}
|
| 315 |
+
# Hunt band on the centre lane — enemy actors don't honour
|
| 316 |
+
# spawn_point (CLAUDE.md oramap.rs footgun) so they always
|
| 317 |
+
# place regardless of seed. Each hunt unit picks its own
|
| 318 |
+
# nearest foe; the proc on (60,20) is the front piece for
|
| 319 |
+
# BOTH spawns (the fact in either NW or SW corner is far
|
| 320 |
+
# off-axis), so the proc absorbs the salvo first and the
|
| 321 |
+
# active-spawn fact survives well past the win check.
|
| 322 |
+
- {type: e1, owner: enemy, position: [110, 20], stance: 3, count: 3}
|
| 323 |
+
# Anti-DRAW marker at the centre lane far east.
|
| 324 |
+
- {type: fact, owner: enemy, position: [125, 20]}
|
| 325 |
+
# Spawn-matched win: the proc must land in the SAFE region of the
|
| 326 |
+
# active spawn's latitude. The `any_of` pairs each NORTH/SOUTH
|
| 327 |
+
# safe region with a `building_in_region` check on the active
|
| 328 |
+
# fact's corner — so the NORTH-disc clause only fires when the
|
| 329 |
+
# NORTH fact is alive (NORTH spawn) and the SOUTH-disc clause
|
| 330 |
+
# only fires when the SOUTH fact is alive (SOUTH spawn). A
|
| 331 |
+
# memorised "always place at (16, 8)" cell satisfies the NORTH
|
| 332 |
+
# clause on NORTH seeds but FAILS on SOUTH seeds (no NORTH fact
|
| 333 |
+
# ⇒ NORTH-pair fails; no SOUTH proc ⇒ SOUTH-pair fails).
|
| 334 |
+
win_condition:
|
| 335 |
+
all_of:
|
| 336 |
+
- any_of:
|
| 337 |
+
# NORTH spawn (fact at (4, 4)) ⇒ matching safe region
|
| 338 |
+
# for the new proc is the NW shoulder around (16, 8).
|
| 339 |
+
- all_of:
|
| 340 |
+
- building_in_region: {type: fact, x: 4, y: 4, radius: 4, count: 1}
|
| 341 |
+
- building_in_region: {type: proc, x: 16, y: 8, radius: 6, count: 1}
|
| 342 |
+
# SOUTH spawn (fact at (4, 36)) ⇒ matching safe region
|
| 343 |
+
# for the new proc is the SW shoulder around (16, 36).
|
| 344 |
+
- all_of:
|
| 345 |
+
- building_in_region: {type: fact, x: 4, y: 36, radius: 4, count: 1}
|
| 346 |
+
- building_in_region: {type: proc, x: 16, y: 36, radius: 6, count: 1}
|
| 347 |
+
- building_count_gte: {type: fact, n: 1}
|
| 348 |
+
- within_ticks: 4500
|
| 349 |
+
fail_condition:
|
| 350 |
+
any_of:
|
| 351 |
+
- after_ticks: 4501
|
| 352 |
+
- not: {building_count_gte: {type: fact, n: 1}}
|
| 353 |
+
max_turns: 60
|
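The hard tier's spawn-matched win clause can be illustrated with a minimal sketch. This is not the `openra_bench` evaluator: `in_region` and `hard_win` are hypothetical helpers, the building records are plain dicts invented for the example, and two details are assumptions rather than confirmed engine behaviour: the region test uses Euclidean distance, and `within_ticks: 4500` is read as `tick <= 4500`.

```python
from math import hypot


def in_region(buildings, btype, x, y, radius, count=1):
    """Hypothetical stand-in for building_in_region (Euclidean metric assumed)."""
    hits = sum(
        1
        for b in buildings
        if b["type"] == btype and hypot(b["x"] - x, b["y"] - y) <= radius
    )
    return hits >= count


def hard_win(buildings, tick):
    """Mirror of the hard-tier all_of / any_of structure from the pack."""
    north = in_region(buildings, "fact", 4, 4, 4) and in_region(buildings, "proc", 16, 8, 6)
    south = in_region(buildings, "fact", 4, 36, 4) and in_region(buildings, "proc", 16, 36, 6)
    fact_alive = any(b["type"] == "fact" for b in buildings)
    return (north or south) and fact_alive and tick <= 4500


# SOUTH spawn with a memorised NORTH placement: neither pair is satisfied.
south_memorised = [{"type": "fact", "x": 4, "y": 36}, {"type": "proc", "x": 16, "y": 8}]
# SOUTH spawn with the matching placement.
south_matched = [{"type": "fact", "x": 4, "y": 36}, {"type": "proc", "x": 16, "y": 36}]
print(hard_win(south_memorised, tick=3000), hard_win(south_matched, tick=3000))
```

Pairing each safe region with a check on that spawn's fact is what prevents a single memorised cell from winning on both seeds.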
tests/test_build_sell_and_rebuild_elsewhere.py
ADDED
|
@@ -0,0 +1,548 @@
|
| 1 |
+
"""build-sell-and-rebuild-elsewhere pack — no-cheat validation on Rust.
|
| 2 |
+
|
| 3 |
+
Wave-8 capital reallocation pack. The pack tests SELL-AND-REBUILD as
|
| 4 |
+
a reasoning primitive: the agent's exposed refinery (proc) on the
|
| 5 |
+
centre lane will be razed by a `hunt` band, and starting cash alone
|
| 6 |
+
is NOT enough to build a new proc. The only path to a fresh proc at
|
| 7 |
+
the safe target region inside the tick budget is:
|
| 8 |
+
|
| 9 |
+
1. `sell(proc_id)` ⇒ refunds 50% of proc cost (700)
|
| 10 |
+
2. `build('proc')` + `place_building(proc, x, y)` in the safe region
|
| 11 |
+
|
| 12 |
+
The win predicate makes the SELL load-bearing:
|
| 13 |
+
|
| 14 |
+
* `building_in_region:{type:proc, x:safe_x, y:safe_y, radius:6, count:1}`
|
| 15 |
+
⇒ a fresh proc must STAND at the safe target region (not the centre
|
| 16 |
+
lane; not anywhere outside the radius);
|
| 17 |
+
* `building_count_gte:{type:fact, n:1}` ⇒ the Construction Yard must
|
| 18 |
+
still be alive (the PRESENT-TENSE predicate, not `has_building:fact`
|
| 19 |
+
which is a one-shot ever-seen set — CLAUDE.md footgun);
|
| 20 |
+
* `within_ticks: 4500` paired with `after_ticks: 4501` in fail ⇒ the
|
| 21 |
+
episode end is a real reachable timeout LOSS, never a draw.
|
| 22 |
+
|
| 23 |
+
The scripted-policy validations prove deterministically that:
|
| 24 |
+
|
| 25 |
+
* the intended SELL-THEN-REBUILD policy WINS every (level, seed);
|
| 26 |
+
* stall (observe only), build-without-selling (cash gated), and
|
| 27 |
+
sell-then-misplace (new proc on the y=20 lane) all LOSE every
|
| 28 |
+
(level, seed) — real LOSS, not draw;
|
| 29 |
+
* the hard tier defines ≥2 spawn_point groups (NORTH y=4 / SOUTH
|
| 30 |
+
y=36) so a memorised "place at (16, 8)" cell that worked on
|
| 31 |
+
easy/medium FAILS on the SOUTH spawn (the matching safe region
|
| 32 |
+
there is (16, 36)).
|
| 33 |
+
|
| 34 |
+
NOTE on building ids: `sell` requires the real engine actor id
|
| 35 |
+
(e.g. `1003`), which the bench's `render_state["own_buildings"]`
|
| 36 |
+
strips. The scripted policies below reach into `_raw["own_buildings"]`
|
| 37 |
+
(via a small custom episode loop) to look up the proc id by cell.
|
| 38 |
+
The model-evaluation path is a separate concern: the model issues
|
| 39 |
+
sell-like reasoning and the win predicate is what actually grades
|
| 40 |
+
the outcome (real proc presence at the safe region).
|
| 41 |
+
"""
|
| 42 |
+
|
| 43 |
+
from __future__ import annotations
|
| 44 |
+
|
| 45 |
+
from pathlib import Path
|
| 46 |
+
|
| 47 |
+
import pytest
|
| 48 |
+
|
| 49 |
+
pytest.importorskip("openra_train", reason="Rust env wheel not installed")
|
| 50 |
+
pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
|
| 51 |
+
|
| 52 |
+
from openra_bench.eval_core import (
|
| 53 |
+
RustEnvPool,
|
| 54 |
+
_scenario_to_tmp_yaml,
|
| 55 |
+
run_level,
|
| 56 |
+
)
|
| 57 |
+
from openra_bench.rust_adapter import RustObsAdapter
|
| 58 |
+
from openra_bench.scenarios import load_pack
|
| 59 |
+
from openra_bench.scenarios.loader import PACKS_DIR, compile_level
|
| 60 |
+
from openra_bench.scenarios.win_conditions import WinContext, evaluate
|
| 61 |
+
|
| 62 |
+
PACK = PACKS_DIR / "build-sell-and-rebuild-elsewhere.yaml"
|
| 63 |
+
LEVELS = ("easy", "medium", "hard")
|
| 64 |
+
SEEDS = (1, 2, 3, 4)
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
# ── custom episode loop with raw building-id access ──────────────────
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
def _run_with_id_aware_policy(compiled, policy, seed):
|
| 71 |
+
"""Run an episode where the policy is given (rs, raw, Command) and
|
| 72 |
+
can read `raw["own_buildings"][i]["id"]` (the real engine actor
|
| 73 |
+
id). Mirrors `run_level`'s win/fail/draw scoring without changing
|
| 74 |
+
the standard policy contract.
|
| 75 |
+
"""
|
| 76 |
+
tmp = _scenario_to_tmp_yaml(compiled)
|
| 77 |
+
pool = RustEnvPool(size=1, scenario_path=tmp)
|
| 78 |
+
env = pool.acquire()
|
| 79 |
+
try:
|
| 80 |
+
adapter = RustObsAdapter()
|
| 81 |
+
adapter.observe(env.reset(seed=seed))
|
| 82 |
+
outcome = "draw"
|
| 83 |
+
turns = 0
|
| 84 |
+
for turns in range(1, compiled.max_turns + 1):
|
| 85 |
+
rs = adapter.render_state()
|
| 86 |
+
raw = adapter._raw # for building ids
|
| 87 |
+
cmds = policy(rs, raw, env.Command) or [env.Command.observe()]
|
| 88 |
+
obs, _r, done, _info = env.step(cmds)
|
| 89 |
+
adapter.observe(obs, done=done)
|
| 90 |
+
ctx = WinContext(
|
| 91 |
+
signals=adapter.signals,
|
| 92 |
+
render_state=adapter.render_state(),
|
| 93 |
+
)
|
| 94 |
+
if evaluate(compiled.win_condition, ctx):
|
| 95 |
+
outcome = "win"
|
| 96 |
+
break
|
| 97 |
+
if evaluate(compiled.fail_condition, ctx):
|
| 98 |
+
outcome = "loss"
|
| 99 |
+
break
|
| 100 |
+
if done:
|
| 101 |
+
break
|
| 102 |
+
return outcome, turns, adapter.signals
|
| 103 |
+
finally:
|
| 104 |
+
pool.release(env)
|
| 105 |
+
pool.shutdown()
|
| 106 |
+
Path(tmp).unlink(missing_ok=True)
|
| 107 |
+
|
| 108 |
+
|
| 109 |
+
def _proc_id_at(raw, y):
|
| 110 |
+
"""Lookup the engine actor id of an own proc at the given y. None
|
| 111 |
+
if no matching proc is alive."""
|
| 112 |
+
for b in (raw.get("own_buildings") or []):
|
| 113 |
+
if b.get("type") == "proc" and int(b.get("cell_y", -1)) == y:
|
| 114 |
+
return str(b["id"])
|
| 115 |
+
return None
|
| 116 |
+
|
| 117 |
+
|
| 118 |
+
def _own_proc_at(raw, y_band):
|
| 119 |
+
"""Any own proc inside ``y_band`` (a (lo, hi) inclusive interval)."""
|
| 120 |
+
for b in (raw.get("own_buildings") or []):
|
| 121 |
+
if (
|
| 122 |
+
b.get("type") == "proc"
|
| 123 |
+
and y_band[0] <= int(b.get("cell_y", -1)) <= y_band[1]
|
| 124 |
+
):
|
| 125 |
+
return b
|
| 126 |
+
return None
|
| 127 |
+
|
| 128 |
+
|
| 129 |
+
def _fact_y(raw):
|
| 130 |
+
"""Latitude of the agent's fact (4 on easy/medium / 4 on hard NORTH /
|
| 131 |
+
36 on hard SOUTH). Used by the hard-tier intended policy to pick
|
| 132 |
+
the matching safe target region."""
|
| 133 |
+
for b in (raw.get("own_buildings") or []):
|
| 134 |
+
if b.get("type") == "fact":
|
| 135 |
+
return int(b.get("cell_y", 4))
|
| 136 |
+
return 4
|
| 137 |
+
|
| 138 |
+
|
| 139 |
+
# ── scripted policies ───────────────────────────────────────────────
|
| 140 |
+
|
| 141 |
+
|
| 142 |
+
def stall(rs, C):
|
| 143 |
+
"""Observe-only — proc razed, no new proc placed. LOSS."""
|
| 144 |
+
return [C.observe()]
|
| 145 |
+
|
| 146 |
+
|
| 147 |
+
def make_build_without_selling(safe_x=16, safe_y=8):
|
| 148 |
+
"""Try to BUILD + PLACE a new proc WITHOUT selling the exposed one.
|
| 149 |
+
Cash starts at 700/800 — well under the 1400 build cost — so the
|
| 150 |
+
`build('proc')` order never progresses to completion (no income
|
| 151 |
+
source). No new proc ⇒ region clause unmet ⇒ LOSS.
|
| 152 |
+
|
| 153 |
+
NOTE: a build queued with insufficient cash is silently ignored by the engine
|
| 154 |
+
(production gates on cash ≥ cost); the build never progresses.
|
| 155 |
+
"""
|
| 156 |
+
|
| 157 |
+
def policy(rs, raw, C):
|
| 158 |
+
cmds = []
|
| 159 |
+
# Find any safe-region proc to terminate early once present.
|
| 160 |
+
if any(
|
| 161 |
+
b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
|
| 162 |
+
for b in (raw.get("own_buildings") or [])
|
| 163 |
+
):
|
| 164 |
+
return [C.observe()]
|
| 165 |
+
prod_items = [
|
| 166 |
+
(p.get("item") if isinstance(p, dict) else p)
|
| 167 |
+
for p in (rs.get("production") or [])
|
| 168 |
+
]
|
| 169 |
+
if "proc" not in prod_items:
|
| 170 |
+
cmds.append(C.build("proc"))
|
| 171 |
+
cmds.append(C.place_building("proc", safe_x, safe_y))
|
| 172 |
+
return cmds or [C.observe()]
|
| 173 |
+
|
| 174 |
+
return policy
|
| 175 |
+
|
| 176 |
+
|
| 177 |
+
def make_sell_then_misplace(safe_x_wrong=60, safe_y_wrong=20):
|
| 178 |
+
"""SELL the exposed proc (refund + cash buys a new proc) but
|
| 179 |
+
PLACE the new proc back IN THE CENTRE LANE — outside the safe
|
| 180 |
+
target region disc. The new proc satisfies `building_count_gte`
|
| 181 |
+
but NOT `building_in_region` — LOSS.
|
| 182 |
+
"""
|
| 183 |
+
state = {"sold": False}
|
| 184 |
+
|
| 185 |
+
def policy(rs, raw, C):
|
| 186 |
+
cmds = []
|
| 187 |
+
if not state["sold"]:
|
| 188 |
+
pid = _proc_id_at(raw, 20)
|
| 189 |
+
if pid:
|
| 190 |
+
cmds.append(C.sell([pid]))
|
| 191 |
+
state["sold"] = True
|
| 192 |
+
# No existing safe-region proc — but we deliberately place
|
| 193 |
+
# back on the y=20 lane to demonstrate the misplace cost.
|
| 194 |
+
prod_items = [
|
| 195 |
+
(p.get("item") if isinstance(p, dict) else p)
|
| 196 |
+
for p in (rs.get("production") or [])
|
| 197 |
+
]
|
| 198 |
+
if "proc" not in prod_items:
|
| 199 |
+
cmds.append(C.build("proc"))
|
| 200 |
+
cmds.append(C.place_building("proc", safe_x_wrong, safe_y_wrong))
|
| 201 |
+
return cmds or [C.observe()]
|
| 202 |
+
|
| 203 |
+
return policy
|
| 204 |
+
|
| 205 |
+
|
| 206 |
+
def make_intended_easy_medium(safe_x=16, safe_y=8):
|
| 207 |
+
"""Intended SELL-THEN-REBUILD play for easy/medium (fact at
|
| 208 |
+
(4, 4) so safe region is (16, 8)).
|
| 209 |
+
|
| 210 |
+
Turn 1: sell the exposed proc (refunds 700, total cash → 1400-1500).
|
| 211 |
+
Continuously: queue `build('proc')` + `place_building` at the
|
| 212 |
+
safe target region. The build completes ~1400 ticks after queue;
|
| 213 |
+
place_building lands at (16, 8). Win clause fires.
|
| 214 |
+
"""
|
| 215 |
+
state = {"sold": False}
|
| 216 |
+
|
| 217 |
+
def policy(rs, raw, C):
|
| 218 |
+
cmds = []
|
| 219 |
+
if not state["sold"]:
|
| 220 |
+
pid = _proc_id_at(raw, 20)
|
| 221 |
+
if pid:
|
| 222 |
+
cmds.append(C.sell([pid]))
|
| 223 |
+
state["sold"] = True
|
| 224 |
+
# Skip if the safe-region proc already exists.
|
| 225 |
+
if any(
|
| 226 |
+
b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
|
| 227 |
+
for b in (raw.get("own_buildings") or [])
|
| 228 |
+
):
|
| 229 |
+
return cmds or [C.observe()]
|
| 230 |
+
prod_items = [
|
| 231 |
+
(p.get("item") if isinstance(p, dict) else p)
|
| 232 |
+
for p in (rs.get("production") or [])
|
| 233 |
+
]
|
| 234 |
+
if "proc" not in prod_items:
|
| 235 |
+
cmds.append(C.build("proc"))
|
| 236 |
+
cmds.append(C.place_building("proc", safe_x, safe_y))
|
| 237 |
+
return cmds or [C.observe()]
|
| 238 |
+
|
| 239 |
+
return policy
|
| 240 |
+
|
| 241 |
+
|
| 242 |
+
def make_intended_hard_adaptive():
|
| 243 |
+
"""Intended SELL-THEN-REBUILD play for hard (fact at either y=4 or
|
| 244 |
+
y=36 by seed). Reads the fact's actual y from the observation on
|
| 245 |
+
turn 1, then places the new proc at the matching safe region —
|
| 246 |
+
(16, 8) for NORTH spawn, (16, 36) for SOUTH spawn.
|
| 247 |
+
"""
|
| 248 |
+
state = {"sold": False, "safe_xy": None}
|
| 249 |
+
|
| 250 |
+
def policy(rs, raw, C):
|
| 251 |
+
if state["safe_xy"] is None:
|
| 252 |
+
fy = _fact_y(raw)
|
| 253 |
+
state["safe_xy"] = (16, 8 if fy < 20 else 36)
|
| 254 |
+
sx, sy = state["safe_xy"]
|
| 255 |
+
cmds = []
|
| 256 |
+
if not state["sold"]:
|
| 257 |
+
pid = _proc_id_at(raw, 20)
|
| 258 |
+
if pid:
|
| 259 |
+
cmds.append(C.sell([pid]))
|
| 260 |
+
state["sold"] = True
|
| 261 |
+
# Skip if the safe-region proc already exists.
|
| 262 |
+
if any(
|
| 263 |
+
b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
|
| 264 |
+
for b in (raw.get("own_buildings") or [])
|
| 265 |
+
):
|
| 266 |
+
return cmds or [C.observe()]
|
| 267 |
+
prod_items = [
|
| 268 |
+
(p.get("item") if isinstance(p, dict) else p)
|
| 269 |
+
for p in (rs.get("production") or [])
|
| 270 |
+
]
|
| 271 |
+
if "proc" not in prod_items:
|
| 272 |
+
cmds.append(C.build("proc"))
|
| 273 |
+
cmds.append(C.place_building("proc", sx, sy))
|
| 274 |
+
return cmds or [C.observe()]
|
| 275 |
+
|
| 276 |
+
return policy
|
| 277 |
+
|
| 278 |
+
|
| 279 |
+
def make_memorised_north_only():
|
| 280 |
+
"""Naive: always place at (16, 8) (the easy/medium safe region).
|
| 281 |
+
On hard SOUTH spawn (fact at y=36), the safe region is (16, 36),
|
| 282 |
+
so a place at (16, 8) lands outside the matching radius-6 SOUTH
|
| 283 |
+
disc AND leaves the NORTH fact-corner clause unmet ⇒ LOSS on SOUTH seeds.
|
| 284 |
+
"""
|
| 285 |
+
state = {"sold": False}
|
| 286 |
+
|
| 287 |
+
def policy(rs, raw, C):
|
| 288 |
+
cmds = []
|
| 289 |
+
if not state["sold"]:
|
| 290 |
+
pid = _proc_id_at(raw, 20)
|
| 291 |
+
if pid:
|
| 292 |
+
cmds.append(C.sell([pid]))
|
| 293 |
+
state["sold"] = True
|
| 294 |
+
if any(
|
| 295 |
+
b.get("type") == "proc" and int(b.get("cell_y", -1)) != 20
|
| 296 |
+
for b in (raw.get("own_buildings") or [])
|
| 297 |
+
):
|
| 298 |
+
return cmds or [C.observe()]
|
| 299 |
+
prod_items = [
|
| 300 |
+
(p.get("item") if isinstance(p, dict) else p)
|
| 301 |
+
for p in (rs.get("production") or [])
|
| 302 |
+
]
|
| 303 |
+
if "proc" not in prod_items:
|
| 304 |
+
cmds.append(C.build("proc"))
|
| 305 |
+
cmds.append(C.place_building("proc", 16, 8))
|
| 306 |
+
return cmds or [C.observe()]
|
| 307 |
+
|
| 308 |
+
return policy
|
| 309 |
+
|
| 310 |
+
|
| 311 |
+
# ── scenario-shape invariants ───────────────────────────────────────
|
| 312 |
+
|
| 313 |
+
|
| 314 |
+
def test_pack_compiles_with_three_levels_and_hunt_bot():
|
| 315 |
+
pack = load_pack(PACK)
|
| 316 |
+
assert pack.meta.id == "build-sell-and-rebuild-elsewhere"
|
| 317 |
+
assert pack.meta.capability == "reasoning"
|
| 318 |
+
assert set(pack.levels) == {"easy", "medium", "hard"}
|
| 319 |
+
# Required-by-spec benchmark anchors (capital reallocation idiom).
|
| 320 |
+
anchors = [a.lower() for a in pack.meta.benchmark_anchor]
|
| 321 |
+
assert any("capital reallocation" in a for a in anchors), pack.meta.benchmark_anchor
|
| 322 |
+
assert any("sc2 sell mechanic" in a for a in anchors), pack.meta.benchmark_anchor
|
| 323 |
+
assert any(
|
| 324 |
+
"financial reallocation" in a for a in anchors
|
| 325 |
+
), pack.meta.benchmark_anchor
|
| 326 |
+
# Hunt bot is wired through to the engine for every level (per-unit
|
| 327 |
+
# nearest-foe targeting, so the proc on the centre lane is the
|
| 328 |
+
# front piece, not the off-axis fact — see pack header).
|
| 329 |
+
for lvl in LEVELS:
|
| 330 |
+
c = compile_level(pack, lvl)
|
| 331 |
+
assert c.map_supported
|
| 332 |
+
bot = getattr(c.scenario.enemy, "bot_type", None) or getattr(
|
| 333 |
+
c.scenario.enemy, "bot", None
|
| 334 |
+
)
|
| 335 |
+
assert str(bot).lower() == "hunt", (lvl, bot)
|
| 336 |
+
|
| 337 |
+
|
| 338 |
+
def test_sell_is_exposed_in_the_tool_palette():
|
| 339 |
+
"""`sell` is the load-bearing primitive — the pack would be
|
| 340 |
+
unsolvable without it (build('proc') is cash-gated, the agent has
|
| 341 |
+
no income source, the exposed proc would just be razed)."""
|
| 342 |
+
pack = load_pack(PACK)
|
| 343 |
+
for lvl in LEVELS:
|
| 344 |
+
c = compile_level(pack, lvl)
|
| 345 |
+
tools = set(getattr(c.scenario, "tools", None) or [])
|
| 346 |
+
assert "sell" in tools, (lvl, tools)
|
| 347 |
+
assert "build" in tools, (lvl, tools)
|
| 348 |
+
assert "place_building" in tools, (lvl, tools)
|
| 349 |
+
|
| 350 |
+
|
| 351 |
+
def test_starting_cash_is_below_proc_build_cost_on_every_tier():
|
| 352 |
+
"""Cash + sell-refund must just barely cover the proc rebuild
|
| 353 |
+
(cash 700-800; refund 700; proc cost 1400). Without the refund
|
| 354 |
+
the cash alone falls short — that gap is the load-bearing
|
| 355 |
+
discrimination."""
|
| 356 |
+
pack = load_pack(PACK)
|
| 357 |
+
for lvl in LEVELS:
|
| 358 |
+
c = compile_level(pack, lvl)
|
| 359 |
+
# cash < 1400 (proc cost) ⇒ build-without-selling is impossible.
|
| 360 |
+
assert c.starting_cash < 1400, (lvl, c.starting_cash)
|
| 361 |
+
# cash + 700 refund ≥ 1400 ⇒ sell-then-rebuild is feasible.
|
| 362 |
+
assert c.starting_cash + 700 >= 1400, (lvl, c.starting_cash)
|
| 363 |
+
|
| 364 |
+
|
| 365 |
+
@pytest.mark.parametrize("level", LEVELS)
|
| 366 |
+
def test_every_level_has_a_reachable_timeout_fail(level):
|
| 367 |
+
"""Non-win must be a real LOSS: the `after_ticks` fail clause must
|
| 368 |
+
be strictly below the tick reachable at max_turns (interrupt mode
|
| 369 |
+
advances ≤90 ticks per step; some steps are shorter due to
|
| 370 |
+
enemy_unit_spotted events, so the empirical reachable tick is
|
| 371 |
+
~4698 at 60 turns)."""
|
| 372 |
+
c = compile_level(load_pack(PACK), level)
|
| 373 |
+
assert c.fail_condition is not None
|
| 374 |
+
fc = c.fail_condition.model_dump(exclude_none=True)
|
| 375 |
+
deadline = None
|
| 376 |
+
for clause in fc.get("any_of", []) or []:
|
| 377 |
+
if "after_ticks" in clause:
|
| 378 |
+
deadline = int(clause["after_ticks"])
|
| 379 |
+
assert deadline is not None, f"{level}: no after_ticks fail clause"
|
| 380 |
+
# 60 turns × ~78 ticks/turn (event-shortened) ≈ 4680; 4501
|
| 381 |
+
# deadline reliably bites.
|
| 382 |
+
assert deadline < 4700, (
|
| 383 |
+
f"{level}: deadline {deadline} unreachable within {c.max_turns} "
|
| 384 |
+
f"turns (interrupt mode ≈ 4680 max tick) → draw degeneracy"
|
| 385 |
+
)
|
| 386 |
+
|
| 387 |
+
|
| 388 |
+
def test_fact_alive_clause_uses_present_tense_predicate():
|
| 389 |
+
"""The fact-survival clause must use the PRESENT-TENSE predicate
|
| 390 |
+
(`building_count_gte:{type:fact,n:1}`) rather than `has_building`,
|
| 391 |
+
which is a one-shot "ever seen" set that stays true after the
|
| 392 |
+
fact is destroyed (a documented CLAUDE.md footgun)."""
|
| 393 |
+
for lvl in LEVELS:
|
| 394 |
+
c = compile_level(load_pack(PACK), lvl)
|
| 395 |
+
fc = c.fail_condition.model_dump(exclude_none=True)
|
| 396 |
+
fact_clauses = [
|
| 397 |
+
clause for clause in fc.get("any_of", []) or []
|
| 398 |
+
if isinstance(clause, dict)
|
| 399 |
+
and isinstance(clause.get("not"), dict)
|
| 400 |
+
and "building_count_gte" in (clause["not"] or {})
|
| 401 |
+
and (clause["not"]["building_count_gte"] or {}).get("type") == "fact"
|
| 402 |
+
]
|
| 403 |
+
assert fact_clauses, f"{lvl}: missing present-tense fact-alive fail clause"
|
| 404 |
+
|
| 405 |
+
|
| 406 |
+
def test_hard_has_two_spawn_point_groups_and_fact_flips():
|
| 407 |
+
"""Hard-tier contract: ≥2 distinct agent spawn_point groups so the
|
| 408 |
+
fact (and therefore the safe target region for proc placement)
|
| 409 |
+
flips by seed. The two groups must define the NORTH (y=4) and
|
| 410 |
+
SOUTH (y=36) fact pair."""
|
| 411 |
+
c = compile_level(load_pack(PACK), "hard")
|
| 412 |
+
groups = {
|
| 413 |
+
a.spawn_point for a in c.scenario.actors
|
| 414 |
+
if a.owner == "agent" and a.spawn_point is not None
|
| 415 |
+
}
|
| 416 |
+
assert groups == {0, 1}, groups
|
| 417 |
+
fact_ys = sorted({
|
| 418 |
+
a.position[1] for a in c.scenario.actors
|
| 419 |
+
if a.owner == "agent" and a.type == "fact"
|
| 420 |
+
})
|
| 421 |
+
assert fact_ys == [4, 36], fact_ys
|
| 422 |
+
# In-bounds check (rush-hour-arena playable x ≈ 2..126, y ≈ 2..38).
|
| 423 |
+
for a in c.scenario.actors:
|
| 424 |
+
x, y = a.position
|
| 425 |
+
assert 2 <= x <= 126 and 2 <= y <= 38, (a.type, a.position)
|
| 426 |
+
|
| 427 |
+
|
| 428 |
+
# ── solvency: intended SELL-THEN-REBUILD wins every (level, seed) ────
|
| 429 |
+
|
| 430 |
+
|
| 431 |
+
@pytest.mark.parametrize("level", ("easy", "medium"))
|
| 432 |
+
def test_intended_sell_then_rebuild_wins_easy_medium(level):
|
| 433 |
+
c = compile_level(load_pack(PACK), level)
|
| 434 |
+
for seed in SEEDS:
|
| 435 |
+
outcome, turns, sig = _run_with_id_aware_policy(
|
| 436 |
+
c, make_intended_easy_medium(), seed
|
| 437 |
+
)
|
| 438 |
+
assert outcome == "win", (
|
| 439 |
+
f"{level} seed{seed}: intended SELL-THEN-REBUILD must WIN; "
|
| 440 |
+
f"got {outcome} (tick={sig.game_tick}, "
|
| 441 |
+
f"buildings={sig.own_buildings})"
|
| 442 |
+
)
|
| 443 |
+
|
| 444 |
+
|
| 445 |
+
def test_intended_hard_adaptive_wins_every_seed():
|
| 446 |
+
"""Hard tier: the intended policy must read the fact's latitude
|
| 447 |
+
(NORTH y=4 vs SOUTH y=36) and pick the matching safe target
|
| 448 |
+
region. WINS on every seed."""
|
| 449 |
+
c = compile_level(load_pack(PACK), "hard")
|
| 450 |
+
for seed in SEEDS:
|
| 451 |
+
outcome, turns, sig = _run_with_id_aware_policy(
|
| 452 |
+
c, make_intended_hard_adaptive(), seed
|
| 453 |
+
)
|
| 454 |
+
assert outcome == "win", (
|
| 455 |
+
f"hard seed{seed}: intended adaptive sell-then-rebuild must "
|
| 456 |
+
f"WIN; got {outcome} (tick={sig.game_tick}, "
|
| 457 |
+
f"buildings={sig.own_buildings})"
|
| 458 |
+
)
|
| 459 |
+
|
| 460 |
+
|
| 461 |
+
# ── no-cheat: every lazy / wrong policy LOSES (not draws) ────────────
|
| 462 |
+
|
| 463 |
+
|
| 464 |
+
@pytest.mark.parametrize("level", LEVELS)
|
| 465 |
+
def test_stall_loses_every_level_and_seed(level):
|
| 466 |
+
"""STALL: observe only. The hunt band razes the exposed proc and
|
| 467 |
+
the agent never places a new one ⇒ region clause unmet AND clock
|
| 468 |
+
runs out ⇒ real LOSS, not draw."""
|
| 469 |
+
c = compile_level(load_pack(PACK), level)
|
| 470 |
+
for seed in SEEDS:
|
| 471 |
+
r = run_level(c, stall, seed=seed)
|
| 472 |
+
assert r.outcome == "loss", (
|
| 473 |
+
f"{level} seed{seed} stall: must LOSE (real fail, not draw); "
|
| 474 |
+
f"got {r.outcome} (tick={r.signals.game_tick}, "
|
| 475 |
+
f"buildings={r.signals.own_buildings})"
|
| 476 |
+
)
|
| 477 |
+
|
| 478 |
+
|
| 479 |
+
@pytest.mark.parametrize("level", ("easy", "medium"))
|
| 480 |
+
def test_build_without_selling_loses_easy_medium(level):
|
| 481 |
+
"""BUILD WITHOUT SELLING: `build('proc')` is rejected until cash
|
| 482 |
+
≥ 1400; the agent has no income source, the build never starts,
|
| 483 |
+
no proc lands at the safe region ⇒ LOSS."""
|
| 484 |
+
c = compile_level(load_pack(PACK), level)
|
| 485 |
+
for seed in SEEDS:
|
| 486 |
+
outcome, turns, sig = _run_with_id_aware_policy(
|
| 487 |
+
c, make_build_without_selling(), seed
|
| 488 |
+
)
|
| 489 |
+
assert outcome == "loss", (
|
| 490 |
+
f"{level} seed{seed} build-without-selling: must LOSE; "
|
| 491 |
+
f"got {outcome} (tick={sig.game_tick}, "
|
| 492 |
+
f"buildings={sig.own_buildings})"
|
| 493 |
+
)
|
| 494 |
+
|
| 495 |
+
|
| 496 |
+
@pytest.mark.parametrize("level", ("easy", "medium"))
|
| 497 |
+
def test_sell_then_misplace_loses_easy_medium(level):
|
| 498 |
+
"""SELL-THEN-MISPLACE: sells the exposed proc and uses the
|
| 499 |
+
refund to build a NEW proc, but places it back in the central
|
| 500 |
+
lane (y=20) — outside the safe target region disc. Region clause
|
| 501 |
+
unmet ⇒ LOSS."""
|
| 502 |
+
c = compile_level(load_pack(PACK), level)
|
| 503 |
+
for seed in SEEDS:
|
| 504 |
+
outcome, turns, sig = _run_with_id_aware_policy(
|
| 505 |
+
c, make_sell_then_misplace(), seed
|
| 506 |
+
)
|
| 507 |
+
assert outcome == "loss", (
|
| 508 |
+
f"{level} seed{seed} sell-then-misplace: must LOSE; "
|
| 509 |
+
f"got {outcome} (tick={sig.game_tick}, "
|
| 510 |
+
f"buildings={sig.own_buildings})"
|
| 511 |
+
)
|
| 512 |
+
|
| 513 |
+
|
| 514 |
+
def test_memorised_north_only_loses_on_hard_south_seeds():
|
| 515 |
+
"""The non-adaptive "always place at (16, 8)" policy WINS hard
|
| 516 |
+
seeds whose spawn is NORTH (fact at y=4 ⇒ matching safe region is
|
| 517 |
+
(16, 8)) but FAILS hard seeds whose spawn is SOUTH (fact at y=36
|
| 518 |
+
⇒ matching safe region is (16, 36), and (16, 8) is outside the
|
| 519 |
+
SOUTH disc). The spawn-driven discrimination: at least one of
|
| 520 |
+
the 4 hard seeds must LOSE."""
|
| 521 |
+
c = compile_level(load_pack(PACK), "hard")
|
| 522 |
+
losses = 0
|
| 523 |
+
for seed in SEEDS:
|
| 524 |
+
outcome, turns, sig = _run_with_id_aware_policy(
|
| 525 |
+
c, make_memorised_north_only(), seed
|
| 526 |
+
)
|
| 527 |
+
if outcome == "loss":
|
| 528 |
+
losses += 1
|
| 529 |
+
assert losses >= 1, (
|
| 530 |
+
f"hard: memorised-north-only must LOSE on ≥1 of {len(SEEDS)} "
|
| 531 |
+
f"seeds (spawn-driven discrimination); got {losses} losses"
|
| 532 |
+
)
|
| 533 |
+
|
| 534 |
+
|
| 535 |
+
# ── determinism ──────────────────────────────────────────────────────
|
| 536 |
+
|
| 537 |
+
|
| 538 |
+
def test_intended_run_is_deterministic_on_easy():
|
| 539 |
+
c = compile_level(load_pack(PACK), "easy")
|
| 540 |
+
a_outcome, a_turns, a_sig = _run_with_id_aware_policy(
|
| 541 |
+
c, make_intended_easy_medium(), seed=3
|
| 542 |
+
)
|
| 543 |
+
b_outcome, b_turns, b_sig = _run_with_id_aware_policy(
|
| 544 |
+
c, make_intended_easy_medium(), seed=3
|
| 545 |
+
)
|
| 546 |
+
assert (a_outcome, a_turns, a_sig.units_killed) == (
|
| 547 |
+
b_outcome, b_turns, b_sig.units_killed,
|
| 548 |
+
), "same seed must be deterministic"
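The invariants these tests pin down (the cash gate, the timeout reachability, and the hard-tier pairing of fact latitude with safe region) reduce to a few lines of arithmetic. The sketch below is illustrative only and is not part of the commit: the helper names `can_afford_proc`, `in_disc` and `hard_win` are hypothetical, the Euclidean disc metric is an assumption (the engine's region predicate may measure distance differently), and the constants are copied from the docstrings and comments in the diff.

```python
from math import hypot

# Constants quoted in the diff's docstrings and comments.
PROC_COST = 1400       # proc build cost
SELL_REFUND = 700      # 50% refund for selling the exposed proc
SAFE_RADIUS = 6        # "radius-6 disc" around the safe target
DEADLINE = 4501        # after_ticks fail clause mentioned in the comment
REACHABLE_TICK = 60 * 78  # 60 turns x ~78 ticks/turn (event-shortened)


def can_afford_proc(cash: int, sold: bool) -> bool:
    """Cash gate: affordable only once the sell refund is banked."""
    return cash + (SELL_REFUND if sold else 0) >= PROC_COST


def in_disc(pos, centre, radius=SAFE_RADIUS) -> bool:
    """Assumed Euclidean disc test (hypothetical; engine metric may differ)."""
    return hypot(pos[0] - centre[0], pos[1] - centre[1]) <= radius


def hard_win(fact_y: int, proc_pos) -> bool:
    """Paired clauses: the proc must sit in the disc matching the fact's side."""
    north = fact_y == 4 and in_disc(proc_pos, (16, 8))
    south = fact_y == 36 and in_disc(proc_pos, (16, 36))
    return north or south
```

Under these assumptions, a starting cash of 700 or 800 fails the gate until the proc is sold, a proc at (16, 8) satisfies `hard_win` only when the fact is at y=4, and the 4501 deadline sits below the roughly 4680 ticks reachable in 60 turns, so a non-winning run ends as a loss rather than a draw.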
|
tests/test_hard_tier.py
CHANGED
|
@@ -926,6 +926,23 @@ UPGRADED = [
|
|
| 926 |
# targets so the agent's infantry strike force racks up kills
|
| 927 |
# without being attrited in transit.
|
| 928 |
"lh-econ-army-victory",
|
|
| 929 |
]
|
| 930 |
|
| 931 |
# Consciously NOT spawn-varied, with the reason (keeps the curation
|
|
|
|
| 926 |
# targets so the agent's infantry strike force racks up kills
|
| 927 |
# without being attrited in transit.
|
| 928 |
"lh-econ-army-victory",
|
| 929 |
+
# Wave-8 REASONING capital reallocation pack — SC2 sell mechanic
|
| 930 |
+
# for refund / financial CAPEX reallocation / business continuity
|
| 931 |
+
# asset redeployment anchor. Starting cash alone is below the
|
| 932 |
+
# proc rebuild cost, so the agent MUST sell the exposed central-
|
| 933 |
+
# lane proc to recoup 50% capital (refund 700) and use the refund
|
| 934 |
+
# + on-hand cash to build a fresh proc at the safe target region.
|
| 935 |
+
# Hard tier defines two agent spawn_point groups (NORTH base at
|
| 936 |
+
# y=4 / SOUTH base at y=36) round-robined by seed; the win
|
| 937 |
+
# predicate pairs each safe-region clause with the matching
|
| 938 |
+
# spawn-fact's corner clause, so a memorised "always place at
|
| 939 |
+
# (16, 8)" opening wins NORTH seeds by coincidence but loses
|
| 940 |
+
# SOUTH seeds (NORTH-fact-corner clause unmet AND SOUTH-proc-
|
| 941 |
+
# region clause unmet). The central hunt band is symmetric across
|
| 942 |
+
# y=20 (enemy actors don't honour spawn_point — CLAUDE.md), so
|
| 943 |
+
# both spawns face the same sell-then-rebuild discipline from a
|
| 944 |
+
# flipped base latitude.
|
| 945 |
+
"build-sell-and-rebuild-elsewhere",
|
| 946 |
]
|
| 947 |
|
| 948 |
# Consciously NOT spawn-varied, with the reason (keeps the curation
|