feat(scenario): build-sequence-tech-most-resilient — redundant-prereq tech path survives a strike (PlanBench robust planning anchor)
openra_bench/scenarios/packs/build-sequence-tech-most-resilient.yaml
ADDED
|
@@ -0,0 +1,385 @@
| 1 |
+
# build-sequence-tech-most-resilient.yaml
|
| 2 |
+
#
|
| 3 |
+
# REASONING capability — Wave-11 robust build-order planning. The
|
| 4 |
+
# agent must REACH a tech capability (a powered war factory) AND
|
| 5 |
+
# KEEP it through a mid-episode strike. The classic resilient-design
|
| 6 |
+
# inversion: a build order that provisions only ONE power plant
|
| 7 |
+
# (`powr`) is a single point of failure — a scheduled enemy strike
|
| 8 |
+
# razes that powr mid-episode and the war factory drops to LOW POWER
|
| 9 |
+
# (the engine's `power_drained > power_provided` gate halves
|
| 10 |
+
# production speed), so the army never completes inside the budget.
|
| 11 |
+
# The resilient build order pre-builds a SECOND, redundant `powr`
|
| 12 |
+
# BEFORE the strike: when the strike razes one plant the other keeps
|
| 13 |
+
# the grid in surplus, production runs at full speed, and the army
|
| 14 |
+
# finishes on the clock.
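The cost of the low-power gate can be sketched with a small simulation. This is only an illustration under stated assumptions: the function name `finish_tick`, the start tick and the amount of work are hypothetical, and the gate is assumed to apply from the strike tick onward to a base with no surviving power plant.

```python
def finish_tick(start_tick, work_ticks, strike_tick, has_redundant_powr):
    """Return the tick at which `work_ticks` units of queue progress are done.

    Assumption: the queue advances one unit per tick at full power and only
    on odd ticks while the base is in low power (the 50% gate).
    """
    progress = 0
    tick = start_tick
    while progress < work_ticks:
        tick += 1
        # Low power only if the strike has fired and no second powr survives.
        low_power = tick >= strike_tick and not has_redundant_powr
        if not low_power or tick % 2 == 1:
            progress += 1
    return tick


# Hypothetical workload: 2000 ticks of production queued at tick 1000.
resilient = finish_tick(1000, 2000, 1500, has_redundant_powr=True)
single = finish_tick(1000, 2000, 1500, has_redundant_powr=False)
print(resilient)  # 3000
print(single)     # 4501
```

With these made-up numbers the single-plant base finishes 1501 ticks later than the redundant one, which is the kind of gap that pushes the army past a deadline.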
|
| 15 |
+
#
|
| 16 |
+
# This is the PlanBench "robust planning" idiom — find a plan that
|
| 17 |
+
# achieves the goal AND survives a disturbance — and the classic
|
| 18 |
+
# N+1 resilient-design rule: never run a critical capability on a
|
| 19 |
+
# single point of failure; provision the redundant prerequisite
|
| 20 |
+
# AHEAD of the foreseen failure, not after it.
|
| 21 |
+
#
|
| 22 |
+
# ── ENGINE FACTS verified on the live engine (2026-05-20) ─────────
|
| 23 |
+
#
|
| 24 |
+
# 1. Power model (vendor RA rules, loaded via GameRules::from_ruleset):
|
| 25 |
+
# POWR Power +100 PROC Power -30
|
| 26 |
+
# WEAP Power -30 FIX Power -30 FACT Power 0
|
| 27 |
+
# A `World.compute_player_power` recompute sums every live (not
|
| 28 |
+
# powered-down) building each snapshot. Power budget on the
|
| 29 |
+
# INTENDED path (pre-placed proc -30, fix -30, exposed powr +100;
|
| 30 |
+
# agent builds redundant powr +100, weap -30):
|
| 31 |
+
# before strike: provided 200, drained 90 → surplus +110
|
| 32 |
+
# after strike (one powr razed): provided 100, drained 90 → +10
|
| 33 |
+
# so the resilient base stays in SURPLUS through the strike. On
|
| 34 |
+
# the SINGLE-powr path (no redundant powr) the strike razes the
|
| 35 |
+
# only plant: provided 0, drained 90 → drained > provided → LOW
|
| 36 |
+
# POWER for the rest of the episode.
|
| 37 |
+
# 2. LOW-POWER PRODUCTION SLOWDOWN (world.rs ~L3234): when a player's
|
| 38 |
+
# `power_drained > power_provided`, the production queue advances
|
| 39 |
+
# only on odd ticks (50% speed). This is what gives the pack its teeth:
|
| 40 |
+
# after the strike a single-`powr` base goes low-power and its
|
| 41 |
+
# `2tnk` queue crawls; a two-`powr` base stays in surplus and
|
| 42 |
+
# produces at full speed.
|
| 43 |
+
# 3. `2tnk` (medium tank) costs 850; its build prerequisites are
|
| 44 |
+
# `weap` (war factory) AND `fix` (service depot). `weap`'s
|
| 45 |
+
# prerequisite is `proc` (ore refinery). `proc` and `fix` are
|
| 46 |
+
# pre-placed here, so the agent's build task is purely
|
| 47 |
+
# powr-redundancy + weap + the tank army. (`fix` drains -30
|
| 48 |
+
# power too; it is already counted in the fact-1 power budget.)
|
| 49 |
+
# 4. `scheduled_events: destroy_actors` (Wave-9 engine feature,
|
| 50 |
+
# oramap.rs::read_scheduled_events / env.rs::fire_scheduled_events)
|
| 51 |
+
# removes every actor matching `filter:` (owner + optional
|
| 52 |
+
# circular region) when `world_tick >= tick`. Here it razes the
|
| 53 |
+
# ONE pre-placed exposed `powr` at tick 1500 — the mid-episode
|
| 54 |
+
# strike. The region is tight around the exposed powr so it can
|
| 55 |
+
# never catch the agent's redundant powr in the safe west base.
|
| 56 |
+
# 5. `then:` happened-before composite — clause k latches only after
|
| 57 |
+
# clause k-1 has been observed true. `[has_building:powr,
|
| 58 |
+
# has_building:weap]` encodes "power the grid, THEN stand up the
|
| 59 |
+
# factory" (the engine refuses `weap` before its `proc` prereq;
|
| 60 |
+
# `powr` is the grid the whole tech path runs on).
|
| 61 |
+
# 6. `building_count_gte` reads the LIVE `own_buildings` list per
|
| 62 |
+
# frame (NOT the accumulating `own_building_types` set used by
|
| 63 |
+
# `has_building`). `building_count_gte:{powr,1}` toggles FALSE
|
| 64 |
+
# the instant the last live `powr` is razed and only re-satisfies
|
| 65 |
+
# if a redundant `powr` is still standing — this is the redundancy
|
| 66 |
+
# teeth the win predicate hangs on.
|
| 67 |
+
# 7. `place_building` does NOT enforce build-adjacency (CLAUDE.md);
|
| 68 |
+
# a `build('powr') + place_building` chain works at arbitrary
|
| 69 |
+
# in-bounds coords — the redundant powr goes in the safe west
|
| 70 |
+
# base next to the Construction Yard.
|
| 71 |
+
# 8. The persistent unarmed enemy `fact` marker far east keeps the
|
| 72 |
+
# engine all-enemies-eliminated auto-`done` path gated, so a
|
| 73 |
+
# non-winner reaches the deadline as a real LOSS, not a DRAW
|
| 74 |
+
# (CLAUDE.md engine auto-`done` footgun).
|
| 75 |
+
# 9. Tick alignment (CLAUDE.md): max tick ≈ 93 + 90·(max_turns−1).
|
| 76 |
+
# easy max_turns 60 → ceiling 5403 ≥ within_ticks 5400, fail
|
| 77 |
+
# after_ticks 5401 ✓. medium/hard max_turns 50 → ceiling 4503 ≥
|
| 78 |
+
# within_ticks 4500, fail after_ticks 4501 ✓.
|
| 79 |
+
# 10. Hard `spawn_point` rule (CLAUDE.md oramap.rs): ANY agent actor
|
| 80 |
+
# with a spawn_point causes agent actors WITHOUT one to be
|
| 81 |
+
# filtered out — the FULL base (fact + proc + fix + harv +
|
| 82 |
+
# exposed powr) is DUPLICATED across both spawn groups at
|
| 83 |
+
# spawn-matched cells. Enemy / neutral actors do NOT honour
|
| 84 |
+
# spawn_point; the
|
| 85 |
+
# exposed-powr strike region is therefore
|
| 86 |
+
# duplicated per latitude.
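The power budget in fact 1 can be checked with a few lines of arithmetic. This is a sketch, not the engine's `compute_player_power`: the table below copies the values quoted in fact 1, and the helper names `power_budget` and `is_low_power` are hypothetical.

```python
# Power values quoted in fact 1 (positive provides, negative drains).
POWER = {"powr": 100, "proc": -30, "weap": -30, "fix": -30, "fact": 0}


def power_budget(buildings):
    """Return (provided, drained) summed over the live buildings."""
    values = [POWER[b] for b in buildings]
    provided = sum(v for v in values if v > 0)
    drained = -sum(v for v in values if v < 0)
    return provided, drained


def is_low_power(buildings):
    """Low power is the strict `drained > provided` gate from fact 2."""
    provided, drained = power_budget(buildings)
    return drained > provided


resilient_before = ["fact", "proc", "fix", "powr", "powr", "weap"]
resilient_after = ["fact", "proc", "fix", "powr", "weap"]  # exposed powr razed
single_after = ["fact", "proc", "fix", "weap"]             # only powr razed

print(power_budget(resilient_before), is_low_power(resilient_before))  # (200, 90) False
print(power_budget(resilient_after), is_low_power(resilient_after))    # (100, 90) False
print(power_budget(single_after), is_low_power(single_after))          # (0, 90) True
```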
|
| 87 |
+
#
|
| 88 |
+
# ── THE BAR (CLAUDE.md "no defect, no cheat") ─────────────────────
|
| 89 |
+
#
|
| 90 |
+
# • stall (observe only) — builds nothing; the exposed powr is
|
| 91 |
+
# razed at tick 1500, `weap` is never built, `2tnk` count stays
|
| 92 |
+
# 0 → after_ticks LOSS.
|
| 93 |
+
# • single-powr (build weap + spam `2tnk`, never a redundant powr)
|
| 94 |
+
# — relies on the lone exposed powr. The strike razes it at tick
|
| 95 |
+
# 1500; the base drops to low power (drained 90 > provided 0) and
|
| 96 |
+
# the `2tnk` queue runs at 50% for the rest of the episode.
|
| 97 |
+
# `building_count_gte:{powr,1}` is FALSE (no live powr) AND the
|
| 98 |
+
# tank army cannot finish before the deadline → after_ticks
|
| 99 |
+
# LOSS. This is the single-point-of-failure inversion the pack
|
| 100 |
+
# is built to catch.
|
| 101 |
+
# • intended resilient (build a redundant 2nd `powr` in the safe
|
| 102 |
+
# west base BEFORE the strike, build `weap`, produce 3× `2tnk`)
|
| 103 |
+
# — the strike razes the exposed powr but the redundant one
|
| 104 |
+
# survives; the grid stays in surplus, production runs at full
|
| 105 |
+
# speed, the army finishes on the clock. WIN.
|
| 106 |
+
#
|
| 107 |
+
# Real LOSS not DRAW: `fail after_ticks: T+1` is reachable inside
|
| 108 |
+
# max_turns and the enemy `fact` marker blocks the auto-done path.
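The "real LOSS, not DRAW" claim rests on the tick alignment in fact 9, which is easy to recompute. A minimal sketch, assuming the approximate formula quoted there (93 ticks for the first turn, 90 per later turn); the function names are hypothetical.

```python
def max_reachable_tick(max_turns, first_turn_tick=93, ticks_per_turn=90):
    """Approximate tick ceiling from fact 9: 93 + 90 * (max_turns - 1)."""
    return first_turn_tick + ticks_per_turn * (max_turns - 1)


def loss_is_reachable(max_turns, within_ticks):
    """True if the `after_ticks: within_ticks + 1` fail clause can fire."""
    return max_reachable_tick(max_turns) >= within_ticks + 1


print(max_reachable_tick(60), loss_is_reachable(60, 5400))  # 5403 True
print(max_reachable_tick(50), loss_is_reachable(50, 4500))  # 4503 True
# A mismatched pairing would leave the fail clause unreachable:
print(loss_is_reachable(50, 5400))                          # False
```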
|
| 109 |
+
#
|
| 110 |
+
# Validate (no model / no network):
|
| 111 |
+
# cd /Users/berta/Projects/OpenRA-Bench && \
|
| 112 |
+
# python3 -m pytest tests/test_build_sequence_tech_most_resilient.py -q
|
| 113 |
+
|
| 114 |
+
meta:
|
| 115 |
+
id: build-sequence-tech-most-resilient
|
| 116 |
+
title: 'Resilient War Factory — Redundant Power Survives a Strike (N+1 Build Order)'
|
| 117 |
+
capability: reasoning
|
| 118 |
+
real_world_meaning: >
|
| 119 |
+
Robust build-order planning: reach a tech capability AND keep it
|
| 120 |
+
through a foreseeable disturbance. The agent must bring a war
|
| 121 |
+
factory online and field an armoured force, but a mid-episode
|
| 122 |
+
enemy strike razes one power plant. A build order that provisions
|
| 123 |
+
only a single power plant is a single point of failure — when the
|
| 124 |
+
strike lands the factory drops to low power and the army never
|
| 125 |
+
completes in time. The resilient build order pre-builds a second,
|
| 126 |
+
redundant power plant before the strike, so the grid stays in
|
| 127 |
+
surplus and production never slows. Tests whether the model plans
|
| 128 |
+
for the disturbance (N+1 redundancy on the critical prerequisite)
|
| 129 |
+
rather than merely planning the shortest path to the goal.
|
| 130 |
+
robotics_analogue: >
|
| 131 |
+
N+1 redundancy on a critical utility. An autonomous production
|
| 132 |
+
cell depends on a power feed to run its assembly machine; a known
|
| 133 |
+
hazard will knock out one feed mid-shift. Resilient planning
|
| 134 |
+
commissions a second, independent feed BEFORE the outage, so the
|
| 135 |
+
assembly machine never drops below rated throughput. Provisioning
|
| 136 |
+
only one feed — the shortest plan to first article — halts the
|
| 137 |
+
line the moment the hazard strikes and blows the delivery
|
| 138 |
+
deadline.
|
| 139 |
+
benchmark_anchor:
|
| 140 |
+
- "PlanBench robust planning"
|
| 141 |
+
- "N+1 resilient design"
|
| 142 |
+
- "redundancy"
|
| 143 |
+
author: openra-bench-wave-11
|
| 144 |
+
|
| 145 |
+
# rush-hour-arena: 128×40, playable bounds (2,2,124,36). Agent base
|
| 146 |
+
# at the WEST (x≈8..20). The inherited exposed `powr` sits forward at
|
| 147 |
+
# the EAST edge of the base; a scheduled strike razes it at tick 1500.
|
| 148 |
+
# The redundant `powr` belongs in the safe west base next to `fact`.
|
| 149 |
+
base_map: rush-hour-arena
|
| 150 |
+
|
| 151 |
+
base:
|
| 152 |
+
agent: {faction: allies}
|
| 153 |
+
# No scripted bot — the only threat is the scripted `destroy_actors`
|
| 154 |
+
# strike on the exposed powr. A hunt bot would turn an N+1 build-
|
| 155 |
+
# order test into a combat-survival test.
|
| 156 |
+
enemy: {faction: soviet, bot_type: ''}
|
| 157 |
+
# Build palette: build + place_building drive the redundant powr
|
| 158 |
+
# and the war factory + tank army; harvest keeps income credible;
|
| 159 |
+
# move_units + stop allow repositioning. No offensive verbs — this
|
| 160 |
+
# is a build-order planning pack.
|
| 161 |
+
tools: [observe, build, place_building, harvest, move_units, stop]
|
| 162 |
+
spawn_mcvs: false
|
| 163 |
+
planning: true
|
| 164 |
+
termination: {max_ticks: 8000}
|
| 165 |
+
actors: []
|
| 166 |
+
|
| 167 |
+
levels:
|
| 168 |
+
# ── EASY ─────────────────────────────────────────────────────────
|
| 169 |
+
# Bare skill: recognise that the inherited exposed power plant is a
|
| 170 |
+
# single point of failure that WILL be razed, pre-build a redundant
|
| 171 |
+
# `powr` in the safe west base, build the war factory, field 3
|
| 172 |
+
# tanks. Generous clock (within_ticks 5400, max_turns 60 → ceiling
|
| 173 |
+
# 5403 ✓). The strike fires at tick 1500.
|
| 174 |
+
easy:
|
| 175 |
+
description: >
|
| 176 |
+
You inherit a partial base — a Construction Yard ('fact'), an
|
| 177 |
+
Ore Refinery ('proc'), a Service Depot ('fix'), an Ore Truck
|
| 178 |
+
('harv') with an ore patch, and ONE Power Plant ('powr'). That
|
| 179 |
+
power plant sits FORWARD at the east edge of your base and is
|
| 180 |
+
EXPOSED: an enemy strike will RAZE it at tick 1500. It is your
|
| 181 |
+
only power. If it is still your only power when the strike lands,
|
| 182 |
+
your grid goes negative, your war factory drops to half
|
| 183 |
+
production speed, and your tank army cannot finish in time. To
|
| 184 |
+
stay resilient: build a SECOND Power Plant ('build' "powr",
|
| 185 |
+
cost 300) and place it next to your Construction Yard in the
|
| 186 |
+
safe west base BEFORE tick 1500, build a War Factory ('build'
|
| 187 |
+
"weap", cost 2000), then produce three medium tanks ('build'
|
| 188 |
+
"2tnk", cost 850 each). WIN = you brought power then a war
|
| 189 |
+
factory online, you still own a Power Plant, you have 3 medium
|
| 190 |
+
tanks, and you still own your Construction Yard, before tick
|
| 191 |
+
5400. Stalling, or relying on the single exposed power plant
|
| 192 |
+
with no redundant backup, misses the bar.
|
| 193 |
+
starting_cash: 6000
|
| 194 |
+
overrides:
|
| 195 |
+
actors:
|
| 196 |
+
# ── Safe WEST base ─────────────────────────────────────────
|
| 197 |
+
- {type: fact, owner: agent, position: [8, 18]}
|
| 198 |
+
- {type: proc, owner: agent, position: [12, 18]}
|
| 199 |
+
# Service Depot — the `2tnk` build prerequisite (alongside
|
| 200 |
+
# `weap`). Pre-placed so the agent's build task is purely the
|
| 201 |
+
# redundant powr + war factory + tank army.
|
| 202 |
+
- {type: fix, owner: agent, position: [16, 18]}
|
| 203 |
+
# Income (credible, not load-bearing for the win predicate).
|
| 204 |
+
- {type: harv, owner: agent, position: [12, 22]}
|
| 205 |
+
- {type: mine, owner: neutral, position: [20, 22]}
|
| 206 |
+
# ── The EXPOSED single-point-of-failure Power Plant ────────
|
| 207 |
+
# Forward at the east edge of the base. Razed by the tick-1500
|
| 208 |
+
# `destroy_actors` strike. Far enough from the west base that
|
| 209 |
+
# the strike region can never catch a redundant powr placed
|
| 210 |
+
# next to the Construction Yard.
|
| 211 |
+
- {type: powr, owner: agent, position: [40, 18]}
|
| 212 |
+
# Persistent far enemy marker — LOSS-not-DRAW guarantee.
|
| 213 |
+
- {type: fact, owner: enemy, position: [115, 30]}
|
| 214 |
+
scheduled_events:
|
| 215 |
+
# Mid-episode strike: raze the exposed forward Power Plant at
|
| 216 |
+
# tick 1500. The region is a tight circle around (40,18); the
|
| 217 |
+
# safe west base (x≈8..16) is ~24 cells away, well outside
|
| 218 |
+
# radius 6, so a redundant powr there is never caught.
|
| 219 |
+
- tick: 1500
|
| 220 |
+
type: destroy_actors
|
| 221 |
+
filter:
|
| 222 |
+
owner: agent
|
| 223 |
+
region: {x: 40, y: 18, radius: 6}
|
| 224 |
+
win_condition:
|
| 225 |
+
all_of:
|
| 226 |
+
# Reach the tech capability IN ORDER: power the grid, then
|
| 227 |
+
# stand up the war factory.
|
| 228 |
+
- then:
|
| 229 |
+
id: bsr-easy
|
| 230 |
+
clauses:
|
| 231 |
+
- {has_building: powr}
|
| 232 |
+
- {has_building: weap}
|
| 233 |
+
# ≥1 Power Plant ALIVE at end — FALSE after the strike unless
|
| 234 |
+
# a redundant powr was built (the redundancy teeth).
|
| 235 |
+
- building_count_gte: {type: powr, n: 1}
|
| 236 |
+
# 3 medium tanks — only reachable at FULL production speed,
|
| 237 |
+
# i.e. with the grid kept in surplus through the strike.
|
| 238 |
+
- unit_type_count_gte: {type: '2tnk', n: 3}
|
| 239 |
+
# Construction Yard alive (mirrors the fail clause).
|
| 240 |
+
- building_count_gte: {type: fact, n: 1}
|
| 241 |
+
- within_ticks: 5400
|
| 242 |
+
fail_condition:
|
| 243 |
+
any_of:
|
| 244 |
+
- after_ticks: 5401
|
| 245 |
+
- not: {building_count_gte: {type: fact, n: 1}}
|
| 246 |
+
max_turns: 60
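The difference between the latched `then:` clause and the live `building_count_gte` clause (facts 5 and 6) can be sketched as follows. This is not the benchmark's evaluator: the `ThenLatch` class, the shape of the observation dict, and the choice to let several clauses latch on one frame are all assumptions made for illustration.

```python
class ThenLatch:
    """Clause k latches only after clause k-1 has been observed true."""

    def __init__(self, clauses):
        self.clauses = list(clauses)
        self.latched = 0  # number of leading clauses already latched

    def update(self, obs):
        # Latch clauses in order; once latched, a clause stays latched.
        while self.latched < len(self.clauses) and self.clauses[self.latched](obs):
            self.latched += 1
        return self.latched == len(self.clauses)


def has_building(kind):
    # Reads the accumulating set of building types ever owned.
    return lambda obs: kind in obs["own_building_types"]


def building_count_gte(kind, n):
    # Reads the live building list for the current frame.
    return lambda obs: sum(b["type"] == kind for b in obs["own_buildings"]) >= n


seen = {"fact", "proc", "fix", "powr", "weap"}
before = {"own_building_types": seen,
          "own_buildings": [{"type": t} for t in ("fact", "proc", "fix", "powr", "weap")]}
# Single-powr play after the strike: powr was seen once but is no longer live.
after = {"own_building_types": seen,
         "own_buildings": [{"type": t} for t in ("fact", "proc", "fix", "weap")]}

latch = ThenLatch([has_building("powr"), has_building("weap")])
live_powr = building_count_gte("powr", 1)

order_ok_before = latch.update(before)
order_ok_after = latch.update(after)
print(order_ok_before, live_powr(before))  # True True
print(order_ok_after, live_powr(after))    # True False
```

In this sketch the ordering clause stays satisfied after the strike, so only the live count separates the single-plant play from the resilient one.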
|
| 247 |
+
|
| 248 |
+
# ── MEDIUM ───────────────────────────────────────────────────────
|
| 249 |
+
# +1 controlled variable: the clock tightens (within_ticks 4500,
|
| 250 |
+
# max_turns 50 → ceiling 4503 ✓). The resilient N+1 build order
|
| 251 |
+
# still wins comfortably, but a hesitant opening that dallies
|
| 252 |
+
# before committing the redundant powr now risks the deadline. The
|
| 253 |
+
# single-point-of-failure failure modes lose exactly as on easy.
|
| 254 |
+
medium:
|
| 255 |
+
description: >
|
| 256 |
+
Same inherited base as easy — a Construction Yard, an Ore
|
| 257 |
+
Refinery, a Service Depot, an Ore Truck with an ore patch, and
|
| 258 |
+
ONE EXPOSED Power Plant forward at the east edge that an enemy
|
| 259 |
+
strike will RAZE at tick 1500. Build a SECOND Power Plant
|
| 260 |
+
('build' "powr", 300) in the safe west base next to your
|
| 261 |
+
Construction Yard BEFORE tick 1500, build a War Factory
|
| 262 |
+
('build' "weap", 2000), then produce three medium tanks
|
| 263 |
+
('build' "2tnk", 850 each).
|
| 264 |
+
The deadline is tighter — tick 4500 — so commit the redundant
|
| 265 |
+
power plant early; do not wait for the exposed one to fall. If
|
| 266 |
+
the strike leaves you with no power, the war factory halves its
|
| 267 |
+
output and the army misses the clock. WIN = you brought power
|
| 268 |
+
then a war factory online, you still own a Power Plant, you
|
| 269 |
+
have 3 medium tanks, and you still own your Construction Yard,
|
| 270 |
+
before tick 4500. Stalling, or relying on the single exposed
|
| 271 |
+
power plant, misses the bar.
|
| 272 |
+
starting_cash: 6000
|
| 273 |
+
overrides:
|
| 274 |
+
actors:
|
| 275 |
+
- {type: fact, owner: agent, position: [8, 18]}
|
| 276 |
+
- {type: proc, owner: agent, position: [12, 18]}
|
| 277 |
+
- {type: fix, owner: agent, position: [16, 18]}
|
| 278 |
+
- {type: harv, owner: agent, position: [12, 22]}
|
| 279 |
+
- {type: mine, owner: neutral, position: [20, 22]}
|
| 280 |
+
- {type: powr, owner: agent, position: [40, 18]}
|
| 281 |
+
- {type: fact, owner: enemy, position: [115, 30]}
|
| 282 |
+
scheduled_events:
|
| 283 |
+
- tick: 1500
|
| 284 |
+
type: destroy_actors
|
| 285 |
+
filter:
|
| 286 |
+
owner: agent
|
| 287 |
+
region: {x: 40, y: 18, radius: 6}
|
| 288 |
+
win_condition:
|
| 289 |
+
all_of:
|
| 290 |
+
- then:
|
| 291 |
+
id: bsr-medium
|
| 292 |
+
clauses:
|
| 293 |
+
- {has_building: powr}
|
| 294 |
+
- {has_building: weap}
|
| 295 |
+
- building_count_gte: {type: powr, n: 1}
|
| 296 |
+
- unit_type_count_gte: {type: '2tnk', n: 3}
|
| 297 |
+
- building_count_gte: {type: fact, n: 1}
|
| 298 |
+
- within_ticks: 4500
|
| 299 |
+
fail_condition:
|
| 300 |
+
any_of:
|
| 301 |
+
- after_ticks: 4501
|
| 302 |
+
- not: {building_count_gte: {type: fact, n: 1}}
|
| 303 |
+
max_turns: 50
|
| 304 |
+
|
| 305 |
+
# ── HARD ─────────────────────────────────────────────────────────
|
| 306 |
+
# +1 controlled variable on top of medium: TWO seed-driven AGENT
|
| 307 |
+
# spawn_point groups (NORTH base y=12 / SOUTH base y=26) round-
|
| 308 |
+
# robined by seed. Per CLAUDE.md `spawn_point` rules: ANY agent
|
| 309 |
+
# actor with spawn_point ⇒ agent actors WITHOUT one are filtered
|
| 310 |
+
# out, so the FULL base (fact + proc + fix + harv + exposed powr)
|
| 311 |
+
# is DUPLICATED across both spawn groups at spawn-matched cells.
|
| 312 |
+
# Enemy / neutral actors do NOT honour spawn_point; the strike
|
| 313 |
+
# region is duplicated per latitude (a `destroy_actors` whose
|
| 314 |
+
# region misses the active base simply removes nothing). A
|
| 315 |
+
# memorised "place the redundant powr at (11,18)" opening cannot
|
| 316 |
+
# generalise — the agent must read the actual Construction Yard
|
| 317 |
+
# latitude and place the redundant power plant beside it.
|
| 318 |
+
hard:
|
| 319 |
+
description: >
|
| 320 |
+
Same N+1 build-order task as medium (one EXPOSED Power Plant
|
| 321 |
+
forward at the east edge that an enemy strike razes at tick
|
| 322 |
+
1500, $6000, tick 4500 deadline) but your base may begin in
|
| 323 |
+
the NORTH band (y≈12) OR the SOUTH band (y≈26) of the map
|
| 324 |
+
depending on the seed. Read the Construction Yard's actual
|
| 325 |
+
position from the observation and place the redundant Power
|
| 326 |
+
Plant beside it in the safe west base BEFORE tick 1500; build a
|
| 327 |
+
War Factory; then produce three medium tanks. A memorised
|
| 328 |
+
placement cell cannot sit beside the Construction Yard on both of the
|
| 329 |
+
two spawns. WIN = you brought power then a war factory online,
|
| 330 |
+
you still own a Power Plant, you have 3 medium tanks, and you
|
| 331 |
+
still own your Construction Yard, before tick 4500. The same
|
| 332 |
+
single-point-of-failure plays — stalling, or relying on the
|
| 333 |
+
lone exposed power plant — lose as on medium.
|
| 334 |
+
starting_cash: 6000
|
| 335 |
+
overrides:
|
| 336 |
+
actors:
|
| 337 |
+
# ── SPAWN 0 (NORTH base, y=12) ─────────────────────────────
|
| 338 |
+
- {type: fact, owner: agent, position: [8, 12], spawn_point: 0}
|
| 339 |
+
- {type: proc, owner: agent, position: [12, 12], spawn_point: 0}
|
| 340 |
+
- {type: fix, owner: agent, position: [16, 12], spawn_point: 0}
|
| 341 |
+
- {type: harv, owner: agent, position: [12, 16], spawn_point: 0}
|
| 342 |
+
- {type: powr, owner: agent, position: [40, 12], spawn_point: 0}
|
| 343 |
+
# ── SPAWN 1 (SOUTH base, y=26) ─────────────────────────────
|
| 344 |
+
- {type: fact, owner: agent, position: [8, 26], spawn_point: 1}
|
| 345 |
+
- {type: proc, owner: agent, position: [12, 26], spawn_point: 1}
|
| 346 |
+
- {type: fix, owner: agent, position: [16, 26], spawn_point: 1}
|
| 347 |
+
- {type: harv, owner: agent, position: [12, 30], spawn_point: 1}
|
| 348 |
+
- {type: powr, owner: agent, position: [40, 26], spawn_point: 1}
|
| 349 |
+
# Neutral ore patches — one per latitude (neutral actors
|
| 350 |
+
# ignore the spawn_point filter, like enemy actors).
|
| 351 |
+
- {type: mine, owner: neutral, position: [20, 16]}
|
| 352 |
+
- {type: mine, owner: neutral, position: [20, 30]}
|
| 353 |
+
# Persistent far enemy marker — LOSS-not-DRAW guarantee.
|
| 354 |
+
- {type: fact, owner: enemy, position: [115, 33]}
|
| 355 |
+
scheduled_events:
|
| 356 |
+
# Strike regions DUPLICATED per latitude (enemy/neutral and
|
| 357 |
+
# scheduled events do not honour spawn_point). The region
|
| 358 |
+
# matching the dormant latitude removes nothing; the one
|
| 359 |
+
# matching the active base razes its exposed powr.
|
| 360 |
+
- tick: 1500
|
| 361 |
+
type: destroy_actors
|
| 362 |
+
filter:
|
| 363 |
+
owner: agent
|
| 364 |
+
region: {x: 40, y: 12, radius: 6}
|
| 365 |
+
- tick: 1500
|
| 366 |
+
type: destroy_actors
|
| 367 |
+
filter:
|
| 368 |
+
owner: agent
|
| 369 |
+
region: {x: 40, y: 26, radius: 6}
|
| 370 |
+
win_condition:
|
| 371 |
+
all_of:
|
| 372 |
+
- then:
|
| 373 |
+
id: bsr-hard
|
| 374 |
+
clauses:
|
| 375 |
+
- {has_building: powr}
|
| 376 |
+
- {has_building: weap}
|
| 377 |
+
- building_count_gte: {type: powr, n: 1}
|
| 378 |
+
- unit_type_count_gte: {type: '2tnk', n: 3}
|
| 379 |
+
- building_count_gte: {type: fact, n: 1}
|
| 380 |
+
- within_ticks: 4500
|
| 381 |
+
fail_condition:
|
| 382 |
+
any_of:
|
| 383 |
+
- after_ticks: 4501
|
| 384 |
+
- not: {building_count_gte: {type: fact, n: 1}}
|
| 385 |
+
max_turns: 50
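The hard-tier strike geometry can be sanity-checked offline. A sketch under assumptions: the region test is taken to be Euclidean distance with an inclusive radius (the engine's exact rule is not shown here), the coordinates are copied from the hard level, and the redundant plant is placed at the Construction Yard cell plus (3, 4) as the scripted intended policy in the test file does.

```python
import math

# Strike regions and spawn layouts copied from the hard level.
STRIKES = [{"x": 40, "y": 12, "radius": 6}, {"x": 40, "y": 26, "radius": 6}]
SPAWNS = {
    0: {"fact": (8, 12), "powr": (40, 12)},  # NORTH base
    1: {"fact": (8, 26), "powr": (40, 26)},  # SOUTH base
}


def in_region(pos, region):
    """Assumed filter: Euclidean distance to the centre, radius inclusive."""
    return math.hypot(pos[0] - region["x"], pos[1] - region["y"]) <= region["radius"]


def razed(pos):
    """True if any scheduled strike region covers this cell."""
    return any(in_region(pos, r) for r in STRIKES)


results = {}
for spawn, cells in SPAWNS.items():
    fx, fy = cells["fact"]
    redundant = (fx + 3, fy + 4)  # offset used by the intended policy
    results[spawn] = (razed(cells["powr"]), razed(redundant))
    print(spawn, results[spawn])
# 0 (True, False)
# 1 (True, False)
```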
|
tests/test_build_sequence_tech_most_resilient.py
ADDED
|
@@ -0,0 +1,352 @@
| 1 |
+
"""build-sequence-tech-most-resilient pack — full no-cheat validation.
|
| 2 |
+
|
| 3 |
+
Wave-11 REASONING — robust build-order planning. The agent must REACH
|
| 4 |
+
a tech capability (a powered war factory) AND KEEP it through a
|
| 5 |
+
mid-episode strike. A scheduled `destroy_actors` event razes the one
|
| 6 |
+
exposed power plant at tick 1500. A build order that provisions only
|
| 7 |
+
ONE `powr` is a single point of failure — when the strike lands the
|
| 8 |
+
grid goes negative, the war factory drops to 50% production speed
|
| 9 |
+
(engine low-power slowdown) and `building_count_gte:{powr,1}` is
|
| 10 |
+
FALSE for the rest of the episode. The resilient build order
|
| 11 |
+
pre-builds a SECOND, redundant `powr` in the safe west base BEFORE
|
| 12 |
+
the strike: one plant survives, the grid stays in surplus, the army
|
| 13 |
+
finishes on the clock.
|
| 14 |
+
|
| 15 |
+
Bar (CLAUDE.md "no defect, no cheat"):
|
| 16 |
+
- stall (observe only) ⇒ LOSS on every (level, seed)
|
| 17 |
+
- single-powr (no redundant powr) ⇒ LOSS on every (level, seed)
|
| 18 |
+
- intended resilient (redundant 2nd powr, then weap, then 3×2tnk) ⇒ WIN on every (level, seed)
|
| 20 |
+
Real LOSS not DRAW: `fail after_ticks:T+1` reachable inside
|
| 21 |
+
max_turns; the persistent far enemy `fact` blocks the engine
|
| 22 |
+
auto-done path.
|
| 23 |
+
|
| 24 |
+
Scenario shape:
|
| 25 |
+
- rush-hour-arena, allies vs soviet (bot disabled).
|
| 26 |
+
- easy: within_ticks 5400, max_turns 60 — generous.
|
| 27 |
+
- medium: within_ticks 4500, max_turns 50 — tighter clock.
|
| 28 |
+
- hard: within_ticks 4500, max_turns 50 — +2 spawn_point groups
|
| 29 |
+
(NORTH base y=12 / SOUTH base y=26, round-robined).
|
| 30 |
+
|
| 31 |
+
Measured (seed 1, scripted policies): the intended resilient policy
|
| 32 |
+
WINS at ~tick 3243 on every level; stall and single-powr LOSE on the
|
| 33 |
+
deadline.
|
| 34 |
+
"""
|
| 35 |
+
|
| 36 |
+
from __future__ import annotations
|
| 37 |
+
|
| 38 |
+
import pytest
|
| 39 |
+
|
| 40 |
+
pytest.importorskip("openra_train", reason="Rust env wheel not installed")
|
| 41 |
+
|
| 42 |
+
from openra_bench.eval_core import run_level
|
| 43 |
+
from openra_bench.scenarios import load_pack
|
| 44 |
+
from openra_bench.scenarios.loader import PACKS_DIR, compile_level
|
| 45 |
+
|
| 46 |
+
PACK = PACKS_DIR / "build-sequence-tech-most-resilient.yaml"
|
| 47 |
+
LEVELS = ("easy", "medium", "hard")
|
| 48 |
+
SEEDS = (1, 2, 3, 4)
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
# ── Policies ──────────────────────────────────────────────────────
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
def _stall_policy():
|
| 55 |
+
"""Do nothing — must LOSE on the clock on every level/seed."""
|
| 56 |
+
def pol(obs, Cmd):
|
| 57 |
+
return [Cmd.observe()]
|
| 58 |
+
return pol
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
def _single_powr_policy():
|
| 62 |
+
"""Single-point-of-failure play: build the war factory and spam
|
| 63 |
+
`2tnk`, but NEVER build a redundant power plant. The lone exposed
|
| 64 |
+
`powr` is razed at tick 1500 → the grid goes negative → 50%
|
| 65 |
+
production AND `building_count_gte:{powr,1}` is false → LOSS."""
|
| 66 |
+
ms = {"weap": False}
|
| 67 |
+
|
| 68 |
+
def pol(obs, Cmd):
|
| 69 |
+
ob = obs.get("own_buildings", []) or []
|
| 70 |
+
own = {b["type"] for b in ob}
|
| 71 |
+
prod = obs.get("production", []) or []
|
| 72 |
+
base = [b for b in ob if b["type"] == "fact"]
|
| 73 |
+
cmds = []
|
| 74 |
+
if "weap" in own:
|
| 75 |
+
ms["weap"] = True
|
| 76 |
+
if not ms["weap"]:
|
| 77 |
+
if "weap" not in prod:
|
| 78 |
+
cmds.append(Cmd.build("weap"))
|
| 79 |
+
if base:
|
| 80 |
+
cmds.append(Cmd.place_building(
|
| 81 |
+
"weap", base[0]["cell_x"] + 5, base[0]["cell_y"]
|
| 82 |
+
))
|
| 83 |
+
else:
|
| 84 |
+
if "2tnk" not in prod:
|
| 85 |
+
cmds.append(Cmd.build("2tnk"))
|
| 86 |
+
return cmds or [Cmd.observe()]
|
| 87 |
+
return pol
|
| 88 |
+
|
| 89 |
+
|
| 90 |
+
def _intended_policy():
    """Resilient N+1 build order: build a redundant `powr` in the safe
    west base (placed relative to the actual Construction Yard so it
    generalises across the hard-tier spawn variation), then the war
    factory, then 3× `2tnk`. Must WIN on every (level, seed)."""
    ms = {"powr2": False, "weap": False}

    def pol(obs, Cmd):
        ob = obs.get("own_buildings", []) or []
        prod = obs.get("production", []) or []
        base = [b for b in ob if b["type"] == "fact"]
        # The redundant powr lives in the safe west base (x<30);
        # the exposed inherited powr sits forward at x=40.
        safe_powr = [
            b for b in ob if b["type"] == "powr" and b["cell_x"] < 30
        ]
        weap_b = [b for b in ob if b["type"] == "weap"]
        cmds = []
        if safe_powr:
            ms["powr2"] = True
        if weap_b:
            ms["weap"] = True
        if not ms["powr2"]:
            if "powr" not in prod:
                cmds.append(Cmd.build("powr"))
            if base:
                cmds.append(Cmd.place_building(
                    "powr", base[0]["cell_x"] + 3, base[0]["cell_y"] + 4
                ))
        elif not ms["weap"]:
            if "weap" not in prod:
                cmds.append(Cmd.build("weap"))
            if base:
                cmds.append(Cmd.place_building(
                    "weap", base[0]["cell_x"] + 6, base[0]["cell_y"]
                ))
        else:
            if "2tnk" not in prod:
                cmds.append(Cmd.build("2tnk"))
        return cmds or [Cmd.observe()]
    return pol


# ── Pack-shape tests (cheap; do not run the engine) ──────────────


def test_pack_compiles_with_three_levels():
    pack = load_pack(PACK)
    assert pack.meta.id == "build-sequence-tech-most-resilient"
    assert pack.meta.capability == "reasoning"
    assert set(pack.levels) == {"easy", "medium", "hard"}


def test_meta_benchmark_anchor_set():
    """meta.benchmark_anchor must cite PlanBench robust planning,
    N+1 resilient design and redundancy (the seed taxonomy)."""
    pack = load_pack(PACK)
    anchors = pack.meta.benchmark_anchor or []
    assert any("PlanBench" in a for a in anchors), anchors
    assert any("N+1" in a or "resilient" in a for a in anchors), anchors
    assert any("redundancy" in a.lower() for a in anchors), anchors


def test_every_level_has_fail_condition():
    """No silent draws — every level must be able to emit a LOSS."""
    pack = load_pack(PACK)
    for lvl in LEVELS:
        c = compile_level(pack, lvl)
        assert c.fail_condition is not None, f"{lvl} missing fail_condition"


def test_then_composite_used_in_win():
    """The win must wire the powr→weap happened-before chain — the
    'reach the tech capability in order' clause."""
    for lvl in LEVELS:
        c = compile_level(load_pack(PACK), lvl)
        win = c.win_condition.model_dump(exclude_none=True)
        inner = win.get("all_of") or []
        then = next((cl["then"] for cl in inner if "then" in cl), None)
        assert then is not None, f"{lvl} win missing then-chain: {win}"
        clauses = then.get("clauses") or []
        assert len(clauses) == 2, (
            f"{lvl} then-chain must be powr→weap (2 clauses); got {clauses}"
        )
        assert clauses[0].get("has_building") == "powr"
        assert clauses[1].get("has_building") == "weap"


def test_win_requires_surviving_powr_three_tanks_and_fact():
    """Structural: the win clause must require a LIVE Power Plant
    (`building_count_gte:{powr,1}` — the redundancy teeth that toggle
    FALSE when the exposed powr is razed), three medium tanks
    (`unit_type_count_gte:{2tnk,3}`), a live Construction Yard, and a
    `within_ticks` deadline. `building_count_gte` (live-list) — NOT
    `has_building` (accumulating set) — is mandatory for the powr
    clause so it toggles false on the strike."""
    for lvl in LEVELS:
        c = compile_level(load_pack(PACK), lvl)
        all_of = c.win_condition.model_dump(exclude_none=True).get("all_of", [])
        powr = next(
            (x["building_count_gte"] for x in all_of
             if "building_count_gte" in x
             and (x["building_count_gte"] or {}).get("type") == "powr"),
            None,
        )
        assert powr is not None and int(powr.get("n", 0)) >= 1, (
            f"{lvl}: win must require building_count_gte powr>=1 "
            f"(a live power plant survives the strike)"
        )
        tanks = next(
            (x["unit_type_count_gte"] for x in all_of
             if "unit_type_count_gte" in x
             and (x["unit_type_count_gte"] or {}).get("type") == "2tnk"),
            None,
        )
        assert tanks is not None and int(tanks.get("n", 0)) >= 3, (
            f"{lvl}: win must require unit_type_count_gte 2tnk>=3"
        )
        fact = next(
            (x["building_count_gte"] for x in all_of
             if "building_count_gte" in x
             and (x["building_count_gte"] or {}).get("type") == "fact"),
            None,
        )
        assert fact is not None and int(fact.get("n", 0)) >= 1, (
            f"{lvl}: win must require building_count_gte fact>=1"
        )
        assert any("within_ticks" in x for x in all_of), (
            f"{lvl}: win must include a within_ticks deadline"
        )


def test_tick_budget_aligned_with_max_turns():
    """within_ticks must be reachable inside max_turns and the fail
    `after_ticks` must equal within_ticks+1 (real LOSS, no draw, no
    overlap). Engine advances ~90 ticks/turn → reachable = 93 +
    90·(max_turns-1)."""
    pack = load_pack(PACK)
    for lvl in LEVELS:
        c = compile_level(pack, lvl)
        reachable = 93 + 90 * (c.max_turns - 1)
        all_of = c.win_condition.model_dump(exclude_none=True).get("all_of", [])
        wt = next(int(x["within_ticks"]) for x in all_of if "within_ticks" in x)
        assert wt <= reachable, (
            f"{lvl}: within_ticks={wt} > reachable={reachable} "
            f"(max_turns={c.max_turns}) — deadline never bites"
        )
        fail = c.fail_condition.model_dump(exclude_none=True)
        after = next(
            int(x["after_ticks"]) for x in fail["any_of"] if "after_ticks" in x
        )
        assert after <= reachable, (
            f"{lvl}: fail after_ticks {after} unreachable within "
            f"{c.max_turns} turns (max {reachable}) — draw degeneracy"
        )
        assert after == wt + 1, (
            f"{lvl}: after_ticks {after} must equal within_ticks+1 ({wt+1})"
        )


def test_exactly_one_exposed_powr_pre_placed():
    """The single-point-of-failure premise: each tier pre-places
    EXACTLY ONE agent `powr` (the exposed forward plant). The
    redundant second power plant must be BUILT by the agent — it is
    not given. Hard duplicates the base across two spawn groups, so
    each spawn group still ships exactly one exposed powr."""
    for lvl in LEVELS:
        c = compile_level(load_pack(PACK), lvl)
        powrs = [
            a for a in c.scenario.actors
            if a.owner == "agent" and a.type == "powr"
        ]
        if lvl == "hard":
            per_spawn = {}
            for a in powrs:
                sp = a.spawn_point if a.spawn_point is not None else 0
                per_spawn[sp] = per_spawn.get(sp, 0) + 1
            assert per_spawn and all(v == 1 for v in per_spawn.values()), (
                f"hard: each spawn group must pre-place exactly one "
                f"exposed powr; got {per_spawn}"
            )
        else:
            assert len(powrs) == 1, (
                f"{lvl}: must pre-place exactly one exposed agent powr; "
                f"got {len(powrs)}"
            )


def test_scheduled_destroy_event_razes_the_exposed_powr():
    """Each tier must declare a `scheduled_events: destroy_actors`
    that fires mid-episode (before the deadline) on the agent, with a
    region tight around the exposed forward powr (x≈40) so it can
    never catch a redundant powr placed in the safe west base."""
    for lvl in LEVELS:
        c = compile_level(load_pack(PACK), lvl)
        evs = c.scheduled_events or []
        destroys = [e for e in evs if e.get("type") == "destroy_actors"]
        assert destroys, f"{lvl}: needs a destroy_actors scheduled event"
        for e in destroys:
            assert e["filter"]["owner"] == "agent"
            reg = e["filter"]["region"]
            assert reg["x"] == 40, (
                f"{lvl}: strike region must be centred on the exposed "
                f"forward powr at x=40; got {reg}"
            )
            assert e["tick"] < 4500, (
                f"{lvl}: strike must fire mid-episode (before the "
                f"deadline); got tick {e['tick']}"
            )


def test_hard_tier_has_seed_driven_spawn_groups():
    """Hard must define >=2 agent spawn_point groups so the seed
    varies the start base (tests/test_hard_tier.py::UPGRADED)."""
    c = compile_level(load_pack(PACK), "hard")
    sp = {a.spawn_point for a in c.scenario.actors if a.owner == "agent"}
    assert len(sp) >= 2, f"hard needs >=2 spawn groups, got {sp}"


# ── Engine-bound tests (parameterised over seeds 1..4) ────────────


@pytest.mark.parametrize("seed", SEEDS)
@pytest.mark.parametrize("level", LEVELS)
def test_intended_resilient_policy_wins(level, seed):
    """The intended resilient play (redundant 2nd powr → weap → 3×
    2tnk) must WIN on every (level, seed). The load-bearing test that
    the pack is solvable inside the budget by the advertised
    robust-planning capability."""
    c = compile_level(load_pack(PACK), level)
    res = run_level(c, _intended_policy(), seed=seed)
    assert res.outcome == "win", (
        f"intended resilient must WIN on {level} s={seed}; got "
        f"{res.outcome} (tick={res.signals.game_tick}, "
        f"buildings={sorted(res.signals.own_building_types)})"
    )


@pytest.mark.parametrize("seed", SEEDS)
@pytest.mark.parametrize("level", LEVELS)
def test_stall_policy_loses(level, seed):
    """A stall (observe-only) builds nothing — the exposed powr is
    razed, no weap, no tanks → must LOSE on every (level, seed)."""
    c = compile_level(load_pack(PACK), level)
    res = run_level(c, _stall_policy(), seed=seed)
    assert res.outcome == "loss", (
        f"stall must LOSE on {level} s={seed}; got {res.outcome}"
    )


@pytest.mark.parametrize("seed", SEEDS)
@pytest.mark.parametrize("level", LEVELS)
def test_single_powr_policy_loses(level, seed):
    """The single-point-of-failure play — build the war factory and
    produce tanks but NEVER a redundant power plant — must LOSE on
    every (level, seed): the strike razes the lone powr, so
    `building_count_gte:{powr,1}` is false at the deadline."""
    c = compile_level(load_pack(PACK), level)
    res = run_level(c, _single_powr_policy(), seed=seed)
    assert res.outcome == "loss", (
        f"single-powr (no redundancy) must LOSE on {level} s={seed}; "
        f"got {res.outcome} (tick={res.signals.game_tick})"
    )
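The tick-budget rule checked by `test_tick_budget_aligned_with_max_turns` can be worked through in isolation. The sketch below is illustrative only: the constants 93 and 90 are taken from that test's docstring, while the helper names `reachable_ticks` and `budget_problems` and the sample level values (50 turns, deadline 4500) are hypothetical and not part of the pack or the engine.

```python
# Illustrative sketch; helper names and sample values are assumptions.
FIRST_TURN_TICKS = 93  # tick reached after the first turn (from the test docstring)
TICKS_PER_TURN = 90    # approximate ticks per later turn (from the test docstring)


def reachable_ticks(max_turns):
    """Highest game tick an episode of `max_turns` turns can reach."""
    return FIRST_TURN_TICKS + TICKS_PER_TURN * (max_turns - 1)


def budget_problems(within_ticks, after_ticks, max_turns):
    """Return a list of violated budget rules (empty when the level is consistent)."""
    reachable = reachable_ticks(max_turns)
    problems = []
    if within_ticks > reachable:
        # The episode ends before the win deadline can ever be evaluated.
        problems.append("win deadline is unreachable")
    if after_ticks > reachable:
        # The fail trigger never fires, so a failed run ends as a draw.
        problems.append("fail trigger is unreachable (draw)")
    if after_ticks != within_ticks + 1:
        # Win window and fail trigger must be adjacent, with no gap or overlap.
        problems.append("after_ticks must equal within_ticks + 1")
    return problems


if __name__ == "__main__":
    # Hypothetical level: 50 turns, win by tick 4500, fail from tick 4501.
    print(reachable_ticks(50))
    print(budget_problems(4500, 4501, 50))
    print(budget_problems(4600, 4601, 50))
```

Under these assumed values, a 50-turn level reaches tick 93 + 90·49 = 4503, so a 4500/4501 deadline pair passes all three checks, while a 4600/4601 pair fails the two reachability checks.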