yxc20098 committed on
Commit 7450894 · 1 Parent(s): 5cd2d8e

feat(scenario): combat-skirmish-then-disengage — strike then disengage (SC2 skirmisher / military recon-by-fire anchor)

Wave-6 combat-micro pack: ONE coordinated engagement done well — drive
east, score >=3 kills against a slow infantry cluster, then PULL BACK
to the spawn-corner recovery zone before the deadline. Distinct from
combat-harass-balanced-hit-and-run (which is the CYCLIC pulsed variant
with a zero-attrition bar): this pack is one big engagement with a
positional/temporal recovery bar.

Bar (all four-policy proxies, every level + every hard seed 1..4):
  * stall (only observe) -> LOSS (kill bar unmet;
    jeeps stance:0 so no auto-return-fire; on hard the hunt-bot e1
    wipe the idle stack)
  * never-engage (park at start) -> LOSS (recovery clause
    trivially satisfied but kill bar unmet)
  * commit-until-overwhelmed (charge & stay) -> LOSS (kill bar IS met
    but jeeps end at the kill site x~50, not in the recovery region
    around the spawn corner; region clause fails -> after_ticks LOSS)
  * intended skirmish-then-disengage -> WIN on every seed
    (kill bar met inside ~14 turns, then disengage to spawn corner
    finishes inside the 4500-tick budget)
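
The tick budget behind the intended policy can be checked with a little arithmetic. This is a minimal sketch, assuming the ~90 ticks/turn cadence and the `93 + 90 * (max_turns - 1)` reachability bound that the scripted-policy tests in this commit use; the constant and function names are illustrative, not part of the codebase.

```python
# Values taken from the pack and its tests; the names are hypothetical.
TICKS_PER_TURN = 90      # approximate ticks consumed per agent turn
FIRST_TURN_TICK = 93     # tick reached at the end of the first turn
MAX_TURNS = 52           # max_turns on every level
DEADLINE = 4500          # within_ticks bound in the win predicate
FAIL_AFTER = 4501        # after_ticks bound in the fail predicate
STRIKE_UNTIL = 1300      # PHASE_STRIKE_UNTIL_TICK in the intended policy


def last_reachable_tick(max_turns: int) -> int:
    """Latest tick a run can reach before it exhausts max_turns."""
    return FIRST_TURN_TICK + TICKS_PER_TURN * (max_turns - 1)


# Whole turns spent in the strike phase and left over for the disengage.
strike_turns = STRIKE_UNTIL // TICKS_PER_TURN
disengage_ticks = DEADLINE - STRIKE_UNTIL
disengage_turns = disengage_ticks // TICKS_PER_TURN

# The timeout LOSS must be reachable, otherwise a non-win would be a draw.
timeout_reachable = FAIL_AFTER <= last_reachable_tick(MAX_TURNS)

print(strike_turns, disengage_ticks, disengage_turns, timeout_reachable)
```

Under these assumptions the strike phase takes about 14 turns and leaves about 35 turns for the drive home, and the `after_ticks: 4501` loss falls inside the 52-turn cap.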

Win predicate (all levels):
units_killed_gte:3 AND own_units_gte:3 AND
units_in_region_gte:{x:5,y:<spawn>,radius:6,n:3} AND within_ticks:4500
Hard recovery clause is any_of over the two spawn-corner regions
(NORTH (5,10) or SOUTH (5,30)) — agent must return to its OWN start
corner.
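
The win predicate above can be read as a plain boolean function. The sketch below is illustrative only: it assumes the region test is a Euclidean distance check, that `within_ticks` is inclusive, and that `own_units_gte` counts the surviving units. The real evaluator lives in `openra_bench.scenarios.win_conditions` and may differ on all three points; `in_region` and `skirmish_win` are hypothetical names.

```python
from math import hypot

# Recovery regions as encoded in the pack.
EASY_REGIONS = [{"x": 5, "y": 20, "radius": 6}]
HARD_REGIONS = [{"x": 5, "y": 10, "radius": 6}, {"x": 5, "y": 30, "radius": 6}]


def in_region(unit_xy, region):
    """Assumed metric: Euclidean distance from the region centre."""
    dx = unit_xy[0] - region["x"]
    dy = unit_xy[1] - region["y"]
    return hypot(dx, dy) <= region["radius"]


def skirmish_win(units_xy, killed, tick, regions, n=3, deadline=4500):
    """units_killed_gte:3 AND own_units_gte:3 AND region clause AND within_ticks."""
    # any_of over regions: at least n units must sit inside ONE region.
    recovered = any(
        sum(in_region(u, r) for u in units_xy) >= n for r in regions
    )
    return killed >= 3 and len(units_xy) >= 3 and recovered and tick <= deadline


# Commit-and-stay: kill bar met, but every jeep is still at the cluster.
print(skirmish_win([(50, 20)] * 4, killed=6, tick=3000, regions=HARD_REGIONS))
# Intended policy: three kills, all four jeeps back at the north corner.
print(skirmish_win([(5, 10)] * 4, killed=3, tick=3000, regions=HARD_REGIONS))
```

The two calls mirror the commit-and-stay and intended cases: the first fails on the region clause alone, the second satisfies all four clauses.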

Difficulty axis:
  easy   -> 4x e1 cluster at (50,20), no bot
  medium -> 6x e1 cluster (same kill bar; the extra rifles tighten the
            commit-and-stay failure mode by stretching the mop-up
            window past the disengage budget)
  hard   -> 6x e1 cluster + bot_type:hunt (active pursuit) + 2 agent
            spawn_point groups round-robined by seed (anti-memorisation)
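
To illustrate the anti-memorisation idea on hard, here is a hedged sketch of a seed-driven spawn-group choice. The commit does not show how the loader maps a seed to a `spawn_point` group, so the plain modulo rule below is an assumption, and `spawn_group_for_seed` and `own_corner` are hypothetical helpers.

```python
# Spawn corners for the two hard-tier spawn_point groups in the pack.
HARD_SPAWN_CORNERS = {0: (5, 10), 1: (5, 30)}  # 0 = NORTH, 1 = SOUTH


def spawn_group_for_seed(seed: int, n_groups: int = 2) -> int:
    """ASSUMPTION: plain modulo round-robin; the real loader may differ."""
    return seed % n_groups


def own_corner(seed: int):
    """Recovery-zone centre the agent must return to for this seed."""
    return HARD_SPAWN_CORNERS[spawn_group_for_seed(seed)]


# Across the four hard seeds both corners are used, so a policy that
# always retreats to one fixed cell cannot win on every seed.
groups = {seed: spawn_group_for_seed(seed) for seed in (1, 2, 3, 4)}
print(groups, sorted({own_corner(s) for s in groups}))
```

Whatever the real mapping is, the property the hard-tier test checks is the same: at least two distinct agent spawn groups are reachable across seeds.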

UPGRADED in tests/test_hard_tier.py (>=2 distinct seed-driven spawn
groups verified). 18 scripted-policy tests pass (predicate teeth +
4-policy bar on every level / every hard seed).

Model smoke (Together/Qwen3.6-Plus, medium, seed=1): runs end-to-end,
loss outcome (model played a perception-failure variant — composite
0.2628, action=1.0, weakest=perception). Bar is on scripted policies,
not the model.

benchmark_anchor:
- SC2 skirmisher tactics
- military reconnaissance-by-fire
- harass-and-disengage doctrine
- armoured cavalry doctrine

openra_bench/scenarios/packs/combat-skirmish-then-disengage.yaml ADDED
@@ -0,0 +1,292 @@
+ # combat-skirmish-then-disengage — ONE coordinated strike-then-pull-back
+ # (Wave-6 combat-micro pack; complement to combat-harass-balanced-hit-
+ # and-run which is the CYCLIC pulsed variant).
+ #
+ # The capability under test is SKIRMISHER DOCTRINE: a single big
+ # engagement done well — drive forward, score the kills, then pull the
+ # force back to a recovery zone before being overwhelmed. Unlike
+ # combat-harass-balanced (a cycle of small pulses with zero attrition),
+ # this pack is ONE coordinated engagement: kills are easy to score; the
+ # DISCRIMINATING decision is whether the agent stops fighting and
+ # disengages before attrition mounts past the survival bar — and gets
+ # the force HOME (recovery region around the start) before the clock.
+ #
+ # Real-world anchors:
+ #   - SC2 skirmisher tactics: a controlled "pull back to base" call
+ #     after scoring damage, vs the "commit until dead" anti-pattern.
+ #   - Military reconnaissance-by-fire / armoured cavalry doctrine:
+ #     probe, score, withdraw to friendly lines for re-supply.
+ #   - Fire-and-maneuver doctrine in the SINGLE-engagement frame
+ #     (the pulsed/cyclic frame is combat-harass-balanced-hit-and-run).
+ #
+ # Idiom (the four-policy bar — every level):
+ #   • stall (only observe) → LOSS. Kill bar unmet (≥3 kills
+ #     required); the jeeps' default stance does not let them rack up
+ #     return-fire kills sitting at home — they must drive east to
+ #     find foes inside MG range.
+ #   • never-engage (park at start) → LOSS. units_in_region clause
+ #     is satisfied (jeeps already at the recovery zone), but the kill
+ #     bar (units_killed_gte:3) is unmet → after_ticks LOSS.
+ #   • commit-until-overwhelmed (charge east, never retreat)
+ #     → LOSS. Kill bar IS met (the
+ #     enemy infantry trade poorly vs 4× jeep MG), but the jeeps end
+ #     the run at the enemy cluster (~x=50), NOT in the recovery
+ #     region around the start. units_in_region_gte:{x:5,y:..,r:6,n:3}
+ #     is unmet → after_ticks LOSS. On hard tier additionally the
+ #     hunt-bot spawn waves grind down the un-retreating force past
+ #     own_units_gte:3.
+ #   • intended skirmish-then-disengage (drive east, kill ≥3, then
+ #     move_units back to the start) → WIN. All three clauses met
+ #     inside the 4500-tick budget.
+ #
+ # Distinct from combat-harass-balanced-hit-and-run: the BALANCED pack
+ # enforces zero attrition across a multi-pulse cycle (the "no loss"
+ # bar), and the win is "kill workers without losing raiders". HERE the
+ # win is "finish the kills BEFORE you get overwhelmed AND get the force
+ # back HOME" — a positional/temporal recovery bar, not an attrition
+ # bar. units_in_region_gte is the load-bearing clause that makes
+ # disengage required.
+ #
+ # Engine notes (load-bearing for the bar):
+ #   - Jeeps start `stance: 0` (HoldFire). With stance:0 they do NOT
+ #     auto-return-fire on approaching enemies — sitting idle while
+ #     hunt-bot e1 close in DOES NOT score kills (kill bar unmet).
+ #     The only way to score is to explicitly `attack_unit` (or
+ #     `attack_move`), which makes the agent's strike decision
+ #     load-bearing.
+ #   - Enemy `e1` at the mid-x cluster are placed at y=19/y=21 cells
+ #     (verified-placement rows per CLAUDE.md — `e1` at some mid-x
+ #     cells silently fails to surface; (50,19)/(50,21) are confirmed
+ #     working).
+ #   - Persistent unarmed enemy `fact` at far east (x=124) prevents the
+ #     engine from auto-`done`ing on enemy unit wipe (which would
+ #     collapse the run to DRAW before the within_ticks + region
+ #     predicates evaluate cleanly on the terminal frame).
+
+ meta:
+   id: combat-skirmish-then-disengage
+   title: 'Combat Skirmish — Strike, Score the Kills, Pull Back to Recovery'
+   capability: action
+   real_world_meaning: >
+     SKIRMISHER doctrine in the single-engagement frame: four fast
+     raiders (jeeps) must drive east into a slow infantry cluster,
+     score AT LEAST 3 kills, and then PULL BACK to the recovery zone
+     around the western start before the clock expires AND while
+     keeping at least 3 raiders alive. The skill under test is the
+     decision to STOP FIGHTING and disengage — committing until the
+     enemy is wiped or until the strike force is destroyed both LOSE
+     (commit leaves the raiders at the kill site instead of the
+     recovery zone; over-commit on hard loses raiders to the
+     hunt-bot spawn waves). Distinct from the BALANCED pulsed
+     harass-retreat cycle (combat-harass-balanced-hit-and-run, which
+     is many small pulses with zero attrition): this pack is ONE big
+     engagement done well, with a positional recovery bar.
+   robotics_analogue: >
+     Mission-with-egress: a mobile manipulator must complete a
+     threshold of reward-bearing actions in a contested workspace,
+     then return to a safe staging region before a time or attrition
+     budget expires. Knowing WHEN to stop the productive sub-task
+     and start the egress is the decision under test — a
+     productivity-only policy (greedy accumulation) leaves the agent
+     far from the staging region at deadline and fails the egress
+     clause.
+   benchmark_anchor:
+     - "SC2 skirmisher tactics"
+     - "military reconnaissance-by-fire"
+     - "harass-and-disengage doctrine"
+     - "armoured cavalry doctrine"
+   author: openra-bench
+
+ base_map: rush-hour-arena
+
+ base:
+   agent: {faction: allies, cash: 0}
+   enemy: {faction: soviet, cash: 0}
+   tools: [move_units, attack_unit, attack_move, stop]
+   planning: true
+   termination: {max_ticks: 6500}
+   actors: []
+
+ levels:
+   # ── EASY ────────────────────────────────────────────────────────────
+   # Bare skirmish-then-disengage skill: 4 jeeps (stance:0, no
+   # auto-fire — kills require explicit attack_unit) vs 4× slow rifle
+   # infantry (e1, stance:0 — no auto-fire either, so a never-engage
+   # policy never scores). The strike force easily scores ≥3 kills
+   # under attack_unit (4× jeep MG vs unanswered rifles); the load-
+   # bearing decision is whether the agent then ORDERS THE RETREAT
+   # back to the recovery zone instead of mopping up the last enemy in
+   # place. stall / never-engage LOSE on the kill bar; commit-until-
+   # wiped LOSES because the jeeps end at x≈50 instead of the
+   # (5,20,r=6) recovery region.
+   easy:
+     description: >
+       Four jeeps stage at the west base (5,20). Four enemy
+       RIFLE INFANTRY (e1, slow) hold a cluster around (50,20). Drive
+       east, kill AT LEAST THREE rifle infantry, then PULL BACK so
+       AT LEAST THREE of your jeeps end inside the recovery zone (a
+       6-cell radius around (5,20) — i.e. your starting region). Keep
+       at least three jeeps alive. Finish before tick 4500. Stalling
+       LOSES (kill bar unmet); never engaging LOSES (kill bar unmet);
+       committing east and staying at the cluster LOSES (your jeeps
+       are at the kill site, not the recovery zone). The discriminator
+       is the DISENGAGE order — stop attacking and move_units back
+       to (5,20) once you have your 3 kills.
+     overrides:
+       actors:
+         # Strike force: 4 jeeps at the western staging point.
+         # stance:0 (HoldFire) — no auto-return-fire, so kills require
+         # an explicit attack_unit / attack_move order (the load-
+         # bearing decision under test).
+         - {type: jeep, owner: agent, position: [5, 19], stance: 0}
+         - {type: jeep, owner: agent, position: [5, 20], stance: 0}
+         - {type: jeep, owner: agent, position: [5, 21], stance: 0}
+         - {type: jeep, owner: agent, position: [6, 20], stance: 0}
+         # Enemy infantry cluster — 4× e1 spread across rows y=19/y=21
+         # (CLAUDE.md confirms y=19/y=21 mid-x cells place reliably).
+         # stance:0 so they sit on post — fair "discoverable cluster"
+         # for the test (a never-engage agent never gets attacked into
+         # an accidental kill).
+         - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+         - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+         # Persistent far-east enemy fact — prevents engine auto-done
+         # on enemy wipe collapsing the run to DRAW before the
+         # within_ticks + region predicates evaluate.
+         - {type: fact, owner: enemy, position: [124, 20]}
+     win_condition:
+       all_of:
+         - {units_killed_gte: 3}
+         - {own_units_gte: 3}
+         - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
+         - {within_ticks: 4500}
+     fail_condition:
+       any_of:
+         - {after_ticks: 4501}
+         - {not: {own_units_gte: 1}}
+     max_turns: 52
+
+   # ── MEDIUM ──────────────────────────────────────────────────────────
+   # +1 controlled variable: the enemy cluster grows to 6× e1 (vs 4 on
+   # easy). The kill bar (≥3) is unchanged, so the strike is still
+   # easily achievable — but the larger cluster means a commit-until-
+   # wiped policy spends MORE turns mopping up (more enemies = more
+   # rounds at the cluster), which leaves it even further from being
+   # able to RETREAT before the within_ticks deadline. The discriminator
+   # — "stop attacking after 3 kills and order the disengage" — is
+   # sharper.
+   medium:
+     description: >
+       Four jeeps stage at the west base (5,20). SIX enemy rifle
+       infantry hold a cluster around (50,20). Drive east, kill AT
+       LEAST THREE rifle infantry, then PULL BACK so AT LEAST THREE
+       of your jeeps end inside the recovery zone (6-cell radius
+       around (5,20)). Keep at least three jeeps alive. Finish
+       before tick 4500. With six enemies in the cluster a "commit
+       until everything is dead" policy spends most of the budget
+       mopping up — by the deadline your jeeps are still at the
+       kill site, not the recovery zone, and the run fails on the
+       region clause. Order the DISENGAGE after the third kill and
+       drive west to the recovery zone.
+     overrides:
+       actors:
+         - {type: jeep, owner: agent, position: [5, 19], stance: 0}
+         - {type: jeep, owner: agent, position: [5, 20], stance: 0}
+         - {type: jeep, owner: agent, position: [5, 21], stance: 0}
+         - {type: jeep, owner: agent, position: [6, 20], stance: 0}
+         # 6× e1 cluster around (50,20). Verified-placement rows
+         # (y=19/y=21 mid-x).
+         - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [48, 21], stance: 0}
+         - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+         - {type: e1, owner: enemy, position: [52, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+         - {type: fact, owner: enemy, position: [124, 20]}
+     win_condition:
+       all_of:
+         - {units_killed_gte: 3}
+         - {own_units_gte: 3}
+         - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
+         - {within_ticks: 4500}
+     fail_condition:
+       any_of:
+         - {after_ticks: 4501}
+         - {not: {own_units_gte: 1}}
+     max_turns: 52
+
+   # ── HARD ────────────────────────────────────────────────────────────
+   # +2 controlled variables vs medium:
+   #   1. bot_type: hunt — the e1 cluster actively PURSUES the jeeps
+   #      (jeeps remain stance:0 so they only score on explicit
+   #      attack orders; the hunt bot turns the engagement into a
+   #      tightening window — a slow retreat or commit-and-stay loses
+   #      jeeps past own_units_gte:3). Spec's "hunt-bot pursues".
+   #   2. Two agent spawn_point groups (NORTH y=10 or SOUTH y=30)
+   #      round-robined by seed; the recovery zone is `any_of` over the
+   #      two spawn corners so the agent must return to ITS OWN start
+   #      corner (no "always retreat to (5,20)" memorisation). Spec's
+   #      "2 spawn groups".
+   # Enemy actors do NOT honour spawn_point (CLAUDE.md), so the e1
+   # cluster sits symmetrically at the mid-latitude (y=20) — both
+   # spawn corridors face the same eastern threat geometry. The
+   # cluster size stays at 6 (matching medium); the hunt bot is the
+   # threat-axis upgrade, not raw enemy count — extra waves would
+   # overwhelm 4 jeeps before any disengage could complete (verified
+   # 2026-05-20: +4 extra e1 at x≈90 + hunt drops the intended-policy
+   # win rate to ~0% as the swarm closes inside 5 turns).
+   hard:
+     description: >
+       Four jeeps stage at ONE of two western staging points (NORTH
+       (5,10) or SOUTH (5,30), chosen by seed — anti-memorisation).
+       Six enemy RIFLE INFANTRY (e1) sit at a cluster around
+       (50,20). The enemy side is HUNTING — surviving e1 actively
+       pursue your jeeps. Kill AT LEAST THREE rifle infantry, keep
+       at least three jeeps alive, AND end with at least three
+       jeeps inside the recovery zone (6-cell radius around YOUR
+       spawn corner, either (5,10) or (5,30)). Finish before tick
+       4500. Stalling, never engaging, and commit-and-stay all
+       LOSE; the hunt bot ensures that a slow disengage also fails
+       on the survival or region clause.
+     overrides:
+       actors:
+         # spawn_point 0 — NORTH staging (y=10)
+         - {type: jeep, owner: agent, position: [5, 9], stance: 0, spawn_point: 0}
+         - {type: jeep, owner: agent, position: [5, 10], stance: 0, spawn_point: 0}
+         - {type: jeep, owner: agent, position: [5, 11], stance: 0, spawn_point: 0}
+         - {type: jeep, owner: agent, position: [6, 10], stance: 0, spawn_point: 0}
+         # spawn_point 1 — SOUTH staging (y=30)
+         - {type: jeep, owner: agent, position: [5, 29], stance: 0, spawn_point: 1}
+         - {type: jeep, owner: agent, position: [5, 30], stance: 0, spawn_point: 1}
+         - {type: jeep, owner: agent, position: [5, 31], stance: 0, spawn_point: 1}
+         - {type: jeep, owner: agent, position: [6, 30], stance: 0, spawn_point: 1}
+         # 6× e1 cluster at (50,20). Hunt bot gives them stance:3 on
+         # init and issues Attack orders that drive them west toward
+         # the jeeps; the infantry walk to contact takes ~6-8 turns.
+         - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [48, 21], stance: 0}
+         - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+         - {type: e1, owner: enemy, position: [52, 19], stance: 0}
+         - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+         # Persistent far-east enemy fact.
+         - {type: fact, owner: enemy, position: [124, 20]}
+       enemy: {faction: soviet, cash: 0, bot_type: hunt}
+     # Hard win: recovery zone is `any_of` over the two spawn corners
+     # — the agent must return to ITS OWN start corner. (A wrong-corner
+     # return is geometrically infeasible inside the tick budget, but
+     # encoded for predicate clarity.)
+     win_condition:
+       all_of:
+         - {units_killed_gte: 3}
+         - {own_units_gte: 3}
+         - any_of:
+             - {units_in_region_gte: {x: 5, y: 10, radius: 6, n: 3}}
+             - {units_in_region_gte: {x: 5, y: 30, radius: 6, n: 3}}
+         - {within_ticks: 4500}
+     fail_condition:
+       any_of:
+         - {after_ticks: 4501}
+         - {not: {own_units_gte: 1}}
+     max_turns: 52
tests/test_combat_skirmish_then_disengage.py ADDED
@@ -0,0 +1,370 @@
+ """combat-skirmish-then-disengage — ONE coordinated strike-then-pull-back.
+
+ Bar: the intended skirmish-then-disengage policy WINS on every level
+ and every hard seed; stall (only observe), never-engage (park at
+ start), and commit-until-overwhelmed (charge east and never retreat)
+ LOSE on every level. Non-win is a real reachable timeout LOSS (not a
+ draw).
+
+ Validation is scripted (no model / network): the four policies below
+ are the exhaustive proxies for the four real strategies and exercise
+ the predicate teeth directly. The load-bearing decision under test is
+ "stop attacking after the kill bar is met and order the disengage
+ back to the recovery zone before the deadline".
+ """
+
+ from __future__ import annotations
+
+ from pathlib import Path
+
+ import pytest
+
+ pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+ from openra_bench.scenarios import load_pack
+ from openra_bench.scenarios.loader import compile_level
+ from openra_bench.scenarios.win_conditions import WinContext, evaluate
+
+ PACKS = Path(__file__).parent.parent / "openra_bench" / "scenarios" / "packs"
+ PACK_PATH = PACKS / "combat-skirmish-then-disengage.yaml"
+
+
+ # ── unit-level predicate checks ──────────────────────────────────────
+
+ def _ctx(units_xy=(), tick=1000, killed=0, lost=0):
+     """Synthesize a WinContext for predicate-level checks."""
+     import types
+
+     sig = types.SimpleNamespace(
+         game_tick=tick,
+         units_killed=killed,
+         units_lost=lost,
+         own_buildings=[],
+         own_building_types=set(),
+         enemies_seen_ids=set(),
+         enemy_buildings_seen_ids=set(),
+     )
+     return WinContext(
+         signals=sig,
+         render_state={
+             "units_summary": [
+                 {"cell_x": x, "cell_y": y} for x, y in units_xy
+             ]
+         },
+     )
+
+
+ def test_predicates_easy_recovery_clause():
+     c = compile_level(load_pack(PACK_PATH), "easy")
+     home = [(5, 20), (5, 20), (5, 20), (5, 20)]
+     cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+     mixed_3_home = [(5, 20), (5, 20), (5, 20), (50, 20)]
+
+     # Intended: 3+ kills, ≥3 alive, ≥3 in recovery → WIN
+     assert evaluate(c.win_condition, _ctx(home, tick=2000, killed=3, lost=0))
+     assert evaluate(c.win_condition, _ctx(mixed_3_home, tick=2000, killed=4, lost=0))
+     # Kill bar met but all units still at the kill site → fail region clause
+     assert not evaluate(c.win_condition, _ctx(cluster, tick=2000, killed=4, lost=0))
+     # 3 kills but only 2 own_units → predicate fails
+     assert not evaluate(c.win_condition, _ctx(home[:2], tick=2000, killed=3, lost=2))
+     # 0 kills → predicate fails even if everyone is at home
+     assert not evaluate(c.win_condition, _ctx(home, tick=2000, killed=0, lost=0))
+     # Past deadline → real loss, reachable within max_turns
+     assert evaluate(c.fail_condition, _ctx(home, tick=4502, killed=0, lost=0))
+     assert 4501 <= 93 + 90 * (c.max_turns - 1), (
+         "after_ticks 4501 must be reachable within max_turns"
+     )
+
+
+ def test_predicates_medium_same_bar_six_enemies():
+     c = compile_level(load_pack(PACK_PATH), "medium")
+     home = [(5, 20), (5, 20), (5, 20), (5, 20)]
+     cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+
+     # Intended: 3+ kills, ≥3 alive, ≥3 in recovery → WIN
+     assert evaluate(c.win_condition, _ctx(home, tick=3000, killed=3, lost=0))
+     # Commit-and-stay: kill bar met but jeeps at cluster, not home → fail
+     assert not evaluate(c.win_condition, _ctx(cluster, tick=3000, killed=6, lost=0))
+     # Past deadline → real loss, reachable
+     assert evaluate(c.fail_condition, _ctx(home, tick=4502, killed=0, lost=0))
+     assert 4501 <= 93 + 90 * (c.max_turns - 1)
+
+
+ def test_predicates_hard_any_of_spawn_corner_recovery():
+     c = compile_level(load_pack(PACK_PATH), "hard")
+     home_north = [(5, 10), (5, 10), (5, 10), (5, 10)]
+     home_south = [(5, 30), (5, 30), (5, 30), (5, 30)]
+     mid_lat = [(5, 20), (5, 20), (5, 20), (5, 20)]  # neither corner
+     cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+
+     # Either spawn corner satisfies the any_of recovery clause.
+     assert evaluate(c.win_condition, _ctx(home_north, tick=3000, killed=3, lost=0))
+     assert evaluate(c.win_condition, _ctx(home_south, tick=3000, killed=3, lost=0))
+     # Mid-latitude (y=20) is outside BOTH spawn-corner radii (radius=6
+     # from (5,10) ⇒ y=20 is 10 cells away; same from (5,30)) → fail.
+     assert not evaluate(c.win_condition, _ctx(mid_lat, tick=3000, killed=3, lost=0))
+     # Commit-and-stay at cluster → fail region clause.
+     assert not evaluate(c.win_condition, _ctx(cluster, tick=3000, killed=6, lost=0))
+     # Past deadline → real loss, reachable
+     assert evaluate(c.fail_condition, _ctx(home_north, tick=4502, killed=0, lost=0))
+     assert 4501 <= 93 + 90 * (c.max_turns - 1)
+
+
+ def test_hard_has_two_spawn_point_groups():
+     """Hard-tier curation contract: ≥2 distinct agent spawn_point
+     groups so the seed round-robins the raider start corner."""
+     c = compile_level(load_pack(PACK_PATH), "hard")
+     groups = {
+         (a.spawn_point if a.spawn_point is not None else 0)
+         for a in c.scenario.actors
+         if a.owner == "agent"
+     }
+     assert len(groups) >= 2, f"hard needs ≥2 spawn_point groups, got {groups}"
+
+
+ def test_pack_compiles_and_meta_fields_populated():
+     pack = load_pack(PACK_PATH)
+     assert pack.meta.capability == "action"
+     assert pack.meta.id == "combat-skirmish-then-disengage"
+     anchors = pack.meta.benchmark_anchor
+     assert isinstance(anchors, list) and anchors, "benchmark_anchor required"
+     joined = " ".join(anchors).lower()
+     # Anchored to the doctrines the brief calls out: SC2 skirmisher +
+     # military reconnaissance-by-fire / cavalry doctrine.
+     assert "skirmish" in joined
+     assert "recon" in joined or "cavalry" in joined or "disengage" in joined
+     for lvl in ("easy", "medium", "hard"):
+         c = compile_level(pack, lvl)
+         assert c.map_supported
+         assert c.win_condition is not None and c.fail_condition is not None
+
+
+ def test_timeout_loss_is_reachable_on_every_level():
+     """No draw degeneracy: after_ticks 4501 fits inside max_turns on
+     every level (∼90 ticks/turn ⇒ 93 + 90·(max_turns-1))."""
+     pack = load_pack(PACK_PATH)
+     for lvl in ("easy", "medium", "hard"):
+         c = compile_level(pack, lvl)
+         assert 4501 <= 93 + 90 * (c.max_turns - 1), lvl
+
+
+ # ── engine-driven scripted policies ──────────────────────────────────
+ #
+ # Stage thresholds for the intended policy:
+ #   PHASE_STRIKE_UNTIL_TICK = 1300 — drive east, attack_unit any
+ #     visible e1 until this tick (~14 turns @ ~90 ticks/turn). At
+ #     this point the kill bar (≥3) is comfortably met and the
+ #     disengage budget (4500 - 1300 = ~3200 ticks ≈ 35 turns) is
+ #     more than enough to march back to (5,20)-ish.
+ #   APPROACH_STEP = 15 cells / turn east toward the cluster axis
+ #     (jeep ~50% faster than infantry).
+
+ PHASE_STRIKE_UNTIL_TICK = 1300
+ APPROACH_STEP = 15
+ APPROACH_LIMIT_X = 46  # don't overrun the cluster (~x=50)
+
+
+ def _e1_foes(enemies):
+     return [
+         e for e in enemies
+         if (e.get("type") or "").lower() == "e1"
+         and not e.get("is_building")
+     ]
+
+
+ def _stall_policy(rs, Command):
+     """Stall: only observe. Kill bar never met (jeeps are stance:0;
+     no auto-return-fire) → LOSS on the clock; on hard the hunt-bot
+     e1 close on the idle stack and wipe it → LOSS on
+     `not own_units_gte:1`."""
+     return [Command.observe()]
+
+
+ def _never_engage_policy(rs, Command):
+     """Park at the start; never move east, never fire. Recovery
+     region clause is trivially satisfied but the kill bar is unmet
+     → LOSS on the clock (easy/medium) or LOSS on hard when hunt-bot
+     e1 wipe the idle stack."""
+     units = rs.get("units_summary", []) or []
+     if not units:
+         return [Command.observe()]
+     cmds = []
+     for u in units:
+         cmds.append(
+             Command.move_units(
+                 [str(u["id"])], target_x=u["cell_x"], target_y=u["cell_y"]
+             )
+         )
+     return cmds
+
+
+ def _commit_until_overwhelmed_policy(rs, Command):
+     """Charge east; attack_unit any visible foe; never retreat. The
+     kill bar IS met (4× jeep MG vs stance:0 rifles), but the jeeps
+     end the run sitting at the kill site (~x=50), not in the
+     recovery region. The region clause fails → after_ticks LOSS.
+     """
+     units = rs.get("units_summary", []) or []
+     enemies = rs.get("enemy_summary", []) or []
+     if not units:
+         return [Command.observe()]
+     foes = _e1_foes(enemies)
+     cmds = []
+     for u in units:
+         ux, uy = u["cell_x"], u["cell_y"]
+         if foes:
+             t = min(
+                 foes,
+                 key=lambda e: (e["cell_x"] - ux) ** 2 + (e["cell_y"] - uy) ** 2,
+             )
+             cmds.append(Command.attack_unit([str(u["id"])], str(t["id"])))
+         else:
+             # March east to the cluster axis but STOP at the cluster
+             # (don't overrun to the far-east fact and trip auto-done).
+             cmds.append(
+                 Command.move_units(
+                     [str(u["id"])], target_x=min(50, ux + 12), target_y=uy
+                 )
+             )
+     return cmds
+
+
+ def _intended_skirmish_then_disengage_policy(rs, Command):
+     """Intended skirmisher cycle:
+     - PHASE 1 (tick < PHASE_STRIKE_UNTIL_TICK): drive east, attack_unit
+       any visible e1.
+     - PHASE 2 (tick >= PHASE_STRIKE_UNTIL_TICK): stop attacking; order
+       move_units back to the nearest spawn corner — the RECOVERY zone.
+     The phase switch is the spec's load-bearing decision: "stop
+     fighting and pull back" before the deadline.
+     """
+     units = rs.get("units_summary", []) or []
+     enemies = rs.get("enemy_summary", []) or []
+     tick = rs.get("game_tick") or 0
+     if not units:
+         return [Command.observe()]
+     foes = _e1_foes(enemies)
+     # Pick the nearest spawn-corner candidate as the recovery target
+     # (stateless — works for both single-corner and any_of-corner
+     # recovery clauses).
+     candidates = [(5, 20), (5, 10), (5, 30)]
+     cx = sum(u["cell_x"] for u in units) / len(units)
+     cy = sum(u["cell_y"] for u in units) / len(units)
+     home = min(
+         candidates, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2
+     )
+     cmds = []
+     if tick < PHASE_STRIKE_UNTIL_TICK:
+         if foes:
+             for u in units:
+                 ux, uy = u["cell_x"], u["cell_y"]
+                 t = min(
+                     foes,
+                     key=lambda e: (e["cell_x"] - ux) ** 2
+                     + (e["cell_y"] - uy) ** 2,
+                 )
+                 cmds.append(
+                     Command.attack_unit([str(u["id"])], str(t["id"]))
+                 )
+         else:
+             # No foes in sight yet — drive east toward the cluster
+             # axis. Cap at APPROACH_LIMIT_X so the strike force
+             # doesn't overrun past the cluster.
+             for u in units:
+                 ux, uy = u["cell_x"], u["cell_y"]
+                 cmds.append(
+                     Command.move_units(
+                         [str(u["id"])],
+                         target_x=min(APPROACH_LIMIT_X, ux + APPROACH_STEP),
+                         target_y=uy,
+                     )
+                 )
+     else:
+         # PHASE 2: PULL BACK. Stop fighting; drive home.
+         for u in units:
+             cmds.append(
+                 Command.move_units(
+                     [str(u["id"])], target_x=home[0], target_y=home[1]
+                 )
+             )
+     return cmds
+
+
+ # ── policy bar tests ────────────────────────────────────────────────
+
+
+ @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+ def test_stall_loses(level):
+     """Stall must LOSE on every level: jeeps are stance:0 so they
+     never return fire (kill bar unmet); on hard the hunt-bot e1
+     close on the idle stack and trip `not own_units_gte:1`."""
+     pytest.importorskip("openra_train")
+     from openra_bench.eval_core import run_level
+
+     c = compile_level(load_pack(PACK_PATH), level)
+     seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+     for s in seeds:
+         res = run_level(c, _stall_policy, seed=s)
+         assert res.outcome == "loss", (
+             f"{level} seed={s}: stall must LOSE, got {res.outcome} "
+             f"killed={res.signals.units_killed} lost={res.signals.units_lost}"
+         )
+
+
+ @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+ def test_never_engage_loses(level):
+     """Park-at-start must LOSE: kill bar unmet; on hard hunt-bot e1
+     wipe the idle stack."""
+     pytest.importorskip("openra_train")
+     from openra_bench.eval_core import run_level
+
+     c = compile_level(load_pack(PACK_PATH), level)
+     seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+     for s in seeds:
+         res = run_level(c, _never_engage_policy, seed=s)
+         assert res.outcome == "loss", (
+             f"{level} seed={s}: never-engage must LOSE, got {res.outcome} "
+             f"killed={res.signals.units_killed} lost={res.signals.units_lost}"
+         )
+
+
+ @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+ def test_commit_until_overwhelmed_loses(level):
+     """Commit-and-stay at the cluster must LOSE on every level: the
+     kill bar IS met but the jeeps end the run at the kill site
+     (~x=50), not the recovery region around the start. The region
+     clause fails → after_ticks LOSS."""
+     pytest.importorskip("openra_train")
+     from openra_bench.eval_core import run_level
+
+     c = compile_level(load_pack(PACK_PATH), level)
+     seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+     for s in seeds:
+         res = run_level(c, _commit_until_overwhelmed_policy, seed=s)
+         assert res.outcome == "loss", (
+             f"{level} seed={s}: commit-and-stay must LOSE, got "
+             f"{res.outcome} killed={res.signals.units_killed} "
+             f"lost={res.signals.units_lost}"
+         )
+
+
+ @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
+ def test_intended_skirmish_then_disengage_wins(level):
+     """Intended skirmisher (strike phase → disengage phase) must
+     WIN on every level and every hard seed: kill bar met, ≥3 jeeps
+     alive, ≥3 jeeps inside the spawn-corner recovery region, all
+     inside the 4500-tick budget."""
+     pytest.importorskip("openra_train")
+     from openra_bench.eval_core import run_level
+
+     c = compile_level(load_pack(PACK_PATH), level)
+     seeds = (1, 2, 3, 4) if level == "hard" else (1,)
+     for s in seeds:
+         res = run_level(
+             c, _intended_skirmish_then_disengage_policy, seed=s
+         )
+         assert res.outcome == "win", (
+             f"{level} seed={s}: intended skirmish-then-disengage should "
+             f"WIN, got {res.outcome} after {res.turns} turns "
+             f"(killed={res.signals.units_killed}, "
+             f"lost={res.signals.units_lost})"
+         )
tests/test_hard_tier.py CHANGED
@@ -200,6 +200,19 @@ UPGRADED = [
      # flips per seed and no memorised "retreat west on y=20" opening
      # generalises.
      "combat-kite-jeep-vs-tank",
+     # Wave-6 combat-micro skirmish pack (SC2 skirmisher tactics /
+     # military reconnaissance-by-fire anchor). One coordinated
+     # strike-then-pull-back; the load-bearing decision is "stop
+     # attacking after the kill bar is met and order the disengage
+     # back to the spawn-corner recovery zone before the deadline".
+     # Hard tier defines two agent spawn_point groups (NORTH (5,10)
+     # vs SOUTH (5,30)) round-robined by seed; the recovery clause is
+     # `any_of` over the two spawn-corner regions so the agent must
+     # return to ITS OWN start corner. Hunt-bot pursuit (e1 cluster
+     # attacks-anything) makes a slow-disengage policy also LOSE on
+     # the survival bar — the "stop fighting and pull back" call is
+     # mandatory on every seed.
+     "combat-skirmish-then-disengage",
      # Wave-4 Group B TURTLE node of the expansion triple (SC2 fortress
      # macro / 1-base mass-defence; military fortress doctrine; risk-
      # averse single-market deep-investment anchor). Hard tier defines