feat(scenario): combat-skirmish-then-disengage — strike then disengage (SC2 skirmisher / military recon-by-fire anchor)
Wave-6 combat-micro pack: ONE coordinated engagement done well. Drive
east, score >=3 kills against a slow infantry cluster, then PULL BACK
to the spawn-corner recovery zone before the deadline. Distinct from
combat-harass-balanced-hit-and-run (the CYCLIC pulsed variant with a
zero-attrition bar): this pack is one big engagement with a
positional/temporal recovery bar.
Bar (all four-policy proxies, every level + every hard seed 1..4):
* stall (only observe) -> LOSS (kill bar unmet;
jeeps stance:0 so no auto-return-fire; on hard the hunt-bot e1
wipe the idle stack)
* never-engage (park at start) -> LOSS (recovery clause
trivially satisfied but kill bar unmet)
* commit-until-overwhelmed (charge & stay) -> LOSS (kill bar IS met
but jeeps end at the kill site x~50, not in the recovery region
around the spawn corner; region clause fails -> after_ticks LOSS)
* intended skirmish-then-disengage -> WIN on every seed
(kill bar met inside ~14 turns, then disengage to spawn corner
finishes inside the 4500-tick budget)
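The turn and tick figures in the bar can be cross-checked with simple arithmetic. A minimal sketch, assuming the pacing formula the pack's tests use (about 90 ticks per agent turn, first turn landing at tick 93, i.e. tick(t) = 93 + 90*(t-1)); the real engine cadence may drift from this approximation.

```python
import math

FIRST_TURN_TICK = 93   # tick reached after turn 1 (assumed, from the tests' formula)
TICKS_PER_TURN = 90    # approximate ticks advanced per agent turn (assumed)


def tick_at_turn(turn: int) -> int:
    """Approximate game tick reached after `turn` agent turns (1-indexed)."""
    return FIRST_TURN_TICK + TICKS_PER_TURN * (turn - 1)


def min_turns_to_reach(tick: int) -> int:
    """Smallest turn count whose tick_at_turn() is >= `tick`."""
    return 1 + max(0, math.ceil((tick - FIRST_TURN_TICK) / TICKS_PER_TURN))


strike_end = tick_at_turn(14)           # kill bar met inside ~14 turns
retreat_ticks = 4500 - strike_end       # ticks left for the disengage
print(strike_end, retreat_ticks, retreat_ticks // TICKS_PER_TURN)  # 1263 3237 35
print(min_turns_to_reach(4501))         # 50: first turn where after_ticks:4501 can fire
```

Under that assumption the strike phase ends near tick 1263, roughly 35 turns remain for the retreat, and after_ticks:4501 first becomes reachable on turn 50, so max_turns: 52 leaves a two-turn margin.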
Win predicate (all levels):
units_killed_gte:3 AND own_units_gte:3 AND
units_in_region_gte:{x:5,y:<spawn>,radius:6,n:3} AND within_ticks:4500
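Hard recovery clause is any_of over the two spawn-corner regions
(NORTH (5,10) or SOUTH (5,30)): the agent must retreat to a spawn
corner rather than a memorised (5,20), which lies outside both radii.
Note that the any_of accepts either corner.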
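To make the predicate semantics concrete, here is a minimal Python sketch of how the nested all_of / any_of / not clauses compose on a terminal frame. It is not the openra_bench evaluator: the State type and the holds() and corner() helpers are hypothetical, region membership is assumed to be a Euclidean distance <= radius test, and within_ticks / after_ticks are assumed to compare inclusively.

```python
import math
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical terminal-frame snapshot; the harness uses its own WinContext."""
    tick: int
    killed: int
    units: list = field(default_factory=list)  # (cell_x, cell_y) of each own unit


def holds(pred: dict, s: State) -> bool:
    """Evaluate a single-key predicate dict shaped like the pack's YAML clauses."""
    (key, arg), = pred.items()
    if key == "all_of":
        return all(holds(p, s) for p in arg)
    if key == "any_of":
        return any(holds(p, s) for p in arg)
    if key == "not":
        return not holds(arg, s)
    if key == "units_killed_gte":
        return s.killed >= arg
    if key == "own_units_gte":
        return len(s.units) >= arg
    if key == "within_ticks":          # assumed inclusive deadline
        return s.tick <= arg
    if key == "after_ticks":           # assumed inclusive trigger
        return s.tick >= arg
    if key == "units_in_region_gte":   # assumed Euclidean radius test
        inside = sum(
            math.hypot(cx - arg["x"], cy - arg["y"]) <= arg["radius"]
            for cx, cy in s.units
        )
        return inside >= arg["n"]
    raise ValueError(f"unknown predicate: {key!r}")


def corner(y):
    """Recovery region of radius 6 around (5, y), needing 3 units inside."""
    return {"units_in_region_gte": {"x": 5, "y": y, "radius": 6, "n": 3}}


HARD_WIN = {"all_of": [
    {"units_killed_gte": 3},
    {"own_units_gte": 3},
    {"any_of": [corner(10), corner(30)]},
    {"within_ticks": 4500},
]}
HARD_FAIL = {"any_of": [{"after_ticks": 4501}, {"not": {"own_units_gte": 1}}]}
```

With these assumptions the hard clause accepts a squad parked at either spawn corner, rejects one parked at the (5,20) mid-latitude (10 cells from both corners), and rejects commit-and-stay at the cluster.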
Difficulty axis:
easy -> 4x e1 cluster at (50,20), no bot
medium -> 6x e1 cluster (same kill bar; the extra rifles tighten the
commit-and-stay failure mode by stretching the mop-up
window past the disengage budget)
hard -> 6x e1 cluster + bot_type:hunt (active pursuit) + 2 agent
spawn_point groups round-robined by seed (anti-memorisation)
UPGRADED in tests/test_hard_tier.py (>=2 distinct seed-driven spawn
groups verified). 18 scripted-policy tests pass (predicate teeth +
4-policy bar on every level / every hard seed).
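The hard-tier spawn contract (>=2 seed-driven spawn_point groups) can be illustrated with a small grouping sketch. The actor cells are copied from the pack's hard level; the seed-to-group rule (seed % number_of_groups) is an assumption, since the loader's actual round-robin mapping is not shown here.

```python
from collections import defaultdict

# Hard-tier agent actors as (spawn_point, cell_x, cell_y), copied from the pack.
HARD_AGENT_ACTORS = [
    (0, 5, 9), (0, 5, 10), (0, 5, 11), (0, 6, 10),    # NORTH staging (y=10)
    (1, 5, 29), (1, 5, 30), (1, 5, 31), (1, 6, 30),   # SOUTH staging (y=30)
]


def spawn_groups(actors):
    """Group agent actor cells by spawn_point id."""
    groups = defaultdict(list)
    for spawn_point, x, y in actors:
        groups[spawn_point].append((x, y))
    return dict(groups)


def actors_for_seed(seed, actors=HARD_AGENT_ACTORS):
    """Hypothetical round-robin: group index = seed % number of groups."""
    groups = spawn_groups(actors)
    ids = sorted(groups)
    return groups[ids[seed % len(ids)]]


for seed in range(1, 5):
    print(seed, actors_for_seed(seed)[1])  # second jeep's cell for this seed
```

Under the assumed mapping, seeds 1..4 alternate SOUTH, NORTH, SOUTH, NORTH, so both staging corners are exercised across the hard seeds.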
Model smoke (Together/Qwen3.6-Plus, medium, seed=1): runs end-to-end,
loss outcome (model played a perception-failure variant — composite
0.2628, action=1.0, weakest=perception). Bar is on scripted policies,
not the model.
benchmark_anchor:
- SC2 skirmisher tactics
- military reconnaissance-by-fire
- harass-and-disengage doctrine
- armoured cavalry doctrine
@@ -0,0 +1,292 @@
+# combat-skirmish-then-disengage — ONE coordinated strike-then-pull-back
+# (Wave-6 combat-micro pack; complement to combat-harass-balanced-hit-
+# and-run which is the CYCLIC pulsed variant).
+#
+# The capability under test is SKIRMISHER DOCTRINE: a single big
+# engagement done well — drive forward, score the kills, then pull the
+# force back to a recovery zone before being overwhelmed. Unlike
+# combat-harass-balanced (a cycle of small pulses with zero attrition),
+# this pack is ONE coordinated engagement: kills are easy to score; the
+# DISCRIMINATING decision is whether the agent stops fighting and
+# disengages before attrition mounts past the survival bar — and gets
+# the force HOME (recovery region around the start) before the clock.
+#
+# Real-world anchors:
+#   - SC2 skirmisher tactics: a controlled "pull back to base" call
+#     after scoring damage, vs the "commit until dead" anti-pattern.
+#   - Military reconnaissance-by-fire / armoured cavalry doctrine:
+#     probe, score, withdraw to friendly lines for re-supply.
+#   - Fire-and-maneuver doctrine in the SINGLE-engagement frame
+#     (the pulsed/cyclic frame is combat-harass-balanced-hit-and-run).
+#
+# Idiom (the four-policy bar — every level):
+#   • stall (only observe) → LOSS. Kill bar unmet (≥3 kills
+#     required); the jeeps' default stance does not let them rack up
+#     return-fire kills sitting at home — they must drive east to
+#     find foes inside MG range.
+#   • never-engage (park at start) → LOSS. units_in_region clause
+#     is satisfied (jeeps already at the recovery zone), but the kill
+#     bar (units_killed_gte:3) is unmet → after_ticks LOSS.
+#   • commit-until-overwhelmed (charge east, never retreat)
+#     → LOSS. Kill bar IS met (the
+#     enemy infantry trade poorly vs 4× jeep MG), but the jeeps end
+#     the run at the enemy cluster (~x=50), NOT in the recovery
+#     region around the start. units_in_region_gte:{x:5,y:..,r:6,n:3}
+#     is unmet → after_ticks LOSS. On hard tier, additionally, the
+#     hunt bot's pursuing e1 grind down the un-retreating force past
+#     own_units_gte:3.
+#   • intended skirmish-then-disengage (drive east, kill ≥3, then
+#     move_units back to the start) → WIN. All three clauses met
+#     inside the 4500-tick budget.
+#
+# Distinct from combat-harass-balanced-hit-and-run: the BALANCED pack
+# enforces zero attrition across a multi-pulse cycle (the "no loss"
+# bar), and the win is "kill workers without losing raiders". HERE the
+# win is "finish the kills BEFORE you get overwhelmed AND get the force
+# back HOME" — a positional/temporal recovery bar, not an attrition
+# bar. units_in_region_gte is the load-bearing clause that makes
+# disengage required.
+#
+# Engine notes (load-bearing for the bar):
+#   - Jeeps start `stance: 0` (HoldFire). With stance:0 they do NOT
+#     auto-return-fire on approaching enemies — sitting idle while
+#     hunt-bot e1 close in DOES NOT score kills (kill bar unmet).
+#     The only way to score is to explicitly `attack_unit` (or
+#     `attack_move`), which makes the agent's strike decision
+#     load-bearing.
+#   - Enemy `e1` at the mid-x cluster are placed at y=19/y=21 cells
+#     (verified-placement rows per CLAUDE.md: an `e1` placed at some
+#     mid-x cells silently fails to surface; (50,19)/(50,21) are
+#     confirmed working).
+#   - Persistent unarmed enemy `fact` at far east (x=124) prevents the
+#     engine from auto-`done`ing on enemy unit wipe (which would
+#     collapse the run to DRAW before the within_ticks + region
+#     predicates evaluate cleanly on the terminal frame).
+
+meta:
+  id: combat-skirmish-then-disengage
+  title: 'Combat Skirmish — Strike, Score the Kills, Pull Back to Recovery'
+  capability: action
+  real_world_meaning: >
+    SKIRMISHER doctrine in the single-engagement frame: four fast
+    raiders (jeeps) must drive east into a slow infantry cluster,
+    score AT LEAST 3 kills, and then PULL BACK to the recovery zone
+    around the western start before the clock expires AND while
+    keeping at least 3 raiders alive. The skill under test is the
+    decision to STOP FIGHTING and disengage — committing until the
+    enemy is wiped or until the strike force is destroyed both LOSE
+    (commit leaves the raiders at the kill site instead of the
+    recovery zone; over-commit on hard loses raiders to the
+    hunt bot's pursuing infantry). Distinct from the BALANCED pulsed
+    harass-retreat cycle (combat-harass-balanced-hit-and-run, which
+    is many small pulses with zero attrition): this pack is ONE big
+    engagement done well, with a positional recovery bar.
+  robotics_analogue: >
+    Mission-with-egress: a mobile manipulator must complete a
+    threshold of reward-bearing actions in a contested workspace,
+    then return to a safe staging region before a time or attrition
+    budget expires. Knowing WHEN to stop the productive sub-task
+    and start the egress is the decision under test — a
+    productivity-only policy (greedy accumulation) leaves the agent
+    far from the staging region at deadline and fails the egress
+    clause.
+  benchmark_anchor:
+    - "SC2 skirmisher tactics"
+    - "military reconnaissance-by-fire"
+    - "harass-and-disengage doctrine"
+    - "armoured cavalry doctrine"
+  author: openra-bench
+
+base_map: rush-hour-arena
+
+base:
+  agent: {faction: allies, cash: 0}
+  enemy: {faction: soviet, cash: 0}
+  tools: [move_units, attack_unit, attack_move, stop]
+  planning: true
+  termination: {max_ticks: 6500}
+  actors: []
+
+levels:
+  # ── EASY ────────────────────────────────────────────────────────────
+  # Bare skirmish-then-disengage skill: 4 jeeps (stance:0, no
+  # auto-fire — kills require explicit attack_unit) vs 4× slow rifle
+  # infantry (e1, stance:0 — no auto-fire either, so a never-engage
+  # policy never scores). The strike force easily scores ≥3 kills
+  # under attack_unit (4× jeep MG vs unanswered rifles); the load-
+  # bearing decision is whether the agent then ORDERS THE RETREAT
+  # back to the recovery zone instead of mopping up the last enemy in
+  # place. stall / never-engage LOSE on the kill bar; commit-until-
+  # wiped LOSES because the jeeps end at x≈50 instead of the
+  # (5,20,r=6) recovery region.
+  easy:
+    description: >
+      Four jeeps stage at the west base (5,20). Four enemy
+      RIFLE INFANTRY (e1, slow) hold a cluster around (50,20). Drive
+      east, kill AT LEAST THREE rifle infantry, then PULL BACK so
+      AT LEAST THREE of your jeeps end inside the recovery zone (a
+      6-cell radius around (5,20) — i.e. your starting region). Keep
+      at least three jeeps alive. Finish before tick 4500. Stalling
+      LOSES (kill bar unmet); never engaging LOSES (kill bar unmet);
+      committing east and staying at the cluster LOSES (your jeeps
+      are at the kill site, not the recovery zone). The discriminator
+      is the DISENGAGE order — stop attacking and move_units back
+      to (5,20) once you have your 3 kills.
+    overrides:
+      actors:
+        # Strike force: 4 jeeps at the western staging point.
+        # stance:0 (HoldFire) — no auto-return-fire, so kills require
+        # an explicit attack_unit / attack_move order (the load-
+        # bearing decision under test).
+        - {type: jeep, owner: agent, position: [5, 19], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 20], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 21], stance: 0}
+        - {type: jeep, owner: agent, position: [6, 20], stance: 0}
+        # Enemy infantry cluster — 4× e1 spread across rows y=19/y=21
+        # (CLAUDE.md confirms y=19/y=21 mid-x cells place reliably).
+        # stance:0 so they sit on post — fair "discoverable cluster"
+        # for the test (a never-engage agent never gets attacked into
+        # an accidental kill).
+        - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+        # Persistent far-east enemy fact — prevents engine auto-done
+        # on enemy wipe collapsing the run to DRAW before the
+        # within_ticks + region predicates evaluate.
+        - {type: fact, owner: enemy, position: [124, 20]}
+    win_condition:
+      all_of:
+        - {units_killed_gte: 3}
+        - {own_units_gte: 3}
+        - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 1}}
+    max_turns: 52
+
+  # ── MEDIUM ──────────────────────────────────────────────────────────
+  # +1 controlled variable: the enemy cluster grows to 6× e1 (vs 4 on
+  # easy). The kill bar (≥3) is unchanged, so the strike is still
+  # easily achievable — but the larger cluster means a commit-until-
+  # wiped policy spends MORE turns mopping up (more enemies = more
+  # rounds at the cluster), which leaves it even further from being
+  # able to RETREAT before the within_ticks deadline. The discriminator
+  # — "stop attacking after 3 kills and order the disengage" — is
+  # sharper.
+  medium:
+    description: >
+      Four jeeps stage at the west base (5,20). SIX enemy rifle
+      infantry hold a cluster around (50,20). Drive east, kill AT
+      LEAST THREE rifle infantry, then PULL BACK so AT LEAST THREE
+      of your jeeps end inside the recovery zone (6-cell radius
+      around (5,20)). Keep at least three jeeps alive. Finish
+      before tick 4500. With six enemies in the cluster a "commit
+      until everything is dead" policy spends most of the budget
+      mopping up — by the deadline your jeeps are still at the
+      kill site, not the recovery zone, and the run fails on the
+      region clause. Order the DISENGAGE after the third kill and
+      drive west to the recovery zone.
+    overrides:
+      actors:
+        - {type: jeep, owner: agent, position: [5, 19], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 20], stance: 0}
+        - {type: jeep, owner: agent, position: [5, 21], stance: 0}
+        - {type: jeep, owner: agent, position: [6, 20], stance: 0}
+        # 6× e1 cluster around (50,20). Verified-placement rows
+        # (y=19/y=21 mid-x).
+        - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [48, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+        - {type: fact, owner: enemy, position: [124, 20]}
+    win_condition:
+      all_of:
+        - {units_killed_gte: 3}
+        - {own_units_gte: 3}
+        - {units_in_region_gte: {x: 5, y: 20, radius: 6, n: 3}}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 1}}
+    max_turns: 52
+
+  # ── HARD ────────────────────────────────────────────────────────────
+  # +2 controlled variables vs medium:
+  #   1. bot_type: hunt — the e1 cluster actively PURSUES the jeeps
+  #      (jeeps remain stance:0 so they only score on explicit
+  #      attack orders; the hunt bot turns the engagement into a
+  #      tightening window — a slow retreat or commit-and-stay loses
+  #      jeeps past own_units_gte:3). Spec's "hunt-bot pursues".
+  #   2. Two agent spawn_point groups (NORTH y=10 or SOUTH y=30)
+  #      round-robined by seed; the recovery zone is `any_of` over the
+  #      two spawn corners, so the agent must head for a spawn corner
+  #      (no "always retreat to (5,20)" memorisation: y=20 lies outside
+  #      both radii). Spec's "2 spawn groups".
+  # Enemy actors do NOT honour spawn_point (CLAUDE.md), so the e1
+  # cluster sits symmetrically at the mid-latitude (y=20) — both
+  # spawn corridors face the same eastern threat geometry. The
+  # cluster size stays at 6 (matching medium); the hunt bot is the
+  # threat-axis upgrade, not raw enemy count — extra waves would
+  # overwhelm 4 jeeps before any disengage could complete (verified
+  # 2026-05-20: +4 extra e1 at x≈90 + hunt drops the intended-policy
+  # win rate to ~0% as the swarm closes inside 5 turns).
+  hard:
+    description: >
+      Four jeeps stage at ONE of two western staging points (NORTH
+      (5,10) or SOUTH (5,30), chosen by seed — anti-memorisation).
+      Six enemy RIFLE INFANTRY (e1) sit at a cluster around
+      (50,20). The enemy side is HUNTING — surviving e1 actively
+      pursue your jeeps. Kill AT LEAST THREE rifle infantry, keep
+      at least three jeeps alive, AND end with at least three
+      jeeps inside the recovery zone (6-cell radius around YOUR
+      spawn corner, either (5,10) or (5,30)). Finish before tick
+      4500. Stalling, never engaging, and commit-and-stay all
+      LOSE; the hunt bot ensures that a slow disengage also fails
+      on the survival or region clause.
+    overrides:
+      actors:
+        # spawn_point 0 — NORTH staging (y=10)
+        - {type: jeep, owner: agent, position: [5, 9], stance: 0, spawn_point: 0}
+        - {type: jeep, owner: agent, position: [5, 10], stance: 0, spawn_point: 0}
+        - {type: jeep, owner: agent, position: [5, 11], stance: 0, spawn_point: 0}
+        - {type: jeep, owner: agent, position: [6, 10], stance: 0, spawn_point: 0}
+        # spawn_point 1 — SOUTH staging (y=30)
+        - {type: jeep, owner: agent, position: [5, 29], stance: 0, spawn_point: 1}
+        - {type: jeep, owner: agent, position: [5, 30], stance: 0, spawn_point: 1}
+        - {type: jeep, owner: agent, position: [5, 31], stance: 0, spawn_point: 1}
+        - {type: jeep, owner: agent, position: [6, 30], stance: 0, spawn_point: 1}
+        # 6× e1 cluster at (50,20). Hunt bot gives them stance:3 on
+        # init and issues Attack orders that drive them west toward
+        # the jeeps; the infantry walk to contact takes ~6-8 turns.
+        - {type: e1, owner: enemy, position: [48, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [48, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [50, 21], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 19], stance: 0}
+        - {type: e1, owner: enemy, position: [52, 21], stance: 0}
+        # Persistent far-east enemy fact.
+        - {type: fact, owner: enemy, position: [124, 20]}
+      enemy: {faction: soviet, cash: 0, bot_type: hunt}
+    # Hard win: recovery zone is `any_of` over the two spawn corners.
+    # The agent is told to return to ITS OWN start corner; the any_of
+    # also accepts the opposite corner, so the predicate rejects only
+    # non-corner endings such as the (5,20) mid-latitude.
+    win_condition:
+      all_of:
+        - {units_killed_gte: 3}
+        - {own_units_gte: 3}
+        - any_of:
+            - {units_in_region_gte: {x: 5, y: 10, radius: 6, n: 3}}
+            - {units_in_region_gte: {x: 5, y: 30, radius: 6, n: 3}}
+        - {within_ticks: 4500}
+    fail_condition:
+      any_of:
+        - {after_ticks: 4501}
+        - {not: {own_units_gte: 1}}
+    max_turns: 52
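A minimal sketch of the intended skirmish-then-disengage decision rule, for readers who want it in code. It is not the scripted policy from the test file (which switches phase at PHASE_STRIKE_UNTIL_TICK); this version switches as soon as the kill bar is met. The Unit type, the order dictionaries and the helper names are hypothetical and do not mirror the harness's Command API; the 15-cell step and x=46 approach limit echo the test constants APPROACH_STEP and APPROACH_LIMIT_X.

```python
from dataclasses import dataclass

KILL_BAR = 3                                     # units_killed_gte threshold
RECOVERY_CENTRES = [(5, 10), (5, 20), (5, 30)]   # easy/medium (5,20); hard (5,10)/(5,30)
APPROACH_STEP = 15                               # cells advanced east per turn
APPROACH_LIMIT_X = 46                            # stop short of the cluster (~x=50)


@dataclass
class Unit:
    uid: int
    x: int
    y: int


def home_corner(start_units):
    """Recovery centre nearest to the squad's starting centroid."""
    cx = sum(u.x for u in start_units) / len(start_units)
    cy = sum(u.y for u in start_units) / len(start_units)
    return min(RECOVERY_CENTRES, key=lambda c: (c[0] - cx) ** 2 + (c[1] - cy) ** 2)


def plan_turn(units, foes, kills, home):
    """Return hypothetical order dicts for one agent turn."""
    if not units:
        return []
    if kills >= KILL_BAR:
        # Disengage: stop fighting and send every surviving jeep home.
        return [{"order": "move_units", "ids": [u.uid for u in units], "target": home}]
    orders = []
    for u in units:
        if foes:
            # Strike: target the nearest visible foe by squared distance.
            t = min(foes, key=lambda f: (f.x - u.x) ** 2 + (f.y - u.y) ** 2)
            orders.append({"order": "attack_unit", "id": u.uid, "target": t.uid})
        else:
            # Nothing visible yet: advance east, but not past the limit.
            step_x = min(u.x + APPROACH_STEP, APPROACH_LIMIT_X)
            orders.append({"order": "move_units", "ids": [u.uid], "target": (step_x, u.y)})
    return orders
```

Recording the squad's starting centroid once and retreating to the nearest recovery centre keeps the target correct on both hard spawn corners without hard-coding (5,20).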
@@ -0,0 +1,370 @@
+"""combat-skirmish-then-disengage — ONE coordinated strike-then-pull-back.
+
+Bar: the intended skirmish-then-disengage policy WINS on every level
+and every hard seed; stall (only observe), never-engage (park at
+start), and commit-until-overwhelmed (charge east and never retreat)
+LOSE on every level. Non-win is a real reachable timeout LOSS (not a
+draw).
+
+Validation is scripted (no model / network): the four policies below
+are the exhaustive proxies for the four real strategies and exercise
+the predicate teeth directly. The load-bearing decision under test is
+"stop attacking after the kill bar is met and order the disengage
+back to the recovery zone before the deadline".
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+
+pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+from openra_bench.scenarios import load_pack
+from openra_bench.scenarios.loader import compile_level
+from openra_bench.scenarios.win_conditions import WinContext, evaluate
+
+PACKS = Path(__file__).parent.parent / "openra_bench" / "scenarios" / "packs"
+PACK_PATH = PACKS / "combat-skirmish-then-disengage.yaml"
+
+
+# ── unit-level predicate checks ──────────────────────────────────────
+
+def _ctx(units_xy=(), tick=1000, killed=0, lost=0):
+    """Synthesize a WinContext for predicate-level checks."""
+    import types
+
+    sig = types.SimpleNamespace(
+        game_tick=tick,
+        units_killed=killed,
+        units_lost=lost,
+        own_buildings=[],
+        own_building_types=set(),
+        enemies_seen_ids=set(),
+        enemy_buildings_seen_ids=set(),
+    )
+    return WinContext(
+        signals=sig,
+        render_state={
+            "units_summary": [
+                {"cell_x": x, "cell_y": y} for x, y in units_xy
+            ]
+        },
+    )
+
+
+def test_predicates_easy_recovery_clause():
+    c = compile_level(load_pack(PACK_PATH), "easy")
+    home = [(5, 20), (5, 20), (5, 20), (5, 20)]
+    cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+    mixed_3_home = [(5, 20), (5, 20), (5, 20), (50, 20)]
+
+    # Intended: 3+ kills, ≥3 alive, ≥3 in recovery → WIN
+    assert evaluate(c.win_condition, _ctx(home, tick=2000, killed=3, lost=0))
+    assert evaluate(c.win_condition, _ctx(mixed_3_home, tick=2000, killed=4, lost=0))
+    # Kill bar met but all units still at the kill site → fail region clause
+    assert not evaluate(c.win_condition, _ctx(cluster, tick=2000, killed=4, lost=0))
+    # 3 kills but only 2 own_units → predicate fails
+    assert not evaluate(c.win_condition, _ctx(home[:2], tick=2000, killed=3, lost=2))
+    # 0 kills → predicate fails even if everyone is at home
+    assert not evaluate(c.win_condition, _ctx(home, tick=2000, killed=0, lost=0))
+    # Past deadline → real loss, reachable within max_turns
+    assert evaluate(c.fail_condition, _ctx(home, tick=4502, killed=0, lost=0))
+    assert 4501 <= 93 + 90 * (c.max_turns - 1), (
+        "after_ticks 4501 must be reachable within max_turns"
+    )
+
+
+def test_predicates_medium_same_bar_six_enemies():
+    c = compile_level(load_pack(PACK_PATH), "medium")
+    home = [(5, 20), (5, 20), (5, 20), (5, 20)]
+    cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+
+    # Intended: 3+ kills, ≥3 alive, ≥3 in recovery → WIN
+    assert evaluate(c.win_condition, _ctx(home, tick=3000, killed=3, lost=0))
+    # Commit-and-stay: kill bar met but jeeps at cluster, not home → fail
+    assert not evaluate(c.win_condition, _ctx(cluster, tick=3000, killed=6, lost=0))
+    # Past deadline → real loss, reachable
+    assert evaluate(c.fail_condition, _ctx(home, tick=4502, killed=0, lost=0))
+    assert 4501 <= 93 + 90 * (c.max_turns - 1)
+
+
+def test_predicates_hard_any_of_spawn_corner_recovery():
+    c = compile_level(load_pack(PACK_PATH), "hard")
+    home_north = [(5, 10), (5, 10), (5, 10), (5, 10)]
+    home_south = [(5, 30), (5, 30), (5, 30), (5, 30)]
+    mid_lat = [(5, 20), (5, 20), (5, 20), (5, 20)]  # neither corner
+    cluster = [(50, 20), (50, 20), (50, 20), (50, 20)]
+
+    # Either spawn corner satisfies the any_of recovery clause.
+    assert evaluate(c.win_condition, _ctx(home_north, tick=3000, killed=3, lost=0))
+    assert evaluate(c.win_condition, _ctx(home_south, tick=3000, killed=3, lost=0))
+    # Mid-latitude (y=20) is outside BOTH spawn-corner radii (radius=6
+    # from (5,10) ⇒ y=20 is 10 cells away; same from (5,30)) → fail.
+    assert not evaluate(c.win_condition, _ctx(mid_lat, tick=3000, killed=3, lost=0))
+    # Commit-and-stay at cluster → fail region clause.
+    assert not evaluate(c.win_condition, _ctx(cluster, tick=3000, killed=6, lost=0))
+    # Past deadline → real loss, reachable
+    assert evaluate(c.fail_condition, _ctx(home_north, tick=4502, killed=0, lost=0))
+    assert 4501 <= 93 + 90 * (c.max_turns - 1)
+
+
+def test_hard_has_two_spawn_point_groups():
+    """Hard-tier curation contract: ≥2 distinct agent spawn_point
+    groups so the seed round-robins the raider start corner."""
+    c = compile_level(load_pack(PACK_PATH), "hard")
+    groups = {
+        (a.spawn_point if a.spawn_point is not None else 0)
+        for a in c.scenario.actors
+        if a.owner == "agent"
+    }
+    assert len(groups) >= 2, f"hard needs ≥2 spawn_point groups, got {groups}"
+
+
+def test_pack_compiles_and_meta_fields_populated():
+    pack = load_pack(PACK_PATH)
+    assert pack.meta.capability == "action"
+    assert pack.meta.id == "combat-skirmish-then-disengage"
+    anchors = pack.meta.benchmark_anchor
+    assert isinstance(anchors, list) and anchors, "benchmark_anchor required"
+    joined = " ".join(anchors).lower()
+    # Anchored to the doctrines the brief calls out: SC2 skirmisher +
+    # military reconnaissance-by-fire / cavalry doctrine.
+    assert "skirmish" in joined
+    assert "recon" in joined or "cavalry" in joined or "disengage" in joined
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        assert c.map_supported
+        assert c.win_condition is not None and c.fail_condition is not None
+
+
+def test_timeout_loss_is_reachable_on_every_level():
+    """No draw degeneracy: after_ticks 4501 fits inside max_turns on
+    every level (∼90 ticks/turn ⇒ 93 + 90·(max_turns-1))."""
+    pack = load_pack(PACK_PATH)
+    for lvl in ("easy", "medium", "hard"):
+        c = compile_level(pack, lvl)
+        assert 4501 <= 93 + 90 * (c.max_turns - 1), lvl
+
+
+# ── engine-driven scripted policies ──────────────────────────────────
+#
+# Stage thresholds for the intended policy:
+#   PHASE_STRIKE_UNTIL_TICK = 1300 — drive east, attack_unit any
+#     visible e1 until this tick (~14 turns @ ~90 ticks/turn). At
+#     this point the kill bar (≥3) is comfortably met and the
+#     disengage budget (4500 - 1300 = ~3200 ticks ≈ 35 turns) is
+#     more than enough to march back to (5,20)-ish.
+#   APPROACH_STEP = 15 cells / turn east toward the cluster axis
+#     (jeep ~50% faster than infantry).
+
+PHASE_STRIKE_UNTIL_TICK = 1300
+APPROACH_STEP = 15
+APPROACH_LIMIT_X = 46  # don't overrun the cluster (~x=50)
+
+
+def _e1_foes(enemies):
+    return [
+        e for e in enemies
+        if (e.get("type") or "").lower() == "e1"
+        and not e.get("is_building")
+    ]
+
+
| 174 |
+
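Both attacking policies repeat the same nearest-target rule: among the `e1` foes returned by `_e1_foes`, pick the one with the smallest squared Euclidean distance to the unit's cell. A standalone sketch of that rule on hand-made dicts shaped like the `enemy_summary` entries the policies read (`id`, `type`, `is_building`, `cell_x`, `cell_y`); `nearest_foe` is a hypothetical helper that the test module does not define.

```python
def e1_foes(enemies):
    """Same filter as _e1_foes: non-building enemies whose type is 'e1'."""
    return [
        e for e in enemies
        if (e.get("type") or "").lower() == "e1"
        and not e.get("is_building")
    ]


def nearest_foe(ux, uy, foes):
    """Foe with the smallest squared distance to (ux, uy), or None.
    Squared distance orders foes the same way as true distance."""
    if not foes:
        return None
    return min(foes, key=lambda e: (e["cell_x"] - ux) ** 2 + (e["cell_y"] - uy) ** 2)


enemies = [
    {"id": 1, "type": "E1", "cell_x": 50, "cell_y": 20},
    {"id": 2, "type": "e1", "cell_x": 44, "cell_y": 18},
    {"id": 3, "type": "fact", "cell_x": 60, "cell_y": 20, "is_building": True},
    {"id": 4, "type": None, "cell_x": 30, "cell_y": 20},
]
foes = e1_foes(enemies)
print([e["id"] for e in foes])          # [1, 2]
print(nearest_foe(40, 20, foes)["id"])  # 2
```

The `or ""` guard is what lets an entry with `type: None` fall through the filter instead of raising on `.lower()`.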
def _stall_policy(rs, Command):
    """Stall: only observe. Kill bar never met (jeeps are stance:0;
    no auto-return-fire) → LOSS on the clock; on hard the hunt-bot
    e1 close on the idle stack and wipe it → LOSS on
    `not own_units_gte:1`."""
    return [Command.observe()]


def _never_engage_policy(rs, Command):
    """Park at the start; never move east, never fire. Recovery
    region clause is trivially satisfied but the kill bar is unmet
    → LOSS on the clock (easy/medium) or LOSS on hard when hunt-bot
    e1 wipe the idle stack."""
    units = rs.get("units_summary", []) or []
    if not units:
        return [Command.observe()]
    cmds = []
    for u in units:
        cmds.append(
            Command.move_units(
                [str(u["id"])], target_x=u["cell_x"], target_y=u["cell_y"]
            )
        )
    return cmds


def _commit_until_overwhelmed_policy(rs, Command):
    """Charge east; attack_unit any visible foe; never retreat. The
    kill bar IS met (4× jeep MG vs stance:0 rifles), but the jeeps
    end the run sitting at the kill site (~x=50), not in the
    recovery region. The region clause fails → after_ticks LOSS.
    """
    units = rs.get("units_summary", []) or []
    enemies = rs.get("enemy_summary", []) or []
    if not units:
        return [Command.observe()]
    foes = _e1_foes(enemies)
    cmds = []
    for u in units:
        ux, uy = u["cell_x"], u["cell_y"]
        if foes:
            t = min(
                foes,
                key=lambda e: (e["cell_x"] - ux) ** 2 + (e["cell_y"] - uy) ** 2,
            )
            cmds.append(Command.attack_unit([str(u["id"])], str(t["id"])))
        else:
            # March east to the cluster axis but STOP at the cluster
            # (don't overrun to the far-east fact and trip auto-done).
            cmds.append(
                Command.move_units(
                    [str(u["id"])], target_x=min(50, ux + 12), target_y=uy
                )
            )
    return cmds


def _intended_skirmish_then_disengage_policy(rs, Command):
    """Intended skirmisher policy (one strike, then one disengage):
    - PHASE 1 (tick < PHASE_STRIKE_UNTIL_TICK): drive east, attack_unit
      any visible e1.
    - PHASE 2 (tick >= PHASE_STRIKE_UNTIL_TICK): stop attacking; order
      move_units back to the nearest spawn corner, i.e. the RECOVERY zone.
    The phase switch is the spec's load-bearing decision: "stop
    fighting and pull back" before the deadline.
    """
    units = rs.get("units_summary", []) or []
    enemies = rs.get("enemy_summary", []) or []
    tick = rs.get("game_tick") or 0
    if not units:
        return [Command.observe()]
    foes = _e1_foes(enemies)
    # Pick the nearest spawn-corner candidate as the recovery target
    # (stateless; works for both single-corner and any_of-corner
    # recovery clauses).
    candidates = [(5, 20), (5, 10), (5, 30)]
    cx = sum(u["cell_x"] for u in units) / len(units)
    cy = sum(u["cell_y"] for u in units) / len(units)
    home = min(
        candidates, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2
    )
    cmds = []
    if tick < PHASE_STRIKE_UNTIL_TICK:
        if foes:
            for u in units:
                ux, uy = u["cell_x"], u["cell_y"]
                t = min(
                    foes,
                    key=lambda e: (e["cell_x"] - ux) ** 2
                    + (e["cell_y"] - uy) ** 2,
                )
                cmds.append(
                    Command.attack_unit([str(u["id"])], str(t["id"]))
                )
        else:
            # No foes in sight yet: drive east toward the cluster
            # axis. Cap at APPROACH_LIMIT_X so the strike force
            # doesn't overrun past the cluster.
            for u in units:
                ux, uy = u["cell_x"], u["cell_y"]
                cmds.append(
                    Command.move_units(
                        [str(u["id"])],
                        target_x=min(APPROACH_LIMIT_X, ux + APPROACH_STEP),
                        target_y=uy,
                    )
                )
    else:
        # PHASE 2: PULL BACK. Stop fighting; drive home.
        for u in units:
            cmds.append(
                Command.move_units(
                    [str(u["id"])], target_x=home[0], target_y=home[1]
                )
            )
    return cmds


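The intended policy's two decisions, which phase it is in and which corner counts as home, are pure functions of the tick and the unit positions, so they can be checked without the engine. A sketch that copies the policy's threshold and candidate list; `phase_for_tick` and `pick_home` are hypothetical names for logic the policy inlines.

```python
PHASE_STRIKE_UNTIL_TICK = 1300
CANDIDATES = [(5, 20), (5, 10), (5, 30)]  # same spawn-corner list as the policy


def phase_for_tick(tick):
    """'strike' before the switch tick, 'disengage' from it onward."""
    return "strike" if tick < PHASE_STRIKE_UNTIL_TICK else "disengage"


def pick_home(units, candidates=CANDIDATES):
    """Candidate corner nearest (squared distance) to the unit centroid."""
    cx = sum(u["cell_x"] for u in units) / len(units)
    cy = sum(u["cell_y"] for u in units) / len(units)
    return min(candidates, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


north_stack = [{"cell_x": 46, "cell_y": y} for y in (9, 10, 11, 12)]
south_stack = [{"cell_x": 46, "cell_y": y} for y in (28, 29, 30, 31)]
print(phase_for_tick(1299), phase_for_tick(1300))  # strike disengage
print(pick_home(north_stack))  # (5, 10)
print(pick_home(south_stack))  # (5, 30)
```

The rule is stateless, as the policy comment notes, so the corner it picks depends on where the stack's centroid sits at the phase switch: a stack centred on y=20 resolves to (5,20), which lies 10 cells from both hard-tier corners (5,10) and (5,30) and so outside their radius-6 regions.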
# ── policy bar tests ────────────────────────────────────────────────


@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
def test_stall_loses(level):
    """Stall must LOSE on every level: jeeps are stance:0 so they
    never return fire (kill bar unmet); on hard the hunt-bot e1
    close on the idle stack and trip `not own_units_gte:1`."""
    pytest.importorskip("openra_train")
    from openra_bench.eval_core import run_level

    c = compile_level(load_pack(PACK_PATH), level)
    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
    for s in seeds:
        res = run_level(c, _stall_policy, seed=s)
        assert res.outcome == "loss", (
            f"{level} seed={s}: stall must LOSE, got {res.outcome} "
            f"killed={res.signals.units_killed} lost={res.signals.units_lost}"
        )


@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
def test_never_engage_loses(level):
    """Park-at-start must LOSE: kill bar unmet; on hard hunt-bot e1
    wipe the idle stack."""
    pytest.importorskip("openra_train")
    from openra_bench.eval_core import run_level

    c = compile_level(load_pack(PACK_PATH), level)
    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
    for s in seeds:
        res = run_level(c, _never_engage_policy, seed=s)
        assert res.outcome == "loss", (
            f"{level} seed={s}: never-engage must LOSE, got {res.outcome} "
            f"killed={res.signals.units_killed} lost={res.signals.units_lost}"
        )


@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
def test_commit_until_overwhelmed_loses(level):
    """Commit-and-stay at the cluster must LOSE on every level: the
    kill bar IS met but the jeeps end the run at the kill site
    (~x=50), not the recovery region around the start. The region
    clause fails → after_ticks LOSS."""
    pytest.importorskip("openra_train")
    from openra_bench.eval_core import run_level

    c = compile_level(load_pack(PACK_PATH), level)
    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
    for s in seeds:
        res = run_level(c, _commit_until_overwhelmed_policy, seed=s)
        assert res.outcome == "loss", (
            f"{level} seed={s}: commit-and-stay must LOSE, got "
            f"{res.outcome} killed={res.signals.units_killed} "
            f"lost={res.signals.units_lost}"
        )


@pytest.mark.parametrize("level", ["easy", "medium", "hard"])
def test_intended_skirmish_then_disengage_wins(level):
    """Intended skirmisher (strike phase → disengage phase) must
    WIN on every level and every hard seed: kill bar met, ≥3 jeeps
    alive, ≥3 jeeps inside the spawn-corner recovery region, all
    inside the 4500-tick budget."""
    pytest.importorskip("openra_train")
    from openra_bench.eval_core import run_level

    c = compile_level(load_pack(PACK_PATH), level)
    seeds = (1, 2, 3, 4) if level == "hard" else (1,)
    for s in seeds:
        res = run_level(
            c, _intended_skirmish_then_disengage_policy, seed=s
        )
        assert res.outcome == "win", (
            f"{level} seed={s}: intended skirmish-then-disengage should "
            f"WIN, got {res.outcome} after {res.turns} turns "
            f"(killed={res.signals.units_killed}, "
            f"lost={res.signals.units_lost})"
        )

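The four-policy bar follows from the win predicate in the commit message: `units_killed_gte:3 AND own_units_gte:3 AND units_in_region_gte:{x:5,y:<spawn>,radius:6,n:3} AND within_ticks:4500`. A minimal sketch of that conjunction over a hypothetical end-of-run snapshot. The `Snapshot` type and its fields, the Euclidean "in region" test, the inclusive `tick <= 4500` reading of `within_ticks`, and a spawn y of 20 on easy/medium (suggested by the policy's candidate list) are all assumptions, not the engine's actual predicate code. The hard tier's `any_of` is modelled as passing if any one listed region holds n units.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    tick: int        # tick at which the predicate is evaluated
    kills: int       # units_killed
    own_cells: list  # (x, y) cells of surviving own units


def in_region(cell, cx, cy, radius):
    # Assumption: Euclidean distance; the engine may measure differently.
    return (cell[0] - cx) ** 2 + (cell[1] - cy) ** 2 <= radius ** 2


def wins(s, regions, radius=6, n=3, kill_bar=3, min_alive=3, deadline=4500):
    """All four clauses; `regions` lists each acceptable recovery corner."""
    region_ok = any(
        sum(in_region(c, rx, ry, radius) for c in s.own_cells) >= n
        for rx, ry in regions
    )
    return (
        s.kills >= kill_bar               # units_killed_gte:3
        and len(s.own_cells) >= min_alive  # own_units_gte:3
        and region_ok                     # units_in_region_gte (any_of on hard)
        and s.tick <= deadline            # within_ticks:4500
    )


home = [(5, 20)]
commit = Snapshot(tick=4500, kills=4, own_cells=[(50, 20)] * 4)
never = Snapshot(tick=4500, kills=0, own_cells=[(5, 20)] * 4)
skirmish = Snapshot(tick=4200, kills=4, own_cells=[(5, 19), (6, 20), (4, 21), (5, 22)])
print([wins(s, home) for s in (commit, never, skirmish)])  # [False, False, True]
```

Commit-and-stay fails only the region clause and never-engage fails only the kill clause, which is why each of the two degenerate strategies needs its own test.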
@@ -200,6 +200,19 @@ UPGRADED = [
     # flips per seed and no memorised "retreat west on y=20" opening
     # generalises.
     "combat-kite-jeep-vs-tank",
+    # Wave-6 combat-micro skirmish pack (SC2 skirmisher tactics /
+    # military reconnaissance-by-fire anchor). One coordinated
+    # strike-then-pull-back; the load-bearing decision is "stop
+    # attacking after the kill bar is met and order the disengage
+    # back to the spawn-corner recovery zone before the deadline".
+    # Hard tier defines two agent spawn_point groups (NORTH (5,10)
+    # vs SOUTH (5,30)) round-robined by seed; the recovery clause is
+    # `any_of` over the two spawn-corner regions so the agent must
+    # return to ITS OWN start corner. Hunt-bot pursuit (e1 cluster
+    # attacks-anything) makes a slow-disengage policy also LOSE on
+    # the survival bar — the "stop fighting and pull back" call is
+    # mandatory on every seed.
+    "combat-skirmish-then-disengage",
     # Wave-4 Group B TURTLE node of the expansion triple (SC2 fortress
     # macro / 1-base mass-defence; military fortress doctrine; risk-
     # averse single-market deep-investment anchor). Hard tier defines
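The hard-tier requirement recorded in the commit message (at least two distinct seed-driven spawn groups across seeds 1..4) can be illustrated with a round-robin assignment. This is a sketch: the `(seed - 1) % len(groups)` indexing is an assumed convention, not documented engine behaviour, and `spawn_group_for_seed` / `distinct_groups` are hypothetical helpers, not what `tests/test_hard_tier.py` actually calls.

```python
NORTH = (5, 10)
SOUTH = (5, 30)
HARD_SPAWN_GROUPS = [NORTH, SOUTH]


def spawn_group_for_seed(seed, groups=HARD_SPAWN_GROUPS):
    """Assumed round-robin: seed 1 -> groups[0], seed 2 -> groups[1], ..."""
    return groups[(seed - 1) % len(groups)]


def distinct_groups(seeds, groups=HARD_SPAWN_GROUPS):
    """Set of spawn groups the given seeds land on."""
    return {spawn_group_for_seed(s, groups) for s in seeds}


print([spawn_group_for_seed(s) for s in (1, 2, 3, 4)])
# [(5, 10), (5, 30), (5, 10), (5, 30)]
print(len(distinct_groups((1, 2, 3, 4))) >= 2)  # True
```

With two groups and seeds 1..4 each corner is used twice, so a policy that memorises one corner's return route fails on half the hard seeds.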
|