yxc20098 committed on
Commit
e05ae9b
·
1 Parent(s): 7520dea

fix(scenario): combat-tank-vs-tank-engagement — recalibrate after engine movement fixes

openra_bench/scenarios/packs/combat-tank-vs-tank-engagement.yaml CHANGED
@@ -1,105 +1,65 @@
1
- # combat-tank-vs-tank-engagement — Mirror tank trade: WIN by focus-fire
2
- # (concentrate ALL 3 agent tanks on ONE enemy at a time), LOSE by
3
- # spread-fire (each tank picks its own nearest enemy). Lanchester
4
- # square law on a 3-vs-3 medium-tank engagement.
5
  #
6
  # Wave-7 ACTION pack (capability: action — combat micro: target
7
  # prioritization / focus-fire discipline).
8
  #
9
  # Real-world / benchmark anchors:
10
- # - SC2 mirror micro (siege-tank-vs-siege-tank, marine-vs-marine):
11
- # the side that concentrates fire on one target at a time wins the
12
- # trade; the side that spreads damage across the whole enemy line
13
- # trades 1-for-1 and loses the survivor count.
14
- # - Lanchester's SQUARE LAW of attrition: combat power of a focused
15
- # force scales as N² (per-kill removes one enemy's OUTPUT DPS for
16
- # the rest of the fight); spreading fire collapses to the LINEAR
17
- # law (mutual 1-for-1 annihilation).
18
- # - Military "CONCENTRATION OF FORCE" doctrine (one of the Principles
19
- # of War): a smaller or equal force concentrated at the decisive
20
- # point can defeat a numerically equivalent dispersed enemy.
21
  #
22
- # Design ASYMMETRIC GEOMETRY (engineered to discriminate):
23
- # Prior MIRROR geometry (3 agents at x=30 stacked y=18..22 vs 3
24
- # enemies at x=38 stacked y=18..22) was found engine-fragile: when
25
- # both sides start fully in cannon range with identical rows, agent
26
- # stance:1 ReturnFire causes the auto-target to collapse onto
27
- # whichever single enemy fired first, so the spread-fire wrong-play
28
- # accidentally focus-fires and the discrimination disappears. THIS
29
- # pack uses ASYMMETRIC geometry:
30
- # - Agent strike force: 3 medium tanks BUNCHED at (30,19..21) on
31
- # a single column (one centroid, one engagement axis).
32
- # - Enemy mirror: 3 medium tanks SPREAD across three latitudes
33
- # at (50,15), (51,20), (50,25) — three distinct rows along the
34
- # eastern engagement line. (Centre enemy at x=51 NOT (50,20)
35
- # per the CLAUDE.md-documented silent-placement-fail at (50,20).)
36
- # Each enemy is initially OUT of agent cannon range (cannon ≈ 5,
37
- # agent at x=30 vs enemy at x=50/51 ⇒ MD ≥ 20), so tanks must close
37
- # into engagement, at which point the asymmetry bites:
39
- # * SPREAD policy (each agent tank attack_units its OWN nearest
40
- # enemy): tank at (30,19) sees (50,15) at MD=24, (51,20) at
41
- # MD=22, (50,25) at MD=26 ⇒ targets (51,20); tank at (30,21)
42
- # targets (51,20); tank at (30,20) targets (51,20) — actually
43
- # all 3 tanks target the centre INITIALLY, but as they advance
44
- # and the centre dies, the surviving tanks fan out to chase
45
- # the flank enemies one each; once the spread-fire chase
46
- # begins, the trade collapses to 1-vs-1 duels and Lanchester
47
- # linear ⇒ 2 of the 3 agent tanks die in the flank engagements.
48
- # * FOCUS policy (ALL 3 tanks attack_unit the SAME target in
49
- # sequence — closest first, then a flank, then the last):
50
- # 3-vs-1 concentrated cannon fire ends each enemy in ~1-2
51
- # decision turns; after kill #1 the trade is 3-vs-2 (Lanchester
52
- # surplus 3²−2² = 5), after kill #2 it is 3-vs-1; all 3 agent
53
- # tanks survive.
54
  #
55
- # Discrimination bar (four scripted-policy proxies; see test file):
56
- # • stall (only observe): tanks idle at (30,*), enemies at (50,*)
57
- # stay out of cannon range. Nothing dies on either side ⇒ kill
58
- # bar unmet ⇒ after_ticks LOSS at tick 2401.
59
- # • brute attack_move(51,20) (drive into the centre): tanks
60
- # auto-target the nearest visible enemy en route (per CLAUDE.md
61
- # "move_units auto-fires opportunistically en route regardless of
62
- # agent stance"); they reach the engagement zone and trade with
63
- # all 3 enemies firing back simultaneously ⇒ ≥2 agent tanks die ⇒
64
- # LOSS via the survival cap (own_units_gte:2 on medium, force-wipe
65
- # trips own_units_gte:1 on easy).
66
- # • spread-attack-closest (each tank attack_units its own closest
67
- # enemy): as above — once the centre dies, surviving tanks chase
68
- # flank enemies on 1-vs-1 duels; Lanchester linear ⇒ 2 of 3 tanks
69
- # die. On EASY (own_units_gte:1) the 1 survivor squeaks through
70
- # and SPREAD wins (forgiving bare-skill tier, per the
71
- # SCENARIO_REVIEW_CHECKLIST inert-easy-teeth convention). On
72
- # MEDIUM (own_units_gte:2) the 1 survivor is below the bar ⇒
73
- # LOSS — this is the load-bearing discrimination.
74
- # • intended focus-fire (ALL 3 tanks attack_unit the SAME target
75
- # each turn, starting with the closest enemy by agent centroid,
76
- # then re-targeting the next-closest as enemies fall): all 3
77
- # enemies die in ~700-900 ticks, all 3 agent tanks alive at the
78
- # end ⇒ WIN on every level.
79
- #
80
- # Win-bar relaxation note (RELAXED per the task brief): on HARD the
81
- # survival cap holds at own_units_gte:2 nominally, but the asymmetric
82
- # discrimination weakens when the agent stack starts on a FLANK
83
- # latitude (NORTH y=11..13 or SOUTH y=27..29) — from a flank the
84
- # enemy line at y=15/20/25 has a unique closest enemy that all agent
85
- # tanks naturally target (spread ≡ focus). Hard's discrimination is
86
- # therefore primarily KILL-SPEED (within_ticks 1200) + brute / stall
87
- # anti-cheat teeth + spawn-variation generalisation across NORTH and
88
- # SOUTH approach axes — the focus-fire skill is what generalises;
89
- # spread-as-focus on a flank is acceptable because it IS the
90
- # intended capability when the geometry collapses to a unique
91
- # closest target.
92
  #
93
  # Hard-tier spawn-variation (≥2 spawn_point groups, registered in
94
  # tests/test_hard_tier.py::UPGRADED):
95
  # - NORTH staging y=11..13 (agent at (30,11..13)).
96
  # - SOUTH staging y=27..29 (agent at (30,27..29)).
97
- # The asymmetric enemy line (3 enemies at y=15/y=20/y=25) is the
98
- # SAME for both spawns (enemy actors don't honour spawn_point per
99
- # CLAUDE.md / oramap.rs::expand_scenario_actors). From NORTH the
100
- # closest enemy is (50,15) and the farthest is (50,25); from SOUTH
101
- # the order inverts. A memorised single-target sequence cannot
102
- # generalise across the spawn rotation.
103
  #
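The nearest-target reasoning in the header comment can be reproduced with a short sketch. It reads "MD" as Manhattan distance (an assumption, though it matches the quoted values 24 / 22 / 26), and the names `manhattan` and `nearest_enemy` are hypothetical helpers for illustration, not part of the pack or the engine.

```python
# Enemy line and agent staging cells as given in the scenario comments.
ENEMY_LINE = [(50, 15), (51, 20), (50, 25)]
CENTRE = [(30, y) for y in (19, 20, 21)]   # easy / medium staging
NORTH = [(30, y) for y in (11, 12, 13)]    # hard, NORTH spawn group
SOUTH = [(30, y) for y in (27, 28, 29)]    # hard, SOUTH spawn group


def manhattan(a, b):
    """Manhattan distance between two (x, y) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_enemy(tank, enemies=ENEMY_LINE):
    """Enemy cell with the smallest Manhattan distance to `tank`."""
    return min(enemies, key=lambda e: manhattan(tank, e))


for name, stack in (("centre", CENTRE), ("north", NORTH), ("south", SOUTH)):
    # Collect the distinct nearest targets chosen by the tanks in this stack.
    print(name, sorted({nearest_enemy(t) for t in stack}))
```

Under these assumptions every tank in a stack picks the same initial target: the centre enemy from the central staging, and the near flank enemy from either hard-tier corridor.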
104
  # Engine guardrails (per CLAUDE.md):
105
  # - Map: rush-hour-arena (128 × 40, playable x ∈ [2..126],
@@ -116,20 +76,18 @@
116
  # "Certain mid-map cells silently fail to place enemy clusters
117
  # (e.g. (50,20))"; (51,20) is a documented working cell.
118
  # - `within_ticks: 2400` / `after_ticks: 2401` on easy+medium;
119
- # max_turns=30 produces tick ≤ 93 + 90·29 = 2703 ⇒ stallers /
120
- # brute / spread hit the real LOSS, not a DRAW. Hard uses
121
  # `within_ticks: 1200` / `after_ticks: 1201` and max_turns=15
122
  # (tick ≤ 93 + 90·14 = 1353 ≥ 1201) — kill-speed pressure for
123
  # the focus-fire policy.
124
  # - Enemy `bot_type: ''` (no scripted bot pursuit) — enemy tanks
125
  # sit on stance:2 Defend so they auto-fire the second a tank
126
  # enters cannon range but NEVER advance; the enemy line stays
127
- # STATIONARY on its three latitudes (the test is purely the
128
- # agent's target prioritization). Engine balance pass: the
129
- # post-stance-fix stance:3 AttackAnything makes the enemy tanks
130
- # hunt and BUNCH onto the agent column, which degenerates the
131
- # spread-fire wrong-play into focus-fire and collapses the
132
- # discrimination — stance:2 keeps the spread geometry intact.
133
  # - Agent tanks stance:1 ReturnFire so a stall policy (pure observe,
134
  # no movement) doesn't accidentally pull fire from any agent tank
135
  # before the enemy is in range — the stall remains a clean
@@ -140,22 +98,23 @@ meta:
140
  title: 'Tank-vs-Tank Mirror — Focus-Fire, Lanchester Square Law'
141
  capability: action
142
  real_world_meaning: >
143
- Three medium tanks face three enemy medium tanks at long range in
144
- an ASYMMETRIC mirror engagement: the agent strike force is bunched
145
- on one column at (30,19..21); the enemy mirror is spread across
146
- three latitudes at (50,15), (51,20), (50,25). Per Lanchester's
147
- SQUARE LAW, the side that concentrates fire on ONE enemy at a
148
- time wins the trade with minimal losses (3-vs-1 cannon fire ends
149
- each enemy tank in 1-2 decision turns; combat-power surplus grows
150
- quadratically after each kill); the side that lets each tank pick
151
- its own closest target (the spread-fire failure mode) collapses
152
- to the linear attrition law, ends 1-of-3 alive, and busts the
153
- survival bar. The decision under test is target prioritization:
154
- concentrate ALL three tanks' fire on the closest enemy first,
155
- eliminate it, then the next, then the last — not let each tank
156
- pick its own nearest target. Stalling loses on the kill bar;
157
- brute attack-move loses on the survival cap; spread-fire loses on
158
- the survival cap (medium); only concentrated focus-fire wins.
 
159
  robotics_analogue: >
160
  Military "concentration of force" doctrine (one of the Principles
161
  of War): a smaller or equal force concentrated at the decisive
@@ -194,23 +153,21 @@ levels:
194
  # Bare focus-fire skill: 3-vs-3 asymmetric mirror, survival bar ≥1
195
  # (forgiving — even if focus-fire loses 2 tanks in the trade, ≥1
196
  # alive suffices). Stall LOSES (kill bar unmet → after_ticks LOSS).
197
- # Brute attack-move LOSES (drives into a 3-tank crossfire and force-
198
- # wipes). Spread-fire MAY squeak by with 1 survivor (the documented
199
- # inert-easy-teeth pattern); the strong spread-vs-focus
200
- # discrimination is at medium.
201
  easy:
202
  description: >
203
  Three medium tanks (2tnk, allies) at (30,19..21) face THREE
204
- enemy medium tanks (2tnk, soviet) spread across three latitudes
205
- at (50,15), (51,20), and (50,25). You must close to firing
206
- range (cannon range ~5) and eliminate all three. By the
207
- Lanchester square law, concentrating ALL THREE tanks' fire on
208
- ONE enemy at a time (start with the closest, the centre at
209
- (51,20), which falls in 1-2 decision turns; then a flank, then
210
- the last) preserves your force; letting each tank pick its own
211
- closest target trades 1-for-1. Win when all 3 enemy tanks are
212
- killed AND at least ONE of your tanks survives AND your base
213
- is intact, before tick 2400.
214
  overrides:
215
  actors:
216
  # Agent base anchor (paranoia gate against the turn-1
@@ -223,16 +180,11 @@ levels:
223
  - {type: 2tnk, owner: agent, position: [30, 19], stance: 1}
224
  - {type: 2tnk, owner: agent, position: [30, 20], stance: 1}
225
  - {type: 2tnk, owner: agent, position: [30, 21], stance: 1}
226
- # Enemy mirror — 3 medium tanks SPREAD across y=15/y=20/y=25.
227
- # Centre at (51,20) NOT (50,20) per CLAUDE.md silent-fail
228
- # cell note. stance:2 Defend — auto-fire on the closest
229
- # in-range enemy but NEVER advance (engine balance pass: the
230
- # post-stance-fix stance:3 AttackAnything makes the enemy
231
- # tanks HUNT and BUNCH onto the agent column, so the
232
- # spread-fire wrong-play degenerates into focus-fire and the
233
- # discrimination collapses; stance:2 keeps the enemy line
234
- # STATIONARY on its three latitudes so spread-fire genuinely
235
- # fans the agent tanks into 1-vs-1 flank duels).
236
  - {type: 2tnk, owner: enemy, position: [50, 15], stance: 2}
237
  - {type: 2tnk, owner: enemy, position: [51, 20], stance: 2}
238
  - {type: 2tnk, owner: enemy, position: [50, 25], stance: 2}
@@ -253,32 +205,45 @@ levels:
253
  max_turns: 30
254
 
255
  # ── MEDIUM ──────────────────────────────────────────────────────────
256
- # +1 controlled variable vs easy: tighten the survival bar to ≥2
257
- # (any TWO tank losses fails). Geometry is identical (3-vs-3
258
- # asymmetric mirror). At 3-vs-3 with the asymmetric spread, the
259
- # SPREAD outcome empirically ends 1-of-3 tanks alive ⇒ busts
260
- # own_units_gte:2 ⇒ LOSS. The FOCUS outcome keeps all 3 tanks
261
- # alive ⇒ WIN. This is the load-bearing discrimination of the pack.
262
  medium:
263
  description: >
264
- Three medium tanks (2tnk, allies) at (30,19..21) face THREE
265
- enemy medium tanks (2tnk, soviet) spread across three latitudes
266
- at (50,15), (51,20), and (50,25). By Lanchester's square law,
267
- concentrating ALL THREE tanks' fire on ONE enemy at a time
268
- (start with the closest, the centre at (51,20), then the
269
- flanks) preserves your force; spreading fire (each tank picks
270
- its own closest enemy) trades 1-for-1 and busts the survival
271
- bar. Win when all 3 enemy tanks are killed AND at least TWO of
272
- your tanks survive AND your base is intact, before tick 2400.
 
 
273
  overrides:
274
  actors:
275
  - {type: fact, owner: agent, position: [4, 20]}
276
  - {type: 2tnk, owner: agent, position: [30, 19], stance: 1}
277
  - {type: 2tnk, owner: agent, position: [30, 20], stance: 1}
278
  - {type: 2tnk, owner: agent, position: [30, 21], stance: 1}
279
- - {type: 2tnk, owner: enemy, position: [50, 15], stance: 2}
280
- - {type: 2tnk, owner: enemy, position: [51, 20], stance: 2}
281
- - {type: 2tnk, owner: enemy, position: [50, 25], stance: 2}
 
 
 
282
  - {type: fact, owner: enemy, position: [124, 20]}
283
  win_condition:
284
  all_of:
@@ -295,39 +260,28 @@ levels:
295
  # ── HARD ────────────────────────────────────────────────────────────
296
  # +2 controlled variables vs medium:
297
  # 1. KILL-SPEED PRESSURE — within_ticks tightens from 2400 to
298
- # 1200 (per the task-spec RELAXATION fallback: when the
299
- # survival-cap discriminator weakens on a flank spawn
300
- # because all enemies are co-linear from a flank perspective,
301
- # the kill-speed timer becomes the load-bearing
302
- # discriminator). Focus-fire ends the engagement in
303
- # ~700-1000 ticks (3 cannons on 1 target each turn); brute
304
- # drive-into-crossfire and stall both fail the clock.
305
  # 2. TWO seed-driven spawn_point groups (NORTH staging y=11..13
306
- # vs SOUTH staging y=27..29) round-robined by seed so a
307
- # memorised single-target sequence cannot generalise. From
308
- # NORTH the closest enemy is (50,15) and farthest is (50,25);
309
- # from SOUTH the order inverts.
310
- # The survival cap RELAXES to own_units_gte:1 on hard (per task
311
- # brief): on a flank spawn the spread-fire policy naturally
312
- # focus-fires the unique closest enemy, so the spread-vs-focus
313
- # delta on hard is primarily kill-speed (within_ticks) rather
314
- # than survivor count.
315
  hard:
316
  description: >
317
  Three medium tanks (2tnk, allies) stage at ONE of two
318
  staging corridors (NORTH y=11..13 OR SOUTH y=27..29, chosen
319
  by seed, anti-memorisation), all bunched at x=30 on adjacent
320
  rows. They face THREE enemy medium tanks (2tnk, soviet)
321
- spread along the eastern line at (50,15), (51,20), and
322
- (50,25). By Lanchester's square law, concentrating ALL THREE
323
- tanks' fire on ONE enemy at a time (start with the closest,
324
- then the next-closest, then the farthest) ends the
325
- engagement fast and preserves your force. Win when all 3
326
- enemy tanks are killed AND at least ONE of your tanks
327
- survives AND your base is intact, before tick 1200 (kill-
328
- speed pressure: stalling, driving into crossfire, or
329
- anything slower than concentrated focus-fire busts the
330
- clock).
331
  overrides:
332
  actors:
333
  # Agent base anchor — duplicated under BOTH spawn_point
 
1
+ # combat-tank-vs-tank-engagement — tank trade: WIN by a controlled
2
+ # focus-fire `attack_unit` engagement (close to cannon range, HOLD,
3
+ # concentrate fire one target at a time), LOSE by a brute
4
+ # `attack_move` drive straight into the enemy position.
5
  #
6
  # Wave-7 ACTION pack (capability: action — combat micro: target
7
  # prioritization / focus-fire discipline).
8
  #
9
  # Real-world / benchmark anchors:
10
+ # - SC2 mirror micro: the side that holds and concentrates fire one
11
+ # target at a time clears the line keeping its strength; the side
12
+ # that charges in eats the whole line's crossfire and is wiped.
13
+ # - Lanchester's SQUARE LAW: per-kill removal of one enemy's OUTPUT
14
+ # DPS; a held, concentrated engagement removes enemy firepower
15
+ # a whole tank at a time.
16
+ # - Military "CONCENTRATION OF FORCE" doctrine (one of the
17
+ # Principles of War): a force fighting at a controlled engagement
18
+ # range defeats one that throws itself into the enemy's midst.
 
 
19
  #
20
+ # RECALIBRATION FINDING (engine movement fixes: moving units take
21
+ # fire en route, attack_unit on out-of-sight targets paths normally
22
+ # at real Mobile speed, no sprint-invincibility):
23
+ # With the post-fix combat model a SYMMETRIC tank mirror is a flat
24
+ # meat-grinder: whatever the target assignment (concentrate on one
25
+ # target, or each tank its own nearest), the closing force loses
26
+ # exactly the same number of tanks. The symmetric-mirror
27
+ # focus-vs-spread SURVIVOR delta the pack originally relied on no
28
+ # longer exists in the engine (a per-tank-own-nearest policy ends
29
+ # identically to a single-target focus policy). Concentrating fire
30
+ # on a bunched stack ALSO bunches the stack's exposure — there is
31
+ # no free square-law surplus.
32
+ # The load-bearing discrimination is therefore CONTROLLED
33
+ # ENGAGEMENT vs BRUTE drive-in:
34
+ # * Intended (focus-fire `attack_unit`): the order closes the
35
+ # force to cannon range and HOLDS there; the agent fires from
36
+ # range and works down the enemy line. Clears the line keeping
37
+ # its strength ⇒ WIN.
38
+ # * Brute (`attack_move` onto the enemy cell): drives the column
39
+ # INTO the enemy position; the stack is enveloped, absorbs the
40
+ # whole line's crossfire at once, and force-wipes before
41
+ # clearing 3 kills ⇒ LOSS.
42
+ # * Stall (only observe): never closes; nothing dies; kill bar
43
+ # unmet ⇒ after_ticks LOSS.
44
  #
45
+ # Difficulty axis (one controlled variable per tier):
46
+ # - EASY — 3-vs-3. Bare engagement skill; survival bar ≥1.
47
+ # - MEDIUM: 4-vs-3 (a FOURTH enemy tank; the agent is numerically
48
+ # out-gunned). A held focus engagement clears ≥3 of the 4 enemy
49
+ # tanks while keeping ≥2 of its own; the brute drive-in is wiped
50
+ # by the 4-tank crossfire before killing 3. This over-match is
51
+ # the load-bearing discrimination.
52
+ # - HARD — 3-vs-3 with a tight kill-speed deadline (within_ticks
53
+ # 1200) and two seed-driven spawn corridors (NORTH y=11..13 /
54
+ # SOUTH y=27..29) so the approach axis can't be memorised.
55
  #
56
  # Hard-tier spawn-variation (≥2 spawn_point groups, registered in
57
  # tests/test_hard_tier.py::UPGRADED):
58
  # - NORTH staging y=11..13 (agent at (30,11..13)).
59
  # - SOUTH staging y=27..29 (agent at (30,27..29)).
60
+ # The enemy line (3 enemies at y=15/y=20/y=25) is the SAME for both
61
+ # spawns (enemy actors don't honour spawn_point per CLAUDE.md /
62
+ # oramap.rs::expand_scenario_actors).
 
 
 
63
  #
64
  # Engine guardrails (per CLAUDE.md):
65
  # - Map: rush-hour-arena (128 × 40, playable x ∈ [2..126],
 
76
  # "Certain mid-map cells silently fail to place enemy clusters
77
  # (e.g. (50,20))"; (51,20) is a documented working cell.
78
  # - `within_ticks: 2400` / `after_ticks: 2401` on easy+medium;
79
+ # max_turns=30 produces tick ≤ 93 + 90·29 = 2703 ⇒ stall /
80
+ # brute hit the real LOSS, not a DRAW. Hard uses
81
  # `within_ticks: 1200` / `after_ticks: 1201` and max_turns=15
82
  # (tick ≤ 93 + 90·14 = 1353 ≥ 1201) — kill-speed pressure for
83
  # the focus-fire policy.
84
  # - Enemy `bot_type: ''` (no scripted bot pursuit) — enemy tanks
85
  # sit on stance:2 Defend so they auto-fire the second a tank
86
  # enters cannon range but NEVER advance; the enemy line stays
87
+ # STATIONARY on its latitudes. stance:3 AttackAnything would
88
+ # make the enemy tanks hunt and chase the agent — stance:2
89
+ # keeps the line in place so the engagement is a clean
90
+ # close-and-trade against a fixed objective.
 
 
91
  # - Agent tanks stance:1 ReturnFire so a stall policy (pure observe,
92
  # no movement) doesn't accidentally pull fire from any agent tank
93
  # before the enemy is in range — the stall remains a clean
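The turn-to-tick bound quoted in the guardrail comment can be checked with a few lines of Python. This is a minimal sketch, assuming (as the comment's arithmetic implies) that the first decision lands at tick 93 and every later turn advances 90 ticks; `max_reachable_tick` and `timeout_is_reachable` are hypothetical helper names, not part of openra_bench.

```python
def max_reachable_tick(max_turns, first_tick=93, ticks_per_turn=90):
    """Upper bound on the game tick reached after `max_turns` decision turns.

    The defaults 93 and 90 are taken from the comment's own arithmetic
    (93 + 90*29 = 2703); they are assumptions about the harness, not engine facts.
    """
    return first_tick + ticks_per_turn * (max_turns - 1)


def timeout_is_reachable(max_turns, after_ticks):
    """True if the after_ticks fail clause can fire before turns run out."""
    return max_reachable_tick(max_turns) >= after_ticks


# (max_turns, after_ticks) per tier, as stated in the pack comments.
BUDGETS = {"easy/medium": (30, 2401), "hard": (15, 1201)}

for tier, (turns, after_ticks) in BUDGETS.items():
    print(tier, max_reachable_tick(turns), timeout_is_reachable(turns, after_ticks))
```

Both tiers come out reachable under these assumptions, which is the property the comment relies on: a policy that never meets the win clause hits the `after_ticks` LOSS rather than ending in a DRAW.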
 
98
  title: 'Tank-vs-Tank Mirror — Focus-Fire, Lanchester Square Law'
99
  capability: action
100
  real_world_meaning: >
101
+ A three-tank strike force engages a stationary enemy tank line.
102
+ The decision under test is combat micro: close to cannon range,
103
+ HOLD the engagement at range, and concentrate `attack_unit` fire
104
+ on one target at a time — eliminate the nearest enemy, then the
105
+ next, working down the line. Per the "concentration of force"
106
+ doctrine and the Lanchester square law, a force that holds and
107
+ focus-fires removes enemy OUTPUT DPS one whole tank per kill and
108
+ clears the line keeping its strength; a force that brute
109
+ `attack_move`s straight INTO the enemy position bunches itself in
110
+ the enemy's midst, absorbs the whole line's crossfire at once,
111
+ and is wiped before it can clear the engagement. On medium the
112
+ agent is numerically out-gunned 4-vs-3, so the controlled
113
+ engagement is load-bearing: only a held, concentrated focus-fire
114
+ push clears ≥3 of the 4 enemy tanks while keeping ≥2 of its own.
115
+ Stalling never engages and loses on the kill bar; the brute
116
+ drive-in loses on the survival cap / kill bar; only the
117
+ controlled focus-fire engagement wins.
118
  robotics_analogue: >
119
  Military "concentration of force" doctrine (one of the Principles
120
  of War): a smaller or equal force concentrated at the decisive
 
153
  # Bare focus-fire skill: 3-vs-3 asymmetric mirror, survival bar ≥1
154
  # (forgiving — even if focus-fire loses 2 tanks in the trade, ≥1
155
  # alive suffices). Stall LOSES (kill bar unmet → after_ticks LOSS).
156
+ # Brute attack-move LOSES (drives into the 3-tank crossfire and
157
+ # force-wipes). The bare engagement skill: close to cannon range
158
+ # and clear the line with a controlled focus-fire engagement.
 
159
  easy:
160
  description: >
161
  Three medium tanks (2tnk, allies) at (30,19..21) face THREE
162
+ enemy medium tanks (2tnk, soviet) along the eastern line at
163
+ (50,15), (51,20), and (50,25). Close to firing range (cannon
164
+ range ~5), HOLD the engagement at range, and `attack_unit` the
165
+ enemy tanks down one at a time; start with the nearest. Do
166
+ NOT drive the column straight onto the enemy position: an
167
+ attack-move into their midst bunches you in the crossfire and
168
+ wipes the force. Win when all 3 enemy tanks are killed AND at
169
+ least ONE of your tanks survives AND your base is intact,
170
+ before tick 2400.
 
171
  overrides:
172
  actors:
173
  # Agent base anchor (paranoia gate against the turn-1
 
180
  - {type: 2tnk, owner: agent, position: [30, 19], stance: 1}
181
  - {type: 2tnk, owner: agent, position: [30, 20], stance: 1}
182
  - {type: 2tnk, owner: agent, position: [30, 21], stance: 1}
183
+ # Enemy line — 3 medium tanks across y=15/y=20/y=25. Centre
184
+ # at (51,20) NOT (50,20) per CLAUDE.md silent-fail cell note.
185
+ # stance:2 Defend — auto-fire on the closest in-range enemy
186
+ # but NEVER advance, so the line stays a fixed engagement
187
+ # objective (a clean close-and-trade, not a chase).
 
 
 
 
 
188
  - {type: 2tnk, owner: enemy, position: [50, 15], stance: 2}
189
  - {type: 2tnk, owner: enemy, position: [51, 20], stance: 2}
190
  - {type: 2tnk, owner: enemy, position: [50, 25], stance: 2}
 
205
  max_turns: 30
206
 
207
  # ── MEDIUM ──────────────────────────────────────────────────────────
208
+ # +1 controlled variable vs easy: a FOURTH enemy tank (4-vs-3,
209
+ # numerically OUT-gunned) plus a survival bar of ≥2. With the
210
+ # post-movement-fix engine a 3-vs-3 mirror is a flat meat-grinder
211
+ # (whatever the targeting, the agent loses exactly 2 tanks; the
212
+ # symmetric-mirror focus-vs-spread survivor delta the pack
213
+ # originally relied on no longer exists). The load-bearing
214
+ # discrimination is therefore CONTROLLED ENGAGEMENT vs BRUTE
215
+ # drive-in: a focus-fire `attack_unit` engagement closes to cannon
216
+ # range, holds, and concentrates fire — clears ≥3 of the 4 enemy
217
+ # tanks while keeping the whole strike force; a brute
218
+ # `attack_move` drive INTO the 4-tank position bunches the column
219
+ # in the enemy's midst, eats 4-tank crossfire, and force-wipes
220
+ # before killing 3. Win = kill ≥3 enemy tanks AND keep ≥2 of your
221
+ # own, before tick 2400.
222
  medium:
223
  description: >
224
+ Three medium tanks (2tnk, allies) at (30,19..21) face FOUR
225
+ enemy medium tanks (2tnk, soviet) along the eastern line at
226
+ (50,14), (51,18), (50,22), and (51,26); you are outnumbered
227
+ 4-vs-3. Close to cannon range (~5) and concentrate fire:
228
+ `attack_unit` the nearest enemy, hold the engagement at range,
229
+ and eliminate the enemy line one tank at a time. Driving the
230
+ column straight INTO the enemy position (a brute attack-move)
231
+ bunches you in their crossfire and wipes the force before it
232
+ clears the line. Win when at least 3 enemy tanks are killed
233
+ AND at least TWO of your tanks survive AND your base is
234
+ intact, before tick 2400.
235
  overrides:
236
  actors:
237
  - {type: fact, owner: agent, position: [4, 20]}
238
  - {type: 2tnk, owner: agent, position: [30, 19], stance: 1}
239
  - {type: 2tnk, owner: agent, position: [30, 20], stance: 1}
240
  - {type: 2tnk, owner: agent, position: [30, 21], stance: 1}
241
+ # Enemy line: FOUR tanks (4-vs-3 over-match). stance:2 Defend
242
+ # (stationary line; see the easy/hard comment).
243
+ - {type: 2tnk, owner: enemy, position: [50, 14], stance: 2}
244
+ - {type: 2tnk, owner: enemy, position: [51, 18], stance: 2}
245
+ - {type: 2tnk, owner: enemy, position: [50, 22], stance: 2}
246
+ - {type: 2tnk, owner: enemy, position: [51, 26], stance: 2}
247
  - {type: fact, owner: enemy, position: [124, 20]}
248
  win_condition:
249
  all_of:
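A structural check on the medium enemy line could look like the following sketch. The actor list is copied from the medium overrides in this diff; `check_enemy_line` and `SILENT_FAIL_CELLS` are hypothetical names for illustration and are not functions of openra_bench or of the test file.

```python
# Medium-tier actors as listed in the overrides (new side of the diff).
MEDIUM_ACTORS = [
    {"type": "fact", "owner": "agent", "position": [4, 20]},
    {"type": "2tnk", "owner": "agent", "position": [30, 19], "stance": 1},
    {"type": "2tnk", "owner": "agent", "position": [30, 20], "stance": 1},
    {"type": "2tnk", "owner": "agent", "position": [30, 21], "stance": 1},
    {"type": "2tnk", "owner": "enemy", "position": [50, 14], "stance": 2},
    {"type": "2tnk", "owner": "enemy", "position": [51, 18], "stance": 2},
    {"type": "2tnk", "owner": "enemy", "position": [50, 22], "stance": 2},
    {"type": "2tnk", "owner": "enemy", "position": [51, 26], "stance": 2},
    {"type": "fact", "owner": "enemy", "position": [124, 20]},
]

# Cell the comments describe as silently failing to place enemy clusters.
SILENT_FAIL_CELLS = {(50, 20)}


def check_enemy_line(actors, expected_tanks):
    """Return a list of problems found in the enemy line (empty if none)."""
    tanks = [a for a in actors if a["owner"] == "enemy" and a["type"] == "2tnk"]
    positions = [tuple(a["position"]) for a in tanks]
    problems = []
    if len(tanks) != expected_tanks:
        problems.append(f"expected {expected_tanks} enemy tanks, got {len(tanks)}")
    if len({y for _, y in positions}) != len(positions):
        problems.append(f"enemy tanks share a latitude: {positions}")
    if SILENT_FAIL_CELLS.intersection(positions):
        problems.append(f"silent-fail cell used: {positions}")
    if not any(a["owner"] == "enemy" and a["type"] == "fact" for a in actors):
        problems.append("no persistent enemy fact")
    return problems


print(check_enemy_line(MEDIUM_ACTORS, expected_tanks=4))
```

Passing the expected tank count per tier (3 on easy and hard, 4 on medium) keeps one check usable across all three levels.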
 
260
  # ── HARD ────────────────────────────────────────────────────────────
261
  # +2 controlled variables vs medium:
262
  # 1. KILL-SPEED PRESSURE — within_ticks tightens from 2400 to
263
+ # 1200. A controlled focus-fire engagement ends the
264
+ # 3-vs-3 trade in ~800-1000 ticks; stall and the brute
265
+ # drive-into-crossfire both fail the clock.
 
 
 
 
266
  # 2. TWO seed-driven spawn_point groups (NORTH staging y=11..13
267
+ # vs SOUTH staging y=27..29) round-robined by seed so the
268
+ # approach axis cannot be memorised.
269
+ # The survival cap is own_units_gte:1 on hard (the kill-speed
270
+ # deadline is the binding discriminator at this tier).
 
 
 
 
 
271
  hard:
272
  description: >
273
  Three medium tanks (2tnk, allies) stage at ONE of two
274
  staging corridors (NORTH y=11..13 OR SOUTH y=27..29, chosen
275
  by seed, anti-memorisation), all bunched at x=30 on adjacent
276
  rows. They face THREE enemy medium tanks (2tnk, soviet)
277
+ along the eastern line at (50,15), (51,20), and (50,25).
278
+ Close to cannon range, HOLD the engagement, and `attack_unit`
279
+ the enemy tanks down one at a time, fast. A brute attack-move
280
+ into the enemy position is wiped in the crossfire; stalling or
281
+ anything slower than a controlled focus-fire push busts the
282
+ tight clock. Win when all 3 enemy tanks are killed AND at
283
+ least ONE of your tanks survives AND your base is intact,
284
+ before tick 1200.
 
 
285
  overrides:
286
  actors:
287
  # Agent base anchor — duplicated under BOTH spawn_point
tests/test_combat_tank_vs_tank_engagement.py CHANGED
@@ -1,33 +1,37 @@
1
- """combat-tank-vs-tank-engagement — Mirror tank trade: focus-fire WINS,
2
- spread-fire (and brute attack_move, and stall) LOSE.
3
-
4
- The bar: intended FOCUS-fire WINS on every level and every hard seed
5
- (1-4); STALL and BRUTE attack_move LOSE on every level and every hard
6
- seed. SPREAD-fire (each tank picks its own closest enemy) LOSES on
7
- MEDIUM (the load-bearing discrimination: survival cap own_units_gte:2
8
- trips because spread bleeds 2 tanks in the asymmetric flank chase).
9
- SPREAD is permitted to squeak by on EASY (own_units_gte:1, forgiving
10
- bare-skill tier per the SCENARIO_REVIEW_CHECKLIST inert-easy-teeth
11
- convention) and on HARD (the asymmetric geometry collapses spread to
12
- focus when the agent stack starts on a flank latitude — spread ≡
13
- focus when there's a unique closest enemy from a flank perspective;
14
- the hard discrimination is kill-speed + spawn-variation, not
15
- spread-vs-focus survivor count).
16
-
17
- Non-win is a real reachable timeout LOSS via the `after_ticks` fail
18
- clause (within_ticks 2400 + after_ticks 2401 on easy/medium with
19
- max_turns 30; within_ticks 1200 + after_ticks 1201 on hard with
20
- max_turns 15).
21
-
22
- Recalibrated after the engine balance pass (stance-semantics fix):
23
- the post-fix stance:3 AttackAnything enemy tanks HUNT the agent
24
- column and BUNCH together, which degenerated the spread-fire
25
- wrong-play into focus-fire and collapsed the spread-vs-focus
26
- discrimination (spread won on medium with 0 losses). The enemy
27
- tanks were switched to stance:2 Defend: they auto-fire in range
28
- but stay STATIONARY on their three latitudes, so the spread-fire
29
- policy genuinely fans the agent tanks into 1-vs-1 flank duels and
30
- busts the medium survival cap (own_units_gte:2) again.
 
 
 
 
31
 
32
  Validation is scripted (no model / network).
33
  """
@@ -171,32 +175,33 @@ def test_hard_has_two_spawn_point_groups():
171
  assert len(groups) >= 2, f"hard needs ≥2 spawn_point groups, got {groups}"
172
 
173
 
174
- def test_enemy_line_is_3_tanks_asymmetric_spread():
175
-     """The asymmetric geometry is the load-bearing physics: the
176
- enemy line MUST be 3 tanks spread across three distinct
177
- latitudes (the spread vs focus discrimination depends on each
178
- enemy being independently targetable). Centre enemy at x=51 (not
179
- x=50) per the CLAUDE.md silent-fail-cell note for (50,20)."""
180
  pack = load_pack(PACK_PATH)
 
181
  for lvl in ("easy", "medium", "hard"):
182
  c = compile_level(pack, lvl)
183
  enemy_tanks = [
184
  a for a in c.scenario.actors
185
  if a.owner == "enemy" and a.type == "2tnk"
186
  ]
187
- assert len(enemy_tanks) == 3, (
188
- f"{lvl}: must have exactly 3 enemy tanks, got {len(enemy_tanks)}"
 
189
  )
190
  ys = sorted(a.position[1] for a in enemy_tanks)
191
- assert len(set(ys)) == 3, (
192
- f"{lvl}: enemy tanks must be on 3 distinct latitudes "
193
- f"(asymmetric spread), got ys={ys}"
194
  )
195
  # Verify the (50,20) silent-fail cell is NOT used.
196
  positions = [tuple(a.position) for a in enemy_tanks]
197
  assert (50, 20) not in positions, (
198
  f"{lvl}: (50,20) is a CLAUDE.md-documented silent-fail "
199
- f"cell — centre enemy must be at (51,20). Got {positions}"
200
  )
201
  types = [a.type for a in c.scenario.actors if a.owner == "enemy"]
202
  assert "fact" in types, f"{lvl}: needs a persistent enemy fact"
@@ -261,42 +266,17 @@ def _stall(rs, Command):
261
 
262
 
263
  def _brute_attack_move(rs, Command):
264
-     """Brute: every tank attack_moves toward the centre enemy. The
265
-     bunched stack drives into the 3-tank crossfire at the engagement
266
-     line; concentrated incoming fire kills ≥2 agent tanks ⇒ LOSS."""
 
 
267
      own = _own_ids(rs)
268
      if not own:
269
          return [Command.observe()]
270
      return [Command.attack_move(own, 51, 20)]
271
 
272
 
273
- def _spread_attack_closest(rs, Command):
274
-     """Spread: each agent tank attack_units ITS OWN nearest visible
275
-     enemy tank. With the asymmetric spread (3 enemies on three rows),
276
-     once the centre dies the surviving agent tanks chase different
277
-     flank enemies in 1-vs-1 duels; the Lanchester linear law collapses
278
-     the trade to mutual annihilation, ending with 1-of-3 alive. On
279
-     MEDIUM (own_units_gte:2) this busts the survival cap ⇒ LOSS."""
280
-     own = _own_ids(rs)
281
-     if not own:
282
-         return [Command.observe()]
283
-     es = _enemy_tanks(rs)
284
-     if not es:
285
-         # No targets in sight: advance to contact.
286
-         return [Command.attack_move(own, 51, 20)]
287
-     cmds = []
288
-     for u in (rs.get("units_summary") or []):
289
-         uid = str(u["id"])
290
-         ux, uy = u["cell_x"], u["cell_y"]
291
-         es_sorted = sorted(
292
-             es, key=lambda e: (e["cell_x"] - ux) ** 2 + (e["cell_y"] - uy) ** 2
293
-         )
294
-         tid = es_sorted[0].get("id")
295
-         if tid is not None:
296
-             cmds.append(Command.attack_unit([uid], str(tid)))
297
-     return cmds or [Command.observe()]
298
-
299
-
300
  def _focus_fire(rs, Command):
301
  """Focus-fire: ALL agent tanks attack_unit the SAME target each
302
  turn — the closest enemy to the agent centroid. Once that enemy
@@ -370,28 +350,30 @@ def test_brute_attack_move_loses(level, seed):
370
  )
371
 
372
 
373
- @pytest.mark.parametrize("level", ["medium"])
374
  @pytest.mark.parametrize("seed", [1, 2, 3, 4])
375
- def test_spread_attack_closest_loses_on_medium(level, seed):
376
- """Spread-attack-closest must LOSE on MEDIUM — the asymmetric
377
- flank chase ends with 1-of-3 agent tanks alive (2 lost), busting
378
- the survival cap own_units_gte:2. EASY is excluded as the bare-
379
- skill tier (own_units_gte:1 lets the 1 survivor squeak by: the
380
- documented SCENARIO_REVIEW_CHECKLIST inert-easy-teeth pattern).
381
- HARD is excluded because the asymmetric geometry collapses
382
- spread to focus when the agent stack starts on a flank latitude
383
- (NORTH or SOUTH): from a flank there is a unique closest enemy
384
- that all 3 agent tanks naturally target (spread ≡ focus); the
385
- hard discrimination is kill-speed + spawn-variation, not the
386
- survivor-count delta."""
387
  pytest.importorskip("openra_train")
388
  from openra_bench.eval_core import run_level
389
 
390
  c = compile_level(load_pack(PACK_PATH), level)
391
- r = run_level(c, _spread_attack_closest, seed=seed)
392
- assert r.outcome == "loss", (
393
- f"{level} seed={seed}: spread-attack-closest must LOSE on "
394
- f"medium (flank chase bleeds 2 tanks, own_units_gte:2 fails), "
395
- f"got {r.outcome} (kills={r.signals.units_killed}, "
396
- f"losses={r.signals.units_lost})"
397
  )
 
1
+ """combat-tank-vs-tank-engagement — tank trade: a controlled
2
+ focus-fire `attack_unit` engagement WINS; STALL and a BRUTE
3
+ `attack_move` drive-in LOSE.
4
+
5
+ The bar: the intended FOCUS-fire engagement (close to cannon range,
6
+ hold, concentrate `attack_unit` fire on one target at a time) WINS on
7
+ every level and every hard seed (1-4); STALL (pure observe) and a
8
+ BRUTE `attack_move` drive straight INTO the enemy position LOSE on
9
+ every level and every hard seed. Non-win is a real reachable timeout
10
+ LOSS via the `after_ticks` fail clause (within_ticks 2400 +
11
+ after_ticks 2401 on easy/medium with max_turns 30; within_ticks 1200
12
+ + after_ticks 1201 on hard with max_turns 15).
13
+
14
+ Recalibrated after the engine movement fixes (moving units take fire
15
+ en route; `attack_unit` on out-of-sight targets paths normally at
16
+ real Mobile speed; no sprint-invincibility). Finding from this
17
+ recalibration: with the post-fix combat model a SYMMETRIC 3-vs-3
18
+ tank mirror is a flat meat-grinder: whatever the target assignment
19
+ (focus one target, or each tank its own nearest), the agent loses
20
+ exactly two tanks closing the distance. The symmetric-mirror
21
+ focus-vs-spread SURVIVOR delta the pack originally relied on no
22
+ longer exists in the engine (a `spread_closest` policy ends
23
+ identically to focus). The load-bearing discrimination is therefore
24
+ CONTROLLED ENGAGEMENT vs BRUTE drive-in, and the difficulty axis is
25
+ re-tuned:
26
+ * EASY: 3-vs-3. Focus `attack_unit` closes to cannon range and
27
+ clears the line (≥1 survivor); a brute `attack_move` onto the
28
+ enemy cell bunches the column in melee and force-wipes.
29
+ * MEDIUM: 4-vs-3 (a fourth enemy tank; the agent is
30
+ numerically out-gunned). A controlled focus engagement clears
31
+ ≥3 of the 4 enemy tanks while keeping ≥2 of its own; a brute
32
+ drive-in eats 4-tank crossfire and wipes before killing 3.
33
+ * HARD: 3-vs-3 with a tight kill-speed deadline (within_ticks
34
+ 1200) and two seed-driven spawn corridors (NORTH / SOUTH).
35
 
36
  Validation is scripted (no model / network).
37
  """
 
175
  assert len(groups) >= 2, f"hard needs ≥2 spawn_point groups, got {groups}"
176
 
177
 
178
+ def test_enemy_line_is_a_spread_tank_line():
179
+ """The enemy line MUST be a spread tank line on distinct
180
+ latitudes (each enemy independently targetable): 3 tanks on
181
+ easy/hard, 4 on medium (the 4-vs-3 over-match). The (50,20)
182
+ silent-fail cell must not be used."""
 
183
  pack = load_pack(PACK_PATH)
184
+ expected = {"easy": 3, "medium": 4, "hard": 3}
185
  for lvl in ("easy", "medium", "hard"):
186
  c = compile_level(pack, lvl)
187
  enemy_tanks = [
188
  a for a in c.scenario.actors
189
  if a.owner == "enemy" and a.type == "2tnk"
190
  ]
191
+ assert len(enemy_tanks) == expected[lvl], (
192
+ f"{lvl}: must have exactly {expected[lvl]} enemy tanks, "
193
+ f"got {len(enemy_tanks)}"
194
  )
195
  ys = sorted(a.position[1] for a in enemy_tanks)
196
+ assert len(set(ys)) == expected[lvl], (
197
+ f"{lvl}: enemy tanks must be on {expected[lvl]} distinct "
198
+ f"latitudes (spread line), got ys={ys}"
199
  )
200
  # Verify the (50,20) silent-fail cell is NOT used.
201
  positions = [tuple(a.position) for a in enemy_tanks]
202
  assert (50, 20) not in positions, (
203
  f"{lvl}: (50,20) is a CLAUDE.md-documented silent-fail "
204
+ f"cell. Got {positions}"
205
  )
206
  types = [a.type for a in c.scenario.actors if a.owner == "enemy"]
207
  assert "fact" in types, f"{lvl}: needs a persistent enemy fact"
 
266
 
267
 
268
  def _brute_attack_move(rs, Command):
269
+ """Brute: every tank attack_moves straight onto the enemy line.
270
+ The `attack_move` drives the bunched column INTO the enemy
271
+ position (rather than holding at cannon range); the stack is
272
+ enveloped in the enemy crossfire and force-wipes before clearing
273
+ the line ⇒ LOSS (force-wipe / kill-bar unmet)."""
274
  own = _own_ids(rs)
275
  if not own:
276
  return [Command.observe()]
277
  return [Command.attack_move(own, 51, 20)]
278
 
279
 
280
  def _focus_fire(rs, Command):
281
  """Focus-fire: ALL agent tanks attack_unit the SAME target each
282
  turn — the closest enemy to the agent centroid. Once that enemy
 
350
  )
351
 
352
 
353
+ @pytest.mark.parametrize("level", ["easy", "medium", "hard"])
354
  @pytest.mark.parametrize("seed", [1, 2, 3, 4])
355
+ def test_medium_outnumbered_needs_controlled_engagement(level, seed):
356
+ """The medium-tier 4-vs-3 over-match is the load-bearing
357
+ discrimination: the intended controlled focus-fire engagement
358
+ clears ≥3 of the 4 enemy tanks while keeping ≥2 of its own (WIN),
359
+ whereas the brute `attack_move` drive-in is enveloped in the
360
+ 4-tank crossfire and force-wipes before killing 3 (LOSS). This
361
+ re-asserts the focus-WIN / brute-LOSS bar across every level —
362
+ the per-policy tests above already cover it, this is the
363
+ aggregate invariant pinned by the recalibration."""
364
  pytest.importorskip("openra_train")
365
  from openra_bench.eval_core import run_level
366
 
367
  c = compile_level(load_pack(PACK_PATH), level)
368
+ win = run_level(c, _focus_fire, seed=seed)
369
+ lose = run_level(c, _brute_attack_move, seed=seed)
370
+ assert win.outcome == "win", (
371
+ f"{level} seed={seed}: controlled focus engagement must WIN, "
372
+ f"got {win.outcome} (kills={win.signals.units_killed}, "
373
+ f"losses={win.signals.units_lost})"
374
+ )
375
+ assert lose.outcome == "loss", (
376
+ f"{level} seed={seed}: brute drive-in must LOSE, got "
377
+ f"{lose.outcome} (kills={lose.signals.units_killed}, "
378
+ f"losses={lose.signals.units_lost})"
379
  )
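
For reference, the target choice described in the `_focus_fire` docstring (every agent tank attacks the single enemy closest to the agent centroid) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the helper from the test file: `pick_focus_target` is a hypothetical name, and the unit dictionaries are assumed to carry `id`, `cell_x` and `cell_y` keys, as the removed `_spread_attack_closest` helper read them.

```python
from typing import Optional, Sequence


def pick_focus_target(own_units: Sequence[dict],
                      enemy_units: Sequence[dict]) -> Optional[str]:
    """Return the id of the enemy closest to the centroid of own_units.

    Hypothetical helper: each unit is assumed to be a dict with
    'id', 'cell_x' and 'cell_y' keys. Returns None if either side is empty.
    """
    if not own_units or not enemy_units:
        return None

    # Centroid of the agent stack (mean cell position).
    cx = sum(u["cell_x"] for u in own_units) / len(own_units)
    cy = sum(u["cell_y"] for u in own_units) / len(own_units)

    # Squared distance is enough for ranking; on ties min() keeps the
    # first enemy in input order.
    nearest = min(
        enemy_units,
        key=lambda e: (e["cell_x"] - cx) ** 2 + (e["cell_y"] - cy) ** 2,
    )
    return str(nearest["id"])


# Illustrative data only: a bunched stack and a three-tank enemy line.
own = [
    {"id": 1, "cell_x": 30, "cell_y": 19},
    {"id": 2, "cell_x": 30, "cell_y": 20},
    {"id": 3, "cell_x": 30, "cell_y": 21},
]
enemies = [
    {"id": 7, "cell_x": 51, "cell_y": 20},
    {"id": 8, "cell_x": 52, "cell_y": 16},
    {"id": 9, "cell_x": 52, "cell_y": 24},
]
target = pick_focus_target(own, enemies)
```

Because every tank receives the same `target`, the whole stack concentrates on one enemy per turn. A spread policy would instead compute the nearest enemy per tank, which is what the removed `_spread_attack_closest` did.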