yxc20098 committed on
Commit 701ffc9 · 1 parent: d3e6883

HTML scenario catalog + clearer custom-map-no-enemy descriptions

- scripts/gen_scenario_docs.py → docs/scenarios.html: lists every active
scenario's title, capability, why-it-exists, and robotics analogue,
plus, for each runnable config, the EXACT objective the model sees
(objective_brief: description + WIN/LOSE + turn budget). Run with
--open to view.
- custom-map-no-enemy easy/medium descriptions: rewritten as plain
imperatives ('Move your units to the goal zone near (30,16) … Win: …')
instead of the terse originals.
Finishes the clarity pass (#26): titles + catalog. The remaining
per-level texts were judged already concrete and were left unchanged to
preserve the tuned puzzle intent.
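The commit describes each catalog cell as an `objective_brief` (description + WIN/LOSE + turn budget) rendered to HTML. The following is a minimal sketch of that assembly step, assuming a brief made of exactly those three parts; `ObjectiveBrief` and `render_cell` are hypothetical names for illustration, not the actual API of `scripts/gen_scenario_docs.py`.

```python
import html
from dataclasses import dataclass, field


@dataclass
class ObjectiveBrief:
    """Hypothetical stand-in for one runnable config's objective_brief."""
    description: str
    win: list[str]                                  # clauses joined by AND
    lose: list[str] = field(default_factory=list)   # clauses joined by OR
    max_turns: int = 30

    def text(self) -> str:
        # Assemble the brief in the order the catalog displays it.
        lines = [self.description, "WIN WHEN: " + " AND ".join(self.win) + "."]
        if self.lose:
            lines.append("YOU LOSE IF: " + " OR ".join(self.lose) + ".")
        lines.append(
            f"You have at most {self.max_turns} decision turns; "
            "acting decisively and early matters."
        )
        return "\n".join(lines)


def render_cell(level: str, vision: str, brief: ObjectiveBrief) -> str:
    """Render one per-level cell: escape each line, then join lines with <br>."""
    body = "<br>".join(html.escape(line) for line in brief.text().splitlines())
    lvl = html.escape(level)
    return (
        f"<div class=cell><span class=clab>{lvl}</span> "
        f"<span class=pid>(level {lvl} · fog {html.escape(vision)})</span>"
        f"<pre>{body}</pre></div>"
    )


if __name__ == "__main__":
    brief = ObjectiveBrief(
        description="Spec: exactly 3 e1 rifle infantry.",
        win=["have EXACTLY 3 'e1' (no more, no fewer)", "before game tick 20000"],
        lose=["have ≥4 'e1'"],
        max_turns=70,
    )
    print(render_cell("easy", "vision", brief))
```

`html.escape` with its default `quote=True` turns `'` into `&#x27;` and `"` into `&quot;`, which matches the escaping visible in the generated file. The real generator may group clauses differently; for example, the catalog wraps multi-clause LOSE conditions in parentheses, which this sketch does not do.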

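Each WIN WHEN line in the catalog reads as a conjunction of clauses that must all hold at once (regions, tick gates, attrition cap). The sketch below evaluates such a conjunction against one game snapshot. It rests on assumptions that the catalog does not spell out: `region (x,y) r=N` is taken to mean within Euclidean distance N cells of (x,y), `not before game tick T` means tick ≥ T, and `before game tick T` means tick < T. `Snapshot`, `in_region` and `win_when` are hypothetical names, not part of OpenRA-Bench.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-tick view of the agent's side of the game."""
    tick: int
    unit_cells: tuple[tuple[int, int], ...]  # cells of surviving units
    units_lost: int                          # running total of own losses


Region = tuple[tuple[int, int], float]  # ((x, y), radius)


def in_region(cells, region: Region, every: bool = False) -> bool:
    # Assumption: "region (x,y) r=N" = Euclidean distance <= N from (x, y).
    (cx, cy), r = region
    inside = [hypot(x - cx, y - cy) <= r for x, y in cells]
    if not inside:
        return False  # with no surviving units, neither form can hold
    return all(inside) if every else any(inside)


def win_when(
    s: Snapshot,
    regions: list[Region],
    every_unit: bool = False,          # "get EVERY unit into region"
    not_before: Optional[int] = None,  # "not before game tick T"
    before: Optional[int] = None,      # "before game tick T"
    max_lost: Optional[int] = None,    # "lose ≤k of your own units"
) -> bool:
    """All clauses are joined by AND, so every check must pass."""
    checks = [in_region(s.unit_cells, reg, every_unit) for reg in regions]
    if not_before is not None:
        checks.append(s.tick >= not_before)
    if before is not None:
        checks.append(s.tick < before)
    if max_lost is not None:
        checks.append(s.units_lost <= max_lost)
    return all(checks)


if __name__ == "__main__":
    # strict-sequence, hard: region (26,30) r=3, not before 4000, before 9000.
    gate = dict(regions=[((26, 30), 3)], not_before=4000, before=9000)
    print(win_when(Snapshot(3900, ((26, 30),), 0), **gate))  # False: too early
    print(win_when(Snapshot(5000, ((27, 31),), 0), **gate))  # True
```

Checking every clause on the same snapshot is the simplest reading of the catalog. The real harness may track some clauses across the whole episode; this sketch approximates that only by treating `units_lost` as a cumulative count.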
docs/scenarios.html ADDED
@@ -0,0 +1,24 @@
+ <!doctype html><meta charset=utf-8><title>OpenRA-Bench — Scenario Catalog</title><style>
+ body{font:15px/1.5 -apple-system,Segoe UI,Roboto,sans-serif;
+ margin:0;background:#0f1115;color:#e6e6e6}
+ header{padding:24px 32px;background:#161922;
+ border-bottom:1px solid #2a2f3a}
+ h1{margin:0;font-size:22px} .sub{color:#9aa3b2;margin-top:6px}
+ main{padding:24px 32px;max-width:1100px}
+ h2{margin:34px 0 8px;font-size:18px;border-bottom:1px solid #2a2f3a;
+ padding-bottom:6px}
+ .pack{background:#161922;border:1px solid #2a2f3a;border-radius:10px;
+ padding:16px 18px;margin:14px 0}
+ .ptitle{font-size:17px;font-weight:600}
+ .pid{color:#7e8796;font-size:12px;font-family:ui-monospace,monospace}
+ .cap{display:inline-block;padding:2px 9px;border-radius:10px;
+ color:#fff;font-size:12px;margin-left:8px;vertical-align:middle}
+ .why{color:#c3cad6;margin:8px 0 12px;font-size:14px}
+ .cell{border-left:3px solid #2a2f3a;padding:6px 0 6px 14px;
+ margin:10px 0}
+ .clab{font-weight:600;color:#cdd5e3}
+ pre{white-space:pre-wrap;background:#0f1115;border:1px solid #242a35;
+ border-radius:6px;padding:10px 12px;margin:6px 0 0;
+ font:13px/1.45 ui-monospace,monospace;color:#d7dce6}
+ .toc a{color:#7497db;text-decoration:none;margin-right:14px}
+ </style><header><h1>OpenRA-Bench — Scenario Catalog</h1><div class=sub>24 active scenarios · the title, why it exists, and the exact objective the model is given per runnable config.</div><div class='sub toc'><a href='#action'>action (6)</a> <a href='#adversarial'>adversarial (3)</a> <a href='#perception'>perception (3)</a> <a href='#reasoning'>reasoning (12)</a></div></header><main><h2 id='action'>action</h2><div class=pack><div><span class=ptitle>Multi-Squad Coordination — Drive Several Groups to Different Objectives at Once</span><span class=cap style='background:#5fae7a'>action</span><div class=pid>action-multiunit-coordination · rush-hour-arena</div></div><div class=why><b>Why:</b> Multi-region task allocation under a shared deadline is not a planning problem here — the split is obvious. It tests whether the controller can actually drive several effector groups in parallel instead of completing one objective then the next, which serialized control fails to do in time.<br><br><b>Robotics analogue:</b> Coordinated multi-robot fleet dispatch: a logistics swarm must place distinct sub-teams at separate depots within one delivery window; one-at-a-time control blows the schedule.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Two objectives, generous window. Split the 2tnk group to the north-east region and the 1tnk group to the south-east region; both groups must be on station before the deadline.<br>WIN WHEN: get a unit into region (110,6) r=8 AND get a unit into region (110,33) r=8 AND before game tick 6000.<br>You have at most 30 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Three simultaneous regions and an attrition cap. 
The force splits three ways (NE, SE, mid-east); enemy infantry contest the lanes so sloppy serialized movement bleeds units and misses the tighter deadline.<br>WIN WHEN: get a unit into region (110,5) r=7 AND get a unit into region (110,34) r=7 AND get a unit into region (112,20) r=7 AND before game tick 5000 AND lose ≤2 of your own units.<br>You have at most 36 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Three contested regions, the full force must arrive (every unit in one of the three zones — checked as occupied corners), defenders have turrets, the deadline is tight and attrition is strict. Only genuine parallel multi-group control can satisfy all clauses.<br>WIN WHEN: get a unit into region (112,5) r=7 AND get a unit into region (112,35) r=7 AND get a unit into region (115,20) r=7 AND before game tick 4200 AND lose ≤2 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 44 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Non-Stop Route Execution — Run an Ordered Path Without Stalling</span><span class=cap style='background:#5fae7a'>action</span><div class=pid>action-sequenced-execution · rush-hour-arena</div></div><div class=why><b>Why:</b> The route is already planned; the open problem is faithful, non-stalling execution of an ordered objective sequence — issuing the next move the instant a leg completes rather than idling, re-deliberating, or skipping a waypoint, which is what makes long autonomous missions fail.<br><br><b>Robotics analogue:</b> Manipulator / mobile-base task sequencing: a robot must hit a fixed ordered set of stations (pick, transit, place) within a cycle-time budget; pausing or reordering between steps misses the takt time.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog 
vision)</span><pre>Two-leg route: go to waypoint W1 (SW-mid), then the final objective (far SE). The after_ticks gate means the agent must already be moving; arriving at the end only counts once W1&#x27;s leg time has elapsed, and the overall deadline punishes any stall.<br>WIN WHEN: get a unit into region (118,33) r=8 AND not before game tick 1200 AND before game tick 6000.<br>You have at most 30 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Three-leg route W1 -&gt; W2 -&gt; final with a contested first leg and an attrition cap. A model that stalls to re-plan between legs cannot clear all three before the tighter deadline.<br>WIN WHEN: get a unit into region (120,34) r=7 AND not before game tick 2200 AND before game tick 5000 AND lose ≤1 of your own units.<br>You have at most 36 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Four-leg serpentine route across the full arena with defended waypoints, a strict attrition cap and a tight budget. Every leg must be executed back-to-back; any idling, re-deliberation, or skipped leg overruns the clock or loses too many units.<br>WIN WHEN: get a unit into region (122,6) r=7 AND not before game tick 3400 AND before game tick 4400 AND lose ≤1 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 46 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Search &amp; Destroy — Find and Eliminate Scattered Enemy Squads</span><span class=cap style='background:#5fae7a'>action</span><div class=pid>rush-hour · rush-hour-arena</div></div><div class=why><b>Why:</b> A large arena hides several separate enemy groups in fog. 
The task is to SEARCH the map, locate each group, and destroy enough of them before the clock runs out — the skill is coordinated coverage of unknown space (spread out, scout, converge), not single-unit micro. This is the find-the-enemy counterpart to the close-range 1v1 duel.<br><br><b>Robotics analogue:</b> Multi-robot search-and-destroy: sweep and clear a bounded area with several dispersed targets within a time budget.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Rush Hour Sweep — easy difficulty (ported).<br>WIN WHEN: destroy ≥3 enemy units AND before game tick 16000.<br>You have at most 40 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Rush Hour Sweep — medium difficulty (ported).<br>WIN WHEN: destroy ≥6 enemy units AND before game tick 13000.<br>You have at most 40 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Rush Hour Sweep — hard. 
The agent force already round-robins between two base spawn_point groups by seed (start corner varies, anti-memorisation); hard adds a tight clock, an attrition cap, and a real loss on force destruction.<br>WIN WHEN: destroy ≥9 enemy units AND before game tick 10000 AND lose ≤3 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 40 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Two-Pronged Attack — Coordinate Two Squads to Destroy the Enemy Economy</span><span class=cap style='background:#5fae7a'>action</span><div class=pid>strategy-twobody · singles-twobody</div></div><div class=why><b>Why:</b> Two separated squads must be driven in parallel and converged through a weak gap to reach the objective: simultaneous multi-group control, not serial play.<br><b>Robotics analogue:</b> Coordinated multi-robot rendezvous: independently steer two teams to a common staging point.</div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Two-Body Coordination — drive the two separated squads in parallel, converge through the weak gap, and destroy the enemy&#x27;s key economic buildings (construction yard + refinery). 
Serial play or a brute-force push through the strong center loses.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 16000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 100 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Same objective (fact+proc), tighter clock + attrition cap — only coordinated parallel control reaches base cheaply enough.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 12000 AND lose ≤8 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>fact+proc destroyed, tight clock, strict attrition: both squads must converge and raid cleanly; brute force loses.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 9000 AND lose ≤4 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Exact Build Order — Produce the Precise Bill of Materials Under Budget</span><span class=cap style='background:#5fae7a'>action</span><div class=pid>strict-production-bom · rush-hour-arena</div></div><div class=why><b>Why:</b> Produce an EXACT bill of materials under a hard budget: build only the tech-tree prerequisites needed, in the right order, and produce exactly the specified units — no more, no less. Overproducing one type, or erecting an unnecessary structure, exhausts the budget and fails the spec. 
The objective is instruction/spec fidelity itself, not combat — the OpenRA analogue of strict tool/function-calling benchmarks (BFCL / τ²-bench): do exactly and only what was asked.<br><br><b>Robotics analogue:</b> Manufacturing-cell order fulfilment under a resource cap: satisfy a precise parts list respecting assembly prerequisites without over-building or wasting stock.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Spec: exactly 3 e1 rifle infantry — requires building a barracks first (tech prerequisite). Budget covers the barracks + exactly 3 e1; a 4th e1 or any extra structure breaks the spec.<br>WIN WHEN: have EXACTLY 3 &#x27;e1&#x27; (no more, no fewer) AND before game tick 20000.<br>YOU LOSE IF: have ≥4 &#x27;e1&#x27;.<br>You have at most 70 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Spec: exactly 3 e1 + 2 e3 (rocket). e3 shares the barracks tech but costs more — the budget forces correct allocation, no waste.<br>WIN WHEN: have EXACTLY 3 &#x27;e1&#x27; (no more, no fewer) AND have EXACTLY 2 &#x27;e3&#x27; (no more, no fewer) AND before game tick 18000.<br>YOU LOSE IF: (have ≥4 &#x27;e1&#x27; OR have ≥3 &#x27;e3&#x27;).<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Spec: exactly 3 e1 + 2 e3 + 1 tesla coil (tsla). 
tsla adds a deeper tech path (power-dependent) and a tight budget — any overproduction or unnecessary building exhausts cash and fails.<br>WIN WHEN: have EXACTLY 3 &#x27;e1&#x27; (no more, no fewer) AND have EXACTLY 2 &#x27;e3&#x27; (no more, no fewer) AND own ≥1 &#x27;tsla&#x27; building(s) AND before game tick 16000.<br>YOU LOSE IF: (have ≥4 &#x27;e1&#x27; OR have ≥3 &#x27;e3&#x27; OR own ≥2 &#x27;tsla&#x27; building(s)).<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Follow the Procedure — Reach the Region in the Exact Ordered Time Window</span><span class=cap style='background:#5fae7a'>action</span><div class=pid>strict-sequence · rush-hour-arena</div></div><div class=why><b>Why:</b> Follow a precise procedure: reach a marked staging region FIRST, and only AFTER that has been achieved is the mission complete — doing the steps out of order, or skipping the constraint, fails. Tests literal instruction-following and precondition ordering under a strict action API rather than free-form goal seeking.<br><br><b>Robotics analogue:</b> Procedural compliance: a robot must satisfy an ordered checklist (reach checkpoint, hold, then proceed) where step order is graded, not just the end state.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Reach the staging region (≈ x18,y22) within the deadline using only move_units. 
The strict part: attack/economy tools are not offered — a valid solution must be expressed in the allowed API.<br>WIN WHEN: get a unit into region (18,22) r=4 AND before game tick 12000.<br>You have at most 50 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Two-step procedure: only after holding past tick 3000 does reaching the region count (after_ticks gates the win).<br>WIN WHEN: get a unit into region (22,26) r=4 AND not before game tick 3000 AND before game tick 14000.<br>You have at most 60 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Tight ordered window: reach the region within a narrow tick band (not before 4000, not after 9000) — precise timing compliance. The unit stages from a seed-chosen position (two spawn_point groups) so the exact timed move can&#x27;t be replayed from memory.<br>WIN WHEN: get a unit into region (26,30) r=3 AND not before game tick 4000 AND before game tick 9000.<br>You have at most 60 decision turns; acting decisively and early matters.</pre></div></div><h2 id='adversarial'>adversarial</h2><div class=pack><div><span class=ptitle>1v1 Combat Duel — Beat a Reactive Enemy at Close Range</span><span class=cap style='background:#d2683c'>adversarial</span><div class=pid>adversarial-duel · rush-hour-arena</div></div><div class=why><b>Why:</b> A close-quarters force-on-force fight against an enemy that shoots back: the decision is combat micro — concentrate fire on one target, retreat damaged units, trade favourably. The enemy starts just to the east, in or near sight, so this is a DUEL, not a search (finding scattered enemies on a big map is tested separately). 
Difficulty escalates the opponent, not the search.<br><br><b>Robotics analogue:</b> Adversarial multi-agent engagement at close range: prevail over a reactive opponent team by target selection and damage trading.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog structured)</span><pre>Even duel: your 4 vehicles vs 3 enemy rifle infantry a short move east. Win by destroying the enemy force without losing all of yours.<br>WIN WHEN: destroy ≥3 enemy units AND before game tick 14000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 30 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Tougher opponent: the enemy rifle squad is backed by light armour. Same close engagement; concentrate fire to win the trade.<br>WIN WHEN: destroy ≥5 enemy units AND before game tick 12000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 40 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Break a Defended Position — Assault Entrenched Reactive Defenders</span><span class=cap style='background:#d2683c'>adversarial</span><div class=pid>adversarial-siege · rush-hour-arena</div></div><div class=why><b>Why:</b> Dislodge a reactive force holding a prepared position: commit, absorb the defended ground, and break it before the clock — assaulting an opponent with the terrain advantage.<br><b>Robotics analogue:</b> Adversarial objective seizure: overcome a reactive defending team entrenched at the goal under a time budget.</div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Rung 1 — 4 tanks break a 4-rifle position.<br>WIN WHEN: destroy ≥4 enemy units AND before game tick 13000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 60 decision turns; acting decisively and early 
matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Rung 2 — defenders reinforced with armour.<br>WIN WHEN: destroy ≥7 enemy units AND before game tick 11000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 75 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Rung 3 — dug-in mixed force, tight clock + loss cap, and the assault force stages from a seed-chosen corner (two spawn_point groups → the approach vector onto the entrenched position varies by seed). Defenders sit mid-map beyond initial sight.<br>WIN WHEN: destroy ≥10 enemy units AND before game tick 9000 AND lose ≤3 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 85 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Win While Outnumbered — Defeat a Larger Reactive Force by Maneuver</span><span class=cap style='background:#d2683c'>adversarial</span><div class=pid>adversarial-skirmish · rush-hour-arena</div></div><div class=why><b>Why:</b> Win while outnumbered by trading space for time: pick fights, retreat, and re-engage a reactive enemy on favourable terms instead of a head-on brawl — asymmetric tactical reasoning.<br><b>Robotics analogue:</b> Asymmetric adversarial control: a smaller agent team must defeat a larger reactive force by engagement selection.</div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Rung 1 — 2 tanks vs 4 rifle: doable if you fight smart.<br>WIN WHEN: destroy ≥4 enemy units AND before game tick 14000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 60 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Rung 2 — 2 
tanks vs 6 mixed: must split the enemy.<br>WIN WHEN: destroy ≥6 enemy units AND before game tick 12000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 70 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Rung 3 — 3 tanks vs 8, loss-capped, and the outnumbered force starts from a seed-chosen corner (two spawn_point groups → start axis varies by seed, so a memorised kite line can&#x27;t generalise). Enemy is mid-map beyond initial sight: engage selection under fog.<br>WIN WHEN: destroy ≥8 enemy units AND before game tick 10000 AND lose ≤2 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div></div><h2 id='perception'>perception</h2><div class=pack><div><span class=ptitle>Pure Navigation — Reach the Goal Zone in a Confined Map (No Enemy)</span><span class=cap style='background:#7497db'>perception</span><div class=pid>custom-map-no-enemy · singles-maginot</div></div><div class=why><b>Why:</b> Pure spatial navigation inside a confined custom region with no adversary: read the map, plan a route through the bounded playable area, and reach the designated zone. Success depends only on perceiving the terrain and committing a path — not on combat.<br><br><b>Robotics analogue:</b> Confined-space autonomous navigation (warehouse aisle, indoor map): reach a goal cell within a bounded region from a map read alone.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Move your units to the goal zone near (30,16). No enemy on this map — success is purely reading the terrain and committing a route. 
Win: a unit reaches the zone within the time limit.<br>WIN WHEN: get a unit into region (30,16) r=6 AND before game tick 12000.<br>You have at most 50 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Same task, harder: the goal zone is farther at (55,18) and the clock is tighter. Plan the longer route and go directly. Win: a unit reaches the zone in time.<br>WIN WHEN: get a unit into region (55,18) r=5 AND before game tick 10000.<br>You have at most 55 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Whole squad must reach the far corner of the playable bounds together within a short deadline, starting from a seed-chosen position (two spawn_point groups) so the route can&#x27;t be replayed from memory. (No adversary on this map → no force-loss fail; the only failure mode is missing the deadline.)<br>WIN WHEN: get EVERY unit into region (70,24) r=6 AND before game tick 9000.<br>You have at most 60 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Read the Fog — Infer Where to Scout From the Explored Frontier</span><span class=cap style='background:#7497db'>perception</span><div class=pid>perception-frontier-reading · rush-hour-arena</div></div><div class=why><b>Why:</b> Path planning is solved; the hard part is reading an occupancy grid correctly to tell explored cells from the unknown frontier and pushing sensors into the unknown instead of re-scanning known space.<br><br><b>Robotics analogue:</b> SLAM frontier detection — picking the next unexplored cell to drive a scout robot toward.</div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>One contiguous unexplored mass directly east of spawn along the open lane. 
The frontier is obvious and reachable; generous clock.<br>WIN WHEN: reveal ≥45% of the map AND before game tick 8000.<br>You have at most 30 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Unexplored area is split: a near pocket bottom-left already partly visible, and the real bulk of the frontier lies far NE. A decoy enemy squad sits inside the ALREADY-explored band to bait the scouts into re-treading seen ground. Tighter clock, fewer scouts.<br>WIN WHEN: reveal ≥55% of the map AND before game tick 6000.<br>You have at most 35 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Frontier is fragmented across three corners (far NE, far SE, and a thin NW strip). Two decoy squads sit in the explored center to pull scouts off the true frontier, attrition is real (must keep &gt;=4 of 5 scouts), and the deadline is short — only correct, simultaneous reading of all three fog pockets clears the coverage bar in time.<br>WIN WHEN: reveal ≥62% of the map AND before game tick 4800 AND lose ≤1 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 40 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Find the Hidden Target — Locate the Real Objective Among Fog Decoys</span><span class=cap style='background:#7497db'>perception</span><div class=pid>perception-target-vs-fog · rush-hour-arena</div></div><div class=why><b>Why:</b> The real search problem is not &quot;go to fog&quot; but inferring which of several unexplored regions could actually contain the target given what the empty regions rule out, then committing sensors there.<br><br><b>Robotics analogue:</b> Search-and-rescue: choosing which unexplored room can still hold the victim after clearing the near 
ones.</div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>One target building, one unexplored region — both far east. The near area is already open (no decoy fog). Reading &quot;fog is east, therefore the target is east&quot; is direct. Generous clock.<br>WIN WHEN: spot ≥1 enemy buildings AND before game tick 8000.<br>You have at most 30 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Two unexplored regions: a NEAR one (NE, closer to spawn) that is EMPTY, and a FAR one (SE, opposite corner) that holds the real target. The model must not stop after clearing the near fog and finding nothing — it must infer the target is in the remaining unexplored region. Decoy enemy units sit in the empty near region.<br>WIN WHEN: spot ≥1 enemy buildings AND before game tick 6000.<br>You have at most 35 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Three unexplored regions. Two are decoys: a NEAR-NE pocket with a noisy enemy squad (pure bait) and a MID-SOUTH pocket with a single enemy *building of the wrong kind already implicitly elsewhere* — both contain no usable target. The real target is a lone building tucked in the far-NW strip, the LEAST intuitive direction from a spawn that naturally pushes east. 
Short deadline and attrition: only a correct read of which fog pocket can still hold the target — after the near ones are ruled out — discovers it in time.<br>WIN WHEN: spot ≥1 enemy buildings AND before game tick 4800 AND lose ≤1 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 40 decision turns; acting decisively and early matters.</pre></div></div><h2 id='reasoning'>reasoning</h2><div class=pack><div><span class=ptitle>Decoy Sacrifice — Spend a Bait Unit to Pull Defenders Off the Objective</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>artofwar-decoy-sacrifice · rush-hour-arena</div></div><div class=why><b>Why:</b> 以利动之 — &quot;move the enemy with the prospect of gain.&quot; A small bait detachment must be deliberately spent to pull a reactive guarding force off the objective so the main body can reach it. An agent that keeps every unit safe and pushes head-on is destroyed; the winning policy accepts an early, local loss for the delayed mission payoff — long-horizon credit assignment, not greedy unit preservation.<br><br><b>Robotics analogue:</b> Sacrificial-decoy planning: expend a cheap agent to divert a reactive obstacle/defender, accepting negative short-term reward for a later objective gain.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Reach the objective region with the main body. 
The guards will chase whatever they see — bait them aside, then drive through.<br>WIN WHEN: get a unit into region (44,24) r=4 AND before game tick 16000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 70 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>More guards; the head-on route is now clearly fatal.<br>WIN WHEN: get a unit into region (44,24) r=4 AND before game tick 14000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>The main body must arrive nearly intact (loss cap) — only the bait may be spent. The force stages from a seed-chosen latitude (two spawn_point groups, same main+bait formation) so the decoy line that works can&#x27;t be memorised; guards sit on the objective beyond initial sight.<br>WIN WHEN: get a unit into region (44,24) r=3 AND before game tick 12000 AND lose ≤2 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>The Long Way Round — Take a Costly Detour Because the Direct Path Is Lethal</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>artofwar-indirect-approach · rush-hour-arena</div></div><div class=why><b>Why:</b> 迂直之计 — &quot;make the devious route the most direct.&quot; The short path to the objective is a killing ground; the only winning policy is a long detour that looks strictly worse for many turns (objective progress stays flat or dips) before it pays off. 
Tests temporal credit assignment over a long horizon against a greedy shortest-path bias.<br><br><b>Robotics analogue:</b> Hazard-aware long-horizon routing: reject the locally optimal short path for a circuitous survivable one whose payoff is far delayed.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>The objective is due east but the direct lane is a gauntlet. Survive to the region — charging straight loses the force.<br>WIN WHEN: get a unit into region (40,26) r=4 AND before game tick 18000 AND lose ≤1 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Denser gun line; the detour is longer and costlier.<br>WIN WHEN: get a unit into region (44,26) r=4 AND before game tick 16000 AND lose ≤1 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Zero-loss traversal, seed-chosen start latitude. 
Two spawn_point groups (y26 / y34); the gun line is split to span BOTH corridors so the short route is lethal from either start — the safe detour differs by seed and can&#x27;t be memorised.<br>WIN WHEN: get EVERY unit into region (46,26) r=4 AND before game tick 14000 AND lose ≤0 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 95 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Lure the Guard Away — Bait a Mobile Defender, Then Slip the Main Force Through</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>artofwar-lure-the-tiger · rush-hour-arena</div></div><div class=why><b>Why:</b> 调虎离山 — draw the enemy from its strong position. A mobile reactive defender sits astride the only lane to the objective. Phase 1 (send a probe to bait the defender out of position) yields ZERO objective progress; only in phase 2, through the vacated lane, does the main force score. Two-phase plan with no reward for the enabling first phase.<br><br><b>Robotics analogue:</b> Two-phase manipulation: a no-reward enabling action (displace a reactive obstacle) must precede the rewarded objective action.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>The tiger guards the lane at (~26,24). 
Bait it north with the jeep, then run the main force east to the objective.<br>WIN WHEN: get a unit into region (42,24) r=4 AND before game tick 16000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 75 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>A heavier tiger; baiting must be more deliberate.<br>WIN WHEN: get a unit into region (46,24) r=4 AND before game tick 14000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 85 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Loss-capped: the bait may die, the main force may not. Force stages from a seed-chosen latitude (two spawn_point groups, same main+lure formation) — the lure path that works can&#x27;t be memorised. The tiger blocks the lane beyond initial sight.<br>WIN WHEN: get a unit into region (48,24) r=3 AND before game tick 12000 AND lose ≤1 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 95 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Staged Assault — Reach Waypoints in Order, Then Seize the Objective</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>artofwar-sequenced-citadel · rush-hour-arena</div></div><div class=why><b>Why:</b> 攻其无备 — strike where unprepared, but only after the prerequisite moves. The mission is a strict sub-goal chain: stage at A, then (and only after a hold) advance through B, and finally seize C. Reward lands only at C; A and B are unrewarded prerequisites whose *ordering* is graded. 
Long-horizon sub-goal sequencing with delayed terminal credit.<br><br><b>Robotics analogue:</b> Ordered multi-waypoint mission (stage → transit-after-hold → objective) where only terminal success is rewarded and step order is graded — the Blocksworld/GAIA-style long-horizon plan.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Stage at A (~20,20), hold past tick 3000, then seize the citadel region C (~44,20). Arriving at C before the hold does not count.<br>WIN WHEN: get a unit into region (44,20) r=4 AND not before game tick 3000 AND before game tick 18000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 85 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Longer prerequisite hold; tighter terminal deadline.<br>WIN WHEN: get a unit into region (46,20) r=4 AND not before game tick 5000 AND before game tick 16000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Narrow ordered window AND a loss cap: seize C only inside the tick band, with the force near-intact.<br>WIN WHEN: get a unit into region (48,20) r=3 AND not before game tick 6000 AND before game tick 12000 AND lose ≤1 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 95 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Base Building — Construct Structures Respecting the Tech Tree</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>building-and-planning · rush-hour-arena</div></div><div class=why><b>Why:</b> Construction planning under dependency and spatial constraints: decide a build order that respects tech-tree 
prerequisites, place structures where the objective requires (a defended direction), and when needed relocate to found a new base in a designated region. The decision is the plan — order, placement, and commitment — not the motor control of any single build.<br><br><b>Robotics analogue:</b> Autonomous construction / facility-layout planning: a task-graph with prerequisites (B needs A) plus spatial goals (assemble in zone Z, relocate the depot near region R) under a time budget.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Base + budget given. Build the required structures — including a barracks that depends on power (tech-tree prerequisite). Win = reach the target building total AND the tech-dependent barracks.<br>WIN WHEN: own ≥4 buildings total AND own a &#x27;tent&#x27; AND before game tick 20000.<br>You have at most 60 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Build a defensive line to the EAST (toward the enemy): place at least two pillboxes inside the designated eastern region within the deadline. Placement direction is the decision.<br>WIN WHEN: have 2 building(s) near (40,20) AND before game tick 22000.<br>You have at most 70 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>No starting base economy here — an MCV and a scout. 
Relocate to the designated eastern region, deploy a construction yard there, and stand up a power plant: found a new base near the region.<br>WIN WHEN: have 2 building(s) near (60,20) AND before game tick 30000 AND lose ≤1 of your own units.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Force Buildup — Spend a Limited Budget to Field the Strongest Army</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>economy-force-buildup · rush-hour-arena</div></div><div class=why><b>Why:</b> Resource allocation under a hard budget and deadline: given finite funds and a production facility, decide how much to spend and when to commit, so a sufficient force is fielded before time runs out. Spend too slow and the deadline is missed; the budget caps how much is even possible — the decision is the spend/commit schedule.<br><br><b>Robotics analogue:</b> Autonomous fleet/agent provisioning under an energy or money budget — deciding how many sub-agents to spin up and when, given a finite resource pool and a mission deadline.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Generous budget and clock. 
Build a small force from the barracks; the only failure mode is not committing funds at all.<br>WIN WHEN: keep ≥4 units alive AND before game tick 16000.<br>You have at most 45 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Tighter budget — only a few units are affordable; spend must start promptly to make the deadline.<br>WIN WHEN: keep ≥5 units alive AND before game tick 12000.<br>You have at most 45 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Lean budget and a short clock with an attrition cap: every credit and tick must convert to fielded force, losing nothing.<br>WIN WHEN: keep ≥6 units alive AND before game tick 9000 AND lose ≤0 of your own units.<br>You have at most 50 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Capital Allocation — Split One Budget Between Economy and Military</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>economy-investment · rush-hour-arena</div></div><div class=why><b>Why:</b> A capital-allocation decision under a single indivisible budget: the same cash buys EITHER a wide economy (a second refinery plus supporting power — depot/throughput capacity) OR a deep one (one refinery plus a larger forward force — collection/utilisation). The budget covers exactly one coherent path; splitting it builds neither. 
The test is committing to and completing one allocation, not the mechanics of either build.<br><br><b>Robotics analogue:</b> Fleet capital allocation: a fixed budget that buys either added depot/processing capacity or added collector/worker agents — the classic throughput-versus-collection trade in an autonomous foraging fleet, where indecision (a split budget) yields neither.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Wide-economy path, generous clock: invest the budget into a second refinery plus supporting power (added throughput capacity). Splitting the budget into scattered units instead fails the building bar.<br>WIN WHEN: own ≥2 &#x27;proc&#x27; building(s) AND own ≥2 &#x27;powr&#x27; building(s) AND before game tick 22000.<br>You have at most 70 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Budget covers exactly one coherent path. The wide-economy allocation is scored here (second refinery + power + a small utilising force) under a tighter clock; the deep alternative is the hard level.<br>WIN WHEN: own ≥2 &#x27;proc&#x27; building(s) AND own ≥2 &#x27;powr&#x27; building(s) AND keep ≥3 units alive AND before game tick 16000.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Lean budget forces the DEEP allocation: keep the single refinery and instead convert the budget into a larger forward force under a short clock with an attrition cap. 
Only a committed (unsplit) allocation clears the bar in time.<br>WIN WHEN: keep ≥6 units alive AND own ≥1 &#x27;proc&#x27; building(s) AND lose ≤0 of your own units AND before game tick 14000.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Spend Well Against the Clock — Allocate a Fixed Budget Before Time Runs Out</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>economy-time-box · rush-hour-arena</div></div><div class=why><b>Why:</b> Time-bounded capital deployment: a fixed budget must be converted into deployed capability AND a standing economy before a hard tick deadline. Cash left unspent at the deadline is wasted; spending it all on units leaves no production economy; the decision is the spend schedule and the unit/structure mix, not any single build.<br><br><b>Robotics analogue:</b> A field robot with a finite, non-replenishing energy budget that must both perform mission work and stand up persistent infrastructure before a deadline — scheduling expenditure so the objective is met without exhausting the budget prematurely.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Generous budget and clock. 
Field a small force and keep an economy building standing: the only failure is leaving the budget idle or spending it down to nothing.<br>WIN WHEN: keep ≥4 units alive AND own ≥4 buildings total AND before game tick 22000.<br>You have at most 60 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Tighter budget and clock: the spend must start promptly and the mix must cover both fielded units and a power-dependent structure within the deadline.<br>WIN WHEN: keep ≥5 units alive AND own a &#x27;tent&#x27; AND own ≥4 buildings total AND before game tick 16000.<br>You have at most 70 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Lean budget, short clock, attrition cap: every credit and tick must convert to fielded force plus a standing economy while losing nothing.<br>WIN WHEN: keep ≥6 units alive AND own ≥5 buildings total AND lose ≤0 of your own units AND before game tick 12000.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Commit Under Uncertainty — Pick the One Region That Hides the Survivor</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>reasoning-frontier-commit · rush-hour-arena</div></div><div class=why><b>Why:</b> Path planning to any point is solved. The unsolved problem is deciding which of several unexplored regions to commit a time/fuel-limited searcher to when only one hides the target and going to the wrong one first means you never reach the right one in time. 
That commitment-under-uncertainty step is the real search-and-rescue problem, not the navigation.<br><br><b>Robotics analogue:</b> UAV/UGV search-and-rescue frontier selection: with limited endurance and several candidate search cells, choose the cell most likely to contain the survivor before the battery dies.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>One survivor, one plausible region. The objective marker sits in the far-NE region and nothing else does — perception is honest, the only task is to plan a direct commitment within the deadline.<br>WIN WHEN: spot ≥1 enemy buildings AND before game tick 6500.<br>You have at most 30 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Two candidate regions, one decoy. A decoy marker sits in the far-SE region; the real survivor is in the far-NE. From the corner both are equidistant-looking, but only the NE building counts toward &#x27;buildings_discovered&#x27;. Committing to the SE decoy first wastes enough ticks to miss the deadline — the model must reason about which to commit to, not just navigate.<br>WIN WHEN: spot ≥1 enemy buildings AND before game tick 5200.<br>You have at most 34 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Three candidate regions, two decoys, a tighter deadline, and an attrition constraint. Decoy unit clusters sit in the SE and mid-S regions; the real survivor marker is in the far-NE. One decoy cluster is hostile (stance 2) so a careless route into it can get the lone scout killed — the model must pick the commitment that finds the building, fits the clock, AND avoids the lethal frontier. 
Reasoning must trade route safety against time without splitting (only one unit).<br>WIN WHEN: spot ≥1 enemy buildings AND before game tick 4200 AND lose ≤0 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 38 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Safe vs Fast — Choose the Survivable Longer Route Over the Deadly Shortcut</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>reasoning-risk-route · rush-hour-arena</div></div><div class=why><b>Why:</b> Shortest-path is solved. The unsolved decision is whether the shortest path is the right path: a direct corridor may pass through a hazard that destroys the vehicle, while a longer perimeter route completes the mission intact. Choosing the survivable plan over the fastest one, given a deadline that the detour still satisfies, is the actual mission-planning problem.<br><br><b>Robotics analogue:</b> Field-robot route selection under hazard: a delivery/inspection robot must reject the shortest path when it crosses a no-go hazard zone and instead commit to a longer but survivable route that still meets the time window.<br></div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>One central hazard, one obvious safe detour. The direct line x-axis route passes through a tesla coil + rocket infantry that shred the jeep; the top and bottom edges are completely open. Generous deadline: any safe detour wins. 
The model only has to recognise the corridor is lethal and not drive into it.<br>WIN WHEN: get a unit into region (120,20) r=7 AND lose ≤0 of your own units AND before game tick 7500.<br>You have at most 32 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>The bottom detour is now also guarded, so only the top-edge route is safe — a longer commitment that the deadline still allows but with less slack. The model must reason that the shortest route is lethal, the bottom route is also lethal, and the top route, though longest, is the only plan that satisfies both the no-loss constraint and the tighter clock.<br>WIN WHEN: get a unit into region (120,20) r=7 AND lose ≤0 of your own units AND before game tick 6000.<br>You have at most 36 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>Both edges are partly contested and the deadline is tight enough that the longest fully-safe path is too slow — the model must reason about a graduated tradeoff: the top edge is fastest but grazes a defended pocket, the bottom edge is fully safe but too long for the clock, so the only winning plan threads the narrow safe seam near the top while staying out of weapon range. Risk and route must be traded against the deadline, not avoided outright.<br>WIN WHEN: get a unit into region (122,20) r=6 AND lose ≤0 of your own units AND before game tick 4800.<br>You have at most 40 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>Risk Dilemma — Destroy the Enemy Economy via the Safer Route</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>strategy-dilemma · singles-dilemma</div></div><div class=why><b>Why:</b> Two routes to the objective: a safe-but-long path vs a short-but-lethal one. 
The decision is route risk assessment under a deadline, not pathfinding itself.<br><b>Robotics analogue:</b> Field-robot route selection rejecting a hazardous shortest path for a survivable longer one within a time window.</div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>Risk Dilemma: destroy the enemy&#x27;s key economic buildings (construction yard + refinery). The enemy is deliberately strong: brute-forcing every defender bleeds the force and loses. Take the safer route, reach the base, eliminate fact+proc.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 16000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 100 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Same objective (fact+proc), tighter clock and an attrition cap; a costly brawl no longer counts as a win even if you reach base.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 12000 AND lose ≤8 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>fact+proc destroyed, tight clock, strict attrition: only a clean raid that avoids the strong defenses wins; brute force loses.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 9000 AND lose ≤4 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div></div><div class=pack><div><span class=ptitle>The Gauntlet — Run a Defended Corridor to Destroy the Enemy Economy</span><span class=cap style='background:#9b8cce'>reasoning</span><div class=pid>strategy-gauntlet · singles-gauntlet</div></div><div class=why><b>Why:</b> Reach the objective 
through a defended corridor: sequence commitment and timing so the force survives the run — planning under attrition, not motor control.<br><b>Robotics analogue:</b> Autonomous traversal of a hazardous corridor: plan a timed run that survives staged threats.</div><div class=cell><span class=clab>easy</span> <span class=pid>(level easy · fog vision)</span><pre>The Gauntlet: run the defended corridor and destroy the enemy&#x27;s key economic buildings (construction yard + refinery). The corridor is lethal to a brute-force push; sequence and time the run so the force survives to kill fact+proc.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 16000.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 100 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>medium</span> <span class=pid>(level medium · fog vision)</span><pre>Same objective (fact+proc), tighter clock + attrition cap; a costly run no longer counts as a win.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 12000 AND lose ≤8 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 90 decision turns; acting decisively and early matters.</pre></div><div class=cell><span class=clab>hard</span> <span class=pid>(level hard · fog vision)</span><pre>fact+proc destroyed, tight clock, strict attrition: only a well-timed run that survives the gauntlet wins.<br>WIN WHEN: destroy the enemy fact+proc AND before game tick 9000 AND lose ≤4 of your own units.<br>YOU LOSE IF: your whole force is destroyed.<br>You have at most 80 decision turns; acting decisively and early matters.</pre></div></div></main>
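Every cell in the catalog follows the same template: the level description, a WIN WHEN line joining all win clauses with AND, an optional YOU LOSE IF line (the building and economy packs omit it), and the turn budget. The sketch below reproduces that layout so the format can be checked in isolation. It assumes nothing about the real `objective_brief` beyond the text visible in the catalog; `compose_brief` is a hypothetical helper that takes already-rendered clause strings, whereas the real function receives compiled win/fail condition objects.

```python
from __future__ import annotations

from typing import Optional, Sequence


def compose_brief(
    description: str,
    win_clauses: Sequence[str],
    lose_clause: Optional[str],
    max_turns: int,
) -> str:
    """Hypothetical re-creation of the catalog's objective text layout."""
    lines = [description.strip()]
    # All win clauses must hold together, so they are joined with AND.
    lines.append("WIN WHEN: " + " AND ".join(win_clauses) + ".")
    # Packs without a fail condition print no LOSE line.
    if lose_clause:
        lines.append(f"YOU LOSE IF: {lose_clause}.")
    lines.append(
        f"You have at most {max_turns} decision turns; "
        "acting decisively and early matters."
    )
    return "\n".join(lines)


# Rebuilds the medium cell of artofwar-lure-the-tiger (newlines instead of <br>).
brief = compose_brief(
    "A heavier tiger; baiting must be more deliberate.",
    ["get a unit into region (46,24) r=4", "before game tick 14000"],
    "your whole force is destroyed",
    85,
)
print(brief)
```

In the generated page the same text appears inside `<pre>`, with each newline turned into `<br>` by `_esc`.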
openra_bench/scenarios/packs/custom-map-no-enemy.yaml CHANGED
@@ -45,8 +45,9 @@ base:
  levels:
  easy:
  description: >
- Reach the goal zone in the near part of the confined region. No
- enemy: only perception + a committed route matters.
+ Move your units to the goal zone near (30,16). No enemy on this
+ map: success comes purely from reading the terrain and committing
+ to a route. Win: a unit reaches the zone within the time limit.
  win_condition:
  all_of:
  - reach_region: {x: 30, y: 16, radius: 6}
@@ -54,7 +55,9 @@ levels:
  max_turns: 50
  medium:
  description: >
- Goal zone is farther across the bounded region; tighter clock.
+ Same task, harder: the goal zone is farther at (55,18) and the
+ clock is tighter. Plan the longer route and go directly. Win: a
+ unit reaches the zone in time.
  win_condition:
  all_of:
  - reach_region: {x: 55, y: 18, radius: 5}
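Each `win_condition` here is an `all_of` list, so every clause must hold at once. A minimal sketch of how a `reach_region` clause could be evaluated, assuming `radius` is a Euclidean distance in map cells and that any single unit inside the zone satisfies it (the catalog's "get a unit into region" wording suggests this). `Unit`, `in_region`, `reach_region`, and `all_of` are hypothetical stand-ins, not the benchmark's evaluator.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical live unit with a map-cell position."""
    x: float
    y: float


def in_region(unit: Unit, x: float, y: float, radius: float) -> bool:
    # Assumption: the radius is Euclidean and the boundary counts as inside.
    return math.hypot(unit.x - x, unit.y - y) <= radius


def reach_region(units: list[Unit], x: float, y: float, radius: float) -> bool:
    # "get a unit into region": one unit inside the zone is enough.
    return any(in_region(u, x, y, radius) for u in units)


def all_of(*clauses: bool) -> bool:
    # Every clause in an all_of list must hold at the same time.
    return all(clauses)


# Easy level of custom-map-no-enemy: goal zone (30,16) with radius 6.
units = [Unit(10, 10), Unit(27, 19)]
won = all_of(reach_region(units, 30, 16, 6))
print(won)
```

Deadline and loss-cap clauses (for example "before game tick N" or "lose ≤N") would simply be further boolean arguments to `all_of`.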
scripts/gen_scenario_docs.py ADDED
@@ -0,0 +1,146 @@
+ """Generate docs/scenarios.html, the human-readable scenario catalog.
+
+ For every ACTIVE pack: title, capability, why it exists, its runnable
+ configs (or the easy/medium/hard levels if none are declared), and per
+ cell the plain-language objective the model actually sees (objective_brief:
+ description + WIN/LOSE + turn budget). Run: python scripts/gen_scenario_docs.py [--open]
+ """
+
+ from __future__ import annotations
+
+ import glob
+ import html
+ import os
+ import sys
+ from pathlib import Path
+
+ ROOT = Path(__file__).resolve().parent.parent
+ PACKS = ROOT / "openra_bench" / "scenarios" / "packs"
+ OUT = ROOT / "docs" / "scenarios.html"
+
+ sys.path.insert(0, str(ROOT))  # runnable as a standalone script
+
+ from openra_bench.game_knowledge import objective_brief  # noqa: E402
+ from openra_bench.scenarios import load_pack  # noqa: E402
+
+ _CAP_COLOR = {
+     "perception": "#7497db", "reasoning": "#9b8cce",
+     "action": "#5fae7a", "adversarial": "#d2683c",
+ }
+
+
+ def _esc(s) -> str:
+     return html.escape(str(s)).replace("\n", "<br>")  # keep YAML line breaks
+
+
+ def _cells(pack):
+     """[(label, CompiledLevel)]: configs if declared, else the 3 levels."""
+     out = []
+     if pack.configs:
+         for c in pack.configs:
+             out.append((c.name, pack.compile_config(c.name)))
+     else:
+         for lv in ("easy", "medium", "hard"):
+             out.append((lv, pack.compile(lv)))
+     return out
+
+
+ def build() -> str:
+     packs = []
+     for f in sorted(glob.glob(str(PACKS / "*.yaml"))):
+         b = os.path.basename(f)
+         if b.startswith(("_", "TEMPLATE")):
+             continue
+         p = load_pack(f)
+         if p.meta.status == "active":
+             packs.append(p)
+
+     by_cap: dict[str, list] = {}
+     for p in packs:
+         by_cap.setdefault(p.meta.capability, []).append(p)
+
+     parts = [
+         "<!doctype html><meta charset=utf-8>",
+         "<title>OpenRA-Bench — Scenario Catalog</title>",
+         """<style>
+ body{font:15px/1.5 -apple-system,Segoe UI,Roboto,sans-serif;
+ margin:0;background:#0f1115;color:#e6e6e6}
+ header{padding:24px 32px;background:#161922;
+ border-bottom:1px solid #2a2f3a}
+ h1{margin:0;font-size:22px} .sub{color:#9aa3b2;margin-top:6px}
+ main{padding:24px 32px;max-width:1100px}
+ h2{margin:34px 0 8px;font-size:18px;border-bottom:1px solid #2a2f3a;
+ padding-bottom:6px}
+ .pack{background:#161922;border:1px solid #2a2f3a;border-radius:10px;
+ padding:16px 18px;margin:14px 0}
+ .ptitle{font-size:17px;font-weight:600}
+ .pid{color:#7e8796;font-size:12px;font-family:ui-monospace,monospace}
+ .cap{display:inline-block;padding:2px 9px;border-radius:10px;
+ color:#fff;font-size:12px;margin-left:8px;vertical-align:middle}
+ .why{color:#c3cad6;margin:8px 0 12px;font-size:14px}
+ .cell{border-left:3px solid #2a2f3a;padding:6px 0 6px 14px;
+ margin:10px 0}
+ .clab{font-weight:600;color:#cdd5e3}
+ pre{white-space:pre-wrap;background:#0f1115;border:1px solid #242a35;
+ border-radius:6px;padding:10px 12px;margin:6px 0 0;
+ font:13px/1.45 ui-monospace,monospace;color:#d7dce6}
+ .toc a{color:#7497db;text-decoration:none;margin-right:14px}
+ </style>""",
+         "<header><h1>OpenRA-Bench — Scenario Catalog</h1>",
+         f"<div class=sub>{len(packs)} active scenarios · the title, why "
+         "it exists, and the exact objective the model is given per "
+         "runnable config.</div>",
+         "<div class='sub toc'>" + " ".join(
+             f"<a href='#{_esc(c)}'>{_esc(c)} ({len(v)})</a>"
+             for c, v in sorted(by_cap.items())
+         ) + "</div></header><main>",
+     ]
+
+     for cap in sorted(by_cap):
+         parts.append(f"<h2 id='{_esc(cap)}'>{_esc(cap)}</h2>")
+         for p in sorted(by_cap[cap], key=lambda x: x.meta.id):
+             col = _CAP_COLOR.get(cap, "#666")
+             parts.append("<div class=pack>")
+             parts.append(
+                 f"<div><span class=ptitle>{_esc(p.meta.title)}</span>"
+                 f"<span class=cap style='background:{col}'>{_esc(cap)}</span>"
+                 f"<div class=pid>{_esc(p.meta.id)} · {_esc(p.base_map)}</div></div>"
+             )
+             parts.append(
+                 f"<div class=why><b>Why:</b> {_esc(p.meta.real_world_meaning)}"
+                 "<br><b>Robotics analogue:</b> "
+                 f"{_esc(p.meta.robotics_analogue)}</div>"
+             )
+             try:
+                 cells = _cells(p)
+             except Exception as e:  # noqa: BLE001
+                 parts.append(f"<div class=why>(compile error: {_esc(e)})</div>")
+                 cells = []
+             for label, cl in cells:
+                 fog = getattr(cl, "fog_mode", "vision")
+                 ob = objective_brief(
+                     cl.scenario.description, cl.win_condition,
+                     cl.fail_condition, cl.max_turns,
+                 )
+                 parts.append(
+                     f"<div class=cell><span class=clab>{_esc(label)}</span> "
+                     f"<span class=pid>(level {_esc(cl.level)} · fog "
+                     f"{_esc(fog)})</span><pre>{_esc(ob)}</pre></div>"
+                 )
+             parts.append("</div>")
+     parts.append("</main>")
+     return "".join(parts)
+
+
+ def main(argv: list[str]) -> None:
+     OUT.parent.mkdir(parents=True, exist_ok=True)
+     OUT.write_text(build(), encoding="utf-8")
+     print(f"wrote {OUT}")
+     if "--open" in argv:
+         import webbrowser  # cross-platform, unlike the macOS-only `open`
+
+         webbrowser.open(OUT.as_uri())
+
+
+ if __name__ == "__main__":
+     main(sys.argv[1:])
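To see how the generator decides what to render without loading real packs, here is a sketch with hypothetical stand-ins. `StubPack` replaces the object returned by `load_pack` (its configs are plain strings rather than objects with a `.name`), and `cells` and `group_by_capability` mirror the `_cells` fallback and the sorted-by-capability grouping in `build()`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StubPack:
    """Hypothetical stand-in for a loaded pack."""
    id: str
    capability: str
    configs: list[str] = field(default_factory=list)

    def compile_config(self, name: str) -> str:
        return f"{self.id}:config:{name}"

    def compile(self, level: str) -> str:
        return f"{self.id}:level:{level}"


def cells(pack: StubPack) -> list[tuple[str, str]]:
    # Same rule as _cells(): declared configs win, else the three levels.
    if pack.configs:
        return [(c, pack.compile_config(c)) for c in pack.configs]
    return [(lv, pack.compile(lv)) for lv in ("easy", "medium", "hard")]


def group_by_capability(packs: list[StubPack]) -> dict[str, list[StubPack]]:
    by_cap: dict[str, list[StubPack]] = {}
    for p in packs:
        by_cap.setdefault(p.capability, []).append(p)
    # build() emits sections in sorted capability order, packs sorted by id.
    return {cap: sorted(v, key=lambda p: p.id) for cap, v in sorted(by_cap.items())}


packs = [
    StubPack("b", "reasoning"),
    StubPack("a", "reasoning", ["solo"]),
    StubPack("c", "action"),
]
grouped = group_by_capability(packs)
print(list(grouped))
print(cells(packs[1]))
print(cells(packs[0]))
```

Declared configs take precedence, so a pack with even one config never shows the easy/medium/hard fallback.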