Add handoff ablation (recover-from-deficit / capitalize-on-advantage)
A `prefix` controller plays the first K turns of an episode, then the
model inherits the live game state and finishes it. This is a pure STATE
handoff ("take over from here"): no transcript is carried over, so it
stays orthogonal to the in-context-learning axis.
Sweeping the prefix quality decomposes a reasoning capability:
- a losing prefix (`stall`) hands the model a deficit: the recovery
test;
- a winning prefix (a replayed winning trajectory) hands it an
advantage: the capitalize test.
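The switch itself can be pictured with a minimal sketch. This is not the code in `handoff.py`: policies here are plain callables from an observation to a list of tool names, and `make_handoff`, `stall`, and `advance` are hypothetical names used only for illustration.

```python
from typing import Callable, List

# Hypothetical simplification: a policy maps an observation to tool names.
Policy = Callable[[dict], List[str]]


def make_handoff(prefix: Policy, main: Policy, k: int) -> Policy:
    """Delegate to `prefix` for the first k turns, then to `main`.

    Only the turn counter is shared; `main` never sees what `prefix` did,
    which mirrors the "pure state handoff" idea.
    """
    turn = 0

    def act(observation: dict) -> List[str]:
        nonlocal turn
        chosen = prefix if turn < k else main
        turn += 1
        return chosen(observation)

    return act


def stall(observation: dict) -> List[str]:
    # Losing prefix: do nothing every turn.
    return ["observe"]


def advance(observation: dict) -> List[str]:
    # Stand-in for an active model policy.
    return ["move_units"]


policy = make_handoff(stall, advance, k=2)
played = [policy({})[0] for _ in range(4)]
print(played)
```

With `k=0` the prefix never plays, which is how a baseline cell can reuse the same wrapper.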
Every result carries a `passivity` stat, the fraction of the model's
turns spent on `observe`/`stop` only. Under the bad-prefix deficit
that is passivity-under-pressure: a number for the freeze-and-panic
failure mode (when losing, models tend to stop/observe instead of
ordering an active retreat or redirect).
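A small sketch of how such a ratio could be computed, assuming each turn is reduced to a list of tool names (the real code inspects engine commands; `is_passive` and `passivity` below are illustrative names, not the module's API). As in the commit, a turn with no commands at all counts as passive.

```python
PASSIVE_TOOLS = {"observe", "stop"}


def is_passive(tool_names):
    """True when the turn holds nothing but observe/stop (or nothing at all)."""
    return all(name in PASSIVE_TOOLS for name in tool_names)


def passivity(turns):
    """Fraction of turns that were passive; 0.0 when no turns were played."""
    if not turns:
        return 0.0
    return sum(is_passive(t) for t in turns) / len(turns)


turns = [
    ["observe"],                # passive
    ["move_units"],             # active
    ["stop", "observe"],        # passive: only low-commitment tools
    ["observe", "move_units"],  # active: one real order is enough
]
print(passivity(turns))
```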
- handoff.py: TrajectoryController (replays a recorded run's per-turn
tool_calls from messages.json), HandoffController (k-turn prefix ->
main switch + passivity tracking), run_handoff helper.
- run_eval.py: --handoff-sweep expands each pack:level into
handoff-{base,bad,good} cells; --handoff-k, --handoff-bank.
Pinned by tests/test_handoff_ablation.py.
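The cell naming can be sketched as follows. This is an illustrative stand-alone version, assuming (as the diff suggests) that the `good` cell is only emitted when a trajectory bank is supplied; `handoff_cells` and `split_cell` are hypothetical helper names.

```python
def handoff_cells(pack_id, levels, have_bank):
    """Expand each level into handoff cell names.

    `good` needs a banked winning trajectory, so it is only listed
    when a bank is available; `base` and `bad` are always present.
    """
    kinds = ["base", "bad"] + (["good"] if have_bank else [])
    return [f"{pack_id}:{lv}:handoff-{kind}" for lv in levels for kind in kinds]


def split_cell(cell):
    """Recover (pack_id, level, kind) from a handoff cell name."""
    base, kind = cell.rsplit(":handoff-", 1)
    pack_id, _, level = base.partition(":")
    return pack_id, level, kind


cells = handoff_cells("p", ["easy", "hard"], have_bank=False)
print(cells)
print(split_cell("p:easy:handoff-good"))
```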
- CLAUDE.md +13 -0
- openra_bench/handoff.py +165 -0
- openra_bench/run_eval.py +92 -1
- tests/test_handoff_ablation.py +152 -0
@@ -228,6 +228,19 @@ A scenario is defective if any of the following hold:
   the full grid with `run_eval --perception-sweep` (expands every
   `pack:level` into `pack:level:<mode>`); the human Play tab stays
   on the canonical `vision` (fogged) modality.
+- **Handoff ablation** (`openra_bench/handoff.py`, `run_eval
+  --handoff-sweep`). A `HandoffController` lets a `prefix` controller
+  play the first K turns, then the model inherits the live game state
+  ("take over from here": a pure STATE handoff, no transcript). A
+  `stall` prefix hands the model a losing position (recovery test); a
+  replayed winning trajectory (`TrajectoryController`, sourced from a
+  `--handoff-bank` of Playback runs) hands it a winning one
+  (capitalize-on-advantage). Sweep cells are
+  `pack:level:handoff-{base,bad,good}`. Every result carries a
+  `passivity` stat (the fraction of the model's turns spent on
+  `observe`/`stop` only), the freeze-and-panic signal. A replayed
+  trajectory MUST come from the same `pack:level:seed` (engine actor
+  ids are seed-deterministic).
 - **`pbox` costs 600** (not the 400 some old specs assumed);
   defense and infantry are SEPARATE production queues so an
   efficient policy queues `build('pbox')` and `build('e1')` in
@@ -0,0 +1,165 @@
+"""Handoff ablation: hand a model a partially-played game.
+
+A handoff episode is split: a `prefix` controller plays the first `k`
+turns, then the model inherits the live game state and finishes it.
+It is a PURE STATE handoff: the model gets no transcript of the
+prefix, only the board it produced ("take over from here").
+
+Sweeping the prefix QUALITY decomposes two capabilities:
+
+* a **good** prefix (a winning trajectory): can the model *capitalize
+  on an advantage*? A flat-low outcome curve means it derails even a
+  won position.
+* a **bad** prefix (a losing trajectory, or `stall`): can the model
+  *recover from a deficit*? This is the controlled measurement of the
+  freeze-and-panic failure mode: handed a losing board, does the model
+  fight (retreat / redirect) or sit on `observe`/`stop`? The
+  `passivity` stat on the result quantifies exactly that.
+
+The prefix is a recorded run replayed turn-for-turn. Because engine
+actor ids are seed-deterministic, a replayed trajectory MUST come from
+the same `pack:level:seed` as the handoff episode.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any
+
+from .controller import BaseController, as_controller, introspection_source
+
+# A turn is "passive" when the model issued nothing but these: the
+# freeze-and-panic tell (low-commitment default instead of an active
+# retreat / redirect).
+_PASSIVE_TOOLS = {"observe", "stop"}
+
+
+def stall_policy(render_state: dict, Command: Any) -> list:
+    """The canonical losing prefix: do nothing, every turn. Synthesises
+    a guaranteed-deficit handoff with no recorded trajectory needed."""
+    return [Command.observe()]
+
+
+def _load_trajectory(source: Any) -> list[list[dict]]:
+    """Per-turn tool-call lists from a recorded run. `source` may be a
+    ready list, a Playback directory, or a path to its messages.json."""
+    if isinstance(source, list):
+        return source
+    p = Path(source)
+    if p.is_dir():
+        p = p / "messages.json"
+    msgs = json.loads(p.read_text())
+    turns: list[list[dict]] = []
+    for m in msgs:
+        if m.get("role") != "assistant":
+            continue
+        calls: list[dict] = []
+        for tc in m.get("tool_calls") or []:
+            fn = tc.get("function") or {}
+            args = fn.get("arguments")
+            if isinstance(args, str):
+                try:
+                    args = json.loads(args)
+                except (ValueError, TypeError):
+                    args = {}
+            calls.append({"name": fn.get("name"), "arguments": args or {}})
+        turns.append(calls)
+    return turns
+
+
+class TrajectoryController(BaseController):
+    """Replays a recorded run: turn N re-issues the commands the
+    recorded agent issued on its turn N. Past the recording's end it
+    falls back to `observe()`. Used as a deterministic handoff prefix:
+    a `win`-outcome run is a good prefix, a `loss` is a bad one."""
+
+    def __init__(self, source: Any, name: str | None = None) -> None:
+        super().__init__(name or "trajectory")
+        self._turns = _load_trajectory(source)
+        self._i = 0
+
+    def reset(self, ctx: Any) -> None:
+        self._i = 0
+
+    def act(self, observation: dict, Command: Any) -> list:
+        from .agent import _to_commands
+
+        if self._i < len(self._turns):
+            calls = self._turns[self._i]
+            self._i += 1
+            return _to_commands(calls, Command) or [Command.observe()]
+        return [Command.observe()]
+
+
+def _is_passive(cmds: list, _cmd_name) -> bool:
+    """A turn with no command other than observe/stop (or no command)."""
+    if not cmds:
+        return True
+    return all((_cmd_name(c) or "") in _PASSIVE_TOOLS for c in cmds)
+
+
+class HandoffController(BaseController):
+    """`prefix` plays turns 0..k-1, then `main` inherits the live state
+    and finishes the episode. Pure state handoff: `main` carries no
+    transcript of the prefix.
+
+    `handoff_stats` accumulates, over the MAIN agent's turns only:
+    `main_turns`, `passive_turns` (observe/stop-only), and `passivity`
+    (their ratio), the freeze-and-panic signal. When the prefix handed
+    `main` a losing position, `passivity` IS passivity-under-pressure."""
+
+    def __init__(
+        self, prefix: Any, main: Any, k: int, name: str | None = None
+    ) -> None:
+        super().__init__(name or f"handoff-k{int(k)}")
+        self._prefix = as_controller(prefix)
+        self._main = as_controller(main)
+        self._k = max(0, int(k))
+        self._turn = 0
+        # Playback should record the MAIN agent's transcript, not this
+        # wrapper's; expose it as the introspection source.
+        self.source = introspection_source(self._main)
+        self.handoff_stats = self._fresh_stats()
+
+    def _fresh_stats(self) -> dict:
+        return {
+            "k": self._k, "main_turns": 0,
+            "passive_turns": 0, "passivity": 0.0,
+        }
+
+    def reset(self, ctx: Any) -> None:
+        self._turn = 0
+        self._prefix.reset(ctx)
+        self._main.reset(ctx)
+        self.handoff_stats = self._fresh_stats()
+
+    def act(self, observation: dict, Command: Any) -> list:
+        if self._turn < self._k:
+            self._turn += 1
+            return self._prefix.act(observation, Command)
+        cmds = self._main.act(observation, Command)
+        self._turn += 1
+        from .eval_core import _cmd_tool_name
+
+        st = self.handoff_stats
+        st["main_turns"] += 1
+        if _is_passive(cmds, _cmd_tool_name):
+            st["passive_turns"] += 1
+        st["passivity"] = st["passive_turns"] / st["main_turns"]
+        return cmds
+
+
+def run_handoff(
+    compiled: Any, main: Any, prefix: Any, k: int,
+    seed: int = 0, playback: Any = None,
+):
+    """Run a handoff episode: `prefix` plays the first `k` turns, `main`
+    finishes. Returns the `EpisodeResult` with `.handoff_stats` attached
+    (k, main_turns, passive_turns, passivity)."""
+    from .eval_core import run_level
+
+    ctrl = HandoffController(prefix, main, k)
+    res = run_level(compiled, ctrl, seed, playback)
+    res.handoff_stats = dict(ctrl.handoff_stats)
+    return res
@@ -99,6 +99,50 @@ def _agg(scores: list) -> dict:
     }
 
 
+def _find_win_trajectory(bank: str | Path, cell: str, seed: int) -> str | None:
+    """Path to a winning run's messages.json for this cell+seed, scanned
+    from a `--handoff-bank` directory of Playback runs: the good-prefix
+    source. None when the bank holds no matching win. (Engine actor ids
+    are seed-deterministic, so the trajectory must match pack/level/seed
+    for a faithful replay.)"""
+    base = cell.rsplit(":handoff-", 1)[0]  # "pack:level"
+    pack_id, _, level = base.partition(":")
+    for mf in sorted(Path(bank).rglob("manifest.json")):
+        try:
+            m = json.loads(mf.read_text())
+        except (ValueError, OSError):
+            continue
+        if (
+            str(m.get("pack_id")) == pack_id
+            and str(m.get("level")) == level
+            and int(m.get("seed", -1)) == int(seed)
+            and str(m.get("outcome")) == "win"
+            and (mf.parent / "messages.json").exists()
+        ):
+            return str(mf.parent / "messages.json")
+    return None
+
+
+def _handoff_wrap(agent, cell: str, seed: int, k: int, bank):
+    """Wrap `agent` in a HandoffController for a `:handoff-<kind>` cell.
+    Returns (controller, note)."""
+    from .handoff import HandoffController, TrajectoryController, stall_policy
+
+    kind = cell.rsplit(":handoff-", 1)[1]
+    if kind == "bad":  # losing prefix: the recovery / freeze test
+        return HandoffController(stall_policy, agent, k), ""
+    if kind == "good":  # winning prefix: capitalize-on-advantage
+        traj = _find_win_trajectory(bank, cell, seed) if bank else None
+        if traj is None:
+            return (
+                HandoffController(stall_policy, agent, 0),
+                f"no winning trajectory in bank for seed {seed}; ran as base",
+            )
+        return HandoffController(TrajectoryController(traj), agent, k), ""
+    # base: k=0; the model plays the whole episode (baseline passivity).
+    return HandoffController(stall_policy, agent, 0), ""
+
+
 def evaluate(
     packs: list[Path],
     levels: list[str],
@@ -118,6 +162,9 @@ def evaluate(
     report_path: str | Path | None = None,
     progress=None,
     perception_sweep: bool = False,
+    handoff_sweep: bool = False,
+    handoff_k: int = 3,
+    handoff_bank: str | Path | None = None,
 ) -> dict:
     """Run packs×levels×seeds. If `held_out_seeds` is given, those are
     run too and tagged split='held_out'; the report adds
@@ -129,6 +176,14 @@ def evaluate(
     ablation cells (`pack:level:<mode>` for mode in PERCEPTION_MODES:
     vision/structured × fog/no-fog) instead of the raw 3 levels, so one
     run yields the full channel-cost / fog-cost decomposition.
+
+    `handoff_sweep` expands every pack×level into handoff cells
+    (`pack:level:handoff-{base,bad,good}`): the model plays the whole
+    episode (`base`), or inherits a losing position after a `stall`
+    prefix (`bad`: the recovery / freeze-and-panic test), or a winning
+    position replayed from a `handoff_bank` trajectory (`good`: the
+    capitalize-on-advantage test). `handoff_k` is the prefix length.
+    Each record carries a `passivity` stat (observe/stop-only fraction).
     """
     from .resilience import (
         BudgetExceeded,
@@ -220,6 +275,16 @@ def evaluate(
                     cl.fog_mode = mode
                     cl.config_name = f"{lv}:{mode}"
                     unit_iter.append((cl, f"{pack.meta.id}:{lv}:{mode}"))
+        # Handoff sweep: each level as base / bad / good handoff cells.
+        # `good` needs a winning trajectory from the bank, so it is emitted
+        # only when a bank is supplied; `base`/`bad` always run.
+        elif handoff_sweep:
+            kinds = ["base", "bad"] + (["good"] if handoff_bank else [])
+            unit_iter = [
+                (compile_level(pack, lv), f"{pack.meta.id}:{lv}:handoff-{kind}")
+                for lv in levels
+                for kind in kinds
+            ]
         # Declared configs (pack:config_name, each pins level+fog_mode)
         # supersede the raw 3-level enumeration when present.
         elif pack.configs:
@@ -258,7 +323,19 @@ def evaluate(
                             seed,
                         )
                         pb.run_id, pb.model = run_id, model
-
+                    ctrl = factory(compiled)
+                    if handoff_sweep and ":handoff-" in cell:
+                        ctrl, _hnote = _handoff_wrap(
+                            ctrl, cell, seed, handoff_k, handoff_bank
+                        )
+                    else:
+                        _hnote = ""
+                    res = run_level(compiled, ctrl, seed=seed, playback=pb)
+                    hstats = getattr(ctrl, "handoff_stats", None)
+                    if hstats is not None:
+                        hstats = dict(hstats)
+                        if _hnote:
+                            hstats["note"] = _hnote
                     sc = score_episode(compiled, res)
                     if pb is not None:
                         (pb.dir / "score.json").write_text(
@@ -292,6 +369,8 @@ def evaluate(
                         "reward_vector": res.reward_vector,
                         "turns": res.turns,
                         "notes": sc.notes,
+                        "passivity": hstats.get("passivity") if hstats else None,
+                        "handoff": hstats,
                         "_sc": sc,
                     }
 
@@ -600,6 +679,15 @@ def main(argv: list[str]) -> int:
                     help="run the 2x2 perception ablation: every "
                     "pack:level expanded into vision/structured x "
                     "fog/no-fog (pack:level:<mode>)")
+    ap.add_argument("--handoff-sweep", action="store_true",
+                    help="run the handoff ablation: each pack:level as "
+                    "handoff-base / handoff-bad (recovery) / handoff-good "
+                    "(capitalize) cells")
+    ap.add_argument("--handoff-k", type=int, default=3,
+                    help="handoff prefix length in turns (default 3)")
+    ap.add_argument("--handoff-bank", default=None,
+                    help="dir of Playback runs: source of winning "
+                    "trajectories for the handoff-good prefix")
     a = ap.parse_args(argv[1:])
 
     cfg = None
@@ -642,6 +730,9 @@ def main(argv: list[str]) -> int:
         dry_run=a.dry_run,
         report_path=a.out,
         perception_sweep=a.perception_sweep,
+        handoff_sweep=a.handoff_sweep,
+        handoff_k=a.handoff_k,
+        handoff_bank=a.handoff_bank,
         progress=lambda d, n, rec, c: print(
             f"[{d}/{n}] {rec['cell']}:{rec['split']}#{rec['seed']} "
             f"{rec['outcome']} comp={rec['composite']} "
@@ -0,0 +1,152 @@
+"""The handoff ablation: hand a model a partially-played game.
+
+A `prefix` controller plays the first K turns, then the model inherits
+the live state. A GOOD prefix (winning trajectory) tests
+capitalize-on-advantage; a BAD prefix (`stall`) tests recovery, and
+the `passivity` stat (observe/stop-only turns) quantifies the
+freeze-and-panic failure mode the recovery cell is built to expose.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+pytest.importorskip("openra_rl_training", reason="Rust env wheel not installed")
+
+from openra_bench.handoff import (HandoffController, TrajectoryController,
+                                  _load_trajectory, run_handoff, stall_policy)
+from openra_bench.scenarios import load_pack
+from openra_bench.scenarios.loader import compile_level
+
+PACKS = Path(__file__).parent.parent / "openra_bench" / "scenarios" / "packs"
+_PACK = "perception-count-the-threat.yaml"
+
+
+def _compiled(level: str = "easy"):
+    return compile_level(load_pack(PACKS / _PACK), level)
+
+
+# ── Trajectory loading / replay ──────────────────────────────────────
+
+def test_load_trajectory_list_passthrough():
+    traj = [[{"name": "observe", "arguments": {}}]]
+    assert _load_trajectory(traj) is traj
+
+
+def test_load_trajectory_from_messages_json(tmp_path):
+    msgs = [
+        {"role": "system", "content": "x"},
+        {"role": "user", "content": "turn 1"},
+        {"role": "assistant", "content": "", "tool_calls": [
+            {"id": "c0", "type": "function", "function": {
+                "name": "move_units",
+                "arguments": {"unit_ids": [1], "target_x": 5, "target_y": 5},
+            }}]},
+        {"role": "tool", "tool_call_id": "c0", "content": "ok"},
+        {"role": "user", "content": "turn 2"},
+        {"role": "assistant", "content": "", "tool_calls": [
+            {"id": "c0", "type": "function",
+             "function": {"name": "observe", "arguments": {}}}]},
+    ]
+    p = tmp_path / "messages.json"
+    p.write_text(json.dumps(msgs))
+    turns = _load_trajectory(p)
+    assert len(turns) == 2
+    assert turns[0][0]["name"] == "move_units"
+    assert turns[1][0]["name"] == "observe"
+
+
+def test_trajectory_controller_replays_then_falls_back():
+    import openra_train
+
+    C = openra_train.Command
+    tc = TrajectoryController([
+        [{"name": "observe", "arguments": {}}],
+        [{"name": "stop", "arguments": {"unit_ids": [1]}}],
+    ])
+    tc.reset(None)
+    assert "Observe" in repr(tc.act({}, C)[0])
+    assert "Stop" in repr(tc.act({}, C)[0])
+    # past the recording's end -> observe
+    assert "Observe" in repr(tc.act({}, C)[0])
+
+
+# ── Handoff switch + passivity ───────────────────────────────────────
+
+def test_handoff_switches_prefix_to_main_at_k():
+    pcalls, mcalls = [], []
+
+    def prefix(rs, C):
+        pcalls.append(1)
+        return [C.observe()]
+
+    def main(rs, C):
+        mcalls.append(1)
+        return [C.observe()]
+
+    res = run_handoff(_compiled("easy"), main=main, prefix=prefix, k=3, seed=1)
+    assert len(pcalls) == 3, "prefix must play exactly k turns"
+    assert len(mcalls) == res.turns - 3, "main plays the remainder"
+    assert res.handoff_stats["k"] == 3
+    assert res.handoff_stats["main_turns"] == len(mcalls)
+
+
+def test_passivity_is_one_when_main_freezes():
+    """A main that only ever observes scores passivity 1.0, the
+    freeze-and-panic signal; an active policy scores low."""
+    from openra_bench.eval_core import scripted_explore_agent
+
+    frozen = run_handoff(
+        _compiled("medium"), main=stall_policy, prefix=stall_policy,
+        k=2, seed=1,
+    )
+    assert frozen.handoff_stats["passivity"] == 1.0
+
+    active = run_handoff(
+        _compiled("medium"), main=scripted_explore_agent,
+        prefix=stall_policy, k=2, seed=1,
+    )
+    assert active.handoff_stats["passivity"] < 0.5
+
+
+def test_k0_handoff_is_a_full_episode():
+    from openra_bench.eval_core import scripted_explore_agent
+
+    res = run_handoff(
+        _compiled("easy"), main=scripted_explore_agent,
+        prefix=stall_policy, k=0, seed=1,
+    )
+    assert res.handoff_stats["main_turns"] == res.turns
+
+
+# ── Sweep wiring ─────────────────────────────────────────────────────
+
+def test_handoff_sweep_expands_base_and_bad_cells():
+    from openra_bench.run_eval import evaluate
+
+    out = evaluate(
+        [PACKS / _PACK], levels=["easy"], seeds=[1],
+        handoff_sweep=True, dry_run=True,
+    )
+    assert set(out["cells"]) == {
+        "perception-count-the-threat:easy:handoff-base",
+        "perception-count-the-threat:easy:handoff-bad",
+    }
+
+
+def test_find_win_trajectory_matches_a_banked_win(tmp_path):
+    from openra_bench.run_eval import _find_win_trajectory
+
+    d = tmp_path / "run" / "p__seed1"
+    d.mkdir(parents=True)
+    (d / "manifest.json").write_text(json.dumps(
+        {"pack_id": "p", "level": "easy", "seed": 1, "outcome": "win"}))
+    (d / "messages.json").write_text("[]")
+    assert _find_win_trajectory(
+        tmp_path, "p:easy:handoff-good", 1
+    ) == str(d / "messages.json")
+    # a different seed / a loss is not matched
+    assert _find_win_trajectory(tmp_path, "p:easy:handoff-good", 2) is None