run_eval: add --bedrock-region flag and route bedrock provider config a3ba9ba Xiaochuang Yuan committed on May 23
Bucket D: discover archived packs + load adversarial-siege from _archive 0f4480c Xiaochuang Yuan committed on May 23
Quality drive: schema fix, 5 new/revised packs, 4 engine tests, scenario audit 6d71d3b yxc20098 committed on May 23
Add handoff ablation (recover-from-deficit / capitalize-on-advantage) cb15568 yxc20098 committed on May 22
feat(providers): add together.ai preset (Qwen3.6-Plus and friends) 90c4b50 yxc20098 committed on May 20
action-multiunit-coordination hard: spatial-grounding via relative-direction objective 51d66ad yxc20098 committed on May 19
Scenario configs (level+fog per cell) + adversarial-duel duel/interrupts + clearer names f244b78 yxc20098 committed on May 19
run_eval: --or-provider (pin OpenRouter provider/quant, no fallback) + --fog-mode CLI cf788d9 yxc20098 committed on May 19
Structured-fog text mode, premium routing, codex descriptions, minimap colour-by-difficulty 93ee9dd yxc20098 committed on May 19
Wire bench to vendored training prompt v2 (system/briefing/minimap) 8e88074 yxc20098 committed on May 19
Training-parity minimap (real terrain + legend) + viewer (system/thinking/debrief) 39fba02 yxc20098 committed on May 18
Live-smoke fixes: tool-call wire 400, episode resilience, real PNG minimap 247ff7a yxc20098 committed on May 18
Deterministic scenario-scoped game knowledge + explicit objective 049448a yxc20098 committed on May 18
Generalization-gap metric: held-out split in run_eval + leaderboard 03e4efa yxc20098 committed on May 17