Semantic run names: Probe/Drift/Anchor/Restrain/Champion + regen all plots 84fbeda Anurag Agarwal committed 16 days ago
Run 6 results + training fixes + all plots regenerated aae07d0 Anurag Agarwal committed 16 days ago
plots: add training progression + diagnostics, drop W&B links 099bec8 verified agarwalanu3103 committed 16 days ago
eval: enforce one-tool-call response format on every turn a22fcfd verified agarwalanu3103 committed 17 days ago
feat: add run_eval.py to Space (needed by eval_with_vllm.py for trained-model evals) 6473a24 verified agarwalanu3103 committed 17 days ago