Commit History

env_client: retry transient 5xx + typed EnvUnavailableError; trainer survives env outages
839b00f

Viraj committed on

action_parser: strict variant strips Qwen3 <think>...</think> before json.loads
4c2ea83

Viraj committed on

publish: auto push adapter (and optional merged) to HF Hub after training
92fd4b1

Viraj committed on

Switch base model to Qwen/Qwen3-8B (Qwen3.5-9B is multimodal Qwen3_5ForConditionalGeneration, unsupported by unsloth)
bb9a838

Viraj committed on

Fix training pipeline: TRL>=0.25 rollout, real generation, curriculum/W&B callbacks
8af730f

Viraj committed on

Add episode replay dashboard and GRPO training pipeline
509ef18

Navaneeth Sharma committed on

Update Red/Blue showdown behavior and refresh Qwen benchmark artifacts.
f4ce885

Viraj committed on

refactor: enhance type safety in inference and evaluation scripts; update pyright config to exclude specific directories
2780361

Viraj committed on

refactor: enhance type safety in inference and evaluation scripts; update pyright config to exclude specific directories
e40ec5e

Viraj committed on

feat: improve blue showdown runtime context
c4730ff

Viraj committed on

feat: add red reasoning metadata
6947df8

Viraj committed on

feat: harden environment integration
ef2c8af

Viraj committed on

feat: add dense red reward scoring
027331e

Viraj committed on

feat: add blue defender curriculum
a0cce81

Viraj committed on

feat: add raw bash red agent access
5d7586e

Viraj committed on

feat: port WarGames phase 0 environment
20eb0ca

Viraj committed on