env_client: retry transient 5xx + typed EnvUnavailableError; trainer survives env outages 839b00f Viraj committed on Apr 26
action_parser: strict variant strips Qwen3 <think>...</think> before json.loads 4c2ea83 Viraj committed on Apr 26
publish: auto push adapter (and optional merged) to HF Hub after training 92fd4b1 Viraj committed on Apr 25
Switch base model to Qwen/Qwen3-8B (Qwen3.5-9B is multimodal Qwen3_5ForConditionalGeneration, unsupported by unsloth) bb9a838 Viraj committed on Apr 25
Fix training pipeline: TRL>=0.25 rollout, real generation, curriculum/W&B callbacks 8af730f Viraj committed on Apr 25
Update Red/Blue showdown behavior and refresh Qwen benchmark artifacts. f4ce885 Viraj committed on Apr 25
refactor: enhance type safety in inference and evaluation scripts; update pyright config to exclude specific directories 2780361 Viraj committed on Apr 25
refactor: enhance type safety in inference and evaluation scripts; update pyright config to exclude specific directories e40ec5e Viraj committed on Apr 25
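The strict action_parser change in 4c2ea83 (drop Qwen3 <think>...</think> reasoning, then call json.loads) might look roughly like the sketch below. It is an illustration only: the function name parse_action_strict and the "must be a JSON object" check are hypothetical and are not taken from the repository.

```python
import json
import re

# Non-greedy and DOTALL so several multi-line <think> blocks are each removed.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_action_strict(text: str) -> dict:
    """Hypothetical strict parser: strip <think> blocks, then require a JSON object."""
    cleaned = _THINK_RE.sub("", text).strip()
    obj = json.loads(cleaned)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object for the action")
    return obj
```

Because this sketch is strict, anything left over after stripping (prose, a trailing sentence) makes json.loads fail rather than being guessed at.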
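The env_client change in 839b00f (retry transient 5xx responses, then raise a typed EnvUnavailableError) can be sketched as follows. Only the name EnvUnavailableError comes from the commit message. The send callable, call_with_retry, max_attempts and the exponential backoff are assumptions chosen for illustration, not the project's actual client.

```python
import time
from typing import Callable, Optional, Tuple


class EnvUnavailableError(RuntimeError):
    """Raised when the environment server still returns 5xx after every retry."""


def call_with_retry(
    send: Callable[[], Tuple[int, str]],  # hypothetical: returns (status_code, body)
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    last_status: Optional[int] = None
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            # Only 5xx is treated as transient; other statuses go back to the caller.
            return body
        last_status = status
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff between tries
    raise EnvUnavailableError(
        f"env returned {last_status} after {max_attempts} attempts"
    )
```

A trainer loop could catch EnvUnavailableError around each environment call and skip or defer that rollout, which is one way to read "trainer survives env outages".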