feat: add rollout observability — JSONL logs + per-step action/reward printing b0b9657 qtzx06 committed on Mar 8
fix: drop vLLM colocate (version conflict), use native GRPO generation eafdbce qtzx06 committed on Mar 8
feat: rewrite training to use TRL rollout_func + OpenEnv multi-turn pattern 93f58fd qtzx06 committed on Mar 8
feat: add --mode infer for Qwen inference test, default to Qwen3.5-9B 0b5e8b0 qtzx06 committed on Mar 8