0x960 / train

Commit History

feat: finalize swarm tooling and submission artifacts
eac9d9f

qtzx06 committed on

feat: add rollout observability — JSONL logs + per-step action/reward printing
b0b9657

qtzx06 committed on

fix: load 4-bit model manually before passing to GRPOTrainer
7cf0d25

qtzx06 committed on

fix: use QLoRA (4-bit + LoRA) for Qwen3.5-9B training
93a63c7

qtzx06 committed on

fix: add LoRA + gradient checkpointing to fit 9B on single H100
e83a908

qtzx06 committed on

fix: drop vLLM colocate (version conflict), use native GRPO generation
eafdbce

qtzx06 committed on

feat: rewrite training to use TRL rollout_func + OpenEnv multi-turn pattern
93f58fd

qtzx06 committed on

feat: add --mode infer for Qwen inference test, default to Qwen3.5-9B
0b5e8b0

qtzx06 committed on

feat: fix openenv 0.2.1 API, add deployment files and GRPO training
ea3bbb3

qtzx06 committed on

feat: add openenv wrapper and training stub
eb29dc8

qtzx06 committed on

chore: add project scaffolding
165e54c

qtzx06 committed on