optigami_ / training

Commit History

Fix dashboard logging URL to use proxy path, force Docker rebuild
a2c523c

sissississi Claude Opus 4.6 committed on

Fix MAX_SEQ_LENGTH: 1024 was too small for prompt+completion, bump to 2048
9b2abc6

sissississi Claude Opus 4.6 committed on

Fix Qwen3 thinking mode: add /no_think, increase max_completion_length
9a9721a

sissississi Claude Opus 4.6 committed on

Fix API response structure: done/reward are top-level, not in observation
444b086

sissississi Claude Opus 4.6 committed on

Hardcode task definitions in notebook to avoid /tasks API dependency
e19247c

sissississi Claude Opus 4.6 committed on

Redesign frontend as training dashboard + add live activity feed
d662461

sissississi Claude Opus 4.6 committed on

Route rewards through OpenEnv API instead of local computation
c0cedb4

sissississi Claude Opus 4.6 committed on

Fix GRPO: remove SFT, multi-task dataset, instruct model only
490094b

sissississi Claude Opus 4.6 committed on

Fix training: add SFT warmup + switch to instruct model
4edf79e

sissississi Claude Opus 4.6 committed on

Update training notebook: vLLM fast inference, Qwen3-4B, max_steps=300
4859185

sissississi Claude Opus 4.6 committed on

Add GRPO training Colab notebook
c228f1f

sissississi Claude Opus 4.6 committed on

Add RL training environment with OpenEnv backend
bc52096

sissississi Claude Opus 4.6 committed on