openenv-community / optigami_ / training

Commit History

Fix dashboard logging URL to use proxy path, force Docker rebuild

a2c523c

sissississi Claude Opus 4.6 committed on Mar 8

Fix MAX_SEQ_LENGTH: 1024 was too small for prompt+completion, bump to 2048

9b2abc6

sissississi Claude Opus 4.6 committed on Mar 8

Fix Qwen3 thinking mode: add /no_think, increase max_completion_length

9a9721a

sissississi Claude Opus 4.6 committed on Mar 8

Fix API response structure: done/reward are top-level, not in observation

444b086

sissississi Claude Opus 4.6 committed on Mar 8

Hardcode task definitions in notebook to avoid /tasks API dependency

e19247c

sissississi Claude Opus 4.6 committed on Mar 8

Redesign frontend as training dashboard + add live activity feed

d662461

sissississi Claude Opus 4.6 committed on Mar 8

Route rewards through OpenEnv API instead of local computation

c0cedb4

sissississi Claude Opus 4.6 committed on Mar 8

Fix GRPO: remove SFT, multi-task dataset, instruct model only

490094b

sissississi Claude Opus 4.6 committed on Mar 8

Fix training: add SFT warmup + switch to instruct model

4edf79e

sissississi Claude Opus 4.6 committed on Mar 8

Update training notebook: vLLM fast inference, Qwen3-4B, max_steps=300

4859185

sissississi Claude Opus 4.6 committed on Mar 8

Add GRPO training Colab notebook

c228f1f

sissississi Claude Opus 4.6 committed on Mar 8

Add RL training environment with OpenEnv backend

bc52096

sissississi Claude Opus 4.6 committed on Mar 8
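
Commit 444b086 notes that done and reward are top-level fields of the API response rather than fields of the observation. The commit diff is not shown here, so the following is only a minimal sketch of that idea. The function name parse_step_response, the example payload, and the default values for missing fields are all assumptions made for illustration, not the notebook's actual code.

```python
from typing import Any, Mapping, Tuple


def parse_step_response(payload: Mapping[str, Any]) -> Tuple[Any, float, bool]:
    """Split a step response into (observation, reward, done).

    Assumed layout (hypothetical): "observation", "reward" and "done" are
    sibling keys at the top level of the response. Values nested inside
    "observation" are deliberately not consulted.
    """
    observation = payload.get("observation", {})
    reward_raw = payload.get("reward")
    # Assumption: a missing or null reward is treated as 0.0.
    reward = float(reward_raw) if reward_raw is not None else 0.0
    # Assumption: a missing done flag means the episode is still running.
    done = bool(payload.get("done", False))
    return observation, reward, done


# Hypothetical payload, used only to exercise the function.
example = {"observation": {"text": "hypothetical state"}, "reward": 0.5, "done": True}
obs, reward, done = parse_step_response(example)
```

Reading reward and done from inside the observation would silently fall back to the defaults under this layout, which is the kind of mismatch the commit message describes.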
