Spaces: openenv-community/test-local-nested-envs

Commit History

Claude committed on Mar 8


Pre-format SFT dataset as text column, drop formatting_func 384df8f unverified

Fix pickling error in SFT formatting_func closure 20a8ae9 unverified

Fix SFT formatting_func to return list of strings 591804f unverified

Fix SFT: set completion_only_loss=False for formatting_func compat 44f7f8c unverified

Fix SFT warm start: add formatting_func for Unsloth SFTTrainer b8e7dcd unverified

Cap prompt generation at 512 tokens and add version print ee71a24 unverified

Add SFT warm start before GRPO and DB connectivity init check c2dc160 unverified

Add local model inference backend for Layer 2 10418d0 unverified

Increase max completion length from 512 to 2048 552e492 unverified

Make Supabase uploads incremental — upload after every step 76f180f unverified

Add Supabase upload for training results (Storage + DB) 28bcb40 unverified

Add raw training summary output and adjust training scale 71b0977 unverified

Improve reward function to break refuse-everything local minimum and scale training bd8220a unverified

Add volume verification, fsync, and stdout fallback for training outputs f703ff1 unverified

Clean up dead code, unused imports, and move hardcoded values to config.yaml 3dc48b7 unverified

Add --llm-agent and other legacy CLI flags for backwards compatibility 03d9529 unverified

Centralize all training params in config.yaml (single source of truth) 4e2b74e unverified

Remove mock mode: only real GRPO RL training remains 288d9a2 unverified

Add clear training progress logging with technical + domain names 4b89b89 unverified

Update docstrings to reflect LLM-only training pipeline 01518e0 unverified

Align GRPOConfig defaults with CLI: 10 steps, 7 episodes ca36c02 unverified

Remove all rule-based fallback systems, require LLM inference 21da591 unverified

Fix hardcoded values in report: dynamic customer count and eval episode labels 7ed3d6b unverified

Reduce training defaults for fast iteration: steps=10, episodes=7 b1d7ca2 unverified

Add training report & logging system with reward charts and conversation comparisons 506d641 unverified

Wire up real LLM integration via HF Inference API 4ac72af unverified

Fix critical gaps: prompt-sensitive agent, adversarial customers, executable GRPO, OpenEnv wrapper b259333 unverified

Implement self-improving AI oversight system with nested RL environments e6b0e2f unverified
