Spaces: openenv-community/test-local-nested-envs

Running on T4

Claude committed 3 days ago

Commit ca36c02 (unverified) · 1 parent: 21da591

Align GRPOConfig defaults with CLI: 10 steps, 7 episodes

The dataclass defaults were still 50 steps / 10 episodes, causing
deployments that don't pass explicit CLI args to run much longer
than intended.

https://claude.ai/code/session_01DPirJ78YYN4fJUvUFJ5D6V
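
To make "much longer than intended" concrete, the rough sketch below compares the episode budget implied by the old and new defaults. It assumes that every training step evaluates all N candidates for K episodes each, which is what the field comments in GRPOConfig suggest but which this commit does not confirm. `RunBudget` and `total_episodes` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    """Hypothetical helper; only the three field names mirror GRPOConfig."""
    num_candidates: int
    episodes_per_candidate: int
    num_training_steps: int

    def total_episodes(self) -> int:
        # Assumption: each step runs N candidates x K episodes, nothing else.
        return (self.num_training_steps
                * self.num_candidates
                * self.episodes_per_candidate)


# Defaults before this commit: 50 steps, 10 episodes per candidate.
old = RunBudget(num_candidates=4, episodes_per_candidate=10, num_training_steps=50)
# Defaults after this commit: 10 steps, 7 episodes per candidate.
new = RunBudget(num_candidates=4, episodes_per_candidate=7, num_training_steps=10)

print(old.total_episodes())  # 2000
print(new.total_episodes())  # 280
```

Under that assumption, a deployment that relied on the old dataclass defaults would have run roughly seven times as many episodes as one using the CLI defaults.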

Files changed (2)

layer1/grpo_trainer.py +2 -2
layer1/train.py +1 -1

layer1/grpo_trainer.py CHANGED

@@ -37,8 +37,8 @@ class GRPOConfig:
     # GRPO
     num_candidates: int = 4         # N candidate prompts per step
-    episodes_per_candidate: int = 10  # K episodes to evaluate each candidate
-    num_training_steps: int = 50
+    episodes_per_candidate: int = 7   # K episodes to evaluate each candidate
+    num_training_steps: int = 10
     learning_rate: float = 5e-5
     max_prompt_length: int = 512
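
One possible way to keep the two sets of defaults from drifting apart again is to have the CLI read its defaults from the dataclass instead of repeating the numbers. The sketch below is not the parser in layer1/train.py. `build_parser` and `config_from_args` are hypothetical, the default for `--mode` is invented, and mapping `--episodes` onto `episodes_per_candidate` is an assumption. The `GRPOConfig` here contains only the fields visible in the diff.

```python
import argparse
from dataclasses import dataclass


@dataclass
class GRPOConfig:
    # Only the fields shown in the diff; the real class has more.
    num_candidates: int = 4
    episodes_per_candidate: int = 7
    num_training_steps: int = 10
    learning_rate: float = 5e-5
    max_prompt_length: int = 512


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser whose defaults come from GRPOConfig."""
    defaults = GRPOConfig()  # single source of truth for default values
    parser = argparse.ArgumentParser(prog="layer1.train")
    # The "train" default is an assumption made for this sketch.
    parser.add_argument("--mode", choices=["train", "mock"], default="train")
    parser.add_argument("--steps", type=int,
                        default=defaults.num_training_steps)
    # Assumption: --episodes corresponds to episodes_per_candidate.
    parser.add_argument("--episodes", type=int,
                        default=defaults.episodes_per_candidate)
    return parser


def config_from_args(argv=None) -> GRPOConfig:
    """Build a config from CLI arguments, falling back to dataclass defaults."""
    args = build_parser().parse_args(argv)
    return GRPOConfig(num_training_steps=args.steps,
                      episodes_per_candidate=args.episodes)
```

With this arrangement, changing a default in `GRPOConfig` would change the CLI default as well, so a deployment that passes no arguments and one that constructs `GRPOConfig()` directly would see the same values.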

layer1/train.py CHANGED

@@ -3,7 +3,7 @@ Layer 1 — Executable GRPO training script.
 Usage:
     # Full GPU training (requires Colab/GPU + train deps)
-    python -m layer1.train --mode train --steps 50
+    python -m layer1.train --mode train --steps 10
     # CPU mock optimization (evaluates hand-written prompts)
     python -m layer1.train --mode mock --episodes 20