Phase M Stage A: --max-steps 100, fix Track-IO wiring
Stage A is a 100-step test run to validate the GRPO pipeline + Track-IO
logging before the production 200-step Stage B run. The previous 200-step
attempt was cancelled because (a) Track-IO was completely silent — the
trainer never streamed metrics anywhere, so we'd be training blind; and
(b) 200 × ~80s/step would have exceeded the 4h job timeout.
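The timeout arithmetic in (b) can be checked with a small sketch. It assumes the ~80 s/step figure quoted above holds for the whole run and counts training steps only; the function name is illustrative and not part of the repo.

```python
def estimated_hours(steps: int, seconds_per_step: float = 80.0) -> float:
    """Estimate wall-clock training time in hours (training steps only)."""
    return steps * seconds_per_step / 3600.0


JOB_TIMEOUT_HOURS = 4.0  # timeout of the cancelled job, per this message

for steps in (200, 100):
    hours = estimated_hours(steps)
    verdict = "exceeds" if hours > JOB_TIMEOUT_HOURS else "fits within"
    print(f"{steps} steps: ~{hours:.2f} h, {verdict} the {JOB_TIMEOUT_HOURS:.0f} h timeout")
```

Under that assumption, 200 steps come to roughly 4.4 h, while 100 steps come to roughly 2.2 h.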
Root causes for silent Track-IO:
1. grpo_train.py never set report_to in GRPOConfig, so TRL/transformers
defaulted to "none" and the built-in TrackioCallback was never registered.
2. The wrapper set env var TRACKIO_SPACE_NAME, but trackio reads
TRACKIO_SPACE_ID. The package code in trackio/utils.py only looks at
TRACKIO_SPACE_ID — TRACKIO_SPACE_NAME was a fabricated name.
3. The Track-IO Space yashash045/dpg-trackio existed but was an empty
gradio shell (NO_APP_FILE, only README + .gitattributes). Even if 1+2
were fixed, the Space had no app to render the dashboard.
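Root cause 2 is the kind of mistake a preflight check could catch before a job starts. The sketch below is a hypothetical helper, not code from this repo; it relies only on the variable names given above and does not call trackio.

```python
import os


def check_trackio_env(env=None):
    """Return a list of problems with the Track-IO environment variables."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("TRACKIO_SPACE_ID"):
        problems.append("TRACKIO_SPACE_ID is not set")
    if "TRACKIO_SPACE_NAME" in env:
        # Per root cause 2, trackio does not read this variable.
        problems.append("TRACKIO_SPACE_NAME is set but trackio ignores it")
    return problems


# The wrapper's old configuration would have been flagged twice.
print(check_trackio_env({"TRACKIO_SPACE_NAME": "yashash045/dpg-trackio"}))
```

Calling it at the top of the job script and aborting on a non-empty result would likely have surfaced the misnamed variable before any GPU time was spent.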
Fixes:
- training/grpo_train.py: add report_to=["trackio"] to GRPOConfig
(1-line addition; transformers' built-in trackio integration handles
the rest via TrackioCallback).
- scripts/phase_m_grpo_job.py:
  - Rename TRACKIO_SPACE_NAME -> TRACKIO_SPACE_ID (the actual env var
    trackio.init() reads)
  - Add TRACKIO_PROJECT="devops-pipeline-gym-grpo" so trackio.init has
    a project name without falling back to default.
  - Reduce --max-steps 200 -> 100 for Stage A test run.
  - Update [5/7] print to say "100 steps (Stage A test run)" for
    log clarity.
- Track-IO Space yashash045/dpg-trackio: deleted (empty NO_APP_FILE
placeholder). trackio.init() will recreate it via
trackio.deploy.create_space_if_not_exists on the trainer's first
metric push, with the proper SDK + app + storage.
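The job-script side of these fixes can be sketched as a helper that returns the command and environment for the training subprocess. The helper name and the example sft_path are hypothetical, and only a subset of the real flags is shown; the environment values are the ones listed above.

```python
import os
import sys


def build_grpo_invocation(max_steps, sft_path, base_env=None):
    """Return (argv, env) for the GRPO training subprocess."""
    argv = [
        sys.executable, "training/grpo_train.py",
        "--sft-adapter-path", f"{sft_path}/final",
        "--max-steps", str(max_steps),
    ]
    env = {
        # Inherit the parent environment, then add the Track-IO settings.
        **(os.environ if base_env is None else base_env),
        "TRACKIO_SPACE_ID": "yashash045/dpg-trackio",
        "TRACKIO_PROJECT": "devops-pipeline-gym-grpo",
    }
    return argv, env


# Stage A: 100 steps. The sft_path here is a placeholder.
argv, env = build_grpo_invocation(max_steps=100, sft_path="/tmp/sft")
print(argv[-2:], env["TRACKIO_PROJECT"])
```

With a helper like this, moving to Stage B would be a one-argument change (max_steps=200) rather than an edit to the argument list.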
Stage B (the production 200-step run) reverts --max-steps to 200 and
extends the job --timeout from 4h to 6h. That's a separate commit later
tonight after Stage A confirms metrics are flowing.
Cost target Stage A: ~$0.25 of $29.76 remaining.
- scripts/phase_m_grpo_job.py +4 -3
- training/grpo_train.py +1 -0
--- a/scripts/phase_m_grpo_job.py
+++ b/scripts/phase_m_grpo_job.py
@@ -146,14 +146,14 @@ if not health_ok:
 
 try:
     # Step 5: Run GRPO training
-    print("[5/7] Running GRPO
+    print("[5/7] Running GRPO 100 steps (Stage A test run; Qwen3-1.7B + SFT adapter)...", flush=True)
     subprocess.run(
         [
             sys.executable, "training/grpo_train.py",
             "--model", "unsloth/Qwen3-1.7B-bnb-4bit",
             "--sft-adapter-path", f"{sft_path}/final",
             "--env-url", "http://localhost:8000",
-            "--max-steps", "
+            "--max-steps", "100",
             "--batch-size", "4",
             "--num-generations", "8",
             "--learning-rate", "5e-6",
@@ -162,7 +162,8 @@ try:
         check=True,
         env={
             **os.environ,
-            "
+            "TRACKIO_SPACE_ID": "yashash045/dpg-trackio",
+            "TRACKIO_PROJECT": "devops-pipeline-gym-grpo",
         },
     )
 finally:
--- a/training/grpo_train.py
+++ b/training/grpo_train.py
@@ -625,6 +625,7 @@ def main():
         max_steps=args.max_steps,
         beta=0.01,
         push_to_hub=False,
+        report_to=["trackio"],
     )
     # Newer TRL versions support loss_type + mask_truncated_completions.
     # Pass them best-effort; GRPOConfig raises TypeError on unknown kwargs.
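The trailing comment in the grpo_train.py hunk describes passing newer kwargs on a best-effort basis. The sketch below shows one way that pattern can work. StandInConfig is a hypothetical stand-in for an older GRPOConfig so the example runs without TRL, the "dr_grpo" value is only an illustrative placeholder, and the actual code in grpo_train.py may differ.

```python
from dataclasses import dataclass


@dataclass
class StandInConfig:
    """Hypothetical stand-in for an older GRPOConfig without the newer fields."""
    max_steps: int
    beta: float = 0.01


def build_config(config_cls, base_kwargs, optional_kwargs):
    """Try all kwargs first; on TypeError, retry with only the base kwargs."""
    try:
        return config_cls(**base_kwargs, **optional_kwargs)
    except TypeError:
        # Unknown keyword arguments: fall back to the base set.
        return config_cls(**base_kwargs)


config = build_config(
    StandInConfig,
    {"max_steps": 100},
    {"loss_type": "dr_grpo", "mask_truncated_completions": True},  # illustrative values
)
print(config)
```

One caveat of this pattern is that a TypeError raised for some other reason is also swallowed by the fallback, so logging which path was taken is probably worthwhile.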