yashash04 committed on
Commit 4940a15 · 1 Parent(s): 3cf8472

Phase M Stage A: --max-steps 100, fix Track-IO wiring

Stage A is a 100-step test run to validate the GRPO pipeline + Track-IO
logging before the production 200-step Stage B run. The previous 200-step
attempt was cancelled because (a) Track-IO was completely silent — the
trainer never streamed metrics anywhere, so we'd be training blind; and
(b) 200 × ~80s/step (about 4.4h) would have exceeded the 4h job timeout.
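
The timeout arithmetic behind (b) can be checked with a short sketch. It assumes the ~80s/step figure quoted above holds for every step and ignores setup time (model load, adapter load), so a real run needs extra headroom. The function names are illustrative and do not appear in the repo.

```python
def estimated_hours(steps: int, seconds_per_step: float = 80.0) -> float:
    """Wall-clock training time in hours, assuming a constant step time."""
    return steps * seconds_per_step / 3600.0


def fits(steps: int, timeout_hours: float, seconds_per_step: float = 80.0) -> bool:
    """True if the estimated training time stays under the job timeout."""
    return estimated_hours(steps, seconds_per_step) < timeout_hours


if __name__ == "__main__":
    # (steps, job timeout in hours): cancelled run, Stage A, planned Stage B
    for steps, timeout in [(200, 4.0), (100, 4.0), (200, 6.0)]:
        print(f"{steps} steps: ~{estimated_hours(steps):.2f}h, "
              f"fits {timeout:.0f}h timeout: {fits(steps, timeout)}")
```

Under these assumptions, 200 steps come to roughly 4.4h (over the 4h limit), 100 steps to about 2.2h, and 200 steps fit within a 6h timeout.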

Root causes for silent Track-IO:
1. grpo_train.py never set report_to in GRPOConfig, so TRL/transformers
   defaulted to "none" and the built-in TrackioCallback was never registered.
2. The wrapper set the env var TRACKIO_SPACE_NAME, but the package code in
   trackio/utils.py only reads TRACKIO_SPACE_ID; TRACKIO_SPACE_NAME was a
   fabricated name.
3. The Track-IO Space yashash045/dpg-trackio existed but was an empty
   gradio shell (NO_APP_FILE, only README + .gitattributes). Even if causes
   1 and 2 were fixed, the Space had no app to render the dashboard.
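
Root causes 1 and 2 are the kind of misconfiguration a preflight check could catch before a job starts. The sketch below is a hypothetical helper, not part of this commit; it only encodes what the list above states (report_to must include "trackio", and the env var that is read is TRACKIO_SPACE_ID). Cause 3 needs a look at the Space itself and is not covered.

```python
from typing import Mapping, Sequence, Union


def trackio_preflight(
    env: Mapping[str, str],
    report_to: Union[str, Sequence[str]],
) -> list[str]:
    """Return the problems that would leave Track-IO silent (empty list = OK)."""
    # report_to may be a single string such as "none" or a list of targets.
    targets = [report_to] if isinstance(report_to, str) else list(report_to)

    problems = []
    if "trackio" not in targets:
        problems.append('report_to does not include "trackio"')
    if "TRACKIO_SPACE_ID" not in env:
        problems.append("TRACKIO_SPACE_ID is not set")
    if "TRACKIO_SPACE_NAME" in env:
        # Per the commit message, this name is not read by trackio.
        problems.append("TRACKIO_SPACE_NAME is set but ignored")
    return problems


if __name__ == "__main__":
    # Configuration of the cancelled run, as described above.
    for problem in trackio_preflight(
        {"TRACKIO_SPACE_NAME": "yashash045/dpg-trackio"}, "none"
    ):
        print("preflight:", problem)
```

Calling it with the fixed configuration (TRACKIO_SPACE_ID set, report_to=["trackio"]) returns an empty list.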

Fixes:
- training/grpo_train.py: add report_to=["trackio"] to GRPOConfig
  (1-line addition; transformers' built-in trackio integration handles
  the rest via TrackioCallback).
- scripts/phase_m_grpo_job.py:
  - Rename TRACKIO_SPACE_NAME -> TRACKIO_SPACE_ID (the actual env var
    trackio.init() reads).
  - Add TRACKIO_PROJECT="devops-pipeline-gym-grpo" so trackio.init() has
    a project name without falling back to default.
  - Reduce --max-steps 200 -> 100 for Stage A test run.
  - Update [5/7] print to say "100 steps (Stage A test run)" for
    log clarity.
- Track-IO Space yashash045/dpg-trackio: deleted (empty NO_APP_FILE
  placeholder). trackio.init() will recreate it via
  trackio.deploy.create_space_if_not_exists on the trainer's first
  metric push, with the proper SDK + app + storage.

Stage B (the production 200-step run) reverts --max-steps to 200 and
extends the job --timeout from 4h to 6h. That's a separate commit later
tonight after Stage A confirms metrics are flowing.
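
Since Stage A and Stage B differ only in step count and job timeout, the two settings could be kept together rather than edited by hand between commits. This is a sketch of one possible layout, not what the commit does (the commit hard-codes "100"); the Stage dataclass, STAGES table, and max_steps_args helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    label: str
    max_steps: int
    timeout_hours: int


# Values taken from the commit message: Stage A = 100 steps under the 4h
# timeout, Stage B = 200 steps with the timeout extended to 6h.
STAGES = {
    "A": Stage("Stage A test run", max_steps=100, timeout_hours=4),
    "B": Stage("Stage B production run", max_steps=200, timeout_hours=6),
}


def max_steps_args(stage_key: str) -> list[str]:
    """CLI fragment for grpo_train.py matching the chosen stage."""
    return ["--max-steps", str(STAGES[stage_key].max_steps)]


if __name__ == "__main__":
    stage = STAGES["A"]
    print(f"[5/7] Running GRPO {stage.max_steps} steps ({stage.label})...")
    print(max_steps_args("A"))
```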

Cost target Stage A: ~$0.25 of $29.76 remaining.

scripts/phase_m_grpo_job.py CHANGED
@@ -146,14 +146,14 @@ if not health_ok:
 
 try:
     # Step 5: Run GRPO training
-    print("[5/7] Running GRPO 200 steps (Qwen3-1.7B + SFT adapter)...", flush=True)
+    print("[5/7] Running GRPO 100 steps (Stage A test run; Qwen3-1.7B + SFT adapter)...", flush=True)
     subprocess.run(
         [
             sys.executable, "training/grpo_train.py",
             "--model", "unsloth/Qwen3-1.7B-bnb-4bit",
             "--sft-adapter-path", f"{sft_path}/final",
             "--env-url", "http://localhost:8000",
-            "--max-steps", "200",
+            "--max-steps", "100",
             "--batch-size", "4",
             "--num-generations", "8",
             "--learning-rate", "5e-6",
@@ -162,7 +162,8 @@ try:
         check=True,
         env={
             **os.environ,
-            "TRACKIO_SPACE_NAME": "yashash045/dpg-trackio",
+            "TRACKIO_SPACE_ID": "yashash045/dpg-trackio",
+            "TRACKIO_PROJECT": "devops-pipeline-gym-grpo",
         },
     )
 finally:
training/grpo_train.py CHANGED
@@ -625,6 +625,7 @@ def main():
         max_steps=args.max_steps,
         beta=0.01,
         push_to_hub=False,
+        report_to=["trackio"],
     )
     # Newer TRL versions support loss_type + mask_truncated_completions.
     # Pass them best-effort; GRPOConfig raises TypeError on unknown kwargs.