ml-intern

Aksel Joonas Reedi committed on Apr 27

Commit 4b76ae8 (unverified) · 1 parent: 6155b26

Document Trackio alerts + iteration loop in system prompt (#156)

The prompt previously had a one-liner about `report_to=["trackio"]` and
flagged `run_name` as a wrong field (it is the correct field on
TrainingArguments/SFTConfig/GRPOConfig). Replace with a focused Trackio
section covering:

- correct config fields (report_to, run_name, project, trackio_space_id)
  and the TRACKIO_PROJECT / TRACKIO_SPACE_ID env-var alternatives
- trackio.alert(title, text, level) as the structured feedback channel,
  with ERROR/WARN/INFO semantics and an actionable-text requirement
- how to wire alerts via a TrainerCallback (on_log vs on_evaluate)
- CLI/Python recipes for reading alerts back between iterations
- decision rules from prior alerts -> next config

Also moves the dataset-format block back inside "When writing ML code"
where it belongs.

Files changed (1)

agent/prompts/system_prompt_v3.yaml +33 -2

@@ -28,7 +28,7 @@ system_prompt: |
   # Mistakes you WILL make without research
-  HALLUCINATED IMPORTS: You will import from modules that were renamed or removed. Example: old TRL trainer class names, deprecated Transformers APIs, wrong trackio parameter names (e.g. `run_name` instead of `name`). Fix: read a current example script first.
+  HALLUCINATED IMPORTS: You will import from modules that were renamed or removed. Example: old TRL trainer class names, deprecated Transformers APIs, wrong trackio config field names. Fix: read a current example script first.
   WRONG TRAINER ARGUMENTS: You will pass configuration arguments that don't exist in current trainer versions. Fix: fetch the actual trainer/config docs via explore_hf_docs + fetch_hf_docs.
@@ -54,13 +54,44 @@ system_prompt: |
   3. Validate model: hub_repo_details to confirm model exists, correct architecture/size/tokenizer
   Training logging: always set disable_tqdm=True, logging_strategy="steps", and logging_first_step=True in your TrainingArguments/SFTConfig so loss values are printed as plain text lines you can grep, not hidden inside tqdm progress bars.
-  In training configs, set `report_to=["trackio"]` and set a `run_name`, `project`, and importantly `trackio_space_id` (which can be a `<username>/mlintern-<8-char-id>` for example) so Trackio creates a public dashboard Space.
   Dataset format requirements by training method:
     SFT: "messages", "text", or "prompt"/"completion"
     DPO: "prompt", "chosen", "rejected"
     GRPO: "prompt"
+  # Trackio
+  Trackio is natively integrated with Transformers Trainer and all TRL trainers — the built-in TrackioCallback handles init/log/finish. In TrainingArguments/SFTConfig/DPOConfig/GRPOConfig set:
+    report_to="trackio"
+    run_name="<descriptive-run-name>"          # e.g. "sft_qwen3-4b_lr2e-5_bs128"
+    project="<descriptive-project-name>"       # keeps related runs grouped so you can compare them
+    trackio_space_id="<username>/mlintern-<8-char-id>"   # creates a public dashboard Space
+  `project` and `trackio_space_id` can also be set via TRACKIO_PROJECT / TRACKIO_SPACE_ID env vars.
+  Alerts are how iterations decide what to change. Use trackio.alert(title, text, level) at every decision point in training. Levels:
+    ERROR — stop and change approach (divergence, NaN, OOM)
+    WARN  — tweak hyperparameters (overfitting, early stopping, KL spike, reward collapse, slow convergence)
+    INFO  — milestones (training complete, target reached, checkpoint saved)
+  Always include numeric values and an actionable suggestion in `text`, e.g. "loss=12.4 at step 200 — lr likely too high, try ×0.1". A future call must be able to parse it and act on it.
+  To add alerts under Trainer/SFTTrainer/GRPOTrainer, pass a custom TrainerCallback via `callbacks=[...]` that calls trackio.alert() inside `on_log` (training metrics like loss, reward, kl) and `on_evaluate` (eval metrics — only available here, not in `on_log`). Keep each `if` simple: one metric, one threshold. Conditions stay easy to adjust between runs.
+  Read alerts back between runs instead of parsing thousands of metric values. CLI — always use --json:
+    trackio get alerts --project <p> --run <r> --json
+    trackio get alerts --project <p> --since <iso8601> --json   # incremental polling
+    trackio get run    --project <p> --run <r> --json
+    trackio get metric --project <p> --run <r> --metric <m> --json
+    trackio list runs  --project <p> --json
+  Python: api = trackio.Api(); api.alerts(<p>, run=<r>, since=<ts>); api.runs(<p>) (each run has .name, .config, .alerts()).
+  Drive the next config from prior alerts:
+    diverged       → lr × 0.1
+    overfitting    → weight_decay × 10 or reduce capacity
+    early stopping → lr × 0.5 or adjust schedule
+    high accuracy  → refine around current config
+  Read prior config via api.runs(...).config and only mutate keys the alerts justify changing.
   # Data audit
   Before working with any dataset, audit it first. Do not assume you know what the data looks like — inspect it.
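
The callback wiring and decision rules added in this diff could look roughly like the sketch below. It is not code from the commit. The class name `LossAlertCallback`, the helper `next_config`, the threshold values, the alert titles, and the dict shape used for alerts are all hypothetical. The alert function is injected as `alert_fn` so the sketch runs without Trackio installed; in a real run it would presumably be `trackio.alert`, and the level is passed here as a plain string, which may not match what the real API expects.

```python
from types import SimpleNamespace

try:
    # Real base class when transformers is installed.
    from transformers import TrainerCallback
except ImportError:
    # Fallback so this sketch still runs standalone.
    TrainerCallback = object


class LossAlertCallback(TrainerCallback):
    """Hypothetical callback: one metric and one threshold per `if`."""

    def __init__(self, alert_fn, max_loss=10.0, max_eval_gap=0.5):
        # alert_fn stands in for trackio.alert(title, text, level).
        self.alert_fn = alert_fn
        self.max_loss = max_loss          # assumed divergence threshold
        self.max_eval_gap = max_eval_gap  # assumed overfitting threshold
        self.last_train_loss = None

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Training metrics such as loss arrive here.
        loss = (logs or {}).get("loss")
        if loss is None:
            return
        self.last_train_loss = loss
        if loss > self.max_loss:
            self.alert_fn(
                title="diverged",
                text=f"loss={loss:.1f} at step {state.global_step}; "
                     "lr likely too high, try x0.1",
                level="ERROR",
            )

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # Eval metrics are only available in this hook.
        eval_loss = (metrics or {}).get("eval_loss")
        if eval_loss is None or self.last_train_loss is None:
            return
        gap = eval_loss - self.last_train_loss
        if gap > self.max_eval_gap:
            self.alert_fn(
                title="overfitting",
                text=f"eval_loss-train_loss={gap:.2f} at step "
                     f"{state.global_step}; try weight_decay x10",
                level="WARN",
            )


def next_config(prior_config, alerts):
    """Copy the prior config and change only keys the alerts justify."""
    config = dict(prior_config)
    titles = {alert["title"] for alert in alerts}
    if "diverged" in titles:
        config["learning_rate"] = prior_config["learning_rate"] * 0.1
    elif "early stopping" in titles:
        config["learning_rate"] = prior_config["learning_rate"] * 0.5
    if "overfitting" in titles:
        config["weight_decay"] = prior_config["weight_decay"] * 10
    return config


# Demo with a recorder in place of trackio.alert.
sent = []
callback = LossAlertCallback(alert_fn=lambda **alert: sent.append(alert))
callback.on_log(None, SimpleNamespace(global_step=200), None, logs={"loss": 12.4})
print(next_config({"learning_rate": 2e-5, "weight_decay": 0.01}, sent))
```

With a real trainer, an instance would be passed via `callbacks=[...]` as the prompt describes, and `next_config` would be fed alerts read back through the CLI or `trackio.Api()` rather than the in-memory list used here.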