Use hyphenated ml-intern Trackio prefixes (#250)
* Use hyphenated ml-intern Trackio prefixes
Co-authored-by: OpenAI Codex <codex@openai.com>
* Drop legacy Trackio prefix handling
Co-authored-by: OpenAI Codex <codex@openai.com>
---------
Co-authored-by: OpenAI Codex <codex@openai.com>
agent/prompts/system_prompt_v3.yaml
CHANGED
@@ -66,7 +66,7 @@ system_prompt: |
 report_to="trackio"
 run_name="<descriptive-run-name>" # e.g. "sft_qwen3-4b_lr2e-5_bs128"
 project="<descriptive-project-name>" # keeps related runs grouped so you can compare them
-trackio_space_id="<username>/
+trackio_space_id="<username>/ml-intern-<8-char-id>" # creates a public dashboard Space
 `project` and `trackio_space_id` can also be set via TRACKIO_PROJECT / TRACKIO_SPACE_ID env vars.

 Alerts are how iterations decide what to change. Use trackio.alert(title, text, level) at every decision point in training. Levels:
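For illustration, a Space ID in the hyphenated form shown on the added line could be generated and checked as below. This is a minimal sketch, not code from this PR: `make_trackio_space_id` and `is_hyphenated_space_id` are hypothetical helpers, and drawing the 8-character suffix from lowercase hex is an assumption, since the diff only shows the `<8-char-id>` placeholder.

```python
import re
import uuid

# Hypothetical helpers for illustration; they are not part of this PR.
PREFIX = "ml-intern"  # hyphenated prefix taken from the updated prompt line

# Assumption: the 8-character id is lowercase hex. The diff does not say.
SPACE_ID_RE = re.compile(r"[^/]+/ml-intern-[0-9a-f]{8}")


def make_trackio_space_id(username: str) -> str:
    """Build '<username>/ml-intern-<8-char-id>' with a random hex suffix."""
    suffix = uuid.uuid4().hex[:8]
    return f"{username}/{PREFIX}-{suffix}"


def is_hyphenated_space_id(space_id: str) -> bool:
    """Return True only if the whole string matches the hyphenated form."""
    return SPACE_ID_RE.fullmatch(space_id) is not None
```

A training script could pass the result as `trackio_space_id`, or export it as `TRACKIO_SPACE_ID`, which the prompt text above names as the equivalent env var.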
agent/tools/jobs_tool.py
CHANGED
@@ -1116,7 +1116,7 @@ HF_JOBS_TOOL_SPEC = {
 "Job storage is EPHEMERAL — all files are deleted when the job ends. Without push_to_hub, trained models are lost permanently.\n"
 "- Include trackio monitoring and provide the dashboard URL to the user. "
 "When the script uses report_to='trackio', also pass `trackio_space_id` "
-"(e.g. '<username>/
+"(e.g. '<username>/ml-intern-<8char>') and `trackio_project` as tool args — "
 "they are injected as TRACKIO_SPACE_ID/TRACKIO_PROJECT env vars and let the UI embed the live dashboard.\n\n"
 "BATCH/ABLATION JOBS: Submit ONE job first. Check logs to confirm it starts training successfully. "
 "Only then submit the remaining jobs. Never submit all at once — if there's a bug, all jobs fail.\n\n"
@@ -1204,7 +1204,7 @@ HF_JOBS_TOOL_SPEC = {
 "type": "string",
 "description": (
 "Optional. The HF Space hosting the trackio dashboard for this run "
-"(e.g. '<username>/
+"(e.g. '<username>/ml-intern-<8char>', under YOUR HF namespace). "
 "Injected as TRACKIO_SPACE_ID env var and used by the UI to embed "
 "the live dashboard. Set this whenever the script uses "
 "report_to='trackio'. The Space is auto-created and seeded with the "
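The description strings say the two tool args are injected as `TRACKIO_SPACE_ID` and `TRACKIO_PROJECT` env vars. The injection code itself is not part of this diff, so the following is only a sketch of what such a mapping might look like. `trackio_env` is a hypothetical name, and skipping unset args is an assumption.

```python
from typing import Dict, Optional


def trackio_env(
    trackio_space_id: Optional[str] = None,
    trackio_project: Optional[str] = None,
) -> Dict[str, str]:
    """Map the optional tool args to the env var names from the tool spec.

    Hypothetical helper: the real injection logic in jobs_tool.py is not
    shown in this diff. Unset or empty args are left out (an assumption).
    """
    env: Dict[str, str] = {}
    if trackio_space_id:
        env["TRACKIO_SPACE_ID"] = trackio_space_id
    if trackio_project:
        env["TRACKIO_PROJECT"] = trackio_project
    return env
```

Leaving out unset values keeps a job that does not use `report_to='trackio'` free of stray Trackio variables.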
agent/tools/sandbox_tool.py
CHANGED
@@ -542,7 +542,7 @@ SANDBOX_CREATE_TOOL_SPEC = {
 "Common picks: t4-small (16GB VRAM, fits ≤1-3B), a10g-small (24GB, ≤7B), a100-large (80GB, ≤30B). "
 "If the model won't fit, pick larger hardware upfront — OOM on a sandbox wastes time.\n\n"
 "If you intend to run a training script in this sandbox that uses report_to='trackio', "
-"pass `trackio_space_id` (e.g. '<username>/
+"pass `trackio_space_id` (e.g. '<username>/ml-intern-<8char>') and `trackio_project` so they "
 "are set as TRACKIO_SPACE_ID/TRACKIO_PROJECT secrets in the sandbox and the UI can embed the live dashboard.\n\n"
 "Hardware: " + ", ".join([e.value for e in SpaceHardware]) + ".\n"
 ),
@@ -563,7 +563,7 @@ SANDBOX_CREATE_TOOL_SPEC = {
 "type": "string",
 "description": (
 "Optional. The HF Space hosting the trackio dashboard for runs in this sandbox "
-"(e.g. '<username>/
+"(e.g. '<username>/ml-intern-<8char>', under YOUR HF namespace). Injected as "
 "TRACKIO_SPACE_ID secret and surfaced to the UI. The Space is auto-created and "
 "seeded with the trackio dashboard — DO NOT pre-create it via hf_repo_git, "
 "that produces an empty Space that breaks the embed."
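The "Common picks" guidance in the sandbox description can be read as a small lookup table. The sketch below encodes it under two assumptions: the sizes are upper bounds in billions of parameters, and the first entry's "1-3B" range is treated as a 3B ceiling. `pick_hardware` is a hypothetical helper, not something this PR adds.

```python
from typing import List, Optional, Tuple

# (hardware flavor, VRAM in GB, assumed max model size in billions of params).
# Values come from the "Common picks" sentence in the tool description.
COMMON_PICKS: List[Tuple[str, int, float]] = [
    ("t4-small", 16, 3.0),
    ("a10g-small", 24, 7.0),
    ("a100-large", 80, 30.0),
]


def pick_hardware(params_billions: float) -> Optional[str]:
    """Return the smallest common pick whose size bound covers the model.

    Returns None when the model exceeds every listed pick, in which case
    the full hardware list in the tool spec would need to be consulted.
    """
    for flavor, _vram_gb, max_billions in COMMON_PICKS:
        if params_billions <= max_billions:
            return flavor
    return None
```

Choosing the smallest flavor that fits follows the description's advice to pick large enough hardware upfront, since an OOM in a sandbox wastes time.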
tests/unit/test_trackio_space_ids.py
ADDED
@@ -0,0 +1,16 @@
+import json
+from pathlib import Path
+
+from agent.tools.jobs_tool import HF_JOBS_TOOL_SPEC
+from agent.tools.sandbox_tool import SANDBOX_CREATE_TOOL_SPEC
+
+
+def test_trackio_space_examples_use_hyphenated_ml_intern_prefix():
+    prompt = Path("agent/prompts/system_prompt_v3.yaml").read_text()
+    tool_specs = json.dumps([HF_JOBS_TOOL_SPEC, SANDBOX_CREATE_TOOL_SPEC])
+    legacy_prefix = "ml" + "intern"
+
+    assert "<username>/ml-intern-<8-char-id>" in prompt
+    assert "<username>/ml-intern-<8char>" in tool_specs
+    assert legacy_prefix not in prompt
+    assert legacy_prefix not in tool_specs