razvan/ml-intern-codex-plugin

ml-intern


razvan committed 14 days ago

Commit 5b68ff9 (verified) · 1 parent: 822b4fb

Upload plugins/mlintern/commands/run.md with huggingface_hub

Files changed (1)

plugins/mlintern/commands/run.md +46 -0

plugins/mlintern/commands/run.md ADDED

	@@ -0,0 +1,46 @@

+# /mlintern:run
+Run an ML Intern task end-to-end.
+## Arguments
+- `prompt` (required): A one-sentence description of the ML deliverable. Examples: "fine-tune Qwen3-4B for code completion on python-code-dataset", "benchmark sentence-transformers/all-MiniLM-L6-v2 on STS-B", "train a diffusion LoRA on my art dataset".
+- `--model` (optional): LiteLLM model ID to use (e.g., `huggingface/openai/gpt-oss-120b`). Defaults to the model configured in the environment.
+- `--background` (optional): Queue the task and return immediately; check on it later with `--status`.
+- `--status <job-id>` (optional): Check status of a background job.
+- `--result <job-id>` (optional): Fetch the final report of a completed background job.
+- `--cancel <job-id>` (optional): Cancel a running background job.
+## Workflow
+1. Clarify the deliverable from the prompt.
+2. Research the task before writing code:
+   - Search for landmark and recent papers if the task is novel.
+   - Read HF docs for current API patterns.
+   - Find a working implementation example.
+3. Validate inputs:
+   - Inspect the dataset schema, splits, and sample rows.
+   - Verify that the model repo exists, the architecture matches the task, and a tokenizer is available.
+4. Implement the smallest working version.
+5. Smoke test locally or in a small HF Job.
+6. Run the full training/evaluation job with HF Jobs.
+7. Evaluate results against the target.
+8. Save code, configs, and reports; publish ML artifacts to Hugging Face.
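
Note on step 3 of the workflow above: the dataset half of "validate inputs" can be sketched as a pure-Python check over sample rows. The function name, its arguments, and the sample data are hypothetical; in practice the rows would come from whatever dataset loader the task uses, and the model and tokenizer checks would need Hub access that is not shown here.

```python
from typing import Dict, List


def validate_dataset(
    splits: Dict[str, List[dict]],
    required_splits: List[str],
    required_columns: List[str],
) -> List[str]:
    """Return human-readable problems; an empty list means the inputs look usable.

    `splits` maps a split name to a few sample rows (one dict per row).
    """
    problems = []
    for name in required_splits:
        if name not in splits:
            problems.append(f"missing split: {name}")
            continue
        rows = splits[name]
        if not rows:
            problems.append(f"split {name} is empty")
            continue
        for index, row in enumerate(rows):
            missing = [col for col in required_columns if col not in row]
            if missing:
                problems.append(
                    f"split {name} row {index} lacks columns: {', '.join(missing)}"
                )
                break  # one report per split is enough for a pre-flight check
    return problems


# Hypothetical sample rows for a sentence-similarity task.
sample = {
    "train": [{"sentence1": "a", "sentence2": "b", "score": 4.5}],
    "validation": [{"sentence1": "c", "sentence2": "d"}],
}
problems = validate_dataset(
    sample,
    required_splits=["train", "validation", "test"],
    required_columns=["sentence1", "sentence2", "score"],
)
# problems == ["split validation row 0 lacks columns: score", "missing split: test"]
```

Returning a list of problems instead of raising on the first one lets the agent report every mismatch at once, which fits the guardrail about asking for approval rather than substituting inputs.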
+## Output
+Return:
+- Deliverable status (complete / partial / failed).
+- GitHub branch, commit, PR, or report path for code.
+- Hugging Face model/dataset/Space URLs for published artifacts.
+- Job ID and log URL for HF Jobs runs.
+- Metrics and evaluation results when available.
+- Known failures, compromises, and next recommended steps.
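
Note on the output list above: one way to keep the returned fields consistent is a small record type. This is a sketch, not the plugin's actual report format; the class name `RunReport`, its field names, and the Markdown layout are assumptions. Only the three status values come from the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

VALID_STATUSES = ("complete", "partial", "failed")  # from the Output section


@dataclass
class RunReport:
    """Hypothetical container for the fields the command is asked to return."""
    status: str
    code_ref: str  # GitHub branch, commit, PR, or report path
    artifact_urls: List[str] = field(default_factory=list)
    job_id: Optional[str] = None
    log_url: Optional[str] = None
    metrics: Dict[str, float] = field(default_factory=dict)
    known_issues: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {VALID_STATUSES}")

    def to_markdown(self) -> str:
        """Render the report as a flat Markdown list, skipping empty fields."""
        lines = [f"- Status: {self.status}", f"- Code: {self.code_ref}"]
        lines += [f"- Artifact: {url}" for url in self.artifact_urls]
        if self.job_id:
            lines.append(f"- Job: {self.job_id}")
        if self.log_url:
            lines.append(f"- Logs: {self.log_url}")
        lines += [f"- Metric {name}: {value}" for name, value in self.metrics.items()]
        lines += [f"- Known issue: {issue}" for issue in self.known_issues]
        lines += [f"- Next step: {step}" for step in self.next_steps]
        return "\n".join(lines)
```

Rejecting unknown status values at construction time keeps a report from silently carrying a value outside complete / partial / failed.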
+## Guardrails
+- Never silently substitute a dataset, model, or training method. Ask for approval if the original request is incompatible.
+- Always set realistic timeouts for HF Jobs (at least 2 hours for real training).
+- Always include `push_to_hub=True` and `hub_model_id` in training configs.
+- Run one job first before launching sweeps or ablations.
+- For OOM errors: reduce batch size and increase gradient accumulation, enable gradient checkpointing, or upgrade hardware. Do not change the requested method.
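
Note on the guardrails above: the config checks and the first OOM remedy can be sketched with plain dictionaries. The key names `push_to_hub`, `hub_model_id`, `per_device_train_batch_size`, and `gradient_accumulation_steps` follow the `transformers` `TrainingArguments` parameters of the same names; `timeout_seconds` and both function names are hypothetical. Halving the batch size while doubling gradient accumulation keeps their product, and therefore the effective batch size per device, unchanged, so the requested training method is not altered.

```python
from typing import List

MIN_TRAINING_TIMEOUT_SECONDS = 2 * 60 * 60  # "at least 2 hours for real training"


def check_training_config(config: dict) -> List[str]:
    """Return guardrail violations; an empty list means the config passes."""
    problems = []
    if config.get("push_to_hub") is not True:
        problems.append("push_to_hub must be True")
    if not config.get("hub_model_id"):
        problems.append("hub_model_id must be set")
    # `timeout_seconds` is a hypothetical key used only in this sketch.
    if config.get("timeout_seconds", 0) < MIN_TRAINING_TIMEOUT_SECONDS:
        problems.append("timeout_seconds is below 2 hours")
    return problems


def shrink_for_oom(config: dict) -> dict:
    """Halve the per-device batch size and double gradient accumulation.

    Returns a new dict; the product of the two values is preserved.
    """
    batch = config["per_device_train_batch_size"]
    if batch < 2 or batch % 2:
        raise ValueError(
            "batch size cannot be halved evenly; enable gradient "
            "checkpointing or upgrade hardware instead"
        )
    updated = dict(config)
    updated["per_device_train_batch_size"] = batch // 2
    updated["gradient_accumulation_steps"] = (
        config.get("gradient_accumulation_steps", 1) * 2
    )
    return updated


config = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "push_to_hub": True,
    "hub_model_id": "<user>/<model>",  # placeholder, not a real repo
    "timeout_seconds": 2 * 60 * 60,
}
retry_config = shrink_for_oom(config)
# retry_config uses batch size 4 with 4 accumulation steps (8 * 2 == 4 * 4).
```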