# `agents.mlevolve`

Runs [MLEvolve](https://github.com/InternScience/MLEvolve) on a GraphTestbed task. MLEvolve is an MCGS auto-ML harness wired for OpenAI-compatible APIs. Default model: **`gpt-5.3-codex-spark`** (a pass-through alias you define in your CLIProxyAPI `oauth-model-alias.codex` block).

## Install

```bash
bash agents/mlevolve/install.sh  # heavy: clones the repo + pip-installs torch and ML deps (~5-10 GB)
```

Lands at `agents/mlevolve/_vendor/MLEvolve/`. Set `MLEVOLVE_DIR` if you already have a clone elsewhere.

## Run

```bash
gtb fetch figraph
python -m agents.mlevolve.runner --task figraph
```

Output:

```
runs/mlevolve/figraph/<run-id>/
├── mlebench-tree/figraph/
│   ├── prepared/public/{train.csv,test.csv,description.md,sample_submission.csv}
│   ├── prepared/private/test.csv   # val labels; the local grader scores against this
│   └── REAL_TEST_FEATURES.csv      # the actual test split, for re-execution
├── agent.log
└── val_submission.csv              # MLEvolve's best on the val "test" split
```

## ⚠ v1 limitation: val-as-test

GraphTestbed's actual test labels live on the scoring server, not on disk. For the local mle-bench grader to function, the adapter exposes `val_features.csv` (with labels) as the "test" set MLEvolve searches against. The CSV the runner harvests therefore holds predictions on **val**, not test.

To submit a real test-set score:

1. Open `agents/mlevolve/_vendor/MLEvolve/runs/<run-id>/` and find the best `runfile.py` (search order: best score in the run's tree summary).

2. Re-execute it against the real test split:

   ```bash
   cd <run-dir>
   cp <run-dir>/mlebench-tree/figraph/REAL_TEST_FEATURES.csv ./test.csv
   cp <run-dir>/mlebench-tree/figraph/prepared/public/train.csv ./train.csv
   python <best-runfile>.py   # produces submission.csv
   ```

3. Submit:

   ```bash
   gtb submit figraph --file ./submission.csv --agent mlevolve-codex-spark
   ```

This step is manual in v1 because the structure of MLEvolve's `runfile.py` varies per task and we don't want to silently mis-execute it. Automating this is on the roadmap.
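The val-as-test wiring described above can be sketched in a few lines. This is a minimal illustration with synthetic data, not the adapter's actual code: the helper name is hypothetical, but the path layout matches the tree shown above (val features published as the public "test" set, labels kept on the private side for the local grader).

```python
import csv
from pathlib import Path

def expose_val_as_test(rows, root, label_col="label"):
    """Publish the val split as the grader's 'test' set: labels stripped on
    the public side, labels kept on the private side. `rows` is a list of
    dicts, one per val example. (Hypothetical sketch of the v1 adapter.)"""
    public = root / "prepared" / "public"
    private = root / "prepared" / "private"
    public.mkdir(parents=True, exist_ok=True)
    private.mkdir(parents=True, exist_ok=True)

    def write(path, keep_label):
        fields = [k for k in rows[0] if keep_label or k != label_col]
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            w.writeheader()
            w.writerows([{k: r[k] for k in fields} for r in rows])

    write(public / "test.csv", keep_label=False)   # what MLEvolve searches against
    write(private / "test.csv", keep_label=True)   # what the local grader uses

# Tiny synthetic val split
val = [{"node_id": 0, "feat": 0.1, "label": 1},
       {"node_id": 1, "feat": 0.5, "label": 0}]
expose_val_as_test(val, Path("demo_tree/figraph"))
```

This is also why the harvested `val_submission.csv` scores cleanly locally but is not a test-set result: both sides of the grader's comparison come from val.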
## Knobs

| flag | default | meaning |
| --- | --- | --- |
| `--model` | `gpt-5.3-codex-spark` | sent to the proxy via `OPENAI_BASE_URL/v1` |
| `--steps` | 100 | MCGS exploration count (upstream default: 500) |
| `--time-limit-min` | 120 | per-task wall-clock cap (upstream default: 720) |
| `--gpus` | 0 | passed to `search.num_gpus` |

The `--model` string must exist in your CLIProxyAPI `oauth-model-alias.codex` block (or be a real model your Codex account exposes).
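The knob surface above can be sketched as an argparse shim. Flag names and defaults come from the table; of the returned config keys, only `search.num_gpus` is stated in the table, so the other key names here are hypothetical placeholders, not MLEvolve's real config schema:

```python
import argparse

def parse_knobs(argv=None):
    """Parse the runner's knobs and map them onto a flat config dict
    (illustrative sketch, not the actual agents.mlevolve.runner code)."""
    p = argparse.ArgumentParser(prog="agents.mlevolve.runner")
    p.add_argument("--task", required=True)
    p.add_argument("--model", default="gpt-5.3-codex-spark")
    p.add_argument("--steps", type=int, default=100)            # upstream default: 500
    p.add_argument("--time-limit-min", type=int, default=120)   # upstream default: 720
    p.add_argument("--gpus", type=int, default=0)               # forwarded to search.num_gpus
    a = p.parse_args(argv)
    return {
        "task": a.task,
        "model": a.model,                       # must resolve via your proxy alias
        "search.num_steps": a.steps,            # hypothetical key name
        "search.time_limit_min": a.time_limit_min,  # hypothetical key name
        "search.num_gpus": a.gpus,              # key name stated in the table
    }

print(parse_knobs(["--task", "figraph", "--steps", "50", "--gpus", "1"]))
```

A shortened `--steps`/`--time-limit-min` run like this is a cheap smoke test of the whole pipeline before committing to the defaults.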