# `agents.mlevolve`

Runs [MLEvolve](https://github.com/InternScience/MLEvolve) on a GraphTestbed
task. MLEvolve is an MCGS auto-ML harness wired for OpenAI-compatible APIs.

Default model: **`gpt-5.3-codex-spark`** (a pipe-through alias you define in
your CLIProxyAPI `oauth-model-alias.codex` block).

## Install

```bash
bash agents/mlevolve/install.sh
# heavy: clones the repo + pip-installs torch and ML deps (~5-10 GB).
```

The clone lands at `agents/mlevolve/_vendor/MLEvolve/`. Set `MLEVOLVE_DIR` if you
already have a clone elsewhere.

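The `MLEVOLVE_DIR` override could be resolved along these lines. This is a minimal sketch, not the adapter's actual code: the helper name `resolve_mlevolve_dir` is hypothetical, and it assumes the variable simply replaces the vendored default path.

```python
import os
from pathlib import Path

# Default install location described above (relative to the repo root).
DEFAULT_MLEVOLVE_DIR = Path("agents/mlevolve/_vendor/MLEvolve")


def resolve_mlevolve_dir(env=None):
    """Return the MLEvolve checkout to use (hypothetical helper).

    Assumption: a non-empty MLEVOLVE_DIR wins; otherwise fall back to the
    vendored clone created by install.sh.
    """
    env = os.environ if env is None else env
    override = env.get("MLEVOLVE_DIR", "").strip()
    return Path(override).expanduser() if override else DEFAULT_MLEVOLVE_DIR
```

Passing `env` explicitly keeps the lookup easy to exercise without touching the real environment.
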
## Run

```bash
gtb fetch figraph
python -m agents.mlevolve.runner --task figraph
```

Output:

```
runs/mlevolve/figraph/<timestamp>/
├── mlebench-tree/figraph/
│   ├── prepared/public/{train.csv,test.csv,description.md,sample_submission.csv}
│   ├── prepared/private/test.csv   # val labels; the local grader uses this
│   └── REAL_TEST_FEATURES.csv      # the actual test split, for re-execute
├── agent.log
└── val_submission.csv              # MLEvolve's best on the val "test" split
```

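To sanity-check a finished run against the layout above, something like the following could be used. It is a sketch under stated assumptions: `latest_run` and `missing_outputs` are hypothetical helpers, and "latest" is taken to mean most recently modified, since the timestamp format of the directory names is not specified here.

```python
from pathlib import Path


def expected_outputs(task):
    """Relative paths the layout above says a finished run should contain."""
    return (
        "agent.log",
        "val_submission.csv",
        f"mlebench-tree/{task}/REAL_TEST_FEATURES.csv",
    )


def latest_run(root):
    """Return the most recently modified run directory under root, or None."""
    runs = [p for p in Path(root).iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def missing_outputs(run_dir, task):
    """List expected files that are absent from run_dir."""
    return [rel for rel in expected_outputs(task) if not (Path(run_dir) / rel).is_file()]
```

For example, `missing_outputs(latest_run("runs/mlevolve/figraph"), "figraph")` returns an empty list when the run produced every file listed above.
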
## ⚠ v1 limitation: val-as-test

GraphTestbed's actual test labels live on the scoring server, not on disk.
For the local mle-bench grader to function, the adapter exposes
`val_features.csv` (with labels) as the "test" set MLEvolve searches against.
The CSV the runner harvests therefore contains predictions on **val**, not test.

To submit a real test-set score:

1. Open `agents/mlevolve/_vendor/MLEvolve/runs/<latest-ts>/` and find the
   best `runfile.py` (the node with the best score in the run's tree summary).
2. Re-execute it against the real test split:

   ```bash
   # <ws> is the run directory shown under Output: runs/mlevolve/figraph/<timestamp>/
   cd <some scratch dir>
   cp <ws>/mlebench-tree/figraph/REAL_TEST_FEATURES.csv ./test.csv
   cp <ws>/mlebench-tree/figraph/prepared/public/train.csv ./train.csv
   python <runfile> # produces submission.csv
   ```

3. Submit:

   ```bash
   gtb submit figraph --file ./submission.csv --agent mlevolve-codex-spark
   ```

Re-execution is manual in v1 because the structure of MLEvolve's `runfile.py`
varies per task and we don't want to silently mis-execute it. Automating this
step is on the roadmap.

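Until that automation exists, step 2 can be scripted by hand. The sketch below is not part of the adapter: `stage_rerun` and `rerun` are hypothetical helpers, and `rerun` assumes the chosen runfile reads `./train.csv` and `./test.csv` and writes `./submission.csv` in its working directory, which may not hold for every task.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def stage_rerun(ws, scratch, task="figraph"):
    """Copy the real test features and the public train split into scratch.

    ws is the run directory (runs/mlevolve/<task>/<timestamp>/). The real
    test split is saved as test.csv, mirroring step 2 above.
    """
    tree = Path(ws) / "mlebench-tree" / task
    scratch = Path(scratch)
    scratch.mkdir(parents=True, exist_ok=True)
    shutil.copy(tree / "REAL_TEST_FEATURES.csv", scratch / "test.csv")
    shutil.copy(tree / "prepared" / "public" / "train.csv", scratch / "train.csv")
    return scratch


def rerun(runfile, scratch):
    """Run the chosen runfile inside scratch and return the submission path.

    Assumption: the script writes ./submission.csv. Raises if the script
    exits non-zero or produces no submission, rather than failing silently.
    """
    script = str(Path(runfile).resolve())
    subprocess.run([sys.executable, script], cwd=scratch, check=True)
    out = Path(scratch) / "submission.csv"
    if not out.is_file():
        raise FileNotFoundError(f"{runfile} did not produce {out}")
    return out
```

Inspect the resulting `submission.csv` before passing it to `gtb submit` as in step 3. The helpers check only that the file exists, not that its contents are valid.
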
## Knobs

| flag | default | meaning |
| --- | --- | --- |
| `--model` | `gpt-5.3-codex-spark` | model name sent to the proxy via `OPENAI_BASE_URL/v1` |
| `--steps` | 100 | MCGS exploration count (upstream default: 500) |
| `--time-limit-min` | 120 | per-task wall-clock cap in minutes (upstream default: 720) |
| `--gpus` | 0 | passed to `search.num_gpus` |

The `--model` string must exist in your CLIProxyAPI
`oauth-model-alias.codex` block (or be a real model your Codex account exposes).
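
The flags and defaults in the table can be mirrored in a small `argparse` parser, for example when wrapping the runner in a script. This is an illustrative sketch only, not the runner's real source: the integer types and help strings are assumptions, and only the flag names and default values come from the table.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented flags (illustrative only)."""
    p = argparse.ArgumentParser(prog="agents.mlevolve.runner")
    p.add_argument("--task", required=True, help="GraphTestbed task, e.g. figraph")
    p.add_argument("--model", default="gpt-5.3-codex-spark", help="model name sent to the proxy")
    p.add_argument("--steps", type=int, default=100, help="MCGS exploration count")
    p.add_argument("--time-limit-min", type=int, default=120, help="wall-clock cap in minutes")
    p.add_argument("--gpus", type=int, default=0, help="forwarded to search.num_gpus")
    return p


# Override one knob and leave the rest at their documented defaults.
args = build_parser().parse_args(["--task", "figraph", "--steps", "50"])
print(args.model, args.steps, args.time_limit_min, args.gpus)
```

`argparse` exposes `--time-limit-min` as `args.time_limit_min`, because hyphens in flag names become underscores in attribute names.
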