# `agents.mlevolve`
Runs [MLEvolve](https://github.com/InternScience/MLEvolve) on a GraphTestbed
task. MLEvolve is an MCGS auto-ML harness wired for OpenAI-compatible APIs.
Default model: **`gpt-5.3-codex-spark`** (a pipe-through alias you define in
your CLIProxyAPI `oauth-model-alias.codex` block).
## Install
```bash
bash agents/mlevolve/install.sh
# heavy: clones the repo + pip-installs torch and ML deps (~5-10 GB).
```
The clone lands at `agents/mlevolve/_vendor/MLEvolve/`. Set `MLEVOLVE_DIR` if
you already have a clone elsewhere.
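The lookup this implies can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the helper name `resolve_mlevolve_dir` and the precedence (environment variable first, vendored path second) are assumptions drawn from the note above.

```python
import os
from pathlib import Path

# Default vendored location, taken from the install note above.
DEFAULT_VENDOR_DIR = Path("agents/mlevolve/_vendor/MLEvolve")


def resolve_mlevolve_dir(env=None):
    """Return the MLEvolve checkout to use (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get("MLEVOLVE_DIR")
    # An explicit MLEVOLVE_DIR wins; otherwise fall back to the vendored clone.
    return Path(override).expanduser() if override else DEFAULT_VENDOR_DIR
```

Passing the environment as an argument keeps the helper easy to exercise without mutating `os.environ`.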
## Run
```bash
gtb fetch figraph
python -m agents.mlevolve.runner --task figraph
```
Output:
```
runs/mlevolve/figraph/<timestamp>/
├── mlebench-tree/figraph/
│   ├── prepared/public/{train.csv,test.csv,description.md,sample_submission.csv}
│   ├── prepared/private/test.csv   # val labels; the local grader uses this
│   └── REAL_TEST_FEATURES.csv      # the actual test split, for re-execute
├── agent.log
└── val_submission.csv              # MLEvolve's best on the val "test" split
```
## ⚠ v1 limitation: val-as-test
GraphTestbed's actual test labels live on the scoring server, not on disk.
For the local mle-bench grader to function, the adapter exposes
`val_features.csv` (with labels) as the "test" set MLEvolve searches against.
The CSV the runner harvests is therefore predictions on **val**, not test.
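One quick way to tell which split a CSV was predicted on is to compare row counts. The sketch below assumes a submission carries exactly one row per row of the features file it was predicted from; `count_rows` and `matches_split` are hypothetical helpers, not part of the adapter, and the check is inconclusive if the val and test splits happen to be the same size.

```python
import csv


def count_rows(csv_path):
    """Number of non-empty data rows in a CSV file, excluding the header."""
    with open(csv_path, newline="") as handle:
        total = sum(1 for row in csv.reader(handle) if row)
    return max(total - 1, 0)


def matches_split(submission_csv, features_csv):
    """True when the submission has one row per row of the features file."""
    return count_rows(submission_csv) == count_rows(features_csv)
```

Under that assumption, `val_submission.csv` should match `prepared/private/test.csv`, while a file ready for `gtb submit` should match `REAL_TEST_FEATURES.csv`.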
To submit a real test-set score:
1. Open `agents/mlevolve/_vendor/MLEvolve/runs/<latest-ts>/` and find the
best `runfile.py` (the one with the best score in the run's tree summary).
2. Re-execute it against the real test split:
```bash
cd <some scratch dir>
cp <ws>/mlebench-tree/figraph/REAL_TEST_FEATURES.csv ./test.csv
cp <ws>/mlebench-tree/figraph/prepared/public/train.csv ./train.csv
python <runfile> # produces submission.csv
```
3. Submit:
```bash
gtb submit figraph --file ./submission.csv --agent mlevolve-codex-spark
```
This step is manual in v1 because the structure of MLEvolve's `runfile.py`
varies per task and we don't want to silently mis-execute it. Automating it
is on the roadmap.
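Until then, the copy-and-run part of step 2 can be scripted. The following is a minimal sketch, not the adapter's code: `stage_and_rerun` is a hypothetical helper, and it assumes the chosen runfile reads `./train.csv` and `./test.csv` from its working directory and writes `./submission.csv` there, as in the manual commands above. Because runfiles vary per task, check that assumption before trusting the output.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def stage_and_rerun(ws, runfile, scratch, task="figraph"):
    """Stage the real test split in a scratch dir and run the runfile there."""
    tree = Path(ws) / "mlebench-tree" / task
    scratch = Path(scratch)
    scratch.mkdir(parents=True, exist_ok=True)
    # The real test features take the place of the val-based test.csv.
    shutil.copy(tree / "REAL_TEST_FEATURES.csv", scratch / "test.csv")
    shutil.copy(tree / "prepared" / "public" / "train.csv", scratch / "train.csv")
    # Run with the scratch dir as cwd; fail loudly on a non-zero exit code.
    subprocess.run(
        [sys.executable, str(Path(runfile).resolve())], cwd=scratch, check=True
    )
    submission = scratch / "submission.csv"
    if not submission.exists():
        raise FileNotFoundError(f"{runfile} did not write {submission}")
    return submission
```

The explicit existence check turns a runfile that writes its output elsewhere into an error rather than a silent mis-execution. The returned path is what you would pass to `gtb submit figraph --file`.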
## Knobs
| flag | default | meaning |
| --- | --- | --- |
| `--model` | `gpt-5.3-codex-spark` | sent to proxy via OPENAI_BASE_URL/v1 |
| `--steps` | 100 | MCGS exploration count (upstream default: 500) |
| `--time-limit-min` | 120 | per-task wall-clock cap (upstream default: 720) |
| `--gpus` | 0 | passed to `search.num_gpus` |
The `--model` string must exist in your CLIProxyAPI
`oauth-model-alias.codex` (or be a real model your Codex account exposes).
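For reference, the table above can be mirrored with `argparse`. This is an illustrative sketch rather than the runner's actual parser: the flag names and defaults come from the table and the `--task` flag from the Run section, while treating `--task` as required and the integer types are assumptions.

```python
import argparse


def build_parser():
    """Argument parser mirroring the flags table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="agents.mlevolve.runner")
    parser.add_argument("--task", required=True)  # assumed to be required
    parser.add_argument("--model", default="gpt-5.3-codex-spark")
    parser.add_argument("--steps", type=int, default=100)
    parser.add_argument("--time-limit-min", type=int, default=120)
    parser.add_argument("--gpus", type=int, default=0)
    return parser


# Override one knob and leave the rest at their defaults.
args = build_parser().parse_args(["--task", "figraph", "--steps", "50"])
print(args.model, args.steps, args.time_limit_min, args.gpus)
# prints: gpt-5.3-codex-spark 50 120 0
```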