# `agents.mlevolve`
Runs [MLEvolve](https://github.com/InternScience/MLEvolve) on a GraphTestbed
task. MLEvolve is an MCGS-driven AutoML harness wired for OpenAI-compatible APIs.
Default model: **`gpt-5.3-codex-spark`** (a pipe-through alias you define in
your CLIProxyAPI `oauth-model-alias.codex` block).
## Install
```bash
bash agents/mlevolve/install.sh
# heavy: clones the repo + pip-installs torch and ML deps (~5-10 GB).
```
Lands at `agents/mlevolve/_vendor/MLEvolve/`. Set `MLEVOLVE_DIR` if you
already have a clone elsewhere.
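For example (the path here is illustrative):
```bash
# reuse an existing clone instead of the vendored path
export MLEVOLVE_DIR=~/src/MLEvolve
```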
## Run
```bash
gtb fetch figraph
python -m agents.mlevolve.runner --task figraph
```
Output:
```
runs/mlevolve/figraph/<timestamp>/
├── mlebench-tree/figraph/
│   ├── prepared/public/{train.csv,test.csv,description.md,sample_submission.csv}
│   ├── prepared/private/test.csv   # val labels; the local grader reads these
│   └── REAL_TEST_FEATURES.csv      # the actual test split, for re-execution
├── agent.log
└── val_submission.csv              # MLEvolve's best on the val "test" split
```
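To watch a search in flight, you can tail the newest run's `agent.log` (a
convenience one-liner, assuming the layout above):
```bash
# follow the newest figraph run's log; the glob picks the latest <timestamp> dir
tail -f "$(ls -dt runs/mlevolve/figraph/*/ | head -1)agent.log"
```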
## ⚠ v1 limitation: val-as-test
GraphTestbed's actual test labels live on the scoring server, not on disk.
For the local mle-bench grader to function, the adapter exposes
`val_features.csv` (with labels) as the "test" set MLEvolve searches against.
The CSV the runner harvests is therefore predictions on **val**, not test.
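A quick way to see the wiring, using the layout above (a hedged sanity check;
the label column name is task-specific):
```bash
ws=$(ls -dt runs/mlevolve/figraph/*/ | head -1)
# both files cover the val split; private/test.csv carries the extra label column
head -1 "$ws"mlebench-tree/figraph/prepared/public/test.csv
head -1 "$ws"mlebench-tree/figraph/prepared/private/test.csv
```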
To submit a real test-set score:
1. Open `agents/mlevolve/_vendor/MLEvolve/runs/<latest-ts>/` and locate the
   best `runfile.py` (rank candidates by score in the run's tree summary; see
   the sketch below).
2. Re-execute it against the real test split:
```bash
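# <ws> below = the runs/mlevolve/figraph/<timestamp>/ workspace from the Output section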
cd <some scratch dir>
cp <ws>/mlebench-tree/figraph/REAL_TEST_FEATURES.csv ./test.csv
cp <ws>/mlebench-tree/figraph/prepared/public/train.csv ./train.csv
python <runfile> # produces submission.csv
```
3. Submit:
```bash
gtb submit figraph --file ./submission.csv --agent mlevolve-codex-spark
```
This step is manual in v1 because the structure of MLEvolve's `runfile.py`
varies per task and we don't want to silently mis-execute one. Automating it
is on the roadmap.
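Until then, a hedged sketch for step 1's lookup (this assumes runs land in
timestamp-named directories, as above, and that generated scripts match
`runfile*.py`):
```bash
latest=$(ls -dt agents/mlevolve/_vendor/MLEvolve/runs/*/ | head -1)
# list candidate scripts; pick the one whose node scores best in the tree summary
find "$latest" -name 'runfile*.py'
```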
## Knobs
| flag | default | meaning |
| --- | --- | --- |
| `--model` | `gpt-5.3-codex-spark` | model name sent to the proxy at `$OPENAI_BASE_URL/v1` |
| `--steps` | 100 | MCGS exploration count (upstream default: 500) |
| `--time-limit-min` | 120 | per-task wall-clock cap (upstream default: 720) |
| `--gpus` | 0 | passed to `search.num_gpus` |
The `--model` string must exist in your CLIProxyAPI
`oauth-model-alias.codex` (or be a real model your Codex account exposes).
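Putting the knobs together (values are illustrative):
```bash
python -m agents.mlevolve.runner --task figraph \
  --model gpt-5.3-codex-spark --steps 50 --time-limit-min 60 --gpus 1
```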