Frontier-CS Benchmark
Evolves C++ solutions for Frontier-CS algorithmic optimization problems using SkyDiscover.
Setup
# 1. Clone Frontier-CS
cd benchmarks/frontier-cs-eval
git clone https://github.com/FrontierCS/Frontier-CS.git
# 2. Start the judge server (requires Docker)
cd Frontier-CS/algorithmic
docker compose up -d
# 3. Install dependencies (from project root)
cd ../../..
uv sync --extra frontier-cs
# 4. Set your API key
export OPENAI_API_KEY=...
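
Before starting a run, it can help to confirm that the judge server from step 2 is accepting connections. The snippet below is a minimal sketch, not part of this benchmark: `judge_is_reachable` is a hypothetical helper, and it assumes the judge listens at the default `JUDGE_URLS` value, `http://localhost:8081`. It only checks that the TCP port is open and does not call any judge endpoint.

```python
import socket
from urllib.parse import urlparse


def judge_is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper: it only tests that the port is open and makes no
    assumption about the judge server's HTTP API.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's standard port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed default, taken from the JUDGE_URLS environment variable.
    url = "http://localhost:8081"
    state = "reachable" if judge_is_reachable(url) else "not reachable"
    print(f"{url}: {state}")
```

If the check fails, `docker compose ps` in `Frontier-CS/algorithmic` shows whether the judge container is running.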
Run
Supported algorithms (values for -s / --search): adaevolve, evox, openevolve, gepa, shinkaevolve
Single problem:
cd benchmarks/frontier-cs-eval
FRONTIER_CS_PROBLEM=0 uv run skydiscover-run initial_program.cpp evaluator.py \
-c config.yaml -s [search_algorithm] -i 50
All problems in parallel:
uv run python run_all_frontiercs.py --search [search_algorithm] --iterations 50 --workers 6
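
The single-problem command above can be fanned out over several problem IDs with a small worker pool. The following is a rough sketch of that idea only; it is not the implementation of `run_all_frontiercs.py`, and the helper names (`build_command`, `build_env`, `run_problem`, `run_many`) are hypothetical. It assumes it is run from `benchmarks/frontier-cs-eval` and reuses the exact flags shown in the single-problem command.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(search, iterations=50):
    """Build the single-problem skydiscover-run command as an argument list."""
    return [
        "uv", "run", "skydiscover-run",
        "initial_program.cpp", "evaluator.py",
        "-c", "config.yaml",
        "-s", search,
        "-i", str(iterations),
    ]


def build_env(problem_id, base_env=None):
    """Copy the environment and select the problem via FRONTIER_CS_PROBLEM."""
    env = dict(os.environ if base_env is None else base_env)
    env["FRONTIER_CS_PROBLEM"] = str(problem_id)
    return env


def run_problem(problem_id, search, iterations=50):
    """Run one evolution and return the process exit code."""
    result = subprocess.run(
        build_command(search, iterations),
        env=build_env(problem_id),
    )
    return result.returncode


def run_many(problem_ids, search, iterations=50, workers=6):
    """Run several problems concurrently; problem_ids must be a list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = pool.map(
            lambda pid: run_problem(pid, search, iterations), problem_ids
        )
        return dict(zip(problem_ids, codes))
```

Threads are enough here because each worker only waits on a child process. For example, `run_many([0, 1, 2], "adaevolve")` would return a dictionary mapping each problem ID to its exit code.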
Evaluate best programs (post-discovery)
uv run python run_best_programs_frontiercs.py
Analyze results
uv run python combine_results.py # merge training/testing scores into CSV
uv run python analyze_results.py # generate plots and statistics
Files
| File | Description |
|---|---|
| initial_program.cpp | Seed C++ program |
| evaluator.py | Evaluates C++ solutions via Frontier-CS docker judge |
| config.yaml | Config with system prompt template |
| run_all_frontiercs.py | Parallelizes evolution across all problems |
| run_best_programs_frontiercs.py | Re-evaluates best programs after evolution |
| combine_results.py | Combines training/testing scores into CSV |
| analyze_results.py | Generates score analysis plots and statistics |
Environment variables
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | (required) | API key |
| FRONTIER_CS_PROBLEM | 0 | Problem ID to evolve |
| JUDGE_URLS | http://localhost:8081 | Comma-separated judge server URLs |
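
As an illustration of how these variables and their defaults could be consumed, here is a minimal sketch. It is an assumption about one reasonable way to parse them, not the code in `evaluator.py`; `load_judge_settings` is a hypothetical name, and the problem ID is kept as a string because its exact format is not specified here.

```python
import os


def load_judge_settings(environ=None):
    """Read the problem ID and judge URLs, applying the documented defaults.

    Hypothetical helper; the real evaluator may parse these differently.
    """
    env = os.environ if environ is None else environ
    problem_id = env.get("FRONTIER_CS_PROBLEM", "0")
    raw_urls = env.get("JUDGE_URLS", "http://localhost:8081")
    # Split on commas, trim whitespace and trailing slashes, skip empty items.
    judge_urls = [u.strip().rstrip("/") for u in raw_urls.split(",") if u.strip()]
    return problem_id, judge_urls


if __name__ == "__main__":
    problem_id, judge_urls = load_judge_settings()
    print(f"problem={problem_id} judges={judge_urls}")
```

Listing several servers, for example `JUDGE_URLS=http://localhost:8081,http://localhost:8082`, yields a two-element list with this parser.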