JustinTX's picture
Add files using upload-large-folder tool
b0e88cf verified

Frontier-CS Benchmark

Evolves C++ solutions for Frontier-CS algorithmic optimization problems using SkyDiscover.

Setup

# 1. Clone Frontier-CS
cd benchmarks/frontier-cs-eval
git clone https://github.com/FrontierCS/Frontier-CS.git

# 2. Start the judge server (requires Docker)
cd Frontier-CS/algorithmic
docker compose up -d

# 3. Install dependencies (from project root)
cd ../../..
uv sync --extra frontier-cs

# 4. Set your API key
export OPENAI_API_KEY=...

Run

Supported algorithms: adaevolve, evox, openevolve, gepa, shinkaevolve

Single problem:

cd benchmarks/frontier-cs-eval
FRONTIER_CS_PROBLEM=0 uv run skydiscover-run initial_program.cpp evaluator.py \
  -c config.yaml -s [search_algorithm] -i 50

All problems in parallel:

uv run python run_all_frontiercs.py --search [search_algorithm] --iterations 50 --workers 6

Evaluate best programs (post-discovery)

uv run python run_best_programs_frontiercs.py

Analyze results

uv run python combine_results.py   # merge training/testing scores into CSV
uv run python analyze_results.py   # generate plots and statistics

Files

File Description
initial_program.cpp Seed C++ program
evaluator.py Evaluates C++ solutions via Frontier-CS docker judge
config.yaml Config with system prompt template
run_all_frontiercs.py Parallelizes evolution across all problems
run_best_programs_frontiercs.py Re-evaluates best programs after evolution
combine_results.py Combines training/testing scores into CSV
analyze_results.py Generates score analysis plots and statistics

Environment variables

Variable Default Description
OPENAI_API_KEY (required) API key
FRONTIER_CS_PROBLEM 0 Problem ID to evolve
JUDGE_URLS http://localhost:8081 Comma-separated judge server URLs