# Frontier-CS Benchmark

Evolves C++ solutions for [Frontier-CS](https://github.com/facebookresearch/Frontier-CS) algorithmic optimization problems using SkyDiscover.

## Setup

```bash
# 1. Clone Frontier-CS (from project root)
cd benchmarks/frontier-cs-eval
git clone https://github.com/FrontierCS/Frontier-CS.git

# 2. Start the judge server (requires Docker)
cd Frontier-CS/algorithmic
docker compose up -d

# 3. Install dependencies (back at the project root: four levels up from algorithmic/)
cd ../../../..
uv sync --extra frontier-cs

# 4. Set your API key
export OPENAI_API_KEY=...
```

## Run

Supported search algorithms: `adaevolve`, `evox`, `openevolve`, `gepa`, `shinkaevolve`. Pass one of them in place of `[search_algorithm]` below.

Single problem:

```bash
cd benchmarks/frontier-cs-eval
FRONTIER_CS_PROBLEM=0 uv run skydiscover-run initial_program.cpp evaluator.py \
  -c config.yaml -s [search_algorithm] -i 50
```

All problems in parallel:

```bash
uv run python run_all_frontiercs.py --search [search_algorithm] --iterations 50 --workers 6
```

## Evaluate best programs (post-discovery)

```bash
uv run python run_best_programs_frontiercs.py
```

## Analyze results

```bash
uv run python combine_results.py   # merge training/testing scores into CSV
uv run python analyze_results.py   # generate plots and statistics
```

## Files

| File | Description |
|------|-------------|
| `initial_program.cpp` | Seed C++ program |
| `evaluator.py` | Evaluates C++ solutions via the Frontier-CS Docker judge |
| `config.yaml` | Config with the system prompt template |
| `run_all_frontiercs.py` | Parallelizes evolution across all problems |
| `run_best_programs_frontiercs.py` | Re-evaluates the best programs after evolution |
| `combine_results.py` | Combines training/testing scores into a CSV |
| `analyze_results.py` | Generates score analysis plots and statistics |

## Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | (required) | API key |
| `FRONTIER_CS_PROBLEM` | `0` | Problem ID to evolve |
| `JUDGE_URLS` | `http://localhost:8081` | Comma-separated list of judge server URLs |
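The environment variables above can be read with the documented defaults roughly as follows. This is a minimal sketch, not the code in `evaluator.py`: the names `FrontierCSEnv`, `load_env`, and `judge_cycle` are hypothetical, and the round-robin spreading of work across several judges is an assumption about how multiple `JUDGE_URLS` entries might be used.

```python
import itertools
import os
from dataclasses import dataclass
from typing import Iterator, List, Mapping, Optional

# Defaults mirror the environment variables table (assumed to match evaluator.py).
DEFAULT_PROBLEM = "0"
DEFAULT_JUDGE_URLS = "http://localhost:8081"


@dataclass
class FrontierCSEnv:
    """Hypothetical container for the benchmark's environment settings."""
    problem_id: int
    judge_urls: List[str]


def load_env(environ: Optional[Mapping[str, str]] = None) -> FrontierCSEnv:
    """Read FRONTIER_CS_PROBLEM and JUDGE_URLS, falling back to the defaults."""
    env = os.environ if environ is None else environ
    problem_id = int(env.get("FRONTIER_CS_PROBLEM", DEFAULT_PROBLEM))
    raw = env.get("JUDGE_URLS", DEFAULT_JUDGE_URLS)
    # JUDGE_URLS is comma-separated; strip whitespace and drop empty entries.
    urls = [u.strip() for u in raw.split(",") if u.strip()]
    if not urls:
        raise ValueError("JUDGE_URLS must contain at least one URL")
    return FrontierCSEnv(problem_id=problem_id, judge_urls=urls)


def judge_cycle(urls: List[str]) -> Iterator[str]:
    """Hypothetical round-robin iterator: each call to next() picks the next judge."""
    return itertools.cycle(urls)


if __name__ == "__main__":
    cfg = load_env({"JUDGE_URLS": "http://localhost:8081, http://localhost:8082"})
    judges = judge_cycle(cfg.judge_urls)
    # Assign three submissions to judges in turn: 8081, 8082, 8081.
    for submission in range(3):
        print(submission, next(judges))
```

Round-robin is only one simple choice; with judges of uneven capacity, a queue that hands work to whichever judge is free would balance load better.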