# Frontier-CS Benchmark
Evolves C++ solutions for [Frontier-CS](https://github.com/FrontierCS/Frontier-CS) algorithmic optimization problems using SkyDiscover.
## Setup
```bash
# 1. Clone Frontier-CS
cd benchmarks/frontier-cs-eval
git clone https://github.com/FrontierCS/Frontier-CS.git
# 2. Start the judge server (requires Docker)
cd Frontier-CS/algorithmic
docker compose up -d
# 3. Install dependencies (from project root)
cd ../../../..
uv sync --extra frontier-cs
# 4. Set your API key
export OPENAI_API_KEY=...
```
## Run
Supported algorithms: `adaevolve`, `evox`, `openevolve`, `gepa`, `shinkaevolve`. Substitute one of these names for `[search_algorithm]` in the commands below.
Single problem:
```bash
cd benchmarks/frontier-cs-eval
FRONTIER_CS_PROBLEM=0 uv run skydiscover-run initial_program.cpp evaluator.py \
-c config.yaml -s [search_algorithm] -i 50
```
All problems in parallel:
```bash
uv run python run_all_frontiercs.py --search [search_algorithm] --iterations 50 --workers 6
```
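The parallel run above can be pictured as one `skydiscover-run` invocation per problem, each with its own `FRONTIER_CS_PROBLEM` value, dispatched over a fixed pool of workers. The following is a minimal sketch of that idea and not the actual contents of `run_all_frontiercs.py`; the helper names `build_command`, `run_problem`, and `run_many` are hypothetical, and the command line simply mirrors the single-problem example.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Algorithm names listed in this README.
SUPPORTED = {"adaevolve", "evox", "openevolve", "gepa", "shinkaevolve"}


def build_command(search, iterations):
    """Build the single-problem command shown in this README (hypothetical helper)."""
    if search not in SUPPORTED:
        raise ValueError(f"unsupported search algorithm: {search}")
    return [
        "uv", "run", "skydiscover-run", "initial_program.cpp", "evaluator.py",
        "-c", "config.yaml", "-s", search, "-i", str(iterations),
    ]


def run_problem(problem_id, search, iterations):
    """Run one problem in a subprocess, selecting it via FRONTIER_CS_PROBLEM."""
    env = dict(os.environ, FRONTIER_CS_PROBLEM=str(problem_id))
    result = subprocess.run(build_command(search, iterations), env=env)
    return problem_id, result.returncode


def run_many(problem_ids, search, iterations=50, workers=6, runner=run_problem):
    """Dispatch problems over a thread pool; returns {problem_id: exit_code}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(runner, pid, search, iterations) for pid in problem_ids]
        return dict(f.result() for f in futures)
```

Threads are sufficient here because each worker only waits on a child process. The `runner` parameter exists so the dispatch logic can be exercised without launching `uv`.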
## Evaluate best programs (post-discovery)
```bash
uv run python run_best_programs_frontiercs.py
```
## Analyze results
```bash
uv run python combine_results.py # merge training/testing scores into CSV
uv run python analyze_results.py # generate plots and statistics
```
## Files
| File | Description |
|------|-------------|
| `initial_program.cpp` | Seed C++ program |
| `evaluator.py` | Evaluates C++ solutions via Frontier-CS docker judge |
| `config.yaml` | Config with system prompt template |
| `run_all_frontiercs.py` | Parallelizes evolution across all problems |
| `run_best_programs_frontiercs.py` | Re-evaluates best programs after evolution |
| `combine_results.py` | Combines training/testing scores into CSV |
| `analyze_results.py` | Generates score analysis plots and statistics |
## Environment variables
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | (required) | API key |
| `FRONTIER_CS_PROBLEM` | `0` | Problem ID to evolve |
| `JUDGE_URLS` | `http://localhost:8081` | Comma-separated judge server URLs |
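As an illustration of how these variables and their defaults fit together, here is a small sketch that reads them and splits `JUDGE_URLS` on commas. It is an assumption about how a script might consume the table above, not the code in `evaluator.py`; the function name `load_settings` and the returned dictionary keys are hypothetical.

```python
import os


def load_settings(env=None):
    """Read the variables from the table above, applying the listed defaults."""
    env = os.environ if env is None else env

    # OPENAI_API_KEY has no default, so fail early when it is missing.
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")

    # Kept as a string: this sketch does not assume problem IDs are integers.
    problem_id = env.get("FRONTIER_CS_PROBLEM", "0")

    # Comma-separated list; whitespace and empty entries are dropped.
    raw_urls = env.get("JUDGE_URLS", "http://localhost:8081")
    judge_urls = [u.strip() for u in raw_urls.split(",") if u.strip()]

    return {"api_key": api_key, "problem_id": problem_id, "judge_urls": judge_urls}
```

For example, setting `JUDGE_URLS=http://localhost:8081,http://localhost:8082` yields a two-element `judge_urls` list in this sketch.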