# Frontier-CS Benchmark

Evolves C++ solutions for [Frontier-CS](https://github.com/facebookresearch/Frontier-CS) algorithmic optimization problems using SkyDiscover.

## Setup

```bash
# 1. Clone Frontier-CS
cd benchmarks/frontier-cs-eval
git clone https://github.com/FrontierCS/Frontier-CS.git

# 2. Start the judge server (requires Docker)
cd Frontier-CS/algorithmic
docker compose up -d

# 3. Install dependencies (from project root)
cd ../../..
uv sync --extra frontier-cs

# 4. Set your API key
export OPENAI_API_KEY=...
```

## Run

Supported algorithms: `adaevolve`, `evox`, `openevolve`, `gepa`, `shinkaevolve`


Single problem:
```bash
cd benchmarks/frontier-cs-eval
FRONTIER_CS_PROBLEM=0 uv run skydiscover-run initial_program.cpp evaluator.py \
  -c config.yaml -s [search_algorithm] -i 50
```

All problems in parallel:
```bash
uv run python run_all_frontiercs.py --search [search_algorithm] --iterations 50 --workers 6
```

## Evaluate best programs (post-discovery)

```bash
uv run python run_best_programs_frontiercs.py
```

## Analyze results

```bash
uv run python combine_results.py   # merge training/testing scores into CSV
uv run python analyze_results.py   # generate plots and statistics
```

## Files

| File | Description |
|------|-------------|
| `initial_program.cpp` | Seed C++ program |
| `evaluator.py` | Evaluates C++ solutions via Frontier-CS docker judge |
| `config.yaml` | Config with system prompt template |
| `run_all_frontiercs.py` | Parallelizes evolution across all problems |
| `run_best_programs_frontiercs.py` | Re-evaluates best programs after evolution |
| `combine_results.py` | Combines training/testing scores into CSV |
| `analyze_results.py` | Generates score analysis plots and statistics |

## Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | (required) | API key |
| `FRONTIER_CS_PROBLEM` | `0` | Problem ID to evolve |
| `JUDGE_URLS` | `http://localhost:8081` | Comma-separated judge server URLs |