| # ALE-Bench: AtCoder Heuristic Contest Benchmark |
|
|
| 10 problems from AtCoder Heuristic Contests (AHC), evaluated via the `ale_bench` package. Programs are written in C++ and scored on 50 public test cases during evolution. A separate private evaluator runs the full hidden test set for final ranking. |
|
|
| ## Problems |
|
|
| | Problem | Description | |
| |---------|-------------| |
| | `ahc008` | Pet partitioning: place walls to create pet-free areas on a 30×30 grid over 300 turns | |
| | `ahc011` | AtCoder Heuristic Contest 11 | |
| | `ahc015` | AtCoder Heuristic Contest 15 | |
| | `ahc016` | AtCoder Heuristic Contest 16 | |
| | `ahc024` | AtCoder Heuristic Contest 24 | |
| | `ahc025` | Balance weighing: use a balance scale to divide N items into D equal-weight sets using Q queries | |
| | `ahc026` | AtCoder Heuristic Contest 26 | |
| | `ahc027` | AtCoder Heuristic Contest 27 | |
| | `ahc039` | AtCoder Heuristic Contest 39 | |
| | `ahc046` | AtCoder Heuristic Contest 46 | |
|
|
| ## Quick Start |
|
|
| Run evolution on a single problem: |
|
|
| ```bash |
| uv run skydiscover-run \ |
| benchmarks/ale_bench/ale-bench-lite-problems/ahc025/initial_program.cpp \ |
| benchmarks/ale_bench/ale-bench-lite-problems/ahc025/evaluator.py \ |
| -c benchmarks/ale_bench/ale-bench-lite-problems/ahc025/config.yaml \ |
| --search evox \ |
| -i 100 |
| ``` |
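To run the same command for every problem in the table above, the invocation can be generated per problem ID. The following is a minimal sketch, assuming every problem directory follows the `ahcXXX/` layout described under Directory Structure; `build_command` and `PROBLEM_IDS` are illustrative names, not part of the repository. It only prints the commands and does not execute them.

```python
import shlex

# Problem IDs copied from the Problems table.
PROBLEM_IDS = [
    "ahc008", "ahc011", "ahc015", "ahc016", "ahc024",
    "ahc025", "ahc026", "ahc027", "ahc039", "ahc046",
]

# Assumption: all problems live under this directory with identical file names.
BASE_DIR = "benchmarks/ale_bench/ale-bench-lite-problems"


def build_command(problem_id, search="evox", iterations=100):
    """Return the Quick Start command for one problem as an argument list."""
    problem_dir = f"{BASE_DIR}/{problem_id}"
    return [
        "uv", "run", "skydiscover-run",
        f"{problem_dir}/initial_program.cpp",
        f"{problem_dir}/evaluator.py",
        "-c", f"{problem_dir}/config.yaml",
        "--search", search,
        "-i", str(iterations),
    ]


if __name__ == "__main__":
    # Print one shell-ready command per problem.
    for pid in PROBLEM_IDS:
        print(shlex.join(build_command(pid)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` if sequential execution is wanted.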
|
|
| ## Scoring |
|
|
| During evolution, each iteration runs the candidate program on 50 public test cases and reports a combined score: |
|
|
| ``` |
| combined_score = overall_absolute_score * optim_factor / num_public_cases |
| ``` |
|
|
| `optim_factor` is `+1` for maximize problems and `-1` for minimize problems (so `combined_score` is always higher-is-better). |
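The formula can be sketched in a few lines of Python. This is an illustration only, not the evaluator's actual code: it assumes `overall_absolute_score` is the total over all public cases (which the division by `num_public_cases` suggests), and the `maximize` flag is a hypothetical stand-in for however the evaluator knows the problem's direction.

```python
def combined_score(overall_absolute_score, maximize, num_public_cases=50):
    """Compute a higher-is-better score averaged over the public cases."""
    if num_public_cases <= 0:
        raise ValueError("num_public_cases must be positive")
    # +1 keeps the sign for maximize problems, -1 flips it for minimize problems.
    optim_factor = 1 if maximize else -1
    return overall_absolute_score * optim_factor / num_public_cases


# A minimize problem with a total of 5000 over 50 cases scores -100.0,
# so a lower raw total yields a higher combined score.
print(combined_score(5000, maximize=False))
print(combined_score(4000, maximize=False))
```

Under this reading, scores for minimize problems are negative, and a value closer to zero is better.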
|
|
| ## Private Evaluation |
|
|
| After evolution, evaluate the best program on the full private test set: |
|
|
| ```bash |
| python benchmarks/ale_bench/private_eval.py \ |
| --program-path path/to/best_program.cpp \ |
| --problem-id ahc025 |
| ``` |
|
|
| This runs 3 independent evaluations and reports the average private rank, performance score, and per-case pass/fail counts. |
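The averaging over runs can be pictured as follows. This is a hedged sketch, not the logic of `private_eval.py`: the `RunResult` record and its field names are hypothetical, and the actual script may aggregate its results differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """Hypothetical record for one private evaluation run."""
    rank: int
    performance: float
    passed: int
    failed: int


def summarize(runs):
    """Average rank and performance across runs; keep pass/fail counts per run."""
    if not runs:
        raise ValueError("at least one run is required")
    return {
        "avg_rank": mean(r.rank for r in runs),
        "avg_performance": mean(r.performance for r in runs),
        "pass_fail": [(r.passed, r.failed) for r in runs],
    }


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the summary.
    runs = [
        RunResult(rank=10, performance=1500.0, passed=100, failed=0),
        RunResult(rank=20, performance=1600.0, passed=99, failed=1),
        RunResult(rank=30, performance=1700.0, passed=100, failed=0),
    ]
    print(summarize(runs))
```

Averaging over several independent runs smooths out run-to-run variation, for example from randomized heuristics or timing differences.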
|
|
| ## Directory Structure |
|
|
| ``` |
| ale_bench/ |
| ├── ale-bench-lite-problems/ |
| │   └── ahcXXX/ |
| │       ├── initial_program.cpp   # Starting C++ solution |
| │       ├── evaluator.py          # Runs 50 public cases via ale_bench |
| │       └── config.yaml           # Search config (cpp, diff-based, 100 iterations) |
| ├── ale_agent_best/ |
| │   └── ahcXXX.cpp                # Best known solutions (reference) |
| └── private_eval.py               # Full private set evaluation + ranking |
| ``` |
|
|
| ## Requirements |
|
|
| Requires the `ale_bench` and `ale_bench_eval` packages. These are not installed by the default `uv sync`; install them separately per the ALE-Bench documentation. |
|
|
| ## Config Defaults |
|
|
| All problems share the same base config: |
|
|
| ```yaml |
| language: cpp |
| diff_based_evolution: true |
| max_iterations: 100 |
| max_solution_length: 60000 |
| evaluator: |
|   timeout: 10000 |
| ``` |
|
|