# ALE-Bench: AtCoder Heuristic Contest Benchmark 10 problems from AtCoder Heuristic Contests (AHC), evaluated via the `ale_bench` package. Programs are written in C++ and scored on 50 public test cases during evolution. A separate private evaluator runs the full hidden test set for final ranking. ## Problems | Problem | Description | |---------|-------------| | `ahc008` | Pet partitioning — place walls to create pet-free areas on a 30×30 grid over 300 turns | | `ahc011` | AtCoder Heuristic Contest 11 | | `ahc015` | AtCoder Heuristic Contest 15 | | `ahc016` | AtCoder Heuristic Contest 16 | | `ahc024` | AtCoder Heuristic Contest 24 | | `ahc025` | Balance weighing — use a balance scale to divide N items into D equal-weight sets using Q queries | | `ahc026` | AtCoder Heuristic Contest 26 | | `ahc027` | AtCoder Heuristic Contest 27 | | `ahc039` | AtCoder Heuristic Contest 39 | | `ahc046` | AtCoder Heuristic Contest 46 | ## Quick Start Run evolution on a single problem: ```bash uv run skydiscover-run \ benchmarks/ale_bench/ale-bench-lite-problems/ahc025/initial_program.cpp \ benchmarks/ale_bench/ale-bench-lite-problems/ahc025/evaluator.py \ -c benchmarks/ale_bench/ale-bench-lite-problems/ahc025/config.yaml \ --search evox \ -i 100 ``` ## Scoring During evolution, each iteration runs 50 public test cases: ``` combined_score = overall_absolute_score * optim_factor / num_public_cases ``` `optim_factor` is `+1` for maximize problems and `-1` for minimize problems (so `combined_score` is always higher-is-better). ## Private Evaluation After evolution, evaluate the best program on the full private test set: ```bash python benchmarks/ale_bench/private_eval.py \ --program-path path/to/best_program.cpp \ --problem-id ahc025 ``` This runs 3 independent evaluations and reports the average private rank, performance score, and per-case pass/fail counts. ## Directory Structure ``` ale_bench/ ├── ale-bench-lite-problems/ │ └── ahcXXX/ │ ├── initial_program.cpp # Starting C++ solution │ ├── evaluator.py # Runs 50 public cases via ale_bench │ └── config.yaml # Search config (cpp, diff-based, 100 iterations) ├── ale_agent_best/ │ └── ahcXXX.cpp # Best known solutions (reference) └── private_eval.py # Full private set evaluation + ranking ``` ## Requirements Requires the `ale_bench` and `ale_bench_eval` packages. These are not in the default `uv sync` — install them separately per the ALE-Bench documentation. ## Config Defaults All problems share the same base config: ```yaml language: cpp diff_based_evolution: true max_iterations: 100 max_solution_length: 60000 evaluator: timeout: 10000 ```