| # KernelBench Integration with SkyDiscover |
|
|
| GPU kernel optimization tasks using the [KernelBench](https://github.com/ScalingIntelligence/KernelBench) dataset and evaluation protocol. |
|
|
| ## Overview |
|
|
| The KernelBench integration allows you to run SkyDiscover on any problem from the KernelBench dataset. The framework automatically: |
|
|
| 1. Fetches the reference implementation of the target kernel from KernelBench |
| 2. Creates an initial_program.py with EVOLVE-BLOCK markers |
| 3. Configures the evaluator with problem-specific parameters |
| 4. Runs the optimization using either a containerized or native Python evaluator |
| |
| The evaluator uses the KernelBench evaluation infrastructure to measure speedup over PyTorch eager execution. |
| |
| ### Evaluator Modes |
| |
| - **Containerized (Docker)**: Runs evaluation inside a Docker container (default) |
| - **Native Python**: Runs evaluation directly as Python code (for clusters without Docker/Podman) |
| |
| ## Directory Structure |
| |
| ``` |
| benchmarks/kernelbench/ |
| βββ config.yaml # System prompt + search/evaluator settings |
| βββ resolver.py # Benchmark loader (fetches target problems from KernelBench) |
| βββ requirements.txt # Resolver dependencies (kernelbench library) |
| βββ evaluator/ # Self-contained Docker benchmark |
| βββ Dockerfile # Container image definition |
| βββ evaluate.sh # Entrypoint (receives solution path) |
| βββ evaluator.py # Scoring logic using KernelBench |
| βββ requirements.txt # Evaluator dependencies (kernelbench[gpu]) |
| βββ wrapper.py # JSON protocol wrapper |
| ``` |
| |
| **Note:** The `run_and_check.py` script is downloaded directly from the KernelBench repository during Docker build (pinned to commit `423217d` for reproducibility). To update, modify the `KERNELBENCH_COMMIT` build arg in the Dockerfile. |
|
|
| ## Installation |
|
|
| Before using the KernelBench integration, install the required dependencies: |
|
|
| ```bash |
| # Install KernelBench library (required for problem fetching) |
| uv pip install -r benchmarks/kernelbench/requirements.txt |
| ``` |
|
|
| **Note:** The resolver (problem fetching) only needs the base `kernelbench` package. The containerized evaluator installs `kernelbench[gpu]` for GPU support. |
|
|
| ## Quick Start |
|
|
| ### Using Docker (Default) |
|
|
| Edit `benchmarks/kernelbench/config.yaml` to select a target kernel from the [KernelBench database](https://huggingface.co/datasets/ScalingIntelligence/KernelBench): |
|
|
| ```yaml |
| benchmark: |
| # KernelBench problem specification |
| level: 2 # Problem difficulty level (1, 2, 3 or 4) |
| problem_id: 5 # Specific problem ID within the level |
| ``` |
|
|
| Then, run optimization on this problem: |
|
|
| ```bash |
| # algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc. |
| uv run skydiscover-run benchmarks/kernelbench/evaluator/ \ |
| -c benchmarks/kernelbench/config.yaml \ |
| --search <algo> \ |
| --iterations 50 |
| ``` |
|
|
| ### Using Native Python (No Docker Required) |
|
|
| For clusters without Docker/Podman privileges, you can run the evaluator as native Python code. |
|
|
| #### 1. Install Dependencies |
|
|
| ```bash |
| # Install KernelBench with GPU support |
| pip install -r benchmarks/kernelbench/evaluator/requirements.txt |
| ``` |
|
|
| #### 2. Configure Native Mode |
|
|
| Edit `benchmarks/kernelbench/config.yaml`: |
|
|
| ```yaml |
| benchmark: |
| enabled: true |
| name: kernelbench |
| resolver: benchmarks.kernelbench.resolver |
| |
| # Set to false to use native Python evaluator (no Docker) |
| use_docker: false |
| |
| level: 2 |
| problem_id: 11 |
| # ... rest of config |
| ``` |
|
|
| #### 3. Run Optimization |
|
|
| ```bash |
| # algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc. |
| uv run skydiscover-run benchmarks/kernelbench/evaluator/ \ |
| -c benchmarks/kernelbench/config.yaml \ |
| --search <algo> \ |
| --iterations 50 |
| ``` |
|
|
| **Note:** The `run_and_check.py` script from KernelBench will be automatically downloaded on first run. |
|
|
| **Note:** No initial_program argument is needed - it is fetched automatically based on the `benchmark` section in config.yaml. |
| |
| ## Configuration Reference |
| |
| ### Benchmark Section |
| |
| The `benchmark` section in `config.yaml` controls problem loading: |
| |
| ```yaml |
| benchmark: |
| enabled: true # Enable benchmark loader |
| name: kernelbench # Benchmark name (for logging) |
| resolver: benchmarks.kernelbench.resolver # Python module path |
| |
| # Evaluator mode |
| use_docker: true # true: containerized (Docker), false: native Python |
| |
| # Problem specification |
| level: 1 # Difficulty: 1 (easy), 2 (medium), 3 (hard), 4 (very hard) |
| problem_id: 1 # Problem ID within the level |
| |
| # Dataset source |
| dataset_src: huggingface # 'huggingface' or 'local' |
| dataset_name: ScalingIntelligence/KernelBench # HF dataset name |
| |
| # Evaluation settings |
| eval_mode: local # 'local' or 'modal' |
| gpu: H100 # GPU type: H100, A100, etc. |
| num_correct_trials: 5 # Correctness validation runs |
| num_perf_trials: 100 # Performance measurement runs |
| ``` |
| |
| ### Environment Variables |
| |
| The resolver provides these environment variables to the evaluator: |
| |
| - `KERNELBENCH_LEVEL`: Problem difficulty level (1, 2, or 3) |
| - `KERNELBENCH_PROBLEM_ID`: Specific problem within the level |
| - `KERNELBENCH_EVAL_MODE`: Evaluation mode (local, modal) |
| - `KERNELBENCH_GPU`: GPU type (H100, A100, etc.) |
| - `KERNELBENCH_NUM_CORRECT_TRIALS`: Number of correctness validation runs |
| - `KERNELBENCH_NUM_PERF_TRIALS`: Number of performance measurement runs |
| - `KERNELBENCH_TIMEOUT`: Timeout per evaluation in seconds |
| |
| These variables are passed directly to the evaluator (not set globally), ensuring isolation between concurrent runs. |
| |
| ### Evaluation Modes |
| |
| - **local**: Run evaluation on your local machine (requires GPU) |
| - **modal**: Run evaluation on Modal's cloud GPUs (requires Modal setup) |
| |
| ### GPU Types |
| |
| The list of currently supported GPU types can be found [here](https://github.com/ScalingIntelligence/KernelBench/blob/423217d9fda91e0c2d67e4a43bf62f96f6d104f1/scripts/run_and_check.py#L16). |
| |
| ## Metrics |
| |
| The evaluator returns: |
| |
| - **combined_score**: Speedup over PyTorch eager execution (primary metric) |
| - **speedup_over_eager**: Same as combined_score |
| - **speedup_over_compile**: Speedup over torch.compile() |
| - **kernel_time_ms**: Execution time of optimized kernel |
| - **ref_eager_time_ms**: Reference eager execution time |
| |
| |
| ## Traditional Usage (Manual Initial Program) |
| |
| You can still provide an initial program manually if needed: |
| |
| ```bash |
| # Run with explicit initial program |
| uv run skydiscover-run my_kernel.py benchmarks/kernelbench/evaluator/ \ |
| -c benchmarks/kernelbench/config.yaml \ |
| --search <algo> |
| ``` |
| |
| ## Troubleshooting |
| |
| ### Error: "kernelbench package not found" |
| |
| Install KernelBench: |
| ```bash |
| pip install "kernelbench[gpu] @ git+https://github.com/ScalingIntelligence/KernelBench.git" |
| ``` |
| |
| ### Error: "Failed to resolve benchmark problem" |
| |
| Check that: |
| 1. `benchmark.enabled` is `true` in config |
| 2. `level` and `problem_id` are valid |
| 3. KernelBench package is installed |
| 4. You have internet access (for HuggingFace dataset) |
|
|
| ### Generated Files Location |
|
|
| The framework creates temporary files in `/tmp/skydiscover_kernelbench_*/`: |
| - `initial_program.py`: Generated initial program |
| - Evaluator uses the existing `benchmarks/kernelbench/evaluator/` directory |
|
|