# KernelBench Integration with SkyDiscover
GPU kernel optimization tasks using the [KernelBench](https://github.com/ScalingIntelligence/KernelBench) dataset and evaluation protocol.
## Overview
The KernelBench integration allows you to run SkyDiscover on any problem from the KernelBench dataset. The framework automatically:
1. Fetches the reference implementation of the target kernel from KernelBench
2. Creates an `initial_program.py` with EVOLVE-BLOCK markers (sketched below)
3. Configures the evaluator with problem-specific parameters
4. Runs the optimization using either a containerized or native Python evaluator
The evaluator uses the KernelBench evaluation infrastructure to measure speedup over PyTorch eager execution.
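For illustration, the generated `initial_program.py` wraps the KernelBench reference model (a `Model` class plus `get_inputs()`/`get_init_inputs()` helpers, the standard KernelBench problem format) between EVOLVE-BLOCK markers so that the search only rewrites the marked region. A minimal sketch, assuming a simple comment-based marker syntax and a stand-in model:
```python
# Sketch of a generated initial_program.py (illustrative; the exact
# scaffolding and marker syntax come from SkyDiscover).
import torch
import torch.nn as nn

# EVOLVE-BLOCK-START
class Model(nn.Module):
    """Reference implementation fetched from KernelBench (stand-in here)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)  # candidate kernels replace this body
# EVOLVE-BLOCK-END

def get_inputs():
    # Random inputs used for correctness and timing trials
    return [torch.randn(4096, 4096)]

def get_init_inputs():
    # Arguments for Model.__init__
    return []
```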
### Evaluator Modes
- **Containerized (Docker)**: Runs evaluation inside a Docker container (default)
- **Native Python**: Runs evaluation directly as Python code (for clusters without Docker/Podman)
## Directory Structure
```
benchmarks/kernelbench/
β”œβ”€β”€ config.yaml # System prompt + search/evaluator settings
β”œβ”€β”€ resolver.py # Benchmark loader (fetches target problems from KernelBench)
β”œβ”€β”€ requirements.txt # Resolver dependencies (kernelbench library)
└── evaluator/ # Self-contained Docker benchmark
β”œβ”€β”€ Dockerfile # Container image definition
β”œβ”€β”€ evaluate.sh # Entrypoint (receives solution path)
β”œβ”€β”€ evaluator.py # Scoring logic using KernelBench
β”œβ”€β”€ requirements.txt # Evaluator dependencies (kernelbench[gpu])
└── wrapper.py # JSON protocol wrapper
```
**Note:** The `run_and_check.py` script is downloaded directly from the KernelBench repository during Docker build (pinned to commit `423217d` for reproducibility). To update, modify the `KERNELBENCH_COMMIT` build arg in the Dockerfile.
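For orientation, `wrapper.py` bridges the container and the framework: it takes the candidate solution path, delegates scoring to `evaluator.py`, and emits the metrics as JSON. A rough sketch, assuming a hypothetical `evaluate()` entry point and a stdout-based JSON contract (the actual protocol is defined by SkyDiscover):
```python
# Illustrative sketch of a JSON protocol wrapper (hypothetical names;
# the actual contract is defined by SkyDiscover's evaluator interface).
import json
import sys

from evaluator import evaluate  # hypothetical scoring entry point

def main() -> None:
    solution_path = sys.argv[1]        # path to the candidate program
    metrics = evaluate(solution_path)  # e.g. {"combined_score": 1.7, ...}
    print(json.dumps(metrics))         # one JSON object on stdout

if __name__ == "__main__":
    main()
```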
## Installation
Before using the KernelBench integration, install the required dependencies:
```bash
# Install KernelBench library (required for problem fetching)
uv pip install -r benchmarks/kernelbench/requirements.txt
```
**Note:** The resolver (problem fetching) only needs the base `kernelbench` package. The containerized evaluator installs `kernelbench[gpu]` for GPU support.
## Quick Start
### Using Docker (Default)
Edit `benchmarks/kernelbench/config.yaml` to select a target kernel from the [KernelBench database](https://huggingface.co/datasets/ScalingIntelligence/KernelBench):
```yaml
benchmark:
  # KernelBench problem specification
  level: 2        # Problem difficulty level (1, 2, 3, or 4)
  problem_id: 5   # Specific problem ID within the level
```
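Under the hood, the resolver looks up this problem in the Hugging Face dataset. A minimal sketch of that lookup, assuming the dataset's `level_N` splits and `problem_id`/`code` columns (the actual logic lives in `benchmarks/kernelbench/resolver.py`):
```python
# Minimal sketch of fetching a KernelBench problem from Hugging Face
# (assumes level_N splits with problem_id/code columns; the real
# logic lives in benchmarks/kernelbench/resolver.py).
from datasets import load_dataset

def fetch_problem(level: int, problem_id: int) -> str:
    dataset = load_dataset("ScalingIntelligence/KernelBench")
    split = dataset[f"level_{level}"]
    for row in split:
        if row["problem_id"] == problem_id:
            return row["code"]  # reference PyTorch implementation
    raise ValueError(f"No problem {problem_id} in level {level}")

reference_code = fetch_problem(level=2, problem_id=5)
```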
Then, run optimization on this problem:
```bash
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
```
### Using Native Python (No Docker Required)
For clusters without Docker/Podman privileges, you can run the evaluator as native Python code.
#### 1. Install Dependencies
```bash
# Install KernelBench with GPU support
pip install -r benchmarks/kernelbench/evaluator/requirements.txt
```
#### 2. Configure Native Mode
Edit `benchmarks/kernelbench/config.yaml`:
```yaml
benchmark:
  enabled: true
  name: kernelbench
  resolver: benchmarks.kernelbench.resolver
  # Set to false to use native Python evaluator (no Docker)
  use_docker: false
  level: 2
  problem_id: 11
  # ... rest of config
```
#### 3. Run Optimization
```bash
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
```
**Note:** The `run_and_check.py` script from KernelBench is downloaded automatically on first run.
**Note:** No `initial_program` argument is needed; the initial program is generated automatically from the `benchmark` section in `config.yaml`.
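As a hedged illustration of the first note, downloading `run_and_check.py` pinned to the same commit referenced by the Dockerfile amounts to something like the following (the framework's actual download location and caching may differ):
```python
# Illustrative one-time download of run_and_check.py pinned to a commit
# (the framework's actual download path and caching may differ).
import urllib.request
from pathlib import Path

COMMIT = "423217d9fda91e0c2d67e4a43bf62f96f6d104f1"  # pinned for reproducibility
URL = (
    "https://raw.githubusercontent.com/ScalingIntelligence/KernelBench/"
    f"{COMMIT}/scripts/run_and_check.py"
)

target = Path("run_and_check.py")
if not target.exists():  # only fetch on first run
    urllib.request.urlretrieve(URL, target)
```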
## Configuration Reference
### Benchmark Section
The `benchmark` section in `config.yaml` controls problem loading:
```yaml
benchmark:
  enabled: true                               # Enable benchmark loader
  name: kernelbench                           # Benchmark name (for logging)
  resolver: benchmarks.kernelbench.resolver   # Python module path

  # Evaluator mode
  use_docker: true          # true: containerized (Docker), false: native Python

  # Problem specification
  level: 1                  # Difficulty: 1 (easy), 2 (medium), 3 (hard), 4 (very hard)
  problem_id: 1             # Problem ID within the level

  # Dataset source
  dataset_src: huggingface                      # 'huggingface' or 'local'
  dataset_name: ScalingIntelligence/KernelBench # HF dataset name

  # Evaluation settings
  eval_mode: local          # 'local' or 'modal'
  gpu: H100                 # GPU type: H100, A100, etc.
  num_correct_trials: 5     # Correctness validation runs
  num_perf_trials: 100      # Performance measurement runs
```
### Environment Variables
The resolver provides these environment variables to the evaluator:
- `KERNELBENCH_LEVEL`: Problem difficulty level (1, 2, 3, or 4)
- `KERNELBENCH_PROBLEM_ID`: Specific problem within the level
- `KERNELBENCH_EVAL_MODE`: Evaluation mode (local, modal)
- `KERNELBENCH_GPU`: GPU type (H100, A100, etc.)
- `KERNELBENCH_NUM_CORRECT_TRIALS`: Number of correctness validation runs
- `KERNELBENCH_NUM_PERF_TRIALS`: Number of performance measurement runs
- `KERNELBENCH_TIMEOUT`: Timeout per evaluation in seconds
These variables are passed directly to the evaluator (not set globally), ensuring isolation between concurrent runs.
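For illustration, the evaluator side can read these variables with plain `os.environ` lookups; a minimal sketch (the defaults shown here are assumptions, and the real parsing lives in `evaluator.py`):
```python
# Sketch of reading the resolver-provided environment variables
# (illustrative defaults; actual parsing lives in evaluator.py).
import os

level = int(os.environ.get("KERNELBENCH_LEVEL", "1"))
problem_id = int(os.environ.get("KERNELBENCH_PROBLEM_ID", "1"))
eval_mode = os.environ.get("KERNELBENCH_EVAL_MODE", "local")
gpu = os.environ.get("KERNELBENCH_GPU", "H100")
num_correct_trials = int(os.environ.get("KERNELBENCH_NUM_CORRECT_TRIALS", "5"))
num_perf_trials = int(os.environ.get("KERNELBENCH_NUM_PERF_TRIALS", "100"))
timeout_s = int(os.environ.get("KERNELBENCH_TIMEOUT", "600"))  # assumed default
```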
### Evaluation Modes
- **local**: Run evaluation on your local machine (requires GPU)
- **modal**: Run evaluation on Modal's cloud GPUs (requires Modal setup)
### GPU Types
The list of currently supported GPU types can be found [here](https://github.com/ScalingIntelligence/KernelBench/blob/423217d9fda91e0c2d67e4a43bf62f96f6d104f1/scripts/run_and_check.py#L16).
## Metrics
The evaluator returns:
- **combined_score**: Speedup over PyTorch eager execution (primary metric)
- **speedup_over_eager**: Same as combined_score
- **speedup_over_compile**: Speedup over `torch.compile()`
- **kernel_time_ms**: Execution time of the optimized kernel
- **ref_eager_time_ms**: Reference eager execution time
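These metrics are related by a simple ratio: a speedup is the baseline time divided by the optimized kernel time. A small worked sketch with placeholder timings (the `ref_compile_time_ms` name is hypothetical and not part of the returned metrics):
```python
# How the reported metrics relate (timing values are placeholders).
ref_eager_time_ms = 3.20    # PyTorch eager baseline
ref_compile_time_ms = 2.40  # torch.compile baseline (hypothetical name)
kernel_time_ms = 1.60       # optimized kernel

metrics = {
    "combined_score": ref_eager_time_ms / kernel_time_ms,          # 2.0x
    "speedup_over_eager": ref_eager_time_ms / kernel_time_ms,      # same value
    "speedup_over_compile": ref_compile_time_ms / kernel_time_ms,  # 1.5x
    "kernel_time_ms": kernel_time_ms,
    "ref_eager_time_ms": ref_eager_time_ms,
}
```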
## Traditional Usage (Manual Initial Program)
You can still provide an initial program manually if needed:
```bash
# Run with explicit initial program
uv run skydiscover-run my_kernel.py benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo>
```
## Troubleshooting
### Error: "kernelbench package not found"
Install KernelBench:
```bash
pip install "kernelbench[gpu] @ git+https://github.com/ScalingIntelligence/KernelBench.git"
```
### Error: "Failed to resolve benchmark problem"
Check that:
1. `benchmark.enabled` is `true` in config
2. `level` and `problem_id` are valid
3. KernelBench package is installed
4. You have internet access (for HuggingFace dataset)
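A quick sanity check for items 3 and 4, as a sketch (assuming `kernelbench` is also the import name of the installed package):
```python
# Sanity check: package installed and dataset reachable.
import importlib.util
from datasets import load_dataset

# Item 3: is the kernelbench package installed? (assumed module name)
print("kernelbench installed:", importlib.util.find_spec("kernelbench") is not None)

# Item 4: can we reach the HuggingFace dataset?
dataset = load_dataset("ScalingIntelligence/KernelBench")
print(dataset)  # should list the level_N splits
```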
### Generated Files Location
The framework creates temporary files in `/tmp/skydiscover_kernelbench_*/`:
- `initial_program.py`: the generated initial program

The evaluator itself uses the existing `benchmarks/kernelbench/evaluator/` directory.