# KernelBench Integration with SkyDiscover
GPU kernel optimization tasks using the [KernelBench](https://github.com/ScalingIntelligence/KernelBench) dataset and evaluation protocol.
## Overview
The KernelBench integration allows you to run SkyDiscover on any problem from the KernelBench dataset. The framework automatically:
1. Fetches the reference implementation of the target kernel from KernelBench
2. Creates an initial_program.py with EVOLVE-BLOCK markers
3. Configures the evaluator with problem-specific parameters
4. Runs the optimization using either a containerized or native Python evaluator
The evaluator uses the KernelBench evaluation infrastructure to measure speedup over PyTorch eager execution.
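The EVOLVE-BLOCK markers delimit the region of initial_program.py that the search algorithm is allowed to rewrite. A minimal sketch of how a resolver might wrap a fetched reference kernel (the function name and exact marker comments here are illustrative, not the framework's actual API):

```python
def wrap_with_evolve_block(reference_src: str) -> str:
    """Wrap a fetched reference kernel so only the marked region is evolved.

    The marker comments delimit the code the search algorithm may rewrite;
    everything outside them is kept verbatim across iterations.
    """
    return (
        "# EVOLVE-BLOCK-START\n"
        + reference_src.rstrip("\n")
        + "\n# EVOLVE-BLOCK-END\n"
    )

# Example: wrapping a trivial reference implementation
wrapped = wrap_with_evolve_block("def forward(x):\n    return x * 2\n")
```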
### Evaluator Modes
- **Containerized (Docker)**: Runs evaluation inside a Docker container (default)
- **Native Python**: Runs evaluation directly as Python code (for clusters without Docker/Podman)
## Directory Structure
```
benchmarks/kernelbench/
├── config.yaml          # System prompt + search/evaluator settings
├── resolver.py          # Benchmark loader (fetches target problems from KernelBench)
├── requirements.txt     # Resolver dependencies (kernelbench library)
└── evaluator/           # Self-contained Docker benchmark
    ├── Dockerfile       # Container image definition
    ├── evaluate.sh      # Entrypoint (receives solution path)
    ├── evaluator.py     # Scoring logic using KernelBench
    ├── requirements.txt # Evaluator dependencies (kernelbench[gpu])
    └── wrapper.py       # JSON protocol wrapper
```
**Note:** The `run_and_check.py` script is downloaded directly from the KernelBench repository during Docker build (pinned to commit `423217d` for reproducibility). To update, modify the `KERNELBENCH_COMMIT` build arg in the Dockerfile.
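The pinned-download idea can be sketched in Python; the URL follows GitHub's raw-content convention for the `scripts/run_and_check.py` path, and the helper function itself is illustrative:

```python
from urllib.parse import quote

KERNELBENCH_COMMIT = "423217d"  # pinned commit for reproducibility

def run_and_check_url(commit: str = KERNELBENCH_COMMIT) -> str:
    """Build the raw-content URL for the pinned run_and_check.py script."""
    return (
        "https://raw.githubusercontent.com/ScalingIntelligence/KernelBench/"
        f"{quote(commit)}/scripts/run_and_check.py"
    )

url = run_and_check_url()
```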
## Installation
Before using the KernelBench integration, install the required dependencies:
```bash
# Install KernelBench library (required for problem fetching)
uv pip install -r benchmarks/kernelbench/requirements.txt
```
**Note:** The resolver (problem fetching) only needs the base `kernelbench` package. The containerized evaluator installs `kernelbench[gpu]` for GPU support.
## Quick Start
### Using Docker (Default)
Edit `benchmarks/kernelbench/config.yaml` to select a target kernel from the [KernelBench database](https://huggingface.co/datasets/ScalingIntelligence/KernelBench):
```yaml
benchmark:
# KernelBench problem specification
level: 2 # Problem difficulty level (1, 2, 3 or 4)
problem_id: 5 # Specific problem ID within the level
```
Then, run optimization on this problem:
```bash
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
```
### Using Native Python (No Docker Required)
For clusters without Docker/Podman privileges, you can run the evaluator as native Python code.
#### 1. Install Dependencies
```bash
# Install KernelBench with GPU support
pip install -r benchmarks/kernelbench/evaluator/requirements.txt
```
#### 2. Configure Native Mode
Edit `benchmarks/kernelbench/config.yaml`:
```yaml
benchmark:
enabled: true
name: kernelbench
resolver: benchmarks.kernelbench.resolver
# Set to false to use native Python evaluator (no Docker)
use_docker: false
level: 2
problem_id: 11
# ... rest of config
```
#### 3. Run Optimization
```bash
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
```
**Note:** The `run_and_check.py` script from KernelBench will be automatically downloaded on first run.
**Note:** No initial_program argument is needed; it is generated automatically based on the `benchmark` section in config.yaml.
## Configuration Reference
### Benchmark Section
The `benchmark` section in `config.yaml` controls problem loading:
```yaml
benchmark:
enabled: true # Enable benchmark loader
name: kernelbench # Benchmark name (for logging)
resolver: benchmarks.kernelbench.resolver # Python module path
# Evaluator mode
use_docker: true # true: containerized (Docker), false: native Python
# Problem specification
level: 1 # Difficulty: 1 (easy), 2 (medium), 3 (hard), 4 (very hard)
problem_id: 1 # Problem ID within the level
# Dataset source
dataset_src: huggingface # 'huggingface' or 'local'
dataset_name: ScalingIntelligence/KernelBench # HF dataset name
# Evaluation settings
eval_mode: local # 'local' or 'modal'
gpu: H100 # GPU type: H100, A100, etc.
num_correct_trials: 5 # Correctness validation runs
num_perf_trials: 100 # Performance measurement runs
```
### Environment Variables
The resolver provides these environment variables to the evaluator:
- `KERNELBENCH_LEVEL`: Problem difficulty level (1, 2, 3, or 4)
- `KERNELBENCH_PROBLEM_ID`: Specific problem within the level
- `KERNELBENCH_EVAL_MODE`: Evaluation mode (local, modal)
- `KERNELBENCH_GPU`: GPU type (H100, A100, etc.)
- `KERNELBENCH_NUM_CORRECT_TRIALS`: Number of correctness validation runs
- `KERNELBENCH_NUM_PERF_TRIALS`: Number of performance measurement runs
- `KERNELBENCH_TIMEOUT`: Timeout per evaluation in seconds
These variables are passed directly to the evaluator (not set globally), ensuring isolation between concurrent runs.
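On the evaluator side, these variables can be read with plain `os.environ` lookups; a sketch of how an evaluator might collect them (the default values shown are illustrative, not the evaluator's actual defaults):

```python
import os

def read_kernelbench_settings(env=None):
    """Collect KernelBench settings from per-run environment variables.

    Accepts a mapping so the function is easy to test; falls back to
    os.environ when none is given. Defaults here are illustrative only.
    """
    if env is None:
        env = os.environ
    return {
        "level": int(env.get("KERNELBENCH_LEVEL", "1")),
        "problem_id": int(env.get("KERNELBENCH_PROBLEM_ID", "1")),
        "eval_mode": env.get("KERNELBENCH_EVAL_MODE", "local"),
        "gpu": env.get("KERNELBENCH_GPU", "H100"),
        "num_correct_trials": int(env.get("KERNELBENCH_NUM_CORRECT_TRIALS", "5")),
        "num_perf_trials": int(env.get("KERNELBENCH_NUM_PERF_TRIALS", "100")),
        "timeout": float(env.get("KERNELBENCH_TIMEOUT", "600")),
    }

settings = read_kernelbench_settings({"KERNELBENCH_LEVEL": "2", "KERNELBENCH_GPU": "A100"})
```

Passing the variables per process rather than exporting them globally is what keeps concurrent runs with different problem IDs from interfering with each other.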
### Evaluation Modes
- **local**: Run evaluation on your local machine (requires GPU)
- **modal**: Run evaluation on Modal's cloud GPUs (requires Modal setup)
### GPU Types
The list of currently supported GPU types can be found [here](https://github.com/ScalingIntelligence/KernelBench/blob/423217d9fda91e0c2d67e4a43bf62f96f6d104f1/scripts/run_and_check.py#L16).
## Metrics
The evaluator returns:
- **combined_score**: Speedup over PyTorch eager execution (primary metric)
- **speedup_over_eager**: Same as combined_score
- **speedup_over_compile**: Speedup over torch.compile()
- **kernel_time_ms**: Execution time of optimized kernel
- **ref_eager_time_ms**: Reference eager execution time
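The relationship between these metrics can be sketched as follows. The field names match the list above; the helper function and the `ref_compile_time_ms` parameter are illustrative, not the evaluator's actual code:

```python
def build_metrics(kernel_time_ms, ref_eager_time_ms, ref_compile_time_ms):
    """Derive the reported speedup metrics from raw timings.

    combined_score is the speedup over eager execution, so values > 1.0
    mean the evolved kernel is faster than the PyTorch reference.
    """
    speedup_eager = ref_eager_time_ms / kernel_time_ms
    return {
        "combined_score": speedup_eager,
        "speedup_over_eager": speedup_eager,
        "speedup_over_compile": ref_compile_time_ms / kernel_time_ms,
        "kernel_time_ms": kernel_time_ms,
        "ref_eager_time_ms": ref_eager_time_ms,
    }

# A kernel running in 0.5 ms against a 2.0 ms eager reference scores 4.0
metrics = build_metrics(kernel_time_ms=0.5, ref_eager_time_ms=2.0, ref_compile_time_ms=1.0)
```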
## Traditional Usage (Manual Initial Program)
You can still provide an initial program manually if needed:
```bash
# Run with explicit initial program
uv run skydiscover-run my_kernel.py benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo>
```
## Troubleshooting
### Error: "kernelbench package not found"
Install KernelBench:
```bash
pip install "kernelbench[gpu] @ git+https://github.com/ScalingIntelligence/KernelBench.git"
```
### Error: "Failed to resolve benchmark problem"
Check that:
1. `benchmark.enabled` is `true` in config
2. `level` and `problem_id` are valid
3. KernelBench package is installed
4. You have internet access (for HuggingFace dataset)
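A quick way to verify item 3 before launching a run is a small import probe (the helper name and package list are ours, not part of the framework):

```python
from importlib import util

def missing_packages(names=("kernelbench", "torch")):
    """Return the subset of required packages that cannot be imported."""
    return [name for name in names if util.find_spec(name) is None]

missing = missing_packages()
if missing:
    print(f"Install missing dependencies: {', '.join(missing)}")
```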
### Generated Files Location
The framework creates temporary files in `/tmp/skydiscover_kernelbench_*/`:
- `initial_program.py`: Generated initial program
- Evaluator uses the existing `benchmarks/kernelbench/evaluator/` directory