KernelBench Integration with SkyDiscover
GPU kernel optimization tasks using the KernelBench dataset and evaluation protocol.
Overview
The KernelBench integration allows you to run SkyDiscover on any problem from the KernelBench dataset. The framework automatically:
- Fetches the reference implementation of the target kernel from KernelBench
- Creates an initial_program.py with EVOLVE-BLOCK markers
- Configures the evaluator with problem-specific parameters
- Runs the optimization using either a containerized or native Python evaluator
The evaluator uses the KernelBench evaluation infrastructure to measure speedup over PyTorch eager execution.
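As a rough illustration of the second step above, the sketch below wraps a reference kernel's source in EVOLVE-BLOCK markers. The marker strings (`# EVOLVE-BLOCK-START` and `# EVOLVE-BLOCK-END`) and the helper name `wrap_in_evolve_block` are assumptions made for this example, not taken from the SkyDiscover source. Check a generated initial_program.py for the format the framework actually emits.

```python
# Hypothetical helper: the marker strings and function name are assumptions.
EVOLVE_START = "# EVOLVE-BLOCK-START"
EVOLVE_END = "# EVOLVE-BLOCK-END"


def wrap_in_evolve_block(reference_source: str) -> str:
    """Return the reference source surrounded by EVOLVE-BLOCK markers."""
    body = reference_source.strip("\n")
    return f"{EVOLVE_START}\n{body}\n{EVOLVE_END}\n"


if __name__ == "__main__":
    # Placeholder source; a real KernelBench problem defines a PyTorch module.
    reference = "def forward(x):\n    return x * 2\n"
    print(wrap_in_evolve_block(reference))
```

Only the region between the two markers is meant to be rewritten by the search; anything outside it would stay fixed across iterations.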
Evaluator Modes
- Containerized (Docker): Runs evaluation inside a Docker container (default)
- Native Python: Runs evaluation directly as Python code (for clusters without Docker/Podman)
Directory Structure
benchmarks/kernelbench/
├── config.yaml          # System prompt + search/evaluator settings
├── resolver.py          # Benchmark loader (fetches target problems from KernelBench)
├── requirements.txt     # Resolver dependencies (kernelbench library)
└── evaluator/           # Self-contained Docker benchmark
    ├── Dockerfile       # Container image definition
    ├── evaluate.sh      # Entrypoint (receives solution path)
    ├── evaluator.py     # Scoring logic using KernelBench
    ├── requirements.txt # Evaluator dependencies (kernelbench[gpu])
    └── wrapper.py       # JSON protocol wrapper
Note: The run_and_check.py script is downloaded directly from the KernelBench repository during Docker build (pinned to commit 423217d for reproducibility). To update, modify the KERNELBENCH_COMMIT build arg in the Dockerfile.
Installation
Before using the KernelBench integration, install the required dependencies:
# Install KernelBench library (required for problem fetching)
uv pip install -r benchmarks/kernelbench/requirements.txt
Note: The resolver (problem fetching) only needs the base kernelbench package. The containerized evaluator installs kernelbench[gpu] for GPU support.
Quick Start
Using Docker (Default)
Edit benchmarks/kernelbench/config.yaml to select a target kernel from the KernelBench database:
benchmark:
# KernelBench problem specification
level: 2 # Problem difficulty level (1, 2, 3 or 4)
problem_id: 5 # Specific problem ID within the level
Then, run optimization on this problem:
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
Using Native Python (No Docker Required)
For clusters without Docker/Podman privileges, you can run the evaluator as native Python code.
1. Install Dependencies
# Install KernelBench with GPU support
pip install -r benchmarks/kernelbench/evaluator/requirements.txt
2. Configure Native Mode
Edit benchmarks/kernelbench/config.yaml:
benchmark:
enabled: true
name: kernelbench
resolver: benchmarks.kernelbench.resolver
# Set to false to use native Python evaluator (no Docker)
use_docker: false
level: 2
problem_id: 11
# ... rest of config
3. Run Optimization
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
Note: The run_and_check.py script from KernelBench will be automatically downloaded on first run.
Note: No initial_program argument is needed - it is fetched automatically based on the benchmark section in config.yaml.
Configuration Reference
Benchmark Section
The benchmark section in config.yaml controls problem loading:
benchmark:
enabled: true # Enable benchmark loader
name: kernelbench # Benchmark name (for logging)
resolver: benchmarks.kernelbench.resolver # Python module path
# Evaluator mode
use_docker: true # true: containerized (Docker), false: native Python
# Problem specification
level: 1 # Difficulty: 1 (easy), 2 (medium), 3 (hard), 4 (very hard)
problem_id: 1 # Problem ID within the level
# Dataset source
dataset_src: huggingface # 'huggingface' or 'local'
dataset_name: ScalingIntelligence/KernelBench # HF dataset name
# Evaluation settings
eval_mode: local # 'local' or 'modal'
gpu: H100 # GPU type: H100, A100, etc.
num_correct_trials: 5 # Correctness validation runs
num_perf_trials: 100 # Performance measurement runs
Environment Variables
The resolver provides these environment variables to the evaluator:
- KERNELBENCH_LEVEL: Problem difficulty level (1, 2, 3, or 4)
- KERNELBENCH_PROBLEM_ID: Specific problem within the level
- KERNELBENCH_EVAL_MODE: Evaluation mode (local, modal)
- KERNELBENCH_GPU: GPU type (H100, A100, etc.)
- KERNELBENCH_NUM_CORRECT_TRIALS: Number of correctness validation runs
- KERNELBENCH_NUM_PERF_TRIALS: Number of performance measurement runs
- KERNELBENCH_TIMEOUT: Timeout per evaluation in seconds
These variables are passed directly to the evaluator (not set globally), ensuring isolation between concurrent runs.
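The isolation described above can be sketched with the standard library: build a per-run copy of the environment and hand it to the child process, leaving `os.environ` untouched. How SkyDiscover actually launches the evaluator is not shown here, so the helper name `build_evaluator_env`, the stand-in child process, and the timeout value are assumptions for illustration only.

```python
import os
import subprocess
import sys


def build_evaluator_env(settings: dict) -> dict:
    """Copy the current environment and add KERNELBENCH_* overrides.

    os.environ itself is never modified, so concurrent runs with different
    settings do not interfere with each other.
    """
    env = dict(os.environ)
    for key, value in settings.items():
        env[f"KERNELBENCH_{key.upper()}"] = str(value)
    return env


if __name__ == "__main__":
    # Keys mirror the benchmark section of config.yaml; the timeout value
    # is made up for illustration.
    settings = {"level": 2, "problem_id": 5, "eval_mode": "local",
                "gpu": "H100", "num_correct_trials": 5,
                "num_perf_trials": 100, "timeout": 600}
    env = build_evaluator_env(settings)
    # Stand-in child process that echoes one variable it received.
    child = "import os; print(os.environ['KERNELBENCH_LEVEL'])"
    result = subprocess.run([sys.executable, "-c", child], env=env,
                            capture_output=True, text=True, check=True)
    print(result.stdout.strip())
```

Because the overrides live only in the dictionary passed as `env`, two runs targeting different problems can execute side by side in the same parent process.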
Evaluation Modes
- local: Run evaluation on your local machine (requires GPU)
- modal: Run evaluation on Modal's cloud GPUs (requires Modal setup)
GPU Types
The list of currently supported GPU types can be found here.
Metrics
The evaluator returns:
- combined_score: Speedup over PyTorch eager execution (primary metric)
- speedup_over_eager: Same as combined_score
- speedup_over_compile: Speedup over torch.compile()
- kernel_time_ms: Execution time of optimized kernel
- ref_eager_time_ms: Reference eager execution time
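To make the relationship between these fields concrete, here is a minimal sketch that derives the speedups from raw timings. It assumes the usual definition of speedup (reference time divided by kernel time). The function name `compute_metrics` and the input `ref_compile_time_ms` are hypothetical, and the evaluator's real implementation may differ.

```python
def compute_metrics(kernel_time_ms: float, ref_eager_time_ms: float,
                    ref_compile_time_ms: float) -> dict:
    """Derive speedup metrics from timings (assumed formula: ref / kernel)."""
    if kernel_time_ms <= 0:
        raise ValueError("kernel_time_ms must be positive")
    speedup_over_eager = ref_eager_time_ms / kernel_time_ms
    return {
        "combined_score": speedup_over_eager,  # primary metric
        "speedup_over_eager": speedup_over_eager,
        "speedup_over_compile": ref_compile_time_ms / kernel_time_ms,
        "kernel_time_ms": kernel_time_ms,
        "ref_eager_time_ms": ref_eager_time_ms,
    }


if __name__ == "__main__":
    # Made-up timings: a 2.0 ms kernel against a 5.0 ms eager reference.
    print(compute_metrics(2.0, 5.0, 3.0))
```

Under this definition a combined_score above 1.0 means the optimized kernel is faster than PyTorch eager execution.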
Traditional Usage (Manual Initial Program)
You can still provide an initial program manually if needed:
# Run with explicit initial program
uv run skydiscover-run my_kernel.py benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo>
Troubleshooting
Error: "kernelbench package not found"
Install KernelBench:
pip install "kernelbench[gpu] @ git+https://github.com/ScalingIntelligence/KernelBench.git"
Error: "Failed to resolve benchmark problem"
Check that:
- benchmark.enabled is true in config
- level and problem_id are valid
- KernelBench package is installed
- You have internet access (for HuggingFace dataset)
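Most of this checklist can be automated before launching a run. The sketch below is a hypothetical preflight helper, not part of SkyDiscover. It assumes the package is importable under the module name `kernelbench` and that problem IDs are positive integers. It does not test network access, and it cannot tell whether a given problem_id exists within the chosen level.

```python
import importlib.util

VALID_LEVELS = {1, 2, 3, 4}  # levels listed in the configuration reference


def preflight(benchmark: dict) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if benchmark.get("enabled") is not True:
        problems.append("benchmark.enabled is not true")
    if benchmark.get("level") not in VALID_LEVELS:
        problems.append("level must be one of 1, 2, 3, 4")
    problem_id = benchmark.get("problem_id")
    if not isinstance(problem_id, int) or problem_id < 1:
        problems.append("problem_id must be a positive integer")
    # Assumed import name; adjust if the package exposes a different module.
    if importlib.util.find_spec("kernelbench") is None:
        problems.append("kernelbench package is not installed")
    return problems


if __name__ == "__main__":
    for issue in preflight({"enabled": True, "level": 2, "problem_id": 5}):
        print("preflight:", issue)
```

Pass it the benchmark section of config.yaml after loading it into a dictionary.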
Generated Files Location
The framework creates temporary files in /tmp/skydiscover_kernelbench_*/:
- initial_program.py: Generated initial program
- Evaluator uses the existing benchmarks/kernelbench/evaluator/ directory
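To inspect what was generated, a short script can glob for the temporary directories. This is a convenience sketch: the function name `find_generated_programs` is hypothetical, and only the path pattern comes from the description above.

```python
import glob
import os


def find_generated_programs(tmp_root: str = "/tmp") -> list:
    """Return paths to generated initial_program.py files, newest first."""
    pattern = os.path.join(tmp_root, "skydiscover_kernelbench_*",
                           "initial_program.py")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    for path in find_generated_programs():
        print(path)
```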