
KernelBench Integration with SkyDiscover

GPU kernel optimization tasks using the KernelBench dataset and evaluation protocol.

Overview

The KernelBench integration allows you to run SkyDiscover on any problem from the KernelBench dataset. The framework automatically:

  1. Fetches the reference implementation of the target kernel from KernelBench
  2. Creates an initial_program.py with EVOLVE-BLOCK markers
  3. Configures the evaluator with problem-specific parameters
  4. Runs the optimization using either a containerized or native Python evaluator

The evaluator uses the KernelBench evaluation infrastructure to measure speedup over PyTorch eager execution.
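For illustration, the initial_program.py generated in step 2 wraps the reference model and marks the region the search is allowed to rewrite. This is a minimal sketch only; the exact marker spelling, class name, and helper functions are assumptions about the generated file, not its literal contents:

import torch
import torch.nn as nn

# EVOLVE-BLOCK-START
# Everything between the markers may be rewritten during the search;
# code outside the markers is kept fixed. (Marker syntax assumed.)
class ModelNew(nn.Module):
    def forward(self, x):
        # Starting point: a plain PyTorch eager implementation.
        return torch.relu(x)
# EVOLVE-BLOCK-END

def get_inputs():
    # Inputs the evaluator feeds to the model for correctness and
    # timing runs (helper name follows KernelBench's convention).
    return [torch.randn(1024, 1024)]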

Evaluator Modes

  • Containerized (Docker): Runs evaluation inside a Docker container (default)
  • Native Python: Runs evaluation directly as Python code (for clusters without Docker/Podman)

Directory Structure

benchmarks/kernelbench/
β”œβ”€β”€ config.yaml              # System prompt + search/evaluator settings
β”œβ”€β”€ resolver.py              # Benchmark loader (fetches target problems from KernelBench)
β”œβ”€β”€ requirements.txt         # Resolver dependencies (kernelbench library)
└── evaluator/               # Self-contained Docker benchmark
    β”œβ”€β”€ Dockerfile           # Container image definition
    β”œβ”€β”€ evaluate.sh          # Entrypoint (receives solution path)
    β”œβ”€β”€ evaluator.py         # Scoring logic using KernelBench
    β”œβ”€β”€ requirements.txt     # Evaluator dependencies (kernelbench[gpu])
    └── wrapper.py           # JSON protocol wrapper

Note: The run_and_check.py script is downloaded directly from the KernelBench repository during Docker build (pinned to commit 423217d for reproducibility). To update, modify the KERNELBENCH_COMMIT build arg in the Dockerfile.
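If you build the evaluator image manually, the pinned commit can also be overridden at build time instead of editing the Dockerfile. The image tag below is only illustrative:

# Build the evaluator image against a different KernelBench commit
# (tag name is an example, not a name the framework expects)
docker build --build-arg KERNELBENCH_COMMIT=<sha> \
  -t kernelbench-evaluator \
  benchmarks/kernelbench/evaluator/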

Installation

Before using the KernelBench integration, install the required dependencies:

# Install KernelBench library (required for problem fetching)
uv pip install -r benchmarks/kernelbench/requirements.txt

Note: The resolver (problem fetching) only needs the base kernelbench package. The containerized evaluator installs kernelbench[gpu] for GPU support.

Quick Start

Using Docker (Default)

Edit benchmarks/kernelbench/config.yaml to select a target kernel from the KernelBench dataset:

benchmark:
  # KernelBench problem specification
  level: 2                    # Problem difficulty level (1, 2, 3, or 4)
  problem_id: 5               # Specific problem ID within the level

Then, run optimization on this problem:

# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
  -c benchmarks/kernelbench/config.yaml \
  --search <algo> \
  --iterations 50

Using Native Python (No Docker Required)

For clusters without Docker/Podman privileges, you can run the evaluator as native Python code.

1. Install Dependencies

# Install KernelBench with GPU support
pip install -r benchmarks/kernelbench/evaluator/requirements.txt

2. Configure Native Mode

Edit benchmarks/kernelbench/config.yaml:

benchmark:
  enabled: true
  name: kernelbench
  resolver: benchmarks.kernelbench.resolver
  
  # Set to false to use native Python evaluator (no Docker)
  use_docker: false
  
  level: 2
  problem_id: 11
  # ... rest of config

3. Run Optimization

# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
  -c benchmarks/kernelbench/config.yaml \
  --search <algo> \
  --iterations 50

Note: The run_and_check.py script from KernelBench will be automatically downloaded on first run.

Note: No initial_program argument is needed; it is fetched automatically based on the benchmark section in config.yaml.

Configuration Reference

Benchmark Section

The benchmark section in config.yaml controls problem loading:

benchmark:
  enabled: true                    # Enable benchmark loader
  name: kernelbench                # Benchmark name (for logging)
  resolver: benchmarks.kernelbench.resolver  # Python module path
  
  # Evaluator mode
  use_docker: true                 # true: containerized (Docker), false: native Python
  
  # Problem specification
  level: 1                         # Difficulty: 1 (easy), 2 (medium), 3 (hard), 4 (very hard)
  problem_id: 1                    # Problem ID within the level
  
  # Dataset source
  dataset_src: huggingface         # 'huggingface' or 'local'
  dataset_name: ScalingIntelligence/KernelBench  # HF dataset name
  
  # Evaluation settings
  eval_mode: local                 # 'local' or 'modal'
  gpu: H100                        # GPU type: H100, A100, etc.
  num_correct_trials: 5            # Correctness validation runs
  num_perf_trials: 100             # Performance measurement runs
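To inspect a problem before running the framework, the reference implementation can be pulled straight from the HuggingFace dataset. The split and field names below ("level_2", "problem_id", "name", "code") are assumptions about the dataset layout, so treat this as a sketch:

from datasets import load_dataset

# Fetch all level-2 problems, then pick out problem_id 5.
ds = load_dataset("ScalingIntelligence/KernelBench", split="level_2")
problem = next(row for row in ds if row["problem_id"] == 5)
print(problem["name"])
print(problem["code"][:200])  # start of the reference PyTorch implementation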

Environment Variables

The resolver provides these environment variables to the evaluator:

  • KERNELBENCH_LEVEL: Problem difficulty level (1, 2, 3, or 4)
  • KERNELBENCH_PROBLEM_ID: Specific problem within the level
  • KERNELBENCH_EVAL_MODE: Evaluation mode (local, modal)
  • KERNELBENCH_GPU: GPU type (H100, A100, etc.)
  • KERNELBENCH_NUM_CORRECT_TRIALS: Number of correctness validation runs
  • KERNELBENCH_NUM_PERF_TRIALS: Number of performance measurement runs
  • KERNELBENCH_TIMEOUT: Timeout per evaluation in seconds

These variables are passed directly to the evaluator (not set globally), ensuring isolation between concurrent runs.
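Inside the evaluator, these variables can be read with plain os.environ lookups. A minimal sketch, assuming integer parsing and illustrative defaults (the framework's actual defaults may differ):

import os

# Problem specification and evaluation settings passed by the resolver.
# Fallback values here are illustrative assumptions only.
level = int(os.environ.get("KERNELBENCH_LEVEL", "1"))
problem_id = int(os.environ.get("KERNELBENCH_PROBLEM_ID", "1"))
eval_mode = os.environ.get("KERNELBENCH_EVAL_MODE", "local")
gpu = os.environ.get("KERNELBENCH_GPU", "H100")
num_correct_trials = int(os.environ.get("KERNELBENCH_NUM_CORRECT_TRIALS", "5"))
num_perf_trials = int(os.environ.get("KERNELBENCH_NUM_PERF_TRIALS", "100"))
timeout_s = int(os.environ.get("KERNELBENCH_TIMEOUT", "600"))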

Evaluation Modes

  • local: Run evaluation on your local machine (requires GPU)
  • modal: Run evaluation on Modal's cloud GPUs (requires Modal setup)

GPU Types

The list of currently supported GPU types can be found here.

Metrics

The evaluator returns:

  • combined_score: Speedup over PyTorch eager execution (primary metric)
  • speedup_over_eager: Same as combined_score
  • speedup_over_compile: Speedup over torch.compile()
  • kernel_time_ms: Execution time of optimized kernel
  • ref_eager_time_ms: Reference eager execution time
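As a concrete (hypothetical) example, a successful evaluation might return a result like the one below. The values are illustrative only; speedup over eager is typically the ratio ref_eager_time_ms / kernel_time_ms:

# Hypothetical result payload; all values are illustrative.
{
    "combined_score": 1.42,        # 1.21 ms / 0.85 ms, speedup over eager
    "speedup_over_eager": 1.42,    # same as combined_score
    "speedup_over_compile": 1.08,  # speedup over torch.compile()
    "kernel_time_ms": 0.85,        # optimized kernel execution time
    "ref_eager_time_ms": 1.21,     # reference eager execution time
}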

Traditional Usage (Manual Initial Program)

You can still provide an initial program manually if needed:

# Run with explicit initial program
uv run skydiscover-run my_kernel.py benchmarks/kernelbench/evaluator/ \
  -c benchmarks/kernelbench/config.yaml \
  --search <algo>

Troubleshooting

Error: "kernelbench package not found"

Install KernelBench:

pip install "kernelbench[gpu] @ git+https://github.com/ScalingIntelligence/KernelBench.git"

Error: "Failed to resolve benchmark problem"

Check that:

  1. benchmark.enabled is true in config
  2. level and problem_id are valid
  3. KernelBench package is installed
  4. You have internet access (for HuggingFace dataset)
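If those checks pass but the error persists, a quick sanity check from the shell can isolate which piece is failing. This assumes the package is importable as kernelbench and that the datasets library is installed; the split name is an assumption about the dataset layout:

# Check the kernelbench package is importable
python -c "import kernelbench; print('kernelbench OK')"

# Check the HuggingFace dataset is reachable (split name assumed)
python -c "from datasets import load_dataset; load_dataset('ScalingIntelligence/KernelBench', split='level_1'); print('dataset OK')"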

Generated Files Location

The framework creates temporary files in /tmp/skydiscover_kernelbench_*/:

  • initial_program.py: Generated initial program
  • The evaluator itself is not copied; runs use the existing benchmarks/kernelbench/evaluator/ directory