# KernelBench Integration with SkyDiscover
GPU kernel optimization tasks using the [KernelBench](https://github.com/ScalingIntelligence/KernelBench) dataset and evaluation protocol.
## Overview
The KernelBench integration allows you to run SkyDiscover on any problem from the KernelBench dataset. The framework automatically:
1. Fetches the reference implementation of the target kernel from KernelBench
2. Creates an `initial_program.py` with EVOLVE-BLOCK markers (sketched below)
3. Configures the evaluator with problem-specific parameters
4. Runs the optimization using either a containerized or native Python evaluator
The evaluator uses the KernelBench evaluation infrastructure to measure speedup over PyTorch eager execution.
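For illustration, the generated `initial_program.py` wraps the KernelBench reference model (a `Model` class plus `get_inputs()`/`get_init_inputs()` helpers, the standard KernelBench problem format) between EVOLVE-BLOCK markers so that the search only rewrites the marked region. A minimal sketch, assuming a simple comment-based marker syntax and a stand-in model:
```python
# Sketch of a generated initial_program.py (illustrative; the exact
# scaffolding and marker syntax come from SkyDiscover).
import torch
import torch.nn as nn

# EVOLVE-BLOCK-START
class Model(nn.Module):
    """Reference implementation fetched from KernelBench (stand-in here)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)  # candidate kernels replace this body
# EVOLVE-BLOCK-END

def get_inputs():
    # Random inputs used for correctness and timing trials
    return [torch.randn(4096, 4096)]

def get_init_inputs():
    # Arguments for Model.__init__
    return []
```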
### Evaluator Modes
- **Containerized (Docker)**: Runs evaluation inside a Docker container (default)
- **Native Python**: Runs evaluation directly as Python code (for clusters without Docker/Podman)
## Directory Structure
```
benchmarks/kernelbench/
β”œβ”€β”€ config.yaml # System prompt + search/evaluator settings
β”œβ”€β”€ resolver.py # Benchmark loader (fetches target problems from KernelBench)
β”œβ”€β”€ requirements.txt # Resolver dependencies (kernelbench library)
└── evaluator/ # Self-contained Docker benchmark
β”œβ”€β”€ Dockerfile # Container image definition
β”œβ”€β”€ evaluate.sh # Entrypoint (receives solution path)
β”œβ”€β”€ evaluator.py # Scoring logic using KernelBench
β”œβ”€β”€ requirements.txt # Evaluator dependencies (kernelbench[gpu])
└── wrapper.py # JSON protocol wrapper
```
**Note:** The `run_and_check.py` script is downloaded directly from the KernelBench repository during Docker build (pinned to commit `423217d` for reproducibility). To update, modify the `KERNELBENCH_COMMIT` build arg in the Dockerfile.
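For orientation, `wrapper.py` bridges the container and the framework: it takes the candidate solution path, delegates scoring to `evaluator.py`, and emits the metrics as JSON. A rough sketch, assuming a hypothetical `evaluate()` entry point and a stdout-based JSON contract (the actual protocol is defined by SkyDiscover):
```python
# Illustrative sketch of a JSON protocol wrapper (hypothetical names;
# the actual contract is defined by SkyDiscover's evaluator interface).
import json
import sys

from evaluator import evaluate  # hypothetical scoring entry point

def main() -> None:
    solution_path = sys.argv[1]        # path to the candidate program
    metrics = evaluate(solution_path)  # e.g. {"combined_score": 1.7, ...}
    print(json.dumps(metrics))         # one JSON object on stdout

if __name__ == "__main__":
    main()
```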
## Installation
Before using the KernelBench integration, install the required dependencies:
```bash
# Install KernelBench library (required for problem fetching)
uv pip install -r benchmarks/kernelbench/requirements.txt
```
**Note:** The resolver (problem fetching) only needs the base `kernelbench` package. The containerized evaluator installs `kernelbench[gpu]` for GPU support.
## Quick Start
### Using Docker (Default)
Edit `benchmarks/kernelbench/config.yaml` to select a target kernel from the [KernelBench database](https://huggingface.co/datasets/ScalingIntelligence/KernelBench):
```yaml
benchmark:
  # KernelBench problem specification
  level: 2        # Problem difficulty level (1, 2, 3, or 4)
  problem_id: 5   # Specific problem ID within the level
```
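Under the hood, the resolver looks up this problem in the Hugging Face dataset. A minimal sketch of that lookup, assuming the dataset's `level_N` splits and `problem_id`/`code` columns (the actual logic lives in `benchmarks/kernelbench/resolver.py`):
```python
# Minimal sketch of fetching a KernelBench problem from Hugging Face
# (assumes level_N splits with problem_id/code columns; the real
# logic lives in benchmarks/kernelbench/resolver.py).
from datasets import load_dataset

def fetch_problem(level: int, problem_id: int) -> str:
    dataset = load_dataset("ScalingIntelligence/KernelBench")
    split = dataset[f"level_{level}"]
    for row in split:
        if row["problem_id"] == problem_id:
            return row["code"]  # reference PyTorch implementation
    raise ValueError(f"No problem {problem_id} in level {level}")

reference_code = fetch_problem(level=2, problem_id=5)
```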
Then, run optimization on this problem:
```bash
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
```
### Using Native Python (No Docker Required)
For clusters without Docker/Podman privileges, you can run the evaluator as native Python code.
#### 1. Install Dependencies
```bash
# Install KernelBench with GPU support
pip install -r benchmarks/kernelbench/evaluator/requirements.txt
```
#### 2. Configure Native Mode
Edit `benchmarks/kernelbench/config.yaml`:
```yaml
benchmark:
  enabled: true
  name: kernelbench
  resolver: benchmarks.kernelbench.resolver
  # Set to false to use native Python evaluator (no Docker)
  use_docker: false
  level: 2
  problem_id: 11
  # ... rest of config
```
#### 3. Run Optimization
```bash
# algo can be "adaevolve", "evox", "topk", "beam_search", "best_of_n", etc.
uv run skydiscover-run benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo> \
--iterations 50
```
**Note:** The `run_and_check.py` script from KernelBench is downloaded automatically on first run.
**Note:** No `initial_program` argument is needed; the initial program is generated automatically from the `benchmark` section in `config.yaml`.
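As a hedged illustration of the first note, downloading `run_and_check.py` pinned to the same commit referenced by the Dockerfile amounts to something like the following (the framework's actual download location and caching may differ):
```python
# Illustrative one-time download of run_and_check.py pinned to a commit
# (the framework's actual download path and caching may differ).
import urllib.request
from pathlib import Path

COMMIT = "423217d9fda91e0c2d67e4a43bf62f96f6d104f1"  # pinned for reproducibility
URL = (
    "https://raw.githubusercontent.com/ScalingIntelligence/KernelBench/"
    f"{COMMIT}/scripts/run_and_check.py"
)

target = Path("run_and_check.py")
if not target.exists():  # only fetch on first run
    urllib.request.urlretrieve(URL, target)
```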
## Configuration Reference
### Benchmark Section
The `benchmark` section in `config.yaml` controls problem loading:
```yaml
benchmark:
  enabled: true                               # Enable benchmark loader
  name: kernelbench                           # Benchmark name (for logging)
  resolver: benchmarks.kernelbench.resolver   # Python module path

  # Evaluator mode
  use_docker: true          # true: containerized (Docker), false: native Python

  # Problem specification
  level: 1                  # Difficulty: 1 (easy), 2 (medium), 3 (hard), 4 (very hard)
  problem_id: 1             # Problem ID within the level

  # Dataset source
  dataset_src: huggingface                      # 'huggingface' or 'local'
  dataset_name: ScalingIntelligence/KernelBench # HF dataset name

  # Evaluation settings
  eval_mode: local          # 'local' or 'modal'
  gpu: H100                 # GPU type: H100, A100, etc.
  num_correct_trials: 5     # Correctness validation runs
  num_perf_trials: 100      # Performance measurement runs
```
### Environment Variables
The resolver provides these environment variables to the evaluator:
- `KERNELBENCH_LEVEL`: Problem difficulty level (1, 2, 3, or 4)
- `KERNELBENCH_PROBLEM_ID`: Specific problem within the level
- `KERNELBENCH_EVAL_MODE`: Evaluation mode (local, modal)
- `KERNELBENCH_GPU`: GPU type (H100, A100, etc.)
- `KERNELBENCH_NUM_CORRECT_TRIALS`: Number of correctness validation runs
- `KERNELBENCH_NUM_PERF_TRIALS`: Number of performance measurement runs
- `KERNELBENCH_TIMEOUT`: Timeout per evaluation in seconds
These variables are passed directly to the evaluator (not set globally), ensuring isolation between concurrent runs.
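For illustration, the evaluator side can read these variables with plain `os.environ` lookups; a minimal sketch (the defaults shown here are assumptions, and the real parsing lives in `evaluator.py`):
```python
# Sketch of reading the resolver-provided environment variables
# (illustrative defaults; actual parsing lives in evaluator.py).
import os

level = int(os.environ.get("KERNELBENCH_LEVEL", "1"))
problem_id = int(os.environ.get("KERNELBENCH_PROBLEM_ID", "1"))
eval_mode = os.environ.get("KERNELBENCH_EVAL_MODE", "local")
gpu = os.environ.get("KERNELBENCH_GPU", "H100")
num_correct_trials = int(os.environ.get("KERNELBENCH_NUM_CORRECT_TRIALS", "5"))
num_perf_trials = int(os.environ.get("KERNELBENCH_NUM_PERF_TRIALS", "100"))
timeout_s = int(os.environ.get("KERNELBENCH_TIMEOUT", "600"))  # assumed default
```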
### Evaluation Modes
- **local**: Run evaluation on your local machine (requires GPU)
- **modal**: Run evaluation on Modal's cloud GPUs (requires Modal setup)
### GPU Types
The list of currently supported GPU types can be found [here](https://github.com/ScalingIntelligence/KernelBench/blob/423217d9fda91e0c2d67e4a43bf62f96f6d104f1/scripts/run_and_check.py#L16).
## Metrics
The evaluator returns:
- **combined_score**: Speedup over PyTorch eager execution (primary metric)
- **speedup_over_eager**: Same as combined_score
- **speedup_over_compile**: Speedup over `torch.compile()`
- **kernel_time_ms**: Execution time of the optimized kernel
- **ref_eager_time_ms**: Reference eager execution time
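These metrics are related by a simple ratio: a speedup is the baseline time divided by the optimized kernel time. A small worked sketch with placeholder timings (the `ref_compile_time_ms` name is hypothetical and not part of the returned metrics):
```python
# How the reported metrics relate (timing values are placeholders).
ref_eager_time_ms = 3.20    # PyTorch eager baseline
ref_compile_time_ms = 2.40  # torch.compile baseline (hypothetical name)
kernel_time_ms = 1.60       # optimized kernel

metrics = {
    "combined_score": ref_eager_time_ms / kernel_time_ms,          # 2.0x
    "speedup_over_eager": ref_eager_time_ms / kernel_time_ms,      # same value
    "speedup_over_compile": ref_compile_time_ms / kernel_time_ms,  # 1.5x
    "kernel_time_ms": kernel_time_ms,
    "ref_eager_time_ms": ref_eager_time_ms,
}
```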
## Traditional Usage (Manual Initial Program)
You can still provide an initial program manually if needed:
```bash
# Run with explicit initial program
uv run skydiscover-run my_kernel.py benchmarks/kernelbench/evaluator/ \
-c benchmarks/kernelbench/config.yaml \
--search <algo>
```
## Troubleshooting
### Error: "kernelbench package not found"
Install KernelBench:
```bash
pip install "kernelbench[gpu] @ git+https://github.com/ScalingIntelligence/KernelBench.git"
```
### Error: "Failed to resolve benchmark problem"
Check that:
1. `benchmark.enabled` is `true` in config
2. `level` and `problem_id` are valid
3. KernelBench package is installed
4. You have internet access (for HuggingFace dataset)
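A quick sanity check for items 3 and 4, as a sketch (assuming `kernelbench` is also the import name of the installed package):
```python
# Sanity check: package installed and dataset reachable.
import importlib.util
from datasets import load_dataset

# Item 3: is the kernelbench package installed? (assumed module name)
print("kernelbench installed:", importlib.util.find_spec("kernelbench") is not None)

# Item 4: can we reach the HuggingFace dataset?
dataset = load_dataset("ScalingIntelligence/KernelBench")
print(dataset)  # should list the level_N splits
```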
### Generated Files Location
The framework creates temporary files in `/tmp/skydiscover_kernelbench_*/`:
- `initial_program.py`: the generated initial program

The evaluator itself uses the existing `benchmarks/kernelbench/evaluator/` directory.