# Kernel requirements

Kernels on the Hub must fulfill the requirements outlined on this page. By
ensuring that kernels are compliant, they can be used on a wide range of Linux
systems and Torch builds.

[Join us on Discord](https://discord.gg/H6Tkmd88N3) for questions and discussions
about building kernels!
## Directory layout

A kernel repository on the Hub must contain a `build` directory. This
directory contains build variants of a kernel in the form of directories
following the template
`<framework><version>-cxx<abi>-<compute-framework><version>-<arch>-<os>`.
For example, `build/torch26-cxx98-cu118-x86_64-linux`.

The kernel code lives in the build variant directory, which must contain an
`__init__.py` file. For compatibility with older versions of the
`kernels` package, each variant directory must also contain a single
directory with the same name as the repository (replacing `-` by `_`).
For instance, kernels in the `kernels-community/activation` repository
have a directory like `build/<build variant>/activation`. This directory
must contain an `__init__.py` file that exports the same symbols as the
`__init__.py` in the build variant directory `build/<build variant>`.
[This example](https://huggingface.co/kernels-test/flattened-build/blob/main/build/torch-universal/flattened_build/__init__.py)
shows how this can be done. This compatibility directory is
created automatically by `kernel-builder`.
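As a sketch, a compiled build variant name like the one above can be decomposed with a regular expression. The pattern below is illustrative only; the authoritative set of valid components is defined by `kernel-builder`, and universal variants such as `torch-universal` follow a different scheme:

```python
import re

# Illustrative decomposition of a compiled build variant name such as
# "torch26-cxx98-cu118-x86_64-linux". Not the authoritative grammar.
VARIANT_RE = re.compile(
    r"^(?P<framework>[a-z]+\d+)"   # e.g. torch26
    r"-cxx(?P<abi>\d+)"            # e.g. cxx98 (C++ ABI)
    r"-(?P<compute>[a-z]+\d+)"     # e.g. cu118 (CUDA 11.8)
    r"-(?P<arch>[a-z0-9_]+)"       # e.g. x86_64
    r"-(?P<os>[a-z]+)$"            # e.g. linux
)

m = VARIANT_RE.match("torch26-cxx98-cu118-x86_64-linux")
assert m is not None
print(m.group("compute"))  # → cu118
```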
## Build variants

A kernel can be compliant for a specific compute framework (e.g. CUDA) or
architecture (e.g. x86_64). For compliance with a compute framework and
architecture combination, all the variants from the [build variant list](builder/build-variants)
must be available for that combination.
## Kernel metadata

The build variant directory can optionally contain a `metadata.json` file.
Currently the metadata specifies the kernel's version and Python dependencies,
for example:

```json
{
  "python-depends": ["einops"],
  "version": 1
}
```
### Python dependencies

You can specify Python dependencies that your kernel requires. Dependencies can be either general (required for all backends) or backend-specific (required only for certain compute backends such as CUDA, ROCm, XPU, Metal, or CPU).

#### General dependencies

For dependencies required regardless of the backend, use the `python-depends` field:

```json
{
  "python-depends": ["einops"]
}
```

#### Backend-specific dependencies

For dependencies that are only needed for specific backends, use the `python-depends-backends` field:

```json
{
  "python-depends-backends": {
    "cuda": ["nvidia-cutlass-dsl"],
    "xpu": ["onednn"]
  }
}
```

#### Combined example

You can specify both general and backend-specific dependencies:

```json
{
  "python-depends": ["einops"],
  "python-depends-backends": {
    "cuda": ["nvidia-cutlass-dsl"]
  },
  "version": 1
}
```

#### Allowed dependencies

The following dependencies are currently allowed:

**General dependencies:**

- `einops`

**Backend-specific dependencies:**

- CUDA: `nvidia-cutlass-dsl`
- XPU: `onednn`

Dependencies are validated based on the backend being used. When a kernel is loaded, only the dependencies relevant to the active backend are checked.
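To illustrate this behavior, the sketch below collects the dependency list that would be checked for a given backend. The `required_deps` helper is hypothetical and not part of the `kernels` package; it only mirrors the merge of general and backend-specific dependencies described above:

```python
import json

def required_deps(metadata: dict, backend: str) -> list:
    """Collect general plus backend-specific dependencies from a parsed
    metadata.json. Hypothetical helper; the real validation lives in
    the `kernels` package."""
    deps = list(metadata.get("python-depends", []))
    deps += metadata.get("python-depends-backends", {}).get(backend, [])
    return deps

metadata = json.loads("""
{
  "python-depends": ["einops"],
  "python-depends-backends": {"cuda": ["nvidia-cutlass-dsl"], "xpu": ["onednn"]},
  "version": 1
}
""")

# Only the active backend's extra dependencies are included.
print(required_deps(metadata, "cuda"))  # → ['einops', 'nvidia-cutlass-dsl']
print(required_deps(metadata, "cpu"))   # → ['einops']
```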
## Versioning

Kernels are versioned using a major version. The kernel revisions of a
version are stored in a branch of the form `v<version>`. Each build
variant also records the kernel version in `metadata.json`.

The version **must** be bumped in the following cases:

- The kernel API is changed in an incompatible way.
- The API is extended in a compatible way, but not all build variants
  receive the extension (e.g. because they are for older Torch versions
  that are no longer supported by `kernel-builder`).

In both cases, build variants that are not updated must be removed from
the new version's branch.
## Native Python module

Kernels will typically contain a native Python module with precompiled
compute kernels and bindings. This module must fulfill the requirements
outlined in this section. On all operating systems, a kernel must not
have dynamic library dependencies outside of:

- Torch;
- CUDA/ROCm libraries installed as dependencies of Torch.
## Compatibility with torch.compile

The Kernel Hub also encourages writing kernels in a `torch.compile`-compliant
way. This helps ensure that kernels work with `torch.compile` without
introducing graph breaks or triggering recompilation, both of which can limit
the benefits of compilation.
[Here](https://github.com/huggingface/kernels/blob/f83b4da6b7f6b171b47bb9bf96271ae2273bc9d3/builder/examples/relu-backprop-compile/tests/test_relu.py#L162)
is a simple test example that checks for graph breaks and
recompilation triggers during `torch.compile`.
### Linux

- Use [ABI3/Limited API](https://docs.python.org/3/c-api/stable.html#stable-application-binary-interface)
  for compatibility with Python 3.9 and later.
- Compatible with [`manylinux_2_28`](https://github.com/pypa/manylinux?tab=readme-ov-file#manylinux_2_28-almalinux-8-based).
  This means that the extension **must not** use symbol versions higher than:
  - GLIBC 2.28
  - GLIBCXX 3.4.24
  - CXXABI 1.3.11
  - GCC 7.0.0

These requirements can be checked with the ABI checker (see below).
### macOS

- Use [ABI3/Limited API](https://docs.python.org/3/c-api/stable.html#stable-application-binary-interface)
  for compatibility with Python 3.9 and later.
- macOS deployment target 15.0.
- Metal 3.0 (`-std=metal3.0`).

The ABI3 requirement can be checked with the ABI checker (see below).
### ABI checker

The manylinux_2_28 and Python ABI 3.9 version requirements can be checked with
[`kernel-abi-check`](https://crates.io/crates/kernel-abi-check):

```bash
$ cargo install kernel-abi-check
$ kernel-abi-check result/relu/_relu_e87e0ca_dirty.abi3.so
🐍 Checking for compatibility with manylinux_2_28 and Python ABI version 3.9
✅ No compatibility issues found
```
## Torch extension

Torch native extension functions must be [registered](https://pytorch.org/tutorials/advanced/cpp_custom_ops.html#cpp-custom-ops-tutorial)
in `torch.ops.<namespace>`. Since we allow loading multiple versions of
a module in the same Python process, `<namespace>` must be unique for each
version of a kernel. Failing to do so will create clashes when different
versions of the same kernel are loaded. Two suggested ways of doing this
are:

- Appending a truncated SHA-1 hash of the git commit that the kernel was
  built from to the name of the extension.
- Appending random material to the name of the extension.

**Note:** we recommend against appending a version number or git tag.
Version numbers are typically not bumped on each commit, so users
might use two different commits that happen to have the same version
number. Git tags are not stable, so they do not provide a good way
of guaranteeing uniqueness of the namespace.
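The first suggestion can be sketched as follows. `unique_namespace` is a hypothetical helper, shown only to illustrate deriving a stable namespace suffix from the build commit:

```python
import hashlib

def unique_namespace(base: str, commit: str) -> str:
    # Append a truncated SHA-1 of the git commit the kernel was built
    # from, so each built version registers its ops under a distinct
    # name. Hypothetical helper, not part of kernel-builder.
    return f"{base}_{hashlib.sha1(commit.encode()).hexdigest()[:8]}"

# Two builds from different commits get different op namespaces,
# so both can be loaded in the same Python process.
ns = unique_namespace("activation", "f83b4da6b7f6b171b47bb9bf96271ae2273bc9d3")
```

Because the suffix is derived from the commit hash, rebuilding the same commit reproduces the same namespace, while any new commit yields a new one.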
## Layers

A kernel can provide layers in addition to kernel functions. A layer from
the Hub can replace the `forward` method of an existing layer for a certain
device type. This makes it possible to provide more performant kernels for
existing layers. See the [layers documentation](layers) for more information
on how to use layers.
### Writing layers

To make the extension of layers safe, the layers must fulfill the following
requirements:

- The layers are subclasses of `torch.nn.Module`.
- The layers are pure, meaning that they do not have their own state. This
  means that:
  - The layer must not define its own constructor.
  - The layer must not use class variables.
- No methods other than `forward` may be defined.
- The `forward` method has a signature that is compatible with the
  `forward` method that it is extending.

There are two exceptions to the _no class variables_ rule:

1. The `has_backward` variable can be used to indicate whether the layer
   implements a backward pass (`True` when absent).
2. The `can_torch_compile` variable can be used to indicate whether the layer
   supports `torch.compile` (`False` when absent).
This is an example of a pure layer:

```python
class SiluAndMul(nn.Module):
    # This layer does not implement backward.
    has_backward: bool = False

    def forward(self, x: torch.Tensor):
        d = x.shape[-1] // 2
        output_shape = x.shape[:-1] + (d,)
        out = torch.empty(output_shape, dtype=x.dtype, device=x.device)
        ops.silu_and_mul(out, x)
        return out
```
For some layers, the `forward` method has to use state from the adopting class.
In these cases, we recommend using type annotations to indicate which member
variables are expected. For instance:

```python
class LlamaRMSNorm(nn.Module):
    weight: torch.Tensor
    variance_epsilon: float

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return rms_norm_fn(
            hidden_states,
            self.weight,
            bias=None,
            residual=None,
            eps=self.variance_epsilon,
            dropout_p=0.0,
            prenorm=False,
            residual_in_fp32=False,
        )
```

This layer expects the adopting layer to have `weight` and `variance_epsilon`
member variables and uses them in the `forward` method.
### Exporting layers

To accommodate portable loading, `layers` must be defined in the main
`__init__.py` file. For example:

```python
from . import layers

__all__ = [
    # ...
    "layers",
    # ...
]
```
## Python requirements

- Python code must be compatible with Python 3.9 and later.
- All imports from the kernel itself must be relative. For instance, if
  `module_b` in the example kernel `example` needs a function from
  `module_a`, import it as:

  ```python
  from .module_a import foo
  ```

  **Never use:**

  ```python
  # DO NOT DO THIS!
  from example.module_a import foo
  ```

  The latter would import from the module `example` that is in Python's
  global module dict. However, since we allow loading multiple versions
  of a module, we give each loaded module a unique name.
- Only modules from the Python standard library, Torch, or the kernel itself
  can be imported.