	# Kernels

	PyTorch operations are general-purpose. Hardware vendors and the community create specialized implementations that run faster on specific platforms. Installing these optimized kernels is a challenge because it requires matching compiler versions, CUDA toolkits, and platform-specific builds.

	| Platform | Supported devices |
	| :--- | :--- |
	| NVIDIA GPUs (CUDA) | Modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell) |
	| AMD GPUs (ROCm) | Compatible with ROCm-supported devices |
	| Apple Silicon (Metal) | M-series chips (M1, M2, M3, M4 and newer) |
	| Intel GPUs (XPU) | Intel Data Center GPU Max Series and compatible devices |

	[Kernels](https://huggingface.co/docs/kernels/index) solves this installation problem by distributing precompiled binaries through the [Hub](https://huggingface.co/kernels-community). It detects your platform at runtime and loads the right binary automatically.

	When you set `use_kernels=True`, Transformers identifies the layers that have an optimized kernel implementation available. To reduce startup time, it downloads and [caches](../installation#cache-directory) kernels from the Hub only when they are needed. Kernels accelerate compute-intensive operations such as attention and normalization, as well as fused operations.
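
As a minimal sketch, the flag can be passed when loading a model. This assumes `use_kernels` is accepted as a keyword argument by `from_pretrained` and that the `kernels` package is installed. The helper name `load_with_kernels` and the model ID in the usage comment are placeholders for illustration, not part of the library.

```python
def load_with_kernels(model_id: str, use_kernels: bool = True):
    """Load a causal LM, optionally asking Transformers to use Hub kernels.

    `load_with_kernels` is a hypothetical helper for illustration only.
    """
    # Imported lazily so the sketch can be imported without Transformers installed.
    from transformers import AutoModelForCausalLM

    # Assumption: `use_kernels` is forwarded as a `from_pretrained` keyword.
    return AutoModelForCausalLM.from_pretrained(model_id, use_kernels=use_kernels)


# Usage (downloads weights and kernels, so it is left commented out):
# model = load_with_kernels("your-org/your-model")
```

See the [Loading kernels](./loading_kernels) guide for the supported options.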

	Not all operations have kernel implementations. The library falls back to standard PyTorch when no kernel is available.
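
The fallback behavior can be pictured as a lookup with a default. The sketch below is purely conceptual: `KERNEL_REGISTRY`, `resolve`, and both `mean` functions are hypothetical names, and this is not how Transformers or Kernels implement dispatch internally.

```python
from typing import Callable, Dict, Sequence, Tuple

Op = Callable[[Sequence[float]], float]


def default_mean(values: Sequence[float]) -> float:
    """Stand-in for the standard (PyTorch) implementation."""
    return sum(values) / len(values)


def optimized_mean(values: Sequence[float]) -> float:
    """Stand-in for a platform-specific optimized kernel."""
    return sum(values) / len(values)


# Hypothetical registry: (operation name, platform) -> optimized implementation.
KERNEL_REGISTRY: Dict[Tuple[str, str], Op] = {
    ("mean", "cuda"): optimized_mean,
}


def resolve(op_name: str, platform: str, default: Op) -> Op:
    """Return the optimized kernel if one is registered, else the default."""
    return KERNEL_REGISTRY.get((op_name, platform), default)


# A kernel is registered for CUDA, so it is selected there.
cuda_op = resolve("mean", "cuda", default_mean)
# Nothing is registered for Metal here, so the default is used.
metal_op = resolve("mean", "metal", default_mean)
```

Either way the caller gets a working implementation, so a missing kernel affects speed rather than correctness.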

	## Determinism

	Some kernels produce slightly different results than PyTorch because they reorder operations or use a different accumulation strategy. The outputs remain functionally equivalent, but the numerical differences affect reproducibility.
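
The effect comes from floating-point addition not being associative. The following plain-Python sketch (no kernels involved; `sequential_sum` and `pairwise_sum` are illustrative names) sums the same values in two orders, roughly the way a sequential loop and a tree-style parallel reduction would.

```python
import math
from typing import Sequence


def sequential_sum(values: Sequence[float]) -> float:
    """Accumulate left to right, as a simple loop would."""
    total = 0.0
    for value in values:
        total += value
    return total


def pairwise_sum(values: Sequence[float]) -> float:
    """Accumulate by recursively splitting the input, as a tree reduction would."""
    if len(values) == 0:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


values = [0.1, 0.2, 0.3]
a = sequential_sum(values)  # computes (0.1 + 0.2) + 0.3
b = pairwise_sum(values)    # computes 0.1 + (0.2 + 0.3)

print(a == b)                              # False: the bits differ
print(math.isclose(a, b, rel_tol=1e-12))   # True: the values agree closely
```

For this reason, comparing kernel outputs against PyTorch with a tolerance is usually more meaningful than checking for exact equality.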

	For deterministic behavior, try the following.

	- Check kernel repository documentation for determinism guarantees. For example, the SDPA kernel in [gpt-oss-metal-kernels](https://huggingface.co/kernels-community/gpt-oss-metal-kernels#4-scaled-dot-product-attention-sdpa) matches the PyTorch implementation 97% of the time.
	- Disable specific kernels that affect your use case.
	- Set random seeds and PyTorch deterministic flags.
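
The last item can be sketched as follows. `seed_everything` is a hypothetical helper name; the calls it makes (`torch.manual_seed`, `torch.use_deterministic_algorithms`) are standard PyTorch APIs. Whether a given Hub kernel honors these settings is an assumption to verify against that kernel's documentation.

```python
import os
import random


def seed_everything(seed: int = 0) -> None:
    """Seed Python and, if available, PyTorch, and request deterministic ops.

    `seed_everything` is a hypothetical helper for illustration only.
    """
    # cuBLAS requires this to be set before the first CUDA call for
    # deterministic matrix multiplications on CUDA 10.2 and later.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)

    try:
        import torch
    except ImportError:
        # PyTorch is not installed; only Python's RNG was seeded.
        return

    # Seeds the CPU generator and the generators of all CUDA devices.
    torch.manual_seed(seed)
    # Makes PyTorch raise an error when an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)


seed_everything(0)
```

Call the helper once at the start of a script, before any model is loaded or any tensor is created.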

	## Resources

	- [Loading kernels](./loading_kernels) guide to get started
	- [Kernels](https://github.com/huggingface/kernels) GitHub repository
	- [Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub](https://huggingface.co/blog/hello-hf-kernels) blog post
