# Kernels
PyTorch operations are general-purpose. Hardware vendors and the community create specialized implementations that run faster on specific platforms. Installing these optimized kernels is a challenge because it requires matching compiler versions, CUDA toolkits, and platform-specific builds.

[Kernels](https://huggingface.co/docs/kernels/index) solves this by distributing precompiled binaries through the [Hub](https://huggingface.co/kernels-community). It detects your platform at runtime and loads the right binary automatically.

The following platforms are supported.

| Platform | Supported devices |
| :--- | :--- |
| NVIDIA GPUs (CUDA) | Modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell) |
| AMD GPUs (ROCm) | Compatible with ROCm-supported devices |
| Apple Silicon (Metal) | M-series chips (M1, M2, M3, M4 and newer) |
| Intel GPUs (XPU) | Intel Data Center GPU Max Series and compatible devices |

When `use_kernels=True` is set, Transformers identifies layers that have optimized kernel implementations available. It downloads and [caches](../installation#cache-directory) kernels from the Hub only when they are needed, which reduces startup time. Kernels accelerate compute-intensive operations such as attention, normalization, and fused operations.
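
As a minimal sketch of opting in, the helper below assumes that `use_kernels=True` is passed as a keyword argument to `from_pretrained`. The helper name `load_with_kernels` is hypothetical, and the model ID is left to the caller because this page does not name one.

```python
def load_with_kernels(model_id: str):
    """Load a causal language model with optimized Hub kernels enabled.

    Assumption: `use_kernels=True` is accepted by `from_pretrained`.
    """
    # Imported lazily so that defining this helper does not require Transformers.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, use_kernels=True)
```

Calling `load_with_kernels("<your-model-id>")` would then trigger the kernel lookup described above the first time a matching layer needs one.
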
Not all operations have kernel implementations. The library falls back to standard PyTorch when no kernel is available.
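
The fallback behavior can be pictured as a lookup with a default. The sketch below is purely illustrative and is not the library's implementation: the registry, `register_kernel`, `resolve`, and both ReLU functions are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

KernelFn = Callable[[List[float]], List[float]]

# Hypothetical registry: (operation name, device type) -> optimized implementation.
_REGISTRY: Dict[Tuple[str, str], KernelFn] = {}


def register_kernel(op: str, device: str, fn: KernelFn) -> None:
    """Record an optimized implementation for one operation on one device type."""
    _REGISTRY[(op, device)] = fn


def resolve(op: str, device: str, default: KernelFn) -> KernelFn:
    """Return the optimized kernel if one is registered, otherwise the default."""
    return _REGISTRY.get((op, device), default)


def reference_relu(xs: List[float]) -> List[float]:
    # Stand-in for the standard, general-purpose implementation.
    return [max(0.0, x) for x in xs]


def fast_relu(xs: List[float]) -> List[float]:
    # Stand-in for a platform-specific optimized kernel.
    return [x if x > 0 else 0.0 for x in xs]


register_kernel("relu", "cuda", fast_relu)

chosen = resolve("relu", "cuda", reference_relu)   # optimized kernel is registered
fallback = resolve("relu", "mps", reference_relu)  # nothing registered, so the default is used
```
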
## Determinism
Some kernels produce slightly different results than PyTorch because they reorder operations or use a different accumulation strategy. The results are functionally equivalent, but the differences can affect reproducibility.
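
The effect of reordering can be reproduced with plain Python floats, since floating-point addition is not associative. This is a small illustration of the general phenomenon, not a measurement of any particular kernel.

```python
import math

# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The results differ in the last bits, so exact comparison fails...
exactly_equal = left_to_right == right_to_left

# ...but they agree within a tolerance, which is what "functionally equivalent" means here.
close_enough = math.isclose(left_to_right, right_to_left)

print(exactly_equal, close_enough)
```

When comparing kernel output against a PyTorch reference, a tolerance-based comparison is therefore usually more meaningful than exact equality.
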
For deterministic behavior, try the following.
- Check kernel repository documentation for determinism guarantees. For example, the SDPA kernel in [gpt-oss-metal-kernels](https://huggingface.co/kernels-community/gpt-oss-metal-kernels#4-scaled-dot-product-attention-sdpa) matches the PyTorch implementation 97% of the time.
- Disable specific kernels that affect your use case.
- Set random seeds and PyTorch deterministic flags.
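
The last item can be sketched as a small helper. `seed_everything` is a hypothetical name, and the choice of flags is an assumption about what a typical setup needs. These settings only govern PyTorch's own operations, so a Hub kernel may still behave differently unless its documentation says otherwise.

```python
import os
import random


def seed_everything(seed: int = 0) -> None:
    """Seed Python's RNG and, if PyTorch is installed, request deterministic behavior."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed; only Python's RNG was seeded.

    # Some cuBLAS operations need this set before CUDA work starts
    # when deterministic algorithms are enabled.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    torch.manual_seed(seed)                   # seeds the PyTorch random number generators
    torch.use_deterministic_algorithms(True)  # error on known nondeterministic operations
    torch.backends.cudnn.benchmark = False    # avoid run-to-run algorithm selection


seed_everything(123)
first = random.random()
seed_everything(123)
second = random.random()  # same value as `first` because the seed was reset
```
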
## Resources
- [Loading kernels](./loading_kernels) guide to get started
- [Kernels](https://github.com/huggingface/kernels) GitHub repository
- [Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub](https://huggingface.co/blog/hello-hf-kernels) blog post
