Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton
#10
Opened by Cheshire94
Cheshire94 changed pull request title from pr/9 to pr/10
This PR adds ApplyRoPE and RMSNorm kernels written in OpenAI Triton. Compared with the flash-attn kernels, they offer better performance, support a wider range of GPU architectures (including V100 and T4), and require no pre-compilation. They are enabled automatically when Triton is installed (it is usually bundled with PyTorch 2.x).
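For reviewers who want to check kernel outputs on small inputs, here is a minimal pure-Python reference of what the two ops compute for a single row (RMSNorm) or a single head vector (RoPE). This is a sketch, not the PR's Triton code. The names `rms_norm_ref`, `rope_ref` and `HAS_TRITON`, the `eps` default, and the RoPE layout (non-interleaved "rotate half" pairing with base 10000) are assumptions. The model may use a different convention.

```python
import importlib.util
import math
from typing import List, Sequence

# Hypothetical flag: True if the `triton` package is importable. The PR says the
# Triton kernels are enabled automatically in that case; the dispatch is not shown.
HAS_TRITON = importlib.util.find_spec("triton") is not None


def rms_norm_ref(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm over one row: y_i = x_i / sqrt(mean(x^2) + eps) * w_i."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope_ref(x: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Reference rotary embedding for one head vector at position `pos`.

    Assumed layout: element i is paired with element i + d/2, and the pair is
    rotated by the angle pos * base**(-2i/d).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        # 2-D rotation of the pair (x1, x2) by theta
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


if __name__ == "__main__":
    print("Triton available:", HAS_TRITON)
    print(rms_norm_ref([3.0, 4.0], [1.0, 1.0], eps=0.0))
    print(rope_ref([1.0, 2.0, 3.0, 4.0], pos=1))
```

A fused kernel computes the same per-row or per-head result. The only difference is that it loads each row once and does the reduction and scaling in registers, which is where the speedup over separate PyTorch ops usually comes from.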
Cheshire94 changed pull request title from pr/10 to Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton
Cheshire94 changed pull request status to closed