Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton

#10

by Cheshire94 - opened Dec 5, 2023

base: refs/heads/main ← from: refs/pr/10

Files changed: +151 / -3

Cheshire94

Dec 5, 2023

No description provided.

Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton (commit af642029)

Cheshire94 changed pull request title from pr/9 to pr/10 Dec 5, 2023

Cheshire94

Dec 5, 2023

This PR adds ApplyRoPE and RMSNorm kernels written in OpenAI Triton. Compared with the flash-attn implementations, these kernels offer better performance, support a wider range of GPU architectures (including V100 and T4), and require no pre-compilation. They are enabled automatically when Triton is installed (it is usually bundled with PyTorch 2.x).
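For reference, the two operations these kernels accelerate can be written out in plain Python. This is a CPU sketch under stated assumptions, not the Triton code from this PR: RoPE here uses the rotate-half (GPT-NeoX-style) pairing with base 10000, and RMSNorm uses eps=1e-6. `TRITON_AVAILABLE` and `select_backend` are hypothetical names that illustrate one way to enable the kernels only when Triton can be imported.

```python
import importlib.util
import math
from typing import List, Sequence

# Hypothetical flag: True when the `triton` package is importable.
# The PR only says the kernels turn on automatically if Triton is installed;
# checking importlib.util.find_spec is one plausible way to detect that.
TRITON_AVAILABLE = importlib.util.find_spec("triton") is not None


def select_backend() -> str:
    """Hypothetical dispatcher: prefer the Triton kernels when available."""
    return "triton" if TRITON_AVAILABLE else "reference"


def rms_norm_reference(x: Sequence[float], weight: Sequence[float],
                       eps: float = 1e-6) -> List[float]:
    """RMSNorm over one row: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def apply_rope_reference(x: Sequence[float], position: int,
                         base: float = 10000.0) -> List[float]:
    """Rotate one head vector by its position (rotate-half layout, assumed).

    Element i is paired with element i + d/2, and each pair is rotated
    in its 2-D plane by the angle position * base**(-2i/d).
    """
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out
```

Because each pair is only rotated, RoPE preserves the vector norm, and the dot product of a rotated query and key depends only on their position difference. A Triton version would typically apply the same math to whole tensors in a single kernel launch rather than through several separate PyTorch ops.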

Cheshire94 changed pull request title from pr/10 to Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton Dec 5, 2023

Cheshire94 changed pull request status to closed Dec 5, 2023
