Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton

by wangzihan99 - opened Dec 4, 2023

base: refs/heads/main ← from: refs/pr/8


+152 -4

wangzihan99

Dec 4, 2023

No description provided.

Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton. (1b59f635)

wangzihan99

Dec 4, 2023

This PR adds ApplyRoPE and RMSNorm kernels written in OpenAI Triton. Compared with flash-attn, these kernels offer better performance, support a wider range of GPU architectures (including V100 and T4), and require no pre-compilation. They are enabled automatically if Triton is installed (it is usually bundled with PyTorch 2.x).
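For readers who want a concrete picture of what the two kernels compute and how "enabled automatically" could work, here is a minimal pure-Python sketch. It is not the code from this PR: the names `TRITON_AVAILABLE`, `select_backend`, `rms_norm_reference`, and `apply_rope_reference` are hypothetical, the `eps` and `base` defaults are illustrative, and the "rotate half" pairing used for RoPE is an assumption about the convention in use.

```python
import importlib.util
import math

# Hypothetical flag: True when the `triton` package can be found.
# The PR's actual detection logic is not shown in this thread.
TRITON_AVAILABLE = importlib.util.find_spec("triton") is not None


def select_backend():
    """Choose the fused path only when Triton is importable."""
    return "triton" if TRITON_AVAILABLE else "reference"


def rms_norm_reference(x, weight, eps=1e-6):
    """RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def apply_rope_reference(x, position, base=10000.0):
    """Rotary position embedding for one even-length vector.

    Assumes the "rotate half" convention: element i is paired with
    element i + d/2 and the pair is rotated by position * base^(-2i/d).
    """
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out
```

A fused Triton kernel would compute the same quantities per row on the GPU; the reference functions above are only useful for checking outputs on small inputs.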

wangzihan99 changed pull request status to open Dec 4, 2023

Merge branch 'dev_triton' of https://huggingface.co/Qwen/Qwen-7B-Chat-Int4 into pr/8 (4690395f)

wangzihan99 changed pull request status to closed Dec 5, 2023
