Kernels

Update build/torch-universal/triton_kernels/target_info.py

#7
by KernelMC - opened

If num_sms is called on a system running HIP, it currently returns None. However, the expression in the is_cuda() branch of this function, torch.cuda.get_device_properties(0).multi_processor_count, also works on a HIP system. This can be verified by evaluating the expression in a Docker container running rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1 on a system with a supported AMD GPU or APU. I therefore propose that this branch be taken if is_cuda() or is_hip().
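A minimal sketch of the proposed change, for illustration only: the is_cuda()/is_hip() helpers and the device-property lookup are stubbed here so the example runs without a GPU or a PyTorch install, but the branch logic matches the proposal above.

```python
# Hypothetical sketch of the proposed num_sms() change. The helper names
# mirror those in target_info.py, but their bodies are stubs standing in
# for the real backend checks.

def is_cuda() -> bool:
    # Stubbed as False to simulate a non-CUDA system.
    return False

def is_hip() -> bool:
    # Stubbed as True to simulate a ROCm system.
    return True

def multi_processor_count() -> int:
    # Stand-in for torch.cuda.get_device_properties(0).multi_processor_count,
    # which is populated on both CUDA and ROCm builds of PyTorch.
    return 104  # illustrative value only

def num_sms():
    # Proposed change: take this branch on CUDA *or* HIP, since the same
    # torch.cuda property is available on ROCm as well. Previously only
    # is_cuda() was checked, so HIP fell through and returned None.
    if is_cuda() or is_hip():
        return multi_processor_count()
    return None

print(num_sms())  # on the simulated ROCm system: 104, not None
```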

I just realized that @janantos opened a similar PR 5 months ago. I'm not sure it makes sense to add a separate branch to num_sms when the expression is identical in both cases. Either way, can we get one of these small changes merged to fix gpt-oss support on ROCm?

