Updating to Pytorch 2.4

#5
by chultquist0 - opened
Haichen Wang Research Group org
No description provided.
chultquist0 changed pull request title from Updating to Pytorch 2.8 to Updating to Pytorch 2.4
Haichen Wang Research Group org

2.0 -> 2.1:

  • torch.sparse now includes prototype support for semi-structured (2:4) sparsity on NVIDIA® GPUs
  • New CPU performance features include inductor improvements (e.g. bfloat16 support and dynamic shapes), AVX512 kernel support, and scaled-dot-product-attention kernels
  • torch.compile can now compile NumPy operations by translating them into PyTorch-equivalent operations
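As a minimal sketch of the NumPy-compilation point above (function and array names are my own, not from the changelog), torch.compile can wrap a function written purely in NumPy and trace it into PyTorch ops:

```python
import numpy as np
import torch

# A plain NumPy function; since PyTorch 2.1, torch.compile can trace it
# and translate the NumPy calls into equivalent PyTorch operations.
def numpy_fn(x, y):
    return np.sum(x * y, axis=0)

compiled_fn = torch.compile(numpy_fn)

x = np.random.rand(4, 3).astype(np.float32)
y = np.random.rand(4, 3).astype(np.float32)

# The compiled version returns NumPy arrays and matches eager NumPy.
out = compiled_fn(x, y)
assert np.allclose(out, numpy_fn(x, y), atol=1e-5)
```

Note the compiled function still takes and returns NumPy arrays; the translation to PyTorch happens under the hood.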

2.1 -> 2.2:

  • scaled_dot_product_attention (SDPA) now supports FlashAttention-2, yielding around 2x speedups compared to previous versions
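For reference, SDPA is a drop-in fused attention call; it dispatches to the fastest available backend (FlashAttention-2 on supported GPUs since 2.2, a math fallback on CPU). A small shape-only sketch with made-up dimensions:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- illustrative sizes only.
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Backend selection (FlashAttention-2, memory-efficient, or math)
# is automatic based on hardware, dtype, and input shapes.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```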

2.2 -> 2.3:

  • Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions
  • Semi-structured sparsity is now implemented as a Tensor subclass, with observed speedups of up to 1.6x over dense matrix multiplication

2.3 -> 2.4:

  • Introduced a new default server backend for TCPStore built with libuv, which brings significantly lower initialization times and better scalability
  • PyTorch users can now see improved quality and performance with the beta BF16 symbolic shape support
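To illustrate the TCPStore point, here is a single-process sketch (the key names and the free-port helper are my own; in 2.4 the libuv server backend is the default, so no extra flag is needed):

```python
import socket
from datetime import timedelta

import torch.distributed as dist

# Grab a free TCP port so this sketch does not collide with other services.
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]

# Since PyTorch 2.4 the TCPStore server is backed by libuv by default,
# which is what lowers rendezvous/initialization time at large scale.
store = dist.TCPStore("127.0.0.1", port, world_size=1, is_master=True,
                      timeout=timedelta(seconds=30))

store.set("greeting", "hello")
print(store.get("greeting"))  # values come back as bytes
```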
Haichen Wang Research Group org

image.png

Based on a small setup with 4 training runs each for torch 2.0 and torch 2.4, training time decreased by about 8%.

chultquist0 changed pull request status to open
chultquist0 changed pull request status to merged
Haichen Wang Research Group org
edited Jul 18, 2025

image.png

Using 600k events total and a batch size of 1024, the speed-up is less dramatic but still noticeable (about 3%):

Torch 2.4
Batch: 809.5060 s
Train: 417.8602 s
Eval: 790.6683 s

Torch 2.0
Batch: 829.1964 s
Train: 432.0877 s
Eval: 811.7211 s
