Julius-L's Collections

inference acceleration

updated Jun 3, 2025

  • SageAttention2++: A More Efficient Implementation of SageAttention2

    Paper • 2505.21136 • Published May 27, 2025 • 45 upvotes

  • SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

    Paper • 2505.11594 • Published May 16, 2025 • 75 upvotes