JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting Paper • 2606.18394 • Published 2 days ago • 26
Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 20, 2025 • 124
FastWan Collection: models trained with video sparse attention (https://arxiv.org/abs/2505.13389) and distillation • 6 items • Updated Mar 2 • 11