The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping Paper • 2604.11297 • Published 6 days ago • 135
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training Paper • 2603.16139 • Published Mar 17 • 32
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published Feb 13 • 44
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published Feb 9 • 159
DiRL: An Efficient Post-Training Framework for Diffusion Language Models Paper • 2512.22234 • Published Dec 23, 2025 • 22
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs Paper • 2512.07525 • Published Dec 8, 2025 • 60
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
Sparser Block-Sparse Attention via Token Permutation Paper • 2510.21270 • Published Oct 24, 2025 • 25
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2, 2025 • 69
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7, 2025 • 140