DiRL: An Efficient Post-Training Framework for Diffusion Language Models Paper • 2512.22234 • Published 11 days ago • 18
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs Paper • 2512.07525 • Published 26 days ago • 55
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
Sparser Block-Sparse Attention via Token Permutation Paper • 2510.21270 • Published Oct 24, 2025 • 24
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2, 2025 • 69
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7, 2025 • 137