DashAttention Collection: 8B models for reproducing the DashAttention paper • 4 items • Updated about 6 hours ago
Inference-Time Hyper-Scaling with KV Cache Compression Paper • 2506.05345 • Published Jun 5, 2025
Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Paper • 2506.06006 • Published Jun 6, 2025
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning Paper • 2005.00333 • Published May 1, 2020