DFlash Collection Block Diffusion for Flash Speculative Decoding • 4 items • Updated about 2 hours ago • 12
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference Paper • 2512.01031 • Published Nov 30, 2025 • 25
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference Paper • 2511.10645 • Published Nov 13, 2025 • 4
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 179
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17, 2025 • 91
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published Jun 19, 2025 • 16
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28, 2025 • 45
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6, 2025 • 96
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 74
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 89