DFlash: Block Diffusion for Flash Speculative Decoding Paper • 2602.06036 • Published 2 days ago • 20
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation Paper • 2511.23127 • Published Nov 28, 2025 • 44
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference Paper • 2512.01031 • Published Nov 30, 2025 • 25
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Paper • 2511.13704 • Published Nov 17, 2025 • 43
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference Paper • 2511.10645 • Published Nov 13, 2025 • 5
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 14
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published Jun 19, 2025 • 16
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer Paper • 2303.17605 • Published Mar 30, 2023
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing Paper • 2005.14187 • Published May 28, 2020 • 2
MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models Paper • 2308.12963 • Published Aug 24, 2023
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 52
AMC: AutoML for Model Compression and Acceleration on Mobile Devices Paper • 1802.03494 • Published Feb 10, 2018
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy Paper • 2006.08509 • Published Jun 15, 2020