SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification Paper • 2305.09781 • Published May 16, 2023 • 4
GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism Paper • 2308.10087 • Published Aug 19, 2023 • 1
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding Paper • 2402.12374 • Published Feb 19, 2024 • 4
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding Paper • 2404.11912 • Published Apr 18, 2024 • 17
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Paper • 2406.02532 • Published Jun 4, 2024 • 13
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 14
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training Paper • 2407.15892 • Published Jul 22, 2024
Sirius: Contextual Sparsity with Correction for Efficient LLMs Paper • 2409.03856 • Published Sep 5, 2024 • 1
CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models Paper • 2502.00433 • Published Feb 1, 2025
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Paper • 2502.12574 • Published Feb 18, 2025 • 13
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding Paper • 2502.05431 • Published Feb 8, 2025 • 6
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference Paper • 2402.09398 • Published Feb 14, 2024
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation Paper • 2205.13542 • Published May 26, 2022
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF Paper • 2406.07971 • Published Jun 12, 2024
S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity Paper • 2412.06289 • Published Dec 9, 2024