Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 5 days ago
ISO: Overlap of Computation and Communication within Seqenence For LLM Inference Paper • 2409.11155 • Published Sep 4, 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27, 2024
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding Paper • 2408.00264 • Published Aug 1, 2024
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge Paper • 2405.00263 • Published May 1, 2024