Squeezed Attention: Accelerating Long Context Length LLM Inference Paper • 2411.09688 • Published Nov 14, 2024
SPEED: Speculative Pipelined Execution for Efficient Decoding Paper • 2310.12072 • Published Oct 18, 2023