Article: KV Caching Explained: Optimizing Transformer Inference Efficiency (Jan 30, 2025)
Article: You could have designed state of the art positional encoding (Nov 25, 2024)
Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention (Oct 7, 2024)
Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv 2307.09288, published Jul 18, 2023)