view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 352
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 487
view article Article Efficient LLM Pretraining: Packed Sequences and Masked Attention sirluk • Oct 7, 2024 • 71
view article Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 460
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 252