Landmark Attention: Random-Access Infinite Context Length for Transformers (arXiv:2305.16300, published May 25, 2023)
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (arXiv:2210.17323, published Oct 31, 2022)