OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization Paper • 2605.17757 • Published 8 days ago • 62
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models Paper • 2406.05223 • Published Jun 7, 2024 • 4
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design Paper • 2401.14112 • Published Jan 25, 2024 • 20