Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models Paper • 2401.07159 • Published Jan 13, 2024 • 1
KV-Distill: Nearly Lossless Learnable Context Compression for LLMs Paper • 2503.10337 • Published Mar 13, 2025 • 1
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference Paper • 2510.09665 • Published Oct 8, 2025 • 1
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 20, 2025 • 27
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer Paper • 2601.16515 • Published 4 days ago • 12
Unsloth Dynamic 2.0 Quants Collection New 2.0 version of our Dynamic GGUF + Quants. Dynamic 2.0 achieves superior accuracy & SOTA quantization performance. • 69 items • Updated about 2 hours ago • 321