Article Who Routes LLM Routers? RouterArena: Building the Evaluation Foundation for LLM Routing JerryPotter • Nov 11, 2025 • 14
Article Transformers backend integration in SGLang +3 zhyncs, ispobock, lmzheng, JinnP, marcsun13 • Jun 23, 2025 • 56
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Paper • 2402.02750 • Published Feb 5, 2024 • 5
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Paper • 2505.22662 • Published May 28, 2025 • 6
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Paper • 2506.09501 • Published Jun 11, 2025 • 20
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20, 2025 • 77