Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation Paper • 2606.16429 • Published 18 days ago • 5 • 5
Zhongzhu/OSCAR-LLAMACPP-Qwen3-4B-Thinking-2507-INT2-KV Text Generation • 4B • Updated 24 days ago • 228
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization Paper • 2605.17757 • Published May 18 • 66
togethercomputer/CoderForge-Preview-32B-SWE-Bench-Verified-Evaluation-trajectories Viewer • Updated Feb 2 • 500 • 352 • 13
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models Paper • 2406.05223 • Published Jun 7, 2024 • 4