NVIDIA Nemotron v3 Collection Open, production-ready enterprise models • 12 items
Cerebras REAP Collection Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method • 30 items
Article MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression • Feb 4, 2025
Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Paper • 2412.02252 • Published Dec 3, 2024
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11, 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22, 2025
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on-device! See https://github.com/kyutai-labs/hibiki. • 7 items