Other LLM Related
Scaling Embeddings Outperforms Scaling Experts in Language Models • Paper • 2601.21204 • Published Jan 29 • 103
STEM: Scaling Transformers with Embedding Modules • Paper • 2601.10639 • Published Jan 15 • 2
dLLMs
Fast-dLLM v2: Efficient Block-Diffusion LLM • Paper • 2509.26328 • Published Sep 30, 2025 • 58
Attention Is All You Need for KV Cache in Diffusion LLMs • Paper • 2510.14973 • Published Oct 16, 2025 • 42
Attention Sinks in Diffusion Language Models • Paper • 2510.15731 • Published Oct 17, 2025 • 50
Diffusion Language Models are Super Data Learners • Paper • 2511.03276 • Published Nov 5, 2025 • 132