Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled Image-Text-to-Text • 28B • Updated 9 days ago • 353k downloads • 1.97k likes
LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models Paper • 2405.18377 • Published May 28, 2024 • 21 upvotes
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer Paper • 2503.02495 • Published Mar 4, 2025 • 9 upvotes
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 133 upvotes
What is MoE 2.0? Update Your Knowledge about Mixture-of-Experts Article • Published Apr 27, 2025 • 10 upvotes
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text Generation • 33B • Updated Feb 24, 2025 • 914k downloads • 1.53k likes
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1, 2025 • 42 upvotes
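Each model entry above is a Hub repository ID and can be loaded directly. As a minimal sketch, assuming the standard transformers AutoModel API (the dtype and device_map settings here are illustrative choices, and a 32B checkpoint needs substantial GPU memory):

```python
# Minimal sketch: loading one of the listed checkpoints from the Hugging Face Hub.
# Assumes the standard transformers API; precision and device placement are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"  # listed above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce the memory footprint
    device_map="auto",           # shard the weights across available GPUs
)

prompt = "Explain the intuition behind distilling reasoning ability into a smaller model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern applies to the other listed models, swapping in their repository IDs; only the task head (e.g. image-text-to-text for the Jackrong entry) and memory requirements differ.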