Featured: Distilling 100B+ Models 40x Faster with TRL 📝
Marco-MoE Collection — A suite of multilingual MoE models with highly-sparse architectures • 5 items • Updated 29 days ago • 16