Collections
Discover the best community collections!
Collections trending this week
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 27
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  Paper • 2308.12066 • Published • 4
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1
- EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
  Paper • 2112.14397 • Published • 1