Detection Transformer for Teeth Detection, Segmentation, and Numbering in Oral Rare Diseases: Focus on Data Augmentation and Inpainting Techniques Paper • 2402.04408 • Published Feb 6, 2024 • 1
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference Paper • 2303.06182 • Published Mar 10, 2023 • 2
CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis Paper • 2508.02322 • Published Aug 4, 2025 • 1
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging Paper • 2506.23266 • Published Jun 29, 2025 • 1
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts Paper • 2404.05019 • Published Apr 7, 2024 • 2
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Paper • 2503.16057 • Published Mar 20, 2025 • 15
Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation Paper • 2407.07514 • Published Jul 10, 2024 • 1
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core Paper • 2504.14960 • Published Apr 21, 2025 • 2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Paper • 2405.11273 • Published May 18, 2024 • 20
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts Paper • 2503.05066 • Published Mar 7, 2025 • 5
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10, 2025 • 31
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training Paper • 2303.06318 • Published Mar 11, 2023 • 2
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Paper • 2506.18945 • Published Jun 23, 2025 • 42
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models Paper • 2507.17702 • Published Jul 23, 2025 • 7
Scalable Training of Mixture-of-Experts Models with Megatron Core Paper • 2603.07685 • Published Mar 8 • 3
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Paper • 2412.09278 • Published Dec 12, 2024 • 1
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning Paper • 2309.05444 • Published Sep 11, 2023 • 2