makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch • AviSoori1x • May 7, 2024
Distributed Training with JAX and Flax NNX: A Practical Guide to Sharding • jiagaoxiang • Mar 26, 2025