makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch Article • May 7, 2024 • 111
No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL Article • Jun 3 • 96
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 189