UMoE: Unifying Attention and FFN with Shared Experts • Paper • 2505.07260 • Published May 12, 2025
The NLP Course is becoming the LLM Course • Article • burtenshaw, reach-vb, lewtun, fdaudens, pcuenq, tomaarsen, coyotte508, mishig, sergiopaniego, julien-c • Apr 3, 2025
How to train a new language model from scratch using Transformers and Tokenizers • Article • julien-c • Feb 14, 2020