Lifting the Curse of Capacity Gap in Distilling Language Models
minimoe-6L-384H, distilled from bert-base-uncased on Wikipedia.
Repository: https://github.com/GeneZC/MiniMoE
arXiv: https://arxiv.org/abs/2305.12129