Lifting the Curse of Capacity Gap in Distilling Language Models
minimoe-4L-384H (4 layers, hidden size 384) is distilled from bert-base-uncased on Wikipedia.
Repository: https://github.com/GeneZC/MiniMoE
arXiv: https://arxiv.org/abs/2305.12129