Update README.md
README.md
CHANGED
@@ -136,11 +136,6 @@ A model to test how MoE will route without square expansion.
### "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"

-<a href='https://ko-fi.com/S6S2UH2TC' target='_blank'><img height='38' style='border:0px;height:36px;' src='https://storage.ko-fi.com/cdn/kofi1.png?v=3' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
-<a href='https://discord.gg/KFS229xD' target='_blank'><img width='140' height='500' style='border:0px;height:36px;' src='https://i.ibb.co/tqwznYM/Discord-button.png' border='0' alt='Join Our Discord!' /></a>
-
-### (from the MistralAI papers...click the quoted question above to navigate to it directly.)
-
The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.

Mixture of Experts enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should achieve the same quality as its dense counterpart much faster during pretraining.
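The context line above mentions testing "how MoE will route." For readers new to the idea, below is a minimal, hypothetical NumPy sketch of top-k expert routing as described in the linked blog post: a learned gate scores each token against every expert and only the k highest-scoring experts run, which is why a MoE model carries many more parameters than it spends compute on per token. All names and sizes are illustrative; this is not this repository's or Mistral's implementation.

```python
# Minimal top-k MoE routing sketch (illustrative only, not this repo's code).
import numpy as np

rng = np.random.default_rng(0)
num_tokens, hidden, num_experts, top_k = 4, 8, 4, 2

tokens = rng.standard_normal((num_tokens, hidden))
gate_w = rng.standard_normal((hidden, num_experts))           # router (gate) weights
experts = rng.standard_normal((num_experts, hidden, hidden))  # each expert reduced to one matrix for brevity

logits = tokens @ gate_w                          # (num_tokens, num_experts) routing scores
chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the k best experts per token

out = np.zeros_like(tokens)
for t in range(num_tokens):
    sel = chosen[t]
    w = np.exp(logits[t, sel] - logits[t, sel].max())
    w /= w.sum()                                  # softmax over the selected experts only
    for weight, e in zip(w, sel):
        out[t] += weight * (tokens[t] @ experts[e])  # only k experts are evaluated per token

print(out.shape)  # (4, 8): same output shape as a dense FFN, at roughly top_k/num_experts of the expert compute
```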