26b (Dense) + 31b (MoE) <=VS=> 26b (MoE) + 31b (Dense)

#111
by Markobes - opened

Don't you think it would be more logical to release other model variants, where the dense model is slightly smaller, since the MoE structure allows for heavier models?

  • I think the current models are pure marketing, driven by a desire to show superiority over Qwen-27B, or at least get close to it. And you sacrificed users' hopes in the process.

I personally waited two years for a new dense model until Google finally released one, and I'm very grateful for that.
P.S.: The dense 31B variant is already small, and making it smaller would only make the model's intelligence more like a chatbot's than anything close to AGI. I would like to see an 80B dense model from Google in the future.

My guess is that they trained another MoE model, 80B or larger, but found it was too powerful and might cut into their Flash product line, so they didn't release it.
That would explain why there's a 31B dense and a 26B MoE. It's very odd that there's only one MoE model in this series, and they even created a table to show that single model.

Still, I'm grateful for this 31B dense model. It provides a base model!

And my last hope is that they continue to release ~12B dense models. I prefer fine-tuning at this size.
