We need Air and we need Flash

#3
by jacek2024 - opened

Please provide some Air or at least a Flash.

Of course, we are not in a position to make demands, so I would phrase the topic differently:

Dear Z.ai employees, we really like what you're doing and sincerely appreciate your efforts to democratize artificial intelligence and machine learning, especially the smaller models in your GLM lineup that are frequently SOTA in their class.

Therefore, we kindly ask you not to abandon this direction and to continue delighting us with wonderful consumer-grade open-weights releases, like the Flash and, especially, the Air models.

Alternatively, a native 4-bit quant release like Kimi-K2.5 or gpt-oss-120b would be nice. At FP8, this model no longer fits on 8xH100 or 4xH200 setups.
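To make the "no longer fits" claim concrete, here is a back-of-the-envelope sketch of the arithmetic. The 744B parameter count is a hypothetical placeholder (the thread doesn't state the model's size), and the 15% overhead for activations/KV cache is a rough assumption:

```python
# Hedged back-of-the-envelope: does a checkpoint fit in pooled GPU memory?
# 744B params below is a HYPOTHETICAL example size, not a published figure;
# the 15% overhead for activations/KV cache is likewise a rough assumption.

def fits_in_vram(params_b: float, bytes_per_param: float,
                 num_gpus: int, gpu_gb: float, overhead: float = 0.15) -> bool:
    """True if weights plus a rough runtime-overhead fraction fit in
    the aggregate VRAM of the given GPU setup."""
    weights_gb = params_b * bytes_per_param      # FP8 = 1 byte/param
    needed_gb = weights_gb * (1 + overhead)
    return needed_gb <= num_gpus * gpu_gb

# Hypothetical 744B-parameter model on 8x H100 (80 GB each):
print(fits_in_vram(744, 1.0, 8, 80))   # FP8: 855.6 GB needed > 640 GB pooled
print(fits_in_vram(744, 0.5, 8, 80))   # native 4-bit: 427.8 GB, fits
```

This is why a native 4-bit release (0.5 bytes/param) roughly halves the footprint and can bring a model back inside an 8xH100 node.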

I would really like a Flash model, or else my 32 GB of DDR4 will cry.

Joining the request :)

I think people are sleeping on Flash 4.7. You see "it's a bunch of 3B models glued together" and go "oh no". But after playing with it for a (long) while, it's been one of the best models in this (total) size range in... well, forever, actually. Hint, hint: fine-tune. As much as a Flash 5.0 would be nice, what's the point when literally nobody has toyed with 4.7 yet?

+1 for an Air model! GLM-4.5-Air not only had great performance for its size and active parameter count, but also quite good world knowledge compared to other MoE models of similar size (within my own testing, of course).

GLM-4.7-Flash is a nice model for its size, but there is a gap for a model in-between Flash and GLM-4.7/GLM-5, which GLM-4.5-Air filled nicely.

I am quite happy that GLM-5, such a strong model in many regards, is open weights, but an Air variant would be quite amazing for the local community! A model of ~100-150B parameters has the density for general world knowledge and a wide range of capabilities, while being small enough for single-GPU+CPU setups!

Even if 100B is enough for some people, my RX 5700 XT and 32 GB of RAM, for example, require smaller models. To be honest, I would like a Flash, but both would be great!

I second that!
GLM-5-Air at 212B?! :D

Would love it if it comes true :)
I only have access to 8 H100 GPUs, so something the size of GLM-4.7 for a GLM-5 Air would be awesome.

The value of GLM-4.5-Air was that it ran at acceptable speed, and at the highest possible quality, on 64 GB + 24 GB! And it outperformed gpt-oss-120b in many areas, not to mention everything with a smaller memory footprint... except maybe the Qwen 3 Next coder... As for the later GLM-4.6V, it's definitely not up to par; it's worse.

However, the 200B+ models (MiniMax M2.1, Step 3.5 Flash) are more relevant for me now, as I have more memory (128 GB + 24 GB).

All this runs on an old (four-year-old) local machine with an AMD Ryzen 5900X and an RTX 3090. Lots of people have this kind of hardware, especially with 64 GB of RAM. Owners of machines with DDR5 memory actually get a 2x speed boost. This is very impressive, considering that a year and a half ago only substandard 8B-parameter models without any practical use were available.
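The DDR5 "2x speed boost" follows from simple bandwidth arithmetic: with a CPU-offloaded MoE, each generated token must stream the active expert weights from RAM, so decode throughput is roughly memory bandwidth divided by active bytes per token. A minimal sketch, where the 12B active-parameter figure (GLM-4.5-Air) and the 50/100 GB/s dual-channel DDR4/DDR5 bandwidths are rough illustrative assumptions:

```python
# Rough decode-speed model for a CPU-offloaded MoE: throughput is roughly
# (RAM bandwidth) / (active bytes read per token). Bandwidth figures and the
# 12B active-parameter count are illustrative assumptions, not measurements.

def tokens_per_sec(active_params_b: float, bytes_per_param: float,
                   bandwidth_gb_s: float) -> float:
    active_gb = active_params_b * bytes_per_param  # bytes streamed per token
    return bandwidth_gb_s / active_gb

# ~12B active params at a 4-bit quant (~0.5 bytes/param):
ddr4 = tokens_per_sec(12, 0.5, 50)    # dual-channel DDR4, ~50 GB/s
ddr5 = tokens_per_sec(12, 0.5, 100)   # dual-channel DDR5, ~100 GB/s
print(round(ddr4, 1), round(ddr5, 1))  # → 8.3 16.7
```

Since active bytes per token are fixed, doubling the memory bandwidth doubles the estimated tokens per second, which matches the observed DDR4-to-DDR5 jump.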
