Question about model completeness and M4 Max test
Hi, I noticed the model page shows "CURRENTLY UPLOADING..." and only 11 out of
57 model shards are available (~103GB).
Questions:
- The hardware compatibility section says it was tested on an M4 Max 128GB with Inferencer v1.10.1, but the full model would be ~500GB+. How is this possible? (See the quick extrapolation below.)
- Is there a smaller quantization (like q4.5 or q5.5) that is already complete and available for download?
- When do you expect the full upload to complete?
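A quick back-of-the-envelope check on that first question, assuming the remaining shards are roughly the same size as the 11 uploaded so far:

```python
# Rough size estimate for the full 5.6bit upload, extrapolated from the
# 11 shards (~103GB) visible so far. Assumes roughly equal shard sizes.
shards_total = 57
shards_uploaded = 11
uploaded_gb = 103

avg_shard_gb = uploaded_gb / shards_uploaded      # ~9.4 GB per shard
estimated_full_gb = avg_shard_gb * shards_total   # ~534 GB

print(f"Estimated full size: ~{estimated_full_gb:.0f} GB")
```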
Thanks!
The 5.6bit version was tested across an M3 Ultra (512GB RAM) and an M4 Max (128GB RAM) using Inferencer's distributed compute feature, which pools both machines' RAM together.
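A minimal sketch of the memory arithmetic behind that setup, using the ~534GB estimate above; it ignores KV cache and activation overhead, so real headroom is smaller:

```python
# Does pooled RAM across both machines cover the estimated 5.6bit weights?
# (Ignores KV cache / activation overhead, so this is an upper bound.)
model_gb = 534                 # estimated full 5.6bit size (see above)
machines_gb = [512, 128]       # M3 Ultra + M4 Max

pooled_gb = sum(machines_gb)   # 640 GB pooled
print(f"Pooled: {pooled_gb} GB, headroom after weights: ~{pooled_gb - model_gb} GB")
```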
Yes, the 4.8bit version is uploaded here: https://huggingface.co/inferencerlabs/GLM-5-MLX-4.8bit
Let me know if you would still like it uploaded, given (1).
When I load the model using LM Studio, I get this error: Error when loading model: ValueError: Missing 2843 parameters:
lm_head.biases,
lm_head.scales,
lm_head.weight,
model.layers.17.input_layernorm.weight,
model.layers.17.mlp.gate.e_score_correction_bias,
model.layers.17.mlp.gate.weight,
model.layers.17.mlp.shared_experts.down_proj.biases,
model.layers.17.mlp.shared_experts.down_proj.scales,
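A "Missing N parameters" error usually means shard files are absent, which matches the incomplete upload above. A minimal diagnostic sketch, assuming the standard `model.safetensors.index.json` weight map (the local path below is a placeholder):

```python
# Check whether every shard file referenced by the safetensors index is
# actually on disk - missing shards produce "Missing N parameters" errors.
import json
from pathlib import Path

model_dir = Path("~/models/GLM-5-MLX-5.6bit").expanduser()  # placeholder path
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

expected = set(index["weight_map"].values())   # shard files the index expects
present = {p.name for p in model_dir.glob("*.safetensors")}

print(f"{len(present & expected)}/{len(expected)} expected shards present")
for name in sorted(expected - present):
    print("missing:", name)
```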
I am using the M3 Ultra 256GB version.
The 5.6bit version requires over 512GB RAM.
Got it, but I saw the size is only ~100GB.
Hi, yes, I'm interested in this quantized version for my 512GB Mac Studio, please.
Could you please upload your 5.6bit version? (I'm using an M3 Ultra 512GB and an M4 Max 128GB ... both Studios.)
Thank you!
Love your tweaks! All of them are great quants! ❤️
Oh, one other question... why don't you enable vision on your models? (Not that I really use it... but I was thinking of using it with moltis, and vision might be good too! ... and using it with mlx-vlm.)
Yes, can do, and yes, vision will be enabled going forward - it was mainly a storage optimisation; however, the latest conversion, Mistral Small 4, has the vision weights.
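For anyone who wants to try those vision weights, a minimal mlx-vlm sketch - the repo id below is a placeholder, and generate()'s argument names/order have varied across mlx-vlm releases, so check against your installed version:

```python
# Minimal mlx-vlm usage sketch. Placeholder repo id; keyword arguments
# are used because generate()'s signature has shifted between releases.
from mlx_vlm import load, generate

model, processor = load("inferencerlabs/Mistral-Small-4-MLX")  # placeholder repo id

output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="photo.jpg",   # local image path
    max_tokens=256,
)
print(output)
```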