EuroLLM 22B conversion on a Mac Studio M2 Max 32 GB

#28
by HugMischa - opened
MLX Community org
•
edited 11 days ago

Hi,
I have an issue converting the EuroLLM 22B model locally. I can use the Q6 version from the community with no issues. Now I would like to test a mixed 4/6 quantization with a group size of 128. When I try to convert the model using:

mlx_lm.convert \
      --hf-path mlx-community/EuroLLM-22B-Instruct-2512-mlx-bf16 \
      --mlx-path models/EuroLLM-22b-mixed-4-6 \
      -q \
      --quant-predicate mixed_4_6

and these models:

  • mlx-community/EuroLLM-22B-Instruct-2512-mlx-bf16 or
  • utter-project/EuroLLM-22B-Instruct-2512

I get this error:

libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)

I increased the timeout with:

sudo sysctl debug.lowpri_throttle_enabled=0
export METAL_DEVICE_TIMEOUT=7200

Is my Mac Studio just not suited to the job, or is there a setting I need to change? I have successfully converted Codestral 22B on this Mac before.
Currently I am using MLX-LM 0.30.1.
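For anyone wondering what a mixed 4/6 predicate amounts to: it maps each weight's name to quantization parameters, keeping sensitive layers at 6 bits and the rest at 4. The sketch below is purely illustrative — the layer-selection rule and the `mixed_4_6` signature here are my own example, not mlx_lm's actual implementation:

```python
# Illustrative mixed 4/6-bit quant predicate (hypothetical rules, NOT
# mlx_lm's actual mixed_4_6): keep embeddings, the output head, and
# attention projections at 6 bits; everything else at 4 bits.
def mixed_4_6(path: str, group_size: int = 128) -> dict:
    high_precision = ("embed", "lm_head", "attn")
    bits = 6 if any(key in path for key in high_precision) else 4
    return {"bits": bits, "group_size": group_size}

# An MLP projection gets 4 bits, an attention projection gets 6:
print(mixed_4_6("model.layers.0.mlp.up_proj"))
print(mixed_4_6("model.layers.0.self_attn.q_proj"))
```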

MLX Community org

OK, found the reason for this... a bit stupid. I had the Hugging Face cache on an external disk. I thought Thunderbolt with an SSD would be no problem, and it isn't for most models, but it turned out to be a big issue with EuroLLM 22B. The GPU timeout was caused by too much latency over my Mac M2's Thunderbolt connection.
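If anyone hits the same thing, moving the cache back to the internal SSD is just a matter of setting `HF_HOME` before running the conversion. The path below is only an example:

```shell
# Point the Hugging Face cache at the internal SSD before converting.
# HF_HOME controls where huggingface_hub stores downloads; ~/hf-cache
# is an example path, use whatever suits your machine.
export HF_HOME="$HOME/hf-cache"
mkdir -p "$HF_HOME/hub"
echo "cache now at: $HF_HOME"
```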
