does not appear to have a file named configuration_glm4_moe.py

#7
by hi-im-bored - opened

Error w/ both SGLang & vLLM

OSError: Firworks/INTELLECT-3-nvfp4 does not appear to have a file named configuration_glm4_moe.py. Checkout 'https://huggingface.co/Firworks/INTELLECT-3-nvfp4/tree/main' for available files.

nvm this is upstream as well

hi-im-bored changed discussion status to closed

Did you try running with `-e VLLM_USE_FLASHINFER_MOE_FP4=1`? I've got a sample vLLM command on the model card that works. I tested this one on an RTX Pro 6000 Blackwell and it ran.
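For reference, a launch along these lines is what the flag above fits into. This is a hedged sketch, not the exact command from the model card: the env flag comes from this thread, while the port mapping and other arguments are illustrative defaults.

```shell
# Illustrative vLLM launch for the NVFP4 quant on Blackwell hardware.
# VLLM_USE_FLASHINFER_MOE_FP4=1 is the flag mentioned in this thread;
# everything else here is an assumed, typical docker invocation.
docker run --gpus all -p 8000:8000 \
  -e VLLM_USE_FLASHINFER_MOE_FP4=1 \
  vllm/vllm-openai:latest \
  --model Firworks/INTELLECT-3-nvfp4
```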

Yeah, I still had the issue; I just needed to apply https://huggingface.co/PrimeIntellect/INTELLECT-3/discussions/1, and after that it resolved the architecture fine automatically.
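If the linked fix amounts to removing the reference to the missing custom-code file (an assumption; I haven't verified the exact diff in that discussion), the edit to a local snapshot of the repo would look roughly like this. The `auto_map` key and the class names shown are hypothetical examples of what such a reference looks like:

```python
import json
import os
import tempfile

# Hedged sketch: drop the "auto_map" entry pointing at the missing
# configuration_glm4_moe.py so transformers falls back to its built-in
# glm4_moe classes instead of trying to fetch the absent remote file.
# The dict below stands in for the model's config.json.
cfg = {
    "architectures": ["Glm4MoeForCausalLM"],
    "model_type": "glm4_moe",
    "auto_map": {"AutoConfig": "configuration_glm4_moe.Glm4MoeConfig"},
}

cfg.pop("auto_map", None)  # remove the reference to the custom code file

# Write the cleaned config back out (here to a temp dir for illustration).
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```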

On a DGX Spark w/ vllm/vllm-openai:latest there were some other issues, e.g. it not using the TRTLLM backend, but again that's not the model's fault, just quirks of new hardware I guess.

Ah thanks for the info. I just removed it. I checked and the base repo also has it removed. As for the Spark, I wish I had one to test these quants on. It was a big inspiration for me trying to make all these quants so I hope at least some of them work on there.

Sorry to pile more work on you but by any chance would you also be interested/willing to do speculative decoding for the model as well? https://nvidia.github.io/Model-Optimizer/guides/5_speculative_decoding.html

The 4.5-Air FP8 release supports it, but it seems it isn't quite compatible with the PI release.

Seems interesting. I spent about 6 hours trying to make an EAGLE version of INTELLECT-3. I think I'm maybe close-ish to cracking it, but I have to stop for a bit. I'll keep messing with it and see if I can make one. As far as I can tell, once EAGLE weights for INTELLECT-3 exist, they should be usable with either my NVFP4 quant or the full-fat version and speed both up.
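Once such draft weights existed, hooking them up in vLLM would look roughly like this. A purely illustrative sketch: the draft-model repo name is a placeholder, and the token count is an arbitrary example, not a tuned value.

```shell
# Hypothetical: serve the NVFP4 quant with EAGLE speculative decoding
# via vLLM's --speculative-config. "<eagle-draft-repo>" is a placeholder
# for draft weights that do not exist yet.
vllm serve Firworks/INTELLECT-3-nvfp4 \
  --speculative-config '{"method": "eagle", "model": "<eagle-draft-repo>", "num_speculative_tokens": 3}'
```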
