Any chance of safetensors format?
Thank you for giving us a MagicQuant of Seed!
Is there any chance we can get the weights in safetensors format for better compatibility with vLLM? I want to run this with tensor parallelism across two GPUs for better speed, but vLLM's TP doesn't support the seed_oss GGUF architecture. llama.cpp's --split-mode row helps a little, but it still leaves a lot of performance on the table.
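For reference, the row-split run I'm comparing against looks roughly like this (the model filename and tensor-split ratio are just placeholders for my setup):

```shell
# llama.cpp row split across two GPUs (placeholder model path).
# --split-mode row splits individual tensors by rows across GPUs,
# instead of assigning whole layers per GPU (the default "layer" mode).
llama-server \
  -m seed-oss-magicquant.gguf \
  --n-gpu-layers 99 \
  --split-mode row \
  --tensor-split 1,1
```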
Sadly, MagicQuant tactics and vLLM do not mix at all, and it pains me too! But this isn't a MagicQuant-vs-vLLM issue; it's the nature of how GGUF and vLLM are designed.
vLLM is fundamentally built around uniform tensor layouts; that's what lets it do true tensor parallelism. MagicQuant, via GGUF, deliberately uses hybrid per-tensor quantization, which breaks those assumptions.
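To make the mismatch concrete, here's a toy sketch (not vLLM's actual code; the tensor names, shapes, and sharding rule are illustrative assumptions) of why a tensor-parallel splitter that assumes one quant type everywhere can't handle a hybrid GGUF. The block sizes reflect GGUF's real geometry (K-quants use 256-element super-blocks, Q8_0 uses 32-element blocks), but the checkpoint contents are made up:

```python
# Toy model of TP sharding vs quantization layouts (illustrative only).

# In a uniform checkpoint, every weight uses the same quant type,
# so one sharding rule applies to every tensor.
uniform = {f"blk.{i}.ffn_up.weight": "Q4_K" for i in range(4)}

# A MagicQuant-style hybrid checkpoint picks a quant type per tensor.
hybrid = {
    "blk.0.ffn_up.weight": "Q4_K",
    "blk.1.ffn_up.weight": "Q6_K",
    "blk.2.ffn_up.weight": "Q8_0",
    "blk.3.ffn_up.weight": "Q4_K",
}

# Elements per quant block in GGUF: K-quants use 256-element
# super-blocks, Q8_0 uses 32-element blocks.
BLOCK_ELEMS = {"Q4_K": 256, "Q6_K": 256, "Q8_0": 32}

def tp_shardable(tensors, row_len=4096, tp=2):
    """A TP engine that assumes one uniform layout can only split cleanly
    if every tensor shares the same quant block geometry AND the shard
    boundary lands on a block boundary."""
    geometries = {BLOCK_ELEMS[q] for q in tensors.values()}
    if len(geometries) != 1:
        return False  # mixed block geometries: no single sharding rule works
    (block,) = geometries
    return (row_len // tp) % block == 0  # shards must align to quant blocks

print(tp_shardable(uniform))  # True: one quant type, shards align
print(tp_shardable(hybrid))   # False: mixed types defeat a uniform split
```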
That said, if someone ever figures out how to bring hybrid per-tensor logic into a vLLM-native format, that would be an incredible day.