How to fix Qwen 3.5 Quant/tool issues + NEW Sheep And a few lost sheep...

#2063

by DavidAU - opened 1 day ago

Hey :

re: TOOLS usage issues :

Use Q6 or Q8 quants. The issue lies in the new tensors, which should stay at Q8/BF16 in every quant (i.e. NOT quantized further), but are instead being quanted down to the "quant level", so to speak.

This seems to be drastically affecting tool usage.

To manually create a quant via llamacpp with the tensors corrected, use the following (main.gguf is the file created at the convert-to-gguf ... step):

./llama-quantize --tensor-type ssm_alpha=bf16 --tensor-type ssm_beta=bf16 x:/main.gguf D:/lms/test/llms/4B-MODEL-NAME-Q4_K_S.gguf Q4_K_S 8

(on Windows, llama-quantize is llama-quantize.exe)
(this will work for all Qwen 3.5 models, regardless of parameter size)
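If you need more than one quant, the same command can be generated per quant type. Below is a rough Python sketch, not an official tool: the helper name build_quantize_cmd, the output naming scheme, and the list of quant types in the loop are my own assumptions for illustration. Only the two --tensor-type overrides, the paths, and the thread count come from the command above. It only prints the commands; it does not run them.

```python
# Tensors from the command above; both are kept at BF16 in every quant.
KEEP_BF16 = ("ssm_alpha", "ssm_beta")


def build_quantize_cmd(src, out_dir, model_name, quant, threads=8,
                       exe="./llama-quantize"):
    """Return the llama-quantize argument list for one quant type.

    Hypothetical helper: the name and the output file naming are assumptions.
    """
    cmd = [exe]
    for tensor in KEEP_BF16:
        # Same override as in the manual command: TENSOR=TYPE
        cmd += ["--tensor-type", f"{tensor}=bf16"]
    out_path = f"{out_dir.rstrip('/')}/{model_name}-{quant}.gguf"
    cmd += [src, out_path, quant, str(threads)]
    return cmd


if __name__ == "__main__":
    # Example quant types only; adjust to whatever you actually need.
    for quant in ("Q4_K_S", "Q4_K_M", "Q5_K_M"):
        cmd = build_quantize_cmd("x:/main.gguf", "D:/lms/test/llms",
                                 "4B-MODEL-NAME", quant)
        print(" ".join(cmd))
```

To actually run a command instead of printing it, the returned list can be passed to subprocess.run(cmd, check=True).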

NEW SHEEP:
https://huggingface.co/DavidAU/Qwen3.5-9B-Star-Trek-TNG-DS9-Heretic-Uncensored-Thinking

LOST SHEEP:
https://huggingface.co/DavidAU/Qwen3.5-9B-Polaris-HighIQ-THINKING
https://huggingface.co/DavidAU/Qwen3.5-9B-Deckard-Claude-DIMOE-Uncensored-Heretic-Thinking
https://huggingface.co/DavidAU/Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking
https://huggingface.co/DavidAU/Qwen3.5-2B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING

RichardErkhov

about 21 hours ago

Uhm, I'm not sure what I need to do, so I will just quant it with the normal queue. If something needs to be adjusted during the quantization process (but it still runs on the main llama.cpp branch), I need to ask nico to help... please let me know.

I started queueing and got some random error. It doesn't seem to queue, but it queued at the same time: it doesn't show up in the queue, yet it says it's there...

It's (hopefully) queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model summary pages below for quants to appear:
https://hf.tst.eu/model#Qwen3.5-9B-Star-Trek-TNG-DS9-Heretic-Uncensored-Thinking-GGUF
https://hf.tst.eu/model#Qwen3.5-9B-Polaris-HighIQ-THINKING-GGUF
https://hf.tst.eu/model#Qwen3.5-9B-Deckard-Claude-DIMOE-Uncensored-Heretic-Thinking-GGUF
https://hf.tst.eu/model#Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking-GGUF
https://hf.tst.eu/model#Qwen3.5-2B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING-GGUF

DavidAU

about 20 hours ago

thanks !

RE: Quanting: just a heads-up about those 2 tensors.
I am going to open a ticket at llamacpp to hopefully get this standardized as part of normal quanting for Qwen 3.5 models.
