2X Qwen 3.5 9B SOMPOA Heresy

#2336

by redaihf - opened May 7

Discussion

redaihf

May 7

https://huggingface.co/MuXodious/Qwen3.5-9B-SOMPOA-heresy
https://huggingface.co/MuXodious/Qwen3.5-9B-SOMPOA-heresy-MTP

redaihf

May 7

Thanks @MuXodious !

MuXodious

May 7

•

edited May 7

Oh, wait. MTP needs the support PR merged first, and I need to see whether it works at all with that PR after my frankensteining attempt.

RichardErkhov

May 8

Remind me when it's merged so I can update llama.cpp and queue it. Right now I'm not queueing.

MuXodious

May 8

•

edited May 8

> Remind me when it's merged so I can update llama.cpp and queue it. Right now I'm not queueing.

You can queue the non-MTP version. I'll let y'all know when the MTP PR gets the go signal. MTP may not be much use for the 9B and below, given the standard 98 GB VRAM under everyone's hands these days, but it should give larger models a good speed boost.

P.S. I got a nice speed bump with this thing on, not bad: ~75 t/s -> ~100 t/s.

redaihf

May 10

•

edited May 10

@RichardErkhov can we please have the non-MTP model queued? Sorry for the confusion.

https://huggingface.co/MuXodious/Qwen3.5-9B-SOMPOA-heresy

RichardErkhov

May 10

•

edited May 10

It's queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Qwen3.5-9B-SOMPOA-heresy-GGUF for quants to appear.

Please don't forget to remind me when the MTP PR finally merges =)

MuXodious

May 10

> Please don't forget to remind me when the MTP PR finally merges =)

It will take some time. They are currently adjusting the scaffolding code before finalising the MTP PR. I'll follow up once they green-light it. Thanks for the quants, as always.

MuXodious

May 11

The preliminary PRs are merged. Work on MTP support should continue from here. I'll post updates as things progress.

MuXodious

May 16

•

edited May 16

PR22673 (MTP support) is merged! With the latest update, speed went up from ~77.11 t/s to ~110.15 t/s at Q8_0 (Qwen 3.5 9B).
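For anyone who wants to put the numbers in this thread side by side, here is a quick sketch of the arithmetic. It only uses the throughput figures quoted in the posts (~75 -> ~100 t/s earlier, ~77.11 -> ~110.15 t/s at Q8_0 now); the function name `speedup` is made up for illustration.

```python
def speedup(before_tps: float, after_tps: float) -> float:
    """Return the throughput ratio after/before (1.0 means no change)."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return after_tps / before_tps


if __name__ == "__main__":
    # Figures quoted in the thread, in tokens per second.
    for label, before, after in [
        ("earlier test", 75.0, 100.0),
        ("after PR22673, Q8_0", 77.11, 110.15),
    ]:
        ratio = speedup(before, after)
        print(f"{label}: {ratio:.2f}x ({(ratio - 1) * 100:.1f}% faster)")
```

These are single runs on one setup, so treat the ratios as rough indications, not benchmarks.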

Don't forget to run llama.cpp with the arguments --spec-type draft-mtp --spec-draft-n-max 3, or add the following lines to each MTP-supported model in your preset file:

spec-type = draft-mtp
spec-draft-n-max = 3
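If you launch llama.cpp from a script, the settings above can be kept in one place and rendered either as command-line arguments or as preset lines. This is only a sketch under a few assumptions: the option names and values are copied from this post and not checked against any particular llama.cpp build, the `llama-server` binary with `-m` is assumed as the entry point, and the model filename is a placeholder.

```python
import shlex

# Option names and values copied from the post above (assumption: your
# llama.cpp build includes the merged MTP support and accepts them).
MTP_OPTIONS = {"spec-type": "draft-mtp", "spec-draft-n-max": "3"}


def build_command(model_path: str, binary: str = "llama-server") -> list[str]:
    """Return an argv list that runs `binary` on `model_path` with the MTP options."""
    argv = [binary, "-m", model_path]
    for key, value in MTP_OPTIONS.items():
        argv += [f"--{key}", value]
    return argv


def preset_lines() -> list[str]:
    """Return the same options as `key = value` lines for a preset file."""
    return [f"{key} = {value}" for key, value in MTP_OPTIONS.items()]


if __name__ == "__main__":
    # Placeholder filename; substitute the quant you actually downloaded.
    print(shlex.join(build_command("model-MTP.Q8_0.gguf")))
    print("\n".join(preset_lines()))
```

Keeping the options in a single dictionary means the command line and the preset file cannot drift apart if the recommended values change later.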

nicoboss

May 16

> PR22673 MTP Support is merged! With the latest update, speeds upped from ~77.11 t/s to ~110.15 t/s at Q8_0.

@RichardErkhov I updated llama.cpp on nico1 in case you want to give it a try. Please keep in mind that the latest update also includes https://github.com/ggml-org/llama.cpp/pull/17114, which was a massive pain to merge into our llama.cpp fork, so if the convert fails you know why.

RichardErkhov

May 16

It's queued with priority 6969 =)

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Qwen3.5-9B-SOMPOA-heresy-MTP-GGUF for quants to appear.

RichardErkhov

May 16

@nicoboss can you update rich1 as well? Don't forget the internet restart in 10 minutes.

RichardErkhov

May 16

👀 👀 👀 👀 👀
