Sampling+SD settings

#2
by InformaticsSolutions - opened

Hi,
I was wondering what the optimal values for --top-k and --spec-draft-n-max would be for these quants. Using the exact same settings (including --top-k 40 and --spec-draft-n-max 4), Jackrong's Q8_0 is about 10 tps faster. Or does the quantization here make it inherently slower? Thanks a lot.

There really is no difference between these two quants.
A small 10 tps difference like this can come from speculative decoding getting a few percent better hit rate at times.
Try them with zero temperature (--temp 0) so that they answer as deterministically as possible and produce the closest output.
With --temp 0 I get ~140 tps on both of those models with my 7900XTX.
After 10 questions I sometimes get 135 and sometimes 145, so roughly 140.
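
If you want to check whether a gap between two quants is more than run-to-run noise, here is a minimal Python sketch. It assumes you have written down the tps reported after each question for both models. The function names, the threshold k=2.0 and the sample readings are all hypothetical, chosen only to illustrate the idea, and the rule itself is a rough heuristic rather than a proper statistical test.

```python
from statistics import mean, stdev


def summarize(tps_runs):
    """Return (mean, sample standard deviation) of per-run tokens/second."""
    return mean(tps_runs), stdev(tps_runs)


def within_noise(runs_a, runs_b, k=2.0):
    """Return True if the gap between the two means is smaller than
    k times the larger of the two standard deviations.

    k=2.0 is an arbitrary assumption, not a recommended value.
    """
    mean_a, sd_a = summarize(runs_a)
    mean_b, sd_b = summarize(runs_b)
    return abs(mean_a - mean_b) < k * max(sd_a, sd_b)


# Hypothetical readings, made up for illustration (not real measurements).
quant_a = [135.0, 145.0, 140.0, 138.0, 142.0]
quant_b = [137.0, 143.0, 139.0, 141.0, 140.0]

print(within_noise(quant_a, quant_b))
```

Replace the two lists with your own numbers, taken with the same prompts and --temp 0 for both models. If the function returns True, the difference is probably not coming from the quantization.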
