Using this speculator with Red Hat AI's quantized model (#2, opened about 2 months ago by nebula0248; 1 comment)
Slower throughput with speculative decoding (#1, opened 3 months ago by baptle)