k/v cache quantization

#1
by alexminder - opened

Hello, thank you for the model.
You suggest using --cache-type-k f16 --cache-type-v f16. Have you tried --cache-type-k q8_0 --cache-type-v q5_1? I took these settings from https://anbeeld.com/articles/kv-cache-quantization-benchmarks-for-long-context
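
For context, here is a rough sketch of the memory side of this trade-off. It assumes ggml's usual block layouts (32 elements per block: 34 bytes for q8_0, 24 bytes for q5_1) and a purely hypothetical model shape, since the real layer and head counts are not stated here. It says nothing about output quality, which is what the linked benchmarks look at.

```python
# Bits per cached element, assuming ggml's 32-element block layouts.
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 2-byte scale + 32 one-byte values = 8.5 bits
    "q5_1": 24 * 8 / 32,  # 2-byte scale + 2-byte min + 4 + 16 bytes of quants = 6.0 bits
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, type_k, type_v):
    """Estimate KV cache size in bytes for the given K and V cache types."""
    # One K element and one V element are stored per layer, KV head, head dim and token.
    elements = n_layers * n_kv_heads * head_dim * n_ctx
    bits = elements * (BITS_PER_ELEMENT[type_k] + BITS_PER_ELEMENT[type_v])
    return bits / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

baseline = kv_cache_bytes(**shape, type_k="f16", type_v="f16")
mixed = kv_cache_bytes(**shape, type_k="q8_0", type_v="q5_1")

print(f"f16/f16:   {baseline / 2**30:.2f} GiB")
print(f"q8_0/q5_1: {mixed / 2**30:.2f} GiB ({mixed / baseline:.0%} of f16)")
```

Under these assumptions the q8_0/q5_1 combination needs a bit less than half the memory of f16/f16, independent of the model shape, because the ratio depends only on the bits per element.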
