KV cache not used by default
#7, opened by jhy-123
Hi, I noticed that inference is quite slow when following the example in the model card. It looks like past_key_values is not being used in the generate call. Is there a specific reason the KV cache isn't enabled by default?
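For reference, one way to check whether the cache is what makes the difference is to time generate with use_cache toggled. This is only a minimal sketch: it assumes the model loads through transformers' AutoModelForCausalLM and accepts the standard use_cache argument to generate, which may not hold if the repository ships custom modeling code (that may also need trust_remote_code=True). The helper names time_generate and compare_cache_settings, the model_id argument, and the prompt are hypothetical and not taken from the model card.

```python
import time


def time_generate(model, inputs, use_cache, max_new_tokens=64):
    """Run one greedy generate() call and return (seconds, output)."""
    start = time.perf_counter()
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,      # greedy decoding so both runs do comparable work
        use_cache=use_cache,  # True asks generate() to reuse past_key_values
    )
    return time.perf_counter() - start, output


def compare_cache_settings(model_id, prompt="Hello, my name is"):
    """Load a causal LM and time generation with and without the KV cache.

    model_id is a placeholder: pass the repository id from the model card.
    """
    # Imported lazily so the timing helper above has no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # What the checkpoint's config requests; None if the field is absent.
    print("config.use_cache:", getattr(model.config, "use_cache", None))

    for flag in (True, False):
        seconds, _ = time_generate(model, inputs, use_cache=flag)
        print(f"use_cache={flag}: {seconds:.2f}s")
```

Calling compare_cache_settings with the repository id from the model card prints the config value and both timings. If the two timings are close, the cache is probably being ignored or disabled somewhere in the model code rather than in the generate arguments.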