KV cache not used by default

#7
by jhy-123 - opened

Hi, I noticed that inference is quite slow when following the example in the model card. It looks like past_key_values is not being used in the generate function. Is there a specific reason why the KV cache isn't enabled by default?
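
A rough way to check this is to time the same generate call with the cache switched on and off. The snippet below is only a sketch: `timed_generate` is a hypothetical helper name, and it assumes the model exposes a Hugging Face style `generate()` that accepts the `use_cache` and `max_new_tokens` arguments, which this model's custom code may or may not honor.

```python
import time


def timed_generate(model, inputs, use_cache, max_new_tokens=64):
    """Call model.generate once and return (output, elapsed_seconds).

    Assumptions (not confirmed for this model):
      - `model.generate` accepts `use_cache` and `max_new_tokens`.
      - `inputs` is the dict produced by the tokenizer
        (for example input_ids and attention_mask).
    """
    start = time.perf_counter()
    output = model.generate(
        **inputs,
        use_cache=use_cache,            # True asks generate to reuse past key/values
        max_new_tokens=max_new_tokens,  # keep the number of decoded tokens fixed
    )
    # On a GPU, synchronizing the device before reading the clock
    # gives more trustworthy timings.
    elapsed = time.perf_counter() - start
    return output, elapsed
```

With `model` and `inputs` prepared as in the model card example, comparing `timed_generate(model, inputs, use_cache=True)` against `timed_generate(model, inputs, use_cache=False)` should show whether the flag has any effect. If both timings are about the same, the cache is probably being ignored somewhere in the model code or its configuration.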
