KV cache not used by default

by jhy-123 - opened Jan 8

Hi, I noticed that inference is quite slow when following the example in the model card. It looks like past_key_values is not used in the generate function. Is there a specific reason why the KV cache isn't enabled by default?
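To make the concern concrete, here is a toy sketch of why I would expect a large slowdown without a KV cache. It is not this model's code: the function names and the cost model are my own assumptions, and it only counts how many per-token key/value projections a decoder would perform. In transformers, generate usually accepts use_cache=True for this purpose; whether this model's generate honors it is exactly what I'm unsure about.

```python
# Toy cost model only (hypothetical names, not the model's implementation).
# It counts key/value projections performed during autoregressive decoding.

def kv_projections_without_cache(prompt_len: int, steps: int) -> int:
    """Every decoding step re-projects the entire sequence seen so far."""
    total = 0
    seq_len = prompt_len
    for _ in range(steps):
        total += seq_len  # all tokens are projected again at this step
        seq_len += 1      # the newly generated token extends the sequence
    return total


def kv_projections_with_cache(prompt_len: int, steps: int) -> int:
    """The prompt is projected once; each later step projects one new token."""
    if steps <= 0:
        return 0
    total = prompt_len    # first step: prefill the cache with the prompt
    total += steps - 1    # later steps: only the latest token is projected
    return total


if __name__ == "__main__":
    for n in (16, 128, 512):
        print(
            n,
            kv_projections_without_cache(prompt_len=32, steps=n),
            kv_projections_with_cache(prompt_len=32, steps=n),
        )
```

Under this simplified model, the work without a cache grows quadratically with the number of generated tokens, while the cached version grows linearly, which matches the kind of slowdown I am seeing.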
