GLM-5.2: Built for Long-Horizon Tasks
zai-org
KV caching lets the model reuse the keys and values it already computed for earlier tokens. That way, each generation step only has to process the new token.
Here is an illustrated explanation of KV caching: https://huggingface.co/blog/not-lain/kv-caching
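
To make the idea concrete, here is a minimal sketch of single-head attention decoding with and without a KV cache, in plain Python. It is not GLM-5.2's implementation: the function names (`attend`, `decode_with_cache`, `decode_without_cache`), the toy dimension, and the random weights are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention for one query over all cached positions."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


def decode_without_cache(tokens, Wq, Wk, Wv):
    """Recompute keys and values for the whole prefix at every step."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        outputs.append(attend(matvec(Wq, tokens[t]), keys, values))
    return outputs


def decode_with_cache(tokens, Wq, Wk, Wv):
    """Project only the new token each step and append it to the cache."""
    k_cache, v_cache = [], []
    outputs = []
    for x in tokens:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        outputs.append(attend(matvec(Wq, x), k_cache, v_cache))
    return outputs


# Hypothetical toy setup: 6 token embeddings of dimension 4, random weights.
random.seed(0)
d = 4


def rand_matrix():
    return [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]


Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
tokens = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(6)]

slow = decode_without_cache(tokens, Wq, Wk, Wv)
fast = decode_with_cache(tokens, Wq, Wk, Wv)

# Largest element-wise difference between the two decoding strategies.
max_diff = max(abs(a - b) for r1, r2 in zip(slow, fast) for a, b in zip(r1, r2))
```

Both functions produce the same outputs, but the uncached version performs 21 key projections for 6 tokens (1 + 2 + ... + 6), while the cached version performs 6. That gap grows quadratically with sequence length, which is why caching matters most for long generations.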