⚠️ WORK IN PROGRESS ⚠️

## Description

Implementation of the cache introduced in the [Attention Sinks paper](https://huggingface.co/papers/2309.17453).
It allows the model to generate beyond the length of its context window without losing fluency in the conversation.
It also bounds the memory footprint of the KV cache. Since past tokens are discarded, the model loses the
ability to generate tokens that depend on the discarded context.

This implementation should match the `SinkCache` class present in `transformers<4.53.0`.
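
The eviction rule behind attention sinks can be sketched in plain Python (a simplified illustration with made-up names, not the repository's actual class): keep the first `num_sink_tokens` entries plus the most recent entries, so the cache never holds more than `window_length` tokens.

```python
def sink_evict(tokens, window_length, num_sink_tokens):
    """Keep the attention-sink tokens plus the most recent tokens.

    `tokens` stands in for per-token KV entries; a real cache applies the
    same slicing to the key and value tensors along the sequence axis.
    """
    if len(tokens) <= window_length:
        return list(tokens)
    sinks = tokens[:num_sink_tokens]
    recent = tokens[len(tokens) - (window_length - num_sink_tokens):]
    return sinks + recent

# The cache never grows past `window_length` entries; the middle is dropped:
kept = sink_evict(list(range(10)), window_length=6, num_sink_tokens=2)
print(kept)  # [0, 1, 6, 7, 8, 9]
```

Tokens `2`–`5` are the discarded middle: anything that depended on them can no longer be attended to, which is the trade-off described above.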

## Example usage

```py
# requires `transformers>=4.52.0`
```