joaogante committed on
Commit bdb93c2 · 1 parent(s): 2a0ad2b

update readme

Files changed (1): README.md (+9 −3)
README.md CHANGED
@@ -7,9 +7,10 @@ tags:
 ⚠️ WORK IN PROGRESS ⚠️
 
 ## Description
-Implementation of the cache introduced in the [Attention Sinks paper](https://arxiv.org/abs/2309.17453). It allows the
-model to generate beyond the length of its context window, without losing fluency in the conversation. As it discards
-past tokens, the model will lose the ability to generate tokens that depend on the context that was discarded.
+Implementation of the cache introduced in the [Attention Sinks paper](https://huggingface.co/papers/2309.17453).
+It allows the model to generate beyond the length of its context window, without losing fluency in the conversation.
+It's also a solution to contain the memory footprint of the KV cache. As it discards past tokens, the model will lose
+the ability to generate tokens that depend on the context that was discarded.
 
 This implementation should match the `SinkCache` class present in `transformers<4.53.0`.
 
@@ -33,3 +34,8 @@ in `generate.py`, in this repository.
 
 
 ## Example usage
+
+```py
+# requires `transformers>=4.52.0`
+
+```
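The eviction policy the diff describes (keep a few initial "sink" tokens plus a sliding window of recent tokens, dropping everything in between) can be sketched in plain Python. This is an illustrative toy, not the `transformers` implementation; `evict`, `window_length`, and `num_sink_tokens` are names chosen here to mirror the parameters of the old `SinkCache` class.

```python
def evict(tokens, window_length, num_sink_tokens):
    """Sketch of attention-sink cache eviction.

    Keeps the first `num_sink_tokens` entries (the attention sinks) plus
    the most recent entries, so the cache never exceeds `window_length`.
    Tokens in the middle are discarded, which is why the model loses the
    ability to attend to that part of the context.
    """
    if len(tokens) <= window_length:
        return list(tokens)
    num_recent = window_length - num_sink_tokens
    return list(tokens[:num_sink_tokens]) + list(tokens[-num_recent:])


# With a 6-token budget and 2 sinks, tokens 2-5 are evicted:
print(evict(list(range(10)), window_length=6, num_sink_tokens=2))
# -> [0, 1, 6, 7, 8, 9]
```

The same bookkeeping applies per layer to the key and value tensors in the real cache; the toy above only tracks token positions to show which entries survive.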