⚠️ WORK IN PROGRESS ⚠️

## Description

Implementation of the cache introduced in the [Attention Sinks paper](https://huggingface.co/papers/2309.17453).
It allows the model to generate beyond the length of its context window without losing fluency in the conversation.
It also bounds the memory footprint of the KV cache. Since past tokens are discarded, the model loses the
ability to generate tokens that depend on the discarded context.

This implementation should match the `SinkCache` class present in `transformers<4.53.0`.
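
The eviction rule behind attention sinks can be sketched in plain Python (a simplified illustration with made-up names, not the repository's actual class): keep the first `num_sink_tokens` entries plus the most recent entries, so the cache never holds more than `window_length` tokens.

```python
def sink_evict(tokens, window_length, num_sink_tokens):
    """Keep the attention-sink tokens plus the most recent tokens.

    `tokens` stands in for per-token KV entries; a real cache applies the
    same slicing to the key and value tensors along the sequence axis.
    """
    if len(tokens) <= window_length:
        return list(tokens)
    sinks = tokens[:num_sink_tokens]
    recent = tokens[len(tokens) - (window_length - num_sink_tokens):]
    return sinks + recent

# The cache never grows past `window_length` entries; the middle is dropped:
kept = sink_evict(list(range(10)), window_length=6, num_sink_tokens=2)
print(kept)  # [0, 1, 6, 7, 8, 9]
```

Tokens `2`–`5` are the discarded middle: anything that depended on them can no longer be attended to, which is the trade-off described above.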

## Example usage

```py
# requires `transformers>=4.52.0`
```