add WIP
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@ tags:
 - custom_generate
 ---
 
+⚠️ WORK IN PROGRESS ⚠️
+
 ## Description
 Implementation of the cache introduced in the [Attention Sinks paper](https://arxiv.org/abs/2309.17453). It allows the
 model to generate beyond the length of its context window, without losing fluency in the conversation. As it discards
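
For context on the Description lines in this hunk: the Attention Sinks paper keeps the key/value entries of the first few tokens (the "sinks") plus a sliding window of the most recent tokens, and evicts everything in between. The sketch below illustrates only that retention rule. It is not the code in this repository, and the function name `kept_positions` and the parameters `window_length` and `num_sink_tokens` are hypothetical names chosen for illustration.

```python
def kept_positions(seq_len: int, window_length: int, num_sink_tokens: int) -> list[int]:
    """Return the token positions whose key/value entries stay in the cache.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    if num_sink_tokens > window_length:
        raise ValueError("num_sink_tokens must not exceed window_length")

    # Nothing is evicted until the sequence outgrows the cache budget.
    if seq_len <= window_length:
        return list(range(seq_len))

    # Always keep the first `num_sink_tokens` positions (the attention sinks).
    sinks = list(range(num_sink_tokens))

    # Fill the rest of the budget with the most recent positions.
    num_recent = window_length - num_sink_tokens
    recent = list(range(seq_len - num_recent, seq_len))

    return sinks + recent


if __name__ == "__main__":
    # 12 tokens seen, cache budget of 8, of which 4 are sinks.
    print(kept_positions(seq_len=12, window_length=8, num_sink_tokens=4))
```

The cache size stays bounded by `window_length` no matter how long generation runs, which is what lets the model continue past its context window. The paper additionally assigns positions relative to the cache rather than to the original text; that step is left out of this sketch.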