Update README.md

add info about kv cache saving

README.md CHANGED

@@ -66,6 +66,8 @@ print(response[0]["generated_text"])
 
 ## The LCKV Collection
 
+The model has 2 warmup layers, i.e. it keeps 3/22 of the KV cache of a standard TinyLlama.
+
 This model was randomly initialized, then pre-trained on 100B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 The evaluation follows that of TinyLlama. Refer to [our paper](https://arxiv.org/abs/2405.10637) for more details.
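The 3/22 figure in the added line can be sanity-checked with a quick calculation. This sketch assumes the LCKV setup keeps a KV cache only for the warmup layers plus one condensed layer, and that the base TinyLlama has 22 transformer layers; both are assumptions drawn from the LCKV paper linked above, not from this diff itself:

```python
# Hedged sanity check of the "3/22 KV cache" claim.
# Assumption: LCKV retains KV caches only for the warmup layers
# plus a single condensed (top) layer; all other layers share it.
total_layers = 22            # TinyLlama-1.1B layer count (assumed)
warmup_layers = 2            # stated in the README diff
kv_layers = warmup_layers + 1  # warmup layers + one condensed layer
fraction = kv_layers / total_layers
print(f"KV cache kept: {kv_layers}/{total_layers} = {fraction:.1%}")
# prints "KV cache kept: 3/22 = 13.6%"
```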