Update README.md

add info about kv cache saving

README.md CHANGED

@@ -66,6 +66,8 @@ print(response[0]["generated_text"])
 
 ## The LCKV Collection
 
+The model has 2 warmup layers, i.e. it keeps 3/22 of the KV cache of a standard TinyLlama.
+
 This model was randomly initialized, then pre-trained on 100B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 The evaluation follows that of TinyLlama. Refer to [our paper](https://arxiv.org/abs/2405.10637) for more details.
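The 3/22 figure in the added line can be sanity-checked with a quick calculation. This sketch assumes the LCKV setup keeps a KV cache only for the warmup layers plus one condensed layer, and that the base TinyLlama has 22 transformer layers; both are assumptions drawn from the LCKV paper linked above, not from this diff itself:

```python
# Hedged sanity check of the "3/22 KV cache" claim.
# Assumption: LCKV retains KV caches only for the warmup layers
# plus a single condensed (top) layer; all other layers share it.
total_layers = 22            # TinyLlama-1.1B layer count (assumed)
warmup_layers = 2            # stated in the README diff
kv_layers = warmup_layers + 1  # warmup layers + one condensed layer
fraction = kv_layers / total_layers
print(f"KV cache kept: {kv_layers}/{total_layers} = {fraction:.1%}")
# prints "KV cache kept: 3/22 = 13.6%"
```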