Update README.md: add PG19 evaluation results
README.md
CHANGED
@@ -80,6 +80,18 @@ Their personalities, so diverse,
 Their charm, a gift, that's forever told.
 ```
 
+## Model Evaluation
+
+We evaluate the model on the [PG19 dataset](https://huggingface.co/datasets/pg19) and compare its perplexity with [Llama-2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf);
+the results are summarized below (note that the perplexity is normalized following the protocol [here](https://together.ai/blog/llama-2-7b-32k)).
+
+| Model | 2K Seq | 4K Seq | 8K Seq | 16K Seq | 32K Seq |
+| -------- | ------- | ------- | ------- | ------- | ------- |
+| LLaMA-2-7B-Chat (Meta) | 1.844 | 1.833 | N/A | N/A | N/A |
+| LLaMA-2-7B-32K-Chat (ours) | 1.813 | 1.798 | 1.781 | 1.778 | 1.772 |
+
+We observe that LLaMA-2-7B-32K-Chat obtains perplexity comparable to, and at longer sequence lengths even slightly better than, the original LLaMA-2-7B-Chat model.
+
 ## Limitations and Bias
 
 As with all language models, LLaMA-2-7B-32K-Chat may generate incorrect or biased content. It's important to keep this in mind when using the model.
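For readers who want to reproduce a comparable number, the fixed-window perplexity evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact protocol: the normalization step from the linked blog post is not applied, only a handful of books are sampled, and the checkpoint id in the usage comment is an assumption rather than something the diff specifies.

```python
import math


def corpus_perplexity(window_losses, tokens_per_window):
    """Aggregate per-window mean NLLs (in nats/token) into one corpus perplexity."""
    total_nll = sum(loss * n for loss, n in zip(window_losses, tokens_per_window))
    return math.exp(total_nll / sum(tokens_per_window))


def evaluate_pg19(model_id, seq_len=2048, max_books=8):
    """Fixed-window perplexity of a causal LM on the PG19 test split."""
    # Heavy dependencies are imported here so corpus_perplexity above
    # stays dependency-free and easy to unit-test.
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    model.eval()

    losses, counts = [], []
    # PG19 is long-form books, which is why it suits long-context evaluation.
    for i, book in enumerate(load_dataset("pg19", split="test", streaming=True)):
        if i >= max_books:  # small sample for illustration only
            break
        ids = tokenizer(book["text"], return_tensors="pt").input_ids
        # Score non-overlapping windows of seq_len tokens.
        for start in range(0, ids.size(1) - seq_len + 1, seq_len):
            window = ids[:, start : start + seq_len].to(model.device)
            with torch.no_grad():
                out = model(window, labels=window)
            losses.append(out.loss.item())  # mean NLL over seq_len - 1 targets
            counts.append(seq_len - 1)
    return corpus_perplexity(losses, counts)


# Usage (requires a GPU and access to the checkpoint; the repo id below is an
# assumption — substitute whichever Llama-2-7B-32K chat checkpoint you use):
# print(evaluate_pg19("togethercomputer/Llama-2-7B-32K-Instruct", seq_len=2048))
```

Raising `seq_len` toward 32768 reproduces the longer-sequence columns of the table, at the cost of proportionally more memory per forward pass.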