Commit c2d54e2 · Parent: ba4d664

Update README.md

README.md CHANGED
Although the performance of the models was fairly competitive on long contexts of fewer than 4096 tokens, there were limitations on longer contexts. Motivated by improving its long-context performance, we fine-tuned the Mistral 7B model and produced `MistralLite`. The model significantly boosts long-context handling over Mistral-7B-Instruct-v0.1. The detailed long-context evaluation results are below:
1. [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/)

|Model Name| Input length 2851 | Input length 5568 | Input length 8313 | Input length 11044 | Input length 13780 |
|----------|------------------:|------------------:|------------------:|-------------------:|-------------------:|
| Mistral-7B-Instruct-v0.1 | 100% | 50% | 2% | 0% | 0% |
| MistralLite | **100%** | **100%** | **100%** | **100%** | **98%** |
2. [Line Retrieval](https://lmsys.org/blog/2023-06-29-longchat/#longeval-results)

|Model Name|Input length|Input length|Input length|Input length|Input length|Input length|
|----------|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
| Mistral-7B-Instruct-v0.1 | **98%** | 62% | 42% | 42% | 32% | 30% |
| MistralLite | **98%** | **92%** | **88%** | **76%** | **70%** | **60%** |
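The line-retrieval task from the linked LongEval benchmark surrounds one target record with many similar numbered records, so accuracy falls as the context grows. A minimal sketch of how such a test case can be constructed (the record phrasing and function name here are illustrative, not taken from the benchmark code):

```python
import random

def make_line_retrieval_case(n_lines, seed=0):
    """Generate n_lines numbered records plus a query about one of them.

    Returns (context, question, expected_answer): the model must read the
    whole context and report the value stored on the queried line.
    """
    rng = random.Random(seed)
    values = [rng.randint(1, 50000) for _ in range(n_lines)]
    lines = [f"line {i + 1}: REGISTER_CONTENT is <{values[i]}>."
             for i in range(n_lines)]
    target = rng.randrange(n_lines)  # which line the model must recall
    question = f"Tell me what is the REGISTER_CONTENT in line {target + 1}?"
    return "\n".join(lines), question, values[target]
```

Accuracy at a given input length is then the fraction of cases where the model's reply contains the expected value; more lines means a longer context and a harder retrieval.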
3. [Pass key Retrieval](https://github.com/epfml/landmark-attention/blob/main/llama/run_test.py#L101)

|Model Name|Input length|Input length|Input length|Input length|
|----------|-----------:|-----------:|-----------:|-----------:|
| Mistral-7B-Instruct-v0.1 | **100%** | 50% | 20% | 30% |
| MistralLite | **100%** | **100%** | **100%** | **100%** |
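The pass-key test buries a random number at a random depth inside repetitive filler text and asks the model to recover it. A hedged sketch of prompt construction in that spirit (the filler sentences and function names are illustrative assumptions, not copied from the linked `run_test.py`):

```python
import random

FILLER = ("The grass is green. The sky is blue. "
          "The sun is yellow. Here we go. There and back again. ")

def make_passkey_prompt(n_filler, seed=0):
    """Build a long prompt hiding a random 5-digit pass key.

    Returns (prompt, pass_key); the key is inserted at a random
    depth so the model cannot rely on position.
    """
    rng = random.Random(seed)
    pass_key = rng.randint(10000, 99999)
    insert_at = rng.randint(0, n_filler)  # bury the key at a random depth
    prompt = (
        "There is important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it.\n"
        + FILLER * insert_at
        + f"The pass key is {pass_key}. Remember it. "
          f"{pass_key} is the pass key.\n"
        + FILLER * (n_filler - insert_at)
        + "What is the pass key?"
    )
    return prompt, pass_key

def is_correct(model_output, pass_key):
    """Score a response: correct iff the key appears in the output."""
    return str(pass_key) in model_output
```

Varying `n_filler` sweeps the input length; accuracy at each length is the fraction of prompts for which the model's reply contains the key.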
4. [Question Answering with Long Input Texts](https://nyu-mll.github.io/quality/)

|Model Name| Test set Accuracy | Hard subset Accuracy|
|----------|------------------:|--------------------:|
| Mistral-7B-Instruct-v0.1 | 44.3% | 39.7% |