Commit c2d54e2 · Parent: ba4d664

Update README.md

README.md CHANGED
Although the performance of the models was fairly competitive on long contexts of fewer than 4096 tokens, there were limitations on longer contexts. Motivated by improving its long-context performance, we fine-tuned the Mistral 7B model and produced `MistralLite`. The model significantly boosts long-context handling over Mistral-7B-Instruct-v0.1. The detailed long-context evaluation results are below:
1. [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/)

|Model Name| Input length 2851 | Input length 5568 | Input length 8313 | Input length 11044 | Input length 13780 |
|----------|------------------:|------------------:|------------------:|-------------------:|-------------------:|
| Mistral-7B-Instruct-v0.1 | 100% | 50% | 2% | 0% | 0% |
| MistralLite | **100%** | **100%** | **100%** | **100%** | **98%** |
2. [Line Retrieval](https://lmsys.org/blog/2023-06-29-longchat/#longeval-results)

|Model Name|Input length|Input length|Input length|Input length|Input length|Input length|
|----------|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
| Mistral-7B-Instruct-v0.1 | **98%** | 62% | 42% | 42% | 32% | 30% |
| MistralLite | **98%** | **92%** | **88%** | **76%** | **70%** | **60%** |
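The line-retrieval task from the linked LongEval benchmark surrounds one target record with many similar numbered records, so accuracy falls as the context grows. A minimal sketch of how such a test case can be constructed (the record phrasing and function name here are illustrative, not taken from the benchmark code):

```python
import random

def make_line_retrieval_case(n_lines, seed=0):
    """Generate n_lines numbered records plus a query about one of them.

    Returns (context, question, expected_answer): the model must read the
    whole context and report the value stored on the queried line.
    """
    rng = random.Random(seed)
    values = [rng.randint(1, 50000) for _ in range(n_lines)]
    lines = [f"line {i + 1}: REGISTER_CONTENT is <{values[i]}>."
             for i in range(n_lines)]
    target = rng.randrange(n_lines)  # which line the model must recall
    question = f"Tell me what is the REGISTER_CONTENT in line {target + 1}?"
    return "\n".join(lines), question, values[target]
```

Accuracy at a given input length is then the fraction of cases where the model's reply contains the expected value; more lines means a longer context and a harder retrieval.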
3. [Pass key Retrieval](https://github.com/epfml/landmark-attention/blob/main/llama/run_test.py#L101)

|Model Name|Input length|Input length|Input length|Input length|
|----------|-----------:|-----------:|-----------:|-----------:|
| Mistral-7B-Instruct-v0.1 | **100%** | 50% | 20% | 30% |
| MistralLite | **100%** | **100%** | **100%** | **100%** |
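The pass-key test buries a random number at a random depth inside repetitive filler text and asks the model to recover it. A hedged sketch of prompt construction in that spirit (the filler sentences and function names are illustrative assumptions, not copied from the linked `run_test.py`):

```python
import random

FILLER = ("The grass is green. The sky is blue. "
          "The sun is yellow. Here we go. There and back again. ")

def make_passkey_prompt(n_filler, seed=0):
    """Build a long prompt hiding a random 5-digit pass key.

    Returns (prompt, pass_key); the key is inserted at a random
    depth so the model cannot rely on position.
    """
    rng = random.Random(seed)
    pass_key = rng.randint(10000, 99999)
    insert_at = rng.randint(0, n_filler)  # bury the key at a random depth
    prompt = (
        "There is important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it.\n"
        + FILLER * insert_at
        + f"The pass key is {pass_key}. Remember it. "
          f"{pass_key} is the pass key.\n"
        + FILLER * (n_filler - insert_at)
        + "What is the pass key?"
    )
    return prompt, pass_key

def is_correct(model_output, pass_key):
    """Score a response: correct iff the key appears in the output."""
    return str(pass_key) in model_output
```

Varying `n_filler` sweeps the input length; accuracy at each length is the fraction of prompts for which the model's reply contains the key.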
4. [Question Answering with Long Input Texts](https://nyu-mll.github.io/quality/)

|Model Name| Test set Accuracy | Hard subset Accuracy|
|----------|------------------:|--------------------:|
| Mistral-7B-Instruct-v0.1 | 44.3% | 39.7% |