The following accuracy results were obtained with lm-eval using the Hugging Face (HF) backend:
```
lm_eval \
--model hf \
--model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
--tasks "niah_single_1" \
--write_out \
--batch_size 1 \
--output_path "niah_single_1.json" \
--show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, or `niah_multikey_3` to run the remaining tasks.
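The per-task substitution above can be scripted in a loop. This is a sketch, shown as a dry run that only prints each invocation; remove the leading `echo` to actually launch the evaluations (it assumes `lm_eval` is on your `PATH` and the model fits in memory):

```shell
# Dry run: print one lm_eval command per RULER NIAH task.
# Drop the `echo` on the lm_eval line to execute for real.
for task in niah_single_1 niah_single_2 niah_single_3 \
            niah_multikey_1 niah_multikey_2 niah_multikey_3; do
  echo lm_eval \
    --model hf \
    --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
    --tasks "$task" \
    --write_out \
    --batch_size 1 \
    --output_path "${task}.json" \
    --show_config
done
```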
### Accuracy with HF
| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| NIAH | niah_single_1 (100K) | OOM | 0.0 | 0.0 |
| | niah_single_2 (100K) | OOM | 0.0 | 0.0 |
| | niah_single_3 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_1 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_2 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_3 (100K) | OOM | 0.0 | 0.0 |
The following accuracy results were obtained with lm-eval using the vLLM backend:
```
lm_eval \
--model vllm \
--model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
--tasks "niah_single_1" \
--write_out \
--batch_size 1 \
--output_path "niah_single_1.json" \
--show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, or `niah_multikey_3` to run the remaining tasks.
### Accuracy with vLLM
| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| LongBench V1 | Gov Report | 32.16 | 20.82 | 64.73 |
| | 2WikimQA | 17.35 | 18.53 | 106.8 |
| | Qasper | 18.22 | 26.6 | 145.0 |
| | MultifieldQA | 31.11 | 21.95 | 70.55 |
| | HotpotQA | 16.23 | 4.86 | 29.9 |
| | Musique | 9.11 | 0.94 | 10.31 |
| NIAH | niah_single_1 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_single_2 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_single_3 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_1 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_2 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_3 (4K) | 100.0 | 100.0 | 100.0 |