The following accuracy results were obtained with lm-eval using the Hugging Face (HF) backend:
```
lm_eval \
--model hf \
--model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
--tasks "niah_single_1" \
--write_out \
--batch_size 1 \
--output_path "niah_single_1.json" \
--show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, or `niah_multikey_3` (and update `--output_path` accordingly) to cover the remaining tasks; a loop version is sketched below.
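
To avoid editing the command by hand, the six NIAH tasks can also be run in a small shell loop (a minimal sketch of the command above; the per-task output paths are illustrative):

```
# Run each NIAH task with the HF backend, writing one results file per task.
for task in niah_single_1 niah_single_2 niah_single_3 \
            niah_multikey_1 niah_multikey_2 niah_multikey_3; do
  lm_eval \
    --model hf \
    --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
    --tasks "${task}" \
    --write_out \
    --batch_size 1 \
    --output_path "${task}.json" \
    --show_config
done
```
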
### Accuracy with HF
<table>
<thead>
<tr>
<th>Category</th>
<th>Task</th>
<th>meta-llama/Llama-3.1-8B-Instruct</th>
<th>nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8</th>
<th>Recovery (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="6"><b>NIAH</b></td>
<td>niah_single_1 (100K)</td>
<td>OOM</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>niah_single_2 (100K)</td>
<td>OOM</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>niah_single_3 (100K)</td>
<td>OOM</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>niah_multikey_1 (100K)</td>
<td>OOM</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>niah_multikey_2 (100K)</td>
<td>OOM</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>niah_multikey_3 (100K)</td>
<td>OOM</td>
<td>0.0</td>
<td>0.0</td>
</tr>
</tbody>
</table>
The following accuracy results were obtained with lm-eval using the vLLM backend:
```
lm_eval \
--model vllm \
--model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
--tasks "niah_single_1" \
--write_out \
--batch_size 1 \
--output_path "niah_single_1.json" \
--show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, or `niah_multikey_3` (and update `--output_path` accordingly) to cover the remaining tasks; a loop that also evaluates the baseline model is sketched below.
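
The same loop pattern applies to the vLLM backend. To also reproduce the baseline column (Recovery is the quantized score as a percentage of the baseline score), the identical command can be run against `meta-llama/Llama-3.1-8B-Instruct` by swapping the `pretrained` value. A minimal sketch, with the nested loop and output paths being illustrative:

```
# Evaluate the baseline and the FP8 KV-cache model on every NIAH task with vLLM,
# so that recovery (= quantized / baseline * 100) can be computed per task.
for model in meta-llama/Llama-3.1-8B-Instruct nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8; do
  for task in niah_single_1 niah_single_2 niah_single_3 \
              niah_multikey_1 niah_multikey_2 niah_multikey_3; do
    lm_eval \
      --model vllm \
      --model_args pretrained="${model}",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
      --tasks "${task}" \
      --write_out \
      --batch_size 1 \
      --output_path "${model##*/}_${task}.json" \
      --show_config
  done
done
```
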
### Accuracy with vLLM
<table>
<thead>
<tr>
<th>Category</th>
<th>Task</th>
<th>meta-llama/Llama-3.1-8B-Instruct</th>
<th>nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8</th>
<th>Recovery (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="6"><b>LongBench V1</b></td>
<td>Gov Report</td>
<td>32.16</td>
<td>20.82</td>
<td>64.73</td>
</tr>
<tr>
<td>2WikimQA</td>
<td>17.35</td>
<td>18.53</td>
<td>106.8</td>
</tr>
<tr>
<td>Qasper</td>
<td>18.22</td>
<td>26.6</td>
<td>145</td>
</tr>
<tr>
<td>MultifieldQA</td>
<td>31.11</td>
<td>21.95</td>
<td>70.55</td>
</tr>
<tr>
<td>HotpotQA</td>
<td>16.23</td>
<td>4.86</td>
<td>29.9</td>
</tr>
<tr>
<td>Musique</td>
<td>9.11</td>
<td>0.94</td>
<td>10.31</td>
</tr>
<tr>
<td rowspan="6"><b>NIAH</b></td>
<td>niah_single_1 (4K)</td>
<td>100.0</td>
<td>100.0</td>
<td>100.0</td>
</tr>
<tr>
<td>niah_single_2 (4K)</td>
<td>100.0</td>
<td>100.0</td>
<td>100.0</td>
</tr>
<tr>
<td>niah_single_3 (4K)</td>
<td>100.0</td>
<td>100.0</td>
<td>100.0</td>
</tr>
<tr>
<td>niah_multikey_1 (4K)</td>
<td>100.0</td>
<td>100.0</td>
<td>100.0</td>
</tr>
<tr>
<td>niah_multikey_2 (4K)</td>
<td>100.0</td>
<td>100.0</td>
<td>100.0</td>
</tr>
<tr>
<td>niah_multikey_3 (4K)</td>
<td>100.0</td>
<td>100.0</td>
<td>100.0</td>
</tr>
</tbody>
</table>