The following accuracy results were obtained with lm-eval using the Hugging Face (HF) backend:
```
lm_eval \
--model hf \
--model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
--tasks "niah_single_1" \
--write_out \
--batch_size 1 \
--output_path "niah_single_1.json" \
--show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, or `niah_multikey_3` to run the remaining tasks.
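The per-task substitution above can be scripted in a loop. This is a sketch, shown as a dry run that only prints each invocation; remove the leading `echo` to actually launch the evaluations (it assumes `lm_eval` is on your `PATH` and the model fits in memory):

```shell
# Dry run: print one lm_eval command per RULER NIAH task.
# Drop the `echo` on the lm_eval line to execute for real.
for task in niah_single_1 niah_single_2 niah_single_3 \
            niah_multikey_1 niah_multikey_2 niah_multikey_3; do
  echo lm_eval \
    --model hf \
    --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
    --tasks "$task" \
    --write_out \
    --batch_size 1 \
    --output_path "${task}.json" \
    --show_config
done
```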
### Accuracy with HF
| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| NIAH | niah_single_1 (100K) | OOM | 0.0 | 0.0 |
| | niah_single_2 (100K) | OOM | 0.0 | 0.0 |
| | niah_single_3 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_1 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_2 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_3 (100K) | OOM | 0.0 | 0.0 |
The following accuracy results were obtained with lm-eval using the vLLM backend:
```
lm_eval \
--model vllm \
--model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
--tasks "niah_single_1" \
--write_out \
--batch_size 1 \
--output_path "niah_single_1.json" \
--show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, or `niah_multikey_3` to run the remaining tasks.
### Accuracy with vLLM
| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| LongBench V1 | Gov Report | 32.16 | 20.82 | 64.73 |
| | 2WikimQA | 17.35 | 18.53 | 106.8 |
| | Qasper | 18.22 | 26.6 | 145.0 |
| | MultifieldQA | 31.11 | 21.95 | 70.55 |
| | HotpotQA | 16.23 | 4.86 | 29.9 |
| | Musique | 9.11 | 0.94 | 10.31 |
| NIAH | niah_single_1 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_single_2 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_single_3 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_1 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_2 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_3 (4K) | 100.0 | 100.0 | 100.0 |