Yi30 committed
Commit c25d3cb · verified · 1 parent: 5818feb

Upload folder using huggingface_hub

benchmark_logs/[inc-requant-woq-staticfp8-dmoe-fp8kv-delayedsampling]static-online-gaudi3-0.92util-TPparallel8-EP8-loop1moegroups-multistep1_nprompt448_rrateinf_bs448_i1024_o1024_mdllen2048_run1.log ADDED
@@ -0,0 +1,29 @@
+ INFO 03-24 02:44:45 __init__.py:199] Automatically detected platform hpu.
+ Namespace(backend='vllm', base_url=None, host='localhost', port=8080, endpoint='/v1/completions', dataset=None, dataset_name='sonnet', dataset_path='benchmarks/sonnet.txt', max_concurrency=None, model='/data/models/DeepSeek-R1/', tokenizer='/data/models/DeepSeek-R1/', best_of=1, use_beam_search=False, num_prompts=448, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=1024, sonnet_output_len=1024, sonnet_prefix_len=100, sharegpt_output_len=None, random_input_len=1024, random_output_len=128, random_range_ratio=1.0, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, tokenizer_mode='auto', served_model_name=None, lora_modules=None)
+ Starting initial single prompt test run...
+ Initial test run completed. Starting main benchmark run...
+ Traffic request rate: inf
+ Burstiness factor: 1.0 (Poisson process)
+ Maximum request concurrency: None
+
+  0%| | 0/448 [00:00<?, ?it/s]
+  0%| | 1/448 [06:10<45:58:53, 370.32s/it]
+ ============ Serving Benchmark Result ============
+ Successful requests: 448
+ Benchmark duration (s): 370.37
+ Total input tokens: 414918
+ Total generated tokens: 458752
+ Request throughput (req/s): 1.21
+ Output token throughput (tok/s): 1238.64
+ Total Token throughput (tok/s): 2358.93
+ ---------------Time to First Token----------------
+ Mean TTFT (ms): 16517.07
+ Median TTFT (ms): 16477.37
+ P99 TTFT (ms): 31681.38
+ -----Time per Output Token (excl. 1st token)------
+ Mean TPOT (ms): 345.77
+ Median TPOT (ms): 345.83
+ P99 TPOT (ms): 360.52
+ ---------------Inter-token Latency----------------
+ Mean ITL (ms): 345.77
+ Median ITL (ms): 350.61
+ P99 ITL (ms): 386.83
+ ==================================================
benchmark_logs/[inc-requant-woq-staticfp8-dmoe-fp8kv-delayedsampling]static-online-gaudi3-0.92util-TPparallel8-EP8-loop1moegroups-multistep1_nprompt448_rrateinf_bs448_i1024_o1024_mdllen2048_serving.log ADDED
The diff for this file is too large to render. See raw diff
 
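The summary metrics in the log above are simple ratios of the reported totals. A minimal sanity check (a sketch, not the benchmark script itself; the small residual differences come from the duration being printed rounded to two decimals):

```python
# Sanity-check the derived throughput metrics from the log's raw totals.
# All values are copied from the "Serving Benchmark Result" block above.
num_requests = 448
duration_s = 370.37        # benchmark duration (rounded in the log)
input_tokens = 414918
output_tokens = 458752

req_per_s = num_requests / duration_s
out_tok_per_s = output_tokens / duration_s
total_tok_per_s = (input_tokens + output_tokens) / duration_s

print(f"Request throughput:      {req_per_s:.2f} req/s")    # log reports 1.21
print(f"Output token throughput: {out_tok_per_s:.2f} tok/s")  # log reports 1238.64
print(f"Total token throughput:  {total_tok_per_s:.2f} tok/s")  # log reports 2358.93
```

The reproduced values agree with the log to within rounding of the printed duration.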
benchmark_logs/[staticfp8-dmoe-fp8kv-delayedsampling]static-online-gaudi3-0.92util-TPparallel8-EP8-loop1moegroups-multistep1_nprompt448_rrateinf_bs448_i1024_o1024_mdllen2048_run1.log ADDED
@@ -0,0 +1,29 @@
+ INFO 03-24 01:45:30 __init__.py:199] Automatically detected platform hpu.
+ Namespace(backend='vllm', base_url=None, host='localhost', port=8080, endpoint='/v1/completions', dataset=None, dataset_name='sonnet', dataset_path='benchmarks/sonnet.txt', max_concurrency=None, model='/data/models/DeepSeek-R1-static/', tokenizer='/data/models/DeepSeek-R1-static/', best_of=1, use_beam_search=False, num_prompts=448, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=1024, sonnet_output_len=1024, sonnet_prefix_len=100, sharegpt_output_len=None, random_input_len=1024, random_output_len=128, random_range_ratio=1.0, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, tokenizer_mode='auto', served_model_name=None, lora_modules=None)
+ Starting initial single prompt test run...
+ Initial test run completed. Starting main benchmark run...
+ Traffic request rate: inf
+ Burstiness factor: 1.0 (Poisson process)
+ Maximum request concurrency: None
+
+  0%| | 0/448 [00:00<?, ?it/s]
+  0%| | 1/448 [03:38<27:11:15, 218.96s/it]
+ ============ Serving Benchmark Result ============
+ Successful requests: 448
+ Benchmark duration (s): 219.01
+ Total input tokens: 414918
+ Total generated tokens: 458752
+ Request throughput (req/s): 2.05
+ Output token throughput (tok/s): 2094.71
+ Total Token throughput (tok/s): 3989.26
+ ---------------Time to First Token----------------
+ Mean TTFT (ms): 16797.56
+ Median TTFT (ms): 16810.33
+ P99 TTFT (ms): 32200.30
+ -----Time per Output Token (excl. 1st token)------
+ Mean TPOT (ms): 197.54
+ Median TPOT (ms): 197.54
+ P99 TPOT (ms): 212.64
+ ---------------Inter-token Latency----------------
+ Mean ITL (ms): 197.54
+ Median ITL (ms): 183.74
+ P99 ITL (ms): 228.45
+ ==================================================
benchmark_logs/[staticfp8-dmoe-fp8kv-delayedsampling]static-online-gaudi3-0.92util-TPparallel8-EP8-loop1moegroups-multistep1_nprompt448_rrateinf_bs448_i1024_o1024_mdllen2048_serving.log ADDED
The diff for this file is too large to render. See raw diff
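The two runs above share the same workload (448 prompts, 1024 input / 1024 output tokens, request rate inf) and differ in the quantization setup named in the filenames (`inc-requant-woq-staticfp8` vs. plain `staticfp8`). A small sketch comparing the two summary blocks, using only numbers reported in the logs (the dictionary keys are shorthand labels, not official config names):

```python
# Compare the two benchmark runs from the logs above.
# Metric values are copied verbatim from each "Serving Benchmark Result" block.
runs = {
    "inc-requant-woq-staticfp8": {"out_tok_s": 1238.64, "mean_tpot_ms": 345.77},
    "staticfp8":                 {"out_tok_s": 2094.71, "mean_tpot_ms": 197.54},
}

speedup = runs["staticfp8"]["out_tok_s"] / runs["inc-requant-woq-staticfp8"]["out_tok_s"]
tpot_ratio = runs["inc-requant-woq-staticfp8"]["mean_tpot_ms"] / runs["staticfp8"]["mean_tpot_ms"]

print(f"Output-throughput speedup (staticfp8 vs inc-requant-woq): {speedup:.2f}x")  # ~1.69x
print(f"Mean TPOT ratio: {tpot_ratio:.2f}x")                                        # ~1.75x
```

Mean TTFT is nearly identical between the runs (16517 ms vs. 16798 ms), so the throughput gap is driven almost entirely by the per-output-token decode latency.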