hxgy610 committed on
Commit ba20cd8 · verified · 1 Parent(s): 508054e

Update instruction in Readme

Files changed (1): README.md (+3 −1)
README.md CHANGED
@@ -45,6 +45,8 @@ vllm serve \
  ### Evaluation
  From vllm-bench, the acceptance length (AL) on the HumanEval dataset with different speculation lengths K is shown below.
 
+ We use instruction-formatted prompts following standard practice for instruct models (similar to the [DeepSeek-Coder evaluation](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/Evaluation/HumanEval/eval_instruct.py) and the [Llama 3.1 8B Instruct evaluation](https://huggingface.co/datasets/meta-llama/Llama-3.1-8B-Instruct-evals/viewer/Llama-3.1-8B-Instruct-evals__human_eval__details?row=1)). The instruction we prepend to each prompt is ```Complete the following Python function. Only output the code, no explanations.```
+
  | K | Acceptance Length |
  |---|-------------------|
  | 4 | 4.30 |
@@ -60,7 +62,7 @@ vllm bench serve \
  --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --dataset-name custom \
  --dataset-path /home/ubuntu/eval_datasets/humaneval_qwen3coder_bench.jsonl \
- --custom-output-len 4096 \
+ --custom-output-len 256 \
  --num-prompts 80 \
  --max-concurrency 1 \
  --temperature 0 \
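For reference, a minimal sketch of how the instruction-prefixed benchmark file consumed by `--dataset-path` might be produced. This is an assumption, not part of the commit: the `prompt` field name, the `build_bench_file` helper, and the `humaneval_bench.jsonl` path are all illustrative, and the actual schema expected by `vllm bench serve --dataset-name custom` should be checked against the vLLM documentation.

```python
# Hypothetical sketch: prepend the instruction from the README to each
# HumanEval-style prompt and write one JSON object per line (jsonl).
import json

INSTRUCTION = "Complete the following Python function. Only output the code, no explanations."

def build_bench_file(problems, out_path):
    """Write instruction-prefixed prompts to a jsonl file.

    `problems` is an iterable of dicts with a "prompt" key holding the
    raw function stub; the field name in the output record is assumed.
    """
    with open(out_path, "w") as f:
        for problem in problems:
            record = {"prompt": f"{INSTRUCTION}\n{problem['prompt']}"}
            f.write(json.dumps(record) + "\n")

# Toy usage with a single made-up problem:
build_bench_file(
    [{"prompt": 'def add(a, b):\n    """Return a + b."""\n'}],
    "humaneval_bench.jsonl",
)
```

Writing the instruction into the dataset file (rather than patching prompts at request time) keeps the `vllm bench serve` command itself unchanged apart from the `--dataset-path` it points at.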