## 1. Introduction

We provide a test script to evaluate the performance of the deepseek-coder model on the [MBPP](https://huggingface.co/datasets/mbpp) code generation benchmark in a 3-shot setting.



## 2. Setup

```
pip install accelerate
pip install attrdict
pip install transformers
pip install torch
```
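Before launching a multi-GPU run, it can help to confirm that the dependencies are importable. The following is a minimal sketch rather than part of the repository; the helper name `missing_modules` is hypothetical, and the list assumes PyTorch is imported under the module name `torch`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages listed in the setup step (assumed).
REQUIRED = ["accelerate", "attrdict", "transformers", "torch"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

`importlib.util.find_spec` only locates a top-level module without importing it, so the check stays fast even for large packages.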



## 3. Evaluation

The sample script `eval.sh` shows how to evaluate the deepseek-coder-1.3b-base model on the MBPP dataset using 8 GPUs.

```bash
MODEL_NAME_OR_PATH="deepseek-ai/deepseek-coder-1.3b-base"
DATASET_ROOT="data/"
LANGUAGE="python"
python -m accelerate.commands.launch --config_file test_config.yaml eval_pal.py --logdir ${MODEL_NAME_OR_PATH} --dataroot ${DATASET_ROOT}
```
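In a 3-shot setting, each prompt contains three solved MBPP problems followed by the target problem. The sketch below illustrates how such a prompt could be assembled. The template wording and the helper names `format_example` and `build_few_shot_prompt` are hypothetical and may differ from what `eval_pal.py` does; the field names `text`, `code`, and `test_list` are assumed to follow the MBPP dataset schema.

```python
from typing import Dict, List


def format_example(task: Dict, include_solution: bool) -> str:
    """Render one MBPP-style task; solved examples also include reference code."""
    tests = "\n".join(task["test_list"])
    segment = f"Task: {task['text']}\nTests:\n{tests}\nSolution:\n"
    if include_solution:
        segment += task["code"] + "\n"
    return segment


def build_few_shot_prompt(shots: List[Dict], target: Dict) -> str:
    """Concatenate the solved examples, then the unsolved target task."""
    segments = [format_example(shot, include_solution=True) for shot in shots]
    segments.append(format_example(target, include_solution=False))
    return "\n".join(segments)


# Toy data in the assumed MBPP layout (not taken from the real dataset).
shots = [
    {"text": f"Return x plus {i}.",
     "code": f"def f{i}(x):\n    return x + {i}",
     "test_list": [f"assert f{i}(1) == {1 + i}"]}
    for i in range(3)
]
target = {"text": "Return the square of x.",
          "code": "def sq(x):\n    return x * x",
          "test_list": ["assert sq(3) == 9"]}

print(build_few_shot_prompt(shots, target))
```

The prompt ends right after the target's `Solution:` line, so the model's continuation is taken as the candidate program.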

## 4. Experimental Results

We report results for several models below. All runs use a maximum input length of 4096 tokens, a maximum output length of 500 tokens, and greedy decoding.
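With greedy decoding there is exactly one completion per problem, so Pass@1 reduces to the percentage of problems whose single completion passes all of its tests. The sketch below illustrates that calculation under simple assumptions: the helper names `passes_tests` and `pass_at_1` are hypothetical, the tests are assumed to be MBPP-style `assert` statements, and the code runs without sandboxing or timeouts, which a real harness would need.

```python
from typing import List


def passes_tests(code: str, tests: List[str]) -> bool:
    """Run a candidate solution and its assert statements in a fresh namespace."""
    namespace: dict = {}
    try:
        exec(code, namespace)          # define the candidate function(s)
        for test in tests:
            exec(test, namespace)      # raises AssertionError on a wrong answer
    except Exception:
        return False                   # syntax errors, runtime errors, failed asserts
    return True


def pass_at_1(outcomes: List[bool]) -> float:
    """Percentage of problems solved by their single greedy completion."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Toy completions standing in for model output (not real results).
tests = ["assert add(1, 2) == 3", "assert add(0, 0) == 0"]
completions = [
    "def add(a, b):\n    return a + b",   # correct
    "def add(a, b):\n    return a - b",   # fails the first assert
]
outcomes = [passes_tests(code, tests) for code in completions]
print(pass_at_1(outcomes))  # 50.0
```

Because `exec` runs arbitrary generated code, this should only be used on trusted toy inputs or inside an isolated environment.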



#### (1) Multilingual Base Models

| Model               | Size | Pass@1 |
|---------------------|------|--------|
| CodeShell           | 7B   | 38.6%  |
| CodeGeeX2           | 6B   | 36.2%  |
| StarCoder           | 16B  | 42.8%  |
| CodeLLama-Base      | 7B   | 38.6%  |
| CodeLLama-Base      | 13B  | 47.0%  |
| CodeLLama-Base      | 34B  | 55.0%  |
|                     |      |        |
| DeepSeek-Coder-Base | 1.3B | 46.8%  |
| DeepSeek-Coder-Base | 5.7B | 57.2%  |
| DeepSeek-Coder-Base | 6.7B | 60.6%  |
| DeepSeek-Coder-Base | 33B  | 66.0%  |

#### (2) Instruction-Tuned Models

| Model                   | Size | Pass@1 |
|-------------------------|------|--------|
| GPT-3.5-Turbo           | -    | 70.8%  |
| GPT-4                   | -    | 80.0%  |
|                         |      |        |
| DeepSeek-Coder-Instruct | 1.3B | 49.4%  |
| DeepSeek-Coder-Instruct | 6.7B | 65.4%  |
| DeepSeek-Coder-Instruct | 33B  | 70.0%  |