🔎 Checking the compilability of post-processed code:: click to expand ::
To double-check the post-processing results, you can use `evalplus.syncheck` to check the code validity before and after sanitization, which will print erroneous code snippets and why they are wrong:
```shell
# 💡 If you are storing codes in jsonl:
evalplus.syncheck --samples samples.jsonl --dataset [humaneval|mbpp]
# 💡 If you are storing codes in directories:
evalplus.syncheck --samples /path/to/vicuna-[??]b_temp_[??] --dataset [humaneval|mbpp]
```
## Code Evaluation
You are strongly recommended to use a sandbox such as [docker](https://docs.docker.com/get-docker/):
```bash
docker run --rm --pull=always -v $(pwd)/evalplus_results:/app ganler/evalplus:latest \
evalplus.evaluate --dataset humaneval \
--samples /app/humaneval/ise-uiuc--Magicoder-S-DS-6.7B_vllm_temp_0.0.jsonl
```
...Or if you want to try it locally regardless of the risks ⚠️:
```bash
evalplus.evaluate --dataset [humaneval|mbpp] --samples samples.jsonl
```
To use a user-defined dataset locally, you can set `HUMANEVAL_OVERRIDE_PATH` or `MBPP_OVERRIDE_PATH`:
```bash
HUMANEVAL_OVERRIDE_PATH="/path/to/HumanEvalPlus.jsonl.gz" evalplus.evaluate --dataset humaneval --samples samples.jsonl
```
> [!Tip]
>
> Program execution can be configured. See [Program Execution in EvalPlus and EvalPerf](./docs/execution.md).