Test-suite Reduction
Preparation Work
Test-suite reduction relies on the evaluation results, so make sure that you have run the evaluation script and that an eval_results.json has been generated for each model under test.
Use the following command to install the necessary dependencies:
# in $EVALPLUS_ROOT
pip install -r tools/tsr/requirements.txt
Usage
python3 run.py \
--dataset DATASET \
--sample_eval_dir SAMPLE_DIR \
--model MODEL \
[--report_dir REPORT_DIR]
# Example
python3 run.py --dataset humaneval --sample_eval_dir $HOME/HumanEval --model ALL
Parameter descriptions:
- --dataset: currently, humaneval and mbpp are supported.
- --sample_eval_dir is the directory containing all the LLM evaluation results. We require the directory to be structured as

    SAMPLE_EVAL_DIR
    ├── LLM_1
    │   ├── ...
    │   └── eval_results.json
    ├── LLM_2
    │   ├── ...
    └── ...

- --report_dir is the directory where we store intermediate files, pass@k results, and the reduced dataset. If not specified, REPORT_DIR=./tsr_info by default.
- If MODEL is a specific LLM name, the cross-validation results will be generated in REPORT_DIR; if MODEL == ALL, a reduced dataset will be generated in REPORT_DIR.
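Before running the reduction, it may help to confirm that the sample directory matches the required layout. The following is a minimal sketch, not part of the tool itself: the function name find_missing_results is hypothetical, and it assumes eval_results.json sits directly inside each model's subdirectory, as the layout above suggests.

```python
from pathlib import Path


def find_missing_results(sample_eval_dir):
    """Return the names of model subdirectories that lack eval_results.json.

    Hypothetical helper: assumes each immediate subdirectory of
    sample_eval_dir corresponds to one LLM and that eval_results.json
    is a direct child of that subdirectory.
    """
    root = Path(sample_eval_dir)
    missing = []
    # Only immediate subdirectories are treated as model directories.
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (model_dir / "eval_results.json").is_file():
            missing.append(model_dir.name)
    return missing
```

An empty list means every model directory contains a result file; any names returned point to models whose evaluation probably still needs to be run.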
Known Issues
If you find the program stuck at the mutant generation step, try removing the line
assert len(completion_id) == len(problems), "Missing problems in samples"
in evalplus/evaluate.py.