# Dataset Quick Evaluation Tutorial

OpenCompass provides two paths for quickly evaluating your own data: a data format protocol based on ChatMLDataset, and a data format protocol based on CustomDataset.

Compared to the complete dataset integration process in [new_dataset.md](./new_dataset.md), these two paths are more convenient and efficient: they enter the evaluation process directly, without adding new configuration files. However, if you have specific needs for custom reading, inference, or evaluation, we still recommend following the complete integration process to add a new dataset.
## Data Format Protocol and Fast Evaluation Based on ChatMLDataset

OpenCompass has recently launched a dataset evaluation mode based on the ChatML dialogue template, which allows users to provide a dataset `.json` file that conforms to the ChatML dialogue template and simply set the dataset information config, like model configs, to start evaluating directly.

### Format Requirements for Data Files

This evaluation method only supports data files in `.json` format, and each sample must comply with the following format.

The format of a text-only dataset with a simple structure:
```jsonl
{
    "question": [
        {
            "role": "system",  # Omittable
            "content": Str
        },
        {
            "role": "user",
            "content": Str
        }
    ],
    "answer": [
        Str
    ]
}
{
    ...
}
...
```
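As a concrete illustration, a small script can emit samples in this shape (the helper name `make_sample` and the output filename are our own choices; only the field layout comes from the protocol above):

```python
import json

def make_sample(system, user, answer):
    """Build one text-only sample following the protocol above."""
    question = []
    if system is not None:  # the "system" turn is omittable
        question.append({"role": "system", "content": system})
    question.append({"role": "user", "content": user})
    return {"question": question, "answer": [answer]}

samples = [
    make_sample("You are a careful calculator.", "What is 2 + 3?", "5"),
    make_sample(None, "What is 10 - 4?", "6"),
]

# One JSON object per sample, as in the template above.
with open("my_dataset.json", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```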
The format of multi-turn and multimodal datasets:
```jsonl
{
    "question": [
        {
            "role": "system",
            "content": Str
        },
        {
            "role": "user",
            "content": Str or List[
                {
                    "type": Str,  # "image"
                    "image_url": Str
                },
                ...
                {
                    "type": Str,  # "text"
                    "text": Str
                }
            ]
        },
        {
            "role": "assistant",
            "content": Str
        },
        {
            "role": "user",
            "content": Str or List
        },
        ...
    ],
    "answer": [
        Str,
        Str,
        ...
    ]
}
{
    ...
}
...
```
(As OpenCompass currently does not support multimodal evaluation, the template above is for reference only.)

When ChatMLDataset reads `.json` files, it uses `pydantic` to perform simple format validation. You can use `tools/chatml_fformat_test.py` to check your data file in advance.
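If you want a quick sanity check before running the tool, the structural rules above can be approximated in plain Python (ChatMLDataset itself validates with `pydantic`; this sketch only mirrors the shape checks):

```python
def check_sample(sample):
    """Rough structural check for one text-only ChatML sample."""
    turns = sample.get("question")
    if not isinstance(turns, list) or not turns:
        return False
    for turn in turns:
        if turn.get("role") not in ("system", "user", "assistant"):
            return False
        if not isinstance(turn.get("content"), (str, list)):
            return False
    answers = sample.get("answer")
    return isinstance(answers, list) and all(isinstance(a, str) for a in answers)

good = {"question": [{"role": "user", "content": "1+1=?"}], "answer": ["2"]}
bad = {"question": [{"role": "oracle", "content": "?"}], "answer": ["2"]}
print(check_sample(good), check_sample(bad))  # True False
```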
After the format check passes, add a config dictionary named `chatml_datasets` to your running config file to convert the data file into an OpenCompass dataset at runtime. An example is as follows:
```python
chatml_datasets = [
    dict(
        abbr='YOUR_DATASET_NAME',
        path='YOUR_DATASET_PATH',
        evaluator=dict(
            type='cascade_evaluator',
            rule_evaluator=dict(
                type='math_evaluator',
            ),
            llm_evaluator=dict(
                type='llm_evaluator',
                prompt="YOUR_JUDGE_PROMPT",
                judge_cfg=dict(),  # your judge model config
            )
        ),
        n=1,  # repeat number
    ),
]
```
The ChatML evaluation module currently provides four preset evaluators: `mcq_rule_evaluator`, for multiple-choice evaluation; `math_evaluator`, for LaTeX mathematical formula evaluation; `llm_evaluator`, for answers that are open-ended or difficult to extract; and `cascade_evaluator`, an evaluation mode that cascades a rule evaluator with an LLM evaluator.
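To make the cascade idea concrete, here is an illustrative sketch, not the OpenCompass implementation, of how a rule check and an LLM judge can be chained (`rule_evaluate`, `llm_judge`, and `cascade_evaluate` are hypothetical names):

```python
def rule_evaluate(prediction, answer):
    """Cheap rule check; returns None when it cannot decide."""
    if prediction.strip() == answer.strip():
        return True
    return None  # undecided -> defer to the LLM judge

def llm_judge(prediction, answer):
    """Placeholder for a judge-model call configured via judge_cfg."""
    return bool(prediction)  # dummy verdict for illustration only

def cascade_evaluate(prediction, answer):
    verdict = rule_evaluate(prediction, answer)
    if verdict is not None:
        return verdict          # rule decided; no judge call needed
    return llm_judge(prediction, answer)  # fall through to the judge
```

The point of the cascade is cost: the rule evaluator settles the easy cases for free, and the LLM judge is only invoked for the remainder.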
In addition, if you have a long-term need for datasets based on ChatML templates, you can contribute your dataset config to `opencompass/config/chatml_datasets`. An evaluation example that calls these dataset configs is provided in `examples/evalchat_datasets.py`.
## Data Format Protocol and Fast Evaluation Based on CustomDataset

(This module is no longer being updated, but it remains available if you need quick CLI-based evaluation.)

This module supports two types of tasks: multiple choice (`mcq`) and question & answer (`qa`). For `mcq`, both `ppl` and `gen` inference are supported; for `qa`, `gen` inference is supported.
### Dataset Format

We support datasets in both `.jsonl` and `.csv` formats.

#### Multiple Choice (`mcq`)

For `mcq` datasets, the default fields are as follows:

- `question`: The stem of the multiple-choice question.
- `A`, `B`, `C`, ...: Single uppercase letters representing the options, with no limit on their number. By default, consecutive letters starting from `A` are parsed as options.
- `answer`: The correct answer to the multiple-choice question, which must be one of the options above, such as `A`, `B`, or `C`.

Non-default fields are read in but not used by default. To use them, specify them in the `.meta.json` file.
An example of the `.jsonl` format:

```jsonl
{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
{"question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"}
{"question": "776+208+589+882+571+996+515+726=", "A": "5213", "B": "5263", "C": "5383", "answer": "B"}
{"question": "803+862+815+100+409+758+262+169=", "A": "4098", "B": "4128", "C": "4178", "answer": "C"}
```
An example of the `.csv` format:

```csv
question,A,B,C,answer
127+545+588+620+556+199=,2632,2635,2645,B
735+603+102+335+605=,2376,2380,2410,B
506+346+920+451+910+142+659+850=,4766,4774,4784,C
504+811+870+445=,2615,2630,2750,B
```
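The default option parsing described above ("consecutive letters starting from `A`") can be sketched as follows (`parse_options` is a hypothetical helper, not an OpenCompass API):

```python
import string

def parse_options(fields):
    """Collect consecutive uppercase letters starting from 'A'."""
    options = []
    for letter in string.ascii_uppercase:
        if letter in fields:
            options.append(letter)
        else:
            break  # stop at the first missing letter
    return options

row = {"question": "165+833+650+615=", "A": "2258", "B": "2263",
       "C": "2281", "answer": "B"}
print(parse_options(row))  # ['A', 'B', 'C']
```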
#### Question & Answer (`qa`)

For `qa` datasets, the default fields are as follows:

- `question`: The question stem.
- `answer`: The correct answer. It may be missing, indicating that the dataset has no reference answer.

Non-default fields are read in but not used by default. To use them, specify them in the `.meta.json` file.
An example of the `.jsonl` format:

```jsonl
{"question": "752+361+181+933+235+986=", "answer": "3448"}
{"question": "712+165+223+711=", "answer": "1811"}
{"question": "921+975+888+539=", "answer": "3323"}
{"question": "752+321+388+643+568+982+468+397=", "answer": "4519"}
```

An example of the `.csv` format:

```csv
question,answer
123+147+874+850+915+163+291+604=,3967
149+646+241+898+822+386=,3142
332+424+582+962+735+798+653+214=,4700
649+215+412+495+220+738+989+452=,4170
```
### Command Line

Custom datasets can be evaluated directly from the command line:

```bash
python run.py \
    --models hf_llama2_7b \
    --custom-dataset-path xxx/test_mcq.csv \
    --custom-dataset-data-type mcq \
    --custom-dataset-infer-method ppl
```
```bash
python run.py \
    --models hf_llama2_7b \
    --custom-dataset-path xxx/test_qa.jsonl \
    --custom-dataset-data-type qa \
    --custom-dataset-infer-method gen
```

In most cases, `--custom-dataset-data-type` and `--custom-dataset-infer-method` can be omitted. OpenCompass will set them based on the following logic:

- If options like `A`, `B`, `C`, etc., can be parsed from the dataset file, it is considered an `mcq` dataset; otherwise, it is considered a `qa` dataset.
- The default `infer_method` is `gen`.
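This detection logic can be sketched roughly as follows (`guess_data_type` is a hypothetical helper; the real implementation may differ):

```python
import csv
import json

def guess_data_type(path):
    """Guess mcq vs. qa from the first record's field names."""
    if path.endswith(".jsonl"):
        with open(path, encoding="utf-8") as f:
            fields = json.loads(f.readline()).keys()
    else:  # assume .csv with a header row
        with open(path, encoding="utf-8", newline="") as f:
            fields = next(csv.reader(f))
    # Option columns starting at "A" mark a multiple-choice dataset.
    return "mcq" if "A" in fields else "qa"
```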
### Configuration File

In the original configuration file, simply add a new item to the `datasets` variable. Custom datasets can be mixed with regular datasets.

```python
datasets = [
    {"path": "xxx/test_mcq.csv", "data_type": "mcq", "infer_method": "ppl"},
    {"path": "xxx/test_qa.jsonl", "data_type": "qa", "infer_method": "gen"},
]
```
### Supplemental Information for Dataset `.meta.json`

OpenCompass tries to parse the input dataset file by default, so in most cases the `.meta.json` file is **not necessary**. However, if the dataset field names are not the defaults, or a custom prompt is required, it should be specified in the `.meta.json` file.

The file is placed in the same directory as the dataset, named as the dataset filename followed by `.meta.json`. An example file structure:

```tree
.
├── test_mcq.csv
├── test_mcq.csv.meta.json
├── test_qa.jsonl
└── test_qa.jsonl.meta.json
```
Possible fields in this file include:

- `abbr` (str): Abbreviation of the dataset, serving as its ID.
- `data_type` (str): Type of the dataset; options are `mcq` and `qa`.
- `infer_method` (str): Inference method; options are `ppl` and `gen`.
- `human_prompt` (str): User prompt template for generating prompts. Variables in the template are enclosed in `{}`, like `{question}`, `{opt1}`, etc. If `template` exists, this field is ignored.
- `bot_prompt` (str): Bot prompt template for generating prompts. Variables in the template are enclosed in `{}`, like `{answer}`. If `template` exists, this field is ignored.
- `template` (str or dict): Question template for generating prompts. Variables in the template are enclosed in `{}`, like `{question}`, `{opt1}`, etc. The relevant syntax, regarding `infer_cfg['prompt_template']['template']`, is described [here](../prompt/prompt_template.md).
- `input_columns` (list): List of input fields for reading data.
- `output_column` (str): Output field for reading data.
- `options` (list): List of options for reading data; valid only when `data_type` is `mcq`.
For example:

```json
{
    "human_prompt": "Question: 127 + 545 + 588 + 620 + 556 + 199 =\nA. 2632\nB. 2635\nC. 2645\nAnswer: Let's think step by step, 127 + 545 + 588 + 620 + 556 + 199 = 672 + 588 + 620 + 556 + 199 = 1260 + 620 + 556 + 199 = 1880 + 556 + 199 = 2436 + 199 = 2635. So the answer is B.\nQuestion: {question}\nA. {A}\nB. {B}\nC. {C}\nAnswer: ",
    "bot_prompt": "{answer}"
}
```

or

```json
{
    "template": "Question: {my_question}\nX. {X}\nY. {Y}\nZ. {Z}\nW. {W}\nAnswer:",
    "input_columns": ["my_question", "X", "Y", "Z", "W"],
    "output_column": "my_answer"
}
```
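Conceptually, the `.meta.json` fields override the parser's defaults. A sketch of such a merge (the `DEFAULTS` values and the `load_meta` helper are illustrative, not OpenCompass code):

```python
import json
import os

# Illustrative defaults mirroring the behaviour described above.
DEFAULTS = {
    "data_type": "qa",
    "infer_method": "gen",
    "input_columns": ["question"],
    "output_column": "answer",
}

def load_meta(dataset_path):
    """Merge <dataset>.meta.json (if present) over the defaults."""
    meta_path = dataset_path + ".meta.json"
    cfg = dict(DEFAULTS)
    if os.path.exists(meta_path):
        with open(meta_path, encoding="utf-8") as f:
            cfg.update(json.load(f))
    return cfg
```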