# Adding a Custom Task

Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.

## Step-by-Step Creation of a Task

> [!WARNING]
> To contribute your task to the Lighteval repository, you would first need
> to install the required dev dependencies by running `pip install -e .[dev]`
> and then run `pre-commit install` to install the pre-commit hooks.
### Step 1: Create the Task File

First, create a Python file or directory under the `src/lighteval/tasks/tasks` directory.
A directory is helpful if you need to split your task into multiple files; just make sure one of them is named `main.py`.
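For example, a single-file task and a directory-based task could be laid out like this (all names other than `main.py` are placeholders):

```
src/lighteval/tasks/tasks/
├── my_task.py           # single-file task
└── my_other_task/       # directory-based task
    ├── main.py          # entry point; must be named main.py
    └── utils.py         # any helper modules you need
```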
### Step 2: Define the Prompt Function

You need to define a prompt function that will convert a line from your
dataset to a document to be used for evaluation.

```python
from lighteval.tasks.requests import Doc


# Define as many as you need for your different tasks
def prompt_fn(line: dict, task_name: str):
    """Defines how to go from a dataset line to a doc object.

    Follow examples in src/lighteval/tasks/default_prompts.py, or get more info
    about what this function should do in the README.
    """
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )
```
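As a rough illustration, the prompt function above expects each dataset line to look something like the following (the field names are whatever your dataset uses; the values here are made up):

```python
# Hypothetical dataset line matching the prompt function above
line = {
    "question": "What is the capital of France?",
    "choices": ["Paris", "London", "Rome"],
    "gold": 0,  # index of the correct choice
}

doc = prompt_fn(line, task_name="myothertask")
```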
### Step 3: Choose or Create Metrics

You can either use an existing metric (defined in `lighteval.metrics.metrics.Metrics`) or [create a custom one](adding-a-new-metric).

#### Using Existing Metrics
```python
from lighteval.metrics.metrics import Metrics

# Use an existing metric
metric = Metrics.ACCURACY
```
#### Creating Custom Metrics

```python
import numpy as np

from lighteval.metrics.utils.metric_utils import SampleLevelMetric

custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric_name",
    higher_is_better=True,
    category="accuracy",
    sample_level_fn=lambda x: x,  # How to compute score for one sample
    corpus_level_fn=np.mean,  # How to aggregate the sample metrics
)
```
### Step 4: Define Your Task

You can define a task with or without subsets using [LightevalTaskConfig](/docs/lighteval/pr_1221/en/package_reference/tasks#lighteval.tasks.lighteval_task.LightevalTaskConfig).

#### Simple Task (No Subsets)

```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig

# This is how you create a simple task (like HellaSwag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,  # Must be defined in the file or imported
    hf_repo="your_dataset_repo_on_hf",
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select="random_sampling_from_train",
    metrics=[metric],  # Select your metric in Metrics
    generation_size=256,
    stop_sequence=["\n", "Question:"],
)
```
#### Task with Multiple Subsets

If you want to create a task with multiple subsets, add them to the
`SAMPLE_SUBSETS` list and create a task for each subset.

```python
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]  # List of all the subsets to use for this eval


class CustomSubsetTask(LightevalTaskConfig):
    def __init__(
        self,
        name,
        hf_subset,
    ):
        super().__init__(
            name=name,
            hf_subset=hf_subset,
            prompt_function=prompt_fn,  # Must be defined in the file or imported
            hf_repo="your_dataset_name",
            metrics=[custom_metric],  # Select your metric in Metrics or use your custom_metric
            hf_avail_splits=["train", "test"],
            evaluation_splits=["test"],
            few_shots_split="train",
            few_shots_select="random_sampling_from_train",
            generation_size=256,
            stop_sequence=["\n", "Question:"],
        )


SUBSET_TASKS = [CustomSubsetTask(name=f"task:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
```
### Step 5: Add Tasks to the Table

Then you need to add your task to the `TASKS_TABLE` list.

```python
# STORE YOUR EVALS

# Tasks with subsets:
TASKS_TABLE = SUBSET_TASKS

# Tasks without subsets:
# TASKS_TABLE = [task]
```
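Since `TASKS_TABLE` is a plain Python list, nothing stops you from registering tasks with and without subsets from the same file, for example:

```python
# Register both the subset tasks and the standalone task
TASKS_TABLE = SUBSET_TASKS + [task]
```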
### Step 6: Create a Requirements File

If your task has extra dependencies, you need to create a `requirements.txt` file
listing only the required packages so that anyone can run your task.
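As a sketch, if your prompt function depended on an extra package such as `langdetect` (a purely hypothetical example), the file would just list it:

```
# requirements.txt: list only what your task actually needs
langdetect
```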
## Running Your Custom Task

Once your file is created, you can run the evaluation with the following command:

```bash
lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    {task} \
    --custom-tasks {path_to_your_custom_task_file}
```
### Example Usage

```bash
# Run a custom task with 3-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "myothertask|3" \
    --custom-tasks community_tasks/my_custom_task.py
```