# Adding a Custom Task
Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.
## Task Categories
Before creating a custom task, consider which category it belongs to:
### Core Evaluations
Core evaluations require only standard logic in their metrics and processing,
are already widely used in the community, and are included in our test suite to guard against regressions over time.
### Extended Evaluations
Extended evaluations require custom logic in their metrics (complex
normalization, an LLM used as a judge, etc.) and were added to make users'
lives easier. They are already widely used in the community.
### Community Evaluations
Community evaluations are submissions by the community of new tasks.
A popular community evaluation can move to become an extended or core evaluation over time.
> [!TIP]
> You can find examples of custom tasks in the [community_tasks](https://github.com/huggingface/lighteval/tree/main/community_tasks) directory.
## Step-by-Step Creation of a Custom Task
> [!WARNING]
> To contribute your custom task to the Lighteval repository, first install the
> required dev dependencies by running `pip install -e .[dev]`, then run
> `pre-commit install` to set up the pre-commit hooks.
### Step 1: Create the Task File
First, create a Python file under the `community_tasks` directory.
### Step 2: Define the Prompt Function
You need to define a prompt function that will convert a line from your
dataset to a document to be used for evaluation.
```python
from lighteval.tasks.requests import Doc
# Define as many as you need for your different tasks
def prompt_fn(line: dict, task_name: str):
"""Defines how to go from a dataset line to a doc object.
Follow examples in src/lighteval/tasks/default_prompts.py, or get more info
about what this function should do in the README.
"""
return Doc(
task_name=task_name,
query=line["question"],
choices=[f" {c}" for c in line["choices"]],
gold_index=line["gold"],
)
```
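To see what such a prompt function produces, here is a self-contained sketch using a minimal stand-in for lighteval's `Doc` class (the real class lives in `lighteval.tasks.requests` and has more fields); the dataset row is made up for illustration:

```python
from dataclasses import dataclass


# Minimal stand-in for lighteval's Doc, for illustration only
@dataclass
class Doc:
    task_name: str
    query: str
    choices: list
    gold_index: int


def prompt_fn(line: dict, task_name: str):
    """Maps one dataset row to a Doc, mirroring the function above."""
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )


# A hypothetical dataset row
line = {"question": "What is 2 + 2?", "choices": ["3", "4", "5"], "gold": 1}
doc = prompt_fn(line, task_name="community|mytask")
print(doc.query)       # What is 2 + 2?
print(doc.choices)     # [' 3', ' 4', ' 5']
print(doc.gold_index)  # 1
```

Note the leading space added to each choice: for loglikelihood-based evaluation the choice text is appended directly after the query, so the space keeps tokenization natural.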
### Step 3: Choose or Create Metrics
You can either use an existing metric (defined in `lighteval.metrics.metrics.Metrics`) or [create a custom one](adding-a-new-metric).
#### Using Existing Metrics
```python
from lighteval.metrics.metrics import Metrics
# Use an existing metric
metric = Metrics.ACCURACY
```
#### Creating Custom Metrics
```python
import numpy as np

from lighteval.metrics.utils.metric_utils import SampleLevelMetric

custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric_name",
    higher_is_better=True,
    category="accuracy",
    sample_level_fn=lambda x: x,  # How to compute the score for one sample
    corpus_level_fn=np.mean,  # How to aggregate the sample-level scores
)
```
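The `sample_level_fn` / `corpus_level_fn` split follows a general pattern: score each sample independently, then aggregate the per-sample scores over the corpus. Here is a plain-Python sketch of that pattern using a hypothetical exact-match scorer (this is not lighteval's own API, just an illustration of the two-level computation):

```python
from statistics import mean


def exact_match(prediction: str, gold: str) -> float:
    """Hypothetical sample-level scorer: 1.0 if the strings match exactly."""
    return 1.0 if prediction.strip() == gold.strip() else 0.0


# Score each sample independently (the sample level)...
predictions = ["4", "Paris", "blue"]
golds = ["4", "paris", "blue"]
sample_scores = [exact_match(p, g) for p, g in zip(predictions, golds)]

# ...then aggregate over the corpus (here: the mean, as np.mean would do)
corpus_score = mean(sample_scores)  # 2 of 3 match exactly ("Paris" != "paris")
```

Because aggregation is just a function over the list of sample scores, swapping the mean for, say, a median or a pass-rate threshold only changes `corpus_level_fn`.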
### Step 4: Define Your Task
You can define a task with or without subsets using [LightevalTaskConfig](/docs/lighteval/pr_980/en/package_reference/tasks#lighteval.tasks.lighteval_task.LightevalTaskConfig).
#### Simple Task (No Subsets)
```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig
# This is how you create a simple task (like HellaSwag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
name="myothertask",
prompt_function=prompt_fn, # Must be defined in the file or imported
suite=["community"],
hf_repo="your_dataset_repo_on_hf",
hf_subset="default",
hf_avail_splits=["train", "test"],
evaluation_splits=["test"],
few_shots_split="train",
few_shots_select="random_sampling_from_train",
metrics=[metric], # Select your metric in Metrics
generation_size=256,
stop_sequence=["\n", "Question:"],
)
```
#### Task with Multiple Subsets
If you want to create a task with multiple subsets, add them to the
`SAMPLE_SUBSETS` list and create a task for each subset.
```python
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]  # All the subsets to use for this eval


class CustomSubsetTask(LightevalTaskConfig):
    def __init__(
        self,
        name,
        hf_subset,
    ):
        super().__init__(
            name=name,
            hf_subset=hf_subset,
            prompt_function=prompt_fn,  # Must be defined in the file or imported
            hf_repo="your_dataset_name",
            metrics=[custom_metric],  # Select your metric in Metrics or use your custom_metric
            hf_avail_splits=["train", "test"],
            evaluation_splits=["test"],
            few_shots_split="train",
            few_shots_select="random_sampling_from_train",
            suite=["community"],
            generation_size=256,
            stop_sequence=["\n", "Question:"],
        )


SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
```
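The comprehension above creates one task per subset, and the resulting names are what you reference on the command line (e.g. `community|mytask:subset1|0`). A quick sketch of the name expansion:

```python
# How the per-subset task names are derived from the subset list
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]
task_names = [f"mytask:{subset}" for subset in SAMPLE_SUBSETS]
print(task_names)  # ['mytask:subset1', 'mytask:subset2', 'mytask:subset3']
```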
### Step 5: Add Tasks to the Table
Then you need to add your task to the `TASKS_TABLE` list.
Lighteval reads this list from your file to discover the available tasks.
```python
# STORE YOUR EVALS
# Tasks with subsets:
TASKS_TABLE = SUBSET_TASKS
# Tasks without subsets:
# TASKS_TABLE = [task]
```
### Step 6: Create a Requirements File
If your task needs extra dependencies, create a `requirements.txt` file listing
only the packages required, so that anyone can run your task.
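For example, a `requirements.txt` might look like this (the package and pin below are purely illustrative):

```text
# Extra dependencies needed by this task, beyond lighteval itself
langdetect==1.0.9
```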
## Running Your Custom Task
Once your file is created, you can run the evaluation with the following command:
```bash
lighteval accelerate \
"model_name=HuggingFaceH4/zephyr-7b-beta" \
"community|{custom_task}|{fewshots}" \
--custom-tasks {path_to_your_custom_task_file}
```
### Example Usage
```bash
# Run a custom task with zero-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|myothertask|0" \
    --custom-tasks community_tasks/my_custom_task.py

# Run a custom task with few-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|myothertask|3" \
    --custom-tasks community_tasks/my_custom_task.py
```