# Adding a Custom Task
Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.
## Step-by-Step Creation of a Task
> [!WARNING]
> To contribute your task to the Lighteval repository, you first need
> to install the required dev dependencies by running `pip install -e .[dev]`
> and then run `pre-commit install` to install the pre-commit hooks.
### Step 1: Create the Task File
First, create a Python file or directory under the `src/lighteval/tasks/tasks` directory.
A directory is helpful if you need to split your code across multiple files; just make sure one of them is named `main.py`.
### Step 2: Define the Prompt Function
You need to define a prompt function that will convert a line from your
dataset to a document to be used for evaluation.
```python
from lighteval.tasks.requests import Doc
# Define as many as you need for your different tasks
def prompt_fn(line: dict, task_name: str):
    """Defines how to go from a dataset line to a doc object.

    Follow examples in src/lighteval/tasks/default_prompts.py, or get more info
    about what this function should do in the README.
    """
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )
```
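To see what the prompt function produces, you can run it on a sample dataset line. The sketch below is self-contained: it uses a simplified stand-in dataclass mirroring only the `Doc` fields used above (in a real task file, import the actual `Doc` from `lighteval.tasks.requests`), and the example line is hypothetical.

```python
from dataclasses import dataclass

# Simplified stand-in for lighteval.tasks.requests.Doc, reduced to the
# fields used in this guide (illustrative only).
@dataclass
class Doc:
    task_name: str
    query: str
    choices: list
    gold_index: int

def prompt_fn(line: dict, task_name: str):
    # Same mapping as above: dataset line -> evaluation document
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )

# A hypothetical dataset line in the expected format
line = {"question": "What is 2 + 2?", "choices": ["3", "4", "5"], "gold": 1}
doc = prompt_fn(line, "mytask")
print(doc.query)       # What is 2 + 2?
print(doc.choices)     # [' 3', ' 4', ' 5']
print(doc.gold_index)  # 1
```

Note that each choice is prefixed with a space so that, when concatenated to the query, the model sees a naturally tokenized continuation.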
### Step 3: Choose or Create Metrics
You can either use an existing metric (defined in `lighteval.metrics.metrics.Metrics`) or [create a custom one](adding-a-new-metric).
#### Using Existing Metrics
```python
from lighteval.metrics import Metrics
# Use an existing metric
metric = Metrics.ACCURACY
```
#### Creating Custom Metrics
```python
from lighteval.metrics.utils.metric_utils import SampleLevelMetric
import numpy as np
custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric_name",
    higher_is_better=True,
    category="accuracy",
    sample_level_fn=lambda x: x,  # How to compute score for one sample
    corpus_level_fn=np.mean,  # How to aggregate the sample metrics
)
```
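The two functions divide the work: the sample-level function scores one sample at a time, and the corpus-level function aggregates those scores into a single number. The following plain-Python sketch illustrates that split with a hypothetical exact-match scorer (the function name and data are illustrative, not part of the lighteval API).

```python
from statistics import fmean

def exact_match(prediction: str, gold: str) -> float:
    # Score a single sample: 1.0 on an exact string match, else 0.0
    # (plays the role of sample_level_fn)
    return float(prediction.strip() == gold.strip())

# Hypothetical model outputs and references
predictions = ["Paris", "4", "blue"]
golds = ["Paris", "5", "blue"]

# One score per sample, then a corpus-level aggregation
# (fmean plays the role of corpus_level_fn=np.mean)
sample_scores = [exact_match(p, g) for p, g in zip(predictions, golds)]
corpus_score = fmean(sample_scores)
print(sample_scores)  # [1.0, 0.0, 1.0]
print(corpus_score)
```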
### Step 4: Define Your Task
You can define a task with or without subsets using [LightevalTaskConfig](/docs/lighteval/pr_1221/en/package_reference/tasks#lighteval.tasks.lighteval_task.LightevalTaskConfig).
#### Simple Task (No Subsets)
```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig
# This is how you create a simple task (like HellaSwag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,  # Must be defined in the file or imported
    hf_repo="your_dataset_repo_on_hf",
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select="random_sampling_from_train",
    metrics=[metric],  # Select your metric in Metrics
    generation_size=256,
    stop_sequence=["\n", "Question:"],
)
```
#### Task with Multiple Subsets
If you want to create a task with multiple subsets, add them to the
`SAMPLE_SUBSETS` list and create a task for each subset.
```python
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"] # List of all the subsets to use for this eval
class CustomSubsetTask(LightevalTaskConfig):
    def __init__(
        self,
        name,
        hf_subset,
    ):
        super().__init__(
            name=name,
            hf_subset=hf_subset,
            prompt_function=prompt_fn,  # Must be defined in the file or imported
            hf_repo="your_dataset_name",
            metrics=[custom_metric],  # Select your metric in Metrics or use your custom_metric
            hf_avail_splits=["train", "test"],
            evaluation_splits=["test"],
            few_shots_split="train",
            few_shots_select="random_sampling_from_train",
            generation_size=256,
            stop_sequence=["\n", "Question:"],
        )


SUBSET_TASKS = [CustomSubsetTask(name=f"task:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
```
### Step 5: Add Tasks to the Table
Then you need to add your task to the `TASKS_TABLE` list.
```python
# STORE YOUR EVALS
# Tasks with subsets:
TASKS_TABLE = SUBSET_TASKS
# Tasks without subsets:
# TASKS_TABLE = [task]
```
### Step 6: Create a Requirements File
If your task has extra dependencies, create a `requirements.txt` file listing
only the packages it needs, so that anyone can run your task.
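For example, a hypothetical task that needs a language-detection library might ship a file like this (the pinned package is illustrative only; list whatever your task actually imports):

```text
# requirements.txt for the custom task (hypothetical example)
langdetect==1.0.9
```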
## Running Your Custom Task
Once your file is created, you can run the evaluation with the following command:
```bash
lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    {task} \
    --custom-tasks {path_to_your_custom_task_file}
```
### Example Usage
```bash
# Run a custom task with 3 shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "myothertask|3" \
    --custom-tasks community_tasks/my_custom_task.py
```
