# Adding a Custom Task
Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.
## Task Categories
Before creating a custom task, consider which category it belongs to:
### Core Evaluations
Core evaluations require only standard logic in their metrics and processing,
are already widely used in the community, and are included in our test suite to guard against regressions over time.
### Extended Evaluations
Extended evaluations require custom logic in their metrics (complex
normalization, an LLM used as a judge, etc.) and were added to make users'
lives easier. They are already widely used in the community.
### Community Evaluations
Community evaluations are submissions by the community of new tasks.
A popular community evaluation can move to become an extended or core evaluation over time.
> [!TIP]
> You can find examples of custom tasks in the [community_tasks](https://github.com/huggingface/lighteval/tree/main/community_tasks) directory.
## Step-by-Step Creation of a Custom Task
> [!WARNING]
> To contribute your custom task to the Lighteval repository, first install the
> required dev dependencies by running `pip install -e .[dev]`, then run
> `pre-commit install` to set up the pre-commit hooks.
### Step 1: Create the Task File
First, create a Python file under the `community_tasks` directory.
### Step 2: Define the Prompt Function
You need to define a prompt function that will convert a line from your
dataset to a document to be used for evaluation.
```python
from lighteval.tasks.requests import Doc
# Define as many as you need for your different tasks
def prompt_fn(line: dict, task_name: str):
"""Defines how to go from a dataset line to a doc object.
Follow examples in src/lighteval/tasks/default_prompts.py, or get more info
about what this function should do in the README.
"""
return Doc(
task_name=task_name,
query=line["question"],
choices=[f" {c}" for c in line["choices"]],
gold_index=line["gold"],
)
```
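To see what such a prompt function produces, here is a self-contained sketch using a minimal stand-in for lighteval's `Doc` class (the real class lives in `lighteval.tasks.requests` and has more fields); the dataset row is made up for illustration:

```python
from dataclasses import dataclass


# Minimal stand-in for lighteval's Doc, for illustration only
@dataclass
class Doc:
    task_name: str
    query: str
    choices: list
    gold_index: int


def prompt_fn(line: dict, task_name: str):
    """Maps one dataset row to a Doc, mirroring the function above."""
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )


# A hypothetical dataset row
line = {"question": "What is 2 + 2?", "choices": ["3", "4", "5"], "gold": 1}
doc = prompt_fn(line, task_name="community|mytask")
print(doc.query)       # What is 2 + 2?
print(doc.choices)     # [' 3', ' 4', ' 5']
print(doc.gold_index)  # 1
```

Note the leading space added to each choice: for loglikelihood-based evaluation the choice text is appended directly after the query, so the space keeps tokenization natural.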
### Step 3: Choose or Create Metrics
You can either use an existing metric (defined in `lighteval.metrics.metrics.Metrics`) or [create a custom one](adding-a-new-metric).
#### Using Existing Metrics
```python
from lighteval.metrics.metrics import Metrics
# Use an existing metric
metric = Metrics.ACCURACY
```
#### Creating Custom Metrics
```python
import numpy as np

from lighteval.metrics.utils.metric_utils import SampleLevelMetric

custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric_name",
    higher_is_better=True,
    category="accuracy",
    sample_level_fn=lambda x: x,  # How to compute the score for one sample
    corpus_level_fn=np.mean,  # How to aggregate the sample-level scores
)
```
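The `sample_level_fn` / `corpus_level_fn` split follows a general pattern: score each sample independently, then aggregate the per-sample scores over the corpus. Here is a plain-Python sketch of that pattern using a hypothetical exact-match scorer (this is not lighteval's own API, just an illustration of the two-level computation):

```python
from statistics import mean


def exact_match(prediction: str, gold: str) -> float:
    """Hypothetical sample-level scorer: 1.0 if the strings match exactly."""
    return 1.0 if prediction.strip() == gold.strip() else 0.0


# Score each sample independently (the sample level)...
predictions = ["4", "Paris", "blue"]
golds = ["4", "paris", "blue"]
sample_scores = [exact_match(p, g) for p, g in zip(predictions, golds)]

# ...then aggregate over the corpus (here: the mean, as np.mean would do)
corpus_score = mean(sample_scores)  # 2 of 3 match exactly ("Paris" != "paris")
```

Because aggregation is just a function over the list of sample scores, swapping the mean for, say, a median or a pass-rate threshold only changes `corpus_level_fn`.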
### Step 4: Define Your Task
You can define a task with or without subsets using [LightevalTaskConfig](/docs/lighteval/pr_980/en/package_reference/tasks#lighteval.tasks.lighteval_task.LightevalTaskConfig).
#### Simple Task (No Subsets)
```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig
# This is how you create a simple task (like HellaSwag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
name="myothertask",
prompt_function=prompt_fn, # Must be defined in the file or imported
suite=["community"],
hf_repo="your_dataset_repo_on_hf",
hf_subset="default",
hf_avail_splits=["train", "test"],
evaluation_splits=["test"],
few_shots_split="train",
few_shots_select="random_sampling_from_train",
metrics=[metric], # Select your metric in Metrics
generation_size=256,
stop_sequence=["\n", "Question:"],
)
```
#### Task with Multiple Subsets
If you want to create a task with multiple subsets, add them to the
`SAMPLE_SUBSETS` list and create a task for each subset.
```python
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]  # All the subsets to use for this eval


class CustomSubsetTask(LightevalTaskConfig):
    def __init__(
        self,
        name,
        hf_subset,
    ):
        super().__init__(
            name=name,
            hf_subset=hf_subset,
            prompt_function=prompt_fn,  # Must be defined in the file or imported
            hf_repo="your_dataset_name",
            metrics=[custom_metric],  # Select your metric in Metrics or use your custom_metric
            hf_avail_splits=["train", "test"],
            evaluation_splits=["test"],
            few_shots_split="train",
            few_shots_select="random_sampling_from_train",
            suite=["community"],
            generation_size=256,
            stop_sequence=["\n", "Question:"],
        )


SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
```
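The comprehension above creates one task per subset, and the resulting names are what you reference on the command line (e.g. `community|mytask:subset1|0`). A quick sketch of the name expansion:

```python
# How the per-subset task names are derived from the subset list
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]
task_names = [f"mytask:{subset}" for subset in SAMPLE_SUBSETS]
print(task_names)  # ['mytask:subset1', 'mytask:subset2', 'mytask:subset3']
```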
### Step 5: Add Tasks to the Table
Then you need to add your task to the `TASKS_TABLE` list.
Lighteval reads this list from your file to discover the available tasks.
```python
# STORE YOUR EVALS
# Tasks with subsets:
TASKS_TABLE = SUBSET_TASKS
# Tasks without subsets:
# TASKS_TABLE = [task]
```
### Step 6: Create a Requirements File
If your task needs extra dependencies, create a `requirements.txt` file listing
only the packages required, so that anyone can run your task.
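For example, a `requirements.txt` might look like this (the package and pin below are purely illustrative):

```text
# Extra dependencies needed by this task, beyond lighteval itself
langdetect==1.0.9
```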
## Running Your Custom Task
Once your file is created, you can run the evaluation with the following command:
```bash
lighteval accelerate \
"model_name=HuggingFaceH4/zephyr-7b-beta" \
"community|{custom_task}|{fewshots}" \
--custom-tasks {path_to_your_custom_task_file}
```
### Example Usage
```bash
# Run a custom task with zero-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|myothertask|0" \
    --custom-tasks community_tasks/my_custom_task.py

# Run a custom task with few-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|myothertask|3" \
    --custom-tasks community_tasks/my_custom_task.py
```