# Using the Python API
Lighteval can be used from a custom Python script. To evaluate a model, you will need to set up an
[EvaluationTracker](/docs/lighteval/pr_1221/en/package_reference/logging#lighteval.logging.evaluation_tracker.EvaluationTracker), [PipelineParameters](/docs/lighteval/pr_1221/en/package_reference/pipeline#lighteval.pipeline.PipelineParameters),
a [`model`](package_reference/models) or a [`model_config`](package_reference/model_config),
and a [Pipeline](/docs/lighteval/pr_1221/en/package_reference/pipeline#lighteval.pipeline.Pipeline).
After that, simply run the pipeline and save the results.
```python
import lighteval
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.vllm.vllm_model import VLLMModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters
from lighteval.utils.imports import is_package_available

if is_package_available("accelerate"):
    from datetime import timedelta

    from accelerate import Accelerator, InitProcessGroupKwargs

    accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=3000))])
else:
    accelerator = None


def main():
    evaluation_tracker = EvaluationTracker(
        output_dir="./results",
        save_details=True,
        push_to_hub=True,
        hub_results_org="your_username",  # Replace with your actual username
    )

    pipeline_params = PipelineParameters(
        launcher_type=ParallelismManager.ACCELERATE,
        custom_tasks_directory=None,  # Set to path if using custom tasks
        # Remove the parameter below once your configuration is tested
        max_samples=10,
    )

    model_config = VLLMModelConfig(
        model_name="HuggingFaceH4/zephyr-7b-beta",
        dtype="float16",
    )

    task = "gsm8k|5"

    pipeline = Pipeline(
        tasks=task,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        model_config=model_config,
    )

    pipeline.evaluate()
    pipeline.save_and_push_results()
    pipeline.show_results()


if __name__ == "__main__":
    main()
```
## Key Components
### EvaluationTracker
The `EvaluationTracker` handles logging and saving evaluation results. It can save results locally and optionally push them to the Hugging Face Hub.
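For instance, a tracker that only writes results locally can drop the Hub-related options. This is a minimal sketch using only the parameters shown in the example above:

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker

# Local-only tracking: results and per-sample details are written under ./results;
# nothing is pushed to the Hugging Face Hub.
evaluation_tracker = EvaluationTracker(
    output_dir="./results",
    save_details=True,
    push_to_hub=False,
)
```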
### PipelineParameters
`PipelineParameters` configures how the evaluation pipeline runs, including parallelism settings and task configuration.
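As a minimal sketch, restricted to the parameters used in the example above, a configuration for a quick local test run could look like:

```python
from lighteval.pipeline import ParallelismManager, PipelineParameters

# Launch with Accelerate and cap each task at 10 samples while testing the setup.
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.ACCELERATE,
    max_samples=10,  # remove once the configuration is validated
)
```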
### Model Configuration
Model configurations define the model to be evaluated, including the model name, data type, and other model-specific parameters. Different backends (vLLM, Transformers, etc.) have their own configuration classes.
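For example, to evaluate the same model with the Transformers backend instead of vLLM, you would swap in that backend's configuration class. The import path and parameters below are an assumption based on recent lighteval releases; check the [model configuration reference](package_reference/model_config) for your installed version:

```python
# Assumption: the Transformers backend config lives at this path in recent
# lighteval versions; the exact module layout may differ in yours.
from lighteval.models.transformers.transformers_model import TransformersModelConfig

model_config = TransformersModelConfig(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    dtype="float16",
)
```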
### Pipeline
The `Pipeline` orchestrates the entire evaluation process, taking the tasks, model configuration, and parameters to run the evaluation.
## Running Multiple Tasks
You can evaluate on multiple tasks by providing a comma-separated list or a file path:
```python
# Multiple tasks as a comma-separated string
tasks = "aime24,aime25"

# Or load from a file
tasks = "./path/to/tasks.txt"

pipeline = Pipeline(
    tasks=tasks,
    # ... other parameters
)
```
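If you use a file, the format is assumed here to follow the CLI convention of one task specification per line (confirm against your lighteval version):

```
aime24
aime25
```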
## Custom Tasks
To use custom tasks, set the `custom_tasks_directory` parameter to the path containing your custom task definitions:
```python
pipeline_params = PipelineParameters(
    custom_tasks_directory="./path/to/custom/tasks",
    # ... other parameters
)
```
For more information on creating custom tasks, see the [Adding a Custom Task](adding-a-custom-task) guide.
