# Using the Python API
Lighteval can be used from a custom Python script. To evaluate a model, you will need to set up an
[EvaluationTracker](/docs/lighteval/pr_1221/en/package_reference/logging#lighteval.logging.evaluation_tracker.EvaluationTracker), [PipelineParameters](/docs/lighteval/pr_1221/en/package_reference/pipeline#lighteval.pipeline.PipelineParameters),
a [`model`](package_reference/models) or a [`model_config`](package_reference/model_config),
and a [Pipeline](/docs/lighteval/pr_1221/en/package_reference/pipeline#lighteval.pipeline.Pipeline).
After that, simply run the pipeline and save the results.
```python
import lighteval
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.vllm.vllm_model import VLLMModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters
from lighteval.utils.imports import is_package_available

if is_package_available("accelerate"):
    from datetime import timedelta

    from accelerate import Accelerator, InitProcessGroupKwargs

    accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=3000))])
else:
    accelerator = None


def main():
    evaluation_tracker = EvaluationTracker(
        output_dir="./results",
        save_details=True,
        push_to_hub=True,
        hub_results_org="your_username",  # Replace with your actual username
    )

    pipeline_params = PipelineParameters(
        launcher_type=ParallelismManager.ACCELERATE,
        custom_tasks_directory=None,  # Set to path if using custom tasks
        # Remove the parameter below once your configuration is tested
        max_samples=10,
    )

    model_config = VLLMModelConfig(
        model_name="HuggingFaceH4/zephyr-7b-beta",
        dtype="float16",
    )

    task = "gsm8k|5"

    pipeline = Pipeline(
        tasks=task,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        model_config=model_config,
    )

    pipeline.evaluate()
    pipeline.save_and_push_results()
    pipeline.show_results()


if __name__ == "__main__":
    main()
```
## Key Components
### EvaluationTracker
The `EvaluationTracker` handles logging and saving evaluation results. It can save results locally and optionally push them to the Hugging Face Hub.
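For instance, a tracker that only writes results locally can drop the Hub-related options. This is a minimal sketch using only the parameters shown in the example above:

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker

# Local-only tracking: results and per-sample details are written under ./results;
# nothing is pushed to the Hugging Face Hub.
evaluation_tracker = EvaluationTracker(
    output_dir="./results",
    save_details=True,
    push_to_hub=False,
)
```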
### PipelineParameters
`PipelineParameters` configures how the evaluation pipeline runs, including parallelism settings and task configuration.
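As a minimal sketch, restricted to the parameters used in the example above, a configuration for a quick local test run could look like:

```python
from lighteval.pipeline import ParallelismManager, PipelineParameters

# Launch with Accelerate and cap each task at 10 samples while testing the setup.
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.ACCELERATE,
    max_samples=10,  # remove once the configuration is validated
)
```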
### Model Configuration
Model configurations define the model to be evaluated, including the model name, data type, and other model-specific parameters. Different backends (vLLM, Transformers, etc.) have their own configuration classes.
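For example, to evaluate the same model with the Transformers backend instead of vLLM, you would swap in that backend's configuration class. The import path and parameters below are an assumption based on recent lighteval releases; check the [model configuration reference](package_reference/model_config) for your installed version:

```python
# Assumption: the Transformers backend config lives at this path in recent
# lighteval versions; the exact module layout may differ in yours.
from lighteval.models.transformers.transformers_model import TransformersModelConfig

model_config = TransformersModelConfig(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    dtype="float16",
)
```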
### Pipeline
The `Pipeline` orchestrates the entire evaluation process, taking the tasks, model configuration, and parameters to run the evaluation.
## Running Multiple Tasks
You can evaluate on multiple tasks by providing a comma-separated list or a file path:
```python
# Multiple tasks as a comma-separated string
tasks = "aime24,aime25"

# Or load from a file
tasks = "./path/to/tasks.txt"

pipeline = Pipeline(
    tasks=tasks,
    # ... other parameters
)
```
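If you use a file, the format is assumed here to follow the CLI convention of one task specification per line (confirm against your lighteval version):

```
aime24
aime25
```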
## Custom Tasks
To use custom tasks, set the `custom_tasks_directory` parameter to the path containing your custom task definitions:
```python
pipeline_params = PipelineParameters(
    custom_tasks_directory="./path/to/custom/tasks",
    # ... other parameters
)
```
For more information on creating custom tasks, see the [Adding a Custom Task](adding-a-custom-task) guide.
