# Using the Python API

Lighteval can be used from a custom Python script. To evaluate a model, you will need to set up an
[EvaluationTracker](/docs/lighteval/pr_1221/en/package_reference/logging#lighteval.logging.evaluation_tracker.EvaluationTracker), [PipelineParameters](/docs/lighteval/pr_1221/en/package_reference/pipeline#lighteval.pipeline.PipelineParameters),
a [`model`](package_reference/models) or a [`model_config`](package_reference/model_config),
and a [Pipeline](/docs/lighteval/pr_1221/en/package_reference/pipeline#lighteval.pipeline.Pipeline).
After that, simply run the pipeline and save the results.
```python
import lighteval
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.vllm.vllm_model import VLLMModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters
from lighteval.utils.imports import is_package_available

if is_package_available("accelerate"):
    from datetime import timedelta
    from accelerate import Accelerator, InitProcessGroupKwargs

    accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=3000))])
else:
    accelerator = None


def main():
    evaluation_tracker = EvaluationTracker(
        output_dir="./results",
        save_details=True,
        push_to_hub=True,
        hub_results_org="your_username",  # Replace with your actual username
    )

    pipeline_params = PipelineParameters(
        launcher_type=ParallelismManager.ACCELERATE,
        custom_tasks_directory=None,  # Set to a path if using custom tasks
        # Remove the parameter below once your configuration is tested
        max_samples=10,
    )

    model_config = VLLMModelConfig(
        model_name="HuggingFaceH4/zephyr-7b-beta",
        dtype="float16",
    )

    task = "gsm8k|5"

    pipeline = Pipeline(
        tasks=task,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        model_config=model_config,
    )

    pipeline.evaluate()
    pipeline.save_and_push_results()
    pipeline.show_results()


if __name__ == "__main__":
    main()
```
## Key Components

### EvaluationTracker

The `EvaluationTracker` handles logging and saving evaluation results. It can save results locally and optionally push them to the Hugging Face Hub.
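For example, if you only want results written to disk, you can configure the tracker without pushing to the Hub. This is a minimal sketch that reuses only the parameters shown in the script above:

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker

# Write aggregate results and per-sample details under ./results,
# without pushing anything to the Hugging Face Hub.
evaluation_tracker = EvaluationTracker(
    output_dir="./results",
    save_details=True,
    push_to_hub=False,
)
```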
### PipelineParameters

`PipelineParameters` configures how the evaluation pipeline runs, including parallelism settings and task configuration.
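For reference, here are the parameters used in the script above with each field annotated (a sketch; see the package reference for the full list of options):

```python
from lighteval.pipeline import ParallelismManager, PipelineParameters

pipeline_params = PipelineParameters(
    # How the evaluation is launched/parallelised (here: Hugging Face Accelerate).
    launcher_type=ParallelismManager.ACCELERATE,
    # Path to custom task definitions, or None to use only built-in tasks.
    custom_tasks_directory=None,
    # Cap each task at a few samples while testing; remove for a full run.
    max_samples=10,
)
```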
### Model Configuration

Model configurations define the model to be evaluated, including the model name, data type, and other model-specific parameters. Different backends (VLLM, Transformers, etc.) have their own configuration classes.
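For instance, to run the same evaluation on the Transformers backend instead of vLLM, you would swap in its configuration class. A hedged sketch, assuming a `TransformersModelConfig` class that takes the same `model_name` and `dtype` fields (check the [model config](package_reference/model_config) reference for the exact import path and parameters):

```python
from lighteval.models.transformers.transformers_model import TransformersModelConfig

# Transformers backend instead of vLLM; the fields mirror the
# VLLMModelConfig used in the script above.
model_config = TransformersModelConfig(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    dtype="float16",
)
```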
### Pipeline

The `Pipeline` orchestrates the entire evaluation process: it combines the tasks, model configuration, evaluation tracker, and pipeline parameters, and runs the evaluation end to end.
## Running Multiple Tasks

You can evaluate on multiple tasks by providing a comma-separated list or a file path:

```python
# Multiple tasks as a comma-separated string
tasks = "aime24,aime25"

# Or load from a file
tasks = "./path/to/tasks.txt"

pipeline = Pipeline(
    tasks=tasks,
    # ... other parameters
)
```
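When loading from a file, each line is expected to hold one task specification in the same format as the task strings above. The file layout below is an assumption based on the examples in this guide, so adjust it if your version expects a different format:

```python
from pathlib import Path

# One task specification per line, in the same format as the strings above.
Path("tasks.txt").write_text("gsm8k|5\naime24\n")

pipeline = Pipeline(
    tasks="tasks.txt",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)
```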
## Custom Tasks

To use custom tasks, set the `custom_tasks_directory` parameter to the path containing your custom task definitions:

```python
pipeline_params = PipelineParameters(
    custom_tasks_directory="./path/to/custom/tasks",
    # ... other parameters
)
```
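As a rough sketch of what such a directory might contain, a custom task module typically defines a prompt function, a `LightevalTaskConfig`, and a module-level `TASKS_TABLE`. The field names and dataset repo below are illustrative placeholders, so follow the guide linked below for the authoritative structure:

```python
# ./path/to/custom/tasks/my_tasks.py
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc


def prompt_fn(line, task_name: str = None):
    # Turn one dataset row into a Doc (query, candidate answers, gold index).
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[line["answer"]],
        gold_index=0,
    )


my_task = LightevalTaskConfig(
    name="my_task",
    prompt_function=prompt_fn,
    suite=["community"],
    hf_repo="your_org/your_dataset",  # placeholder dataset repo
    hf_subset="default",
    metric=[],  # fill in with metrics from lighteval.metrics
)

# Lighteval discovers custom tasks through this module-level table.
TASKS_TABLE = [my_task]
```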
For more information on creating custom tasks, see the [Adding a Custom Task](adding-a-custom-task) guide.