# Evaluating Custom Models

Lighteval allows you to evaluate custom model implementations by creating a custom model class that inherits from `LightevalModel`.
This is useful when you want to evaluate models that aren't directly supported by the standard backends and providers (Transformers, VLLM, etc.), or
if you want to add your own pre/post-processing logic.
## Creating a Custom Model

### Step 1: Create Your Model Implementation

Create a Python file containing your custom model implementation. The model must inherit from `LightevalModel` and implement all required methods.

Here's a basic example:
```python
from lighteval.models.abstract_model import LightevalModel
from lighteval.models.model_output import ModelResponse
from lighteval.tasks.requests import Doc, SamplingMethod
from lighteval.utils.cache_management import SampleCache, cached


class MyCustomModel(LightevalModel):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your model here...

        # Enable caching (recommended)
        self._cache = SampleCache(config)

    @cached(SamplingMethod.GENERATIVE)
    def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
        # Implement generation logic
        pass

    @cached(SamplingMethod.LOGPROBS)
    def loglikelihood(self, docs: list[Doc]) -> list[ModelResponse]:
        # Implement loglikelihood computation
        pass

    @cached(SamplingMethod.PERPLEXITY)
    def loglikelihood_rolling(self, docs: list[Doc]) -> list[ModelResponse]:
        # Implement rolling loglikelihood computation
        pass
```
### Step 2: Model File Requirements

The custom model file should contain exactly one class that inherits from `LightevalModel`. This class is automatically detected and instantiated when the model is loaded.

> [!TIP]
> You can find a complete example of a custom model implementation in `examples/custom_models/google_translate_model.py`.
## Running the Evaluation

You can evaluate your custom model using either the command-line interface or the Python API.

### Using the Command Line
```bash
lighteval custom \
    "google-translate" \
    "examples/custom_models/google_translate_model.py" \
    "wmt20:fr-de" \
    --max-samples 10
```
The command takes three required arguments:

- **Model name**: Used for tracking in results/logs
- **Model implementation file path**: Path to your Python file containing the custom model
- **Tasks**: Tasks to evaluate on (same format as other backends)

### Using the Python API
```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.custom.custom_model import CustomModelConfig
from lighteval.pipeline import Pipeline, PipelineParameters, ParallelismManager

# Set up evaluation tracking
evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=True,
)

# Configure the pipeline
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
)

# Configure your custom model
model_config = CustomModelConfig(
    model_name="my-custom-model",
    model_definition_file_path="path/to/my_model.py",
)

# Create and run the pipeline
pipeline = Pipeline(
    tasks="truthfulqa:mc",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.save_and_push_results()
```
## Required Methods

Your custom model must implement these core methods:

### `greedy_until`

For generating text until a stop sequence or max tokens is reached. This is used for generative evaluations.

```python
def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
    """
    Generate text until a stop sequence is produced or max tokens are reached.

    Args:
        docs: list of documents containing prompts and generation parameters

    Returns:
        list of model responses with generated text
    """
    pass
```
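One detail every `greedy_until` implementation has to handle is cutting the raw generation at the first stop sequence. Here is a minimal, framework-independent sketch of that step; the function name is illustrative, not part of the Lighteval API:

```python
def truncate_at_stop(generated: str, stop_sequences: list[str]) -> str:
    """Cut a raw generation at the earliest occurring stop sequence, if any."""
    cut = len(generated)
    for stop in stop_sequences:
        idx = generated.find(stop)
        if idx != -1:
            # Keep only the text before the earliest stop sequence.
            cut = min(cut, idx)
    return generated[:cut]
```

In practice you would apply this to each decoded generation before wrapping it in a `ModelResponse`.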
### `loglikelihood`

For computing log probabilities of specific continuations. This is used for multiple-choice logprob evaluations.

```python
def loglikelihood(self, docs: list[Doc]) -> list[ModelResponse]:
    """
    Compute log probabilities of continuations.

    Args:
        docs: list of documents containing context and continuation pairs

    Returns:
        list of model responses with log probabilities
    """
    pass
```
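The core bookkeeping in a `loglikelihood` implementation is scoring only the continuation tokens, not the context, and checking whether greedy decoding would have reproduced the continuation. A hedged sketch of that logic (the names are illustrative; a real implementation gets per-token log probabilities and greedy tokens from your model's forward pass):

```python
def score_continuation(
    token_logprobs: list[float],
    greedy_tokens: list[int],
    actual_tokens: list[int],
    context_len: int,
) -> tuple[float, bool]:
    """Sum log probabilities over the continuation tokens (skipping the
    context prefix) and report whether the continuation matches what
    greedy decoding would have produced."""
    logprob = sum(token_logprobs[context_len:])
    is_greedy = greedy_tokens[context_len:] == actual_tokens[context_len:]
    return logprob, is_greedy
```

Splitting at `context_len` matters: including context tokens in the sum would make choices with longer shared prefixes incomparable.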
### `loglikelihood_rolling`

For computing rolling log probabilities of sequences. This is used for perplexity metrics.

```python
def loglikelihood_rolling(self, docs: list[Doc]) -> list[ModelResponse]:
    """
    Compute rolling log probabilities of sequences.

    Args:
        docs: list of documents containing text sequences

    Returns:
        list of model responses with rolling log probabilities
    """
    pass
```

See the `LightevalModel` base class documentation for detailed method signatures and requirements.
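Rolling log-likelihoods feed perplexity metrics, and keeping that relationship in mind helps when sanity-checking your implementation. A small illustrative helper (not part of the Lighteval API):

```python
import math


def perplexity_from_logprobs(token_logprobs: list[float]) -> float:
    """Perplexity is the exponential of the negative mean token log-likelihood."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

For example, a model that assigns probability 0.5 to every token has a perplexity of exactly 2, which is an easy fixed point to test against.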
## Enabling Caching (Recommended)

Lighteval includes a caching system that can significantly speed up evaluations by storing and reusing model predictions.

To enable caching in your custom model:

### Step 1: Import Caching Components

```python
from lighteval.utils.cache_management import SampleCache, cached
```

### Step 2: Initialize the Cache in the Constructor

```python
def __init__(self, config):
    super().__init__(config)
    # Your initialization code...
    self._cache = SampleCache(config)
```
### Step 3: Add Cache Decorators to Your Prediction Methods

```python
@cached(SamplingMethod.GENERATIVE)
def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
    # Your implementation...
```
For detailed information about the caching system, see the [Caching Documentation](caching).

## Troubleshooting

### Common Issues

1. **Import errors**: Ensure all required dependencies are installed
2. **Method signature errors**: Verify that your methods match the expected signatures
3. **Caching issues**: Check that cache decorators are applied correctly
4. **Performance issues**: Consider implementing batching and caching

### Debugging Tips

- Use the `--max-samples` flag to test with a small dataset
- Enable detailed logging to see what's happening
- Test individual methods in isolation
- Check the example implementations for reference

For more detailed information about custom model implementation, see the [Model Reference](package_reference/models).