# Evaluating Custom Models
Lighteval allows you to evaluate custom model implementations by creating a custom model class that inherits from `LightevalModel`.
This is useful when you want to evaluate models that aren't directly supported by the standard backends and providers (Transformers, vLLM, etc.), or
when you want to add your own pre/post-processing logic.
## Creating a Custom Model
### Step 1: Create Your Model Implementation
Create a Python file containing your custom model implementation. The model must inherit from `LightevalModel` and implement all required methods.
Here's a basic example:
```python
from lighteval.models.abstract_model import LightevalModel
from lighteval.models.model_output import ModelResponse
from lighteval.tasks.requests import Doc, SamplingMethod
from lighteval.utils.cache_management import SampleCache, cached


class MyCustomModel(LightevalModel):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your model here...

        # Enable caching (recommended)
        self._cache = SampleCache(config)

    @cached(SamplingMethod.GENERATIVE)
    def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
        # Implement generation logic
        pass

    @cached(SamplingMethod.LOGPROBS)
    def loglikelihood(self, docs: list[Doc]) -> list[ModelResponse]:
        # Implement loglikelihood computation
        pass

    @cached(SamplingMethod.PERPLEXITY)
    def loglikelihood_rolling(self, docs: list[Doc]) -> list[ModelResponse]:
        # Implement rolling loglikelihood computation
        pass
```
### Step 2: Model File Requirements
The custom model file should contain exactly one class that inherits from `LightevalModel`. This class will be automatically detected and instantiated when loading the model.
> [!TIP]
> You can find a complete example of a custom model implementation in `examples/custom_models/google_translate_model.py`.
## Running the Evaluation
You can evaluate your custom model using either the command-line interface or the Python API.
### Using the Command Line
```bash
lighteval custom \
    "google-translate" \
    "examples/custom_models/google_translate_model.py" \
    "wmt20:fr-de" \
    --max-samples 10
```
The command takes three required arguments:
- **Model name**: Used for tracking in results/logs
- **Model implementation file path**: Path to your Python file containing the custom model
- **Tasks**: Tasks to evaluate on (same format as other backends)
### Using the Python API
```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.custom.custom_model import CustomModelConfig
from lighteval.pipeline import Pipeline, PipelineParameters, ParallelismManager

# Set up evaluation tracking
evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=True,
)

# Configure the pipeline
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
)

# Configure your custom model
model_config = CustomModelConfig(
    model_name="my-custom-model",
    model_definition_file_path="path/to/my_model.py",
)

# Create and run the pipeline
pipeline = Pipeline(
    tasks="truthfulqa:mc",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.save_and_push_results()
```
## Required Methods
Your custom model must implement these core methods:
### `greedy_until`
For generating text until a stop sequence or max tokens is reached. This is used for generative evaluations.
```python
def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
    """
    Generate text until a stop sequence or max tokens is reached.

    Args:
        docs: list of documents containing prompts and generation parameters

    Returns:
        list of model responses with generated text
    """
    pass
```
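The control flow of a greedy-until loop can be sketched in plain Python, independent of any real model. Here `next_token` is a hypothetical stand-in for a model's argmax decoding step, and the three-token vocabulary is invented for illustration:

```python
def next_token(tokens: list[str]) -> str:
    # Hypothetical stand-in for a real model's argmax step:
    # cycles through a made-up three-token vocabulary.
    vocab = ["hello", "world", "<eos>"]
    return vocab[len(tokens) % len(vocab)]


def greedy_until_sketch(prompt_tokens: list[str],
                        stop_tokens: list[str],
                        max_new_tokens: int) -> list[str]:
    # Generate one token at a time until a stop token appears
    # or the new-token budget is exhausted.
    generated: list[str] = []
    for _ in range(max_new_tokens):
        token = next_token(prompt_tokens + generated)
        if token in stop_tokens:
            break
        generated.append(token)
    return generated
```

A real implementation would batch the docs, run the model, and wrap each generated text in a `ModelResponse`.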
### `loglikelihood`
For computing log probabilities of specific continuations. This is used for multiple choice logprob evaluations.
```python
def loglikelihood(self, docs: list[Doc]) -> list[ModelResponse]:
    """
    Compute log probabilities of continuations.

    Args:
        docs: list of documents containing context and continuation pairs

    Returns:
        list of model responses with log probabilities
    """
    pass
```
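The quantity being computed is the sum of per-token log-probabilities of the continuation given the context. A toy sketch, where the hypothetical `token_logprob` stands in for a real model (here it assigns probability 0.5 to every token):

```python
import math


def token_logprob(context: list[str], token: str) -> float:
    # Hypothetical toy model: every token has probability 0.5.
    return math.log(0.5)


def loglikelihood_sketch(context_tokens: list[str],
                         continuation_tokens: list[str]) -> float:
    # Score each continuation token given everything before it,
    # then sum the per-token log-probabilities.
    total = 0.0
    prefix = list(context_tokens)
    for token in continuation_tokens:
        total += token_logprob(prefix, token)
        prefix.append(token)
    return total
```

Multiple-choice logprob evaluations compare such sums across candidate continuations to pick the model's answer.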
### `loglikelihood_rolling`
For computing rolling log probabilities of sequences. This is used for perplexity metrics.
```python
def loglikelihood_rolling(self, docs: list[Doc]) -> list[ModelResponse]:
    """
    Compute rolling log probabilities of sequences.

    Args:
        docs: list of documents containing text sequences

    Returns:
        list of model responses with rolling log probabilities
    """
    pass
```
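Rolling log-likelihood scores every token of the sequence against the tokens before it, with no separate context, and perplexity is derived from the result. A toy sketch (the constant 0.25 probability is invented for illustration):

```python
import math


def token_logprob(prefix: list[str], token: str) -> float:
    # Hypothetical toy model: every token has probability 0.25.
    return math.log(0.25)


def loglikelihood_rolling_sketch(tokens: list[str]) -> float:
    # Each token is conditioned on all preceding tokens;
    # the first token is scored with an empty prefix.
    return sum(token_logprob(tokens[:i], tok) for i, tok in enumerate(tokens))


def perplexity(tokens: list[str]) -> float:
    # Perplexity is the exponential of the average negative log-likelihood.
    return math.exp(-loglikelihood_rolling_sketch(tokens) / len(tokens))
```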
See the `LightevalModel` base class documentation for detailed method signatures and requirements.
## Enabling Caching (Recommended)
Lighteval includes a caching system that can significantly speed up evaluations by storing and reusing model predictions.
To enable caching in your custom model:
### Step 1: Import Caching Components
```python
from lighteval.utils.cache_management import SampleCache, cached
```
### Step 2: Initialize Cache in Constructor
```python
def __init__(self, config):
    super().__init__(config)
    # Your initialization code...
    self._cache = SampleCache(config)
```
### Step 3: Add Cache Decorators
Add the `@cached` decorator to each prediction method:
```python
@cached(SamplingMethod.GENERATIVE)
def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
    # Your implementation...
    pass
```
For detailed information about the caching system, see the [Caching Documentation](caching).
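Conceptually, a sample-level cache behaves like the following toy decorator (a sketch, not lighteval's actual implementation): only docs missing from the store reach the wrapped method, and stored responses are reused on later calls:

```python
import functools


def cached_sketch(fn):
    # Toy sample-level cache: responses keyed by doc.
    store = {}

    @functools.wraps(fn)
    def wrapper(docs):
        # Run the model only on docs we have not seen before...
        missing = [doc for doc in docs if doc not in store]
        if missing:
            for doc, response in zip(missing, fn(missing)):
                store[doc] = response
        # ...then answer every doc from the store.
        return [store[doc] for doc in docs]

    return wrapper
```

On a second call that repeats a doc, only the new docs are recomputed; the repeated doc is served from the store.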
## Troubleshooting
### Common Issues
1. **Import Errors**: Ensure all required dependencies are installed
2. **Method Signature Errors**: Verify your methods match the expected signatures
3. **Caching Issues**: Check that cache decorators are applied correctly
4. **Performance Issues**: Consider implementing batching and caching
### Debugging Tips
- Use the `--max-samples` flag to test with a small dataset
- Enable detailed logging to see what's happening
- Test individual methods in isolation
- Check the example implementations for reference
For more detailed information about custom model implementation, see the [Model Reference](package_reference/models).