# Model Configs
Model configs define the model to evaluate and its parameters. All parameters can be set
either through the `model-args` CLI option or in a model YAML file (see an example
[here](https://github.com/huggingface/lighteval/blob/main/examples/model_configs/vllm_model_config.yaml)).
### Base model config[[lighteval.models.abstract_model.ModelConfig]]
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.abstract_model.ModelConfig</name><anchor>lighteval.models.abstract_model.ModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/abstract_model.py#L41</source><parameters>[{"name": "model_name", "val": ": str = None"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}]</parameters><paramsdesc>- **model_name** (str) -- | |
| The model name or unique id | |
| - **generation_parameters** (GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. Defaults to empty GenerationParameters. | |
| - **system_prompt** (str | None) -- | |
| Optional system prompt to be used with chat models. This prompt sets the | |
| behavior and context for the model during evaluation. | |
| - **cache_dir** (str) -- | |
| Directory to cache the model. Defaults to "~/.cache/huggingface/lighteval".</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Base configuration class for all model types in Lighteval. | |
| This is the foundation class that all specific model configurations inherit from. | |
| It provides common functionality for parsing configuration from files and command-line arguments, | |
| as well as shared attributes that are used by all models like generation parameters and system prompts. | |
Methods:
- `from_path(path: str)`: Load configuration from a YAML file.
- `from_args(args: str)`: Parse configuration from a command-line argument string.
- `_parse_args(args: str)`: Static method to parse argument strings into configuration dictionaries.
| <ExampleCodeBlock anchor="lighteval.models.abstract_model.ModelConfig.example"> | |
Example:
```python
# Load from YAML file
config = ModelConfig.from_path("model_config.yaml")

# Load from command line arguments
config = ModelConfig.from_args("model_name=meta-llama/Llama-3.1-8B-Instruct,system_prompt='You are a helpful assistant.',generation_parameters={temperature=0.7}")

# Direct instantiation
config = ModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    generation_parameters=GenerationParameters(temperature=0.7),
    system_prompt="You are a helpful assistant."
)
```
| </ExampleCodeBlock> | |
| </div> | |
| ## Local Models | |
| ### Transformers Model[[lighteval.models.transformers.transformers_model.TransformersModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.transformers.transformers_model.TransformersModelConfig</name><anchor>lighteval.models.transformers.transformers_model.TransformersModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/transformers/transformers_model.py#L70</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "tokenizer", "val": ": str | None = None"}, {"name": "subfolder", "val": ": str | None = None"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "batch_size", "val": ": typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None"}, {"name": "max_length", "val": ": typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None"}, {"name": "model_loading_kwargs", "val": ": dict = <factory>"}, {"name": "add_special_tokens", "val": ": bool = True"}, {"name": "skip_special_tokens", "val": ": bool = True"}, {"name": "model_parallel", "val": ": bool | None = None"}, {"name": "dtype", "val": ": str | None = None"}, {"name": "device", "val": ": typing.Union[int, str] = 'cuda'"}, {"name": "trust_remote_code", "val": ": bool = False"}, {"name": "compile", "val": ": bool = False"}, {"name": "multichoice_continuations_start_space", "val": ": bool | None = None"}, {"name": "pairwise_tokenization", "val": ": bool = False"}, {"name": "continuous_batching", "val": ": bool = False"}, {"name": "override_chat_template", "val": ": bool = None"}]</parameters><paramsdesc>- **model_name** (str) -- | |
| HuggingFace Hub model ID or path to a pre-trained model. This corresponds to the | |
| `pretrained_model_name_or_path` argument in HuggingFace's `from_pretrained` method. | |
| - **tokenizer** (str | None) -- | |
| Optional HuggingFace Hub tokenizer ID. If not specified, uses the same ID as model_name. | |
| Useful when the tokenizer is different from the model (e.g., for multilingual models). | |
| - **subfolder** (str | None) -- | |
| Subfolder within the model repository. Used when models are stored in subdirectories. | |
| - **revision** (str) -- | |
| Git revision of the model to load. Defaults to "main". | |
| - **batch_size** (PositiveInt | None) -- | |
| Batch size for model inference. If None, will be automatically determined. | |
| - **max_length** (PositiveInt | None) -- | |
| Maximum sequence length for the model. If None, uses model's default. | |
| - **model_loading_kwargs** (dict) -- | |
| Additional keyword arguments passed to `from_pretrained`. Defaults to empty dict. | |
| - **add_special_tokens** (bool) -- | |
| Whether to add special tokens during tokenization. Defaults to True. | |
- **skip_special_tokens** (bool) --
Whether special tokens should be skipped when decoding generated output. Relevant for reasoning models, which may need their special tokens preserved. Defaults to True.
| - **model_parallel** (bool | None) -- | |
| Whether to use model parallelism across multiple GPUs. If None, automatically | |
| determined based on available GPUs and model size. | |
| - **dtype** (str | None) -- | |
| Data type for model weights. Can be "float16", "bfloat16", "float32", "auto", "4bit", "8bit". | |
| If "auto", uses the model's default dtype. | |
| - **device** (Union[int, str]) -- | |
| Device to load the model on. Can be "cuda", "cpu", or GPU index. Defaults to "cuda". | |
| - **trust_remote_code** (bool) -- | |
| Whether to trust remote code when loading models. Defaults to False. | |
| - **compile** (bool) -- | |
| Whether to compile the model using torch.compile for optimization. Defaults to False. | |
| - **multichoice_continuations_start_space** (bool | None) -- | |
| Whether to add a space before multiple choice continuations. If None, uses model default. | |
| True forces adding space, False removes leading space if present. | |
| - **pairwise_tokenization** (bool) -- | |
| Whether to tokenize context and continuation separately or together. Defaults to False. | |
| - **continuous_batching** (bool) -- | |
| Whether to use continuous batching for generation. Defaults to False. | |
- **override_chat_template** (bool) --
If True, forces the model to use a chat template. If False, prevents the model from using
a chat template. If None, uses the default (True if a chat template is present in the tokenizer, False otherwise).
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for HuggingFace Transformers models. | |
| This configuration is used to load and configure models from the HuggingFace Transformers library. | |
| <ExampleCodeBlock anchor="lighteval.models.transformers.transformers_model.TransformersModelConfig.example"> | |
Example:
```python
config = TransformersModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    batch_size=4,
    dtype="float16",
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
| Note: | |
| This configuration supports quantization (4-bit and 8-bit) through the dtype parameter. | |
| When using quantization, ensure you have the required dependencies installed | |
| (bitsandbytes for 4-bit/8-bit quantization). | |
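As a minimal sketch of the quantized path described in the note above (assuming `bitsandbytes` is installed), 4-bit loading can be requested through the same `dtype` field:
```python
# Sketch: 4-bit quantized loading via the dtype field (requires bitsandbytes).
# "8bit" works the same way; see the dtype options listed above.
config = TransformersModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    dtype="4bit",
)
```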
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.transformers.delta_model.DeltaModelConfig</name><anchor>lighteval.models.transformers.delta_model.DeltaModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/transformers/delta_model.py#L38</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "tokenizer", "val": ": str | None = None"}, {"name": "subfolder", "val": ": str | None = None"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "batch_size", "val": ": typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None"}, {"name": "max_length", "val": ": typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None"}, {"name": "model_loading_kwargs", "val": ": dict = <factory>"}, {"name": "add_special_tokens", "val": ": bool = True"}, {"name": "skip_special_tokens", "val": ": bool = True"}, {"name": "model_parallel", "val": ": bool | None = None"}, {"name": "dtype", "val": ": str | None = None"}, {"name": "device", "val": ": typing.Union[int, str] = 'cuda'"}, {"name": "trust_remote_code", "val": ": bool = False"}, {"name": "compile", "val": ": bool = False"}, {"name": "multichoice_continuations_start_space", "val": ": bool | None = None"}, {"name": "pairwise_tokenization", "val": ": bool = False"}, {"name": "continuous_batching", "val": ": bool = False"}, {"name": "override_chat_template", "val": ": bool = None"}, {"name": "base_model", "val": ": str"}]</parameters><paramsdesc>- **base_model** (str) -- | |
| HuggingFace Hub model ID or path to the base model. This is the original | |
| pre-trained model that the delta was computed from.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for delta models (weight difference models). | |
| This configuration is used to load models that represent the difference between a | |
| fine-tuned model and its base model. The delta weights are added to the base model | |
| during loading to reconstruct the full fine-tuned model. | |
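The source docstring does not include an example for this config; a minimal sketch, using a hypothetical delta-weights repository name, might look like the following (only `model_name`, `base_model`, and `dtype` from the signature above are used):
```python
# Minimal sketch (the delta repository name is hypothetical): model_name points to the
# delta weights, base_model to the original pre-trained model they were computed from.
config = DeltaModelConfig(
    model_name="your-org/llama-3.1-8b-delta",
    base_model="meta-llama/Llama-3.1-8B",
    dtype="bfloat16",
)
```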
| </div> | |
| ### VLLM Model[[lighteval.models.vllm.vllm_model.VLLMModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.vllm.vllm_model.VLLMModelConfig</name><anchor>lighteval.models.vllm.vllm_model.VLLMModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/vllm/vllm_model.py#L76</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "tokenizer", "val": ": str | None = None"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "dtype", "val": ": str = 'bfloat16'"}, {"name": "tensor_parallel_size", "val": ": typing.Annotated[int, Gt(gt=0)] = 1"}, {"name": "data_parallel_size", "val": ": typing.Annotated[int, Gt(gt=0)] = 1"}, {"name": "pipeline_parallel_size", "val": ": typing.Annotated[int, Gt(gt=0)] = 1"}, {"name": "gpu_memory_utilization", "val": ": typing.Annotated[float, Ge(ge=0)] = 0.9"}, {"name": "enable_prefix_caching", "val": ": bool = None"}, {"name": "max_model_length", "val": ": typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None"}, {"name": "quantization", "val": ": str | None = None"}, {"name": "load_format", "val": ": str | None = None"}, {"name": "swap_space", "val": ": typing.Annotated[int, Gt(gt=0)] = 4"}, {"name": "seed", "val": ": typing.Annotated[int, Ge(ge=0)] = 1234"}, {"name": "trust_remote_code", "val": ": bool = False"}, {"name": "add_special_tokens", "val": ": bool = True"}, {"name": "multichoice_continuations_start_space", "val": ": bool = True"}, {"name": "pairwise_tokenization", "val": ": bool = False"}, {"name": "max_num_seqs", "val": ": typing.Annotated[int, Gt(gt=0)] = 128"}, {"name": "max_num_batched_tokens", "val": ": typing.Annotated[int, Gt(gt=0)] = 2048"}, {"name": "subfolder", "val": ": str | None = None"}, {"name": "is_async", "val": ": bool = False"}, {"name": "override_chat_template", "val": ": bool = None"}]</parameters><paramsdesc>- **model_name** (str) -- | |
| HuggingFace Hub model ID or path to the model to load. | |
| - **tokenizer** (str | None) -- | |
| HuggingFace Hub model ID or path to the tokenizer to load. | |
| - **revision** (str) -- | |
| Git revision of the model. Defaults to "main". | |
| - **dtype** (str) -- | |
| Data type for model weights. Defaults to "bfloat16". Options: "float16", "bfloat16", "float32". | |
| - **tensor_parallel_size** (PositiveInt) -- | |
| Number of GPUs to use for tensor parallelism. Defaults to 1. | |
| - **data_parallel_size** (PositiveInt) -- | |
| Number of GPUs to use for data parallelism. Defaults to 1. | |
| - **pipeline_parallel_size** (PositiveInt) -- | |
| Number of GPUs to use for pipeline parallelism. Defaults to 1. | |
| - **gpu_memory_utilization** (NonNegativeFloat) -- | |
| Fraction of GPU memory to use. Lower this if running out of memory. Defaults to 0.9. | |
- **enable_prefix_caching** (bool) --
Whether to enable prefix caching to speed up generation; may use more memory. Should be disabled for LFM2. Defaults to None (uses vLLM's default).
| - **max_model_length** (PositiveInt | None) -- | |
| Maximum sequence length for the model. If None, automatically inferred. | |
| Reduce this if encountering OOM issues (4096 is usually sufficient). | |
| - **quantization** (str | None) -- | |
| Quantization method. | |
- **load_format** (str | None) --
The format of the model weights to load. Choices: auto, pt, safetensors, npcache, dummy, tensorizer, sharded_state, gguf, bitsandbytes, mistral, runai_streamer.
| - **swap_space** (PositiveInt) -- | |
| CPU swap space size in GiB per GPU. Defaults to 4. | |
| - **seed** (NonNegativeInt) -- | |
| Random seed for reproducibility. Defaults to 1234. | |
| - **trust_remote_code** (bool) -- | |
| Whether to trust remote code when loading models. Defaults to False. | |
| - **add_special_tokens** (bool) -- | |
| Whether to add special tokens during tokenization. Defaults to True. | |
| - **multichoice_continuations_start_space** (bool) -- | |
| Whether to add a space before multiple choice continuations. Defaults to True. | |
| - **pairwise_tokenization** (bool) -- | |
| Whether to tokenize context and continuation separately for loglikelihood evals. Defaults to False. | |
| - **max_num_seqs** (PositiveInt) -- | |
| Maximum number of sequences per iteration. Controls batch size at prefill stage. Defaults to 128. | |
| - **max_num_batched_tokens** (PositiveInt) -- | |
| Maximum number of tokens per batch. Defaults to 2048. | |
| - **subfolder** (str | None) -- | |
| Subfolder within the model repository. Defaults to None. | |
| - **is_async** (bool) -- | |
| Whether to use the async version of VLLM. Defaults to False. | |
- **override_chat_template** (bool) --
If True, forces the model to use a chat template. If False, prevents the model from using
a chat template. If None, uses the default (True if a chat template is present in the tokenizer, False otherwise).
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for VLLM inference engine. | |
| This configuration is used to load and configure models using the VLLM inference engine, | |
| which provides high-performance inference for large language models with features like | |
| PagedAttention, continuous batching, and efficient memory management. | |
| vllm doc: https://docs.vllm.ai/en/v0.7.1/serving/engine_args.html | |
| <ExampleCodeBlock anchor="lighteval.models.vllm.vllm_model.VLLMModelConfig.example"> | |
Example:
```python
config = VLLMModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,
    gpu_memory_utilization=0.8,
    max_model_length=4096,
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
| </div> | |
| ### SGLang Model[[lighteval.models.sglang.sglang_model.SGLangModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.sglang.sglang_model.SGLangModelConfig</name><anchor>lighteval.models.sglang.sglang_model.SGLangModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/sglang/sglang_model.py#L54</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "load_format", "val": ": str = 'auto'"}, {"name": "dtype", "val": ": str = 'auto'"}, {"name": "tp_size", "val": ": typing.Annotated[int, Gt(gt=0)] = 1"}, {"name": "dp_size", "val": ": typing.Annotated[int, Gt(gt=0)] = 1"}, {"name": "context_length", "val": ": typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None"}, {"name": "random_seed", "val": ": typing.Optional[typing.Annotated[int, Gt(gt=0)]] = 1234"}, {"name": "trust_remote_code", "val": ": bool = False"}, {"name": "device", "val": ": str = 'cuda'"}, {"name": "skip_tokenizer_init", "val": ": bool = False"}, {"name": "kv_cache_dtype", "val": ": str = 'auto'"}, {"name": "add_special_tokens", "val": ": bool = True"}, {"name": "pairwise_tokenization", "val": ": bool = False"}, {"name": "sampling_backend", "val": ": str | None = None"}, {"name": "attention_backend", "val": ": str | None = None"}, {"name": "mem_fraction_static", "val": ": typing.Annotated[float, Gt(gt=0)] = 0.8"}, {"name": "chunked_prefill_size", "val": ": typing.Annotated[int, Gt(gt=0)] = 4096"}, {"name": "override_chat_template", "val": ": bool = None"}]</parameters><paramsdesc>- **model_name** (str) -- | |
| HuggingFace Hub model ID or path to the model to load. | |
- **load_format** (str) --
The format of the model weights to load. Choices: auto, pt, safetensors, npcache, dummy, tensorizer, sharded_state, gguf, bitsandbytes, mistral, runai_streamer.
| - **dtype** (str) -- | |
| Data type for model weights. Defaults to "auto". Options: "auto", "float16", "bfloat16", "float32". | |
| - **tp_size** (PositiveInt) -- | |
| Number of GPUs to use for tensor parallelism. Defaults to 1. | |
| - **dp_size** (PositiveInt) -- | |
| Number of GPUs to use for data parallelism. Defaults to 1. | |
| - **context_length** (PositiveInt | None) -- | |
| Maximum context length for the model. | |
| - **random_seed** (PositiveInt | None) -- | |
| Random seed for reproducibility. Defaults to 1234. | |
| - **trust_remote_code** (bool) -- | |
| Whether to trust remote code when loading models. Defaults to False. | |
| - **device** (str) -- | |
| Device to load the model on. Defaults to "cuda". | |
| - **skip_tokenizer_init** (bool) -- | |
| Whether to skip tokenizer initialization. Defaults to False. | |
| - **kv_cache_dtype** (str) -- | |
| Data type for key-value cache. Defaults to "auto". | |
| - **add_special_tokens** (bool) -- | |
| Whether to add special tokens during tokenization. Defaults to True. | |
| - **pairwise_tokenization** (bool) -- | |
| Whether to tokenize context and continuation separately for loglikelihood evals. Defaults to False. | |
| - **sampling_backend** (str | None) -- | |
| Sampling backend to use. If None, uses default. | |
| - **attention_backend** (str | None) -- | |
| Attention backend to use. If None, uses default. | |
| - **mem_fraction_static** (PositiveFloat) -- | |
| Fraction of GPU memory to use for static allocation. Defaults to 0.8. | |
| - **chunked_prefill_size** (PositiveInt) -- | |
| Size of chunks for prefill operations. Defaults to 4096. | |
- **override_chat_template** (bool) --
If True, forces the model to use a chat template. If False, prevents the model from using
a chat template. If None, uses the default (True if a chat template is present in the tokenizer, False otherwise).
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for SGLang inference engine. | |
| This configuration is used to load and configure models using the SGLang inference engine, | |
| which provides high-performance inference. | |
| sglang doc: https://docs.sglang.ai/index.html# | |
| <ExampleCodeBlock anchor="lighteval.models.sglang.sglang_model.SGLangModelConfig.example"> | |
Example:
```python
config = SGLangModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    tp_size=2,
    context_length=8192,
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
| </div> | |
| ### Dummy Model[[lighteval.models.dummy.dummy_model.DummyModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.dummy.dummy_model.DummyModelConfig</name><anchor>lighteval.models.dummy.dummy_model.DummyModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/dummy/dummy_model.py#L35</source><parameters>[{"name": "model_name", "val": ": str = 'dummy'"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "seed", "val": ": int = 42"}]</parameters><paramsdesc>- **model_name** (str) -- | |
Name of your choice; defaults to "dummy".
| - **seed** (int) -- | |
| Random seed for reproducible dummy responses. Defaults to 42. | |
| This seed controls the randomness of the generated responses and log probabilities.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for dummy models used for testing and baselines. | |
| This configuration is used to create dummy models that generate random responses | |
| or baselines for evaluation purposes. Useful for testing evaluation pipelines | |
| without requiring actual model inference. | |
| <ExampleCodeBlock anchor="lighteval.models.dummy.dummy_model.DummyModelConfig.example"> | |
Example:
```python
config = DummyModelConfig(
    model_name="my_dummy",
    seed=123,
)
```
| </ExampleCodeBlock> | |
| </div> | |
| ## Endpoints-based Models | |
| ### Inference Providers Model[[lighteval.models.endpoints.inference_providers_model.InferenceProvidersModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.endpoints.inference_providers_model.InferenceProvidersModelConfig</name><anchor>lighteval.models.endpoints.inference_providers_model.InferenceProvidersModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/endpoints/inference_providers_model.py#L45</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "provider", "val": ": str"}, {"name": "timeout", "val": ": int | None = None"}, {"name": "proxies", "val": ": typing.Optional[typing.Any] = None"}, {"name": "org_to_bill", "val": ": str | None = None"}, {"name": "parallel_calls_count", "val": ": typing.Annotated[int, Ge(ge=0)] = 10"}]</parameters><paramsdesc>- **model_name** (str) -- | |
| Name or identifier of the model to use. | |
| - **provider** (str) -- | |
| Name of the inference provider. Examples: "together", "anyscale", "runpod", etc. | |
| - **timeout** (int | None) -- | |
| Request timeout in seconds. If None, uses provider default. | |
| - **proxies** (Any | None) -- | |
| Proxy configuration for requests. Can be a dict or proxy URL string. | |
| - **org_to_bill** (str | None) -- | |
| Organization to bill for API usage. If None, bills the user's account. | |
| - **parallel_calls_count** (NonNegativeInt) -- | |
| Number of parallel API calls to make. Defaults to 10. | |
| Higher values increase throughput but may hit rate limits. | |
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for HuggingFace's inference providers (like Together AI, Anyscale, etc.). | |
| inference providers doc: https://huggingface.co/docs/inference-providers/en/index | |
| <ExampleCodeBlock anchor="lighteval.models.endpoints.inference_providers_model.InferenceProvidersModelConfig.example"> | |
Example:
```python
config = InferenceProvidersModelConfig(
    model_name="deepseek-ai/DeepSeek-R1-0528",
    provider="together",
    parallel_calls_count=5,
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
Note:
- Requires an HF API token to be set as an environment variable (see the sketch below).
- Different providers have different rate limits and pricing.
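For example, a minimal sketch of providing the token (assuming it is read from the usual `HF_TOKEN` environment variable):
```python
import os

# Sketch: expose the Hugging Face token before running an evaluation.
# HF_TOKEN is the variable commonly read by Hugging Face clients; adjust if your setup differs.
os.environ["HF_TOKEN"] = "hf_xxx"  # placeholder token
```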
| </div> | |
| ### InferenceEndpointModel[[lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig</name><anchor>lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/endpoints/endpoint_model.py#L108</source><parameters>[{"name": "model_name", "val": ": str | None = None"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "endpoint_name", "val": ": str | None = None"}, {"name": "reuse_existing", "val": ": bool = False"}, {"name": "accelerator", "val": ": str = 'gpu'"}, {"name": "dtype", "val": ": str | None = None"}, {"name": "vendor", "val": ": str = 'aws'"}, {"name": "region", "val": ": str = 'us-east-1'"}, {"name": "instance_size", "val": ": str | None = None"}, {"name": "instance_type", "val": ": str | None = None"}, {"name": "framework", "val": ": str = 'pytorch'"}, {"name": "endpoint_type", "val": ": str = 'protected'"}, {"name": "add_special_tokens", "val": ": bool = True"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "namespace", "val": ": str | None = None"}, {"name": "image_url", "val": ": str | None = None"}, {"name": "env_vars", "val": ": dict | None = None"}, {"name": "batch_size", "val": ": int = 1"}]</parameters><paramsdesc>- **endpoint_name** (str | None) -- | |
| Name for the inference endpoint. If None, auto-generated from model_name. | |
| - **model_name** (str | None) -- | |
| HuggingFace Hub model ID to deploy. Required if endpoint_name is None. | |
| - **reuse_existing** (bool) -- | |
| Whether to reuse an existing endpoint with the same name. Defaults to False. | |
| - **accelerator** (str) -- | |
| Type of accelerator to use. Defaults to "gpu". Options: "gpu", "cpu". | |
| - **dtype** (str | None) -- | |
| Model data type. If None, uses model default. Options: "float16", "bfloat16", "awq", "gptq", "8bit", "4bit". | |
| - **vendor** (str) -- | |
| Cloud vendor for the endpoint. Defaults to "aws". Options: "aws", "azure", "gcp". | |
| - **region** (str) -- | |
| Cloud region for the endpoint. Defaults to "us-east-1". | |
| - **instance_size** (str | None) -- | |
| Instance size for the endpoint. If None, auto-scaled. | |
| - **instance_type** (str | None) -- | |
| Instance type for the endpoint. If None, auto-scaled. | |
| - **framework** (str) -- | |
| ML framework to use. Defaults to "pytorch". | |
| - **endpoint_type** (str) -- | |
| Type of endpoint. Defaults to "protected". Options: "protected", "public". | |
| - **add_special_tokens** (bool) -- | |
| Whether to add special tokens during tokenization. Defaults to True. | |
| - **revision** (str) -- | |
| Git revision of the model. Defaults to "main". | |
| - **namespace** (str | None) -- | |
| Namespace for the endpoint. If None, uses current user's namespace. | |
| - **image_url** (str | None) -- | |
| Custom Docker image URL. If None, uses default TGI image. | |
| - **env_vars** (dict | None) -- | |
| Additional environment variables for the endpoint. | |
| - **batch_size** (int) -- | |
| Batch size for requests. Defaults to 1. | |
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for HuggingFace Inference Endpoints (dedicated infrastructure). | |
| This configuration is used to create and manage dedicated inference endpoints | |
| on HuggingFace's infrastructure. These endpoints provide dedicated compute | |
| resources and can handle larger batch sizes and higher throughput. | |
Methods:
- `model_post_init()`: Validates configuration and ensures proper parameter combinations.
- `get_dtype_args()`: Returns environment variables for dtype configuration.
- `get_custom_env_vars()`: Returns custom environment variables for the endpoint.
| <ExampleCodeBlock anchor="lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig.example"> | |
Example:
```python
config = InferenceEndpointModelConfig(
    model_name="microsoft/DialoGPT-medium",
    instance_type="nvidia-a100",
    instance_size="x1",
    vendor="aws",
    region="us-east-1",
    dtype="float16",
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
| Note: | |
| - Creates dedicated infrastructure for model inference | |
| - Supports various quantization methods and hardware configurations | |
| - Auto-scaling available for optimal resource utilization | |
| - Requires HuggingFace Pro subscription for most features | |
| - Endpoints can take several minutes to start up | |
| - Billed based on compute usage and duration | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.endpoints.endpoint_model.ServerlessEndpointModelConfig</name><anchor>lighteval.models.endpoints.endpoint_model.ServerlessEndpointModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/endpoints/endpoint_model.py#L71</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "add_special_tokens", "val": ": bool = True"}, {"name": "batch_size", "val": ": int = 1"}]</parameters><paramsdesc>- **model_name** (str) -- | |
| HuggingFace Hub model ID to use with the Inference API. | |
| Example: "meta-llama/Llama-3.1-8B-Instruct" | |
| - **add_special_tokens** (bool) -- | |
| Whether to add special tokens during tokenization. Defaults to True. | |
| - **batch_size** (int) -- | |
| Batch size for requests. Defaults to 1 (serverless API limitation). | |
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for HuggingFace Inference API (inference endpoints). | |
| https://huggingface.co/inference-endpoints/dedicated | |
| <ExampleCodeBlock anchor="lighteval.models.endpoints.endpoint_model.ServerlessEndpointModelConfig.example"> | |
Example:
```python
config = ServerlessEndpointModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
| </div> | |
| ### TGI ModelClient[[lighteval.models.endpoints.tgi_model.TGIModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.endpoints.tgi_model.TGIModelConfig</name><anchor>lighteval.models.endpoints.tgi_model.TGIModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/endpoints/tgi_model.py#L55</source><parameters>[{"name": "model_name", "val": ": str | None"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "inference_server_address", "val": ": str | None = None"}, {"name": "inference_server_auth", "val": ": str | None = None"}, {"name": "model_info", "val": ": dict | None = None"}, {"name": "batch_size", "val": ": int = 1"}]</parameters><paramsdesc>- **inference_server_address** (str | None) -- | |
| Address of the TGI server. Format: "http://host:port" or "https://host:port". | |
| Example: "http://localhost:8080" | |
| - **inference_server_auth** (str | None) -- | |
| Authentication token for the TGI server. If None, no authentication is used. | |
| - **model_name** (str | None) -- | |
| Optional model name override. If None, uses the model name from server info. | |
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for Text Generation Inference (TGI) backend. | |
| doc: https://huggingface.co/docs/text-generation-inference/en/index | |
| This configuration is used to connect to TGI servers that serve HuggingFace models | |
| using the text-generation-inference library. TGI provides high-performance inference | |
| with features like continuous batching and efficient memory management. | |
| <ExampleCodeBlock anchor="lighteval.models.endpoints.tgi_model.TGIModelConfig.example"> | |
Example:
```python
config = TGIModelConfig(
    inference_server_address="http://localhost:8080",
    inference_server_auth="your-auth-token",
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
| </div> | |
| ### Litellm Model[[lighteval.models.endpoints.litellm_model.LiteLLMModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.endpoints.litellm_model.LiteLLMModelConfig</name><anchor>lighteval.models.endpoints.litellm_model.LiteLLMModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/endpoints/litellm_model.py#L61</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "provider", "val": ": str | None = None"}, {"name": "base_url", "val": ": str | None = None"}, {"name": "api_key", "val": ": str | None = None"}, {"name": "concurrent_requests", "val": ": int = 10"}, {"name": "verbose", "val": ": bool = False"}, {"name": "max_model_length", "val": ": int | None = None"}, {"name": "api_max_retry", "val": ": int = 8"}, {"name": "api_retry_sleep", "val": ": float = 1.0"}, {"name": "api_retry_multiplier", "val": ": float = 2.0"}, {"name": "timeout", "val": ": float | None = None"}]</parameters><paramsdesc>- **model_name** (str) -- | |
| Model identifier. Can include provider prefix (e.g., "gpt-4", "claude-3-sonnet") | |
| or use provider/model format (e.g., "openai/gpt-4", "anthropic/claude-3-sonnet"). | |
| - **provider** (str | None) -- | |
| Optional provider name override. If None, inferred from model_name. | |
| Examples: "openai", "anthropic", "google", "cohere", etc. | |
| - **base_url** (str | None) -- | |
| Custom base URL for the API. If None, uses provider's default URL. | |
| Useful for using custom endpoints or local deployments. | |
| - **api_key** (str | None) -- | |
| API key for authentication. If None, reads from environment variables. | |
| Environment variable names are provider-specific (e.g., OPENAI_API_KEY). | |
| - **concurrent_requests** (int) -- | |
| Maximum number of concurrent API requests to execute in parallel. | |
| Higher values can improve throughput for batch processing but may hit rate limits | |
| or exhaust API quotas faster. Default is 10. | |
| - **verbose** (bool) -- | |
| Whether to enable verbose logging. Default is False. | |
| - **max_model_length** (int | None) -- | |
| Maximum context length for the model. If None, infers the model's default max length. | |
| - **api_max_retry** (int) -- | |
| Maximum number of retries for API requests. Default is 8. | |
| - **api_retry_sleep** (float) -- | |
| Initial sleep time (in seconds) between retries. Default is 1.0. | |
| - **api_retry_multiplier** (float) -- | |
| Multiplier for increasing sleep time between retries. Default is 2.0. | |
- **timeout** (float | None) --
Request timeout in seconds. Defaults to None (no timeout).
| - **generation_parameters** (GenerationParameters, optional, defaults to empty GenerationParameters) -- | |
| Configuration parameters that control text generation behavior, including | |
| temperature, top_p, max_new_tokens, etc. | |
| - **system_prompt** (str | None, optional, defaults to None) -- Optional system prompt to be used with chat models. | |
| This prompt sets the behavior and context for the model during evaluation. | |
| - **cache_dir** (str, optional, defaults to "~/.cache/huggingface/lighteval") -- Directory to cache the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for LiteLLM unified API client. | |
| This configuration is used to connect to various LLM providers through the LiteLLM | |
| unified API. LiteLLM provides a consistent interface to multiple providers including | |
| OpenAI, Anthropic, Google, and many others. | |
| litellm doc: https://docs.litellm.ai/docs/ | |
| <ExampleCodeBlock anchor="lighteval.models.endpoints.litellm_model.LiteLLMModelConfig.example"> | |
Example:
```python
config = LiteLLMModelConfig(
    model_name="gpt-4",
    provider="openai",
    base_url="https://api.openai.com/v1",
    concurrent_requests=5,
    generation_parameters=GenerationParameters(
        temperature=0.7,
        max_new_tokens=100
    )
)
```
| </ExampleCodeBlock> | |
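The retry parameters above imply an exponentially growing backoff. A rough sketch of the delays suggested by the defaults (`api_retry_sleep=1.0`, `api_retry_multiplier=2.0`, `api_max_retry=8`), assuming a plain geometric schedule rather than the library's exact implementation:
```python
# Illustration only: the exact retry timing in lighteval/LiteLLM may differ (jitter, caps).
api_retry_sleep, api_retry_multiplier, api_max_retry = 1.0, 2.0, 8
delays = [api_retry_sleep * api_retry_multiplier**attempt for attempt in range(api_max_retry)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
```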
| </div> | |
| ## Custom Model[[lighteval.models.custom.custom_model.CustomModelConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.models.custom.custom_model.CustomModelConfig</name><anchor>lighteval.models.custom.custom_model.CustomModelConfig</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/models/custom/custom_model.py#L26</source><parameters>[{"name": "model_name", "val": ": str"}, {"name": "generation_parameters", "val": ": GenerationParameters = GenerationParameters(num_blocks=None, block_size=None, early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=0, top_k=None, min_p=None, top_p=None, truncate_prompt=None, cache_implementation=None, response_format=None)"}, {"name": "system_prompt", "val": ": str | None = None"}, {"name": "cache_dir", "val": ": str = '~/.cache/huggingface/lighteval'"}, {"name": "model_definition_file_path", "val": ": str"}]</parameters><paramsdesc>- **model** (str) -- | |
| An identifier for the model. This can be used to track which model was evaluated | |
| in the results and logs. | |
| - **model_definition_file_path** (str) -- | |
| Path to a Python file containing the custom model implementation. This file must | |
| define exactly one class that inherits from LightevalModel. The class should | |
| implement all required methods from the LightevalModel interface.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for loading custom model implementations in Lighteval. | |
| This config allows users to define and load their own model implementations by specifying | |
| a Python file containing a custom model class that inherits from LightevalModel. | |
| The custom model file should contain exactly one class that inherits from LightevalModel. | |
| This class will be automatically detected and instantiated when loading the model. | |
| <ExampleCodeBlock anchor="lighteval.models.custom.custom_model.CustomModelConfig.example"> | |
Example usage:
```python
# Define config
config = CustomModelConfig(
    model_name="my-custom-model",
    model_definition_file_path="path/to/my_model.py"
)

# Example custom model file (my_model.py):
from lighteval.models.abstract_model import LightevalModel

class MyCustomModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Custom initialization...

    def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
        # Custom generation logic...
        pass

    def loglikelihood(self, docs: list[Doc]) -> list[ModelResponse]:
        pass
```
| </ExampleCodeBlock> | |
| An example of a custom model can be found in `examples/custom_models/google_translate_model.py`. | |
| </div> | |
| <EditOnGithub source="https://github.com/huggingface/lighteval/blob/main/docs/source/package_reference/models.mdx" /> |