# Logging
## EvaluationTracker[[lighteval.logging.evaluation_tracker.EvaluationTracker]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.logging.evaluation_tracker.EvaluationTracker</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L95</source><parameters>[{"name": "output_dir", "val": ": str"}, {"name": "results_path_template", "val": ": str | None = None"}, {"name": "save_details", "val": ": bool = True"}, {"name": "push_to_hub", "val": ": bool = False"}, {"name": "push_to_tensorboard", "val": ": bool = False"}, {"name": "hub_results_org", "val": ": str | None = ''"}, {"name": "tensorboard_metric_prefix", "val": ": str = 'eval'"}, {"name": "public", "val": ": bool = False"}, {"name": "nanotron_run_info", "val": ": GeneralArgs = None"}, {"name": "use_wandb", "val": ": bool = False"}]</parameters><paramsdesc>- **output_dir** (str) -- Local directory to save evaluation results and logs
- **results_path_template** (str, optional) -- Template for results directory structure.
Example: "{output_dir}/results/{org}_{model}"
- **save_details** (bool, defaults to True) -- Whether to save detailed evaluation records
- **push_to_hub** (bool, defaults to False) -- Whether to push results to HF Hub
- **push_to_tensorboard** (bool, defaults to False) -- Whether to push metrics to TensorBoard
- **hub_results_org** (str, optional) -- HF Hub organization to push results to
- **tensorboard_metric_prefix** (str, defaults to "eval") -- Prefix for TensorBoard metrics
- **public** (bool, defaults to False) -- Whether to make Hub datasets public
- **nanotron_run_info** (GeneralArgs, optional) -- Nanotron model run information
- **use_wandb** (bool, defaults to False) -- Whether to log to Weights & Biases or Trackio if available</paramsdesc><paramgroups>0</paramgroups></docstring>
Tracks and manages evaluation results, metrics, and logging for model evaluations.
The EvaluationTracker coordinates multiple specialized loggers to track different aspects of model evaluation:
- Details Logger (DetailsLogger): Records per-sample evaluation details and predictions
- Metrics Logger (MetricsLogger): Tracks aggregate evaluation metrics and scores
- Versions Logger (VersionsLogger): Records task and dataset versions
- General Config Logger (GeneralConfigLogger): Stores overall evaluation configuration
- Task Config Logger (TaskConfigLogger): Maintains per-task configuration details
The tracker can save results locally and optionally push them to:
- Hugging Face Hub as datasets
- TensorBoard for visualization
- Trackio or Weights & Biases for experiment tracking
<ExampleCodeBlock anchor="lighteval.logging.evaluation_tracker.EvaluationTracker.example">
Example:
```python
tracker = EvaluationTracker(
    output_dir="./eval_results",
    push_to_hub=True,
    hub_results_org="my-org",
    save_details=True
)
# Log evaluation results
tracker.metrics_logger.add_metric("accuracy", 0.85)
tracker.details_logger.add_detail(task_name="qa", prediction="Paris")
# Save all results
tracker.save()
```
</ExampleCodeBlock>
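The `results_path_template` parameter accepts standard `str.format` placeholders, as in the `"{output_dir}/results/{org}_{model}"` example shown in the parameter list. A minimal sketch of how such a template expands (the `org` and `model` values here are illustrative, not taken from lighteval):

```python
# Expand a results-path template of the kind documented above.
# The field names (output_dir, org, model) mirror the documented example.
template = "{output_dir}/results/{org}_{model}"

path = template.format(
    output_dir="./eval_results",  # local directory passed to the tracker
    org="my-org",                 # illustrative Hub organization
    model="my-model",             # illustrative model name
)
print(path)  # → ./eval_results/results/my-org_my-model
```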
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>generate_final_dict</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.generate_final_dict</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L363</source><parameters>[]</parameters><rettype>dict</rettype><retdesc>Dictionary containing all experiment information including config, results, versions, and summaries</retdesc></docstring>
Aggregates and returns all the logger's experiment information in a dictionary.
This function should be used to gather and display said information at the end of an evaluation run.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>push_to_hub</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.push_to_hub</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L387</source><parameters>[{"name": "date_id", "val": ": str"}, {"name": "details", "val": ": dict"}, {"name": "results_dict", "val": ": dict"}]</parameters></docstring>
Pushes the experiment details (all the model predictions for every step) to the Hub.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>recreate_metadata_card</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.recreate_metadata_card</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L454</source><parameters>[{"name": "repo_id", "val": ": str"}]</parameters><paramsdesc>- **repo_id** (str) -- Details dataset repository path on the hub (`org/dataset`)</paramsdesc><paramgroups>0</paramgroups></docstring>
Fully updates the details repository metadata card for the currently evaluated model.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>save</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.save</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L247</source><parameters>[]</parameters></docstring>
Saves the experiment information and results to files, and to the hub if requested.
</div></div>
## GeneralConfigLogger[[lighteval.logging.info_loggers.GeneralConfigLogger]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.logging.info_loggers.GeneralConfigLogger</name><anchor>lighteval.logging.info_loggers.GeneralConfigLogger</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L48</source><parameters>[]</parameters><paramsdesc>- **lighteval_sha** (str) -- Git commit SHA of lighteval used for evaluation, enabling exact version reproducibility.
Set to "?" if not in a git repository.
- **num_fewshot_seeds** (int) -- Number of random seeds used for few-shot example sampling.
- If <= 1: Single evaluation with seed=0
- If > 1: Multiple evaluations with different few-shot samplings (HELM-style)
- **max_samples** (int, optional) -- Maximum number of samples to evaluate per task.
Only used for debugging - truncates each task's dataset.
- **job_id** (int, optional) -- Slurm job ID if running on a cluster.
Used to cross-reference with scheduler logs.
- **start_time** (float) -- Unix timestamp when evaluation started.
Automatically set during logger initialization.
- **end_time** (float) -- Unix timestamp when evaluation completed.
Set by calling log_end_time().
- **total_evaluation_time_secondes** (str) -- Total runtime in seconds.
Calculated as end_time - start_time.
- **model_config** (ModelConfig) -- Complete model configuration settings.
Contains model architecture, tokenizer, and generation parameters.
- **model_name** (str) -- Name identifier for the evaluated model.
Extracted from model_config.</paramsdesc><paramgroups>0</paramgroups></docstring>
Tracks general configuration and runtime information for model evaluations.
This logger captures key configuration parameters, model details, and timing information
to ensure reproducibility and provide insights into the evaluation process.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>log_args_info</name><anchor>lighteval.logging.info_loggers.GeneralConfigLogger.log_args_info</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L106</source><parameters>[{"name": "num_fewshot_seeds", "val": ": int"}, {"name": "max_samples", "val": ": int | None"}, {"name": "job_id", "val": ": str"}]</parameters><paramsdesc>- **num_fewshot_seeds** (int) -- number of few-shot seeds.
- **max_samples** (int | None) -- maximum number of samples, if None, use all the samples available.
- **job_id** (str) -- job ID, used to retrieve logs.</paramsdesc><paramgroups>0</paramgroups></docstring>
Logs the information about the arguments passed to the method.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>log_model_info</name><anchor>lighteval.logging.info_loggers.GeneralConfigLogger.log_model_info</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L123</source><parameters>[{"name": "model_config", "val": ": ModelConfig"}]</parameters><paramsdesc>- **model_config** -- the model config used to initialize the model.</paramsdesc><paramgroups>0</paramgroups></docstring>
Logs the model information.
</div></div>
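The timing attributes documented for this logger follow a simple pattern: `start_time` is recorded at initialization, `end_time` when `log_end_time()` is called, and the total is their difference. A minimal sketch of that bookkeeping in plain Python (the `TimedRun` class is illustrative, not lighteval's own):

```python
import time

class TimedRun:
    """Illustrative sketch of the start/end timestamp bookkeeping above."""

    def __init__(self):
        # Recorded automatically at initialization, like start_time.
        self.start_time = time.time()
        self.end_time = None

    def log_end_time(self):
        # Mirrors how end_time is set by calling log_end_time().
        self.end_time = time.time()
        # Total runtime is the difference of the two timestamps,
        # stored as a string like total_evaluation_time_secondes.
        self.total_evaluation_time_secondes = str(self.end_time - self.start_time)

run = TimedRun()
run.log_end_time()
print(run.total_evaluation_time_secondes)
```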
## DetailsLogger[[lighteval.logging.info_loggers.DetailsLogger]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.logging.info_loggers.DetailsLogger</name><anchor>lighteval.logging.info_loggers.DetailsLogger</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L138</source><parameters>[{"name": "hashes", "val": ": dict = <factory>"}, {"name": "compiled_hashes", "val": ": dict = <factory>"}, {"name": "details", "val": ": dict = <factory>"}, {"name": "compiled_details", "val": ": dict = <factory>"}, {"name": "compiled_details_over_all_tasks", "val": ": DetailsLogger.CompiledDetailOverAllTasks = <factory>"}]</parameters><paramsdesc>- **hashes** (dict[str, list[`Hash`]]) -- Maps each task name to the list of all its samples' `Hash`.
- **compiled_hashes** (dict[str, `CompiledHash`]) -- Maps each task name to its `CompiledHash`, an aggregation of all the individual sample hashes.
- **details** (dict[str, list[`Detail`]]) -- Maps each task name to the list of its samples' details.
Example: winogrande: [sample1_details, sample2_details, ...]
- **compiled_details** (dict[str, `CompiledDetail`]) -- Maps each task name to its samples' compiled details.
- **compiled_details_over_all_tasks** (CompiledDetailOverAllTasks) -- Aggregated details over all the tasks.</paramsdesc><paramgroups>0</paramgroups></docstring>
Logger for the experiment details.
Stores and logs experiment information both at the task and at the sample level.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>aggregate</name><anchor>lighteval.logging.info_loggers.DetailsLogger.aggregate</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L277</source><parameters>[]</parameters></docstring>
Hashes the details for each task and then for all tasks.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>log</name><anchor>lighteval.logging.info_loggers.DetailsLogger.log</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L253</source><parameters>[{"name": "task_name", "val": ": str"}, {"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "metrics", "val": ": dict"}]</parameters><paramsdesc>- **task_name** (str) -- Name of the current task of interest.
- **doc** (Doc) -- Current sample that we want to store.
- **model_response** (ModelResponse) -- Model outputs for the current sample.
- **metrics** (dict) -- Model scores for said sample on the current task's metrics.</paramsdesc><paramgroups>0</paramgroups></docstring>
Stores the relevant information for one sample of one task to the total list of samples stored in the DetailsLogger.
</div></div>
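Conceptually, `log` appends one detail record per sample under its task name, matching the `details` mapping described above (e.g. `winogrande: [sample1_details, sample2_details, ...]`). A minimal sketch of that accumulation with a plain dictionary (the record fields and sample values are illustrative, not lighteval's `Detail` structure):

```python
from collections import defaultdict

# Maps each task name to the list of its samples' details,
# mirroring the `details` attribute documented above.
details: dict[str, list[dict]] = defaultdict(list)

def log_detail(task_name: str, doc: str, prediction: str, metrics: dict) -> None:
    # Store the relevant information for one sample of one task.
    details[task_name].append(
        {"doc": doc, "prediction": prediction, "metrics": metrics}
    )

log_detail("winogrande", doc="sample 1", prediction="trophy", metrics={"acc": 1.0})
log_detail("winogrande", doc="sample 2", prediction="suitcase", metrics={"acc": 0.0})
print(len(details["winogrande"]))  # → 2
```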
## MetricsLogger[[lighteval.logging.info_loggers.MetricsLogger]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.logging.info_loggers.MetricsLogger</name><anchor>lighteval.logging.info_loggers.MetricsLogger</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L309</source><parameters>[{"name": "metrics_values", "val": ": dict = <factory>"}, {"name": "metric_aggregated", "val": ": dict = <factory>"}]</parameters><paramsdesc>- **metrics_values** (dict[str, dict[str, list[float]]]) -- Maps each task to its dictionary of metrics to scores for all the examples of the task.
Example: {"winogrande|winogrande_xl": {"accuracy": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]}}
- **metric_aggregated** (dict[str, dict[str, float]]) -- Maps each task to its dictionary of metrics to aggregated scores over all the examples of the task.
Example: {"winogrande|winogrande_xl": {"accuracy": 0.5}}</paramsdesc><paramgroups>0</paramgroups></docstring>
Logs the actual scores for each metric of each task.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>aggregate</name><anchor>lighteval.logging.info_loggers.MetricsLogger.aggregate</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L330</source><parameters>[{"name": "task_dict", "val": ": dict"}, {"name": "bootstrap_iters", "val": ": int = 1000"}]</parameters><paramsdesc>- **task_dict** (dict[str, LightevalTask]) -- used to determine what aggregation function to use for each metric.
- **bootstrap_iters** (int, optional) -- Number of runs used to run the statistical bootstrap. Defaults to 1000.</paramsdesc><paramgroups>0</paramgroups></docstring>
Aggregate the metrics for each task and then for all tasks.
</div></div>
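The `aggregate` step reduces each task's per-example score lists (the `metrics_values` shape documented above) to one number per metric, and `bootstrap_iters` controls how many resamples are used when a standard error is estimated by statistical bootstrap. A minimal sketch using a mean aggregation and a resampling loop (the function names are illustrative; lighteval's actual aggregation functions come from each task's metric definitions):

```python
import random
from statistics import mean

# Per-example scores, in the shape documented for `metrics_values`.
metrics_values = {"winogrande|winogrande_xl": {"accuracy": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]}}

def bootstrap_stderr(scores, iters=1000, seed=0):
    # Re-estimate the mean on `iters` resamples drawn with replacement,
    # then report the spread of those estimates as a standard error.
    rng = random.Random(seed)
    estimates = [mean(rng.choices(scores, k=len(scores))) for _ in range(iters)]
    mu = mean(estimates)
    return (sum((e - mu) ** 2 for e in estimates) / (iters - 1)) ** 0.5

# Aggregate each metric's score list with its aggregation function (mean here),
# producing the `metric_aggregated` shape: accuracy aggregates to 0.5.
metric_aggregated = {
    task: {name: mean(scores) for name, scores in metrics.items()}
    for task, metrics in metrics_values.items()
}
print(metric_aggregated)
```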
## VersionsLogger[[lighteval.logging.info_loggers.VersionsLogger]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.logging.info_loggers.VersionsLogger</name><anchor>lighteval.logging.info_loggers.VersionsLogger</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L406</source><parameters>[{"name": "versions", "val": ": dict = <factory>"}]</parameters><paramsdesc>- **versions** (dict[str, int]) -- Maps each task name to its task version.</paramsdesc><paramgroups>0</paramgroups></docstring>
Logger of the tasks versions.
Tasks can have a version number/date, which indicates the precise metric definition and dataset version used for an evaluation.
</div>
## TaskConfigLogger[[lighteval.logging.info_loggers.TaskConfigLogger]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.logging.info_loggers.TaskConfigLogger</name><anchor>lighteval.logging.info_loggers.TaskConfigLogger</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/info_loggers.py#L425</source><parameters>[{"name": "tasks_configs", "val": ": dict = <factory>"}]</parameters><paramsdesc>- **tasks_configs** (dict[str, LightevalTaskConfig]) -- Maps each task to its associated `LightevalTaskConfig`</paramsdesc><paramgroups>0</paramgroups></docstring>
Logs the different parameters of the current `LightevalTask` of interest.
</div>
<EditOnGithub source="https://github.com/huggingface/lighteval/blob/main/docs/source/package_reference/logging.mdx" />