	# EvaluationTracker[[lighteval.logging.evaluation_tracker.EvaluationTracker]]

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>class lighteval.logging.evaluation_tracker.EvaluationTracker</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L95</source><parameters>[{"name": "output_dir", "val": ": str"}, {"name": "results_path_template", "val": ": str \| None = None"}, {"name": "save_details", "val": ": bool = True"}, {"name": "push_to_hub", "val": ": bool = False"}, {"name": "push_to_tensorboard", "val": ": bool = False"}, {"name": "hub_results_org", "val": ": str \| None = ''"}, {"name": "tensorboard_metric_prefix", "val": ": str = 'eval'"}, {"name": "public", "val": ": bool = False"}, {"name": "nanotron_run_info", "val": ": GeneralArgs = None"}, {"name": "use_wandb", "val": ": bool = False"}]</parameters><paramsdesc>- output_dir (str) -- Local directory to save evaluation results and logs
	- results_path_template (str, optional) -- Template for results directory structure.
	Example: "{output_dir}/results/{org}_{model}"
	- save_details (bool, defaults to True) -- Whether to save detailed evaluation records
	- push_to_hub (bool, defaults to False) -- Whether to push results to HF Hub
	- push_to_tensorboard (bool, defaults to False) -- Whether to push metrics to TensorBoard
	- hub_results_org (str, optional) -- HF Hub organization to push results to
	- tensorboard_metric_prefix (str, defaults to "eval") -- Prefix for TensorBoard metrics
	- public (bool, defaults to False) -- Whether to make Hub datasets public
	- nanotron_run_info (GeneralArgs, optional) -- Nanotron model run information
	- use_wandb (bool, defaults to False) -- Whether to log to Weights & Biases or Trackio if available</paramsdesc><paramgroups>0</paramgroups></docstring>
	Tracks and manages evaluation results, metrics, and logging for model evaluations.

	The EvaluationTracker coordinates multiple specialized loggers to track different aspects of model evaluation:

	- Details Logger (DetailsLogger): Records per-sample evaluation details and predictions
	- Metrics Logger (MetricsLogger): Tracks aggregate evaluation metrics and scores
	- Versions Logger (VersionsLogger): Records task and dataset versions
	- General Config Logger (GeneralConfigLogger): Stores overall evaluation configuration
	- Task Config Logger (TaskConfigLogger): Maintains per-task configuration details

	The tracker can save results locally and optionally push them to:
	- Hugging Face Hub as datasets
	- TensorBoard for visualization
	- Trackio or Weights & Biases for experiment tracking



	<ExampleCodeBlock anchor="lighteval.logging.evaluation_tracker.EvaluationTracker.example">

	Example:
	```python
	from lighteval.logging.evaluation_tracker import EvaluationTracker

	tracker = EvaluationTracker(
	    output_dir="./eval_results",
	    push_to_hub=True,
	    hub_results_org="my-org",
	    save_details=True,
	)

	# Log evaluation results
	tracker.metrics_logger.add_metric("accuracy", 0.85)
	tracker.details_logger.add_detail(task_name="qa", prediction="Paris")

	# Save all results
	tracker.save()
	```

	</ExampleCodeBlock>
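The `results_path_template` example from the parameter list, `"{output_dir}/results/{org}_{model}"`, can be read as an ordinary Python format string. The sketch below shows one plausible way such a template expands. The helper `expand_results_path` is hypothetical, and splitting the model name into `org` and `model` at the last `/` is an assumption, not a description of lighteval's internals.

```python
from pathlib import Path


def expand_results_path(template: str, output_dir: str, model_name: str) -> Path:
    """Hypothetical helper: fill a results-path template with str.format."""
    # Assumption: model names look like "org/model"; a bare name yields an empty org.
    org, _, model = model_name.rpartition("/")
    return Path(template.format(output_dir=output_dir, org=org, model=model))


path = expand_results_path(
    "{output_dir}/results/{org}_{model}",
    output_dir="./eval_results",
    model_name="my-org/my-model",
)
print(path)
```

Placeholders that the template does not use are simply ignored by `str.format`, so a shorter template such as `"{output_dir}/{model}"` works with the same helper.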



	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>generate_final_dict</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.generate_final_dict</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L363</source><parameters>[]</parameters><rettype>dict</rettype><retdesc>Dictionary containing all experiment information including config, results, versions, and summaries</retdesc></docstring>
	Aggregates the experiment information from all loggers and returns it as a dictionary.

	Call this method at the end of an evaluation run to gather and display that information.
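As a rough illustration of the aggregation idea, the sketch below merges the state of two stand-in loggers into one dictionary and serializes it to JSON for display. The stand-in classes, the function `build_final_dict`, and the key names `config_general`, `results`, and `versions` are all assumptions made for this example. They are not lighteval's actual classes or output schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StandInMetricsLogger:
    """Hypothetical stand-in, not lighteval's MetricsLogger."""
    results: dict = field(default_factory=dict)


@dataclass
class StandInVersionsLogger:
    """Hypothetical stand-in, not lighteval's VersionsLogger."""
    versions: dict = field(default_factory=dict)


def build_final_dict(metrics, versions, config):
    # Copy each logger's state under its own top-level key (key names are assumed).
    return {
        "config_general": dict(config),
        "results": dict(metrics.results),
        "versions": dict(versions.versions),
    }


metrics = StandInMetricsLogger(results={"qa": {"accuracy": 0.85}})
versions = StandInVersionsLogger(versions={"qa": 0})
final = build_final_dict(metrics, versions, {"model_name": "my-org/my-model"})

# A JSON dump is a convenient way to display the aggregated dictionary.
report = json.dumps(final, indent=2, sort_keys=True)
print(report)
```

With the real tracker, the equivalent step would be to call `generate_final_dict()` and display or serialize the returned dictionary.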






	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>push_to_hub</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.push_to_hub</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L387</source><parameters>[{"name": "date_id", "val": ": str"}, {"name": "details", "val": ": dict"}, {"name": "results_dict", "val": ": dict"}]</parameters></docstring>
	Pushes the experiment details (all the model predictions for every step) to the Hugging Face Hub.

	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>recreate_metadata_card</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.recreate_metadata_card</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L454</source><parameters>[{"name": "repo_id", "val": ": str"}]</parameters><paramsdesc>- repo_id (str) -- Details dataset repository path on the hub (`org/dataset`)</paramsdesc><paramgroups>0</paramgroups></docstring>
	Fully updates the metadata card of the details repository for the currently evaluated model.
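The `repo_id` argument is expected in `org/dataset` form. A small check before calling the method can catch malformed values early. The helper `split_repo_id` below is hypothetical and is not part of lighteval.

```python
def split_repo_id(repo_id: str) -> tuple[str, str]:
    """Hypothetical helper: split "org/dataset" into its two parts."""
    org, sep, dataset = repo_id.partition("/")
    # Require exactly one slash with non-empty text on both sides.
    if not sep or not org or not dataset or "/" in dataset:
        raise ValueError(f"expected 'org/dataset', got {repo_id!r}")
    return org, dataset


org, dataset = split_repo_id("my-org/details_my-model")
print(org, dataset)
```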




	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>save</name><anchor>lighteval.logging.evaluation_tracker.EvaluationTracker.save</anchor><source>https://github.com/huggingface/lighteval/blob/vr_980/src/lighteval/logging/evaluation_tracker.py#L247</source><parameters>[]</parameters></docstring>
	Saves the experiment information and results to local files, and pushes them to the Hugging Face Hub if requested.

	</div></div>
