EvaluationTracker

class lighteval.logging.evaluation_tracker.EvaluationTracker(output_dir: str, results_path_template: str | None = None, save_details: bool = True, push_to_hub: bool = False, push_to_tensorboard: bool = False, hub_results_org: str | None = '', tensorboard_metric_prefix: str = 'eval', public: bool = False, nanotron_run_info: GeneralArgs = None, use_wandb: bool = False)
(source: https://github.com/huggingface/lighteval/blob/vr_1003/src/lighteval/logging/evaluation_tracker.py#L92)

Parameters:

  • output_dir (str) -- Local directory to save evaluation results and logs

  • results_path_template (str, optional) -- Template for results directory structure. Example: "{output_dir}/results/{org}_{model}"
  • save_details (bool, defaults to True) -- Whether to save detailed evaluation records
  • push_to_hub (bool, defaults to False) -- Whether to push results to HF Hub
  • push_to_tensorboard (bool, defaults to False) -- Whether to push metrics to TensorBoard
  • hub_results_org (str, optional) -- HF Hub organization to push results to
  • tensorboard_metric_prefix (str, defaults to "eval") -- Prefix for TensorBoard metrics
  • public (bool, defaults to False) -- Whether to make Hub datasets public
  • nanotron_run_info (GeneralArgs, optional) -- Nanotron model run information
  • use_wandb (bool, defaults to False) -- Whether to log to Weights & Biases or Trackio if available

Tracks and manages evaluation results, metrics, and logging for model evaluations.
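The "{output_dir}/results/{org}_{model}" example given for results_path_template suggests standard str.format-style placeholders. A minimal sketch of how such a template would expand (the field values and the assumption that plain str.format is used are illustrative, not confirmed by the source):

```python
# Sketch only: assumes the template is expanded with str.format-style
# placeholders, as the "{output_dir}/results/{org}_{model}" example suggests.
template = "{output_dir}/results/{org}_{model}"

# "my-org" and "my-model" are hypothetical values, not lighteval defaults.
path = template.format(output_dir="./eval_results", org="my-org", model="my-model")
print(path)  # ./eval_results/results/my-org_my-model
```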

The EvaluationTracker coordinates multiple specialized loggers to track different aspects of model evaluation:

  • Details Logger (DetailsLogger): Records per-sample evaluation details and predictions
  • Metrics Logger (MetricsLogger): Tracks aggregate evaluation metrics and scores
  • Versions Logger (VersionsLogger): Records task and dataset versions
  • General Config Logger (GeneralConfigLogger): Stores overall evaluation configuration
  • Task Config Logger (TaskConfigLogger): Maintains per-task configuration details

The tracker can save results locally and optionally push them to:

  • Hugging Face Hub as datasets
  • TensorBoard for visualization
  • Trackio or Weights & Biases for experiment tracking

Example:

tracker = EvaluationTracker(
    output_dir="./eval_results",
    push_to_hub=True,
    hub_results_org="my-org",
    save_details=True
)

# Log evaluation results
tracker.metrics_logger.add_metric("accuracy", 0.85)
tracker.details_logger.add_detail(task_name="qa", prediction="Paris")

# Save all results
tracker.save()

generate_final_dict

lighteval.logging.evaluation_tracker.EvaluationTracker.generate_final_dict()
(source: https://github.com/huggingface/lighteval/blob/vr_1003/src/lighteval/logging/evaluation_tracker.py#L360)

Returns: dict -- Dictionary containing all experiment information including config, results, versions, and summaries

Aggregates and returns all the logger's experiment information in a dictionary.

This function should be used to gather and display said information at the end of an evaluation run.
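If the aggregated dictionary's contents are JSON-serializable, it can be written to disk or displayed directly; a sketch of that pattern (the dictionary below is a hypothetical stand-in, the real key names and nesting are defined by lighteval's loggers):

```python
import json

# Hypothetical stand-in for a generate_final_dict() result; the actual
# schema comes from the config, results, versions, and summary loggers.
final_dict = {
    "config_general": {"model_name": "my-org/my-model"},
    "results": {"qa": {"accuracy": 0.85}},
    "versions": {"qa": 0},
}

# Persist the aggregated information at the end of the run.
with open("final_results.json", "w") as f:
    json.dump(final_dict, f, indent=2)
```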

push_to_hub

lighteval.logging.evaluation_tracker.EvaluationTracker.push_to_hub(date_id: str, details: dict, results_dict: dict)
(source: https://github.com/huggingface/lighteval/blob/vr_1003/src/lighteval/logging/evaluation_tracker.py#L384)

Pushes the experiment details (all the model predictions for every step) to the hub.

recreate_metadata_card

lighteval.logging.evaluation_tracker.EvaluationTracker.recreate_metadata_card(repo_id: str)
(source: https://github.com/huggingface/lighteval/blob/vr_1003/src/lighteval/logging/evaluation_tracker.py#L451)

Parameters:

  • repo_id (str) -- Details dataset repository path on the hub (org/dataset)

Fully updates the details repository metadata card for the currently evaluated model.

save

lighteval.logging.evaluation_tracker.EvaluationTracker.save()
(source: https://github.com/huggingface/lighteval/blob/vr_1003/src/lighteval/logging/evaluation_tracker.py#L244)

Saves the experiment information and results to files, and to the hub if requested.
