Vedisasi's picture
Upload folder using huggingface_hub
54c5666 verified

Evaluation

Built-in benchmark runner

from src.evaluation.benchmarks import ComprehensiveBenchmarkSuite, BenchmarkConfig

suite = ComprehensiveBenchmarkSuite(BenchmarkConfig())
results = suite.run_all_benchmarks(model, datasets)
print(results["summary"])

Practical tips

  • Some public benchmarks require specific data fields; dummy/val sets may be incompatible.
  • Run evaluations periodically: --eval_frequency 1 during early runs to verify trends.
  • Log results to W&B when available.