# Evaluation
## Built-in benchmark runner
```python
from src.evaluation.benchmarks import ComprehensiveBenchmarkSuite, BenchmarkConfig

# `model` is the trained model and `datasets` the evaluation datasets.
suite = ComprehensiveBenchmarkSuite(BenchmarkConfig())
results = suite.run_all_benchmarks(model, datasets)
print(results["summary"])
```
## Practical tips
- Some public benchmarks require specific data fields; dummy/val sets may be incompatible.
- Run evaluations frequently during early training (e.g. `--eval_frequency 1`) to verify that metrics trend as expected.
- Log results to W&B when available.
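A minimal sketch of the last tip: log the numeric fields of the benchmark summary to W&B only when the library is importable and a run is active, falling back to stdout otherwise. The `results["summary"]` layout is assumed from the snippet above, and `log_summary` is a hypothetical helper, not part of the suite.

```python
def log_summary(results, step=None):
    """Log flat numeric metrics from results['summary'].

    Uses W&B if it is installed and a run is active; otherwise
    prints the metrics so nothing is silently dropped.
    """
    # Keep only scalar metrics; skip strings, lists, nested dicts.
    metrics = {k: v for k, v in results["summary"].items()
               if isinstance(v, (int, float))}
    try:
        import wandb  # optional dependency
        if wandb.run is not None:  # only log inside an active run
            wandb.log(metrics, step=step)
            return metrics
    except ImportError:
        pass
    print(metrics)  # fallback when W&B is unavailable
    return metrics


log_summary({"summary": {"accuracy": 0.91, "split": "val"}})
```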