# Evaluation

## Built-in benchmark runner
```python
from src.evaluation.benchmarks import ComprehensiveBenchmarkSuite, BenchmarkConfig

# Build the suite with default settings; pass a customized
# BenchmarkConfig to override them.
suite = ComprehensiveBenchmarkSuite(BenchmarkConfig())

# `model` and `datasets` must already be defined (e.g. a trained model
# and the evaluation splits to benchmark against).
results = suite.run_all_benchmarks(model, datasets)
print(results["summary"])
```

## Practical tips
- Some public benchmarks expect specific data fields; dummy or validation sets that lack those fields will be incompatible.
- Run evaluations periodically: `--eval_frequency 1` during early runs makes it easy to verify that metrics trend in the right direction.
- Log results to W&B when it is available.
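The last tip can be sketched as a small helper that logs to W&B when a run is active and falls back to printing otherwise. This is a hypothetical helper (`log_results` is not part of the repository's API), shown only to illustrate the guarded-logging pattern:

```python
def log_results(summary: dict, step: int) -> bool:
    """Log a benchmark summary to W&B if a run is active; otherwise print.

    Returns True when the summary was sent to W&B, False when it fell
    back to stdout. Hypothetical helper, not part of the repo's API.
    """
    try:
        import wandb
        if wandb.run is not None:
            # An active run exists: log metrics against the given step.
            wandb.log(summary, step=step)
            return True
    except ImportError:
        # wandb is not installed; fall through to the print fallback.
        pass
    print(f"[step {step}] {summary}")
    return False
```

Guarding on both the import and `wandb.run` keeps evaluation scripts runnable in environments without W&B credentials.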