"""Evaluation framework package.

Loads benchmark datasets, runs the OSS and frontier assistants over them,
judges the outputs, and renders a report comparing the two on hallucination,
bias, and safety.
"""