"""Evaluation framework package.

Loads benchmark datasets, runs both assistants over them, judges the outputs,
and renders a report comparing OSS vs. frontier on hallucination, bias, and
safety.
"""