"""Evaluation framework package.

Loads benchmark datasets, runs both assistants over them, judges the outputs,
and renders a report comparing OSS vs. frontier on hallucination, bias, and
safety.
"""