Evaluation of Large Language Models with NeMo Framework 2.0
===========================================================
This directory contains Jupyter notebook tutorials that use the NeMo Framework to evaluate large language models (LLMs):
1. **mmlu.ipynb**
- Provides an overview of model deployment and available endpoints.
- Demonstrates how to run MMLU evaluations for both completions and chat endpoints to assess model proficiency across diverse subjects.
2. **simple-evals.ipynb**
- Shows how to enable additional evaluation frameworks with the evaluation suite.
- Uses NVIDIA Evals Factory Simple-Evals to demonstrate how to run evaluations for the HumanEval benchmark.
3. **wikitext.ipynb**
- Illustrates running evaluation tasks without predefined configurations.
- Uses the WikiText benchmark as an example to define and execute a custom evaluation job.