Evaluation of Large Language Models with the NeMo 2.0
=====================================================

This directory contains Jupyter Notebook tutorials using the NeMo Framework for evaluating large language models (LLMs):

1. **mmlu.ipynb**
   - Provides an overview of model deployment and available endpoints.
   - Demonstrates how to run MMLU evaluations for both completions and chat endpoints to assess model proficiency across diverse subjects.


2. **simple-evals.ipynb**
   - Shows how to enable additional evaluation frameworks with the evaluation suite.
   - Uses NVIDIA Evals Factory Simple-Evals to demonstrate how to run evaluations for the HumanEval benchmark.

3. **wikitext.ipynb**
   - Illustrates running evaluation tasks without predefined configurations.
   - Uses the WikiText benchmark as an example to define and execute a custom evaluation job.