Respair's picture
Upload folder using huggingface_hub
b386992 verified
Evaluation of Large Language Models with the NeMo 2.0
=====================================================
This directory contains Jupyter Notebook tutorials using the NeMo Framework for evaluating large language models (LLMs):
1. **mmlu.ipynb**
- Provides an overview of model deployment and available endpoints.
- Demonstrates how to run MMLU evaluations for both completions and chat endpoints to assess model proficiency across diverse subjects.
2. **simple-evals.ipynb**
- Shows how to enable additional evaluation frameworks with the evaluation suite.
- Uses NVIDIA Evals Factory Simple-Evals to demonstrate how to run evaluations for the HumanEval benchmark.
3. **wikitext.ipynb**
- Illustrates running evaluation tasks without predefined configurations.
- Uses the WikiText benchmark as an example to define and execute a custom evaluation job.