TITLE = '<h1 align="center" id="space-title">Open Multilingual LLM Evaluation Leaderboard (Dutch only)</h1>'
INTRO_TEXT = f"""
## About
This is a fork of the [Open Multilingual LLM Evaluation Leaderboard](https://huggingface.co/spaces/uonlp/open_multilingual_llm_leaderboard), restricted to Dutch models and augmented with additional model results.
We test the models on the **Dutch versions only** of the following benchmarks, which were automatically translated into Dutch by the original authors of the Open Multilingual LLM Evaluation Leaderboard using `gpt-35-turbo`:
- <a href="https://arxiv.org/abs/1803.05457" target="_blank"> AI2 Reasoning Challenge </a> (25-shot)
- <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (10-shot)
- <a href="https://arxiv.org/abs/2009.03300" target="_blank"> MMLU </a> (5-shot)
- <a href="https://arxiv.org/abs/2109.07958" target="_blank"> TruthfulQA </a> (0-shot)
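As a purely illustrative sketch (not the harness's exact prompt format), the n-shot settings above mean that n solved demonstrations are prepended to each test question; 0-shot means the model sees only the question:

```python
# Hypothetical sketch of n-shot prompt construction; the real harness
# uses its own per-task templates.
def build_fewshot_prompt(demos, question, n):
    # demos: list of (question, answer) pairs used as in-context examples
    nl = chr(10)  # newline character
    parts = []
    for q, a in demos[:n]:
        parts.append("Question: " + q + nl + "Answer: " + a)
    # the test question comes last, with the answer left for the model
    parts.append("Question: " + question + nl + "Answer:")
    return (nl + nl).join(parts)
```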
I do not maintain these datasets; I only run the benchmarks and add the results to this space. For questions about the test sets, or about running them yourself, see [the original GitHub repository](https://github.com/laiviet/lm-evaluation-harness).
All models are benchmarked in 8-bit precision.
"""
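A minimal sketch of what 8-bit loading looks like with Hugging Face `transformers` and `bitsandbytes` (the model id is a placeholder, and the leaderboard's exact evaluation setup may differ):

```python
# Illustrative only: load a causal LM quantized to 8 bits.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",          # placeholder, not a real checkpoint
    device_map="auto",
    quantization_config=quant_config,
)
```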
CREDIT = f"""
## Credit
This leaderboard borrows heavily from the following sources:
- Datasets (AI2_ARC, HellaSwag, MMLU, TruthfulQA)
- Evaluation code (EleutherAI's lm_evaluation_harness repo)
- Leaderboard code (HuggingFaceH4's open_llm_leaderboard repo)
- The multilingual version of the leaderboard (uonlp's open_multilingual_llm_leaderboard repo)
"""
CITATION = f"""
## Citation
If you use or cite the Dutch benchmark results or this specific leaderboard page, please cite the following paper:
TBD
If you use the multilingual benchmarks, please cite the following paper:
```bibtex
@misc{{lai2023openllmbenchmark,
    author = {{Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
    title = {{Open Multilingual LLM Evaluation Leaderboard}},
    year = {{2023}}
}}
```
"""