EuroEval Leaderboard
The robust European language model benchmark.
Benchmarks for evaluating Danish models.
Note The de facto benchmark for generative language models (NLG) and for encoders such as BERT, using natural language understanding (NLU) tasks.
Note Evaluates Scandinavian models on natural language generation (NLG) and natural language understanding (NLU). This benchmark was later renamed EuroEval and expanded to cover European languages.
Embedding Leaderboard
Note The de facto leaderboard for evaluating embedding models and search systems such as BM25. It includes benchmarks targeting multiple modalities and languages, notably a benchmark for Scandinavian languages.
Note Evaluates embedding models, e.g. those used for retrieval (such as within retrieval-augmented generation), classification (such as SetFit models), clustering, and more. It evaluates models on a representative set of tasks covering Danish, Swedish, and Norwegian (Bokmål and Nynorsk).