The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling Paper • 2303.17183 • Published Mar 30, 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages Paper • 2305.12987 • Published May 22, 2023
Why Not Simply Translate? A First Swedish Evaluation Benchmark for Semantic Similarity Paper • 2009.03116 • Published Sep 7, 2020
Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead? Paper • 2104.10441 • Published Apr 21, 2021
SWEb: A Large Web Dataset for the Scandinavian Languages Paper • 2410.04456 • Published Oct 6, 2024
BOE-XSUM: Extreme Summarization in Clear Language of Spanish Legal Decrees and Notifications Paper • 2509.24908 • Published Sep 29, 2025
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective Paper • 2412.09460 • Published Dec 12, 2024
Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges Paper • 2402.01917 • Published Feb 2, 2024