KletterMix: Climbing Toward High-Quality German Pretraining Data Paper • 2606.03773 • Published 26 days ago • 21
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data Paper • 2601.18026 • Published Jan 25
UniSkill: A Dataset for Matching University Curricula to Professional Competencies Paper • 2603.03134 • Published Mar 3
WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain Paper • 2604.13055 • Published Mar 17
CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations Paper • 2605.26293 • Published May 25 • 6
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 24
A Dataset for Probing Translationese Preferences in English-to-Swedish Translation Paper • 2603.08450 • Published Mar 9
Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish Paper • 2602.03484 • Published Feb 3
Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese Paper • 2510.00810 • Published Oct 1, 2025
A Diagnostic Benchmark for Sweden-Related Factual Knowledge Paper • 2510.21360 • Published Oct 24, 2025
Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation Paper • 2512.10772 • Published Dec 11, 2025
SkillSpan: Hard and Soft Skill Extraction from English Job Postings Paper • 2204.12811 • Published Apr 27, 2022 • 2
Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning Paper • 2205.01381 • Published May 3, 2022
Dynaword: From One-shot to Continuously Developed Datasets Paper • 2508.02271 • Published Aug 4, 2025 • 15