---
language:
- nb
- nn
- sv
- da
- 'no'
license: apache-2.0
---
## SLIDE-fast

This is an updated version of the fast multilabel Scandinavian language identification model described in our [paper](https://aclanthology.org/2025.resourceful-1.33/).
The updated version can distinguish Nynorsk from Icelandic/Faroese, reaching a Strict Accuracy of **93.6** on our test dataset and **94.9** on the dataset of [Haas and Derczynski, 2021](https://aclanthology.org/2021.vardial-1.8/).
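
For readers unfamiliar with multi-label scoring, here is a minimal sketch of how an exact-match metric can be computed. It assumes that Strict Accuracy counts a sentence as correct only when the predicted label set equals the gold label set exactly; that reading is an assumption, so please check the paper for the authoritative definition. The function name `strict_accuracy` and the toy data are hypothetical and are not part of the SLIDE code base.

```python
def strict_accuracy(gold, predicted):
    """Share of sentences whose predicted label set equals the gold set exactly.

    Assumption: "strict" means exact set match; see the paper for the
    definition actually used in the reported scores.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    hits = sum(set(g) == set(p) for g, p in zip(gold, predicted))
    return hits / len(gold)


# Toy data (hypothetical): the second sentence is valid in both Bokmål and
# Danish, but the prediction only contains Bokmål, so it counts as an error.
gold = [{"nb"}, {"nb", "da"}, {"nn"}, {"sv"}]
pred = [{"nb"}, {"nb"}, {"nn"}, {"sv"}]
print(strict_accuracy(gold, pred))  # 0.75
```

Under this reading, a partially correct prediction earns no credit, which is why a multi-label model is needed to score well on sentences that are valid in more than one language.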

## Example usage

```commandline
git clone git@github.com:ltgoslo/slide.git
cd slide/src/
python3 fast_usage_example.py
```

## Cite us
```
@inproceedings{fedorova-etal-2025-multi,
    title = "Multi-label {S}candinavian Language Identification ({SLIDE})",
    author = "Fedorova, Mariia  and
      Frydenberg, Jonas Sebulon  and
      Handford, Victoria  and
      Lang{\o}, Victoria Ovedie Chruickshank  and
      Willoch, Solveig Helene  and
      Midtgaard, Marthe L{\o}ken  and
      Scherrer, Yves  and
      M{\ae}hlum, Petter  and
      Samuel, David",
    editor = "Holdt, {\v{S}}pela Arhar  and
      Ilinykh, Nikolai  and
      Scalvini, Barbara  and
      Bruton, Micaella  and
      Debess, Iben Nyholm  and
      Tudor, Crina Madalina",
    booktitle = "Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)",
    month = mar,
    year = "2025",
    address = "Tallinn, Estonia",
    publisher = "University of Tartu Library, Estonia",
    url = "https://aclanthology.org/2025.resourceful-1.33/",
    pages = "179--189",
    ISBN = "978-9908-53-121-2",
    abstract = "Identifying closely related languages at sentence level is difficult, in particular because it is often impossible to assign a sentence to a single language. In this paper, we focus on multi-label sentence-level Scandinavian language identification (LID) for Danish, Norwegian Bokm{\r{a}}l, Norwegian Nynorsk, and Swedish. We present the Scandinavian Language Identification and Evaluation, SLIDE, a manually curated multi-label evaluation dataset and a suite of LID models with varying speed{--}accuracy tradeoffs. We demonstrate that the ability to identify multiple languages simultaneously is necessary for any accurate LID method, and present a novel approach to training such multi-label LID models."
}
```