Mapping from language to CommonLID languages
#1
by stephantulkens - opened
Hello!
I'd like to know how you mapped the languages in https://huggingface.co/datasets/PleIAs/CommonLingua-Train to the language set in CommonLID. You mention that you:
performed "iso639-lang normalisation, equivalence-class collapsing applied identically"
and "discarded Lingala from our evaluation since most samples from CommonLID turned out to belong to other close languages."
Is there a piece of code that implements these things? I'd like to reproduce your results.
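For reference, here is a rough sketch of how I currently imagine the pipeline, so you can tell me where it differs from yours. Everything in it is my own guess: the tables `ISO1_TO_ISO3`, `EQUIVALENCE_CLASSES` and `EXCLUDED`, and the functions `normalise` and `accuracy`, are hypothetical placeholders and not taken from your code. In particular, collapsing Bosnian/Croatian/Serbian into `hbs` is only an example of what an equivalence class might look like.

```python
from typing import Optional, Sequence

# Hypothetical tables, for illustration only. The real contents are what I am asking about.
ISO1_TO_ISO3 = {"en": "eng", "fr": "fra", "ln": "lin", "hr": "hrv", "sr": "srp", "bs": "bos"}
EQUIVALENCE_CLASSES = {"hrv": "hbs", "srp": "hbs", "bos": "hbs"}  # assumed example class
EXCLUDED = {"lin"}  # Lingala, discarded from the evaluation


def normalise(label: str) -> Optional[str]:
    """Map a raw label to a canonical code, or None if the language is excluded."""
    # Drop any script/region suffix, e.g. "sr_Cyrl" or "sr-Cyrl" -> "sr".
    code = label.strip().lower().replace("-", "_").split("_")[0]
    # Two-letter codes become three-letter ISO 639-3 codes when known.
    code = ISO1_TO_ISO3.get(code, code)
    # Collapse closely related languages into one class.
    code = EQUIVALENCE_CLASSES.get(code, code)
    return None if code in EXCLUDED else code


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Accuracy with the same normalisation applied to gold labels and predictions."""
    pairs = [(normalise(g), normalise(p)) for g, p in zip(gold, pred)]
    # Samples whose gold label is excluded are skipped entirely.
    kept = [(g, p) for g, p in pairs if g is not None]
    return sum(g == p for g, p in kept) / len(kept) if kept else 0.0
```

If your version handles excluded languages differently, for example by also dropping samples where only the prediction is Lingala, that would be useful to know.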
Thanks!