Mapping from language to CommonLID languages

#1
by stephantulkens - opened

Hello!

I'd like to know how you mapped the languages in https://huggingface.co/datasets/PleIAs/CommonLingua-Train to the language set in CommonLID. You mention that you:

performed "iso639-lang normalisation, equivalence-class collapsing applied identically"
and "discarded Lingala from our evaluation since most samples from CommonLID turned out to belong to other close languages."

Is there a piece of code that implements these things? I'd like to reproduce your results.
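
To make the question concrete, here is a minimal sketch of what I imagine the two quoted steps could look like. Everything in it is an assumption on my side: the tables `ISO639_1_TO_3`, `EQUIVALENCE_CLASSES` and `DISCARDED`, the function name `normalise`, and the handling of script suffixes are all hypothetical and only illustrate the shape of the mapping, not your actual implementation.

```python
from typing import Optional

# Hypothetical tables for illustration only; they are NOT taken from the
# CommonLingua or CommonLID code and cover just a handful of languages.

# Step 1: ISO 639 normalisation (two-letter ISO 639-1 -> three-letter ISO 639-3).
ISO639_1_TO_3 = {
    "en": "eng", "fr": "fra", "ln": "lin",
    "bs": "bos", "hr": "hrv", "sr": "srp",
}

# Step 2: equivalence-class collapsing (member code -> class representative).
# Bosnian/Croatian/Serbian -> "hbs" is only an example of such a class.
EQUIVALENCE_CLASSES = {"bos": "hbs", "hrv": "hbs", "srp": "hbs"}

# Labels dropped from evaluation; the quoted text mentions Lingala.
DISCARDED = {"lin"}


def normalise(label: str) -> Optional[str]:
    """Map a raw label to an evaluation label, or None if it is discarded."""
    # Assumption: script/region suffixes such as "hrv_Latn" or "sr-Cyrl" are ignored.
    code = label.strip().lower().split("_")[0].split("-")[0]
    code = ISO639_1_TO_3.get(code, code)        # ISO 639 normalisation
    code = EQUIVALENCE_CLASSES.get(code, code)  # equivalence-class collapsing
    return None if code in DISCARDED else code


if __name__ == "__main__":
    for raw in ["en", "hrv_Latn", "sr-Cyrl", "ln"]:
        print(raw, "->", normalise(raw))
```

If the real pipeline applies the same function to both the model predictions and the CommonLID gold labels, that would match my reading of "applied identically", but I'd like to confirm this and get the actual tables.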

Thanks!
