cis-lmu/glotlid-corpus
How to use paruwka/LiteLID with fastText:
from huggingface_hub import hf_hub_download
import fasttext
model = fasttext.load_model(hf_hub_download("paruwka/LiteLID", "model.bin"))

This model is a reproduction of GlotLID for 125 languages written in the Latin (Latn) script. It was trained on the original GlotLID-C dataset for these languages, enriched with 1 million word-level examples per language. The word-level examples were obtained by splitting sentences from the dataset. It was also trained with a larger hash table than GlotLID (2e6 buckets instead of 1e6).

import fasttext
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(repo_id="paruwka/LiteLID", filename="wordlid_v3.ftz", cache_dir=None)
model = fasttext.load_model(model_path)
# Returns a tuple: (list of lists of top-k language labels, list of lists of their respective probabilities)
model.predict(['predicting', 'language'], k=3)
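
The two parallel lists returned by `model.predict` can be awkward to read side by side. The following is a minimal sketch of one way to pair each label with its probability. The helper name `pair_predictions` is hypothetical, and the sample labels and probabilities are made up for illustration (assuming GlotLID-style labels such as `__label__eng_Latn`); they are not real model output.

```python
def pair_predictions(labels, probs, prefix="__label__"):
    """Pair fastText batch output into per-input lists of (label, probability)."""
    results = []
    for row_labels, row_probs in zip(labels, probs):
        row = []
        for label, prob in zip(row_labels, row_probs):
            # fastText prepends "__label__" to class names by default; strip it if present
            name = label[len(prefix):] if label.startswith(prefix) else label
            row.append((name, float(prob)))
        results.append(row)
    return results


# Illustrative data shaped like the output of model.predict([...], k=2).
# These values are invented for the example, not produced by the model.
sample_labels = [
    ["__label__eng_Latn", "__label__sco_Latn"],
    ["__label__eng_Latn", "__label__fra_Latn"],
]
sample_probs = [[0.9, 0.05], [0.8, 0.1]]

pairs = pair_predictions(sample_labels, sample_probs)
for word_pairs in pairs:
    print(word_pairs)
```

With a loaded model, the same helper could be called as `pair_predictions(*model.predict(['predicting', 'language'], k=3))`.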
Training hyperparameters: lr=0.8, epochs=1, dim=256, minn=2, maxn=5, bucket=2000000, loss='softmax'
...
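
For readers who want to try a comparable configuration, the listed hyperparameters might map onto fastText's Python API roughly as sketched below. This is an assumption-laden sketch, not the author's training script: the function name `train_lid` and its file path arguments are hypothetical, settings elided in the list above are left at fastText defaults, and `epochs` is written as `epoch` because that is the keyword `fasttext.train_supervised` accepts.

```python
# Hyperparameters as listed in the model card; "epochs" is spelled `epoch`
# in fastText's Python API. Anything not listed stays at fastText's default.
TRAIN_PARAMS = {
    "lr": 0.8,
    "epoch": 1,
    "dim": 256,
    "minn": 2,
    "maxn": 5,
    "bucket": 2_000_000,
    "loss": "softmax",
}


def train_lid(train_file, output_path):
    """Train a supervised fastText classifier on `train_file` and save it.

    `train_file` is assumed to be in fastText's supervised format:
    one example per line, prefixed with `__label__<class>`.
    """
    # Imported lazily so the config can be inspected without fastText installed
    import fasttext

    model = fasttext.train_supervised(input=train_file, **TRAIN_PARAMS)
    model.save_model(output_path)
    return model
```

Keeping the settings in a single dictionary makes it easy to compare them against another run's configuration.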
Base model
cis-lmu/glotlid