---
license: unlicense
---
# OkayLID
OkayLID is a fastText language identification model that is only 3 megabytes in size, meant for basic language detection. Despite its small footprint, it can detect more than 200 languages. OkayLID was trained on a smaller subset of the OpenLID dataset.

## Installation

```bash
pip install fasttext huggingface_hub
```

## Usage

```python
import numpy as np
import fasttext
from huggingface_hub import hf_hub_download


def setup_environment():
    """Patch np.array so fastText's predict() works on NumPy 2.x.

    fastText's Python bindings call np.array(..., copy=False), which raises
    a ValueError on NumPy 2.x whenever a copy is required. Routing those
    calls through np.asarray restores the old "copy only if needed" behavior.
    """
    original_array = np.array

    def fixed_array(obj, *args, **kwargs):
        if kwargs.get("copy") is False:
            return np.asarray(obj)
        return original_array(obj, *args, **kwargs)

    np.array = fixed_array


setup_environment()

# Download the model file from the Hugging Face Hub (cached after the first call).
model_path = hf_hub_download(repo_id="Cutecat6152/OkayLID", filename="OkayLID.bin")
model = fasttext.load_model(model_path)

text = "The quick brown fox jumps over the lazy dog."
# predict() returns the top-k labels and their probabilities.
labels, probs = model.predict(text, k=1)

print(f"Language: {labels[0].replace('__label__', '')}")
```
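
For anything beyond a single one-line string, a small wrapper around `model.predict` can help. The sketch below is not part of OkayLID: the helper names `prepare_text`, `format_predictions`, and `top_languages` are hypothetical. It assumes the `model` object loaded in the usage example above. It covers two things: fastText's `predict` rejects input containing newline characters, so the text is flattened first, and the `__label__` prefix is stripped from every returned label.

```python
LABEL_PREFIX = "__label__"


def prepare_text(text):
    """Collapse all whitespace, including newlines, into single spaces."""
    return " ".join(text.split())


def format_predictions(labels, probs):
    """Pair each label (prefix removed) with its probability as a float."""
    results = []
    for label, prob in zip(labels, probs):
        if label.startswith(LABEL_PREFIX):
            label = label[len(LABEL_PREFIX):]
        results.append((label, float(prob)))
    return results


def top_languages(model, text, k=3):
    """Return the k most likely languages for `text` as (label, probability) pairs.

    `model` is assumed to be a loaded fastText model, or any object whose
    predict(text, k=...) returns a (labels, probabilities) pair.
    """
    labels, probs = model.predict(prepare_text(text), k=k)
    return format_predictions(labels, probs)
```

With the model from the usage example loaded, `top_languages(model, "Bonjour tout le monde.\nComment ça va ?", k=3)` would return up to three candidate labels with their probabilities. The exact label strings depend on how the model was trained.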