Commit · 1f8179c
Parent(s): 5d688ea
Update README.md
README.md CHANGED
@@ -1,3 +1,87 @@
+# language identification mt0
+
 ---
 license: mit
+language:
+- fr
+- zh
+- fa
+- ky
+- ru
+- lt
+- uz
+- en
+- pt
+- bg
+- th
+- pl
+- ur
+- sw
+- tr
+- es
+- ar
+- it
+- hi
+- de
+- el
+- nl
+- vi
+- ja
+pipeline_tag: text-classification
 ---
+
+This model is a fine-tuned version of the encoder from [bigscience/mt0-small](https://huggingface.co/bigscience/mt0-small), trained on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset and some private data.
+
+## Limitations
+
+Currently, it supports the following 24 languages:
+
+Arabic (ar), Bulgarian (bg), German (de), Modern Greek (el), English (en), Spanish (es), French (fr), Hindi (hi), Italian (it), Kyrgyz (ky), Uzbek (uz), Persian (fa), Lithuanian (lt), Japanese (ja), Dutch (nl), Polish (pl), Portuguese (pt), Russian (ru), Swahili (sw), Thai (th), Turkish (tr), Urdu (ur), Vietnamese (vi), and Chinese (zh)
+
+## Inference
+
+First, install the following library:
+
+```bash
+pip install bert-for-sequence-classification
+```
+
+
+```python
+from bert_clf import EncoderCLF
+
+model = EncoderCLF("whitefoxredhell/language_identification")
+
+text = "London is the capital of Great Britain"
+
+model.predict(text)
+# 'en'
+
+model.predict_proba(text)
+# {
+# 'fr': 3.022890814463608e-05,
+# 'zh': 2.328997834410984e-05,
+# 'fa': 5.344639430404641e-05,
+# 'ky': 3.5296812711749226e-05,
+# 'ru': 2.3277720174519345e-05,
+# 'lt': 0.00021786204888485372,
+# 'uz': 3.461417873040773e-05,
+# 'en': 0.999232292175293,
+# 'pt': 1.2590448022820055e-05,
+# 'bg': 1.5775613064761274e-05,
+# 'th': 9.429674719285686e-06,
+# 'pl': 2.4624938305350952e-05,
+# 'ur': 3.982995986007154e-05,
+# 'sw': 4.8921840061666444e-05,
+# 'tr': 2.6844283638638444e-05,
+# 'es': 2.325668538105674e-05,
+# 'ar': 2.4103366740746424e-05,
+# 'it': 1.8611381165101193e-05,
+# 'hi': 1.4575023669749498e-05,
+# 'de': 2.210299498983659e-05,
+# 'el': 1.3880739061278291e-05,
+# 'nl': 2.767637124634348e-05,
+# 'vi': 1.3878144272894133e-05,
+# 'ja': 1.3629408385895658e-05
+# }
+```
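The `predict_proba` output shown in the README is a plain dictionary mapping language codes to probabilities, so post-processing it needs nothing beyond the standard library. The following is a minimal sketch, assuming that dictionary shape. The helper names `top_k_languages` and `predict_or_unknown`, the 0.5 threshold, and the `und` fallback label are illustrative choices, not part of `bert_clf`.

```python
from typing import Dict, List, Tuple


def top_k_languages(probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (language, probability) pairs, highest first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def predict_or_unknown(
    probs: Dict[str, float], threshold: float = 0.5, unknown: str = "und"
) -> str:
    """Return the most probable language, or `unknown` if it falls below `threshold`.

    The threshold and the fallback label are hypothetical defaults; tune them
    on your own data.
    """
    best_lang, best_prob = max(probs.items(), key=lambda item: item[1])
    return best_lang if best_prob >= threshold else unknown


# A subset of the example output from the README, hard-coded so the sketch
# runs without downloading the model.
sample = {
    "fr": 3.022890814463608e-05,
    "fa": 5.344639430404641e-05,
    "lt": 0.00021786204888485372,
    "en": 0.999232292175293,
}

print(top_k_languages(sample, k=2))
print(predict_or_unknown(sample))
```

A fallback label can be useful for very short or mixed-language inputs, where the highest probability may be low even though `predict` would still return one of the 24 supported codes.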