How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="whitefoxredhell/language_identification")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("whitefoxredhell/language_identification")
model = AutoModelForSeq2SeqLM.from_pretrained("whitefoxredhell/language_identification")

language identification mt0

This model is a fine-tuned version of the encoder from bigscience/mt0-small, trained on the Language Identification dataset and some private data.

Limitations

Currently, it supports the following 24 languages:

arabic (ar), bulgarian (bg), german (de), modern greek (el), english (en), spanish (es), french (fr), hindi (hi), italian (it), kyrgyz (ky), uzbek (uz), persian (fa), lithuanian (lt), japanese (ja), dutch (nl), polish (pl), portuguese (pt), russian (ru), swahili (sw), thai (th), turkish (tr), urdu (ur), vietnamese (vi), and chinese (zh)
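Since the model returns language codes rather than names, a small lookup table can make predictions easier to read and catch unexpected labels. The sketch below copies the codes and names from the list above; `SUPPORTED_LANGUAGES` and `language_name` are hypothetical helper names for illustration, not part of the model or any library.

```python
# Codes and names copied from the list of supported languages above.
# SUPPORTED_LANGUAGES and language_name are illustrative names only.
SUPPORTED_LANGUAGES = {
    "ar": "arabic",
    "bg": "bulgarian",
    "de": "german",
    "el": "modern greek",
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "hi": "hindi",
    "it": "italian",
    "ky": "kyrgyz",
    "uz": "uzbek",
    "fa": "persian",
    "lt": "lithuanian",
    "ja": "japanese",
    "nl": "dutch",
    "pl": "polish",
    "pt": "portuguese",
    "ru": "russian",
    "sw": "swahili",
    "th": "thai",
    "tr": "turkish",
    "ur": "urdu",
    "vi": "vietnamese",
    "zh": "chinese",
}


def language_name(code: str) -> str:
    """Return the language name for a code, or raise if it is not supported."""
    try:
        return SUPPORTED_LANGUAGES[code]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None


print(language_name("ky"))
```

Raising on an unknown code, rather than returning a default, makes it obvious when a label falls outside the supported set.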

Inference

First, install the required library:

pip install bert-for-sequence-classification

from bert_clf import EncoderCLF
import torch

model = EncoderCLF("whitefoxredhell/language_identification")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model = model.eval()

text = "London is the capital of Great Britain"

model.predict(text)
# 'en'

model.predict_proba(text)
# {
#   'fr': 3.022890814463608e-05,
#   'zh': 2.328997834410984e-05,
#   'fa': 5.344639430404641e-05,
#   'ky': 3.5296812711749226e-05,
#   'ru': 2.3277720174519345e-05,
#   'lt': 0.00021786204888485372,
#   'uz': 3.461417873040773e-05,
#   'en': 0.999232292175293,
#   'pt': 1.2590448022820055e-05,
#   'bg': 1.5775613064761274e-05,
#   'th': 9.429674719285686e-06,
#   'pl': 2.4624938305350952e-05,
#   'ur': 3.982995986007154e-05,
#   'sw': 4.8921840061666444e-05,
#   'tr': 2.6844283638638444e-05,
#   'es': 2.325668538105674e-05,
#   'ar': 2.4103366740746424e-05,
#   'it': 1.8611381165101193e-05,
#   'hi': 1.4575023669749498e-05,
#   'de': 2.210299498983659e-05,
#   'el': 1.3880739061278291e-05,
#   'nl': 2.767637124634348e-05,
#   'vi': 1.3878144272894133e-05,
#   'ja': 1.3629408385895658e-05
# }
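
When the input is short or mixed-language, it can help to look at the few most likely languages rather than only the top one. The following is a minimal sketch, assuming `predict_proba` returns a plain dict mapping language codes to floats, as the output above suggests. `top_k_languages` is a hypothetical helper, and `example_probs` holds a few values copied from that output so the snippet runs without loading the model.

```python
from typing import Dict, List, Tuple


def top_k_languages(
    probs: Dict[str, float], k: int = 3, min_prob: float = 0.0
) -> List[Tuple[str, float]]:
    """Return up to k (code, probability) pairs, most likely first.

    Pairs whose probability is below min_prob are dropped.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [(code, p) for code, p in ranked[:k] if p >= min_prob]


# A few entries copied from the predict_proba output above (not the full dict).
example_probs = {
    "fr": 3.022890814463608e-05,
    "fa": 5.344639430404641e-05,
    "lt": 0.00021786204888485372,
    "en": 0.999232292175293,
}

for code, prob in top_k_languages(example_probs, k=2):
    print(code, prob)
```

Setting `min_prob` (for example to 0.5) gives a simple way to treat low-confidence predictions as "unknown" instead of forcing a label.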