Update README.md
README.md
CHANGED
|
@@ -1,3 +1,92 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: cc-by-nc-4.0
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: cc-by-nc-4.0
|
| 3 |
+
language:
|
| 4 |
+
- ar
|
| 5 |
+
- az
|
| 6 |
+
- bg
|
| 7 |
+
- de
|
| 8 |
+
- el
|
| 9 |
+
- en
|
| 10 |
+
- es
|
| 11 |
+
- fr
|
| 12 |
+
- hi
|
| 13 |
+
- it
|
| 14 |
+
- ja
|
| 15 |
+
- nl
|
| 16 |
+
- pl
|
| 17 |
+
- pt
|
| 18 |
+
- ru
|
| 19 |
+
- sw
|
| 20 |
+
- th
|
| 21 |
+
- tr
|
| 22 |
+
- ur
|
| 23 |
+
- vi
|
| 24 |
+
- zh
|
| 25 |
+
pipeline_tag: text-classification
|
| 26 |
+
tags:
|
| 27 |
+
- language detect
|
| 28 |
+
---
|
| 29 |
+
|
| 30 |
+
# Multilingual Language Detection Model
|
| 31 |
+
|
| 32 |
+
## Model Description
|
| 33 |
+
This repository contains a multilingual language detection model based on the XLM-RoBERTa base architecture. The model distinguishes between 21 languages: Arabic, Azerbaijani, Bulgarian, German, Greek, English, Spanish, French, Hindi, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Swahili, Thai, Turkish, Urdu, Vietnamese, and Chinese.
|
| 34 |
+
|
| 35 |
+
## How to Use
|
| 36 |
+
You can run this model through the `transformers` text-classification pipeline, or load the tokenizer and model directly for more control, as shown in the example below.
|
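The pipeline route mentioned above can be sketched as follows. This is a minimal sketch, not the authors' reference code: it assumes `transformers` and PyTorch are installed and the model can be downloaded. `detect_language` and `to_language_code` are hypothetical helper names, and the `LABEL_<index>` fallback is an assumption about how the pipeline may name classes if the model config does not carry language codes.

```python
# Label order copied from the Quick Start example in this README.
LABELS = ["az", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
          "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"]


def to_language_code(raw_label, labels=LABELS):
    """Map a pipeline label to a language code.

    Assumption: the label is either already a language code or has the
    generic form "LABEL_<index>".
    """
    if raw_label in labels:
        return raw_label
    prefix = "LABEL_"
    if raw_label.startswith(prefix) and raw_label[len(prefix):].isdigit():
        return labels[int(raw_label[len(prefix):])]
    raise ValueError(f"Unrecognized label: {raw_label!r}")


def detect_language(text, model_id="LocalDoc/language_detection"):
    """Classify one string and return (language_code, score)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    result = classifier(text)[0]  # a dict with "label" and "score" keys
    return to_language_code(result["label"]), result["score"]


# Example call (downloads the model on first use):
# detect_language("Əlqasım oğulları vorzakondu")
```

For repeated calls it would be cheaper to build the pipeline once and reuse it; it is created inside the function here only to keep the sketch short.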
| 37 |
+
|
| 38 |
+
### Quick Start
|
| 39 |
+
First, install the `transformers` library and PyTorch (the example below imports `torch`) if you haven't already:
|
| 40 |
+
```bash
|
| 41 |
+
pip install transformers torch
|
| 42 |
+
```
|
| 43 |
+
```python
|
| 44 |
+
from transformers import AutoModelForSequenceClassification, AutoTokenizer
|
| 45 |
+
import torch
|
| 46 |
+
|
| 47 |
+
# Load tokenizer and model
|
| 48 |
+
tokenizer = AutoTokenizer.from_pretrained("LocalDoc/language_detection")
|
| 49 |
+
model = AutoModelForSequenceClassification.from_pretrained("LocalDoc/language_detection")
|
| 50 |
+
|
| 51 |
+
# Prepare text (inputs longer than 512 tokens are truncated)
|
| 52 |
+
text = "Əlqasım oğulları vorzakondu"
|
| 53 |
+
encoded_input = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
|
| 54 |
+
|
| 55 |
+
# Prediction: switch to inference mode and disable gradient tracking
|
| 56 |
+
model.eval()
|
| 57 |
+
with torch.no_grad():
|
| 58 |
+
    outputs = model(**encoded_input)
|
| 59 |
+
|
| 60 |
+
# Process the outputs: logits -> probabilities -> most likely class index
|
| 61 |
+
logits = outputs.logits
|
| 62 |
+
probabilities = torch.nn.functional.softmax(logits, dim=-1)
|
| 63 |
+
predicted_class_index = probabilities.argmax().item()
|
| 64 |
+
labels = ["az", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja", "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"]
|
| 65 |
+
predicted_label = labels[predicted_class_index]
|
| 66 |
+
print(f"Predicted Language: {predicted_label}")
|
| 67 |
+
```
|
| 68 |
+
|
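The post-processing step above (softmax over the logits, then argmax) can be extended to report the top few candidate languages with their probabilities. The following is a minimal sketch in plain Python, assuming the label order from the example above; `softmax` and `top_k_languages` are hypothetical helper names, and the logits are made up for illustration rather than produced by the model.

```python
import math

# Label order copied from the example above.
LABELS = ["az", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
          "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_languages(logits, labels=LABELS, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits: a strong "az" score and a weaker "tr" score.
example_logits = [0.0] * len(LABELS)
example_logits[LABELS.index("az")] = 8.0
example_logits[LABELS.index("tr")] = 5.0

top = top_k_languages(example_logits, k=2)
for label, prob in top:
    print(f"{label}: {prob:.4f}")
```

With real model output, the input would be the logits of a single example converted to a list, for instance `outputs.logits[0].tolist()`.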
| 69 |
+
## Training Performance
|
| 70 |
+
|
| 71 |
+
The model was trained for three epochs. Validation loss decreased and accuracy and F1 improved at every epoch, while training loss rose slightly in epoch 2 before dropping:
|
| 72 |
+
|
| 73 |
+
- <b>Epoch 1:</b> Training Loss: 0.0127, Validation Loss: 0.0174, Accuracy: 0.9966, F1 Score: 0.9966
|
| 74 |
+
- <b>Epoch 2:</b> Training Loss: 0.0149, Validation Loss: 0.0141, Accuracy: 0.9973, F1 Score: 0.9973
|
| 75 |
+
- <b>Epoch 3:</b> Training Loss: 0.0001, Validation Loss: 0.0109, Accuracy: 0.9984, F1 Score: 0.9984
|
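For readers unfamiliar with the reported metrics, here is a minimal sketch of how accuracy and an F1 score can be computed from true and predicted labels. The README does not say which F1 averaging was used; the support-weighted average below is an assumption, and the labels are toy data, not the model's evaluation set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support.

    Assumption: weighted averaging; the README does not state the method.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * count / total
    return score


# Toy data: one "az" sentence is misclassified as "tr".
y_true = ["az", "az", "tr", "tr", "en", "en"]
y_pred = ["az", "tr", "tr", "tr", "en", "en"]

print(f"accuracy:    {accuracy(y_true, y_pred):.4f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.4f}")
```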
| 76 |
+
|
| 77 |
+
## Test Results
|
| 78 |
+
|
| 79 |
+
The model achieved the following results on the test set:
|
| 80 |
+
|
| 81 |
+
- Loss: 0.0133
|
| 82 |
+
- Accuracy: 0.9975
|
| 83 |
+
- F1 Score: 0.9975
|
| 84 |
+
- Precision: 0.9975
|
| 85 |
+
- Recall: 0.9975
|
| 86 |
+
- Evaluation Time: 17.5 seconds
|
| 87 |
+
- Samples per Second: 599.685
|
| 88 |
+
- Steps per Second: 9.424
|
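The throughput figures above also give a rough idea of the test set size and evaluation batch size, neither of which is stated in this README. The arithmetic below is an estimate only: it assumes the three reported numbers describe the same evaluation run, and the evaluation time is rounded to one decimal place.

```python
# Figures reported in the test results above.
eval_seconds = 17.5
samples_per_second = 599.685
steps_per_second = 9.424

# Samples processed = time * throughput.
approx_samples = eval_seconds * samples_per_second

# Evaluation steps (batches) = time * step rate.
approx_steps = eval_seconds * steps_per_second

# Average samples per step approximates the evaluation batch size.
approx_batch_size = samples_per_second / steps_per_second

print(f"approx. test samples: {approx_samples:.0f}")
print(f"approx. eval steps:   {approx_steps:.0f}")
print(f"approx. batch size:   {approx_batch_size:.1f}")
```

An average just under 64 samples per step would be consistent with a batch size of 64 and a smaller final batch, but that is an inference from the numbers, not a documented setting.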
| 89 |
+
|
| 90 |
+
## Licensing
|
| 91 |
+
|
| 92 |
+
This model is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You are free to use, modify, and distribute this model for non-commercial purposes, provided you give appropriate credit to the original authors.
|