Update README.md
README.md (changed)
@@ -85,7 +85,7 @@ language:
 ---
 
 
-# Polyglot Tagger: 60L
+# Polyglot Tagger: 60L (Experimental)
 
 This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base).
 It achieves the following results on the evaluation set:
@@ -97,8 +97,8 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-Introducing Polyglot Tagger 60L, a new way to classify multi-lingual documents. By training specifically on token classification on individual sentences, the model
-on a variety of languages, while also behaves as a multi-label classifier, and extracts sentences based on its language.
+Introducing Polyglot Tagger 60L, a new way to classify multi-lingual documents. By training specifically on token classification on individual sentences, the model
+generalizes well on a variety of languages, while also behaves as a multi-label classifier, and extracts sentences based on its language.
 
 ## Intended uses & limitations
 This model can be treated as a base model for further fine-tuning on specific language identification extraction tasks.
@@ -171,6 +171,8 @@ Top token languages:
 ko 3958
 
 ## Evaluation
+> Please note that these results are not indicative that token classification can substitute for sequence classification.
+
 ### The model scored the following on `papulca/language-identification`'s test set
 |Language | Correct | Total | Accuracy |
 |-------------|----------|-------------|--------|
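The model description says that token classification on individual sentences lets the model behave as a multi-label document classifier and extract sentences by language. The snippet below is a minimal sketch of one way such post-processing could work. It assumes per-token language labels are already available for each sentence. The majority-vote rule and the helper names (`sentence_language`, `document_languages`, `extract_sentences`) are hypothetical and are not taken from the model card or any library API.

```python
from collections import Counter

# Assumed input shape (hypothetical): each sentence is a list of
# (token, language_label) pairs, as a token-classification model might emit.
Sentence = list[tuple[str, str]]


def sentence_language(tokens: Sentence) -> str:
    """Majority vote over token labels; ties go to the label seen first."""
    counts = Counter(label for _, label in tokens)
    # Counter.most_common orders equal counts by first occurrence.
    return counts.most_common(1)[0][0]


def document_languages(sentences: list[Sentence]) -> set[str]:
    """Multi-label view: every language that wins at least one sentence."""
    return {sentence_language(s) for s in sentences if s}


def extract_sentences(sentences: list[Sentence], lang: str) -> list[str]:
    """Return the text of non-empty sentences whose majority label is `lang`."""
    return [
        " ".join(tok for tok, _ in s)
        for s in sentences
        if s and sentence_language(s) == lang
    ]


doc = [
    [("Hello", "en"), ("world", "en"), (".", "en")],
    [("Bonjour", "fr"), ("le", "fr"), ("monde", "fr")],
    [("안녕하세요", "ko")],
    [("Good", "en"), ("night", "en")],
]
print(sorted(document_languages(doc)))  # ['en', 'fr', 'ko']
print(extract_sentences(doc, "en"))     # ['Hello world .', 'Good night']
```

Majority voting per sentence is only one plausible aggregation; a real pipeline might instead threshold on the share of tokens per language or use the model's per-token scores.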