Token Classification
Transformers
TensorBoard
Safetensors
xlm-roberta
Generated from Trainer
language-identification
codeswitching
Instructions to use DerivedFunction/polyglot-tagger-v2.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DerivedFunction/polyglot-tagger-v2.2 with Transformers (a short inference sketch follows the links below):
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="DerivedFunction/polyglot-tagger-v2.2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
model = AutoModelForTokenClassification.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
```
- Notebooks
- Google Colab
- Kaggle
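As a complement to the snippets above, here is a minimal sketch of per-token inference with the directly loaded model. The sample sentence is an invented code-switched example, and the label names come from the model's own `id2label` config rather than anything documented on this page.

```python
# Minimal inference sketch (assumed sample text; label names depend on
# the model's own id2label config, which is not shown on this page).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "DerivedFunction/polyglot-tagger-v2.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# An invented code-switched English/Spanish sentence.
text = "I went to the mercado to buy vegetables."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    # Print each subword token with its predicted language label.
    print(token, model.config.id2label[pred.item()])
```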
Update README.md
README.md CHANGED:

```diff
@@ -145,7 +145,7 @@ generalizes well on a variety of languages, while also behaves as a multi-label
 
 ## Intended uses & limitations
 This model can be treated as a base model for further fine-tuning on specific language identification extraction tasks.
-Note that as a general language tagging model, it can potentially get confused from shared language families or from short texts. For example,
+Note that as a general language tagging model, it can potentially get confused from shared language families or from short texts. For example, Danish and Norwegian, Spanish and Portuguese, and Russian and Ukrainian.
 
 The model is trained on a sentence with a minimum of four tokens, so it may not accurately classify very short and ambigous statements. Note that this model is experimental
 and may produce unexpected results compared to generic text classifiers. It is trained on cleaned text, therefore, "messy" text may unexpectedly produce different results.
```
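The caveats recorded in this diff (confusable language families, very short inputs) are straightforward to probe. Below is a hypothetical sketch that runs the pipeline over a closely related language pair and a short input; the sentences are invented examples, and the emitted label names depend on the model's own config, so no actual scores are claimed here.

```python
# Hypothetical probe of the caveats above: a closely related language
# pair (Danish vs. Norwegian) and an input below the ~4-token training
# minimum. Sample sentences are invented; labels come from the model's
# own id2label config.
from transformers import pipeline

pipe = pipeline(
    "token-classification",
    model="DerivedFunction/polyglot-tagger-v2.2",
    aggregation_strategy="simple",  # group adjacent tokens with the same label
)

probes = {
    "Danish":    "Jeg kan godt lide at læse bøger om aftenen.",
    "Norwegian": "Jeg liker godt å lese bøker om kvelden.",
    "short":     "Hola amigo",  # shorter than the four-token training minimum
}

for name, text in probes.items():
    spans = pipe(text)
    # Low or split confidence scores on these inputs would illustrate
    # the related-language confusion described in the README.
    print(name, [(s["entity_group"], round(s["score"], 2)) for s in spans])
```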