Token Classification
Transformers
TensorBoard
Safetensors
xlm-roberta
Generated from Trainer
language-identification
codeswitching
How to use DerivedFunction/polyglot-tagger-v2.2 with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("token-classification", model="DerivedFunction/polyglot-tagger-v2.2")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
    model = AutoModelForTokenClassification.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
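A token-classification pipeline returns one prediction per token, so a common follow-up step is merging adjacent tokens that share a label into language spans. The sketch below is illustrative only: it runs on hand-written predictions in the raw pipeline output shape (`entity`, `start`, `end`) rather than on real model output, and it assumes the labels are language codes with an optional `B-`/`I-` prefix, which this page does not confirm. The helper names `strip_bio` and `group_language_spans` are hypothetical.

```python
def strip_bio(label):
    # Assumption: labels may carry a BIO prefix such as "B-en" or "I-en".
    return label[2:] if label[:2] in ("B-", "I-") else label


def group_language_spans(text, predictions):
    """Merge consecutive token predictions with the same language into spans."""
    spans = []
    for pred in sorted(predictions, key=lambda p: p["start"]):
        lang = strip_bio(pred["entity"])
        if spans and spans[-1]["lang"] == lang:
            # Same language as the previous token: extend the current span.
            spans[-1]["end"] = pred["end"]
        else:
            spans.append({"lang": lang, "start": pred["start"], "end": pred["end"]})
    return [(s["lang"], text[s["start"]:s["end"]]) for s in spans]


# Hand-written stand-in for pipeline output (not produced by the model).
text = "I love Paris. J'adore Paris."
fake_predictions = [
    {"entity": "B-en", "start": 0, "end": 1},
    {"entity": "I-en", "start": 2, "end": 6},
    {"entity": "I-en", "start": 7, "end": 13},
    {"entity": "B-fr", "start": 14, "end": 21},
    {"entity": "I-fr", "start": 22, "end": 28},
]

language_spans = group_language_spans(text, fake_predictions)
print(language_spans)
```

With a real model, `fake_predictions` would be replaced by the list returned from `pipe(text)`; the actual label names should be checked in the model's `config.id2label`.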
Update README.md
README.md
CHANGED
@@ -168,6 +168,8 @@ To generalize well on both the target language and code switching a circulumn is
 - Homogenous 25%: Single language + one foreign sentence to learn simple code switching
 - Spliced 10%: A foreign sentence is centered between two same-language sentence, with the first sentence's punctuation stripped, and second sentence's forced to be lowercased.
 - Mixed 10%: Generic mix of any languages.
+-
+### Training Data Breakdown
 | lang | train sentences | train tokens | eval sentences | eval tokens | all sentences | all tokens |
 | :--- | ---: | ---: | ---: | ---: | ---: | ---: |
 | en | 342138 (2.14%) | 8515554 (1.58%) | 2925 (3.89%) | 29279 (1.57%) | 345063 (2.14%) | 8544833 (1.58%) |
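The "Spliced" item in the hunk describes a construction that is easy to picture in code: a foreign sentence placed between two same-language sentences, with the first sentence's punctuation stripped and the second lowercased. The following is a minimal sketch of that idea, not the author's data pipeline. It assumes "punctuation stripped" means the trailing ASCII punctuation, that segments are joined with single spaces, and that labels are tracked as character spans; the function name `make_spliced_sample` and its parameters are hypothetical.

```python
import string


def make_spliced_sample(first, foreign, second, base_lang, foreign_lang):
    """Return (text, spans) for a base / foreign / base spliced example."""
    # Assumption: only trailing ASCII punctuation is removed from the first sentence.
    first_clean = first.rstrip(string.punctuation + " ")
    parts = [
        (base_lang, first_clean),
        (foreign_lang, foreign),
        (base_lang, second.lower()),  # second sentence forced to lowercase
    ]
    pieces, spans, pos = [], [], 0
    for lang, piece in parts:
        spans.append((lang, pos, pos + len(piece)))
        pieces.append(piece)
        pos += len(piece) + 1  # account for the single joining space
    return " ".join(pieces), spans


spliced_text, spliced_spans = make_spliced_sample(
    "I went to the market.",
    "Hace mucho calor hoy.",
    "Then I came home.",
    base_lang="en",
    foreign_lang="es",
)
print(spliced_text)
for lang, start, end in spliced_spans:
    print(lang, repr(spliced_text[start:end]))
```

Removing the sentence boundary cues (final punctuation, initial capital) around the foreign sentence presumably forces the tagger to rely on the words themselves rather than on punctuation to find the switch point.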