DerivedFunction committed on
Commit a427c70 · verified · 1 parent: 5fc6f5c

Update README.md

Files changed (1): README.md (+14, −11)
README.md CHANGED

@@ -127,24 +127,19 @@ datasets:
 pipeline_tag: token-classification
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# Polyglot Tagger: 100 Languages
-
-This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base).
-It achieves the following results on the evaluation set:
-- Loss: 0.0452
-- Precision: 0.8626
-- Recall: 0.8916
-- F1: 0.8769
-- Accuracy: 0.9892
-
+
+![Polyglot Tagger banner](assets/model-card-banner.svg)
+
+Fine-tuned `xlm-roberta-base` for sentence-level language tagging across 100 languages.
+The model predicts BIO-style language tags over tokens, which makes it useful for
+language identification, code-switch detection, and multilingual document analysis.
+
+
 
 ## Model description
 
 Introducing Polyglot Tagger, a new way to classify multi-lingual documents. By training specifically on token classification on individual sentences, the model
 generalizes well on a variety of languages, while also behaves as a multi-label classifier, and extracts sentences based on its language.
 
 ## Intended uses & limitations
 This model can be treated as a base model for further fine-tuning on specific language identification extraction tasks.

@@ -167,6 +162,7 @@ The data composition follows a strategic curriculum:
 
 
 
+
 ## Training procedure
 
 ### Training hyperparameters

@@ -185,6 +181,13 @@ The following hyperparameters were used during training:
 
 ### Training results
 
+It achieves the following results on the evaluation set:
+- Loss: 0.0452
+- Precision: 0.8626
+- Recall: 0.8916
+- F1: 0.8769
+- Accuracy: 0.9892
+
 | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
 |:-------------:|:------:|:-----:|:---------------:|:---------:|:------:|:------:|:--------:|
 | 0.0730 | 0.0905 | 2500 | 0.1081 | 0.7241 | 0.8260 | 0.7717 | 0.9760 |
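The new intro in this commit says the model predicts BIO-style language tags over tokens and is useful for code-switch detection. As a minimal sketch of what downstream use could look like, here is one way to collapse token-level BIO predictions into contiguous language spans. The tag names (`B-en`, `I-fr`, `O`, …) are assumptions for illustration; the model's actual label set is not shown in this diff.

```python
# Sketch: group BIO-style language tags into contiguous (language, text) spans.
# Tag names such as "B-en" / "I-fr" are assumed for illustration only.

def group_language_spans(tokens, tags):
    """Collapse (token, BIO-tag) pairs into (language, text) spans."""
    spans = []
    current_lang, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any language span: flush whatever span is open.
            if current_tokens:
                spans.append((current_lang, " ".join(current_tokens)))
            current_lang, current_tokens = None, []
            continue
        prefix, lang = tag.split("-", 1)
        if prefix == "B" or lang != current_lang:
            # A new span begins: flush the previous one first.
            if current_tokens:
                spans.append((current_lang, " ".join(current_tokens)))
            current_lang, current_tokens = lang, [token]
        else:
            # An "I-" tag continuing the current language span.
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_lang, " ".join(current_tokens)))
    return spans

tokens = ["Hello", "world", "bonjour", "le", "monde"]
tags = ["B-en", "I-en", "B-fr", "I-fr", "I-fr"]
print(group_language_spans(tokens, tags))
# [('en', 'Hello world'), ('fr', 'bonjour le monde')]
```

In practice the tags would come from running the fine-tuned checkpoint in a token-classification pipeline; this helper only illustrates the span-grouping step that turns per-token language labels into code-switch segments.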
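The commit moves the evaluation metrics under "Training results". As a quick sanity check (not part of the card itself), the reported F1 is consistent with the reported precision and recall under the standard harmonic-mean definition:

```python
# Check that the card's reported F1 equals 2PR / (P + R)
# for the reported precision and recall.
precision, recall = 0.8626, 0.8916
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
# 0.8769, matching the value reported in the card
```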