---
library_name: transformers
license: mit
base_model: xlm-roberta-base
tags:
- generated_from_trainer
- language-identification
metrics:
- precision
- recall
- f1
- accuracy
language:
- multilingual
- af
- am
- ar
- as
- ba
- be
- bg
- bn
- bo
- br
- bs
- ca
- ce
- ckb
- cs
- cy
- da
- de
- dv
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- ga
- gd
- gl
- gu
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- 'no'
- ny
- oc
- om
- or
- pa
- pl
- ps
- pt
- rm
- ro
- ru
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- tg
- th
- ti
- tl
- tr
- tt
- ug
- uk
- ur
- uz
- vi
- yi
- yo
- zh
- zu
model-index:
- name: polyglot-tagger
  results: []
datasets:
- wikimedia/wikipedia
- HuggingFaceFW/finetranslations
- google/smol
- polyglot-tagger/nlp-noise-snippets
- polyglot-tagger/wikipedia-language-snippets-filtered
- polyglot-tagger/finetranslations-filtered
- polyglot-tagger/tatoeba-filtered
pipeline_tag: text-classification
---

# Polyglot Tagger: Multi-label Language Identification

For the token-level variant, refer to `polyglot-tagger/language-identification`. This model is trained on the same dataset, but as a multi-label text classifier rather than as a token classifier.
|
|
This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base).
It achieves the following results on the evaluation set:
- Loss: 0.0123
- Precision: 0.9859
- Recall: 0.9831
- F1: 0.9845
- Accuracy: 0.9412
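
For reference, a minimal inference sketch is below. It assumes the multi-label text-classification setup described above; the repo id `polyglot-tagger/polyglot-tagger`, the sigmoid activation, and the 0.5 decision threshold are illustrative assumptions, not values taken from this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical repo id -- substitute the actual checkpoint name.
model_id = "polyglot-tagger/polyglot-tagger"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Bonjour tout le monde! This sentence mixes French and English."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_languages)

# Multi-label head: an independent sigmoid score per language,
# thresholded instead of argmax-ed, so several languages can fire at once.
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)  # e.g. ['en', 'fr'] (illustrative)
```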
|
|
|
|
|
|
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 18
- total_train_batch_size: 576
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP
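
These values map onto `TrainingArguments` roughly as sketched below; `output_dir` is a placeholder, logging and evaluation cadence are not stated on this card, and the effective batch size of 576 (32 × 18) assumes a single device.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="polyglot-tagger",      # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=18,    # 32 * 18 = 576 effective batch size
    num_train_epochs=2,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",         # AdamW, betas=(0.9, 0.999), eps=1e-08
    seed=42,
    fp16=True,                         # Native AMP mixed precision
)
```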
|
|
### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:------:|:-----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.2186        | 0.2925 | 2500  | 0.0395          | 0.9778    | 0.9528 | 0.9651 | 0.8560   |
| 0.1331        | 0.5851 | 5000  | 0.0232          | 0.9803    | 0.9717 | 0.9760 | 0.9070   |
| 0.1044        | 0.8776 | 7500  | 0.0172          | 0.9828    | 0.9774 | 0.9801 | 0.9218   |
| 0.0851        | 1.1700 | 10000 | 0.0150          | 0.9844    | 0.9801 | 0.9822 | 0.9311   |
| 0.0783        | 1.4626 | 12500 | 0.0136          | 0.9859    | 0.9809 | 0.9834 | 0.9354   |
| 0.0705        | 1.7551 | 15000 | 0.0126          | 0.9861    | 0.9826 | 0.9843 | 0.9399   |
| 0.0692        | 2.0    | 17094 | 0.0123          | 0.9859    | 0.9831 | 0.9845 | 0.9412   |
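
The card does not spell out how the four metrics are defined. One plausible `compute_metrics` for a multi-label setup is sketched below under assumptions (sigmoid outputs, 0.5 threshold, micro averaging): micro-averaged precision/recall/F1 over the label indicator matrix plus exact-match accuracy. Exact-match being the strictest of the four would also be consistent with accuracy (0.9412) trailing F1 (0.9845) throughout the table.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Assumed: sigmoid + 0.5 threshold turns logits into a 0/1 indicator matrix.
    preds = (1.0 / (1.0 + np.exp(-logits)) > 0.5).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="micro", zero_division=0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Exact-match (subset) accuracy: all language labels must agree.
        "accuracy": accuracy_score(labels, preds),
    }
```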
|
|
|
|
### Framework versions

- Transformers 5.5.4
- Pytorch 2.11.0+cu128
- Datasets 4.8.4
- Tokenizers 0.22.2
|
|