---
language:
- en
- tok
- multilingual
license: apache-2.0
tags:
- generated_from_trainer
- translation
widget:
- text: Hello, my name is Tom.
- text: Can the cat speak English?
base_model: Helsinki-NLP/opus-mt-en-ROMANCE
model-index:
- name: en-toki-mt
  results: []
---
# en-toki-mt

This model is a fine-tuned version of [Helsinki-NLP/opus-mt-en-ROMANCE](https://huggingface.co/Helsinki-NLP/opus-mt-en-ROMANCE) on English–Toki Pona sentence pairs from Tatoeba.
## Model description

Toki Pona is a minimalist constructed language first published by Sonja Lang in 2001. It has a very small vocabulary (~130 words) and a very simple grammar.
## Intended uses & limitations

This model translates English into Toki Pona.
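
A minimal usage sketch with the `transformers` translation pipeline; the model identifier below is a placeholder for this repository's full Hub path:

```python
from transformers import pipeline

# Replace "en-toki-mt" with the full Hub path of this repository.
translator = pipeline("translation", model="en-toki-mt")

print(translator("Hello, my name is Tom.")[0]["translation_text"])
```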
| ## Training and evaluation data | |
| The training data is acquired from all En-Toki sentence pairs on [Tatoeba](https://tatoeba.org/en) (~20000 pairs), without any filtering. Since this dataset mostly only includes core words (pu), it may produce inaccurate results when encountering more complex words. The model achieved a BLEU score of 54 on the testing set. | |
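
For reference, a BLEU score like the one above can be computed with the `evaluate` library's sacrebleu metric; the sentences below are illustrative placeholders, not items from the actual test split:

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Illustrative placeholders: model outputs and their reference translations.
predictions = ["toki! nimi mi li jan Ton."]
references = [["toki! nimi mi li jan Ton."]]

result = sacrebleu.compute(predictions=predictions, references=references)
print(f"BLEU: {result['score']:.1f}")
```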
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent `Seq2SeqTrainingArguments` configuration is sketched after the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
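
A sketch of these settings expressed as `Seq2SeqTrainingArguments`; the output directory is a placeholder, and the optimizer and linear scheduler above correspond to the Trainer defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="en-toki-mt",        # placeholder output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,                      # native AMP mixed-precision training
)
```

These arguments would be passed to a `Seq2SeqTrainer` together with the tokenized Tatoeba sentence pairs.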
### Framework versions

- Transformers 4.20.1
- Pytorch 1.11.0
- Datasets 2.3.2
- Tokenizers 0.12.1