---
library_name: transformers
license: cc-by-nc-4.0
base_model: facebook/nllb-200-distilled-600M
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: nllb_complete
  results: []
---

# nllb_complete

This model is a fine-tuned version of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5537
- BLEU: 54.0987
- Gen Len: 17.1547

## Model description

More information needed

## Intended uses & limitations

More information needed
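
Since usage is not yet documented, here is a minimal inference sketch. The repo id and the FLORES-200 language codes (`eng_Latn`, `fra_Latn`) are placeholder assumptions, not documented properties of this checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repo id; substitute the actual checkpoint path or hub id.
model_name = "nllb_complete"

# src_lang must match the source language used during fine-tuning
# (eng_Latn here is an assumption, not taken from the card).
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Hello, world!", return_tensors="pt")
generated = model.generate(
    **inputs,
    # NLLB selects the target language via a forced BOS language token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```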

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto `Seq2SeqTrainingArguments`):
- learning_rate: 3e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32 (2 per device × 16 accumulation steps)
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5000
- num_epochs: 24.0
- mixed_precision_training: Native AMP
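
A sketch of how the hyperparameters above translate into a `Seq2SeqTrainingArguments` configuration; `output_dir` and any settings not listed above (e.g. evaluation cadence) are illustrative assumptions, not recorded from the actual run:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb_complete",        # assumed; not taken from the card
    learning_rate=3e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=16,    # 2 × 16 = effective batch size of 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=5000,
    num_train_epochs=24.0,
    fp16=True,                         # Native AMP mixed precision
    predict_with_generate=True,        # required for BLEU / Gen Len eval
)
```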

### Training results

| Training Loss | Epoch   | Step   | Validation Loss | BLEU    | Gen Len |
|:-------------:|:-------:|:------:|:---------------:|:-------:|:-------:|
| 0.7527        | 2.0916  | 10000  | 0.7154          | 41.0087 | 17.0123 |
| 0.5274        | 4.1832  | 20000  | 0.5901          | 46.2079 | 17.2597 |
| 0.4209        | 6.2748  | 30000  | 0.5502          | 49.6827 | 17.0787 |
| 0.381         | 8.3665  | 40000  | 0.5324          | 51.0468 | 17.1997 |
| 0.3123        | 10.4581 | 50000  | 0.5264          | 52.239  | 17.0687 |
| 0.279         | 12.5497 | 60000  | 0.5292          | 52.8077 | 17.163  |
| 0.2568        | 14.6413 | 70000  | 0.5320          | 53.2148 | 17.1983 |
| 0.2234        | 16.7329 | 80000  | 0.5415          | 53.2988 | 17.1817 |
| 0.2208        | 18.8245 | 90000  | 0.5455          | 53.9008 | 17.1253 |
| 0.2179        | 20.9162 | 100000 | 0.5501          | 54.2302 | 17.107  |
| 0.2057        | 23.0077 | 110000 | 0.5537          | 54.0987 | 17.1547 |
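
The exact metric hook used for this run is not documented; the BLEU and Gen Len columns are consistent with the standard sacrebleu-based `compute_metrics` commonly used with `Seq2SeqTrainer`, sketched below (the NLLB `tokenizer` is assumed to be in scope):

```python
import evaluate
import numpy as np

sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # Replace label padding (-100) so the tokenizer can decode references.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = sacrebleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    # "Gen Len" is the mean number of non-pad tokens in the generations
    # (an assumption about how the column above was computed).
    gen_lens = [np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]
    return {"bleu": result["score"], "gen_len": float(np.mean(gen_lens))}
```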

### Framework versions

- Transformers 4.53.3
- PyTorch 2.7.1+cu126
- Datasets 3.6.0
- Tokenizers 0.21.2