Model save
README.md CHANGED
@@ -1,22 +1,22 @@
 ---
 library_name: peft
 license: cc-by-nc-4.0
-base_model: facebook/nllb-200-
+base_model: facebook/nllb-200-3.3B
 tags:
 - generated_from_trainer
 model-index:
-- name:
+- name: mn_nllb_3.3B_continue
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-#
+# mn_nllb_3.3B_continue
 
-This model is a fine-tuned version of [facebook/nllb-200-
+This model is a fine-tuned version of [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss:
+- Loss: 6.1049
 
 ## Model description
 

@@ -36,11 +36,11 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size:
+- train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size:
+- total_train_batch_size: 64
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
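The hyperparameter block added in this hunk fixes the effective training configuration: a per-device batch size of 16 with 4 gradient-accumulation steps, i.e. an effective batch of 64. As a rough, non-authoritative sketch, these values map onto `transformers`' `Seq2SeqTrainingArguments` as shown below; the output directory and epoch count are assumptions (the epoch count is only inferred from the results table in the next hunk), not values recorded in this commit.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: reproduces the hyperparameters listed in the diff above.
# Paths and epoch count are assumptions, not taken from this commit.
training_args = Seq2SeqTrainingArguments(
    output_dir="mn_nllb_3.3B_continue",  # assumed output path, not recorded in the card
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,       # 16 * 4 = total_train_batch_size of 64
    seed=42,
    optim="adamw_torch",                 # AdamW with betas=(0.9, 0.999) and eps=1e-08 (the defaults)
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=4,                  # assumption: inferred from the epoch column in the results below
)
```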
@@ -49,20 +49,39 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 5.9984 | 0.128 | 20 | 6.0977 |
+| 5.9679 | 0.256 | 40 | 6.0984 |
+| 6.0227 | 0.384 | 60 | 6.0978 |
+| 5.9831 | 0.512 | 80 | 6.0994 |
+| 5.9682 | 0.64 | 100 | 6.0994 |
+| 5.9982 | 0.768 | 120 | 6.1004 |
+| 5.9731 | 0.896 | 140 | 6.1007 |
+| 5.5217 | 1.0192 | 160 | 6.1018 |
+| 5.9654 | 1.1472 | 180 | 6.1024 |
+| 5.9801 | 1.2752 | 200 | 6.1027 |
+| 5.9906 | 1.4032 | 220 | 6.1030 |
+| 5.9799 | 1.5312 | 240 | 6.1031 |
+| 5.9459 | 1.6592 | 260 | 6.1041 |
+| 5.9605 | 1.7872 | 280 | 6.1036 |
+| 5.9875 | 1.9152 | 300 | 6.1037 |
+| 5.5313 | 2.0384 | 320 | 6.1040 |
+| 5.9655 | 2.1664 | 340 | 6.1039 |
+| 5.9331 | 2.2944 | 360 | 6.1043 |
+| 5.9879 | 2.4224 | 380 | 6.1046 |
+| 5.9833 | 2.5504 | 400 | 6.1045 |
+| 5.9688 | 2.6784 | 420 | 6.1045 |
+| 5.9644 | 2.8064 | 440 | 6.1045 |
+| 5.9543 | 2.9344 | 460 | 6.1047 |
+| 5.5421 | 3.0576 | 480 | 6.1048 |
+| 5.9495 | 3.1856 | 500 | 6.1048 |
+| 5.9712 | 3.3136 | 520 | 6.1049 |
+| 6.0095 | 3.4416 | 540 | 6.1049 |
+| 5.9649 | 3.5696 | 560 | 6.1049 |
+| 5.9968 | 3.6976 | 580 | 6.1049 |
+| 5.9725 | 3.8256 | 600 | 6.1049 |
+| 5.9317 | 3.9536 | 620 | 6.1049 |
 
 
 ### Framework versions
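Because the card sets `library_name: peft` with `base_model: facebook/nllb-200-3.3B`, the saved artifact is presumably a PEFT adapter rather than a full copy of the NLLB-200 weights. A minimal loading sketch under that assumption follows; the adapter repo id is a placeholder, not taken from this commit.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

# Base model and tokenizer from the card's base_model field.
base = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-3.3B")

# Placeholder adapter id (assumption); the model-index name in this card is mn_nllb_3.3B_continue.
model = PeftModel.from_pretrained(base, "your-username/mn_nllb_3.3B_continue")
model.eval()
```

Generation with NLLB models additionally requires picking a source language for the tokenizer and forcing the target-language token at decode time; neither is specified in this card.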