---
library_name: peft
license: cc-by-nc-4.0
base_model: facebook/nllb-200-distilled-1.3B
tags:
- generated_from_trainer
model-index:
- name: mon_nllb_3B_r32
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mon_nllb_3B_r32

This model is a fine-tuned version of [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 7.2132
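
Since `library_name: peft` indicates this repository holds a PEFT adapter rather than full model weights, it is loaded on top of the base NLLB checkpoint. Below is a minimal inference sketch; the adapter id and the language codes (`eng_Latn` to `khk_Cyrl`, guessing Halh Mongolian from the "mon" in the name) are assumptions, not confirmed by this card.

```python
# Minimal inference sketch: base NLLB model + PEFT adapter on top.
# The adapter id and language codes below are assumptions based on the card's name.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_id = "facebook/nllb-200-distilled-1.3B"
tokenizer = AutoTokenizer.from_pretrained(base_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, "mon_nllb_3B_r32")  # hypothetical adapter path

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("khk_Cyrl"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```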

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure
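
The exact adapter configuration is not reported in this card. The sketch below shows a plausible PEFT/LoRA setup: the rank is inferred from the `r32` suffix in the model name, while alpha, dropout, and target modules are assumptions.

```python
# Plausible LoRA configuration. Only r=32 is hinted at by the model name;
# every other value here is an assumption, not the released config.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=32,                                 # inferred from the "r32" suffix
    lora_alpha=64,                        # assumed
    lora_dropout=0.05,                    # assumed
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
)
base = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```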

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 40
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 160
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 2
- mixed_precision_training: Native AMP
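
For reference, these values map onto `transformers` training arguments roughly as below. This is a sketch only: the original training script is not included with this card, and the dataset/Trainer wiring is omitted.

```python
# Seq2SeqTrainingArguments mirroring the hyperparameters listed above.
# output_dir is a placeholder; dataset and Trainer setup are not shown.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mon_nllb_3B_r32",
    learning_rate=1e-4,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,  # 40 x 4 = effective batch size of 160
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=2,
    fp16=True,                      # "Native AMP" mixed precision
    seed=42,
)
```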

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 7.4511        | 0.0761 | 500   | 7.2785          |
| 7.3373        | 0.1522 | 1000  | 7.2305          |
| 7.2568        | 0.2283 | 1500  | 7.2138          |
| 7.2365        | 0.3044 | 2000  | 7.2126          |
| 7.2619        | 0.3805 | 2500  | 7.2130          |
| 7.2272        | 0.4567 | 3000  | 7.2117          |
| 7.2336        | 0.5328 | 3500  | 7.2137          |
| 7.2263        | 0.6089 | 4000  | 7.2139          |
| 7.2321        | 0.6850 | 4500  | 7.2129          |
| 7.2257        | 0.7611 | 5000  | 7.2124          |
| 7.2248        | 0.8372 | 5500  | 7.2121          |
| 7.2289        | 0.9133 | 6000  | 7.2121          |
| 7.2144        | 0.9894 | 6500  | 7.2131          |
| 7.2155        | 1.0656 | 7000  | 7.2133          |
| 7.215         | 1.1417 | 7500  | 7.2130          |
| 7.2146        | 1.2178 | 8000  | 7.2122          |
| 7.1995        | 1.2939 | 8500  | 7.2126          |
| 7.2025        | 1.3700 | 9000  | 7.2136          |
| 7.2302        | 1.4462 | 9500  | 7.2128          |
| 7.2078        | 1.5223 | 10000 | 7.2133          |
| 7.2063        | 1.5984 | 10500 | 7.2133          |
| 7.216         | 1.6745 | 11000 | 7.2128          |
| 7.1949        | 1.7506 | 11500 | 7.2132          |
| 7.2213        | 1.8267 | 12000 | 7.2131          |
| 7.2236        | 1.9028 | 12500 | 7.2132          |
| 7.2244        | 1.9789 | 13000 | 7.2132          |


### Framework versions

- PEFT 0.14.0
- Transformers 4.49.0
- PyTorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0