yalhessi/lemexp-task1-v3-template_full-deepseek-coder-6.7b-base

---
library_name: peft
license: other
base_model: deepseek-ai/deepseek-coder-6.7b-base
tags:
- generated_from_trainer
model-index:
- name: lemexp-task1-v3-template_full-deepseek-coder-6.7b-base
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# lemexp-task1-v3-template_full-deepseek-coder-6.7b-base
This model is a fine-tuned version of [deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0906
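
The metadata declares `library_name: peft`, which suggests the repository holds an adapter to be applied on top of the base model rather than full weights. A minimal loading sketch under two assumptions the card does not confirm: that the adapter is published under the repo id shown in the page header, and that the tokenizer is taken unchanged from the base model.

```python
BASE_MODEL_ID = "deepseek-ai/deepseek-coder-6.7b-base"
# Assumption: adapter repo id inferred from the page header, not stated in the card body.
ADAPTER_ID = "yalhessi/lemexp-task1-v3-template_full-deepseek-coder-6.7b-base"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imported lazily so the constants above can be inspected
    # without transformers/peft installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the 6.7B base weights, so it needs network access and enough memory for the base model. The prompt format the adapter expects is not documented in this card.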
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 12
- mixed_precision_training: Native AMP
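
The two `total_*` values in the list above follow from the per-device batch sizes, the device count, and gradient accumulation (which applies to training only). The sketch below reproduces that arithmetic. The examples-per-epoch figure is a rough estimate that assumes one epoch is about 15570 optimizer steps, as the training results table suggests (step 15570 is logged at epoch 1.0001); the card does not state the dataset size.

```python
# Values copied from the hyperparameter list.
train_batch_size = 2             # per device
eval_batch_size = 2              # per device
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch per optimizer step: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 16
print(total_eval_batch_size)   # 8

# Assumption: roughly 15570 optimizer steps per epoch, read off the results table.
approx_steps_per_epoch = 15570
approx_examples_per_epoch = approx_steps_per_epoch * total_train_batch_size
print(approx_examples_per_epoch)  # 249120 (an estimate, not a reported figure)
```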
### Training results
| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| 0.3446        | 0.2000  | 3114   | 0.1641          |
| 0.305         | 0.4000  | 6228   | 0.1487          |
| 0.2848        | 0.6000  | 9342   | 0.1397          |
| 0.2755        | 0.8001  | 12456  | 0.1333          |
| 0.2646        | 1.0001  | 15570  | 0.1328          |
| 0.2486        | 1.2001  | 18684  | 0.1218          |
| 0.2438        | 1.4001  | 21798  | 0.1231          |
| 0.2413        | 1.6001  | 24912  | 0.1177          |
| 0.2375        | 1.8001  | 28026  | 0.1173          |
| 0.2326        | 2.0001  | 31140  | 0.1166          |
| 0.2178        | 2.2001  | 34254  | 0.1147          |
| 0.2187        | 2.4002  | 37368  | 0.1126          |
| 0.2196        | 2.6002  | 40482  | 0.1110          |
| 0.2163        | 2.8002  | 43596  | 0.1093          |
| 0.2101        | 3.0002  | 46710  | 0.1079          |
| 0.199         | 3.2002  | 49824  | 0.1073          |
| 0.2002        | 3.4002  | 52938  | 0.1073          |
| 0.1985        | 3.6002  | 56052  | 0.1070          |
| 0.1966        | 3.8002  | 59166  | 0.1017          |
| 0.197         | 4.0003  | 62280  | 0.1033          |
| 0.1819        | 4.2003  | 65394  | 0.1020          |
| 0.1835        | 4.4003  | 68508  | 0.1000          |
| 0.181         | 4.6003  | 71622  | 0.1032          |
| 0.1808        | 4.8003  | 74736  | 0.0971          |
| 0.1802        | 5.0003  | 77850  | 0.0960          |
| 0.1663        | 5.2003  | 80964  | 0.0967          |
| 0.1687        | 5.4003  | 84078  | 0.0966          |
| 0.1694        | 5.6004  | 87192  | 0.0958          |
| 0.1654        | 5.8004  | 90306  | 0.0933          |
| 0.167         | 6.0004  | 93420  | 0.0910          |
| 0.1528        | 6.2004  | 96534  | 0.0927          |
| 0.1534        | 6.4004  | 99648  | 0.0934          |
| 0.1542        | 6.6004  | 102762 | 0.0925          |
| 0.1519        | 6.8004  | 105876 | 0.0921          |
| 0.1546        | 7.0004  | 108990 | 0.0888          |
| 0.139         | 7.2005  | 112104 | 0.0926          |
| 0.1365        | 7.4005  | 115218 | 0.0882          |
| 0.1387        | 7.6005  | 118332 | 0.0874          |
| 0.1366        | 7.8005  | 121446 | 0.0848          |
| 0.1361        | 8.0005  | 124560 | 0.0867          |
| 0.1225        | 8.2005  | 127674 | 0.0887          |
| 0.123         | 8.4005  | 130788 | 0.0867          |
| 0.1254        | 8.6006  | 133902 | 0.0871          |
| 0.1259        | 8.8006  | 137016 | 0.0865          |
| 0.1205        | 9.0006  | 140130 | 0.0842          |
| 0.1064        | 9.2006  | 143244 | 0.0890          |
| 0.1071        | 9.4006  | 146358 | 0.0865          |
| 0.109         | 9.6006  | 149472 | 0.0853          |
| 0.1087        | 9.8006  | 152586 | 0.0844          |
| 0.1066        | 10.0006 | 155700 | 0.0846          |
| 0.0938        | 10.2007 | 158814 | 0.0877          |
| 0.0936        | 10.4007 | 161928 | 0.0892          |
| 0.0961        | 10.6007 | 165042 | 0.0880          |
| 0.0923        | 10.8007 | 168156 | 0.0882          |
| 0.0924        | 11.0007 | 171270 | 0.0863          |
| 0.082         | 11.2007 | 174384 | 0.0897          |
| 0.0828        | 11.4007 | 177498 | 0.0928          |
| 0.0799        | 11.6007 | 180612 | 0.0909          |
| 0.0823        | 11.8008 | 183726 | 0.0906          |
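
In the table above, training loss keeps falling through the last logged step, while validation loss reaches its lowest logged value around epoch 9 and then drifts upward. The sketch below locates that minimum using an excerpt of the table: the rows logged at each whole-epoch boundary plus the final row, with values copied verbatim. The `EvalRow` type and variable names are illustrative only.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    train_loss: float
    val_loss: float


# Excerpt of the results table: whole-epoch rows plus the final logged row.
ROWS = [
    EvalRow(1.0001, 15570, 0.2646, 0.1328),
    EvalRow(2.0001, 31140, 0.2326, 0.1166),
    EvalRow(3.0002, 46710, 0.2101, 0.1079),
    EvalRow(4.0003, 62280, 0.197, 0.1033),
    EvalRow(5.0003, 77850, 0.1802, 0.0960),
    EvalRow(6.0004, 93420, 0.167, 0.0910),
    EvalRow(7.0004, 108990, 0.1546, 0.0888),
    EvalRow(8.0005, 124560, 0.1361, 0.0867),
    EvalRow(9.0006, 140130, 0.1205, 0.0842),
    EvalRow(10.0006, 155700, 0.1066, 0.0846),
    EvalRow(11.0007, 171270, 0.0924, 0.0863),
    EvalRow(11.8008, 183726, 0.0823, 0.0906),
]

# Row with the lowest validation loss, and the last logged row for comparison.
best = min(ROWS, key=lambda row: row.val_loss)
final = ROWS[-1]

print(f"lowest validation loss: {best.val_loss} at step {best.step} (epoch {best.epoch})")
print(f"final validation loss:  {final.val_loss} at step {final.step} (epoch {final.epoch})")
```

The minimum in this excerpt (0.0842 at step 140130) is also the minimum of the full table. The reported evaluation loss of 0.0906 matches the final row, not this minimum. The card does not say whether intermediate checkpoints were saved, so this only identifies where the logged validation loss was lowest.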
### Framework versions
- PEFT 0.14.0
- Transformers 4.47.0
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.1