```
1289.4068 seconds used for training.
21.49 minutes used for training.
Peak reserved memory = 9.545 GB.
Peak reserved memory for training = 4.018 GB.
Peak reserved memory % of max memory = 43.058 %.
Peak reserved memory for training % of max memory = 18.125 %.
```
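These figures come from PyTorch's CUDA allocator counters. A minimal sketch of how such stats can be reproduced; the variable names are illustrative rather than taken from the original notebook, and the `start_gpu_memory` snapshot (implied by the 9.545 − 4.018 GB gap) is assumed to have been taken right after model load:

```python
import torch

# Illustrative reproduction of the stats above; names are assumptions,
# not necessarily those used in the original notebook.
gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)  # total VRAM, GB

# Peak memory ever reserved by the CUDA caching allocator, in GB.
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)

# Memory attributable to training alone: peak minus what was already
# reserved before training ran (9.545 - 4.018 implies ~5.527 GB was
# reserved by the loaded model before the first step).
start_gpu_memory = 5.527  # GB; assumption: snapshot taken after model load
used_memory_for_training = round(used_memory - start_gpu_memory, 3)

print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_training} GB.")
print(f"Peak reserved memory % of max memory = {round(used_memory / max_memory * 100, 3)} %.")
print(f"Peak reserved memory for training % of max memory = {round(used_memory_for_training / max_memory * 100, 3)} %.")
```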
The run used the following `TrainingArguments`:

```python
import torch
from transformers import TrainingArguments

args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,                          # increased the number of warmup steps
    max_steps = 200,                            # increased the total number of steps
    learning_rate = 1e-4,                       # reduced the learning rate
    fp16 = not torch.cuda.is_bf16_supported(),  # fall back to fp16 on older GPUs
    bf16 = torch.cuda.is_bf16_supported(),      # use bf16 where the GPU supports it
    logging_steps = 1,
    optim = "adamw_8bit",                       # 8-bit AdamW (bitsandbytes)
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 42,
    output_dir = "outputs",
)
```
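For context, these arguments feed TRL's `SFTTrainer` in the standard Unsloth recipe; a minimal sketch, assuming `model`, `tokenizer`, and `dataset` were prepared in earlier cells not shown here (the keyword layout matches the trl versions used in Unsloth's notebooks; newer trl releases move some of these fields into `SFTConfig`):

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model = model,                # Unsloth FastLanguageModel with LoRA adapters
    tokenizer = tokenizer,
    train_dataset = dataset,      # 399 formatted examples, per the log below
    dataset_text_field = "text",  # assumption: dataset uses a "text" column
    max_seq_length = 2048,        # assumption; not stated in the logs
    args = args,
)

trainer_stats = trainer.train()
```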
```
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 399 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 200
 "-____-"     Number of trainable parameters = 20,971,520

[200/200 21:17, Epoch 4/4]
```
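These totals are consistent: the effective batch size is 2 (per device) × 4 (gradient accumulation) × 1 (GPU) = 8, so one epoch over 399 examples takes ⌈399 / 8⌉ = 50 optimizer steps, and the 200 steps above amount to 4 epochs.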
| Step | Training Loss |
|------|---------------|
| 1 | 2.027900 |
| 2 | 2.008700 |
| 3 | 1.946100 |
| 4 | 1.924700 |
| 5 | 1.995000 |
| 6 | 1.999000 |
| 7 | 1.870100 |
| 8 | 1.891400 |
| 9 | 1.807600 |
| 10 | 1.723200 |
| 11 | 1.665100 |
| 12 | 1.541000 |
| 13 | 1.509100 |
| 14 | 1.416600 |
| 15 | 1.398600 |
| 16 | 1.233200 |
| 17 | 1.172100 |
| 18 | 1.272100 |
| 19 | 1.146000 |
| 20 | 1.179000 |
| 21 | 1.206400 |
| 22 | 1.095400 |
| 23 | 0.937300 |
| 24 | 1.214300 |
| 25 | 1.040200 |
| 26 | 1.183400 |
| 27 | 1.033900 |
| 28 | 0.953100 |
| 29 | 0.935700 |
| 30 | 0.962200 |
| 31 | 0.908900 |
| 32 | 0.924900 |
| 33 | 0.931000 |
| 34 | 1.011300 |
| 35 | 0.951900 |
| 36 | 0.936000 |
| 37 | 0.903000 |
| 38 | 0.906900 |
| 39 | 0.945700 |
| 40 | 0.827000 |
| 41 | 0.931800 |
| 42 | 0.919600 |
| 43 | 0.926900 |
| 44 | 0.932900 |
| 45 | 0.872700 |
| 46 | 0.795200 |
| 47 | 0.888700 |
| 48 | 0.956800 |
| 49 | 1.004200 |
| 50 | 0.859500 |
| 51 | 0.802500 |
| 52 | 0.855400 |
| 53 | 0.885500 |
| 54 | 1.026600 |
| 55 | 0.844100 |
| 56 | 0.879800 |
| 57 | 0.797400 |
| 58 | 0.885300 |
| 59 | 0.842800 |
| 60 | 0.861600 |
| 61 | 0.789100 |
| 62 | 0.861600 |
| 63 | 0.856700 |
| 64 | 0.929200 |
| 65 | 0.782500 |
| 66 | 0.713600 |
| 67 | 0.781000 |
| 68 | 0.765100 |
| 69 | 0.784700 |
| 70 | 0.869500 |
| 71 | 0.742900 |
| 72 | 0.787900 |
| 73 | 0.750800 |
| 74 | 0.931700 |
| 75 | 0.713000 |
| 76 | 0.832100 |
| 77 | 0.928300 |
| 78 | 0.777600 |
| 79 | 0.694000 |
| 80 | 0.835400 |
| 81 | 0.822000 |
| 82 | 0.754600 |
| 83 | 0.813400 |
| 84 | 0.868800 |
| 85 | 0.732400 |
| 86 | 0.803700 |
| 87 | 0.694400 |
| 88 | 0.771300 |
| 89 | 0.864400 |
| 90 | 0.646700 |
| 91 | 0.690800 |
| 92 | 0.695000 |
| 93 | 0.732300 |
| 94 | 0.766900 |
| 95 | 0.864100 |
| 96 | 0.867200 |
| 97 | 0.774300 |
| 98 | 0.797700 |
| 99 | 0.772100 |
| 100 | 0.906700 |
| 101 | 0.693400 |
| 102 | 0.685500 |
| 103 | 0.712200 |
| 104 | 0.678400 |
| 105 | 0.761900 |
| 106 | 0.705300 |
| 107 | 0.775700 |
| 108 | 0.627600 |
| 109 | 0.599300 |
| 110 | 0.615100 |
| 111 | 0.618200 |
| 112 | 0.668700 |
| 113 | 0.699900 |
| 114 | 0.577000 |
| 115 | 0.711600 |
| 116 | 0.692900 |
| 117 | 0.585400 |
| 118 | 0.646400 |
| 119 | 0.569200 |
| 120 | 0.752300 |
| 121 | 0.745000 |
| 122 | 0.690100 |
| 123 | 0.744700 |
| 124 | 0.665800 |
| 125 | 0.866100 |
| 126 | 0.707400 |
| 127 | 0.679300 |
| 128 | 0.591400 |
| 129 | 0.655100 |
| 130 | 0.734000 |
| 131 | 0.637900 |
| 132 | 0.733900 |
| 133 | 0.652500 |
| 134 | 0.685400 |
| 135 | 0.641300 |
| 136 | 0.608200 |
| 137 | 0.754100 |
| 138 | 0.753700 |
| 139 | 0.671000 |
| 140 | 0.767200 |
| 141 | 0.668700 |
| 142 | 0.630300 |
| 143 | 0.734700 |
| 144 | 0.767700 |
| 145 | 0.722200 |
| 146 | 0.694400 |
| 147 | 0.710100 |
| 148 | 0.696300 |
| 149 | 0.612600 |
| 150 | 0.670400 |
| 151 | 0.512900 |
| 152 | 0.675100 |
| 153 | 0.579900 |
| 154 | 0.622900 |
| 155 | 0.652500 |
| 156 | 0.649200 |
| 157 | 0.546700 |
| 158 | 0.521600 |
| 159 | 0.522200 |
| 160 | 0.589400 |
| 161 | 0.552600 |
| 162 | 0.630700 |
| 163 | 0.595600 |
| 164 | 0.614300 |
| 165 | 0.489400 |
| 166 | 0.634500 |
| 167 | 0.620800 |
| 168 | 0.618600 |
| 169 | 0.637900 |
| 170 | 0.553900 |
| 171 | 0.656000 |
| 172 | 0.644000 |
| 173 | 0.694300 |
| 174 | 0.608900 |
| 175 | 0.673000 |
| 176 | 0.612500 |
| 177 | 0.654200 |
| 178 | 0.639200 |
| 179 | 0.599100 |
| 180 | 0.642100 |
| 181 | 0.529700 |
| 182 | 0.614000 |
| 183 | 0.582900 |
| 184 | 0.765100 |
| 185 | 0.502700 |
| 186 | 0.564300 |
| 187 | 0.740200 |
| 188 | 0.636100 |
| 189 | 0.638800 |
| 190 | 0.560100 |
| 191 | 0.620000 |
| 192 | 0.712800 |
| 193 | 0.531000 |
| 194 | 0.591600 |
| 195 | 0.608600 |
| 196 | 0.671800 |
| 197 | 0.572900 |
| 198 | 0.600900 |
| 199 | 0.586800 |
| 200 | 0.545900 |
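The loss falls from about 2.03 at step 1 to roughly 0.55 by step 200. If the curve is more useful than the table, the same values can be pulled from the trainer's log history; a minimal sketch, assuming the `trainer` object from above and matplotlib installed:

```python
import matplotlib.pyplot as plt

# trainer.state.log_history holds one dict per logging event; with
# logging_steps = 1 that is one "loss" entry per optimizer step.
history = [e for e in trainer.state.log_history if "loss" in e]
steps = [e["step"] for e in history]
losses = [e["loss"] for e in history]

plt.plot(steps, losses)
plt.xlabel("Step")
plt.ylabel("Training loss")
plt.title("SFT training loss (200 steps, 4 epochs)")
plt.savefig("training_loss.png")
```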
---
base_model: unsloth/llama-3-8b-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
---
# Uploaded model

- **Developed by:** Mathoufle13
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
| [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) | |
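Given the `unsloth` tag and the 4-bit base model, one plausible way to load the model for inference is through Unsloth's own loader; a minimal sketch, with `"Mathoufle13/your-model"` as a placeholder for this repository's actual ID:

```python
from unsloth import FastLanguageModel

# "Mathoufle13/your-model" is a placeholder; substitute the real repo ID.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Mathoufle13/your-model",
    max_seq_length = 2048,   # assumption; match the training-time value
    load_in_4bit = True,     # same 4-bit bitsandbytes setup as the base model
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```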