---
tags:
- generated_from_trainer
model-index:
- name: Baby-Llama-58M
  results: []
---

# Baby-Llama-58M

This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 4.7109

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch at the end of this card):
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 311.1646      | 1.0   | 3    | 287.5772        |
| 309.9048      | 2.0   | 6    | 282.5104        |
| 295.7833      | 3.0   | 9    | 266.8010        |
| 269.5852      | 4.0   | 12   | 247.3416        |
| 250.6772      | 5.0   | 15   | 231.4105        |
| 243.0754      | 6.0   | 18   | 224.6885        |
| 235.779       | 7.0   | 21   | 217.7554        |
| 235.8358      | 8.0   | 24   | 211.6984        |
| 224.1199      | 9.0   | 27   | 204.9522        |
| 216.0247      | 10.0  | 30   | 197.5209        |
| 206.4354      | 11.0  | 33   | 189.5172        |
| 189.1456      | 12.0  | 36   | 179.2765        |
| 181.0333      | 13.0  | 39   | 157.3401        |
| 152.062       | 14.0  | 42   | 137.4234        |
| 132.3128      | 15.0  | 45   | 120.5469        |
| 118.0474      | 16.0  | 48   | 106.6884        |
| 107.6354      | 17.0  | 51   | 97.7495         |
| 98.2458       | 18.0  | 54   | 88.4898         |
| 86.4009       | 19.0  | 57   | 77.8249         |
| 75.9386       | 20.0  | 60   | 67.9337         |
| 65.627        | 21.0  | 63   | 58.1877         |
| 53.5903       | 22.0  | 66   | 49.0234         |
| 47.114        | 23.0  | 69   | 41.2838         |
| 38.9667       | 24.0  | 72   | 34.4503         |
| 32.8846       | 25.0  | 75   | 29.7438         |
| 27.1886       | 26.0  | 78   | 24.2863         |
| 23.0713       | 27.0  | 81   | 20.1505         |
| 18.9003       | 28.0  | 84   | 16.9556         |
| 15.9133       | 29.0  | 87   | 14.4738         |
| 13.5544       | 30.0  | 90   | 12.6399         |
| 11.6834       | 31.0  | 93   | 11.1016         |
| 10.2371       | 32.0  | 96   | 9.9052          |
| 9.2371        | 33.0  | 99   | 8.9413          |
| 8.352         | 34.0  | 102  | 8.1600          |
| 7.5322        | 35.0  | 105  | 7.6794          |
| 7.0653        | 36.0  | 108  | 7.3031          |
| 6.6853        | 37.0  | 111  | 6.9564          |
| 6.3257        | 38.0  | 114  | 6.7247          |
| 5.9869        | 39.0  | 117  | 6.4649          |
| 5.8618        | 40.0  | 120  | 6.2734          |
| 5.6025        | 41.0  | 123  | 6.1253          |
| 5.4913        | 42.0  | 126  | 6.0822          |
| 5.3086        | 43.0  | 129  | 5.8575          |
| 5.1904        | 44.0  | 132  | 5.6860          |
| 5.1193        | 45.0  | 135  | 5.6821          |
| 5.0846        | 46.0  | 138  | 5.5831          |
| 5.017         | 47.0  | 141  | 5.5245          |
| 4.7435        | 48.0  | 144  | 5.3877          |
| 4.7546        | 49.0  | 147  | 5.3523          |
| 4.8606        | 50.0  | 150  | 5.3845          |
| 4.7146        | 51.0  | 153  | 5.2239          |
| 4.6273        | 52.0  | 156  | 5.1927          |
| 4.4469        | 53.0  | 159  | 5.1898          |
| 4.5135        | 54.0  | 162  | 5.0846          |
| 4.4061        | 55.0  | 165  | 5.0756          |
| 4.3577        | 56.0  | 168  | 5.0474          |
| 4.2169        | 57.0  | 171  | 5.0125          |
| 4.3001        | 58.0  | 174  | 4.9770          |
| 4.2399        | 59.0  | 177  | 4.9469          |
| 4.3372        | 60.0  | 180  | 4.9162          |
| 4.2669        | 61.0  | 183  | 4.9166          |
| 4.2394        | 62.0  | 186  | 4.8618          |
| 4.2965        | 63.0  | 189  | 4.8595          |
| 4.1188        | 64.0  | 192  | 4.8285          |
| 4.2886        | 65.0  | 195  | 4.8265          |
| 4.2688        | 66.0  | 198  | 4.8103          |
| 4.2429        | 67.0  | 201  | 4.7904          |
| 3.9653        | 68.0  | 204  | 4.7787          |
| 4.2676        | 69.0  | 207  | 4.7604          |
| 4.2029        | 70.0  | 210  | 4.7588          |
| 4.0962        | 71.0  | 213  | 4.7560          |
| 4.0643        | 72.0  | 216  | 4.7449          |
| 4.0713        | 73.0  | 219  | 4.7341          |
| 4.1192        | 74.0  | 222  | 4.7275          |
| 4.135         | 75.0  | 225  | 4.7186          |
| 3.9914        | 76.0  | 228  | 4.7135          |
| 4.0225        | 77.0  | 231  | 4.7144          |
| 3.9907        | 78.0  | 234  | 4.7152          |
| 4.0444        | 79.0  | 237  | 4.7123          |
| 4.0321        | 80.0  | 240  | 4.7109          |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0
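
As a hedged illustration of the training procedure, the sketch below maps the hyperparameters listed above onto `transformers.TrainingArguments`. It is not the actual training script: the model, datasets, and output directory are placeholders, the per-device batch size and per-epoch evaluation schedule are inferred from the hyperparameters and the results table, and only the listed values are taken from this card.

```python
# Sketch of the training configuration implied by the hyperparameters above,
# assuming single-device training with the standard Trainer API
# (Transformers 4.39.x). Model and datasets are hypothetical placeholders.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="Baby-Llama-58M",      # placeholder output path
    learning_rate=2.5e-4,             # learning_rate: 0.00025
    per_device_train_batch_size=128,  # train_batch_size: 128 (assumed per-device)
    per_device_eval_batch_size=8,     # eval_batch_size: 8
    seed=42,                          # seed: 42
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                # Adam epsilon=1e-08
    lr_scheduler_type="cosine",       # lr_scheduler_type: cosine
    warmup_steps=50,                  # lr_scheduler_warmup_steps: 50
    num_train_epochs=80,              # num_epochs: 80
    fp16=True,                        # mixed_precision_training: Native AMP
    evaluation_strategy="epoch",      # assumption: the results table shows one eval per epoch
)

# With a model and datasets in hand, training would look like:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```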
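
For loading the model, a minimal generation sketch follows. The Hub repository id is a hypothetical placeholder, since this card does not state where the checkpoint is published.

```python
# Minimal usage sketch. Replace the repo id below with the actual
# Hugging Face Hub id under which Baby-Llama-58M is published.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/Baby-Llama-58M"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```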