ninagroot/Baby-Llama-58M-RUN1

@@ -13,7 +13,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 12.7407
 ## Model description
@@ -39,53 +39,93 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 50
-- num_epochs: 40
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 81.6068       | 1.0   | 2    | 75.8719         |
-| 81.1403       | 2.0   | 4    | 74.6429         |
-| 78.4746       | 3.0   | 6    | 72.5604         |
-| 78.6147       | 4.0   | 8    | 69.6859         |
-| 75.1485       | 5.0   | 10   | 67.9944         |
-| 73.5182       | 6.0   | 12   | 64.5075         |
-| 69.6393       | 7.0   | 14   | 61.3852         |
-| 66.9895       | 8.0   | 16   | 58.5262         |
-| 64.4746       | 9.0   | 18   | 55.6940         |
-| 60.8097       | 10.0  | 20   | 52.6993         |
-| 57.1714       | 11.0  | 22   | 49.5786         |
-| 53.8474       | 12.0  | 24   | 46.5081         |
-| 49.9873       | 13.0  | 26   | 43.6358         |
-| 48.7366       | 14.0  | 28   | 41.0406         |
-| 45.0539       | 15.0  | 30   | 38.7263         |
-| 44.0504       | 16.0  | 32   | 36.6352         |
-| 40.9533       | 17.0  | 34   | 34.6685         |
-| 39.9931       | 18.0  | 36   | 32.7875         |
-| 38.116        | 19.0  | 38   | 30.8567         |
-| 35.4181       | 20.0  | 40   | 28.9705         |
-| 34.0383       | 21.0  | 42   | 27.4282         |
-| 30.7991       | 22.0  | 44   | 26.4171         |
-| 29.8348       | 23.0  | 46   | 24.9225         |
-| 27.9282       | 24.0  | 48   | 23.9103         |
-| 25.8511       | 25.0  | 50   | 22.9495         |
-| 25.1711       | 26.0  | 52   | 21.5530         |
-| 24.2361       | 27.0  | 54   | 20.5871         |
-| 21.9294       | 28.0  | 56   | 19.0727         |
-| 20.435        | 29.0  | 58   | 18.0482         |
-| 18.682        | 30.0  | 60   | 17.0037         |
-| 17.4144       | 31.0  | 62   | 16.0468         |
-| 16.4872       | 32.0  | 64   | 15.2828         |
-| 16.2417       | 33.0  | 66   | 14.6359         |
-| 15.1244       | 34.0  | 68   | 14.1234         |
-| 14.0602       | 35.0  | 70   | 13.5799         |
-| 13.7722       | 36.0  | 72   | 13.3509         |
-| 13.377        | 37.0  | 74   | 12.9960         |
-| 13.4091       | 38.0  | 76   | 12.8183         |
-| 13.1398       | 39.0  | 78   | 12.7614         |
-| 13.1002       | 40.0  | 80   | 12.7407         |
 ### Framework versions

 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 6.1610
 ## Model description
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 50
+- num_epochs: 80
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 81.7512       | 1.0   | 2    | 74.4291         |
+| 81.3083       | 2.0   | 4    | 73.3596         |
+| 78.6216       | 3.0   | 6    | 71.5365         |
+| 80.396        | 4.0   | 8    | 70.3538         |
+| 75.3713       | 5.0   | 10   | 67.4044         |
+| 74.0418       | 6.0   | 12   | 64.0233         |
+| 70.1637       | 7.0   | 14   | 60.8437         |
+| 67.5864       | 8.0   | 16   | 57.9300         |
+| 64.8984       | 9.0   | 18   | 55.0383         |
+| 61.2535       | 10.0  | 20   | 52.0253         |
+| 57.6171       | 11.0  | 22   | 48.9365         |
+| 54.2922       | 12.0  | 24   | 45.8747         |
+| 50.3849       | 13.0  | 26   | 43.0132         |
+| 49.0703       | 14.0  | 28   | 40.4715         |
+| 45.5158       | 15.0  | 30   | 38.1415         |
+| 44.3002       | 16.0  | 32   | 35.9572         |
+| 41.2208       | 17.0  | 34   | 33.8684         |
+| 39.8837       | 18.0  | 36   | 31.8991         |
+| 38.1152       | 19.0  | 38   | 29.8574         |
+| 35.239        | 20.0  | 40   | 28.0249         |
+| 33.6748       | 21.0  | 42   | 26.4792         |
+| 30.4729       | 22.0  | 44   | 25.4216         |
+| 29.436        | 23.0  | 46   | 24.1119         |
+| 27.72         | 24.0  | 48   | 22.8196         |
+| 25.5231       | 25.0  | 50   | 21.7862         |
+| 24.8119       | 26.0  | 52   | 20.4891         |
+| 23.3658       | 27.0  | 54   | 19.3795         |
+| 21.4143       | 28.0  | 56   | 18.1634         |
+| 20.032        | 29.0  | 58   | 17.0348         |
+| 18.43         | 30.0  | 60   | 16.1163         |
+| 16.897        | 31.0  | 62   | 15.2508         |
+| 15.7483       | 32.0  | 64   | 14.3147         |
+| 15.1794       | 33.0  | 66   | 13.5753         |
+| 13.7129       | 34.0  | 68   | 12.8868         |
+| 12.6031       | 35.0  | 70   | 12.6810         |
+| 11.8192       | 36.0  | 72   | 11.9060         |
+| 11.6487       | 37.0  | 74   | 11.3454         |
+| 10.9525       | 38.0  | 76   | 10.8465         |
+| 10.2164       | 39.0  | 78   | 10.1026         |
+| 9.5492        | 40.0  | 80   | 9.6511          |
+| 9.0438        | 41.0  | 82   | 9.2800          |
+| 8.6141        | 42.0  | 84   | 8.8036          |
+| 7.9373        | 43.0  | 86   | 8.6612          |
+| 7.5371        | 44.0  | 88   | 8.1757          |
+| 7.3186        | 45.0  | 90   | 8.1665          |
+| 7.033         | 46.0  | 92   | 7.7424          |
+| 6.7923        | 47.0  | 94   | 7.6650          |
+| 6.4384        | 48.0  | 96   | 7.4306          |
+| 6.2449        | 49.0  | 98   | 7.4175          |
+| 6.1012        | 50.0  | 100  | 7.1466          |
+| 6.0502        | 51.0  | 102  | 7.1740          |
+| 5.7839        | 52.0  | 104  | 6.9619          |
+| 5.6905        | 53.0  | 106  | 6.9416          |
+| 5.665         | 54.0  | 108  | 6.7945          |
+| 5.5401        | 55.0  | 110  | 6.7485          |
+| 5.4773        | 56.0  | 112  | 6.6674          |
+| 5.4169        | 57.0  | 114  | 6.6132          |
+| 5.3628        | 58.0  | 116  | 6.5787          |
+| 5.2021        | 59.0  | 118  | 6.4972          |
+| 5.2817        | 60.0  | 120  | 6.4866          |
+| 5.1901        | 61.0  | 122  | 6.4256          |
+| 5.1268        | 62.0  | 124  | 6.3659          |
+| 5.1105        | 63.0  | 126  | 6.3563          |
+| 5.0539        | 64.0  | 128  | 6.3159          |
+| 4.9715        | 65.0  | 130  | 6.3178          |
+| 4.872         | 66.0  | 132  | 6.2741          |
+| 4.9422        | 67.0  | 134  | 6.2699          |
+| 4.944         | 68.0  | 136  | 6.2551          |
+| 4.9487        | 69.0  | 138  | 6.2148          |
+| 4.8968        | 70.0  | 140  | 6.2089          |
+| 4.822         | 71.0  | 142  | 6.2093          |
+| 4.965         | 72.0  | 144  | 6.1853          |
+| 4.8401        | 73.0  | 146  | 6.1747          |
+| 4.8539        | 74.0  | 148  | 6.1738          |
+| 4.7751        | 75.0  | 150  | 6.1674          |
+| 4.8871        | 76.0  | 152  | 6.1644          |
+| 4.9347        | 77.0  | 154  | 6.1618          |
+| 4.8009        | 78.0  | 156  | 6.1613          |
+| 4.8121        | 79.0  | 158  | 6.1610          |
+| 4.8048        | 80.0  | 160  | 6.1610          |
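
The table ends at step 160 (80 epochs at 2 optimizer steps each), and the hyperparameters list a cosine scheduler with 50 warmup steps. As a rough illustration of what that implies, the sketch below computes the learning-rate multiplier per step. It assumes linear warmup followed by a half-cosine decay to zero, which is a common way to implement this schedule; the card does not confirm the exact variant, and the peak learning rate is not visible in this diff, so only the multiplier is shown. The names `WARMUP_STEPS`, `TOTAL_STEPS`, and `lr_multiplier` are illustrative, not taken from the training code.

```python
import math

WARMUP_STEPS = 50   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 160   # last Step value in the table (80 epochs x 2 steps)


def lr_multiplier(step, warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Fraction of the peak learning rate applied at a given optimizer step.

    Assumed shape: linear warmup, then half-cosine decay to zero.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 25, 50, 105, 160):
        print(step, round(lr_multiplier(step), 4))
```

Under these assumptions the warmup alone spans the first 25 epochs, and the multiplier approaches zero near step 160, which would be consistent with the validation loss flattening at 6.1610 over the last few epochs.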
 ### Framework versions

model.safetensors

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b95b266c906f8a10c9a06b7ef00f57ba328d35b02576511cc7d51b8ef129717f
 size 217819016

 version https://git-lfs.github.com/spec/v1
+oid sha256:53a8b889e06337de6b9af958e4990a6791a3350eaf51ebd772cab14ea3a2988a
 size 217819016
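
The pointer above records only a SHA-256 digest and a byte size, so the size stays at 217819016 while the `oid` changes with the retrained weights. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the parsing assumes the simple `key value` line layout shown in this diff.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the new side of the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:53a8b889e06337de6b9af958e4990a6791a3350eaf51ebd772cab14ea3a2988a
size 217819016
"""

pointer = parse_lfs_pointer(NEW_POINTER)
# Usage (the local path is an assumption):
# matches_pointer("model.safetensors", pointer)
```

Checking the size first is a cheap way to reject an obviously wrong file before hashing roughly 218 MB of weights.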

training_args.bin

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6d49873d6da6e46a5804e20834fe83e4d20fbe84a83bb526ea7410b0ff377414
 size 4984

 version https://git-lfs.github.com/spec/v1
+oid sha256:9a0208eef6e56a2ded47925bdfa0c4203cf9442e7d67cc216a9b300f64cce77e
 size 4984