---
tags:
- generated_from_trainer
model-index:
- name: Baby-Llama-58M
  results: []
---

# Baby-Llama-58M

This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 4.7109

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch at the end of this card):
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 311.1646      | 1.0   | 3    | 287.5772        |
| 309.9048      | 2.0   | 6    | 282.5104        |
| 295.7833      | 3.0   | 9    | 266.8010        |
| 269.5852      | 4.0   | 12   | 247.3416        |
| 250.6772      | 5.0   | 15   | 231.4105        |
| 243.0754      | 6.0   | 18   | 224.6885        |
| 235.779       | 7.0   | 21   | 217.7554        |
| 235.8358      | 8.0   | 24   | 211.6984        |
| 224.1199      | 9.0   | 27   | 204.9522        |
| 216.0247      | 10.0  | 30   | 197.5209        |
| 206.4354      | 11.0  | 33   | 189.5172        |
| 189.1456      | 12.0  | 36   | 179.2765        |
| 181.0333      | 13.0  | 39   | 157.3401        |
| 152.062       | 14.0  | 42   | 137.4234        |
| 132.3128      | 15.0  | 45   | 120.5469        |
| 118.0474      | 16.0  | 48   | 106.6884        |
| 107.6354      | 17.0  | 51   | 97.7495         |
| 98.2458       | 18.0  | 54   | 88.4898         |
| 86.4009       | 19.0  | 57   | 77.8249         |
| 75.9386       | 20.0  | 60   | 67.9337         |
| 65.627        | 21.0  | 63   | 58.1877         |
| 53.5903       | 22.0  | 66   | 49.0234         |
| 47.114        | 23.0  | 69   | 41.2838         |
| 38.9667       | 24.0  | 72   | 34.4503         |
| 32.8846       | 25.0  | 75   | 29.7438         |
| 27.1886       | 26.0  | 78   | 24.2863         |
| 23.0713       | 27.0  | 81   | 20.1505         |
| 18.9003       | 28.0  | 84   | 16.9556         |
| 15.9133       | 29.0  | 87   | 14.4738         |
| 13.5544       | 30.0  | 90   | 12.6399         |
| 11.6834       | 31.0  | 93   | 11.1016         |
| 10.2371       | 32.0  | 96   | 9.9052          |
| 9.2371        | 33.0  | 99   | 8.9413          |
| 8.352         | 34.0  | 102  | 8.1600          |
| 7.5322        | 35.0  | 105  | 7.6794          |
| 7.0653        | 36.0  | 108  | 7.3031          |
| 6.6853        | 37.0  | 111  | 6.9564          |
| 6.3257        | 38.0  | 114  | 6.7247          |
| 5.9869        | 39.0  | 117  | 6.4649          |
| 5.8618        | 40.0  | 120  | 6.2734          |
| 5.6025        | 41.0  | 123  | 6.1253          |
| 5.4913        | 42.0  | 126  | 6.0822          |
| 5.3086        | 43.0  | 129  | 5.8575          |
| 5.1904        | 44.0  | 132  | 5.6860          |
| 5.1193        | 45.0  | 135  | 5.6821          |
| 5.0846        | 46.0  | 138  | 5.5831          |
| 5.017         | 47.0  | 141  | 5.5245          |
| 4.7435        | 48.0  | 144  | 5.3877          |
| 4.7546        | 49.0  | 147  | 5.3523          |
| 4.8606        | 50.0  | 150  | 5.3845          |
| 4.7146        | 51.0  | 153  | 5.2239          |
| 4.6273        | 52.0  | 156  | 5.1927          |
| 4.4469        | 53.0  | 159  | 5.1898          |
| 4.5135        | 54.0  | 162  | 5.0846          |
| 4.4061        | 55.0  | 165  | 5.0756          |
| 4.3577        | 56.0  | 168  | 5.0474          |
| 4.2169        | 57.0  | 171  | 5.0125          |
| 4.3001        | 58.0  | 174  | 4.9770          |
| 4.2399        | 59.0  | 177  | 4.9469          |
| 4.3372        | 60.0  | 180  | 4.9162          |
| 4.2669        | 61.0  | 183  | 4.9166          |
| 4.2394        | 62.0  | 186  | 4.8618          |
| 4.2965        | 63.0  | 189  | 4.8595          |
| 4.1188        | 64.0  | 192  | 4.8285          |
| 4.2886        | 65.0  | 195  | 4.8265          |
| 4.2688        | 66.0  | 198  | 4.8103          |
| 4.2429        | 67.0  | 201  | 4.7904          |
| 3.9653        | 68.0  | 204  | 4.7787          |
| 4.2676        | 69.0  | 207  | 4.7604          |
| 4.2029        | 70.0  | 210  | 4.7588          |
| 4.0962        | 71.0  | 213  | 4.7560          |
| 4.0643        | 72.0  | 216  | 4.7449          |
| 4.0713        | 73.0  | 219  | 4.7341          |
| 4.1192        | 74.0  | 222  | 4.7275          |
| 4.135         | 75.0  | 225  | 4.7186          |
| 3.9914        | 76.0  | 228  | 4.7135          |
| 4.0225        | 77.0  | 231  | 4.7144          |
| 3.9907        | 78.0  | 234  | 4.7152          |
| 4.0444        | 79.0  | 237  | 4.7123          |
| 4.0321        | 80.0  | 240  | 4.7109          |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0
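
As a hedged illustration of the training procedure, the sketch below maps the hyperparameters listed above onto `transformers.TrainingArguments`. It is not the actual training script: the model, datasets, and output directory are placeholders, the per-device batch size and per-epoch evaluation schedule are inferred from the hyperparameters and the results table, and only the listed values are taken from this card.

```python
# Sketch of the training configuration implied by the hyperparameters above,
# assuming single-device training with the standard Trainer API
# (Transformers 4.39.x). Model and datasets are hypothetical placeholders.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="Baby-Llama-58M",      # placeholder output path
    learning_rate=2.5e-4,             # learning_rate: 0.00025
    per_device_train_batch_size=128,  # train_batch_size: 128 (assumed per-device)
    per_device_eval_batch_size=8,     # eval_batch_size: 8
    seed=42,                          # seed: 42
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                # Adam epsilon=1e-08
    lr_scheduler_type="cosine",       # lr_scheduler_type: cosine
    warmup_steps=50,                  # lr_scheduler_warmup_steps: 50
    num_train_epochs=80,              # num_epochs: 80
    fp16=True,                        # mixed_precision_training: Native AMP
    evaluation_strategy="epoch",      # assumption: the results table shows one eval per epoch
)

# With a model and datasets in hand, training would look like:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```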
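
For loading the model, a minimal generation sketch follows. The Hub repository id is a hypothetical placeholder, since this card does not state where the checkpoint is published.

```python
# Minimal usage sketch. Replace the repo id below with the actual
# Hugging Face Hub id under which Baby-Llama-58M is published.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/Baby-Llama-58M"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```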