---
tags:
- generated_from_trainer
model-index:
- name: Baby-Llama-58M
  results: []
---


# Baby-Llama-58M

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 4.9058
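
For readers who prefer perplexity, the reported loss can be converted as sketched below. This assumes the loss is a mean per-token cross-entropy in nats, which the card does not confirm. The early-epoch values in the hundreds in the training results are unusually large for such a mean, so treat the result as illustrative only.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 4.9058

# Assumption: the loss is a mean per-token cross-entropy in nats,
# in which case perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.1f}")
```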

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP
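
To make the scheduler settings above more concrete, the following sketch computes the learning rate at a few points in the run. It assumes 240 optimizer steps in total (the last step in the training results table), linear warmup from zero, and a single half-cosine decay to zero, which matches the default behavior of `get_cosine_schedule_with_warmup` in Transformers. The function name `lr_at_step` is hypothetical and not part of any library.

```python
import math

PEAK_LR = 2.5e-4    # learning_rate from the list above
WARMUP_STEPS = 50   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 240   # assumption: 80 epochs x 3 steps per epoch

def lr_at_step(step: int) -> float:
    """Learning rate after `step` optimizer steps (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 25, 50, 145, 240):
    print(step, f"{lr_at_step(step):.2e}")
```

Under these assumptions, warmup covers roughly the first 17 of the 80 epochs, so the peak learning rate is only reached about a fifth of the way into training.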

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 308.4964      | 1.0   | 3    | 274.9261        |
| 307.2173      | 2.0   | 6    | 270.1939        |
| 293.1988      | 3.0   | 9    | 254.5227        |
| 274.059       | 4.0   | 12   | 241.7988        |
| 254.2515      | 5.0   | 15   | 224.8893        |
| 242.4326      | 6.0   | 18   | 214.8814        |
| 235.586       | 7.0   | 21   | 208.6857        |
| 235.9312      | 8.0   | 24   | 202.9560        |
| 224.2102      | 9.0   | 27   | 196.3082        |
| 215.8342      | 10.0  | 30   | 188.9904        |
| 206.017       | 11.0  | 33   | 180.7418        |
| 186.8781      | 12.0  | 36   | 168.0520        |
| 172.4825      | 13.0  | 39   | 145.3422        |
| 152.0806      | 14.0  | 42   | 126.3429        |
| 127.6911      | 15.0  | 45   | 111.5025        |
| 114.9669      | 16.0  | 48   | 99.2848         |
| 105.7803      | 17.0  | 51   | 91.4366         |
| 96.6882       | 18.0  | 54   | 83.6074         |
| 85.8417       | 19.0  | 57   | 74.4550         |
| 74.8959       | 20.0  | 60   | 64.7636         |
| 65.7121       | 21.0  | 63   | 56.4248         |
| 54.3815       | 22.0  | 66   | 48.4127         |
| 47.917        | 23.0  | 69   | 40.9706         |
| 39.5198       | 24.0  | 72   | 34.3440         |
| 33.711        | 25.0  | 75   | 28.6207         |
| 27.3896       | 26.0  | 78   | 23.5210         |
| 23.4138       | 27.0  | 81   | 19.5687         |
| 18.9363       | 28.0  | 84   | 16.8098         |
| 16.6662       | 29.0  | 87   | 14.3299         |
| 13.9003       | 30.0  | 90   | 12.4524         |
| 12.0831       | 31.0  | 93   | 11.2232         |
| 10.505        | 32.0  | 96   | 10.0853         |
| 9.5992        | 33.0  | 99   | 9.3580          |
| 8.8814        | 34.0  | 102  | 8.9046          |
| 7.9504        | 35.0  | 105  | 8.1708          |
| 7.3651        | 36.0  | 108  | 7.7294          |
| 6.8279        | 37.0  | 111  | 7.2767          |
| 6.507         | 38.0  | 114  | 7.0724          |
| 6.228         | 39.0  | 117  | 6.9470          |
| 6.0787        | 40.0  | 120  | 6.5948          |
| 5.7443        | 41.0  | 123  | 6.4305          |
| 5.607         | 42.0  | 126  | 6.2583          |
| 5.3911        | 43.0  | 129  | 6.0870          |
| 5.2864        | 44.0  | 132  | 5.9922          |
| 5.2063        | 45.0  | 135  | 5.8702          |
| 5.1295        | 46.0  | 138  | 5.7636          |
| 5.0156        | 47.0  | 141  | 5.7078          |
| 4.7705        | 48.0  | 144  | 5.7188          |
| 4.8265        | 49.0  | 147  | 5.5697          |
| 4.8814        | 50.0  | 150  | 5.4942          |
| 4.7241        | 51.0  | 153  | 5.4862          |
| 4.6709        | 52.0  | 156  | 5.4192          |
| 4.473         | 53.0  | 159  | 5.3817          |
| 4.5304        | 54.0  | 162  | 5.3086          |
| 4.4462        | 55.0  | 165  | 5.2772          |
| 4.3478        | 56.0  | 168  | 5.2420          |
| 4.1911        | 57.0  | 171  | 5.2188          |
| 4.3088        | 58.0  | 174  | 5.1736          |
| 4.2529        | 59.0  | 177  | 5.1341          |
| 4.3505        | 60.0  | 180  | 5.1085          |
| 4.2754        | 61.0  | 183  | 5.0898          |
| 4.2691        | 62.0  | 186  | 5.0628          |
| 4.3049        | 63.0  | 189  | 5.0646          |
| 4.1317        | 64.0  | 192  | 5.0228          |
| 4.2919        | 65.0  | 195  | 5.0214          |
| 4.2777        | 66.0  | 198  | 4.9936          |
| 4.2473        | 67.0  | 201  | 4.9851          |
| 3.9754        | 68.0  | 204  | 4.9721          |
| 4.2845        | 69.0  | 207  | 4.9520          |
| 4.1962        | 70.0  | 210  | 4.9529          |
| 4.0952        | 71.0  | 213  | 4.9481          |
| 4.0827        | 72.0  | 216  | 4.9285          |
| 4.0752        | 73.0  | 219  | 4.9251          |
| 4.1187        | 74.0  | 222  | 4.9239          |
| 4.144         | 75.0  | 225  | 4.9110          |
| 4.0002        | 76.0  | 228  | 4.9076          |
| 4.0264        | 77.0  | 231  | 4.9095          |
| 4.0018        | 78.0  | 234  | 4.9098          |
| 4.052         | 79.0  | 237  | 4.9071          |
| 4.0436        | 80.0  | 240  | 4.9058          |
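
A quick way to read the tail of this table is to look at the best validation epoch and the gap between training and validation loss. The sketch below does this for the last five rows, copied by hand from the table; the variable names are illustrative.

```python
# (epoch, training loss, validation loss) copied from the last five rows above.
rows = [
    (76, 4.0002, 4.9076),
    (77, 4.0264, 4.9095),
    (78, 4.0018, 4.9098),
    (79, 4.0520, 4.9071),
    (80, 4.0436, 4.9058),
]

# Row with the lowest validation loss.
best_epoch, best_train, best_val = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at that epoch.
gap = best_val - best_train

# How much the validation loss moved across these five epochs.
val_spread = max(r[2] for r in rows) - min(r[2] for r in rows)

print(f"best epoch: {best_epoch}, val loss: {best_val:.4f}")
print(f"train/val gap: {gap:.4f}, val spread: {val_spread:.4f}")
```

Across these rows the validation loss varies only in the third decimal place, while it stays well above the training loss.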


### Framework versions

- Transformers 4.39.1
- PyTorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0