---
tags:
- generated_from_trainer
model-index:
- name: Baby-Llama-58M
  results: []
---

# Baby-Llama-58M

This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 6.1610

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 81.7512 | 1.0 | 2 | 74.4291 |
| 81.3083 | 2.0 | 4 | 73.3596 |
| 78.6216 | 3.0 | 6 | 71.5365 |
| 80.396 | 4.0 | 8 | 70.3538 |
| 75.3713 | 5.0 | 10 | 67.4044 |
| 74.0418 | 6.0 | 12 | 64.0233 |
| 70.1637 | 7.0 | 14 | 60.8437 |
| 67.5864 | 8.0 | 16 | 57.9300 |
| 64.8984 | 9.0 | 18 | 55.0383 |
| 61.2535 | 10.0 | 20 | 52.0253 |
| 57.6171 | 11.0 | 22 | 48.9365 |
| 54.2922 | 12.0 | 24 | 45.8747 |
| 50.3849 | 13.0 | 26 | 43.0132 |
| 49.0703 | 14.0 | 28 | 40.4715 |
| 45.5158 | 15.0 | 30 | 38.1415 |
| 44.3002 | 16.0 | 32 | 35.9572 |
| 41.2208 | 17.0 | 34 | 33.8684 |
| 39.8837 | 18.0 | 36 | 31.8991 |
| 38.1152 | 19.0 | 38 | 29.8574 |
| 35.239 | 20.0 | 40 | 28.0249 |
| 33.6748 | 21.0 | 42 | 26.4792 |
| 30.4729 | 22.0 | 44 | 25.4216 |
| 29.436 | 23.0 | 46 | 24.1119 |
| 27.72 | 24.0 | 48 | 22.8196 |
| 25.5231 | 25.0 | 50 | 21.7862 |
| 24.8119 | 26.0 | 52 | 20.4891 |
| 23.3658 | 27.0 | 54 | 19.3795 |
| 21.4143 | 28.0 | 56 | 18.1634 |
| 20.032 | 29.0 | 58 | 17.0348 |
| 18.43 | 30.0 | 60 | 16.1163 |
| 16.897 | 31.0 | 62 | 15.2508 |
| 15.7483 | 32.0 | 64 | 14.3147 |
| 15.1794 | 33.0 | 66 | 13.5753 |
| 13.7129 | 34.0 | 68 | 12.8868 |
| 12.6031 | 35.0 | 70 | 12.6810 |
| 11.8192 | 36.0 | 72 | 11.9060 |
| 11.6487 | 37.0 | 74 | 11.3454 |
| 10.9525 | 38.0 | 76 | 10.8465 |
| 10.2164 | 39.0 | 78 | 10.1026 |
| 9.5492 | 40.0 | 80 | 9.6511 |
| 9.0438 | 41.0 | 82 | 9.2800 |
| 8.6141 | 42.0 | 84 | 8.8036 |
| 7.9373 | 43.0 | 86 | 8.6612 |
| 7.5371 | 44.0 | 88 | 8.1757 |
| 7.3186 | 45.0 | 90 | 8.1665 |
| 7.033 | 46.0 | 92 | 7.7424 |
| 6.7923 | 47.0 | 94 | 7.6650 |
| 6.4384 | 48.0 | 96 | 7.4306 |
| 6.2449 | 49.0 | 98 | 7.4175 |
| 6.1012 | 50.0 | 100 | 7.1466 |
| 6.0502 | 51.0 | 102 | 7.1740 |
| 5.7839 | 52.0 | 104 | 6.9619 |
| 5.6905 | 53.0 | 106 | 6.9416 |
| 5.665 | 54.0 | 108 | 6.7945 |
| 5.5401 | 55.0 | 110 | 6.7485 |
| 5.4773 | 56.0 | 112 | 6.6674 |
| 5.4169 | 57.0 | 114 | 6.6132 |
| 5.3628 | 58.0 | 116 | 6.5787 |
| 5.2021 | 59.0 | 118 | 6.4972 |
| 5.2817 | 60.0 | 120 | 6.4866 |
| 5.1901 | 61.0 | 122 | 6.4256 |
| 5.1268 | 62.0 | 124 | 6.3659 |
| 5.1105 | 63.0 | 126 | 6.3563 |
| 5.0539 | 64.0 | 128 | 6.3159 |
| 4.9715 | 65.0 | 130 | 6.3178 |
| 4.872 | 66.0 | 132 | 6.2741 |
| 4.9422 | 67.0 | 134 | 6.2699 |
| 4.944 | 68.0 | 136 | 6.2551 |
| 4.9487 | 69.0 | 138 | 6.2148 |
| 4.8968 | 70.0 | 140 | 6.2089 |
| 4.822 | 71.0 | 142 | 6.2093 |
| 4.965 | 72.0 | 144 | 6.1853 |
| 4.8401 | 73.0 | 146 | 6.1747 |
| 4.8539 | 74.0 | 148 | 6.1738 |
| 4.7751 | 75.0 | 150 | 6.1674 |
| 4.8871 | 76.0 | 152 | 6.1644 |
| 4.9347 | 77.0 | 154 | 6.1618 |
| 4.8009 | 78.0 | 156 | 6.1613 |
| 4.8121 | 79.0 | 158 | 6.1610 |
| 4.8048 | 80.0 | 160 | 6.1610 |
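
For reference, the hyperparameters listed above can be expressed as a `transformers.TrainingArguments` configuration. This is a minimal sketch, not the original training script: the output directory is a placeholder, and the `Trainer` default optimizer (AdamW with the listed betas and epsilon) is assumed to stand in for the "Adam" entry on the card.

```python
from transformers import TrainingArguments

# Minimal sketch of the listed hyperparameters as TrainingArguments.
# "output_dir" is a placeholder; the card says "Adam", for which the
# Trainer default (AdamW with the same betas/epsilon) is the closest match.
training_args = TrainingArguments(
    output_dir="baby-llama-58m",      # placeholder path
    learning_rate=2.5e-4,             # 0.00025
    per_device_train_batch_size=128,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=80,
    fp16=True,                        # native AMP mixed precision
)
```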
### Framework versions

- Transformers 4.39.1
- PyTorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0
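
## How to use

A minimal loading sketch using the library versions above. The repository id `Baby-Llama-58M` is a placeholder, since the card does not state where the model is hosted; substitute the actual Hugging Face Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "Baby-Llama-58M" is a placeholder repo id; replace it with the
# model's actual Hugging Face Hub path.
tokenizer = AutoTokenizer.from_pretrained("Baby-Llama-58M")
model = AutoModelForCausalLM.from_pretrained("Baby-Llama-58M")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```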