---
tags:
- generated_from_trainer
model-index:
- name: Baby-Llama-58M-RUN3_4
  results: []
---

# Baby-Llama-58M-RUN3_4

This model was trained with the Hugging Face `Trainer`. The auto-generated card records neither a base model nor a dataset name.
It achieves the following results on the evaluation set:
- Loss: 4.1614
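
For a language model, an evaluation loss is often easier to interpret as a perplexity. The sketch below converts the reported loss, assuming it is the mean per-token cross-entropy in nats (the usual `Trainer` output for causal language modeling, though this card does not confirm the objective). Under that assumption the perplexity comes out at roughly 64.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 4.1614

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity: {perplexity:.1f}")
```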

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.00025
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 50
- mixed_precision_training: Native AMP
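
The values above map onto `transformers.TrainingArguments` keywords roughly as sketched below. This is a reconstruction under stated assumptions, not the original training script: `output_dir` is a hypothetical path, `train_batch_size` is taken to be the per-device batch size, and "Native AMP" is taken to mean `fp16=True`. The helper `build_training_args` is illustrative and imports `transformers` only when called.

```python
# Keyword names follow transformers.TrainingArguments; values come from this card.
training_kwargs = {
    "output_dir": "Baby-Llama-58M-RUN3_4",  # hypothetical path, not stated in the card
    "learning_rate": 2.5e-4,
    "per_device_train_batch_size": 32,      # assumption: card value is per device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 50,
    "num_train_epochs": 50,
    "fp16": True,                           # assumption: "Native AMP" means fp16; needs a GPU
}


def build_training_args(**overrides):
    """Illustrative helper: build TrainingArguments from the values above."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(**{**training_kwargs, **overrides})
```

On a machine without a CUDA device, `build_training_args(fp16=False)` avoids the mixed-precision requirement.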

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 296.6382      | 1.0   | 12   | 255.9934        |
| 229.3247      | 2.0   | 24   | 211.9065        |
| 207.0496      | 3.0   | 36   | 181.8885        |
| 123.9346      | 4.0   | 48   | 109.3748        |
| 82.0349       | 5.0   | 60   | 72.1227         |
| 45.9392       | 6.0   | 72   | 39.7369         |
| 25.2634       | 7.0   | 84   | 22.4471         |
| 15.2842       | 8.0   | 96   | 13.8333         |
| 10.3515       | 9.0   | 108  | 10.2077         |
| 8.1678        | 10.0  | 120  | 7.8930          |
| 6.461         | 11.0  | 132  | 6.9546          |
| 6.073         | 12.0  | 144  | 6.3275          |
| 5.4812        | 13.0  | 156  | 5.9462          |
| 5.5237        | 14.0  | 168  | 5.6727          |
| 4.727         | 15.0  | 180  | 5.5723          |
| 4.6544        | 16.0  | 192  | 5.2316          |
| 4.641         | 17.0  | 204  | 5.2542          |
| 4.5579        | 18.0  | 216  | 5.1794          |
| 4.6136        | 19.0  | 228  | 4.9774          |
| 4.1043        | 20.0  | 240  | 4.9214          |
| 4.1177        | 21.0  | 252  | 4.8358          |
| 4.6799        | 22.0  | 264  | 4.7847          |
| 4.0522        | 23.0  | 276  | 4.7018          |
| 4.2287        | 24.0  | 288  | 4.6770          |
| 3.9668        | 25.0  | 300  | 4.6077          |
| 4.1524        | 26.0  | 312  | 4.6043          |
| 3.8744        | 27.0  | 324  | 4.5508          |
| 3.9389        | 28.0  | 336  | 4.4908          |
| 3.9329        | 29.0  | 348  | 4.4882          |
| 3.9034        | 30.0  | 360  | 4.4708          |
| 3.9221        | 31.0  | 372  | 4.4729          |
| 3.8269        | 32.0  | 384  | 4.3710          |
| 3.8344        | 33.0  | 396  | 4.3734          |
| 3.3988        | 34.0  | 408  | 4.2938          |
| 3.4335        | 35.0  | 420  | 4.3189          |
| 3.521         | 36.0  | 432  | 4.2749          |
| 3.5696        | 37.0  | 444  | 4.2773          |
| 3.6298        | 38.0  | 456  | 4.2541          |
| 3.6759        | 39.0  | 468  | 4.2371          |
| 3.6787        | 40.0  | 480  | 4.2151          |
| 3.3474        | 41.0  | 492  | 4.1932          |
| 3.5124        | 42.0  | 504  | 4.1978          |
| 3.1906        | 43.0  | 516  | 4.1859          |
| 3.4355        | 44.0  | 528  | 4.1770          |
| 3.3138        | 45.0  | 540  | 4.1743          |
| 3.6061        | 46.0  | 552  | 4.1742          |
| 3.8685        | 47.0  | 564  | 4.1653          |
| 3.4448        | 48.0  | 576  | 4.1635          |
| 3.5253        | 49.0  | 588  | 4.1623          |
| 3.6948        | 50.0  | 600  | 4.1614          |
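
The table ends at step 600 (50 epochs of 12 steps each), and the hyperparameters specify a cosine schedule with 50 warmup steps. The sketch below shows the learning rate such a schedule would produce at a given step. It assumes linear warmup from zero followed by a single half-cosine decay to zero, which is the default shape of `get_cosine_schedule_with_warmup` in Transformers; the function name `lr_at` is illustrative.

```python
import math

BASE_LR = 2.5e-4     # learning_rate from this card
WARMUP_STEPS = 50    # lr_scheduler_warmup_steps from this card
TOTAL_STEPS = 600    # last step in the training results table


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from BASE_LR to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 25, 50, 325, 600):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at step 50, falls to half its peak at step 325, and reaches zero at step 600, which is consistent with the validation loss flattening over the last few epochs.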


### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0