---
tags:
- generated_from_trainer
model-index:
- name: Baby-Llama-58M
  results: []
---


# Baby-Llama-58M

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset.
It achieves the following result on the evaluation set after the final epoch:
- Loss: 6.7221

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP
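
The `cosine` scheduler with 50 warmup steps can be reproduced in a few lines to see which learning rate was in effect at a given step. The sketch below assumes the Transformers default shape for `lr_scheduler_type: cosine`: a linear warmup from 0 to the peak rate, then a half-cosine decay to 0 (`num_cycles=0.5`, no minimum learning rate). It takes the total of 640 optimizer steps from the last row of the training results table (80 epochs × 8 steps). The function name `cosine_lr` is illustrative, not a library API.

```python
import math


def cosine_lr(step: int, base_lr: float = 2.5e-4, warmup_steps: int = 50,
              total_steps: int = 640, num_cycles: float = 0.5) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay.

    Assumption: mirrors the shape of Transformers' cosine-with-warmup
    schedule with its default num_cycles=0.5 and no learning-rate floor.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period takes the multiplier from 1 down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


if __name__ == "__main__":
    for step in (0, 25, 50, 345, 640):
        print(step, f"{cosine_lr(step):.3e}")
```

Under these assumptions the rate peaks at 2.5e-4 at step 50, has halved by step 345 (the midpoint of the decay phase), and reaches 0 at step 640.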

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 135.1538      | 1.0   | 8    | 118.8448        |
| 112.3406      | 2.0   | 16   | 102.1364        |
| 107.9124      | 3.0   | 24   | 86.8275         |
| 85.5837       | 4.0   | 32   | 71.8709         |
| 82.7059       | 5.0   | 40   | 60.4278         |
| 62.0973       | 6.0   | 48   | 51.7763         |
| 56.6325       | 7.0   | 56   | 44.4392         |
| 46.5864       | 8.0   | 64   | 39.5206         |
| 40.749        | 9.0   | 72   | 36.8323         |
| 34.1225       | 10.0  | 80   | 30.4178         |
| 26.3662       | 11.0  | 88   | 25.6518         |
| 21.4543       | 12.0  | 96   | 21.5034         |
| 17.4064       | 13.0  | 104  | 18.2917         |
| 14.5338       | 14.0  | 112  | 16.0543         |
| 12.8652       | 15.0  | 120  | 14.5666         |
| 11.1266       | 16.0  | 128  | 13.6536         |
| 9.5181        | 17.0  | 136  | 12.6228         |
| 8.0769        | 18.0  | 144  | 11.2297         |
| 7.3252        | 19.0  | 152  | 10.6871         |
| 6.7225        | 20.0  | 160  | 10.5576         |
| 6.1834        | 21.0  | 168  | 9.6600          |
| 6.0954        | 22.0  | 176  | 9.5832          |
| 5.715         | 23.0  | 184  | 9.4159          |
| 5.5297        | 24.0  | 192  | 8.8495          |
| 5.1538        | 25.0  | 200  | 8.6964          |
| 5.0472        | 26.0  | 208  | 8.4671          |
| 5.0581        | 27.0  | 216  | 8.3979          |
| 4.6914        | 28.0  | 224  | 8.2086          |
| 4.6117        | 29.0  | 232  | 8.2212          |
| 4.5157        | 30.0  | 240  | 8.1633          |
| 4.1918        | 31.0  | 248  | 8.1399          |
| 4.5274        | 32.0  | 256  | 7.7368          |
| 4.0493        | 33.0  | 264  | 7.7647          |
| 4.2799        | 34.0  | 272  | 7.8127          |
| 4.5331        | 35.0  | 280  | 7.6971          |
| 4.5937        | 36.0  | 288  | 7.6908          |
| 3.9957        | 37.0  | 296  | 7.6509          |
| 4.3035        | 38.0  | 304  | 7.5682          |
| 4.2626        | 39.0  | 312  | 7.4550          |
| 3.7238        | 40.0  | 320  | 7.4516          |
| 3.9562        | 41.0  | 328  | 7.2862          |
| 3.8612        | 42.0  | 336  | 7.3332          |
| 3.6178        | 43.0  | 344  | 7.3013          |
| 3.7672        | 44.0  | 352  | 7.2144          |
| 3.715         | 45.0  | 360  | 7.2103          |
| 3.7594        | 46.0  | 368  | 7.2457          |
| 4.3614        | 47.0  | 376  | 7.1274          |
| 4.0406        | 48.0  | 384  | 7.0472          |
| 3.5213        | 49.0  | 392  | 6.9963          |
| 3.7373        | 50.0  | 400  | 7.0503          |
| 3.7399        | 51.0  | 408  | 6.9916          |
| 3.8109        | 52.0  | 416  | 6.9899          |
| 3.3897        | 53.0  | 424  | 6.9132          |
| 3.2456        | 54.0  | 432  | 6.9393          |
| 3.8682        | 55.0  | 440  | 6.9017          |
| 3.3904        | 56.0  | 448  | 6.8995          |
| 3.8449        | 57.0  | 456  | 6.8478          |
| 3.6319        | 58.0  | 464  | 6.8388          |
| 3.4726        | 59.0  | 472  | 6.8123          |
| 3.5895        | 60.0  | 480  | 6.8452          |
| 3.4           | 61.0  | 488  | 6.7875          |
| 3.6904        | 62.0  | 496  | 6.7963          |
| 3.3957        | 63.0  | 504  | 6.7976          |
| 3.4602        | 64.0  | 512  | 6.8317          |
| 3.2714        | 65.0  | 520  | 6.8063          |
| 3.5695        | 66.0  | 528  | 6.7709          |
| 3.1538        | 67.0  | 536  | 6.7849          |
| 3.5586        | 68.0  | 544  | 6.7565          |
| 3.194         | 69.0  | 552  | 6.7629          |
| 3.0488        | 70.0  | 560  | 6.7462          |
| 3.6931        | 71.0  | 568  | 6.7269          |
| 3.7324        | 72.0  | 576  | 6.7367          |
| 3.2075        | 73.0  | 584  | 6.7460          |
| 3.3394        | 74.0  | 592  | 6.7111          |
| 3.4074        | 75.0  | 600  | 6.7456          |
| 3.3679        | 76.0  | 608  | 6.7225          |
| 3.2689        | 77.0  | 616  | 6.7234          |
| 3.6886        | 78.0  | 624  | 6.7247          |
| 3.4587        | 79.0  | 632  | 6.7224          |
| 3.6444        | 80.0  | 640  | 6.7221          |
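
The evaluation loss reported at the top of this card (6.7221) is the value after the last epoch; the lowest validation loss in the table, 6.7111, occurs at epoch 74 (step 592). A small helper like the one below can locate that minimum and the final train/validation gap. For brevity it copies only the rows for epochs 70 to 80; `ROWS` and `best_epoch` are illustrative names, and the full table can be pasted in the same tuple format.

```python
# (training_loss, epoch, step, validation_loss), copied from the last rows of the table above.
ROWS = [
    (3.0488, 70.0, 560, 6.7462),
    (3.6931, 71.0, 568, 6.7269),
    (3.7324, 72.0, 576, 6.7367),
    (3.2075, 73.0, 584, 6.7460),
    (3.3394, 74.0, 592, 6.7111),
    (3.4074, 75.0, 600, 6.7456),
    (3.3679, 76.0, 608, 6.7225),
    (3.2689, 77.0, 616, 6.7234),
    (3.6886, 78.0, 624, 6.7247),
    (3.4587, 79.0, 632, 6.7224),
    (3.6444, 80.0, 640, 6.7221),
]


def best_epoch(rows):
    """Return (epoch, step, validation_loss) for the row with the lowest validation loss."""
    _, epoch, step, val_loss = min(rows, key=lambda r: r[3])
    return epoch, step, val_loss


if __name__ == "__main__":
    epoch, step, val_loss = best_epoch(ROWS)
    final = ROWS[-1]
    print(f"best: epoch {epoch:.0f}, step {step}, val loss {val_loss}")
    print(f"final: epoch {final[1]:.0f}, val loss {final[3]}, "
          f"train/val gap {final[3] - final[0]:.4f}")
```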


### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0
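
When trying to reproduce these results, it can help to confirm that the local environment matches the versions listed above. The sketch below uses the standard-library `importlib.metadata` module; `EXPECTED` and `version_mismatches` are illustrative names, and it assumes PyTorch is installed under its pip distribution name `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; "torch" is the pip distribution name for PyTorch.
EXPECTED = {
    "transformers": "4.39.1",
    "torch": "2.1.2+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
}


def version_mismatches(expected=EXPECTED):
    """Map each package whose installed version differs to (expected, installed).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: card lists {wanted}, found {installed}")
```

The comparison is an exact string match, so a CPU-only or differently built PyTorch wheel (for example one without the `+cu121` suffix) will be reported as a mismatch even if the base version agrees.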