---
tags:
  - generated_from_trainer
model-index:
  - name: Llama-360M
    results: []
---

# Llama-360M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 3.7456
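
For reference, assuming this loss is the mean per-token cross-entropy in nats (the `transformers` default for causal language modeling evaluation), it corresponds to a perplexity of roughly:

$$\mathrm{PPL} = e^{\mathrm{loss}} = e^{3.7456} \approx 42.3$$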

## Model description

More information needed

## Intended uses & limitations

More information needed
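
Pending author-provided guidance, here is a minimal generation sketch. The repo id is an assumption inferred from this card's location and may not match the actual checkpoint path, and the prompt is arbitrary:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (taken from this card's location); swap in the actual
# checkpoint path if it differs.
repo_id = "ninagroot/Llama-360M-RUN3"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```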

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 40
- mixed_precision_training: Native AMP
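
A sketch of how these settings map onto `transformers.TrainingArguments`, for readers reproducing the run; `output_dir` is a placeholder and `fp16=True` is an assumption based on "Native AMP":

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="Llama-360M",        # placeholder, not from the card
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 16 x 8 = 128 total train batch size
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=40,
    fp16=True,                      # "Native AMP" assumed to mean fp16
)
```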

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 8.4691        | 1.0   | 3    | 8.2938          |
| 7.3138        | 2.0   | 6    | 7.2507          |
| 6.3593        | 3.0   | 9    | 6.5011          |
| 5.9241        | 4.0   | 12   | 6.1371          |
| 5.3868        | 5.0   | 15   | 5.4600          |
| 4.5288        | 6.0   | 18   | 4.9927          |
| 4.1209        | 7.0   | 21   | 4.6527          |
| 3.9354        | 8.0   | 24   | 4.3246          |
| 3.6035        | 9.0   | 27   | 4.0905          |
| 3.1919        | 10.0  | 30   | 3.8428          |
| 2.9556        | 11.0  | 33   | 3.6960          |
| 2.8387        | 12.0  | 36   | 3.6594          |
| 2.4002        | 13.0  | 39   | 3.5849          |
| 2.3326        | 14.0  | 42   | 3.5220          |
| 1.8688        | 15.0  | 45   | 3.5476          |
| 1.5608        | 16.0  | 48   | 3.5408          |
| 1.2246        | 17.0  | 51   | 3.5744          |
| 0.8367        | 18.0  | 54   | 3.5714          |
| 0.5211        | 19.0  | 57   | 3.5859          |
| 0.4086        | 20.0  | 60   | 3.6011          |
| 0.2747        | 21.0  | 63   | 3.6640          |
| 0.1623        | 22.0  | 66   | 3.6834          |
| 0.1227        | 23.0  | 69   | 3.6675          |
| 0.0901        | 24.0  | 72   | 3.7159          |
| 0.0632        | 25.0  | 75   | 3.7165          |
| 0.0448        | 26.0  | 78   | 3.7104          |
| 0.039         | 27.0  | 81   | 3.7026          |
| 0.0296        | 28.0  | 84   | 3.6959          |
| 0.0269        | 29.0  | 87   | 3.7012          |
| 0.0232        | 30.0  | 90   | 3.7212          |
| 0.0225        | 31.0  | 93   | 3.7328          |
| 0.0214        | 32.0  | 96   | 3.7393          |
| 0.0182        | 33.0  | 99   | 3.7409          |
| 0.0176        | 34.0  | 102  | 3.7432          |
| 0.0175        | 35.0  | 105  | 3.7441          |
| 0.0147        | 36.0  | 108  | 3.7449          |
| 0.0146        | 37.0  | 111  | 3.7453          |
| 0.0172        | 38.0  | 114  | 3.7455          |
| 0.0147        | 39.0  | 117  | 3.7455          |
| 0.0153        | 40.0  | 120  | 3.7456          |
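
Note that the validation loss reaches its minimum (3.5220) at epoch 14 and drifts upward afterwards while the training loss falls toward zero, a pattern consistent with overfitting; if intermediate checkpoints were saved, the epoch-14 checkpoint may be the strongest on held-out data.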

### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0