metadata
library_name: transformers
tags:
  - generated_from_trainer
model-index:
  - name: TBD-LLaMA-500M-Final-Direction-500M
    results: []

TBD-LLaMA-500M-Final-Direction-500M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.6898
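For readers who prefer perplexity, the reported loss can be converted with a short calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models trained with the Trainer but is not stated in this card. The function name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Final evaluation loss reported in this card.
final_eval_loss = 3.6898

# Assumption: the loss is a natural-log cross-entropy averaged over tokens.
print(f"perplexity: {perplexity(final_eval_loss):.2f}")
```

Under that assumption, a loss of 3.6898 corresponds to a perplexity of about 40.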

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 173
  • training_steps: 17360
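The batch-size totals and the learning-rate schedule listed above can be reproduced with a few lines of arithmetic. The sketch below is not the training code. It assumes the schedule follows the common pattern of linear warmup from zero followed by a half-cosine decay to zero, which is how the cosine scheduler in Transformers behaves by default. The function names are illustrative.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 1e-4
PER_DEVICE_TRAIN_BATCH = 1
PER_DEVICE_EVAL_BATCH = 1
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 16
WARMUP_STEPS = 173
TRAINING_STEPS = 17360


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def lr_at_step(step: int,
               base_lr: float = LEARNING_RATE,
               warmup_steps: int = WARMUP_STEPS,
               total_steps: int = TRAINING_STEPS) -> float:
    """Assumed schedule: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))  # total_train_batch_size
print(effective_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))  # total_eval_batch_size
for s in (0, WARMUP_STEPS, TRAINING_STEPS // 2, TRAINING_STEPS):
    print(s, lr_at_step(s))
```

The first two values match total_train_batch_size (1 × 4 × 16 = 64) and total_eval_batch_size (1 × 4 = 4). Under the assumed schedule, the learning rate peaks at 0.0001 at step 173 and decays to zero at step 17360.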

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.1404 | 0.0115 | 200 | 7.0388 |
| 6.6918 | 0.0230 | 400 | 6.6803 |
| 6.6408 | 0.0346 | 600 | 6.6217 |
| 6.5923 | 0.0461 | 800 | 6.5769 |
| 6.561 | 0.0576 | 1000 | 6.5328 |
| 6.4665 | 0.0691 | 1200 | 6.4522 |
| 6.2754 | 0.0806 | 1400 | 6.2240 |
| 5.7839 | 0.0922 | 1600 | 5.6270 |
| 5.161 | 0.1037 | 1800 | 4.9428 |
| 4.78 | 0.1152 | 2000 | 4.6158 |
| 4.6603 | 0.1267 | 2200 | 4.4470 |
| 4.554 | 0.1382 | 2400 | 4.3597 |
| 4.4901 | 0.1498 | 2600 | 4.3011 |
| 4.416 | 0.1613 | 2800 | 4.2596 |
| 4.3742 | 0.1728 | 3000 | 4.2117 |
| 4.2719 | 0.1843 | 3200 | 4.1822 |
| 4.2189 | 0.1959 | 3400 | 4.1528 |
| 4.2023 | 0.2074 | 3600 | 4.1241 |
| 4.2014 | 0.2189 | 3800 | 4.1053 |
| 4.1731 | 0.2304 | 4000 | 4.0840 |
| 4.1578 | 0.2419 | 4200 | 4.0602 |
| 4.1387 | 0.2535 | 4400 | 4.0537 |
| 4.0884 | 0.2650 | 4600 | 4.0289 |
| 4.0962 | 0.2765 | 4800 | 4.0129 |
| 4.0141 | 0.2880 | 5000 | 4.0047 |
| 4.0292 | 0.2995 | 5200 | 3.9874 |
| 4.0243 | 0.3111 | 5400 | 3.9742 |
| 4.001 | 0.3226 | 5600 | 3.9644 |
| 3.9626 | 0.3341 | 5800 | 3.9509 |
| 3.9376 | 0.3456 | 6000 | 3.9434 |
| 3.9762 | 0.3571 | 6200 | 3.9331 |
| 4.0447 | 0.3687 | 6400 | 3.9221 |
| 3.977 | 0.3802 | 6600 | 3.9098 |
| 4.0106 | 0.3917 | 6800 | 3.9016 |
| 3.9686 | 0.4032 | 7000 | 3.8928 |
| 3.9114 | 0.4147 | 7200 | 3.8835 |
| 3.9024 | 0.4263 | 7400 | 3.8755 |
| 3.9965 | 0.4378 | 7600 | 3.8659 |
| 4.0031 | 0.4493 | 7800 | 3.8594 |
| 3.9794 | 0.4608 | 8000 | 3.8530 |
| 3.855 | 0.4723 | 8200 | 3.8455 |
| 3.8848 | 0.4839 | 8400 | 3.8365 |
| 3.8435 | 0.4954 | 8600 | 3.8292 |
| 3.9157 | 0.5069 | 8800 | 3.8207 |
| 3.938 | 0.5184 | 9000 | 3.8147 |
| 3.8188 | 0.5299 | 9200 | 3.8088 |
| 3.864 | 0.5415 | 9400 | 3.8125 |
| 3.8439 | 0.5530 | 9600 | 3.7972 |
| 3.8419 | 0.5645 | 9800 | 3.7906 |
| 3.8761 | 0.5760 | 10000 | 3.7852 |
| 3.7693 | 0.5876 | 10200 | 3.7789 |
| 3.8506 | 0.5991 | 10400 | 3.7734 |
| 3.8403 | 0.6106 | 10600 | 3.7687 |
| 3.8663 | 0.6221 | 10800 | 3.7635 |
| 3.7548 | 0.6336 | 11000 | 3.7597 |
| 3.9174 | 0.6452 | 11200 | 3.7538 |
| 3.8308 | 0.6567 | 11400 | 3.7486 |
| 3.7601 | 0.6682 | 11600 | 3.7452 |
| 3.8296 | 0.6797 | 11800 | 3.7421 |
| 3.7379 | 0.6912 | 12000 | 3.7375 |
| 3.8726 | 0.7028 | 12200 | 3.7332 |
| 3.8376 | 0.7143 | 12400 | 3.7298 |
| 3.8514 | 0.7258 | 12600 | 3.7260 |
| 3.7554 | 0.7373 | 12800 | 3.7229 |
| 3.7744 | 0.7488 | 13000 | 3.7196 |
| 3.7656 | 0.7604 | 13200 | 3.7161 |
| 3.7097 | 0.7719 | 13400 | 3.7140 |
| 3.7673 | 0.7834 | 13600 | 3.7113 |
| 3.81 | 0.7949 | 13800 | 3.7089 |
| 3.8687 | 0.8064 | 14000 | 3.7062 |
| 3.7848 | 0.8180 | 14200 | 3.7043 |
| 3.7425 | 0.8295 | 14400 | 3.7021 |
| 3.7567 | 0.8410 | 14600 | 3.7000 |
| 3.7133 | 0.8525 | 14800 | 3.6985 |
| 3.7089 | 0.8640 | 15000 | 3.6972 |
| 3.7652 | 0.8756 | 15200 | 3.6954 |
| 3.764 | 0.8871 | 15400 | 3.6941 |
| 3.7658 | 0.8986 | 15600 | 3.6933 |
| 3.6308 | 0.9101 | 15800 | 3.6922 |
| 3.5539 | 0.9216 | 16000 | 3.6916 |
| 3.714 | 0.9332 | 16200 | 3.6910 |
| 3.7669 | 0.9447 | 16400 | 3.6904 |
| 3.7044 | 0.9562 | 16600 | 3.6901 |
| 3.6267 | 0.9677 | 16800 | 3.6899 |
| 3.8177 | 0.9793 | 17000 | 3.6897 |
| 3.8063 | 0.9908 | 17200 | 3.6898 |
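The Epoch and Step columns together indicate how many optimizer steps make up one pass over the training data. The sketch below estimates that figure from a few rows of the results and derives an approximate example count. It assumes every optimizer step consumes exactly 64 examples (the reported total_train_batch_size). The dataset size is not stated in this card, so the result is an estimate only. The helper name `steps_per_epoch` is illustrative.

```python
# (step, epoch) pairs copied from the training results.
logged = [(200, 0.0115), (8600, 0.4954), (17200, 0.9908)]

TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list
TRAINING_STEPS = 17360       # from the hyperparameter list


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps in one full epoch from a logged row."""
    return step / epoch


for step, epoch in logged:
    print(step, round(steps_per_epoch(step, epoch)))

# Later rows give a tighter estimate because the epoch value is
# rounded to four decimals in the log.
estimate = steps_per_epoch(*logged[-1])

# Assumption: each optimizer step sees TOTAL_TRAIN_BATCH_SIZE examples.
approx_examples_per_epoch = round(estimate) * TOTAL_TRAIN_BATCH_SIZE
print(approx_examples_per_epoch)
```

The estimate from the last logged row is close to 17360, which suggests that the configured training_steps corresponds to roughly one epoch over the training data.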

Framework versions

  • Transformers 4.56.1
  • Pytorch 2.8.0a0+5228986c39.nv25.05
  • Datasets 4.0.0
  • Tokenizers 0.22.0