TBD-LLaMA-500M-Final-Direction-500M
This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.6898
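For a language model, the evaluation loss is often easier to read as a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language modeling with the Transformers `Trainer` but is not stated in this card.

```python
import math

# Assumption: the reported loss is mean per-token cross-entropy in nats.
eval_loss = 3.6898

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption the final evaluation loss corresponds to a perplexity of roughly 40.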
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 173
- training_steps: 17360
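The batch-size and scheduler entries above are related by simple arithmetic. The following is a minimal sketch, not the training code itself: it recomputes the total batch sizes from the per-device values and evaluates a cosine schedule with linear warmup. The function name `lr_at` is hypothetical, and the schedule shape (linear warmup from zero, half-cosine decay to zero) is an assumption based on the usual behavior of the `cosine` scheduler type in Transformers.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 16

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

learning_rate = 1e-4
warmup_steps = 173
training_steps = 17360


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper).

    Assumed shape: linear warmup from 0 to the peak rate, then a
    half-cosine decay to 0 at the final training step.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, warmup_steps, training_steps // 2, training_steps):
    print(step, lr_at(step))
```

With these values the warmup covers about 1% of training (173 of 17360 steps), so almost the entire run is spent on the cosine decay.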
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.1404 | 0.0115 | 200 | 7.0388 |
| 6.6918 | 0.0230 | 400 | 6.6803 |
| 6.6408 | 0.0346 | 600 | 6.6217 |
| 6.5923 | 0.0461 | 800 | 6.5769 |
| 6.561 | 0.0576 | 1000 | 6.5328 |
| 6.4665 | 0.0691 | 1200 | 6.4522 |
| 6.2754 | 0.0806 | 1400 | 6.2240 |
| 5.7839 | 0.0922 | 1600 | 5.6270 |
| 5.161 | 0.1037 | 1800 | 4.9428 |
| 4.78 | 0.1152 | 2000 | 4.6158 |
| 4.6603 | 0.1267 | 2200 | 4.4470 |
| 4.554 | 0.1382 | 2400 | 4.3597 |
| 4.4901 | 0.1498 | 2600 | 4.3011 |
| 4.416 | 0.1613 | 2800 | 4.2596 |
| 4.3742 | 0.1728 | 3000 | 4.2117 |
| 4.2719 | 0.1843 | 3200 | 4.1822 |
| 4.2189 | 0.1959 | 3400 | 4.1528 |
| 4.2023 | 0.2074 | 3600 | 4.1241 |
| 4.2014 | 0.2189 | 3800 | 4.1053 |
| 4.1731 | 0.2304 | 4000 | 4.0840 |
| 4.1578 | 0.2419 | 4200 | 4.0602 |
| 4.1387 | 0.2535 | 4400 | 4.0537 |
| 4.0884 | 0.2650 | 4600 | 4.0289 |
| 4.0962 | 0.2765 | 4800 | 4.0129 |
| 4.0141 | 0.2880 | 5000 | 4.0047 |
| 4.0292 | 0.2995 | 5200 | 3.9874 |
| 4.0243 | 0.3111 | 5400 | 3.9742 |
| 4.001 | 0.3226 | 5600 | 3.9644 |
| 3.9626 | 0.3341 | 5800 | 3.9509 |
| 3.9376 | 0.3456 | 6000 | 3.9434 |
| 3.9762 | 0.3571 | 6200 | 3.9331 |
| 4.0447 | 0.3687 | 6400 | 3.9221 |
| 3.977 | 0.3802 | 6600 | 3.9098 |
| 4.0106 | 0.3917 | 6800 | 3.9016 |
| 3.9686 | 0.4032 | 7000 | 3.8928 |
| 3.9114 | 0.4147 | 7200 | 3.8835 |
| 3.9024 | 0.4263 | 7400 | 3.8755 |
| 3.9965 | 0.4378 | 7600 | 3.8659 |
| 4.0031 | 0.4493 | 7800 | 3.8594 |
| 3.9794 | 0.4608 | 8000 | 3.8530 |
| 3.855 | 0.4723 | 8200 | 3.8455 |
| 3.8848 | 0.4839 | 8400 | 3.8365 |
| 3.8435 | 0.4954 | 8600 | 3.8292 |
| 3.9157 | 0.5069 | 8800 | 3.8207 |
| 3.938 | 0.5184 | 9000 | 3.8147 |
| 3.8188 | 0.5299 | 9200 | 3.8088 |
| 3.864 | 0.5415 | 9400 | 3.8125 |
| 3.8439 | 0.5530 | 9600 | 3.7972 |
| 3.8419 | 0.5645 | 9800 | 3.7906 |
| 3.8761 | 0.5760 | 10000 | 3.7852 |
| 3.7693 | 0.5876 | 10200 | 3.7789 |
| 3.8506 | 0.5991 | 10400 | 3.7734 |
| 3.8403 | 0.6106 | 10600 | 3.7687 |
| 3.8663 | 0.6221 | 10800 | 3.7635 |
| 3.7548 | 0.6336 | 11000 | 3.7597 |
| 3.9174 | 0.6452 | 11200 | 3.7538 |
| 3.8308 | 0.6567 | 11400 | 3.7486 |
| 3.7601 | 0.6682 | 11600 | 3.7452 |
| 3.8296 | 0.6797 | 11800 | 3.7421 |
| 3.7379 | 0.6912 | 12000 | 3.7375 |
| 3.8726 | 0.7028 | 12200 | 3.7332 |
| 3.8376 | 0.7143 | 12400 | 3.7298 |
| 3.8514 | 0.7258 | 12600 | 3.7260 |
| 3.7554 | 0.7373 | 12800 | 3.7229 |
| 3.7744 | 0.7488 | 13000 | 3.7196 |
| 3.7656 | 0.7604 | 13200 | 3.7161 |
| 3.7097 | 0.7719 | 13400 | 3.7140 |
| 3.7673 | 0.7834 | 13600 | 3.7113 |
| 3.81 | 0.7949 | 13800 | 3.7089 |
| 3.8687 | 0.8064 | 14000 | 3.7062 |
| 3.7848 | 0.8180 | 14200 | 3.7043 |
| 3.7425 | 0.8295 | 14400 | 3.7021 |
| 3.7567 | 0.8410 | 14600 | 3.7000 |
| 3.7133 | 0.8525 | 14800 | 3.6985 |
| 3.7089 | 0.8640 | 15000 | 3.6972 |
| 3.7652 | 0.8756 | 15200 | 3.6954 |
| 3.764 | 0.8871 | 15400 | 3.6941 |
| 3.7658 | 0.8986 | 15600 | 3.6933 |
| 3.6308 | 0.9101 | 15800 | 3.6922 |
| 3.5539 | 0.9216 | 16000 | 3.6916 |
| 3.714 | 0.9332 | 16200 | 3.6910 |
| 3.7669 | 0.9447 | 16400 | 3.6904 |
| 3.7044 | 0.9562 | 16600 | 3.6901 |
| 3.6267 | 0.9677 | 16800 | 3.6899 |
| 3.8177 | 0.9793 | 17000 | 3.6897 |
| 3.8063 | 0.9908 | 17200 | 3.6898 |
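The Epoch and Step columns can be used to estimate how much data the run covered. This is a rough sketch that assumes the Epoch column is the fraction of one pass over the training set, rounded to four decimals, and that every optimizer step consumes the total train batch size from the hyperparameter list; the results are estimates, not reported figures.

```python
# Last logged row of the training results table.
last_step = 17200
last_epoch_fraction = 0.9908

# From the hyperparameter list (total_train_batch_size).
total_train_batch_size = 64

# Estimated optimizer steps in one full epoch.
steps_per_epoch = last_step / last_epoch_fraction

# Estimated number of training examples seen per epoch.
examples_per_epoch = steps_per_epoch * total_train_batch_size

print(f"steps per epoch    ~ {steps_per_epoch:.0f}")
print(f"examples per epoch ~ {examples_per_epoch:.0f}")
```

The estimated steps per epoch land very close to the configured 17360 training steps, which suggests the run covered approximately one pass over the training data.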
Framework versions
- Transformers 4.56.1
- Pytorch 2.8.0a0+5228986c39.nv25.05
- Datasets 4.0.0
- Tokenizers 0.22.0