metadata

library_name: transformers
tags:
  - generated_from_trainer
model-index:
  - name: TBD-LLaMA-500M-Final-Direction-500M
    results: []

TBD-LLaMA-500M-Final-Direction-500M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.6898
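
For readers who prefer perplexity, the evaluation loss can be converted with a short sketch like the one below. It assumes the reported loss is the mean per-token cross-entropy in nats, which is what the Trainer typically logs for causal language models; the card itself does not confirm this.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 3.6898

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 40.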

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 64
total_eval_batch_size: 4
optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 173
training_steps: 17360
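
The hyperparameters above fit together in a way that is easy to check by hand. The sketch below recomputes the total batch sizes from the per-device values and approximates the learning-rate schedule. The function `lr_at` is a hypothetical helper written for illustration; it assumes linear warmup followed by a half-cosine decay to zero, which is the usual shape of a cosine schedule with warmup but is not confirmed by this card.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 1               # per device
eval_batch_size = 1                # per device
num_devices = 4
gradient_accumulation_steps = 16
learning_rate = 1e-4
warmup_steps = 173
training_steps = 17360

# Sequences consumed per optimizer step and per evaluation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step: int) -> float:
    """Approximate learning rate at an optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, warmup_steps, training_steps // 2, training_steps):
    print(step, lr_at(step))
```

The computed totals match the listed total_train_batch_size of 64 and total_eval_batch_size of 4. The warmup of 173 steps is about 1% of the 17360 training steps.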

Training results

Training Loss	Epoch	Step	Validation Loss
7.1404	0.0115	200	7.0388
6.6918	0.0230	400	6.6803
6.6408	0.0346	600	6.6217
6.5923	0.0461	800	6.5769
6.561	0.0576	1000	6.5328
6.4665	0.0691	1200	6.4522
6.2754	0.0806	1400	6.2240
5.7839	0.0922	1600	5.6270
5.161	0.1037	1800	4.9428
4.78	0.1152	2000	4.6158
4.6603	0.1267	2200	4.4470
4.554	0.1382	2400	4.3597
4.4901	0.1498	2600	4.3011
4.416	0.1613	2800	4.2596
4.3742	0.1728	3000	4.2117
4.2719	0.1843	3200	4.1822
4.2189	0.1959	3400	4.1528
4.2023	0.2074	3600	4.1241
4.2014	0.2189	3800	4.1053
4.1731	0.2304	4000	4.0840
4.1578	0.2419	4200	4.0602
4.1387	0.2535	4400	4.0537
4.0884	0.2650	4600	4.0289
4.0962	0.2765	4800	4.0129
4.0141	0.2880	5000	4.0047
4.0292	0.2995	5200	3.9874
4.0243	0.3111	5400	3.9742
4.001	0.3226	5600	3.9644
3.9626	0.3341	5800	3.9509
3.9376	0.3456	6000	3.9434
3.9762	0.3571	6200	3.9331
4.0447	0.3687	6400	3.9221
3.977	0.3802	6600	3.9098
4.0106	0.3917	6800	3.9016
3.9686	0.4032	7000	3.8928
3.9114	0.4147	7200	3.8835
3.9024	0.4263	7400	3.8755
3.9965	0.4378	7600	3.8659
4.0031	0.4493	7800	3.8594
3.9794	0.4608	8000	3.8530
3.855	0.4723	8200	3.8455
3.8848	0.4839	8400	3.8365
3.8435	0.4954	8600	3.8292
3.9157	0.5069	8800	3.8207
3.938	0.5184	9000	3.8147
3.8188	0.5299	9200	3.8088
3.864	0.5415	9400	3.8125
3.8439	0.5530	9600	3.7972
3.8419	0.5645	9800	3.7906
3.8761	0.5760	10000	3.7852
3.7693	0.5876	10200	3.7789
3.8506	0.5991	10400	3.7734
3.8403	0.6106	10600	3.7687
3.8663	0.6221	10800	3.7635
3.7548	0.6336	11000	3.7597
3.9174	0.6452	11200	3.7538
3.8308	0.6567	11400	3.7486
3.7601	0.6682	11600	3.7452
3.8296	0.6797	11800	3.7421
3.7379	0.6912	12000	3.7375
3.8726	0.7028	12200	3.7332
3.8376	0.7143	12400	3.7298
3.8514	0.7258	12600	3.7260
3.7554	0.7373	12800	3.7229
3.7744	0.7488	13000	3.7196
3.7656	0.7604	13200	3.7161
3.7097	0.7719	13400	3.7140
3.7673	0.7834	13600	3.7113
3.81	0.7949	13800	3.7089
3.8687	0.8064	14000	3.7062
3.7848	0.8180	14200	3.7043
3.7425	0.8295	14400	3.7021
3.7567	0.8410	14600	3.7000
3.7133	0.8525	14800	3.6985
3.7089	0.8640	15000	3.6972
3.7652	0.8756	15200	3.6954
3.764	0.8871	15400	3.6941
3.7658	0.8986	15600	3.6933
3.6308	0.9101	15800	3.6922
3.5539	0.9216	16000	3.6916
3.714	0.9332	16200	3.6910
3.7669	0.9447	16400	3.6904
3.7044	0.9562	16600	3.6901
3.6267	0.9677	16800	3.6899
3.8177	0.9793	17000	3.6897
3.8063	0.9908	17200	3.6898
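
The Epoch and Step columns above also allow a rough estimate of the training set size, which the card does not state. The sketch below uses the last logged row and the total train batch size of 64. It assumes every optimizer step consumes exactly 64 sequences and that the Epoch column is rounded to four decimals, so the result is an approximation only.

```python
# Last logged row of the results table: (epoch, step).
last_epoch, last_step = 0.9908, 17200

# From the hyperparameter list.
total_train_batch_size = 64
training_steps = 17360

# Optimizer steps needed for one full pass over the training data.
steps_per_epoch = last_step / last_epoch

# Approximate number of training sequences (assumes 64 sequences per step).
approx_sequences = steps_per_epoch * total_train_batch_size

# Fraction of the data seen by the end of training.
epochs_trained = training_steps / steps_per_epoch

print(f"steps per epoch = {steps_per_epoch:.0f}")
print(f"training sequences = {approx_sequences:,.0f}")
print(f"epochs trained = {epochs_trained:.3f}")
```

By this estimate the run covers almost exactly one epoch over roughly 1.1 million sequences.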

Framework versions

Transformers 4.56.1
PyTorch 2.8.0a0+5228986c39.nv25.05
Datasets 4.0.0
Tokenizers 0.22.0