---
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: TBD-LLaMA-2B-Final-Direction-2B
  results: []
---

# TBD-LLaMA-2B-Final-Direction-2B

This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.8900

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 139
- training_steps: 13966

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 8.9472        | 0.0143 | 200   | 8.9381          |
| 6.7664        | 0.0286 | 400   | 6.7485          |
| 6.6429        | 0.0430 | 600   | 6.6299          |
| 6.5725        | 0.0573 | 800   | 6.5598          |
| 6.4746        | 0.0716 | 1000  | 6.4666          |
| 6.345         | 0.0859 | 1200  | 6.3290          |
| 6.1452        | 0.1002 | 1400  | 6.1231          |
| 5.9711        | 0.1146 | 1600  | 5.9283          |
| 5.8076        | 0.1289 | 1800  | 5.7896          |
| 5.718         | 0.1432 | 2000  | 5.6944          |
| 5.6422        | 0.1575 | 2200  | 5.6219          |
| 5.5956        | 0.1718 | 2400  | 5.5653          |
| 5.5424        | 0.1862 | 2600  | 5.5163          |
| 5.4527        | 0.2005 | 2800  | 5.4252          |
| 4.7472        | 0.2148 | 3000  | 4.6523          |
| 4.5528        | 0.2291 | 3200  | 4.4846          |
| 4.503         | 0.2434 | 3400  | 4.3817          |
| 4.427         | 0.2578 | 3600  | 4.3165          |
| 4.4322        | 0.2721 | 3800  | 4.2725          |
| 4.3265        | 0.2864 | 4000  | 4.2409          |
| 4.3255        | 0.3007 | 4200  | 4.2157          |
| 4.322         | 0.3150 | 4400  | 4.1930          |
| 4.1982        | 0.3294 | 4600  | 4.1759          |
| 4.2197        | 0.3437 | 4800  | 4.1609          |
| 4.2109        | 0.3580 | 5000  | 4.1478          |
| 4.1553        | 0.3723 | 5200  | 4.1329          |
| 4.169         | 0.3866 | 5400  | 4.1215          |
| 4.2068        | 0.4010 | 5600  | 4.1093          |
| 4.182         | 0.4153 | 5800  | 4.0969          |
| 4.2148        | 0.4296 | 6000  | 4.0841          |
| 4.0511        | 0.4439 | 6200  | 4.0716          |
| 4.0997        | 0.4582 | 6400  | 4.0592          |
| 4.0322        | 0.4726 | 6600  | 4.0488          |
| 3.9972        | 0.4869 | 6800  | 4.0372          |
| 4.0335        | 0.5012 | 7000  | 4.0258          |
| 4.0742        | 0.5155 | 7200  | 4.0168          |
| 4.003         | 0.5298 | 7400  | 4.0082          |
| 4.0007        | 0.5442 | 7600  | 3.9992          |
| 4.1114        | 0.5585 | 7800  | 3.9898          |
| 3.8742        | 0.5728 | 8000  | 3.9831          |
| 4.0346        | 0.5871 | 8200  | 3.9765          |
| 3.8871        | 0.6014 | 8400  | 3.9686          |
| 3.9689        | 0.6158 | 8600  | 3.9626          |
| 4.0003        | 0.6301 | 8800  | 3.9580          |
| 4.0529        | 0.6444 | 9000  | 3.9496          |
| 3.9973        | 0.6587 | 9200  | 3.9456          |
| 4.0418        | 0.6730 | 9400  | 3.9409          |
| 4.0237        | 0.6874 | 9600  | 3.9355          |
| 3.9256        | 0.7017 | 9800  | 3.9299          |
| 3.8549        | 0.7160 | 10000 | 3.9249          |
| 3.9872        | 0.7303 | 10200 | 3.9215          |
| 3.9918        | 0.7446 | 10400 | 3.9180          |
| 4.0075        | 0.7590 | 10600 | 3.9137          |
| 3.9235        | 0.7733 | 10800 | 3.9107          |
| 3.9416        | 0.7876 | 11000 | 3.9069          |
| 3.9939        | 0.8019 | 11200 | 3.9053          |
| 4.0625        | 0.8162 | 11400 | 3.9030          |
| 3.9773        | 0.8306 | 11600 | 3.9010          |
| 3.8279        | 0.8449 | 11800 | 3.8990          |
| 3.8631        | 0.8592 | 12000 | 3.8970          |
| 3.8593        | 0.8735 | 12200 | 3.8953          |
| 3.9531        | 0.8878 | 12400 | 3.8938          |
| 3.8922        | 0.9022 | 12600 | 3.8927          |
| 3.9151        | 0.9165 | 12800 | 3.8917          |
| 3.9119        | 0.9308 | 13000 | 3.8910          |
| 3.9261        | 0.9451 | 13200 | 3.8905          |
| 3.9169        | 0.9594 | 13400 | 3.8903          |
| 3.8439        | 0.9738 | 13600 | 3.8900          |
| 3.8795        | 0.9881 | 13800 | 3.8900          |

### Framework versions

- Transformers 4.56.1
- Pytorch 2.8.0a0+5228986c39.nv25.05
- Datasets 4.0.0
- Tokenizers 0.22.0
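The per-device settings, the distributed setup, and the reported totals in the hyperparameter list are mutually consistent; a minimal sketch (plain Python, all values copied from the list above) checking the arithmetic behind the effective batch sizes and the warmup fraction of the cosine schedule:

```python
# Per-device settings from the hyperparameter list above.
train_batch_size = 1            # per-device train micro-batch
eval_batch_size = 1             # per-device eval batch
num_devices = 4                 # multi-GPU data parallelism
gradient_accumulation_steps = 16

# Effective (total) batch sizes, matching the reported values.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size)   # 64
print(total_eval_batch_size)    # 4

# Fraction of training spent warming up before cosine decay.
warmup_steps, training_steps = 139, 13966
print(f"{warmup_steps / training_steps:.3%}")  # 0.995%
```

So the 64-sample effective batch comes from accumulating 16 micro-batches on each of 4 GPUs, and the 139 warmup steps amount to roughly 1% of the 13,966 total training steps.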