---
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: cllm-0.0.2
  results: []
---

# cllm-0.0.2

This model is a fine-tuned version of an unspecified base model on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.5767

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 4.8419        | 0.0214 | 500   | 4.7291          |
| 3.891         | 0.0429 | 1000  | 3.8792          |
| 3.5798        | 0.0643 | 1500  | 3.5656          |
| 3.3861        | 0.0858 | 2000  | 3.4057          |
| 3.2754        | 0.1072 | 2500  | 3.2925          |
| 3.2039        | 0.1286 | 3000  | 3.2109          |
| 3.1475        | 0.1501 | 3500  | 3.1513          |
| 3.0936        | 0.1715 | 4000  | 3.0991          |
| 3.0483        | 0.1930 | 4500  | 3.0603          |
| 3.0036        | 0.2144 | 5000  | 3.0180          |
| 2.9644        | 0.2358 | 5500  | 2.9900          |
| 2.9374        | 0.2573 | 6000  | 2.9599          |
| 2.901         | 0.2787 | 6500  | 2.9334          |
| 2.8968        | 0.3002 | 7000  | 2.9124          |
| 2.866         | 0.3216 | 7500  | 2.8889          |
| 2.8614        | 0.3430 | 8000  | 2.8672          |
| 2.8378        | 0.3645 | 8500  | 2.8489          |
| 2.8242        | 0.3859 | 9000  | 2.8290          |
| 2.7961        | 0.4074 | 9500  | 2.8133          |
| 2.769         | 0.4288 | 10000 | 2.7962          |
| 2.7619        | 0.4502 | 10500 | 2.7804          |
| 2.7527        | 0.4717 | 11000 | 2.7687          |
| 2.7457        | 0.4931 | 11500 | 2.7540          |
| 2.7119        | 0.5146 | 12000 | 2.7441          |
| 2.7089        | 0.5360 | 12500 | 2.7317          |
| 2.7236        | 0.5574 | 13000 | 2.7218          |
| 2.6984        | 0.5789 | 13500 | 2.7102          |
| 2.6791        | 0.6003 | 14000 | 2.6998          |
| 2.6764        | 0.6218 | 14500 | 2.6915          |
| 2.6663        | 0.6432 | 15000 | 2.6806          |
| 2.6424        | 0.6646 | 15500 | 2.6720          |
| 2.6384        | 0.6861 | 16000 | 2.6612          |
| 2.6343        | 0.7075 | 16500 | 2.6536          |
| 2.6303        | 0.7290 | 17000 | 2.6471          |
| 2.6115        | 0.7504 | 17500 | 2.6373          |
| 2.6125        | 0.7718 | 18000 | 2.6310          |
| 2.5983        | 0.7933 | 18500 | 2.6246          |
| 2.6043        | 0.8147 | 19000 | 2.6173          |
| 2.5876        | 0.8362 | 19500 | 2.6106          |
| 2.5824        | 0.8576 | 20000 | 2.6043          |
| 2.5802        | 0.8790 | 20500 | 2.5983          |
| 2.5772        | 0.9005 | 21000 | 2.5927          |
| 2.5584        | 0.9219 | 21500 | 2.5878          |
| 2.5652        | 0.9434 | 22000 | 2.5835          |
| 2.5593        | 0.9648 | 22500 | 2.5794          |
| 2.5547        | 0.9862 | 23000 | 2.5767          |

### Framework versions

- Transformers 4.47.1
- Pytorch 2.1.0+cu118
- Datasets 3.2.0
- Tokenizers 0.21.0
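The total batch sizes in the hyperparameter list follow directly from the per-device settings. A minimal sketch of that arithmetic (variable names are illustrative, not taken from the training script):

```python
# Effective batch sizes implied by the hyperparameters above
# (variable names are illustrative, not from the training script).
train_batch_size = 8             # per-device train batch size
eval_batch_size = 4              # per-device eval batch size
num_devices = 8                  # multi-GPU setup
gradient_accumulation_steps = 4  # gradients accumulated before each optimizer step

# Training: 8 per device x 8 devices x 4 accumulation steps = 256
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation: 4 per device x 8 devices = 32 (no gradient accumulation at eval time)
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 256 32
```

This is why the reported step count per epoch is much smaller than the number of examples: each optimizer step consumes 256 training examples.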