---
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: cllm-0.0.2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# cllm-0.0.2

This model is a fine-tuned version of an unspecified base model on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.5767
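
If this loss is the mean per-token cross-entropy in nats (the usual Trainer convention for causal language modeling; the card does not say, so this is an assumption), it corresponds to a perplexity of roughly 13:

```python
import math

# Assumption: eval loss is mean per-token cross-entropy in nats,
# as the Hugging Face Trainer reports for causal-LM objectives.
eval_loss = 2.5767
print(f"perplexity ~ {math.exp(eval_loss):.2f}")  # ~ 13.15
```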

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hypothetical `TrainingArguments` equivalent is sketched after the list):
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 1
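
As a rough, hypothetical reconstruction (not the authors' actual training script), these settings would correspond to a `transformers.TrainingArguments` along these lines; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the settings listed above. The 8-GPU
# distributed launch is handled externally (e.g. torchrun or accelerate).
training_args = TrainingArguments(
    output_dir="cllm-0.0.2",         # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=8,   # 8 devices x 8 x 4 grad-accum = 256 total
    per_device_eval_batch_size=4,    # 8 devices x 4 = 32 total
    gradient_accumulation_steps=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=1,
)
```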

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 4.8419 | 0.0214 | 500 | 4.7291 |
| 3.891 | 0.0429 | 1000 | 3.8792 |
| 3.5798 | 0.0643 | 1500 | 3.5656 |
| 3.3861 | 0.0858 | 2000 | 3.4057 |
| 3.2754 | 0.1072 | 2500 | 3.2925 |
| 3.2039 | 0.1286 | 3000 | 3.2109 |
| 3.1475 | 0.1501 | 3500 | 3.1513 |
| 3.0936 | 0.1715 | 4000 | 3.0991 |
| 3.0483 | 0.1930 | 4500 | 3.0603 |
| 3.0036 | 0.2144 | 5000 | 3.0180 |
| 2.9644 | 0.2358 | 5500 | 2.9900 |
| 2.9374 | 0.2573 | 6000 | 2.9599 |
| 2.901 | 0.2787 | 6500 | 2.9334 |
| 2.8968 | 0.3002 | 7000 | 2.9124 |
| 2.866 | 0.3216 | 7500 | 2.8889 |
| 2.8614 | 0.3430 | 8000 | 2.8672 |
| 2.8378 | 0.3645 | 8500 | 2.8489 |
| 2.8242 | 0.3859 | 9000 | 2.8290 |
| 2.7961 | 0.4074 | 9500 | 2.8133 |
| 2.769 | 0.4288 | 10000 | 2.7962 |
| 2.7619 | 0.4502 | 10500 | 2.7804 |
| 2.7527 | 0.4717 | 11000 | 2.7687 |
| 2.7457 | 0.4931 | 11500 | 2.7540 |
| 2.7119 | 0.5146 | 12000 | 2.7441 |
| 2.7089 | 0.5360 | 12500 | 2.7317 |
| 2.7236 | 0.5574 | 13000 | 2.7218 |
| 2.6984 | 0.5789 | 13500 | 2.7102 |
| 2.6791 | 0.6003 | 14000 | 2.6998 |
| 2.6764 | 0.6218 | 14500 | 2.6915 |
| 2.6663 | 0.6432 | 15000 | 2.6806 |
| 2.6424 | 0.6646 | 15500 | 2.6720 |
| 2.6384 | 0.6861 | 16000 | 2.6612 |
| 2.6343 | 0.7075 | 16500 | 2.6536 |
| 2.6303 | 0.7290 | 17000 | 2.6471 |
| 2.6115 | 0.7504 | 17500 | 2.6373 |
| 2.6125 | 0.7718 | 18000 | 2.6310 |
| 2.5983 | 0.7933 | 18500 | 2.6246 |
| 2.6043 | 0.8147 | 19000 | 2.6173 |
| 2.5876 | 0.8362 | 19500 | 2.6106 |
| 2.5824 | 0.8576 | 20000 | 2.6043 |
| 2.5802 | 0.8790 | 20500 | 2.5983 |
| 2.5772 | 0.9005 | 21000 | 2.5927 |
| 2.5584 | 0.9219 | 21500 | 2.5878 |
| 2.5652 | 0.9434 | 22000 | 2.5835 |
| 2.5593 | 0.9648 | 22500 | 2.5794 |
| 2.5547 | 0.9862 | 23000 | 2.5767 |

### Framework versions

- Transformers 4.47.1
- PyTorch 2.1.0+cu118
- Datasets 3.2.0
- Tokenizers 0.21.0