Llama-3.1-8B-CP

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B; the fine-tuning dataset is not specified in this card. It achieves the following result on the evaluation set:

  • Loss: 0.6873
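
The card does not include usage code; the snippet below is a minimal loading sketch, assuming the checkpoint is published under the repository id decryptellix/Llama-3.1-8B-CP and ships its own tokenizer. The prompt and generation settings are illustrative only, not part of the card.

```python
# Minimal loading sketch. Assumptions: the checkpoint is available as
# "decryptellix/Llama-3.1-8B-CP" and includes a tokenizer; the prompt and
# generation settings below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decryptellix/Llama-3.1-8B-CP"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```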

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.00076
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 24
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 300
  • training_steps: 40000
  • mixed_precision_training: Native AMP
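
Read as a configuration, these settings could be expressed roughly as below. This is a hedged sketch mapping the list onto transformers.TrainingArguments; the actual training script is not part of this card, and the output directory is a placeholder.

```python
# Hedged sketch: the hyperparameters above expressed as transformers.TrainingArguments.
# output_dir is a placeholder; dataset and Trainer wiring are not shown here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-CP",    # placeholder
    learning_rate=7.6e-4,            # 0.00076
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=3,   # total train batch size 8 * 3 = 24
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_steps=300,
    max_steps=40_000,
    fp16=True,                       # "Native AMP" mixed precision (fp16 assumed)
)
```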

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 1.6364        | 0.0162 | 500   | 1.6291          |
| 1.8881        | 0.0323 | 1000  | 1.6484          |
| 1.3449        | 0.0485 | 1500  | 1.6546          |
| 1.2844        | 0.0646 | 2000  | 1.3885          |
| 1.1625        | 0.0808 | 2500  | 1.2894          |
| 1.3277        | 0.0969 | 3000  | 1.2306          |
| 1.0918        | 0.1131 | 3500  | 1.1926          |
| 1.2045        | 0.1292 | 4000  | 1.1877          |
| 1.2891        | 0.1454 | 4500  | 1.1591          |
| 0.9548        | 0.1615 | 5000  | 1.1310          |
| 0.9194        | 0.1777 | 5500  | 1.1317          |
| 1.0301        | 0.1938 | 6000  | 1.1028          |
| 0.7556        | 0.2100 | 6500  | 1.0873          |
| 1.3132        | 0.2261 | 7000  | 1.0657          |
| 0.9011        | 0.2423 | 7500  | 1.0672          |
| 0.4571        | 0.2584 | 8000  | 1.0451          |
| 0.7676        | 0.2746 | 8500  | 1.0388          |
| 0.7849        | 0.2907 | 9000  | 1.0189          |
| 0.7334        | 0.3069 | 9500  | 1.0115          |
| 1.6365        | 0.3230 | 10000 | 0.9953          |
| 1.0915        | 0.3392 | 10500 | 0.9818          |
| 0.747         | 0.3553 | 11000 | 0.9779          |
| 1.3978        | 0.3715 | 11500 | 0.9615          |
| 0.788         | 0.3876 | 12000 | 0.9459          |
| 1.1506        | 0.4038 | 12500 | 0.9452          |
| 0.7654        | 0.4199 | 13000 | 0.9311          |
| 0.5329        | 0.4361 | 13500 | 0.9189          |
| 0.8296        | 0.4522 | 14000 | 0.9138          |
| 0.5077        | 0.4684 | 14500 | 0.9013          |
| 0.9649        | 0.4845 | 15000 | 0.8932          |
| 0.7357        | 0.5007 | 15500 | 0.8842          |
| 0.5084        | 0.5168 | 16000 | 0.8762          |
| 1.4496        | 0.5330 | 16500 | 0.8661          |
| 0.9816        | 0.5491 | 17000 | 0.8584          |
| 0.7001        | 0.5653 | 17500 | 0.8524          |
| 0.6837        | 0.5814 | 18000 | 0.8471          |
| 1.1999        | 0.5976 | 18500 | 0.8353          |
| 0.6105        | 0.6137 | 19000 | 0.8263          |
| 0.9702        | 0.6299 | 19500 | 0.8194          |
| 1.0147        | 0.6461 | 20000 | 0.8160          |
| 0.801         | 0.6622 | 20500 | 0.8057          |
| 0.5603        | 0.6784 | 21000 | 0.8019          |
| 0.9175        | 0.6945 | 21500 | 0.7926          |
| 0.7342        | 0.7107 | 22000 | 0.7907          |
| 1.048         | 0.7268 | 22500 | 0.7825          |
| 0.6107        | 0.7430 | 23000 | 0.7789          |
| 0.96          | 0.7591 | 23500 | 0.7712          |
| 0.6569        | 0.7753 | 24000 | 0.7665          |
| 0.5254        | 0.7914 | 24500 | 0.7622          |
| 0.6324        | 0.8076 | 25000 | 0.7592          |
| 0.9067        | 0.8237 | 25500 | 0.7519          |
| 0.734         | 0.8399 | 26000 | 0.7458          |
| 0.6229        | 0.8560 | 26500 | 0.7410          |
| 0.666         | 0.8722 | 27000 | 0.7355          |
| 0.543         | 0.8883 | 27500 | 0.7316          |
| 0.6257        | 0.9045 | 28000 | 0.7274          |
| 0.6312        | 0.9206 | 28500 | 0.7236          |
| 0.771         | 0.9368 | 29000 | 0.7201          |
| 0.7321        | 0.9529 | 29500 | 0.7156          |
| 0.8699        | 0.9691 | 30000 | 0.7127          |
| 0.6221        | 0.9852 | 30500 | 0.7086          |
| 0.6411        | 1.0014 | 31000 | 0.7068          |
| 0.7002        | 1.0175 | 31500 | 0.7054          |
| 0.7456        | 1.0337 | 32000 | 0.7026          |
| 0.9146        | 1.0498 | 32500 | 0.7008          |
| 0.6147        | 1.0660 | 33000 | 0.6989          |
| 0.8083        | 1.0821 | 33500 | 0.6972          |
| 0.4773        | 1.0983 | 34000 | 0.6954          |
| 0.5371        | 1.1144 | 34500 | 0.6944          |
| 0.6141        | 1.1306 | 35000 | 0.6924          |
| 0.7319        | 1.1467 | 35500 | 0.6914          |
| 0.4331        | 1.1629 | 36000 | 0.6901          |
| 0.6879        | 1.1790 | 36500 | 0.6895          |
| 0.5038        | 1.1952 | 37000 | 0.6887          |
| 0.5199        | 1.2113 | 37500 | 0.6883          |
| 0.4836        | 1.2275 | 38000 | 0.6877          |
| 0.7254        | 1.2436 | 38500 | 0.6875          |
| 0.5729        | 1.2598 | 39000 | 0.6874          |
| 0.8602        | 1.2759 | 39500 | 0.6873          |
| 0.7488        | 1.2921 | 40000 | 0.6873          |
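
If the reported validation loss is the Trainer's mean per-token cross-entropy (an assumption; the card does not state what the loss measures), the final value of 0.6873 corresponds to a perplexity of roughly exp(0.6873) ≈ 1.99:

```python
# Assumption: validation loss is mean per-token cross-entropy,
# so perplexity is its exponential.
import math

final_val_loss = 0.6873
print(f"perplexity ≈ {math.exp(final_val_loss):.2f}")  # ≈ 1.99
```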

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.1