llama_1b_step2_batch_grad_v1

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3257
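A minimal loading sketch with the Transformers library, assuming the checkpoint is hosted on the Hugging Face Hub as danielgombas/llama_1b_step2_batch_grad_v1 and is a causal language model (as the name suggests); the safetensors weights are stored in BF16:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as listed on this model's Hub page.
model_id = "danielgombas/llama_1b_step2_batch_grad_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The checkpoint is stored in BF16, so load it in that dtype.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```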

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 40
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 2
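
A minimal sketch of these settings expressed as Hugging Face TrainingArguments. Only the hyperparameters above come from the card; the output_dir is a placeholder, the batch sizes are mapped to per-device values assuming a single device, and eval_steps=50 is inferred from the evaluation cadence visible in the results table below:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_1b_step2_batch_grad_v1",  # hypothetical placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=8,   # assumes a single training device
    per_device_eval_batch_size=40,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    eval_strategy="steps",
    eval_steps=50,  # inferred from the 50-step cadence in "Training results"
)
# These args, together with a model, tokenizer, and train/eval datasets,
# would be passed to transformers.Trainer.
```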

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.2174 | 0.0341 | 50 | 1.3438 |
| 1.2511 | 0.0682 | 100 | 1.1355 |
| 1.0915 | 0.1022 | 150 | 1.0445 |
| 0.9386 | 0.1363 | 200 | 0.9348 |
| 0.9123 | 0.1704 | 250 | 0.8550 |
| 0.5047 | 0.2045 | 300 | 0.7949 |
| 0.8395 | 0.2386 | 350 | 0.7368 |
| 0.9298 | 0.2727 | 400 | 0.6975 |
| 0.7816 | 0.3067 | 450 | 0.6627 |
| 0.6203 | 0.3408 | 500 | 0.6151 |
| 0.4658 | 0.3749 | 550 | 0.5737 |
| 0.5259 | 0.4090 | 600 | 0.5599 |
| 0.4488 | 0.4431 | 650 | 0.5293 |
| 0.6154 | 0.4772 | 700 | 0.5100 |
| 0.5796 | 0.5112 | 750 | 0.4931 |
| 0.6068 | 0.5453 | 800 | 0.4726 |
| 0.482 | 0.5794 | 850 | 0.4616 |
| 0.2877 | 0.6135 | 900 | 0.4501 |
| 0.34 | 0.6476 | 950 | 0.4360 |
| 0.4047 | 0.6817 | 1000 | 0.4295 |
| 0.4238 | 0.7157 | 1050 | 0.4200 |
| 0.5062 | 0.7498 | 1100 | 0.4041 |
| 0.7784 | 0.7839 | 1150 | 0.3911 |
| 0.2211 | 0.8180 | 1200 | 0.3856 |
| 0.4954 | 0.8521 | 1250 | 0.3777 |
| 0.424 | 0.8862 | 1300 | 0.3710 |
| 0.3539 | 0.9202 | 1350 | 0.3640 |
| 0.27 | 0.9543 | 1400 | 0.3591 |
| 0.4994 | 0.9884 | 1450 | 0.3518 |
| 0.2257 | 1.0225 | 1500 | 0.3614 |
| 0.3277 | 1.0566 | 1550 | 0.3609 |
| 0.2337 | 1.0907 | 1600 | 0.3590 |
| 0.2015 | 1.1247 | 1650 | 0.3522 |
| 0.1872 | 1.1588 | 1700 | 0.3530 |
| 0.168 | 1.1929 | 1750 | 0.3520 |
| 0.2204 | 1.2270 | 1800 | 0.3505 |
| 0.1524 | 1.2611 | 1850 | 0.3477 |
| 0.1608 | 1.2952 | 1900 | 0.3439 |
| 0.2468 | 1.3292 | 1950 | 0.3399 |
| 0.2048 | 1.3633 | 2000 | 0.3396 |
| 0.2225 | 1.3974 | 2050 | 0.3376 |
| 0.2628 | 1.4315 | 2100 | 0.3342 |
| 0.214 | 1.4656 | 2150 | 0.3337 |
| 0.1878 | 1.4997 | 2200 | 0.3298 |
| 0.2482 | 1.5337 | 2250 | 0.3300 |
| 0.2568 | 1.5678 | 2300 | 0.3289 |
| 0.2257 | 1.6019 | 2350 | 0.3299 |
| 0.2225 | 1.6360 | 2400 | 0.3290 |
| 0.1962 | 1.6701 | 2450 | 0.3284 |
| 0.2478 | 1.7042 | 2500 | 0.3269 |
| 0.1841 | 1.7382 | 2550 | 0.3270 |
| 0.215 | 1.7723 | 2600 | 0.3269 |
| 0.1999 | 1.8064 | 2650 | 0.3264 |
| 0.2391 | 1.8405 | 2700 | 0.3261 |
| 0.1559 | 1.8746 | 2750 | 0.3258 |
| 0.1577 | 1.9087 | 2800 | 0.3256 |
| 0.1831 | 1.9427 | 2850 | 0.3256 |
| 0.2495 | 1.9768 | 2900 | 0.3257 |
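
As a quick sanity check on what the final loss means: assuming the reported values are mean per-token cross-entropy in nats (the Trainer default for causal language models), the final validation loss of 0.3257 corresponds to a perplexity of roughly 1.39:

```python
import math

final_val_loss = 0.3257  # final validation loss from the table above
print(math.exp(final_val_loss))  # ~1.385, the implied per-token perplexity
```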

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.1.0+cu118
  • Datasets 3.0.2
  • Tokenizers 0.20.1
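
A small sketch for checking that an environment matches the versions above before attempting to reproduce the run:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions listed in this card; exact pins are the safest way to reproduce
# the training run, though newer versions may also work.
expected = {
    "transformers": "4.46.0",
    "torch": "2.1.0+cu118",
    "datasets": "3.0.2",
    "tokenizers": "0.20.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    note = "OK" if have == want else f"differs (card lists {want})"
    print(f"{name} {have}: {note}")
```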