train_record_42_1773765559

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the record dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8647
  • Num Input Tokens Seen: 245808128
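Since the evaluation loss is a mean per-token cross-entropy (in nats), it maps directly to perplexity. A minimal sketch of that conversion, using the reported value:

```python
import math

# The card reports a final evaluation loss of 0.8647 (mean per-token
# cross-entropy in nats); perplexity is simply exp(loss).
eval_loss = 0.8647
perplexity = math.exp(eval_loss)
print(f"eval perplexity ~ {perplexity:.4f}")
```

This puts the model's evaluation perplexity a little under 2.4.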

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
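The cosine scheduler with a 0.1 warmup ratio means the learning rate ramps linearly to 5e-05 over the first 10% of optimizer steps and then decays along a cosine curve. A pure-Python sketch of that schedule; the total step count (~78120) is an assumption inferred from the results table below (evaluations every 3906 steps, i.e. every 0.25 epoch, over 5 epochs), not a value stated in the card:

```python
import math

BASE_LR = 5e-05
TOTAL_STEPS = 78120                      # assumption inferred from the results table
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)    # lr_scheduler_warmup_ratio: 0.1

def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay toward 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))             # 0.0: start of warmup
print(lr_at(WARMUP_STEPS))  # peak learning rate, 5e-05
print(lr_at(TOTAL_STEPS))   # fully decayed, ~0.0
```

This mirrors the shape of the `cosine` schedule in `transformers`, not its exact implementation.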

Training results

Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen
------------- | ------ | ----- | --------------- | -----------------
1.2402        | 0.2500 | 3906  | 1.0148          | 12292032
1.4262        | 0.5001 | 7812  | 1.2947          | 24620672
1.1304        | 0.7501 | 11718 | 1.2590          | 36894016
0.9504        | 1.0002 | 15624 | 1.1777          | 49176512
0.9611        | 1.2502 | 19530 | 1.1499          | 61465280
0.7862        | 1.5003 | 23436 | 1.0926          | 73739776
0.8905        | 1.7503 | 27342 | 1.0198          | 86015936
0.7648        | 2.0004 | 31248 | 0.9767          | 98341056
0.7543        | 2.2504 | 35154 | 1.0004          | 110649216
0.5464        | 2.5005 | 39060 | 0.9313          | 122910592
0.5669        | 2.7505 | 42966 | 0.9105          | 135222656
0.2979        | 3.0006 | 46872 | 0.8803          | 147516736
0.2100        | 3.2506 | 50778 | 0.9521          | 159826368
0.4287        | 3.5007 | 54684 | 0.9180          | 172084032
0.2835        | 3.7507 | 58590 | 0.8755          | 184402752
0.2202        | 4.0008 | 62496 | 0.8647          | 196687936
0.1392        | 4.2508 | 66402 | 1.0180          | 209017024
0.2328        | 4.5009 | 70308 | 0.9995          | 221278272
0.1733        | 4.7509 | 74214 | 0.9999          | 233564288
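The validation loss bottoms out at step 62496 (epoch ~4.0) and drifts upward afterward, which matches the 0.8647 reported for the evaluation set. A quick scan over the table's (epoch, step, validation loss) columns makes this easy to verify:

```python
# (epoch, step, val_loss) triples copied from the results table above.
rows = [
    (0.2500, 3906, 1.0148), (0.5001, 7812, 1.2947), (0.7501, 11718, 1.2590),
    (1.0002, 15624, 1.1777), (1.2502, 19530, 1.1499), (1.5003, 23436, 1.0926),
    (1.7503, 27342, 1.0198), (2.0004, 31248, 0.9767), (2.2504, 35154, 1.0004),
    (2.5005, 39060, 0.9313), (2.7505, 42966, 0.9105), (3.0006, 46872, 0.8803),
    (3.2506, 50778, 0.9521), (3.5007, 54684, 0.9180), (3.7507, 58590, 0.8755),
    (4.0008, 62496, 0.8647), (4.2508, 66402, 1.0180), (4.5009, 70308, 0.9995),
    (4.7509, 74214, 0.9999),
]
best = min(rows, key=lambda r: r[2])
print(f"best validation loss {best[2]} at step {best[1]} (epoch {best[0]})")
```

Whether the published checkpoint is this best step or the end of training is not stated in the card.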

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4