fairness-reward-model

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on an unspecified dataset (the dataset name was not recorded). It achieves the following results on the evaluation set:

  • Loss: 0.3645 (validation loss at the final step, 900)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.15
  • num_epochs: 2
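
The totals in this list follow from the per-device settings: 32 per device × 2 devices × 8 accumulation steps = 512 examples per optimizer step for training, and 32 × 2 = 64 for evaluation. The sketch below checks that arithmetic and computes the shape of a cosine schedule with 15% linear warmup. It follows the formula we understand transformers' get_cosine_schedule_with_warmup to use, with the warmup length rounded up (an assumption). The total of 946 steps is also an assumption: an estimate of about 473 optimizer steps per epoch (50 / 0.1057 from the results table) times 2 epochs, not a logged value.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 4e-05
TRAIN_BATCH_PER_DEVICE = 32
EVAL_BATCH_PER_DEVICE = 32
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.15


def effective_batch(per_device: int, devices: int, accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * devices * accum_steps


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = LEARNING_RATE,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup from 0, then half-cosine decay to 0.

    Assumption: warmup length is ceil(total_steps * warmup_ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# The list's totals follow from the per-device settings.
assert effective_batch(TRAIN_BATCH_PER_DEVICE, NUM_DEVICES, GRAD_ACCUM_STEPS) == 512
assert effective_batch(EVAL_BATCH_PER_DEVICE, NUM_DEVICES) == 64

# Assumption: about 946 total steps (2 epochs x ~473 steps, estimated from the Epoch column).
TOTAL_STEPS = 946
for s in (0, 71, 142, 544, 946):
    print(s, f"{cosine_lr_with_warmup(s, TOTAL_STEPS):.2e}")
```

Under these assumptions the learning rate peaks at 4e-05 around step 142, falls to half that midway through the decay phase, and reaches 0 at the last step.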

Training results

Training Loss   Epoch    Step   Validation Loss
0.4465          0.1057   50     0.4179
0.3522          0.2114   100    0.3972
0.3873          0.3170   150    0.3940
0.3559          0.4227   200    0.3889
0.3383          0.5284   250    0.3881
0.379           0.6341   300    0.3797
0.3841          0.7398   350    0.3724
0.4278          0.8454   400    0.3739
0.388           0.9511   450    0.3687
0.3528          1.0568   500    0.3725
0.3352          1.1625   550    0.3675
0.3479          1.2682   600    0.3677
0.2742          1.3738   650    0.3662
0.2717          1.4795   700    0.3650
0.3343          1.5852   750    0.3632
0.3261          1.6909   800    0.3642
0.355           1.7966   850    0.3646
0.3153          1.9022   900    0.3645
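
The results table can also be summarized programmatically. This minimal sketch works only on the rows copied from the table: it finds the evaluation with the lowest validation loss and estimates optimizer steps per epoch from the Step/Epoch ratio. The Epoch column is rounded to four decimals, so the steps-per-epoch figure is approximate.

```python
# Rows copied from the results table: (training loss, epoch, step, validation loss).
RESULTS = [
    (0.4465, 0.1057, 50, 0.4179),
    (0.3522, 0.2114, 100, 0.3972),
    (0.3873, 0.3170, 150, 0.3940),
    (0.3559, 0.4227, 200, 0.3889),
    (0.3383, 0.5284, 250, 0.3881),
    (0.379, 0.6341, 300, 0.3797),
    (0.3841, 0.7398, 350, 0.3724),
    (0.4278, 0.8454, 400, 0.3739),
    (0.388, 0.9511, 450, 0.3687),
    (0.3528, 1.0568, 500, 0.3725),
    (0.3352, 1.1625, 550, 0.3675),
    (0.3479, 1.2682, 600, 0.3677),
    (0.2742, 1.3738, 650, 0.3662),
    (0.2717, 1.4795, 700, 0.3650),
    (0.3343, 1.5852, 750, 0.3632),
    (0.3261, 1.6909, 800, 0.3642),
    (0.355, 1.7966, 850, 0.3646),
    (0.3153, 1.9022, 900, 0.3645),
]


def best_validation(rows):
    """Return (step, validation loss) for the row with the lowest validation loss."""
    _, _, step, val_loss = min(rows, key=lambda row: row[3])
    return step, val_loss


def estimated_steps_per_epoch(rows):
    """Average of step / epoch over all rows (epoch values are rounded)."""
    ratios = [step / epoch for _, epoch, step, _ in rows]
    return sum(ratios) / len(ratios)


step, val_loss = best_validation(RESULTS)
print(f"lowest validation loss {val_loss} at step {step}")
print(f"about {estimated_steps_per_epoch(RESULTS):.0f} optimizer steps per epoch")
```

On these rows the lowest validation loss is 0.3632 at step 750, slightly below the 0.3645 recorded at the final step (900), which is the loss reported at the top of this card.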

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.1.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
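
When reproducing training, it can help to compare the installed packages against these versions. A minimal sketch using the standard-library importlib.metadata; the pip distribution names (transformers, torch, datasets, tokenizers) are an assumption about how the packages were installed, and the +cu121 local tag is dropped so builds for other CUDA versions compare on the public version only.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by (assumed) pip distribution name.
PINNED = {
    "transformers": "4.43.3",
    "torch": "2.1.2",  # listed as 2.1.2+cu121; local tag dropped for comparison
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def public_version(v: str) -> str:
    """Strip a PEP 440 local version label, e.g. '2.1.2+cu121' -> '2.1.2'."""
    return v.split("+", 1)[0]


def compare_versions(pinned, lookup=version):
    """Map each package to (expected, installed or None, matches)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = public_version(lookup(name))
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in compare_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12} expected {expected:8} installed {installed}  {status}")
```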

Model weights

  • Format: Safetensors
  • Model size: 1B params
  • Tensor type: BF16
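
As a rough sizing aid, BF16 stores 2 bytes per parameter, so the weights alone take about (parameter count × 2) bytes. The "1B" figure is rounded, and activations, the KV cache, and any optimizer state are not included, so treat the result as an approximate figure for the weights only, not a full memory requirement.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone in GiB (BF16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30


# "1B params" is a rounded figure, so this is only an estimate.
print(f"{weight_memory_gib(1e9):.2f} GiB")
```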

Model tree for zarahall/fairness-reward-model

  • Base model: meta-llama/Llama-3.2-1B-Instruct
  • This model: fine-tuned from the base model