fairness-reward-model

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct. The training dataset is not specified. The only evaluation results reported are the validation losses in the training log below; the last logged value is 0.3645 at step 900.

Model description

More information needed

The hyperparameters used during training are not listed. Training and validation loss were logged every 50 steps:

Training Loss	Epoch	Step	Validation Loss
0.4465	0.1057	50	0.4179
0.3522	0.2114	100	0.3972
0.3873	0.3170	150	0.3940
0.3559	0.4227	200	0.3889
0.3383	0.5284	250	0.3881
0.379	0.6341	300	0.3797
0.3841	0.7398	350	0.3724
0.4278	0.8454	400	0.3739
0.388	0.9511	450	0.3687
0.3528	1.0568	500	0.3725
0.3352	1.1625	550	0.3675
0.3479	1.2682	600	0.3677
0.2742	1.3738	650	0.3662
0.2717	1.4795	700	0.3650
0.3343	1.5852	750	0.3632
0.3261	1.6909	800	0.3642
0.355	1.7966	850	0.3646
0.3153	1.9022	900	0.3645
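
The log above can be summarized with a few lines of Python. This is a minimal sketch, not part of the original training code: the names `LOG` and `parse_log` are illustrative, the rows are copied from the table, and it assumes the lowest validation loss is the quantity of interest (the card does not say which checkpoint was published). It also estimates the number of optimizer steps per epoch from the last logged row.

```python
# Rows copied from the training log above:
# training loss, epoch, step, validation loss.
LOG = """\
0.4465 0.1057 50 0.4179
0.3522 0.2114 100 0.3972
0.3873 0.3170 150 0.3940
0.3559 0.4227 200 0.3889
0.3383 0.5284 250 0.3881
0.379 0.6341 300 0.3797
0.3841 0.7398 350 0.3724
0.4278 0.8454 400 0.3739
0.388 0.9511 450 0.3687
0.3528 1.0568 500 0.3725
0.3352 1.1625 550 0.3675
0.3479 1.2682 600 0.3677
0.2742 1.3738 650 0.3662
0.2717 1.4795 700 0.3650
0.3343 1.5852 750 0.3632
0.3261 1.6909 800 0.3642
0.355 1.7966 850 0.3646
0.3153 1.9022 900 0.3645
"""


def parse_log(text):
    """Parse whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
            }
        )
    return rows


rows = parse_log(LOG)

# Evaluation step with the lowest validation loss.
best = min(rows, key=lambda r: r["val_loss"])

# Rough steps-per-epoch estimate from the last logged row.
final = rows[-1]
steps_per_epoch = final["step"] / final["epoch"]

print(f"best step: {best['step']} (validation loss {best['val_loss']:.4f})")
print(f"final step: {final['step']} (validation loss {final['val_loss']:.4f})")
print(f"approx. steps per epoch: {steps_per_epoch:.0f}")
```

On these rows, the validation loss is lowest at step 750 (0.3632) and is nearly flat afterwards, so the last logged evaluation at step 900 (0.3645) is within about 0.001 of the best one.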

Format: Safetensors
Model size: 1B params
Tensor type: BF16
