balanced_seed-42_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.0198
Accuracy: 0.4204

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH, the PyTorch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
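Two of these settings interact in ways worth spelling out. The effective batch of 256 is 32 examples per step times 8 accumulation steps, which implies a single device. The linear scheduler warms up for 32,000 optimizer steps. With roughly 35,520 total steps, the learning rate only peaks near the end of the run and then decays over the last ~3,520 steps. The sketch below reproduces that schedule shape under stated assumptions. The total step count (35,520) is inferred from the last row of the results table, not stated in the card, and the helper names are hypothetical. The formula follows the warmup-then-linear-decay shape of transformers' `get_linear_schedule_with_warmup`, which `lr_scheduler_type: linear` selects.

```python
BASE_LR = 1e-3
WARMUP_STEPS = 32_000
# Assumption: total optimizer steps = 35,520, matching the last step logged
# in the training-results table (about 1,776 optimizer steps per epoch x 20).
TOTAL_STEPS = 35_520


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (hypothetical helper)."""
    return per_device * grad_accum * num_devices


def linear_schedule_lr(step: int, base_lr: float = BASE_LR,
                       warmup_steps: int = WARMUP_STEPS,
                       total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


print("effective batch:", effective_batch_size(32, 8))  # 256 on one device
# Learning rate at a few of the logged evaluation steps
for step in (1776, 17767, 31981, 32000, 33758, 35520):
    print(f"step {step:5d}  lr {linear_schedule_lr(step):.2e}")
```

Under these assumptions, the last two evaluations (steps 33,758 and 35,520) fall inside the short decay phase. That is consistent with the sharper drop in validation loss at the end of training, though the card gives no evidence that the schedule alone causes it.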

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.1281	0.9995	1776	4.2574	0.3057
4.0355	1.9996	3553	3.7318	0.3474
3.5725	2.9998	5330	3.4725	0.3715
3.3409	3.9999	7107	3.3353	0.3845
3.2496	4.9995	8883	3.2559	0.3917
3.1452	5.9996	10660	3.2099	0.3962
3.0833	6.9998	12437	3.1762	0.3993
3.0415	7.9999	14214	3.1537	0.4018
3.0011	8.9995	15990	3.1412	0.4032
2.9645	9.9996	17767	3.1304	0.4050
2.9513	10.9998	19544	3.1203	0.4056
2.9433	11.9999	21321	3.1141	0.4067
2.9381	12.9995	23097	3.1090	0.4070
2.8963	13.9996	24874	3.1062	0.4075
2.8927	14.9998	26651	3.1013	0.4078
2.8961	15.9999	28428	3.1004	0.4083
2.9024	16.9995	30204	3.0929	0.4090
2.8719	17.9996	31981	3.0953	0.4087
2.8398	18.9998	33758	3.0459	0.4152
2.6969	19.9915	35520	3.0198	0.4204
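The card does not state the task. If the validation loss is a mean per-token cross-entropy in nats, as is typical for Trainer language-modeling runs, it can be read as a perplexity via exp(loss). That reading is an assumption. The sketch copies the validation columns from the table above, converts each loss to a perplexity under that assumption, and picks the row with the lowest validation loss. `perplexity` and `ROWS` are illustrative names.

```python
import math

# (epoch, step, validation_loss, accuracy), copied from the table above
ROWS = [
    (0.9995, 1776, 4.2574, 0.3057),
    (1.9996, 3553, 3.7318, 0.3474),
    (2.9998, 5330, 3.4725, 0.3715),
    (3.9999, 7107, 3.3353, 0.3845),
    (4.9995, 8883, 3.2559, 0.3917),
    (5.9996, 10660, 3.2099, 0.3962),
    (6.9998, 12437, 3.1762, 0.3993),
    (7.9999, 14214, 3.1537, 0.4018),
    (8.9995, 15990, 3.1412, 0.4032),
    (9.9996, 17767, 3.1304, 0.4050),
    (10.9998, 19544, 3.1203, 0.4056),
    (11.9999, 21321, 3.1141, 0.4067),
    (12.9995, 23097, 3.1090, 0.4070),
    (13.9996, 24874, 3.1062, 0.4075),
    (14.9998, 26651, 3.1013, 0.4078),
    (15.9999, 28428, 3.1004, 0.4083),
    (16.9995, 30204, 3.0929, 0.4090),
    (17.9996, 31981, 3.0953, 0.4087),
    (18.9998, 33758, 3.0459, 0.4152),
    (19.9915, 35520, 3.0198, 0.4204),
]


def perplexity(mean_nll_nats: float) -> float:
    """exp(mean negative log-likelihood); assumes the loss is in nats."""
    return math.exp(mean_nll_nats)


# Row with the lowest validation loss
best = min(ROWS, key=lambda row: row[2])

for epoch, step, loss, acc in ROWS:
    print(f"epoch {epoch:7.4f}  step {step:5d}  val_loss {loss:.4f}  "
          f"ppl {perplexity(loss):6.2f}  acc {acc:.4f}")
print("lowest validation loss at step", best[1])
```

In this table the final row has both the lowest validation loss and the highest accuracy. The reported evaluation results therefore match the last checkpoint regardless of whether best-model selection was enabled.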

Safetensors

Model size: 0.1B params
Tensor type: F32
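Because F32 stores 4 bytes per parameter, the weight file size can be estimated from the parameter count. The "0.1B" figure is rounded, so the result is only approximate. It covers the weights alone, not optimizer state (AdamW keeps two extra moment tensors per parameter during training) or activations. `weight_bytes` and its dtype table are a hypothetical sketch.

```python
# Bytes per parameter for common tensor types
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Approximate size of the stored weights alone (no optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype]


# 0.1B is a rounded parameter count, so this is a rough estimate
approx = weight_bytes(100_000_000)
print(f"~{approx / 1e9:.1f} GB of F32 weights")
```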