5a2a5c0a0eb450885cd5fb1af9824857

This model is a fine-tuned version of albert/albert-xxlarge-v2 on the contemmcm/cls_mmlu dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3877
  • Data Size: 1.0
  • Epoch Runtime: 120.0157
  • Accuracy: 0.2487
  • F1 Macro: 0.0996

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
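
The distributed settings above determine the effective batch sizes: each of the 4 devices processes its own per-device batch of 8, so one optimizer step sees 32 examples. A minimal sketch of that arithmetic (variable names are illustrative, taken from the hyperparameter list):

```python
# Effective batch size under multi-GPU data parallelism:
# one optimizer step aggregates the per-device batches of all devices.
train_batch_size = 8   # per-device train batch size (from the card)
eval_batch_size = 8    # per-device eval batch size
num_devices = 4        # number of GPUs

total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 32 32
```

This matches the total_train_batch_size and total_eval_batch_size values of 32 reported above.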

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro |
|---------------|-------|-------|-----------------|-----------|---------------|----------|----------|
| No log        | 0     | 0     | 1.7784          | 0         | 3.2331        | 0.2460   | 0.1725   |
| No log        | 1     | 438   | 1.4876          | 0.0078    | 4.3292        | 0.2407   | 0.2224   |
| No log        | 2     | 876   | 1.4012          | 0.0156    | 5.1143        | 0.2427   | 0.2079   |
| No log        | 3     | 1314  | 1.4484          | 0.0312    | 7.1618        | 0.2620   | 0.1681   |
| No log        | 4     | 1752  | 1.3940          | 0.0625    | 11.0504       | 0.2527   | 0.1008   |
| 0.0825        | 5     | 2190  | 1.3985          | 0.125     | 18.0212       | 0.2453   | 0.0985   |
| 0.1948        | 6     | 2628  | 1.4229          | 0.25      | 32.7564       | 0.2487   | 0.0996   |
| 1.4657        | 7     | 3066  | 1.4154          | 0.5       | 61.8446       | 0.2453   | 0.0985   |
| 1.3901        | 8     | 3504  | 1.3900          | 1.0       | 121.1050      | 0.2487   | 0.0996   |
| 1.3862        | 9     | 3942  | 1.3896          | 1.0       | 120.6294      | 0.2527   | 0.1008   |
| 1.3877        | 10    | 4380  | 1.3903          | 1.0       | 119.7003      | 0.2527   | 0.1008   |
| 1.3873        | 11    | 4818  | 1.3876          | 1.0       | 120.3298      | 0.2533   | 0.1011   |
| 1.3881        | 12    | 5256  | 1.3880          | 1.0       | 120.7355      | 0.2527   | 0.1008   |
| 1.3885        | 13    | 5694  | 1.3871          | 1.0       | 119.5208      | 0.2487   | 0.0996   |
| 1.3885        | 14    | 6132  | 1.3877          | 1.0       | 120.3651      | 0.2527   | 0.1008   |
| 1.3874        | 15    | 6570  | 1.3895          | 1.0       | 120.5198      | 0.2527   | 0.1008   |
| 1.3875        | 16    | 7008  | 1.3881          | 1.0       | 119.8156      | 0.2527   | 0.1008   |
| 1.3837        | 17    | 7446  | 1.3867          | 1.0       | 120.1265      | 0.2487   | 0.0996   |
| 1.389         | 18    | 7884  | 1.3861          | 1.0       | 120.4231      | 0.2527   | 0.1008   |
| 1.3861        | 19    | 8322  | 1.3865          | 1.0       | 120.5597      | 0.2533   | 0.1011   |
| 1.3883        | 20    | 8760  | 1.3844          | 1.0       | 120.1635      | 0.2527   | 0.1008   |
| 1.387         | 21    | 9198  | 1.3880          | 1.0       | 120.1970      | 0.2527   | 0.1008   |
| 1.3874        | 22    | 9636  | 1.3847          | 1.0       | 120.1456      | 0.2533   | 0.1011   |
| 1.3855        | 23    | 10074 | 1.3882          | 1.0       | 119.9105      | 0.2533   | 0.1011   |
| 1.3856        | 24    | 10512 | 1.3877          | 1.0       | 120.0157      | 0.2487   | 0.0996   |
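
Two patterns in the table are worth noting. The Data Size column appears to double each epoch, from 1/128 of the training set at epoch 1 up to the full set from epoch 8 onward. And the validation loss plateaus around 1.387-1.390, which is essentially ln 4 ≈ 1.3863, the cross-entropy of a uniform prediction over four answer choices; this is consistent with the accuracy hovering near chance (0.25). A short sketch of both checks (the doubling schedule is an inference from the table, not stated in the card):

```python
import math

# Data Size per epoch, assuming it doubles from 1/128 until the full set:
schedule = [min(1.0, 2 ** k / 128) for k in range(8)]
print([round(s, 4) for s in schedule])
# [0.0078, 0.0156, 0.0312, 0.0625, 0.125, 0.25, 0.5, 1.0]
# These match the Data Size column for epochs 1 through 8.

# Cross-entropy of a uniform guess over 4 classes, i.e. ln(4):
uniform_loss = math.log(4)
print(round(uniform_loss, 4))  # 1.3863
```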

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.3.0
  • Tokenizers 0.22.1