9f66e008156f1f42466199bfb7be3e62

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on the contemmcm/cls_mmlu dataset. It achieves the following results on the evaluation set (these values match the last logged row of the training results, epoch 11):

Loss: 6.4448
Data Size: 1.0
Epoch Runtime: 78.2710
Accuracy: 0.2666
F1 Macro: 0.2366
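
For readers unfamiliar with the two classification metrics reported above, the following is a minimal pure-Python sketch of how accuracy and macro-averaged F1 are commonly defined. It is not the evaluation code used for this model, which the card does not show; the function names and the toy labels are made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical, NOT taken from the evaluation set).
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 2]
toy_accuracy = accuracy(y_true, y_pred)
toy_f1_macro = f1_macro(y_true, y_pred)
print(toy_accuracy, toy_f1_macro)
```

Because macro F1 weights every class equally, a model that rarely predicts some classes can score a macro F1 well below its accuracy, as in the toy example where class 1 is never predicted.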

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
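
The total batch sizes in the list above follow from the per-device batch size and the device count. The sketch below reproduces that arithmetic with the listed values; it assumes no gradient accumulation, since the card does not list any, and the dictionary is only an illustrative container, not the actual training configuration object.

```python
# Values copied from the hyperparameter list above.
hparams = {
    "learning_rate": 5e-05,
    "train_batch_size": 8,  # per device
    "eval_batch_size": 8,   # per device
    "num_devices": 4,
    "seed": 42,
    "lr_scheduler_type": "constant",
    "num_epochs": 50,
}

# Assumption: gradient accumulation is 1 (not stated in the card).
gradient_accumulation_steps = 1

total_train_batch_size = (
    hparams["train_batch_size"]
    * hparams["num_devices"]
    * gradient_accumulation_steps
)
total_eval_batch_size = hparams["eval_batch_size"] * hparams["num_devices"]

print(total_train_batch_size, total_eval_batch_size)
```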

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro
No log	0	0	10.0210	0	3.3118	0.2434	0.1767
No log	1	438	11.6774	0.0078	3.5466	0.2533	0.1534
No log	2	876	8.3193	0.0156	5.1940	0.2527	0.1008
No log	3	1314	6.8895	0.0312	8.2007	0.2487	0.0996
No log	4	1752	6.3369	0.0625	11.4107	0.2533	0.1011
0.3894	5	2190	5.7759	0.125	16.5952	0.2453	0.1850
0.7599	6	2628	5.6854	0.25	26.1734	0.2453	0.0985
5.6658	7	3066	5.5716	0.5	44.3615	0.2540	0.1457
5.7936	8.0	3504	5.6529	1.0	81.2670	0.2507	0.1037
5.4888	9.0	3942	5.8002	1.0	80.8304	0.2699	0.1552
5.1531	10.0	4380	5.9782	1.0	78.0883	0.2660	0.1983
4.3526	11.0	4818	6.4448	1.0	78.2710	0.2666	0.2366
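
The table shows that the three evaluation metrics peak at different epochs. As a rough sketch, the snippet below copies the epoch, validation loss, accuracy, and F1 macro columns from the table and picks the best row under each criterion. The variable names are illustrative, and nothing here implies which checkpoint the authors intended to publish.

```python
# (epoch, validation_loss, accuracy, f1_macro), copied from the table above.
rows = [
    (0, 10.0210, 0.2434, 0.1767),
    (1, 11.6774, 0.2533, 0.1534),
    (2, 8.3193, 0.2527, 0.1008),
    (3, 6.8895, 0.2487, 0.0996),
    (4, 6.3369, 0.2533, 0.1011),
    (5, 5.7759, 0.2453, 0.1850),
    (6, 5.6854, 0.2453, 0.0985),
    (7, 5.5716, 0.2540, 0.1457),
    (8, 5.6529, 0.2507, 0.1037),
    (9, 5.8002, 0.2699, 0.1552),
    (10, 5.9782, 0.2660, 0.1983),
    (11, 6.4448, 0.2666, 0.2366),
]

best_by_loss = min(rows, key=lambda r: r[1])      # lowest validation loss
best_by_accuracy = max(rows, key=lambda r: r[2])  # highest accuracy
best_by_f1 = max(rows, key=lambda r: r[3])        # highest F1 macro

print("lowest validation loss: epoch", best_by_loss[0])
print("highest accuracy:       epoch", best_by_accuracy[0])
print("highest F1 macro:       epoch", best_by_f1[0])
```

On these numbers, validation loss is lowest at epoch 7 and rises afterwards while the training loss keeps falling, accuracy is highest at epoch 9, and F1 macro is highest at epoch 11, the row reported at the top of this card.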

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.3.0
Tokenizers 0.22.1

Safetensors

Model size: 1B params
Tensor type: F32
