train_sst2_42_1763998304

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the sst2 dataset. Training and validation loss over the course of training are listed in the results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
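To make the scheduler settings above concrete, the following is a minimal sketch of a linear-warmup, cosine-decay learning-rate curve. It assumes the usual warmup-then-half-cosine form, takes the total of 151540 optimizer steps from the final row of the results table, and rounds the warmup length to a whole number of steps. The function names are hypothetical and are not taken from the training code.

```python
import math

BASE_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio
TOTAL_STEPS = 151540   # last step in the results table (10 epochs)


def warmup_steps_for(total_steps: int = TOTAL_STEPS,
                     warmup_ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps (assumption: rounded to the nearest step)."""
    return int(round(total_steps * warmup_ratio))


def cosine_lr(step: int,
              total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    warmup = warmup_steps_for(total_steps, warmup_ratio)
    if step < warmup:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, warmup_steps_for(), TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {s:>6}: lr = {cosine_lr(s):.3e}")
```

Under these assumptions the warmup phase lasts roughly one epoch, after which the learning rate decays smoothly toward zero by the end of epoch 10.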

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2185	0.5	7577	0.1267	1694528
0.0068	1.0	15154	0.0887	3383904
0.0956	1.5	22731	0.0969	5076704
0.0944	2.0	30308	0.0884	6774480
0.0448	2.5	37885	0.0718	8465648
0.0019	3.0	45462	0.0824	10163152
0.0022	3.5	53039	0.0901	11864016
0.0663	4.0	60616	0.1025	13552544
0.0718	4.5	68193	0.0938	15246496
0.2206	5.0	75770	0.1239	16941120
0.0549	5.5	83347	0.1489	18638816
0.1129	6.0	90924	0.1408	20331520
0.0001	6.5	98501	0.1868	22022752
0.0	7.0	106078	0.2242	23721840
0.0	7.5	113655	0.2396	25419600
0.0	8.0	121232	0.2138	27111168
0.0	8.5	128809	0.2617	28801984
0.0504	9.0	136386	0.2643	30498912
0.0	9.5	143963	0.2765	32191968
0.0	10.0	151540	0.2757	33886240
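One way to read the table above is to look for the evaluation point with the lowest validation loss. The sketch below copies the epoch, step, and validation-loss columns by hand and picks the minimum; the helper name `best_checkpoint` is hypothetical, and the card does not state which checkpoint was actually kept.

```python
# (epoch, step, validation_loss) copied from the results table.
ROWS = [
    (0.5, 7577, 0.1267), (1.0, 15154, 0.0887),
    (1.5, 22731, 0.0969), (2.0, 30308, 0.0884),
    (2.5, 37885, 0.0718), (3.0, 45462, 0.0824),
    (3.5, 53039, 0.0901), (4.0, 60616, 0.1025),
    (4.5, 68193, 0.0938), (5.0, 75770, 0.1239),
    (5.5, 83347, 0.1489), (6.0, 90924, 0.1408),
    (6.5, 98501, 0.1868), (7.0, 106078, 0.2242),
    (7.5, 113655, 0.2396), (8.0, 121232, 0.2138),
    (8.5, 128809, 0.2617), (9.0, 136386, 0.2643),
    (9.5, 143963, 0.2765), (10.0, 151540, 0.2757),
]


def best_checkpoint(rows):
    """Return the (epoch, step, val_loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, step, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at epoch {epoch} (step {step})")
```

In this table the minimum is 0.0718 at epoch 2.5 (step 37885). Validation loss then trends upward while training loss approaches zero, which is consistent with overfitting in the later epochs.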
