train_sst2_42_1763998302

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
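
The scheduler settings above (cosine decay with a warmup ratio of 0.1) can be illustrated with a small sketch. It assumes linear warmup from zero to the peak learning rate followed by a single half-cosine decay to zero, and it takes the total step count (151540) from the last row of the training log. The name cosine_lr and the rounding used to derive the warmup step count are assumptions made for illustration, not taken from the training code.

```python
import math

PEAK_LR = 5e-05        # learning_rate from the hyperparameter list
TOTAL_STEPS = 151540   # final step in the training log (assumed to be the full run)
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio


def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS,
              warmup_ratio=WARMUP_RATIO):
    """Approximate learning rate at a given optimizer step (hypothetical helper)."""
    # Assumption: warmup length is the rounded fraction of the total steps.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak learning rate down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Start, end of warmup, midpoint of the decay, and final step.
    for step in (0, 15154, 83347, 151540):
        print(step, cosine_lr(step))
```

Under these assumptions the warmup ends at step 15154, which coincides with the end of the first epoch in the training log.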

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1947	0.5	7577	0.1283	1694528
0.1175	1.0	15154	0.1080	3383904
0.0679	1.5	22731	0.0982	5076704
0.0245	2.0	30308	0.1011	6774480
0.0139	2.5	37885	0.0890	8465648
0.0388	3.0	45462	0.0879	10163152
0.1137	3.5	53039	0.0864	11864016
0.2298	4.0	60616	0.0848	13552544
0.1389	4.5	68193	0.0840	15246496
0.1076	5.0	75770	0.0830	16941120
0.0975	5.5	83347	0.0830	18638816
0.1707	6.0	90924	0.0811	20331520
0.0239	6.5	98501	0.0823	22022752
0.0124	7.0	106078	0.0794	23721840
0.1312	7.5	113655	0.0814	25419600
0.0941	8.0	121232	0.0799	27111168
0.0626	8.5	128809	0.0808	28801984
0.1980	9.0	136386	0.0807	30498912
0.1411	9.5	143963	0.0807	32191968
0.0957	10.0	151540	0.0805	33886240
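
The log above can be scanned programmatically to find the evaluation point with the lowest validation loss. The following is a minimal sketch that copies the epoch, step, and validation loss columns from the table. The names ROWS and best_checkpoint are hypothetical, and the sketch does not imply that a checkpoint was actually saved at that step.

```python
# (epoch, step, validation_loss) copied from the training log above.
ROWS = [
    (0.5, 7577, 0.1283), (1.0, 15154, 0.1080),
    (1.5, 22731, 0.0982), (2.0, 30308, 0.1011),
    (2.5, 37885, 0.0890), (3.0, 45462, 0.0879),
    (3.5, 53039, 0.0864), (4.0, 60616, 0.0848),
    (4.5, 68193, 0.0840), (5.0, 75770, 0.0830),
    (5.5, 83347, 0.0830), (6.0, 90924, 0.0811),
    (6.5, 98501, 0.0823), (7.0, 106078, 0.0794),
    (7.5, 113655, 0.0814), (8.0, 121232, 0.0799),
    (8.5, 128809, 0.0808), (9.0, 136386, 0.0807),
    (9.5, 143963, 0.0807), (10.0, 151540, 0.0805),
]


def best_checkpoint(rows):
    """Return the (epoch, step, validation_loss) row with the lowest loss."""
    return min(rows, key=lambda row: row[2])


best_epoch, best_step, best_loss = best_checkpoint(ROWS)
final_epoch, final_step, final_loss = ROWS[-1]

# Optimizer steps per epoch, derived from the final row of the log.
steps_per_epoch = final_step / final_epoch

if __name__ == "__main__":
    print("lowest validation loss:", best_loss, "at epoch", best_epoch, "step", best_step)
    print("final validation loss:", final_loss)
    print("steps per epoch:", steps_per_epoch)
```

In this log the validation loss reaches its minimum at epoch 7.0 and changes little afterwards, while the logged training loss remains noisy from one evaluation point to the next.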
