train_sst2_42_1773148418

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Model description

More information needed


The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
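
To make the cosine schedule and the 0.1 warmup ratio concrete, here is a minimal sketch of the learning rate over a run, assuming the common shape of linear warmup to the peak rate followed by a half-cosine decay to zero. The function name `cosine_lr_with_warmup` is hypothetical. The total step count is not stated in this card; the value used below is an estimate inferred from the training log, where 1895 steps correspond to roughly 0.25 epochs.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumption: about 7577 steps per epoch (1895 steps / 0.2501 epochs in the
# training log), times num_epochs = 5. This is an estimate, not a logged value.
ESTIMATED_TOTAL_STEPS = 7577 * 5

for step in (0, 1895, 3789, 18950, ESTIMATED_TOTAL_STEPS):
    print(step, cosine_lr_with_warmup(step, ESTIMATED_TOTAL_STEPS))
```

Under these assumptions the rate peaks at 5e-05 about a tenth of the way into the run and then falls smoothly toward zero by the final step.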

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2252	0.2501	1895	0.1185	930944
0.2031	0.5002	3790	0.1181	1864128
0.0111	0.7503	5685	0.0807	2790656
0.0636	1.0004	7580	0.0821	3726464
0.0492	1.2505	9475	0.0806	4658240
0.1228	1.5006	11370	0.0785	5591680
0.2702	1.7507	13265	0.0703	6528448
0.0132	2.0008	15160	0.0814	7463024
0.206	2.2509	17055	0.0801	8395632
0.0702	2.5010	18950	0.0716	9326256
0.1272	2.7511	20845	0.0724	10259504
0.0797	3.0012	22740	0.0747	11196096
0.084	3.2513	24635	0.0783	12128448
0.0036	3.5014	26530	0.0770	13069824
0.2228	3.7515	28425	0.0751	13996672
0.0571	4.0016	30320	0.0782	14924944
0.0997	4.2517	32215	0.0773	15859920
0.0545	4.5018	34110	0.0784	16790288
0.0069	4.7519	36005	0.0784	17721744
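
The log above can be scanned programmatically to find the checkpoint with the lowest validation loss. The sketch below copies the rows from the table into a string and parses them with the standard library only; the names `LOG` and `parse_log` are hypothetical and not part of any training script.

```python
# Rows copied from the training log:
# training loss, epoch, step, validation loss, input tokens seen.
LOG = """\
0.2252 0.2501 1895 0.1185 930944
0.2031 0.5002 3790 0.1181 1864128
0.0111 0.7503 5685 0.0807 2790656
0.0636 1.0004 7580 0.0821 3726464
0.0492 1.2505 9475 0.0806 4658240
0.1228 1.5006 11370 0.0785 5591680
0.2702 1.7507 13265 0.0703 6528448
0.0132 2.0008 15160 0.0814 7463024
0.206 2.2509 17055 0.0801 8395632
0.0702 2.5010 18950 0.0716 9326256
0.1272 2.7511 20845 0.0724 10259504
0.0797 3.0012 22740 0.0747 11196096
0.084 3.2513 24635 0.0783 12128448
0.0036 3.5014 26530 0.0770 13069824
0.2228 3.7515 28425 0.0751 13996672
0.0571 4.0016 30320 0.0782 14924944
0.0997 4.2517 32215 0.0773 15859920
0.0545 4.5018 34110 0.0784 16790288
0.0069 4.7519 36005 0.0784 17721744
"""


def parse_log(text):
    """Turn whitespace-separated log lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens_seen": int(tokens),
        })
    return rows


rows = parse_log(LOG)
best = min(rows, key=lambda r: r["val_loss"])
print(best["step"], best["epoch"], best["val_loss"])  # prints: 13265 1.7507 0.0703
```

Validation loss bottoms out at 0.0703 near epoch 1.75 and stays between 0.0716 and 0.0814 afterwards, so the later epochs do not improve on that checkpoint by this metric. The training-loss column swings between 0.0036 and 0.2702 from row to row, so validation loss is probably the steadier signal for picking a checkpoint.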
