factory_qwen_results1

This model is a fine-tuned version of Qwen/Qwen3-Coder-30B-A3B-Instruct on the train dataset. It achieves the following results on the evaluation set:

Loss: 0.1022
Accuracy: 0.9780

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 1
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 8
total_eval_batch_size: 8
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
num_epochs: 3.0
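
The totals in the list above follow from the per-device settings: 1 sample per device × 4 devices × 2 accumulation steps = 8 for training, and 2 × 4 = 8 for evaluation, where gradient accumulation does not apply. The sketch below reproduces that arithmetic and a cosine schedule with linear warmup in plain Python. The function names and the `total_steps=1245` default are illustrative assumptions: the step count is only an estimate inferred from the step and epoch columns of the results table, and the formula is the usual warmup-plus-cosine shape, not a copy of the trainer's code.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Samples contributing to one optimizer step (or one eval pass across devices)."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step, base_lr=5e-4, warmup_steps=50, total_steps=1245):
    """Learning rate at `step` for linear warmup followed by cosine decay to zero.

    total_steps=1245 is an assumed value (about 415 optimizer steps per epoch
    over 3 epochs); it is not stated in the hyperparameter list.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("train batch:", effective_batch_size(1, 4, 2))
    print("eval batch:", effective_batch_size(2, 4))
    for s in (0, 25, 50, 600, 1245):
        print(s, cosine_lr(s))
```

Under these assumptions the learning rate peaks at 0.0005 at step 50 and decays to zero by the final step.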

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.3137	0.0724	30	0.3017	0.9260
0.2642	0.1448	60	0.2687	0.9343
0.2464	0.2171	90	0.2397	0.9387
0.2345	0.2895	120	0.2179	0.9462
0.2104	0.3619	150	0.2028	0.9488
0.1645	0.4343	180	0.2001	0.9499
0.1761	0.5066	210	0.1826	0.9543
0.1668	0.5790	240	0.1741	0.9568
0.156	0.6514	270	0.1672	0.9566
0.1416	0.7238	300	0.1686	0.9553
0.1361	0.7961	330	0.1587	0.9592
0.162	0.8685	360	0.1539	0.9607
0.1177	0.9409	390	0.1495	0.9621
0.1276	1.0121	420	0.1450	0.9640
0.113	1.0844	450	0.1454	0.9626
0.0844	1.1568	480	0.1387	0.9642
0.1035	1.2292	510	0.1353	0.9660
0.0903	1.3016	540	0.1352	0.9660
0.0927	1.3739	570	0.1316	0.9672
0.1017	1.4463	600	0.1259	0.9695
0.0805	1.5187	630	0.1295	0.9691
0.1307	1.5911	660	0.1211	0.9709
0.0863	1.6634	690	0.1184	0.9711
0.065	1.7358	720	0.1169	0.9714
0.0899	1.8082	750	0.1112	0.9724
0.0736	1.8806	780	0.1083	0.9734
0.0772	1.9530	810	0.1094	0.9728
0.047	2.0241	840	0.1118	0.9734
0.0389	2.0965	870	0.1143	0.9735
0.0519	2.1689	900	0.1111	0.9742
0.0417	2.2413	930	0.1100	0.9751
0.0485	2.3136	960	0.1085	0.9748
0.0539	2.3860	990	0.1055	0.9758
0.031	2.4584	1020	0.1068	0.9760
0.0367	2.5308	1050	0.1076	0.9761
0.0294	2.6031	1080	0.1054	0.9773
0.0329	2.6755	1110	0.1049	0.9771
0.0358	2.7479	1140	0.1027	0.9773
0.0321	2.8203	1170	0.1033	0.9776
0.0337	2.8926	1200	0.1033	0.9777
0.0456	2.9650	1230	0.1022	0.9780
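
To pick a checkpoint from a log like the one above, the rows can be parsed as tab-separated values and ranked by validation loss. The following is a minimal sketch that uses a handful of rows copied from the table; the names `RESULTS_TSV`, `load_rows`, and `best_checkpoint` are hypothetical, and the full table can be pasted in place of the sample.

```python
import csv
import io

# A few rows copied from the results table (tab-separated).
# Paste the full table here to rank every logged checkpoint.
RESULTS_TSV = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\n"
    "0.0736\t1.8806\t780\t0.1083\t0.9734\n"
    "0.0772\t1.9530\t810\t0.1094\t0.9728\n"
    "0.047\t2.0241\t840\t0.1118\t0.9734\n"
    "0.0389\t2.0965\t870\t0.1143\t0.9735\n"
    "0.0337\t2.8926\t1200\t0.1033\t0.9777\n"
    "0.0456\t2.9650\t1230\t0.1022\t0.9780\n"
)


def load_rows(text):
    """Parse the tab-separated log into dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for raw in reader:
        row = {name: float(value) for name, value in raw.items()}
        row["Step"] = int(row["Step"])  # steps are whole numbers
        rows.append(row)
    return rows


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r["Validation Loss"])


if __name__ == "__main__":
    best = best_checkpoint(load_rows(RESULTS_TSV))
    print(best["Step"], best["Validation Loss"], best["Accuracy"])
```

On the sample rows this selects step 1230, which matches the loss and accuracy reported at the top of the card.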

Framework versions

PEFT 0.17.1
Transformers 4.57.1
Pytorch 2.10.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2
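
Because PEFT is listed among the framework versions, the fine-tuned weights are presumably stored as an adapter on top of the base model. A minimal loading sketch under that assumption follows; the `load_adapter` helper is hypothetical, the default `adapter_id` is taken from the repository name shown on the model page and is an assumption rather than a documented usage instruction, and `device_map="auto"` additionally requires the `accelerate` package.

```python
def load_adapter(
    base_id="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    adapter_id="finalform/foamQwen-30B",  # assumed adapter repository id
):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imports live inside the function so that merely defining it
    # does not require the libraries or download any weights.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # device_map="auto" spreads the 30B base model over available devices.
    base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model
```

Calling `load_adapter()` downloads the full base model, so it needs hardware comparable to what the base model itself requires.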

Model tree for finalform/foamQwen-30B

Base model

Qwen/Qwen3-Coder-30B-A3B-Instruct
