felixwangg / trained_scode / qwen3-4b_std / train.log

	05/15/2026 17:53:01 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
	05/15/2026 17:53:43 - INFO - root - Training args Namespace(output_name='qwen3-4b_std', datasets=['evol'], pretrain_name='qwen3-4b', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=False, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='/mnt/scratch/QRM/experiments/SCoDE/data_train_val', model_dir='/mnt/scratch/QRM/experiments/SCoDE/trained', output_dir='/mnt/scratch/QRM/experiments/SCoDE/trained/qwen3-4b_std', logger=<RootLogger root (INFO)>)
	05/15/2026 17:53:43 - INFO - root - *** Running training ***
	05/15/2026 17:53:43 - INFO - root - Num samples = 29691
	05/15/2026 17:53:43 - INFO - root - Num epoch = 2
	05/15/2026 17:53:43 - INFO - root - Batch size= 1
	05/15/2026 17:53:43 - INFO - root - Total batch size (w. accumulation) = 16
	05/15/2026 17:53:43 - INFO - root - Gradient Accumulation steps = 16
	05/15/2026 17:53:43 - INFO - root - Total optimization steps = 3710
	05/15/2026 17:53:43 - INFO - root - Num val samples = 3308
	05/15/2026 17:53:43 - INFO - root - Num parameters = 4021784576
	05/15/2026 17:53:43 - INFO - root - Num trainable parameters = 4021784576
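The reported step count is consistent with the other logged values if each epoch contributes floor(samples / (batch size x accumulation steps)) optimizer updates, with the final partial accumulation window dropped. That rule is inferred from the numbers, not stated in the log, and the function name below is hypothetical. A minimal sketch:

```python
def total_optimization_steps(num_samples, batch_size, grad_acc_steps, num_epochs):
    """Optimizer updates, assuming incomplete accumulation windows are dropped."""
    micro_batches = num_samples // batch_size           # forward/backward passes per epoch
    updates_per_epoch = micro_batches // grad_acc_steps  # one update per full window
    return updates_per_epoch * num_epochs


# Values copied from the log: 29691 samples, batch size 1,
# 16 accumulation steps, 2 epochs.
steps = total_optimization_steps(29691, 1, 16, 2)
print(steps)  # 3710, matching "Total optimization steps"
```

Under this assumption, 11 samples per epoch (29691 - 16 x 1855) would fall into an incomplete window and not trigger an update.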
	05/15/2026 17:56:26 - INFO - root - epochs: 1/2, steps: 50/3710, func: 0.05647, 1%: 3h 18m 21s
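The trailing time on the progress line looks like an estimate of the remaining training time. The log does not show how the script computes it; the sketch below assumes a simple linear extrapolation from the two timestamps in the log, and the helper names are hypothetical. It lands within about a minute of the logged `3h 18m 21s`, which suggests the script uses something similar with a slightly different start time.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S"  # timestamp format used by the log lines


def remaining_seconds(start, now, steps_done, total_steps):
    """Linear extrapolation: average seconds per step times the steps left."""
    elapsed = (datetime.strptime(now, FMT) - datetime.strptime(start, FMT)).total_seconds()
    return elapsed / steps_done * (total_steps - steps_done)


def fmt_hms(seconds):
    """Render a duration in the same 'Xh Ym Zs' style as the log."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m}m {s}s"


# Start of training and first progress report, copied from the log.
eta = remaining_seconds("05/15/2026 17:53:43", "05/15/2026 17:56:26", 50, 3710)
print(fmt_hms(eta))
```

The 163 seconds between the two timestamps correspond to about 3.26 seconds per optimizer step, or roughly 0.2 seconds per sample at 16 accumulation steps.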