05/15/2026 17:53:01 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
05/15/2026 17:53:43 - INFO - root - Training args Namespace(output_name='qwen3-4b_std', datasets=['evol'], pretrain_name='qwen3-4b', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=False, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='/mnt/scratch/QRM/experiments/SCoDE/data_train_val', model_dir='/mnt/scratch/QRM/experiments/SCoDE/trained', output_dir='/mnt/scratch/QRM/experiments/SCoDE/trained/qwen3-4b_std', logger=<RootLogger root (INFO)>)
05/15/2026 17:53:43 - INFO - root - ***** Running training *****
05/15/2026 17:53:43 - INFO - root - Num samples = 29691
05/15/2026 17:53:43 - INFO - root - Num epoch = 2
05/15/2026 17:53:43 - INFO - root - Batch size= 1
05/15/2026 17:53:43 - INFO - root - Total batch size (w. accumulation) = 16
05/15/2026 17:53:43 - INFO - root - Gradient Accumulation steps = 16
05/15/2026 17:53:43 - INFO - root - Total optimization steps = 3710
05/15/2026 17:53:43 - INFO - root - Num val samples = 3308
05/15/2026 17:53:43 - INFO - root - Num parameters = 4021784576
05/15/2026 17:53:43 - INFO - root - Num trainable parameters = 4021784576
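The summary figures above are consistent with one another. With batch_size=1 and grad_acc_steps=16, the effective batch is 1 x 16 = 16. 29691 samples give 29691 // 16 = 1855 optimizer steps per epoch, and 1855 x 2 epochs = 3710. The sketch below reproduces that arithmetic. It assumes the script floors the per-epoch step count, so the last 11 micro-batches of each epoch never trigger a separate update. A ceiling would give 1856 x 2 = 3712, which does not match the log. The function name is hypothetical and not taken from the training script.

```python
import math


def total_optimization_steps(num_samples, batch_size, grad_acc_steps, num_epochs):
    # Hypothetical helper: micro-batches per epoch, keeping a final partial batch (ceil).
    # With batch_size=1 this is simply num_samples.
    batches_per_epoch = math.ceil(num_samples / batch_size)
    # Assumption: leftover micro-batches that do not fill a whole accumulation
    # window are not counted as an optimizer step (floor division).
    steps_per_epoch = batches_per_epoch // grad_acc_steps
    return steps_per_epoch * num_epochs


effective_batch = 1 * 16  # batch_size * grad_acc_steps, matching "Total batch size"
print(effective_batch, total_optimization_steps(29691, 1, 16, 2))  # 16 3710
```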
05/15/2026 17:56:26 - INFO - root - epochs: 1/2, steps: 50/3710, func: 0.05647, 1%: 3h 18m 21s
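The progress line reports 50 of 3710 steps done (about 1%) and a time estimate. The sketch below shows one plausible way to compute such an estimate: extrapolate the mean time per step over the remaining steps. This is an assumption. The script's actual formula and start time are not shown. The timestamps only give 163 s between 17:53:43 and 17:56:26 at one-second resolution, so this sketch yields 3h 18m 51s rather than the logged 3h 18m 21s. The function name is hypothetical.

```python
def progress_and_eta(elapsed_s, steps_done, total_steps):
    # Hypothetical helper: integer percentage of optimizer steps completed.
    pct = int(100 * steps_done / total_steps)
    # Assumption: remaining time = mean seconds per step so far * steps left.
    remaining_s = int(elapsed_s * (total_steps - steps_done) / steps_done)
    hours, rem = divmod(remaining_s, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{pct}%: {hours}h {minutes}m {seconds}s"


# 17:53:43 -> 17:56:26 is 163 s for the first 50 of 3710 steps.
print(progress_and_eta(163, 50, 3710))  # 1%: 3h 18m 51s
```

At this rate the full run (3710 steps x about 3.26 s) would take roughly 3h 21m.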