| 05/15/2026 17:53:01 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk). |
| 05/15/2026 17:53:44 - INFO - root - Training args Namespace(output_name='qwen2.5-coder-3b_std', datasets=['evol'], pretrain_name='qwen2.5-coder-3b', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=False, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='/mnt/scratch/QRM/experiments/SCoDE/data_train_val', model_dir='/mnt/scratch/QRM/experiments/SCoDE/trained', output_dir='/mnt/scratch/QRM/experiments/SCoDE/trained/qwen2.5-coder-3b_std', logger=<RootLogger root (INFO)>) |
| 05/15/2026 17:53:44 - INFO - root - ***** Running training ***** |
| 05/15/2026 17:53:44 - INFO - root - Num samples = 29691 |
| 05/15/2026 17:53:44 - INFO - root - Num epoch = 2 |
| 05/15/2026 17:53:44 - INFO - root - Batch size= 1 |
| 05/15/2026 17:53:44 - INFO - root - Total batch size (w. accumulation) = 16 |
| 05/15/2026 17:53:44 - INFO - root - Gradient Accumulation steps = 16 |
| 05/15/2026 17:53:44 - INFO - root - Total optimization steps = 3710 |
| 05/15/2026 17:53:44 - INFO - root - Num val samples = 3308 |
| 05/15/2026 17:53:44 - INFO - root - Num parameters = 3085383680 |
| 05/15/2026 17:53:44 - INFO - root - Num trainable parameters = 3085383680 |
| 05/15/2026 17:55:42 - INFO - root - epochs: 1/2, steps: 50/3710, func: 0.054119, 1%: 2h 24m 4s |
| 05/15/2026 17:57:42 - INFO - root - epochs: 1/2, steps: 100/3710, func: 0.051626, 2%: 2h 23m 34s |
| 05/15/2026 17:59:49 - INFO - root - epochs: 1/2, steps: 150/3710, func: 0.051893, 4%: 2h 24m 27s |
| 05/15/2026 18:01:55 - INFO - root - epochs: 1/2, steps: 200/3710, func: 0.052662, 5%: 2h 23m 39s |
| 05/15/2026 18:04:00 - INFO - root - epochs: 1/2, steps: 250/3710, func: 0.050138, 6%: 2h 22m 8s |
| 05/15/2026 18:06:06 - INFO - root - epochs: 1/2, steps: 300/3710, func: 0.05182, 8%: 2h 20m 35s |
| 05/15/2026 18:08:11 - INFO - root - epochs: 1/2, steps: 350/3710, func: 0.051337, 9%: 2h 18m 50s |
| 05/15/2026 18:10:17 - INFO - root - epochs: 1/2, steps: 400/3710, func: 0.051932, 10%: 2h 17m 0s |
| 05/15/2026 18:12:23 - INFO - root - epochs: 1/2, steps: 450/3710, func: 0.052326, 12%: 2h 15m 10s |
| 05/15/2026 18:14:28 - INFO - root - epochs: 1/2, steps: 500/3710, func: 0.05205, 13%: 2h 13m 14s |
| 05/15/2026 18:16:34 - INFO - root - epochs: 1/2, steps: 550/3710, func: 0.051055, 14%: 2h 11m 15s |
| 05/15/2026 18:18:39 - INFO - root - epochs: 1/2, steps: 600/3710, func: 0.05092, 16%: 2h 9m 15s |
| 05/15/2026 18:20:45 - INFO - root - epochs: 1/2, steps: 650/3710, func: 0.052663, 17%: 2h 7m 15s |
| 05/15/2026 18:22:50 - INFO - root - epochs: 1/2, steps: 700/3710, func: 0.051628, 18%: 2h 5m 13s |
| 05/15/2026 18:24:56 - INFO - root - epochs: 1/2, steps: 750/3710, func: 0.05118, 20%: 2h 3m 12s |
| 05/15/2026 18:27:03 - INFO - root - epochs: 1/2, steps: 800/3710, func: 0.051354, 21%: 2h 1m 14s |
| 05/15/2026 18:29:08 - INFO - root - epochs: 1/2, steps: 850/3710, func: 0.050612, 22%: 1h 59m 10s |
| 05/15/2026 18:31:15 - INFO - root - epochs: 1/2, steps: 900/3710, func: 0.051299, 24%: 1h 57m 10s |
| 05/15/2026 18:33:20 - INFO - root - epochs: 1/2, steps: 950/3710, func: 0.051677, 25%: 1h 55m 7s |
| 05/15/2026 18:35:26 - INFO - root - epochs: 1/2, steps: 1000/3710, func: 0.051758, 26%: 1h 53m 5s |
| 05/15/2026 18:37:32 - INFO - root - epochs: 1/2, steps: 1050/3710, func: 0.051708, 28%: 1h 51m 0s |
| 05/15/2026 18:39:38 - INFO - root - epochs: 1/2, steps: 1100/3710, func: 0.050593, 29%: 1h 48m 57s |
| 05/15/2026 18:41:44 - INFO - root - epochs: 1/2, steps: 1150/3710, func: 0.051722, 30%: 1h 46m 53s |
| 05/15/2026 18:43:49 - INFO - root - epochs: 1/2, steps: 1200/3710, func: 0.051253, 32%: 1h 44m 48s |
| 05/15/2026 18:45:56 - INFO - root - epochs: 1/2, steps: 1250/3710, func: 0.051088, 33%: 1h 42m 46s |
| 05/15/2026 18:48:02 - INFO - root - epochs: 1/2, steps: 1300/3710, func: 0.051718, 35%: 1h 40m 42s |
| 05/15/2026 18:50:07 - INFO - root - epochs: 1/2, steps: 1350/3710, func: 0.051266, 36%: 1h 38m 38s |
| 05/15/2026 18:52:13 - INFO - root - epochs: 1/2, steps: 1400/3710, func: 0.050852, 37%: 1h 36m 32s |
| 05/15/2026 18:54:19 - INFO - root - epochs: 1/2, steps: 1450/3710, func: 0.050519, 39%: 1h 34m 28s |
| 05/15/2026 18:56:25 - INFO - root - epochs: 1/2, steps: 1500/3710, func: 0.050726, 40%: 1h 32m 23s |
| 05/15/2026 18:58:30 - INFO - root - epochs: 1/2, steps: 1550/3710, func: 0.051331, 41%: 1h 30m 18s |
| 05/15/2026 19:00:36 - INFO - root - epochs: 1/2, steps: 1600/3710, func: 0.050676, 43%: 1h 28m 13s |
| 05/15/2026 19:02:41 - INFO - root - epochs: 1/2, steps: 1650/3710, func: 0.050371, 44%: 1h 26m 8s |
| 05/15/2026 19:04:47 - INFO - root - epochs: 1/2, steps: 1700/3710, func: 0.05151, 45%: 1h 24m 3s |
| 05/15/2026 19:06:46 - INFO - root - epochs: 1/2, steps: 1750/3710, func: 0.051629, 47%: 1h 21m 51s |
| 05/15/2026 19:08:42 - INFO - root - epochs: 1/2, steps: 1800/3710, func: 0.050945, 48%: 1h 19m 36s |
| 05/15/2026 19:10:38 - INFO - root - epochs: 1/2, steps: 1850/3710, func: 0.04991, 49%: 1h 17m 21s |
| 05/15/2026 19:12:36 - INFO - root - epochs: 2/2, steps: 1900/3710, func: 0.049737, 51%: 1h 15m 11s |
| 05/15/2026 19:14:33 - INFO - root - epochs: 2/2, steps: 1950/3710, func: 0.049237, 52%: 1h 12m 59s |
| 05/15/2026 19:16:31 - INFO - root - epochs: 2/2, steps: 2000/3710, func: 0.047785, 53%: 1h 10m 49s |
| 05/15/2026 19:18:29 - INFO - root - epochs: 2/2, steps: 2050/3710, func: 0.049386, 55%: 1h 8m 40s |
| 05/15/2026 19:20:26 - INFO - root - epochs: 2/2, steps: 2100/3710, func: 0.048201, 56%: 1h 6m 30s |
| 05/15/2026 19:22:22 - INFO - root - epochs: 2/2, steps: 2150/3710, func: 0.048058, 57%: 1h 4m 21s |
| 05/15/2026 19:24:19 - INFO - root - epochs: 2/2, steps: 2200/3710, func: 0.048816, 59%: 1h 2m 13s |
| 05/15/2026 19:26:16 - INFO - root - epochs: 2/2, steps: 2250/3710, func: 0.048451, 60%: 1h 0m 5s |
| 05/15/2026 19:28:13 - INFO - root - epochs: 2/2, steps: 2300/3710, func: 0.048546, 61%: 0h 57m 58s |
| 05/15/2026 19:30:10 - INFO - root - epochs: 2/2, steps: 2350/3710, func: 0.048176, 63%: 0h 55m 51s |
| 05/15/2026 19:32:07 - INFO - root - epochs: 2/2, steps: 2400/3710, func: 0.047996, 64%: 0h 53m 44s |
| 05/15/2026 19:34:04 - INFO - root - epochs: 2/2, steps: 2450/3710, func: 0.048384, 66%: 0h 51m 38s |
| 05/15/2026 19:36:02 - INFO - root - epochs: 2/2, steps: 2500/3710, func: 0.04858, 67%: 0h 49m 33s |
| 05/15/2026 19:37:58 - INFO - root - epochs: 2/2, steps: 2550/3710, func: 0.047511, 68%: 0h 47m 27s |
| 05/15/2026 19:39:55 - INFO - root - epochs: 2/2, steps: 2600/3710, func: 0.047964, 70%: 0h 45m 22s |
| 05/15/2026 19:41:51 - INFO - root - epochs: 2/2, steps: 2650/3710, func: 0.048194, 71%: 0h 43m 17s |
| 05/15/2026 19:43:49 - INFO - root - epochs: 2/2, steps: 2700/3710, func: 0.049223, 72%: 0h 41m 13s |
| 05/15/2026 19:45:46 - INFO - root - epochs: 2/2, steps: 2750/3710, func: 0.049112, 74%: 0h 39m 9s |
| 05/15/2026 19:47:43 - INFO - root - epochs: 2/2, steps: 2800/3710, func: 0.049866, 75%: 0h 37m 5s |
| 05/15/2026 19:49:40 - INFO - root - epochs: 2/2, steps: 2850/3710, func: 0.048317, 76%: 0h 35m 1s |
| 05/15/2026 19:51:37 - INFO - root - epochs: 2/2, steps: 2900/3710, func: 0.048544, 78%: 0h 32m 58s |
| 05/15/2026 19:53:33 - INFO - root - epochs: 2/2, steps: 2950/3710, func: 0.048578, 79%: 0h 30m 54s |
| 05/15/2026 19:55:30 - INFO - root - epochs: 2/2, steps: 3000/3710, func: 0.048732, 80%: 0h 28m 51s |
| 05/15/2026 19:57:26 - INFO - root - epochs: 2/2, steps: 3050/3710, func: 0.049371, 82%: 0h 26m 48s |
| 05/15/2026 19:59:21 - INFO - root - epochs: 2/2, steps: 3100/3710, func: 0.048626, 83%: 0h 24m 45s |
| 05/15/2026 20:01:18 - INFO - root - epochs: 2/2, steps: 3150/3710, func: 0.048307, 84%: 0h 22m 43s |
| 05/15/2026 20:03:14 - INFO - root - epochs: 2/2, steps: 3200/3710, func: 0.048949, 86%: 0h 20m 40s |
| 05/15/2026 20:05:11 - INFO - root - epochs: 2/2, steps: 3250/3710, func: 0.049473, 87%: 0h 18m 38s |
| 05/15/2026 20:07:12 - INFO - root - epochs: 2/2, steps: 3300/3710, func: 0.048624, 88%: 0h 16m 37s |
| 05/15/2026 20:09:18 - INFO - root - epochs: 2/2, steps: 3350/3710, func: 0.048781, 90%: 0h 14m 36s |
| 05/15/2026 20:11:23 - INFO - root - epochs: 2/2, steps: 3400/3710, func: 0.04795, 91%: 0h 12m 35s |
| 05/15/2026 20:13:28 - INFO - root - epochs: 2/2, steps: 3450/3710, func: 0.04739, 92%: 0h 10m 34s |
| 05/15/2026 20:15:34 - INFO - root - epochs: 2/2, steps: 3500/3710, func: 0.048725, 94%: 0h 8m 33s |
| 05/15/2026 20:17:39 - INFO - root - epochs: 2/2, steps: 3550/3710, func: 0.048294, 95%: 0h 6m 31s |
| 05/15/2026 20:19:45 - INFO - root - epochs: 2/2, steps: 3600/3710, func: 0.047885, 97%: 0h 4m 30s |
| 05/15/2026 20:21:51 - INFO - root - epochs: 2/2, steps: 3650/3710, func: 0.047021, 98%: 0h 2m 28s |
| 05/15/2026 20:23:56 - INFO - root - epochs: 2/2, steps: 3700/3710, func: 0.048976, 99%: 0h 0m 26s |
| 05/15/2026 20:26:32 - INFO - root - final eval loss: func: 0.050997 |
| 05/15/2026 20:26:32 - INFO - root - Saving model checkpoint to /mnt/scratch/QRM/experiments/SCoDE/trained/qwen2.5-coder-3b_std/checkpoint-last |
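Note on the step count reported in the header: the figures logged above (29691 samples, batch size 1, 16 gradient accumulation steps, 2 epochs, 3710 total optimization steps) are consistent with floor division of the sample count by the effective batch size, per epoch. The sketch below reproduces that arithmetic. The function name `expected_optimizer_steps` is hypothetical, and the assumption that the trailing partial accumulation window is dropped each epoch is inferred from the numbers, not confirmed by the training script.

```python
def expected_optimizer_steps(num_samples, batch_size, grad_acc_steps, num_epochs):
    """Estimate total optimizer steps for a run.

    Assumption (not confirmed by the log): samples that do not fill a
    complete accumulation window at the end of an epoch are dropped.
    """
    effective_batch = batch_size * grad_acc_steps
    steps_per_epoch = num_samples // effective_batch  # floor division
    return steps_per_epoch * num_epochs


# Values taken from the "Running training" header of the log.
total = expected_optimizer_steps(
    num_samples=29691, batch_size=1, grad_acc_steps=16, num_epochs=2
)
print(total)  # 3710, matching "Total optimization steps = 3710"
```

Under the same assumption, one epoch is 1855 steps, which agrees with the epoch counter switching from 1/2 to 2/2 between the entries for steps 1850 and 1900.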