| 05/12/2026 02:58:53 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk). |
| 05/12/2026 02:59:58 - INFO - root - number of sec samples before upsampling: 1810 |
| 05/12/2026 02:59:58 - INFO - root - number of sec samples after upsampling: 2415 |
| 05/12/2026 03:00:06 - INFO - root - Training args Namespace(output_name='deepseek-coder-1.3b-lora-safecoder', datasets=['evol', 'sec-desc', 'sec-new-desc'], pretrain_name='deepseek-coder-1.3b', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=True, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='../data_train_val', model_dir='../trained/', output_dir='../trained/deepseek-coder-1.3b-lora-safecoder', logger=<RootLogger root (INFO)>) |
| 05/12/2026 03:00:06 - INFO - root - ***** Running training ***** |
| 05/12/2026 03:00:06 - INFO - root - Num samples = 30979 |
| 05/12/2026 03:00:06 - INFO - root - Num epoch = 2 |
| 05/12/2026 03:00:06 - INFO - root - Batch size= 1 |
| 05/12/2026 03:00:06 - INFO - root - Total batch size (w. accumulation) = 16 |
| 05/12/2026 03:00:06 - INFO - root - Gradient Accumulation steps = 16 |
| 05/12/2026 03:00:06 - INFO - root - Total optimization steps = 3872 |
| 05/12/2026 03:00:06 - INFO - root - Num val samples = 3371 |
| 05/12/2026 03:00:06 - INFO - root - Num parameters = 1361049952 |
| 05/12/2026 03:00:06 - INFO - root - Num trainable parameters = 15536480 |
| 05/12/2026 03:02:57 - INFO - root - epochs: 1/2, steps: 50/3872, func: 0.058498, pos: 0.063538, neg: 0.177169, 1%: 3h 38m 54s |
| 05/12/2026 03:05:50 - INFO - root - epochs: 1/2, steps: 100/3872, func: 0.055839, pos: 0.074381, neg: 0.136812, 2%: 3h 36m 27s |
| 05/12/2026 03:08:41 - INFO - root - epochs: 1/2, steps: 150/3872, func: 0.054501, pos: 0.079344, neg: 0.138904, 3%: 3h 32m 59s |
| 05/12/2026 03:11:33 - INFO - root - epochs: 1/2, steps: 200/3872, func: 0.054391, pos: 0.069465, neg: 0.120665, 5%: 3h 30m 31s |
| 05/12/2026 03:14:23 - INFO - root - epochs: 1/2, steps: 250/3872, func: 0.055671, pos: 0.080729, neg: 0.104012, 6%: 3h 26m 58s |
| 05/12/2026 03:17:12 - INFO - root - epochs: 1/2, steps: 300/3872, func: 0.053413, pos: 0.065483, neg: 0.067479, 7%: 3h 23m 48s |
| 05/12/2026 03:20:01 - INFO - root - epochs: 1/2, steps: 350/3872, func: 0.055215, pos: 0.075811, neg: 0.051169, 9%: 3h 20m 35s |
| 05/12/2026 03:22:55 - INFO - root - epochs: 1/2, steps: 400/3872, func: 0.053353, pos: 0.079163, neg: 0.046558, 10%: 3h 18m 6s |
| 05/12/2026 03:25:48 - INFO - root - epochs: 1/2, steps: 450/3872, func: 0.05248, pos: 0.080903, neg: 0.050838, 11%: 3h 15m 31s |
| 05/12/2026 03:28:41 - INFO - root - epochs: 1/2, steps: 500/3872, func: 0.052438, pos: 0.07525, neg: 0.06079, 12%: 3h 12m 53s |
| 05/12/2026 03:31:31 - INFO - root - epochs: 1/2, steps: 550/3872, func: 0.052956, pos: 0.064179, neg: 0.037968, 14%: 3h 9m 48s |
| 05/12/2026 03:34:25 - INFO - root - epochs: 1/2, steps: 600/3872, func: 0.052458, pos: 0.085765, neg: 0.048997, 15%: 3h 7m 7s |
| 05/12/2026 03:37:14 - INFO - root - epochs: 1/2, steps: 650/3872, func: 0.051482, pos: 0.065869, neg: 0.041883, 16%: 3h 4m 7s |
| 05/12/2026 03:40:10 - INFO - root - epochs: 1/2, steps: 700/3872, func: 0.053914, pos: 0.080435, neg: 0.027877, 18%: 3h 1m 40s |
| 05/12/2026 03:43:01 - INFO - root - epochs: 1/2, steps: 750/3872, func: 0.053491, pos: 0.072454, neg: 0.035127, 19%: 2h 58m 42s |
| 05/12/2026 03:45:50 - INFO - root - epochs: 1/2, steps: 800/3872, func: 0.053087, pos: 0.069237, neg: 0.031153, 20%: 2h 55m 39s |
| 05/12/2026 03:48:40 - INFO - root - epochs: 1/2, steps: 850/3872, func: 0.052351, pos: 0.0773, neg: 0.033514, 21%: 2h 52m 47s |
| 05/12/2026 03:51:31 - INFO - root - epochs: 1/2, steps: 900/3872, func: 0.052265, pos: 0.089922, neg: 0.029248, 23%: 2h 49m 53s |
| 05/12/2026 03:54:22 - INFO - root - epochs: 1/2, steps: 950/3872, func: 0.052332, pos: 0.066269, neg: 0.033777, 24%: 2h 46m 57s |
| 05/12/2026 03:57:13 - INFO - root - epochs: 1/2, steps: 1000/3872, func: 0.052323, pos: 0.056476, neg: 0.039315, 25%: 2h 44m 6s |
| 05/12/2026 04:00:04 - INFO - root - epochs: 1/2, steps: 1050/3872, func: 0.052345, pos: 0.079464, neg: 0.029279, 27%: 2h 41m 14s |
| 05/12/2026 04:03:00 - INFO - root - epochs: 1/2, steps: 1100/3872, func: 0.0532, pos: 0.075645, neg: 0.028954, 28%: 2h 38m 34s |
| 05/12/2026 04:05:52 - INFO - root - epochs: 1/2, steps: 1150/3872, func: 0.051499, pos: 0.053088, neg: 0.037814, 29%: 2h 35m 44s |
| 05/12/2026 04:08:43 - INFO - root - epochs: 1/2, steps: 1200/3872, func: 0.051378, pos: 0.066166, neg: 0.03289, 30%: 2h 32m 52s |
| 05/12/2026 04:11:36 - INFO - root - epochs: 1/2, steps: 1250/3872, func: 0.052493, pos: 0.075589, neg: 0.030921, 32%: 2h 30m 1s |
| 05/12/2026 04:14:26 - INFO - root - epochs: 1/2, steps: 1300/3872, func: 0.051639, pos: 0.062583, neg: 0.037883, 33%: 2h 27m 8s |
| 05/12/2026 04:17:13 - INFO - root - epochs: 1/2, steps: 1350/3872, func: 0.050644, pos: 0.065618, neg: 0.034679, 34%: 2h 24m 7s |
| 05/12/2026 04:20:07 - INFO - root - epochs: 1/2, steps: 1400/3872, func: 0.052182, pos: 0.058463, neg: 0.035739, 36%: 2h 21m 21s |
| 05/12/2026 04:22:59 - INFO - root - epochs: 1/2, steps: 1450/3872, func: 0.052085, pos: 0.059989, neg: 0.034663, 37%: 2h 18m 29s |
| 05/12/2026 04:25:54 - INFO - root - epochs: 1/2, steps: 1500/3872, func: 0.052605, pos: 0.067227, neg: 0.030407, 38%: 2h 15m 44s |
| 05/12/2026 04:28:46 - INFO - root - epochs: 1/2, steps: 1550/3872, func: 0.05215, pos: 0.053883, neg: 0.039369, 40%: 2h 12m 53s |
| 05/12/2026 04:31:37 - INFO - root - epochs: 1/2, steps: 1600/3872, func: 0.053258, pos: 0.069293, neg: 0.033501, 41%: 2h 10m 1s |
| 05/12/2026 04:34:29 - INFO - root - epochs: 1/2, steps: 1650/3872, func: 0.051802, pos: 0.098566, neg: 0.023335, 42%: 2h 7m 10s |
| 05/12/2026 04:37:23 - INFO - root - epochs: 1/2, steps: 1700/3872, func: 0.053117, pos: 0.071336, neg: 0.027185, 43%: 2h 4m 21s |
| 05/12/2026 04:40:13 - INFO - root - epochs: 1/2, steps: 1750/3872, func: 0.0523, pos: 0.059277, neg: 0.023508, 45%: 2h 1m 28s |
| 05/12/2026 04:43:06 - INFO - root - epochs: 1/2, steps: 1800/3872, func: 0.051384, pos: 0.0708, neg: 0.025847, 46%: 1h 58m 37s |
| 05/12/2026 04:45:56 - INFO - root - epochs: 1/2, steps: 1850/3872, func: 0.051056, pos: 0.058926, neg: 0.023186, 47%: 1h 55m 43s |
| 05/12/2026 04:48:43 - INFO - root - epochs: 1/2, steps: 1900/3872, func: 0.051681, pos: 0.052175, neg: 0.026962, 49%: 1h 52m 47s |
| 05/12/2026 04:51:35 - INFO - root - epochs: 2/2, steps: 1950/3872, func: 0.051019, pos: 0.057938, neg: 0.028023, 50%: 1h 49m 55s |
| 05/12/2026 04:54:29 - INFO - root - epochs: 2/2, steps: 2000/3872, func: 0.051679, pos: 0.063034, neg: 0.015457, 51%: 1h 47m 7s |
| 05/12/2026 04:57:21 - INFO - root - epochs: 2/2, steps: 2050/3872, func: 0.052008, pos: 0.057971, neg: 0.025081, 52%: 1h 44m 15s |
| 05/12/2026 05:00:15 - INFO - root - epochs: 2/2, steps: 2100/3872, func: 0.050413, pos: 0.057375, neg: 0.020022, 54%: 1h 41m 26s |
| 05/12/2026 05:03:05 - INFO - root - epochs: 2/2, steps: 2150/3872, func: 0.051819, pos: 0.061493, neg: 0.02545, 55%: 1h 38m 33s |
| 05/12/2026 05:05:58 - INFO - root - epochs: 2/2, steps: 2200/3872, func: 0.051764, pos: 0.05972, neg: 0.026174, 56%: 1h 35m 42s |
| 05/12/2026 05:08:44 - INFO - root - epochs: 2/2, steps: 2250/3872, func: 0.050719, pos: 0.053116, neg: 0.01747, 58%: 1h 32m 47s |
| 05/12/2026 05:11:36 - INFO - root - epochs: 2/2, steps: 2300/3872, func: 0.050605, pos: 0.036939, neg: 0.015887, 59%: 1h 29m 56s |
| 05/12/2026 05:14:28 - INFO - root - epochs: 2/2, steps: 2350/3872, func: 0.051366, pos: 0.044932, neg: 0.028288, 60%: 1h 27m 4s |
| 05/12/2026 05:17:17 - INFO - root - epochs: 2/2, steps: 2400/3872, func: 0.05086, pos: 0.056528, neg: 0.019473, 61%: 1h 24m 12s |
| 05/12/2026 05:20:13 - INFO - root - epochs: 2/2, steps: 2450/3872, func: 0.050416, pos: 0.056659, neg: 0.026592, 63%: 1h 21m 23s |
| 05/12/2026 05:23:05 - INFO - root - epochs: 2/2, steps: 2500/3872, func: 0.051684, pos: 0.070944, neg: 0.023598, 64%: 1h 18m 31s |
| 05/12/2026 05:25:55 - INFO - root - epochs: 2/2, steps: 2550/3872, func: 0.051356, pos: 0.054086, neg: 0.021948, 65%: 1h 15m 39s |
| 05/12/2026 05:28:47 - INFO - root - epochs: 2/2, steps: 2600/3872, func: 0.050023, pos: 0.045485, neg: 0.024609, 67%: 1h 12m 48s |
| 05/12/2026 05:31:37 - INFO - root - epochs: 2/2, steps: 2650/3872, func: 0.05107, pos: 0.038352, neg: 0.025448, 68%: 1h 9m 55s |
| 05/12/2026 05:34:29 - INFO - root - epochs: 2/2, steps: 2700/3872, func: 0.05065, pos: 0.076865, neg: 0.02363, 69%: 1h 7m 4s |
| 05/12/2026 05:37:23 - INFO - root - epochs: 2/2, steps: 2750/3872, func: 0.051486, pos: 0.040869, neg: 0.013997, 70%: 1h 4m 13s |
| 05/12/2026 05:40:16 - INFO - root - epochs: 2/2, steps: 2800/3872, func: 0.051778, pos: 0.053576, neg: 0.019081, 72%: 1h 1m 22s |
| 05/12/2026 05:43:09 - INFO - root - epochs: 2/2, steps: 2850/3872, func: 0.050467, pos: 0.064041, neg: 0.024908, 73%: 0h 58m 31s |
| 05/12/2026 05:46:02 - INFO - root - epochs: 2/2, steps: 2900/3872, func: 0.050543, pos: 0.053341, neg: 0.026226, 74%: 0h 55m 40s |
| 05/12/2026 05:48:58 - INFO - root - epochs: 2/2, steps: 2950/3872, func: 0.051039, pos: 0.054438, neg: 0.026954, 76%: 0h 52m 50s |
| 05/12/2026 05:51:48 - INFO - root - epochs: 2/2, steps: 3000/3872, func: 0.050754, pos: 0.049894, neg: 0.018876, 77%: 0h 49m 57s |
| 05/12/2026 05:54:38 - INFO - root - epochs: 2/2, steps: 3050/3872, func: 0.050262, pos: 0.067919, neg: 0.024587, 78%: 0h 47m 5s |
| 05/12/2026 05:57:28 - INFO - root - epochs: 2/2, steps: 3100/3872, func: 0.050994, pos: 0.065647, neg: 0.015296, 80%: 0h 44m 13s |
| 05/12/2026 06:00:19 - INFO - root - epochs: 2/2, steps: 3150/3872, func: 0.051998, pos: 0.0547, neg: 0.019967, 81%: 0h 41m 21s |
| 05/12/2026 06:03:10 - INFO - root - epochs: 2/2, steps: 3200/3872, func: 0.050904, pos: 0.051526, neg: 0.014749, 82%: 0h 38m 30s |
| 05/12/2026 06:05:58 - INFO - root - epochs: 2/2, steps: 3250/3872, func: 0.051783, pos: 0.051603, neg: 0.020233, 83%: 0h 35m 37s |
| 05/12/2026 06:08:50 - INFO - root - epochs: 2/2, steps: 3300/3872, func: 0.052165, pos: 0.068071, neg: 0.023161, 85%: 0h 32m 46s |
| 05/12/2026 06:11:41 - INFO - root - epochs: 2/2, steps: 3350/3872, func: 0.050292, pos: 0.044843, neg: 0.018546, 86%: 0h 29m 54s |
| 05/12/2026 06:14:31 - INFO - root - epochs: 2/2, steps: 3400/3872, func: 0.050373, pos: 0.035594, neg: 0.023604, 87%: 0h 27m 2s |
| 05/12/2026 06:17:19 - INFO - root - epochs: 2/2, steps: 3450/3872, func: 0.051031, pos: 0.051028, neg: 0.022269, 89%: 0h 24m 10s |
| 05/12/2026 06:20:10 - INFO - root - epochs: 2/2, steps: 3500/3872, func: 0.050252, pos: 0.049759, neg: 0.018131, 90%: 0h 21m 19s |
| 05/12/2026 06:23:02 - INFO - root - epochs: 2/2, steps: 3550/3872, func: 0.05084, pos: 0.039536, neg: 0.027638, 91%: 0h 18m 27s |
| 05/12/2026 06:25:50 - INFO - root - epochs: 2/2, steps: 3600/3872, func: 0.05144, pos: 0.048153, neg: 0.024605, 92%: 0h 15m 36s |
| 05/12/2026 06:28:41 - INFO - root - epochs: 2/2, steps: 3650/3872, func: 0.05042, pos: 0.045741, neg: 0.017756, 94%: 0h 12m 44s |
| 05/12/2026 06:31:33 - INFO - root - epochs: 2/2, steps: 3700/3872, func: 0.051448, pos: 0.036074, neg: 0.013317, 95%: 0h 9m 53s |
| 05/12/2026 06:34:25 - INFO - root - epochs: 2/2, steps: 3750/3872, func: 0.050891, pos: 0.063433, neg: 0.028583, 96%: 0h 7m 1s |
| 05/12/2026 06:37:18 - INFO - root - epochs: 2/2, steps: 3800/3872, func: 0.051766, pos: 0.056375, neg: 0.030348, 98%: 0h 4m 10s |
| 05/12/2026 06:40:11 - INFO - root - epochs: 2/2, steps: 3850/3872, func: 0.051351, pos: 0.049689, neg: 0.018709, 99%: 0h 1m 18s |
| 05/12/2026 06:47:02 - INFO - root - final eval loss: func: 0.051307, pos: 0.062347, neg: 0.029495 |
| 05/12/2026 06:47:02 - INFO - root - Saving model checkpoint to ../trained/deepseek-coder-1.3b-lora-safecoder/checkpoint-last |
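The progress lines above share a fixed layout, so the loss values can be pulled out for plotting or comparison. The following is a minimal sketch, not part of the training code: the names `parse_progress` and `expected_steps` are hypothetical, and the step formula is an assumption (leftover samples that do not fill a full accumulation window are dropped each epoch) that happens to reproduce the logged total of 3872 from 30979 samples, batch size 1, 16 accumulation steps, and 2 epochs.

```python
import re

# Matches progress lines such as:
# "epochs: 1/2, steps: 50/3872, func: 0.058498, pos: 0.063538, neg: 0.177169, 1%: ..."
PROGRESS_RE = re.compile(
    r"epochs: (\d+)/(\d+), steps: (\d+)/(\d+), "
    r"func: ([\d.]+), pos: ([\d.]+), neg: ([\d.]+)"
)


def parse_progress(line):
    """Return the fields of one progress line as a dict, or None if it is not one."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    epoch, num_epochs, step, total_steps = (int(g) for g in m.groups()[:4])
    func, pos, neg = (float(g) for g in m.groups()[4:])
    return {
        "epoch": epoch,
        "num_epochs": num_epochs,
        "step": step,
        "total_steps": total_steps,
        "func": func,
        "pos": pos,
        "neg": neg,
    }


def expected_steps(num_samples, batch_size, grad_acc_steps, num_epochs):
    """Assumed formula: incomplete accumulation windows are dropped per epoch."""
    return (num_samples // (batch_size * grad_acc_steps)) * num_epochs


if __name__ == "__main__":
    sample = (
        "| 05/12/2026 03:02:57 - INFO - root - epochs: 1/2, steps: 50/3872, "
        "func: 0.058498, pos: 0.063538, neg: 0.177169, 1%: 3h 38m 54s |"
    )
    print(parse_progress(sample))
    print(expected_steps(30979, 1, 16, 2))
```

Lines that are not progress lines, such as the `final eval loss` line, return `None` and can be filtered out when iterating over the whole log.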