| 05/12/2026 04:26:58 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk). |
| 05/12/2026 04:28:00 - INFO - root - number of sec samples before upsampling: 1810 |
| 05/12/2026 04:28:00 - INFO - root - number of sec samples after upsampling: 2415 |
| 05/12/2026 04:28:08 - INFO - root - Training args Namespace(output_name='deepseek-coder-6.7b-lora-safecoder', datasets=['evol', 'sec-desc', 'sec-new-desc'], pretrain_name='deepseek-coder-6.7b', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=True, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='../data_train_val', model_dir='../trained/', output_dir='../trained/deepseek-coder-6.7b-lora-safecoder', logger=<RootLogger root (INFO)>) |
| 05/12/2026 04:28:08 - INFO - root - ***** Running training ***** |
| 05/12/2026 04:28:08 - INFO - root - Num samples = 30979 |
| 05/12/2026 04:28:08 - INFO - root - Num epoch = 2 |
| 05/12/2026 04:28:08 - INFO - root - Batch size= 1 |
| 05/12/2026 04:28:08 - INFO - root - Total batch size (w. accumulation) = 16 |
| 05/12/2026 04:28:08 - INFO - root - Gradient Accumulation steps = 16 |
| 05/12/2026 04:28:08 - INFO - root - Total optimization steps = 3872 |
| 05/12/2026 04:28:08 - INFO - root - Num val samples = 3371 |
| 05/12/2026 04:28:08 - INFO - root - Num parameters = 6779150688 |
| 05/12/2026 04:28:08 - INFO - root - Num trainable parameters = 40554848 |
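The summary figures in the "Running training" block can be checked against each other with a little arithmetic. The sketch below assumes the script floors the number of optimizer updates per epoch. This is inferred from the logged numbers, not taken from the script's source. It uses only values printed in the log.

```python
# Values copied from the "Running training" block of the log.
num_samples = 30979
num_epochs = 2
batch_size = 1
grad_acc_steps = 16
num_params = 6779150688
num_trainable = 40554848

# Samples consumed per optimizer update (per-device batch x accumulation steps).
total_batch = batch_size * grad_acc_steps

# Assumption: updates per epoch are floored. Flooring gives 1936 per epoch;
# a ceiling would give 1937 x 2 = 3874, so the logged 3872 matches flooring.
steps_per_epoch = num_samples // total_batch
total_steps = steps_per_epoch * num_epochs

# Share of the reported parameter count that is trainable (LoRA adapters).
trainable_pct = 100 * num_trainable / num_params

print(total_batch, steps_per_epoch, total_steps, f"{trainable_pct:.2f}%")
```

This prints `16 1936 3872 0.60%`. The effective batch of 16 matches "Total batch size (w. accumulation)", and 3872 matches "Total optimization steps". With LoRA (`r=16`, `lora_alpha=32`), well under 1% of the weights are updated.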
| 05/12/2026 04:32:09 - INFO - root - epochs: 1/2, steps: 50/3872, func: 0.051139, pos: 0.0577, neg: 0.181555, 1%: 5h 8m 19s |
| 05/12/2026 04:36:12 - INFO - root - epochs: 1/2, steps: 100/3872, func: 0.048759, pos: 0.062282, neg: 0.140178, 2%: 5h 4m 23s |
| 05/12/2026 04:40:09 - INFO - root - epochs: 1/2, steps: 150/3872, func: 0.047331, pos: 0.077192, neg: 0.118673, 3%: 4h 58m 31s |
| 05/12/2026 04:44:10 - INFO - root - epochs: 1/2, steps: 200/3872, func: 0.047128, pos: 0.066722, neg: 0.09821, 5%: 4h 54m 36s |
| 05/12/2026 04:48:07 - INFO - root - epochs: 1/2, steps: 250/3872, func: 0.048349, pos: 0.090577, neg: 0.056817, 6%: 4h 49m 36s |
| 05/12/2026 04:52:04 - INFO - root - epochs: 1/2, steps: 300/3872, func: 0.046514, pos: 0.078, neg: 0.04461, 7%: 4h 45m 13s |
| 05/12/2026 04:56:03 - INFO - root - epochs: 1/2, steps: 350/3872, func: 0.048205, pos: 0.065144, neg: 0.056317, 9%: 4h 41m 5s |
| 05/12/2026 05:00:05 - INFO - root - epochs: 1/2, steps: 400/3872, func: 0.046193, pos: 0.075261, neg: 0.040788, 10%: 4h 37m 26s |
| 05/12/2026 05:04:07 - INFO - root - epochs: 1/2, steps: 450/3872, func: 0.045684, pos: 0.077037, neg: 0.050153, 11%: 4h 33m 47s |
| 05/12/2026 05:08:11 - INFO - root - epochs: 1/2, steps: 500/3872, func: 0.045614, pos: 0.072255, neg: 0.053337, 12%: 4h 30m 17s |
| 05/12/2026 05:12:10 - INFO - root - epochs: 1/2, steps: 550/3872, func: 0.046064, pos: 0.060699, neg: 0.041991, 14%: 4h 26m 7s |
| 05/12/2026 05:16:15 - INFO - root - epochs: 1/2, steps: 600/3872, func: 0.045751, pos: 0.081914, neg: 0.042419, 15%: 4h 22m 25s |
| 05/12/2026 05:20:14 - INFO - root - epochs: 1/2, steps: 650/3872, func: 0.044901, pos: 0.06346, neg: 0.040288, 16%: 4h 18m 24s |
| 05/12/2026 05:24:22 - INFO - root - epochs: 1/2, steps: 700/3872, func: 0.046995, pos: 0.089755, neg: 0.028394, 18%: 4h 14m 57s |
| 05/12/2026 05:28:23 - INFO - root - epochs: 1/2, steps: 750/3872, func: 0.046683, pos: 0.067772, neg: 0.032075, 19%: 4h 10m 52s |
| 05/12/2026 05:32:19 - INFO - root - epochs: 1/2, steps: 800/3872, func: 0.046182, pos: 0.066598, neg: 0.032235, 20%: 4h 6m 31s |
| 05/12/2026 05:36:21 - INFO - root - epochs: 1/2, steps: 850/3872, func: 0.045583, pos: 0.068573, neg: 0.031431, 21%: 4h 2m 38s |
| 05/12/2026 05:40:20 - INFO - root - epochs: 1/2, steps: 900/3872, func: 0.045578, pos: 0.08372, neg: 0.030609, 23%: 3h 58m 33s |
| 05/12/2026 05:44:20 - INFO - root - epochs: 1/2, steps: 950/3872, func: 0.045548, pos: 0.060191, neg: 0.033258, 24%: 3h 54m 28s |
| 05/12/2026 05:48:23 - INFO - root - epochs: 1/2, steps: 1000/3872, func: 0.045722, pos: 0.052546, neg: 0.031617, 25%: 3h 50m 35s |
| 05/12/2026 05:52:24 - INFO - root - epochs: 1/2, steps: 1050/3872, func: 0.045856, pos: 0.071707, neg: 0.02853, 27%: 3h 46m 33s |
| 05/12/2026 05:56:28 - INFO - root - epochs: 1/2, steps: 1100/3872, func: 0.046406, pos: 0.07153, neg: 0.033045, 28%: 3h 42m 41s |
| 05/12/2026 06:00:33 - INFO - root - epochs: 1/2, steps: 1150/3872, func: 0.045131, pos: 0.050152, neg: 0.035439, 29%: 3h 38m 49s |
| 05/12/2026 06:04:33 - INFO - root - epochs: 1/2, steps: 1200/3872, func: 0.044958, pos: 0.056376, neg: 0.03231, 30%: 3h 34m 47s |
| 05/12/2026 06:08:35 - INFO - root - epochs: 1/2, steps: 1250/3872, func: 0.045854, pos: 0.069139, neg: 0.031689, 32%: 3h 30m 46s |
| 05/12/2026 06:12:35 - INFO - root - epochs: 1/2, steps: 1300/3872, func: 0.045146, pos: 0.059226, neg: 0.032273, 33%: 3h 26m 45s |
| 05/12/2026 06:16:28 - INFO - root - epochs: 1/2, steps: 1350/3872, func: 0.044379, pos: 0.064239, neg: 0.033579, 34%: 3h 22m 28s |
| 05/12/2026 06:20:35 - INFO - root - epochs: 1/2, steps: 1400/3872, func: 0.045807, pos: 0.054144, neg: 0.032231, 36%: 3h 18m 39s |
| 05/12/2026 06:24:37 - INFO - root - epochs: 1/2, steps: 1450/3872, func: 0.045381, pos: 0.050893, neg: 0.030635, 37%: 3h 14m 39s |
| 05/12/2026 06:28:45 - INFO - root - epochs: 1/2, steps: 1500/3872, func: 0.045888, pos: 0.062029, neg: 0.028009, 38%: 3h 10m 48s |
| 05/12/2026 06:32:47 - INFO - root - epochs: 1/2, steps: 1550/3872, func: 0.045597, pos: 0.04823, neg: 0.033559, 40%: 3h 6m 49s |
| 05/12/2026 06:36:47 - INFO - root - epochs: 1/2, steps: 1600/3872, func: 0.046528, pos: 0.060938, neg: 0.023728, 41%: 3h 2m 46s |
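The progress lines follow a fixed shape: epoch, step, three loss fields (`func`, `pos`, `neg`), a percentage, and a remaining-time estimate. The sketch below parses one of these lines and re-derives the percentage and the time estimate. Its regex was written against the lines shown here, not against any documented format of the training script.

It also assumes that the script's estimate is a linear extrapolation: average seconds per step so far, multiplied by the steps remaining. That assumption is hypothetical. It is consistent with the log, however:

* The percentages appear to be truncated. For example, 150/3872 is logged as 3%, not 4%.
* The start time is taken from the 04:28:08 "Running training" timestamp.

```python
import re
from datetime import datetime

TS_FMT = "%m/%d/%Y %H:%M:%S"

# Hypothetical pattern matching the progress lines as they appear in this log.
PROGRESS_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - INFO - root - "
    r"epochs: (?P<epoch>\d+)/(?P<num_epochs>\d+), "
    r"steps: (?P<step>\d+)/(?P<total>\d+), "
    r"func: (?P<func>[\d.]+), pos: (?P<pos>[\d.]+), neg: (?P<neg>[\d.]+), "
    r"(?P<pct>\d+)%: (?P<eta>.+)$"
)

def parse_progress(line):
    """Return a dict for a progress line, or None for any other log line."""
    # Tolerate the "| ... |" wrapping seen in this capture.
    match = PROGRESS_RE.match(line.strip(" |\n"))
    if match is None:
        return None
    d = match.groupdict()
    return {
        "time": datetime.strptime(d["ts"], TS_FMT),
        "step": int(d["step"]),
        "total": int(d["total"]),
        "func": float(d["func"]),
        "pos": float(d["pos"]),
        "neg": float(d["neg"]),
        "pct": int(d["pct"]),
        "eta": d["eta"],
    }

def linear_eta_seconds(start, rec):
    # Assumption: remaining time = mean seconds per step so far * steps left.
    elapsed = (rec["time"] - start).total_seconds()
    return elapsed / rec["step"] * (rec["total"] - rec["step"])

start = datetime.strptime("05/12/2026 04:28:08", TS_FMT)
line = ("| 05/12/2026 06:36:47 - INFO - root - epochs: 1/2, steps: 1600/3872, "
        "func: 0.046528, pos: 0.060938, neg: 0.023728, 41%: 3h 2m 46s |")
rec = parse_progress(line)
eta = int(linear_eta_seconds(start, rec))
hours, rem = divmod(eta, 3600)
mins, secs = divmod(rem, 60)
# Logged step, logged percent, truncated percent, re-derived remaining time.
print(rec["step"], rec["pct"], 100 * rec["step"] // rec["total"],
      f"{hours}h {mins}m {secs}s")
```

This prints `1600 41 41 3h 2m 40s`. That is within a few seconds of the logged `3h 2m 46s`, or roughly 4.8 s per optimizer step. The same parser can collect the `func`, `pos`, and `neg` columns across all lines for plotting.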