05/12/2026 02:48:32 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
05/12/2026 02:48:53 - INFO - root - number of sec samples before upsampling: 1809
05/12/2026 02:48:53 - INFO - root - number of sec samples after upsampling: 3344
05/12/2026 02:48:56 - INFO - root - Training args Namespace(output_name='phi-2-lora-safecoder', datasets=['lmsys', 'sec-desc', 'sec-new-desc'], pretrain_name='phi-2', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=True, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=40, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='../data_train_val', model_dir='../trained/', output_dir='../trained/phi-2-lora-safecoder', logger=<RootLogger root (INFO)>)
05/12/2026 02:48:56 - INFO - root - ***** Running training *****
05/12/2026 02:48:56 - INFO - root - Num samples = 19626
05/12/2026 02:48:56 - INFO - root - Num epoch = 2
05/12/2026 02:48:56 - INFO - root - Batch size= 1
05/12/2026 02:48:56 - INFO - root - Total batch size (w. accumulation) = 16
05/12/2026 02:48:56 - INFO - root - Gradient Accumulation steps = 16
05/12/2026 02:48:56 - INFO - root - Total optimization steps = 2452
05/12/2026 02:48:56 - INFO - root - Num val samples = 2008
05/12/2026 02:48:56 - INFO - root - Num parameters = 2799487975
05/12/2026 02:48:56 - INFO - root - Num trainable parameters = 24438640
05/12/2026 02:52:58 - INFO - root - epochs: 1/2, steps: 50/2452, func: 0.087809, pos: 0.115657, neg: 0.123793, 1%: 3h 14m 29s
05/12/2026 02:56:56 - INFO - root - epochs: 1/2, steps: 100/2452, func: 0.078225, pos: 0.093975, neg: 0.098627, 4%: 3h 8m 16s
05/12/2026 03:00:50 - INFO - root - epochs: 1/2, steps: 150/2452, func: 0.073382, pos: 0.100339, neg: 0.069674, 6%: 3h 2m 54s
05/12/2026 03:04:47 - INFO - root - epochs: 1/2, steps: 200/2452, func: 0.072408, pos: 0.092015, neg: 0.061007, 8%: 2h 58m 33s
05/12/2026 03:08:39 - INFO - root - epochs: 1/2, steps: 250/2452, func: 0.069396, pos: 0.097156, neg: 0.060685, 10%: 2h 53m 41s
05/12/2026 03:12:32 - INFO - root - epochs: 1/2, steps: 300/2452, func: 0.070725, pos: 0.099053, neg: 0.050802, 12%: 2h 49m 18s
05/12/2026 03:16:21 - INFO - root - epochs: 1/2, steps: 350/2452, func: 0.069297, pos: 0.092138, neg: 0.050556, 14%: 2h 44m 48s
05/12/2026 03:20:14 - INFO - root - epochs: 1/2, steps: 400/2452, func: 0.069739, pos: 0.093173, neg: 0.051642, 16%: 2h 40m 37s
05/12/2026 03:24:01 - INFO - root - epochs: 1/2, steps: 450/2452, func: 0.06755, pos: 0.090788, neg: 0.049833, 18%: 2h 36m 11s
05/12/2026 03:27:55 - INFO - root - epochs: 1/2, steps: 500/2452, func: 0.067713, pos: 0.092051, neg: 0.044417, 20%: 2h 32m 16s
05/12/2026 03:31:38 - INFO - root - epochs: 1/2, steps: 550/2452, func: 0.067988, pos: 0.075132, neg: 0.03882, 22%: 2h 27m 45s
05/12/2026 03:35:33 - INFO - root - epochs: 1/2, steps: 600/2452, func: 0.064584, pos: 0.083783, neg: 0.03723, 24%: 2h 23m 58s
05/12/2026 03:39:25 - INFO - root - epochs: 1/2, steps: 650/2452, func: 0.066767, pos: 0.08717, neg: 0.029386, 26%: 2h 20m 1s
05/12/2026 03:43:18 - INFO - root - epochs: 1/2, steps: 700/2452, func: 0.068934, pos: 0.107346, neg: 0.033175, 28%: 2h 16m 10s
05/12/2026 03:47:08 - INFO - root - epochs: 1/2, steps: 750/2452, func: 0.066618, pos: 0.081592, neg: 0.033057, 30%: 2h 12m 10s
05/12/2026 03:50:56 - INFO - root - epochs: 1/2, steps: 800/2452, func: 0.064837, pos: 0.086676, neg: 0.028554, 32%: 2h 8m 5s
05/12/2026 03:54:42 - INFO - root - epochs: 1/2, steps: 850/2452, func: 0.064665, pos: 0.082752, neg: 0.035432, 34%: 2h 4m 2s
05/12/2026 03:58:34 - INFO - root - epochs: 1/2, steps: 900/2452, func: 0.064953, pos: 0.079248, neg: 0.030881, 36%: 2h 0m 8s
05/12/2026 04:02:29 - INFO - root - epochs: 1/2, steps: 950/2452, func: 0.065705, pos: 0.075704, neg: 0.02721, 38%: 1h 56m 21s
05/12/2026 04:06:28 - INFO - root - epochs: 1/2, steps: 1000/2452, func: 0.06857, pos: 0.082659, neg: 0.033367, 40%: 1h 52m 39s
05/12/2026 04:10:28 - INFO - root - epochs: 1/2, steps: 1050/2452, func: 0.066857, pos: 0.071119, neg: 0.032058, 42%: 1h 48m 57s
05/12/2026 04:14:21 - INFO - root - epochs: 1/2, steps: 1100/2452, func: 0.065121, pos: 0.072074, neg: 0.034161, 44%: 1h 45m 4s
05/12/2026 04:18:13 - INFO - root - epochs: 1/2, steps: 1150/2452, func: 0.067815, pos: 0.064138, neg: 0.02986, 46%: 1h 41m 10s
05/12/2026 04:22:04 - INFO - root - epochs: 1/2, steps: 1200/2452, func: 0.065764, pos: 0.064906, neg: 0.028599, 48%: 1h 37m 14s
05/12/2026 04:26:05 - INFO - root - epochs: 2/2, steps: 1250/2452, func: 0.064512, pos: 0.077691, neg: 0.032867, 50%: 1h 33m 30s
05/12/2026 04:29:58 - INFO - root - epochs: 2/2, steps: 1300/2452, func: 0.064248, pos: 0.082205, neg: 0.026269, 52%: 1h 29m 36s
05/12/2026 04:33:54 - INFO - root - epochs: 2/2, steps: 1350/2452, func: 0.066843, pos: 0.063788, neg: 0.029562, 55%: 1h 25m 45s
05/12/2026 04:37:47 - INFO - root - epochs: 2/2, steps: 1400/2452, func: 0.061568, pos: 0.058695, neg: 0.032394, 57%: 1h 21m 52s
05/12/2026 04:41:41 - INFO - root - epochs: 2/2, steps: 1450/2452, func: 0.061593, pos: 0.070217, neg: 0.023268, 59%: 1h 17m 58s
05/12/2026 04:45:39 - INFO - root - epochs: 2/2, steps: 1500/2452, func: 0.063944, pos: 0.072002, neg: 0.030755, 61%: 1h 14m 9s
05/12/2026 04:49:32 - INFO - root - epochs: 2/2, steps: 1550/2452, func: 0.063425, pos: 0.050255, neg: 0.027983, 63%: 1h 10m 15s
05/12/2026 04:53:25 - INFO - root - epochs: 2/2, steps: 1600/2452, func: 0.062742, pos: 0.073932, neg: 0.022697, 65%: 1h 6m 21s
05/12/2026 04:57:15 - INFO - root - epochs: 2/2, steps: 1650/2452, func: 0.063407, pos: 0.065383, neg: 0.029697, 67%: 1h 2m 27s
05/12/2026 05:01:06 - INFO - root - epochs: 2/2, steps: 1700/2452, func: 0.063459, pos: 0.065671, neg: 0.020674, 69%: 0h 58m 33s
05/12/2026 05:04:58 - INFO - root - epochs: 2/2, steps: 1750/2452, func: 0.064214, pos: 0.058392, neg: 0.021934, 71%: 0h 54m 38s
05/12/2026 05:08:42 - INFO - root - epochs: 2/2, steps: 1800/2452, func: 0.064039, pos: 0.076992, neg: 0.02203, 73%: 0h 50m 42s
05/12/2026 05:12:39 - INFO - root - epochs: 2/2, steps: 1850/2452, func: 0.063196, pos: 0.049055, neg: 0.026583, 75%: 0h 46m 50s
05/12/2026 05:16:32 - INFO - root - epochs: 2/2, steps: 1900/2452, func: 0.063654, pos: 0.059067, neg: 0.024511, 77%: 0h 42m 57s
05/12/2026 05:20:23 - INFO - root - epochs: 2/2, steps: 1950/2452, func: 0.063591, pos: 0.066934, neg: 0.022828, 79%: 0h 39m 4s
05/12/2026 05:24:17 - INFO - root - epochs: 2/2, steps: 2000/2452, func: 0.063182, pos: 0.078367, neg: 0.028742, 81%: 0h 35m 11s
05/12/2026 05:28:02 - INFO - root - epochs: 2/2, steps: 2050/2452, func: 0.061853, pos: 0.071412, neg: 0.026568, 83%: 0h 31m 16s
05/12/2026 05:31:53 - INFO - root - epochs: 2/2, steps: 2100/2452, func: 0.062466, pos: 0.064775, neg: 0.03288, 85%: 0h 27m 23s
05/12/2026 05:35:51 - INFO - root - epochs: 2/2, steps: 2150/2452, func: 0.066738, pos: 0.050037, neg: 0.019877, 87%: 0h 23m 31s
05/12/2026 05:39:47 - INFO - root - epochs: 2/2, steps: 2200/2452, func: 0.064148, pos: 0.061206, neg: 0.027983, 89%: 0h 19m 38s
05/12/2026 05:43:39 - INFO - root - epochs: 2/2, steps: 2250/2452, func: 0.064126, pos: 0.06231, neg: 0.027208, 91%: 0h 15m 45s
05/12/2026 05:47:34 - INFO - root - epochs: 2/2, steps: 2300/2452, func: 0.062785, pos: 0.081734, neg: 0.021968, 93%: 0h 11m 53s
05/12/2026 05:51:27 - INFO - root - epochs: 2/2, steps: 2350/2452, func: 0.063474, pos: 0.065291, neg: 0.030004, 95%: 0h 8m 0s
05/12/2026 05:55:28 - INFO - root - epochs: 2/2, steps: 2400/2452, func: 0.064366, pos: 0.063885, neg: 0.02885, 97%: 0h 4m 7s
05/12/2026 05:59:30 - INFO - root - epochs: 2/2, steps: 2450/2452, func: 0.065632, pos: 0.059616, neg: 0.02787, 99%: 0h 0m 14s
05/12/2026 06:04:10 - INFO - root - final eval loss: func: 0.062987, pos: 0.079249, neg: 0.036373
05/12/2026 06:04:10 - INFO - root - Saving model checkpoint to ../trained/phi-2-lora-safecoder/checkpoint-last