05/15/2026 17:53:01 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
05/15/2026 17:54:08 - INFO - root - Training args Namespace(output_name='deepseek-coder-6.7b-lora-deepseek-coder-6.7b_std', datasets=['evol'], pretrain_name='deepseek-coder-6.7b', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=True, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='/mnt/scratch/QRM/experiments/SCoDE/data_train_val', model_dir='/mnt/scratch/QRM/experiments/SCoDE/trained', output_dir='/mnt/scratch/QRM/experiments/SCoDE/trained/deepseek-coder-6.7b-lora-deepseek-coder-6.7b_std', logger=<RootLogger root (INFO)>)
05/15/2026 17:54:08 - INFO - root - ***** Running training *****
05/15/2026 17:54:08 - INFO - root - Num samples = 28564
05/15/2026 17:54:08 - INFO - root - Num epoch = 2
05/15/2026 17:54:08 - INFO - root - Batch size= 1
05/15/2026 17:54:08 - INFO - root - Total batch size (w. accumulation) = 16
05/15/2026 17:54:08 - INFO - root - Gradient Accumulation steps = 16
05/15/2026 17:54:08 - INFO - root - Total optimization steps = 3570
05/15/2026 17:54:08 - INFO - root - Num val samples = 3173
05/15/2026 17:54:08 - INFO - root - Num parameters = 6779150688
05/15/2026 17:54:08 - INFO - root - Num trainable parameters = 40554848
05/15/2026 17:56:48 - INFO - root - epochs: 1/2, steps: 50/3570, func: 0.050436, 1%: 3h 8m 23s
05/15/2026 17:59:16 - INFO - root - epochs: 1/2, steps: 100/3570, func: 0.048539, 2%: 2h 58m 38s
05/15/2026 18:01:43 - INFO - root - epochs: 1/2, steps: 150/3570, func: 0.047644, 4%: 2h 53m 9s
05/15/2026 18:04:11 - INFO - root - epochs: 1/2, steps: 200/3570, func: 0.047594, 5%: 2h 49m 26s
05/15/2026 18:06:39 - INFO - root - epochs: 1/2, steps: 250/3570, func: 0.047501, 6%: 2h 46m 22s
05/15/2026 18:09:06 - INFO - root - epochs: 1/2, steps: 300/3570, func: 0.046075, 8%: 2h 43m 16s
05/15/2026 18:11:34 - INFO - root - epochs: 1/2, steps: 350/3570, func: 0.047514, 9%: 2h 40m 31s
05/15/2026 18:14:01 - INFO - root - epochs: 1/2, steps: 400/3570, func: 0.046485, 11%: 2h 37m 39s
05/15/2026 18:16:28 - INFO - root - epochs: 1/2, steps: 450/3570, func: 0.046384, 12%: 2h 35m 0s
05/15/2026 18:18:56 - INFO - root - epochs: 1/2, steps: 500/3570, func: 0.047028, 13%: 2h 32m 20s
05/15/2026 18:21:23 - INFO - root - epochs: 1/2, steps: 550/3570, func: 0.046042, 15%: 2h 29m 44s
05/15/2026 18:23:50 - INFO - root - epochs: 1/2, steps: 600/3570, func: 0.046992, 16%: 2h 27m 4s
05/15/2026 18:26:17 - INFO - root - epochs: 1/2, steps: 650/3570, func: 0.045483, 18%: 2h 24m 29s
05/15/2026 18:28:44 - INFO - root - epochs: 1/2, steps: 700/3570, func: 0.046212, 19%: 2h 21m 55s
05/15/2026 18:31:11 - INFO - root - epochs: 1/2, steps: 750/3570, func: 0.044684, 20%: 2h 19m 23s
05/15/2026 18:33:38 - INFO - root - epochs: 1/2, steps: 800/3570, func: 0.045541, 22%: 2h 16m 50s
05/15/2026 18:36:06 - INFO - root - epochs: 1/2, steps: 850/3570, func: 0.046732, 23%: 2h 14m 22s
05/15/2026 18:38:33 - INFO - root - epochs: 1/2, steps: 900/3570, func: 0.045579, 25%: 2h 11m 49s
05/15/2026 18:41:00 - INFO - root - epochs: 1/2, steps: 950/3570, func: 0.045319, 26%: 2h 9m 19s
05/15/2026 18:43:27 - INFO - root - epochs: 1/2, steps: 1000/3570, func: 0.045586, 27%: 2h 6m 48s
05/15/2026 18:45:55 - INFO - root - epochs: 1/2, steps: 1050/3570, func: 0.045421, 29%: 2h 4m 19s
05/15/2026 18:48:20 - INFO - root - epochs: 1/2, steps: 1100/3570, func: 0.045093, 30%: 2h 1m 46s
05/15/2026 18:50:49 - INFO - root - epochs: 1/2, steps: 1150/3570, func: 0.045568, 32%: 1h 59m 19s
05/15/2026 18:53:15 - INFO - root - epochs: 1/2, steps: 1200/3570, func: 0.045754, 33%: 1h 56m 49s
05/15/2026 18:55:43 - INFO - root - epochs: 1/2, steps: 1250/3570, func: 0.04605, 34%: 1h 54m 22s
05/15/2026 18:58:11 - INFO - root - epochs: 1/2, steps: 1300/3570, func: 0.045232, 36%: 1h 51m 53s
05/15/2026 19:00:38 - INFO - root - epochs: 1/2, steps: 1350/3570, func: 0.046241, 37%: 1h 49m 24s
05/15/2026 19:03:04 - INFO - root - epochs: 1/2, steps: 1400/3570, func: 0.045171, 39%: 1h 46m 55s
05/15/2026 19:05:32 - INFO - root - epochs: 1/2, steps: 1450/3570, func: 0.045191, 40%: 1h 44m 26s
05/15/2026 19:07:58 - INFO - root - epochs: 1/2, steps: 1500/3570, func: 0.045337, 41%: 1h 41m 57s
05/15/2026 19:10:26 - INFO - root - epochs: 1/2, steps: 1550/3570, func: 0.045237, 43%: 1h 39m 29s
05/15/2026 19:12:55 - INFO - root - epochs: 1/2, steps: 1600/3570, func: 0.046228, 44%: 1h 37m 3s
05/15/2026 19:15:24 - INFO - root - epochs: 1/2, steps: 1650/3570, func: 0.044411, 46%: 1h 34m 37s
05/15/2026 19:17:52 - INFO - root - epochs: 1/2, steps: 1700/3570, func: 0.045536, 47%: 1h 32m 10s
05/15/2026 19:20:22 - INFO - root - epochs: 1/2, steps: 1750/3570, func: 0.046268, 48%: 1h 29m 43s
05/15/2026 19:22:51 - INFO - root - epochs: 2/2, steps: 1800/3570, func: 0.046214, 50%: 1h 27m 17s
05/15/2026 19:25:19 - INFO - root - epochs: 2/2, steps: 1850/3570, func: 0.045282, 51%: 1h 24m 49s
05/15/2026 19:27:47 - INFO - root - epochs: 2/2, steps: 1900/3570, func: 0.045458, 53%: 1h 22m 22s
05/15/2026 19:30:16 - INFO - root - epochs: 2/2, steps: 1950/3570, func: 0.044733, 54%: 1h 19m 54s
05/15/2026 19:32:43 - INFO - root - epochs: 2/2, steps: 2000/3570, func: 0.046205, 55%: 1h 17m 26s
05/15/2026 19:35:12 - INFO - root - epochs: 2/2, steps: 2050/3570, func: 0.045026, 57%: 1h 14m 59s
05/15/2026 19:37:40 - INFO - root - epochs: 2/2, steps: 2100/3570, func: 0.045443, 58%: 1h 12m 31s
05/15/2026 19:40:08 - INFO - root - epochs: 2/2, steps: 2150/3570, func: 0.04582, 60%: 1h 10m 3s
05/15/2026 19:42:36 - INFO - root - epochs: 2/2, steps: 2200/3570, func: 0.0463, 61%: 1h 7m 35s
05/15/2026 19:45:05 - INFO - root - epochs: 2/2, steps: 2250/3570, func: 0.044733, 62%: 1h 5m 8s
05/15/2026 19:47:33 - INFO - root - epochs: 2/2, steps: 2300/3570, func: 0.044792, 64%: 1h 2m 40s
05/15/2026 19:50:03 - INFO - root - epochs: 2/2, steps: 2350/3570, func: 0.045966, 65%: 1h 0m 13s
05/15/2026 19:52:31 - INFO - root - epochs: 2/2, steps: 2400/3570, func: 0.045263, 67%: 0h 57m 45s
05/15/2026 19:54:59 - INFO - root - epochs: 2/2, steps: 2450/3570, func: 0.045609, 68%: 0h 55m 17s
05/15/2026 19:57:25 - INFO - root - epochs: 2/2, steps: 2500/3570, func: 0.045199, 70%: 0h 52m 48s
05/15/2026 19:59:53 - INFO - root - epochs: 2/2, steps: 2550/3570, func: 0.045369, 71%: 0h 50m 20s
05/15/2026 20:02:21 - INFO - root - epochs: 2/2, steps: 2600/3570, func: 0.044456, 72%: 0h 47m 53s
05/15/2026 20:04:48 - INFO - root - epochs: 2/2, steps: 2650/3570, func: 0.045693, 74%: 0h 45m 25s
05/15/2026 20:07:16 - INFO - root - epochs: 2/2, steps: 2700/3570, func: 0.046093, 75%: 0h 42m 56s
05/15/2026 20:09:43 - INFO - root - epochs: 2/2, steps: 2750/3570, func: 0.044909, 77%: 0h 40m 28s
05/15/2026 20:12:10 - INFO - root - epochs: 2/2, steps: 2800/3570, func: 0.044151, 78%: 0h 38m 0s
05/15/2026 20:14:37 - INFO - root - epochs: 2/2, steps: 2850/3570, func: 0.044291, 79%: 0h 35m 32s
05/15/2026 20:17:05 - INFO - root - epochs: 2/2, steps: 2900/3570, func: 0.044967, 81%: 0h 33m 4s
05/15/2026 20:19:31 - INFO - root - epochs: 2/2, steps: 2950/3570, func: 0.044799, 82%: 0h 30m 36s
05/15/2026 20:21:59 - INFO - root - epochs: 2/2, steps: 3000/3570, func: 0.04522, 84%: 0h 28m 8s
05/15/2026 20:24:26 - INFO - root - epochs: 2/2, steps: 3050/3570, func: 0.045257, 85%: 0h 25m 40s
05/15/2026 20:26:51 - INFO - root - epochs: 2/2, steps: 3100/3570, func: 0.04447, 86%: 0h 23m 12s
05/15/2026 20:29:17 - INFO - root - epochs: 2/2, steps: 3150/3570, func: 0.044678, 88%: 0h 20m 44s
05/15/2026 20:31:42 - INFO - root - epochs: 2/2, steps: 3200/3570, func: 0.045512, 89%: 0h 18m 16s
05/15/2026 20:34:08 - INFO - root - epochs: 2/2, steps: 3250/3570, func: 0.04486, 91%: 0h 15m 48s
05/15/2026 20:36:32 - INFO - root - epochs: 2/2, steps: 3300/3570, func: 0.045303, 92%: 0h 13m 20s
05/15/2026 20:38:57 - INFO - root - epochs: 2/2, steps: 3350/3570, func: 0.045726, 93%: 0h 10m 52s
05/15/2026 20:41:22 - INFO - root - epochs: 2/2, steps: 3400/3570, func: 0.045211, 95%: 0h 8m 24s
05/15/2026 20:43:48 - INFO - root - epochs: 2/2, steps: 3450/3570, func: 0.044747, 96%: 0h 5m 57s
05/15/2026 20:46:12 - INFO - root - epochs: 2/2, steps: 3500/3570, func: 0.046464, 98%: 0h 3m 29s
05/15/2026 20:48:37 - INFO - root - epochs: 2/2, steps: 3550/3570, func: 0.045305, 99%: 0h 1m 1s
05/15/2026 20:53:03 - INFO - root - final eval loss: func: 0.045452
05/15/2026 20:53:03 - INFO - root - Saving model checkpoint to /mnt/scratch/QRM/experiments/SCoDE/trained/deepseek-coder-6.7b-lora-deepseek-coder-6.7b_std/checkpoint-last