05/15/2026 17:53:01 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
05/15/2026 17:54:04 - INFO - root - Training args Namespace(output_name='deepseek-coder-1b_std', datasets=['evol'], pretrain_name='deepseek-coder-1b', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=False, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='/mnt/scratch/QRM/experiments/SCoDE/data_train_val', model_dir='/mnt/scratch/QRM/experiments/SCoDE/trained', output_dir='/mnt/scratch/QRM/experiments/SCoDE/trained/deepseek-coder-1b_std', logger=<RootLogger root (INFO)>)
05/15/2026 17:54:04 - INFO - root - ***** Running training *****
05/15/2026 17:54:04 - INFO - root - Num samples = 28564
05/15/2026 17:54:04 - INFO - root - Num epoch = 2
05/15/2026 17:54:04 - INFO - root - Batch size= 1
05/15/2026 17:54:04 - INFO - root - Total batch size (w. accumulation) = 16
05/15/2026 17:54:04 - INFO - root - Gradient Accumulation steps = 16
05/15/2026 17:54:04 - INFO - root - Total optimization steps = 3570
05/15/2026 17:54:04 - INFO - root - Num val samples = 3173
05/15/2026 17:54:04 - INFO - root - Num parameters = 1345513472
05/15/2026 17:54:04 - INFO - root - Num trainable parameters = 1345513472
05/15/2026 17:55:09 - INFO - root - epochs: 1/2, steps: 50/3570, func: 0.054721, 1%: 1h 16m 25s
05/15/2026 17:56:12 - INFO - root - epochs: 1/2, steps: 100/3570, func: 0.053819, 2%: 1h 14m 16s
05/15/2026 17:57:16 - INFO - root - epochs: 1/2, steps: 150/3570, func: 0.053071, 4%: 1h 12m 56s
05/15/2026 17:58:20 - INFO - root - epochs: 1/2, steps: 200/3570, func: 0.053075, 5%: 1h 11m 49s
05/15/2026 17:59:24 - INFO - root - epochs: 1/2, steps: 250/3570, func: 0.053061, 6%: 1h 10m 51s
05/15/2026 18:00:27 - INFO - root - epochs: 1/2, steps: 300/3570, func: 0.051511, 8%: 1h 9m 42s
05/15/2026 18:01:31 - INFO - root - epochs: 1/2, steps: 350/3570, func: 0.053023, 9%: 1h 8m 35s
05/15/2026 18:02:35 - INFO - root - epochs: 1/2, steps: 400/3570, func: 0.051844, 11%: 1h 7m 30s
05/15/2026 18:03:39 - INFO - root - epochs: 1/2, steps: 450/3570, func: 0.051769, 12%: 1h 6m 26s
05/15/2026 18:04:43 - INFO - root - epochs: 1/2, steps: 500/3570, func: 0.052392, 13%: 1h 5m 22s
05/15/2026 18:05:46 - INFO - root - epochs: 1/2, steps: 550/3570, func: 0.051493, 15%: 1h 4m 19s
05/15/2026 18:06:50 - INFO - root - epochs: 1/2, steps: 600/3570, func: 0.052346, 16%: 1h 3m 14s
05/15/2026 18:07:54 - INFO - root - epochs: 1/2, steps: 650/3570, func: 0.050731, 18%: 1h 2m 9s
05/15/2026 18:08:58 - INFO - root - epochs: 1/2, steps: 700/3570, func: 0.051375, 19%: 1h 1m 5s
05/15/2026 18:10:01 - INFO - root - epochs: 1/2, steps: 750/3570, func: 0.049853, 20%: 1h 0m 1s
05/15/2026 18:11:06 - INFO - root - epochs: 1/2, steps: 800/3570, func: 0.050873, 22%: 0h 58m 59s
05/15/2026 18:12:10 - INFO - root - epochs: 1/2, steps: 850/3570, func: 0.052238, 23%: 0h 57m 55s
05/15/2026 18:13:13 - INFO - root - epochs: 1/2, steps: 900/3570, func: 0.050813, 25%: 0h 56m 51s
05/15/2026 18:14:17 - INFO - root - epochs: 1/2, steps: 950/3570, func: 0.050359, 26%: 0h 55m 47s
05/15/2026 18:15:21 - INFO - root - epochs: 1/2, steps: 1000/3570, func: 0.050728, 27%: 0h 54m 43s
05/15/2026 18:16:25 - INFO - root - epochs: 1/2, steps: 1050/3570, func: 0.050755, 29%: 0h 53m 38s
05/15/2026 18:17:28 - INFO - root - epochs: 1/2, steps: 1100/3570, func: 0.050294, 30%: 0h 52m 33s
05/15/2026 18:18:32 - INFO - root - epochs: 1/2, steps: 1150/3570, func: 0.050878, 32%: 0h 51m 30s
05/15/2026 18:19:35 - INFO - root - epochs: 1/2, steps: 1200/3570, func: 0.050872, 33%: 0h 50m 26s
05/15/2026 18:20:39 - INFO - root - epochs: 1/2, steps: 1250/3570, func: 0.051367, 34%: 0h 49m 21s
05/15/2026 18:21:43 - INFO - root - epochs: 1/2, steps: 1300/3570, func: 0.05063, 36%: 0h 48m 17s
05/15/2026 18:22:46 - INFO - root - epochs: 1/2, steps: 1350/3570, func: 0.05148, 37%: 0h 47m 13s
05/15/2026 18:23:50 - INFO - root - epochs: 1/2, steps: 1400/3570, func: 0.050229, 39%: 0h 46m 9s
05/15/2026 18:24:53 - INFO - root - epochs: 1/2, steps: 1450/3570, func: 0.050373, 40%: 0h 45m 5s
05/15/2026 18:25:57 - INFO - root - epochs: 1/2, steps: 1500/3570, func: 0.050628, 41%: 0h 44m 0s
05/15/2026 18:27:00 - INFO - root - epochs: 1/2, steps: 1550/3570, func: 0.050485, 43%: 0h 42m 57s
05/15/2026 18:28:04 - INFO - root - epochs: 1/2, steps: 1600/3570, func: 0.051414, 44%: 0h 41m 52s
05/15/2026 18:29:07 - INFO - root - epochs: 1/2, steps: 1650/3570, func: 0.049564, 46%: 0h 40m 48s
05/15/2026 18:30:11 - INFO - root - epochs: 1/2, steps: 1700/3570, func: 0.050796, 47%: 0h 39m 44s
05/15/2026 18:31:14 - INFO - root - epochs: 1/2, steps: 1750/3570, func: 0.05154, 48%: 0h 38m 40s
05/15/2026 18:32:18 - INFO - root - epochs: 2/2, steps: 1800/3570, func: 0.051208, 50%: 0h 37m 37s
05/15/2026 18:33:21 - INFO - root - epochs: 2/2, steps: 1850/3570, func: 0.049527, 51%: 0h 36m 33s
05/15/2026 18:34:25 - INFO - root - epochs: 2/2, steps: 1900/3570, func: 0.049709, 53%: 0h 35m 29s
05/15/2026 18:35:28 - INFO - root - epochs: 2/2, steps: 1950/3570, func: 0.049024, 54%: 0h 34m 25s
05/15/2026 18:36:32 - INFO - root - epochs: 2/2, steps: 2000/3570, func: 0.050508, 55%: 0h 33m 21s
05/15/2026 18:37:35 - INFO - root - epochs: 2/2, steps: 2050/3570, func: 0.049172, 57%: 0h 32m 17s
05/15/2026 18:38:39 - INFO - root - epochs: 2/2, steps: 2100/3570, func: 0.049585, 58%: 0h 31m 13s
05/15/2026 18:39:43 - INFO - root - epochs: 2/2, steps: 2150/3570, func: 0.050109, 60%: 0h 30m 10s
05/15/2026 18:40:46 - INFO - root - epochs: 2/2, steps: 2200/3570, func: 0.050542, 61%: 0h 29m 6s
05/15/2026 18:41:50 - INFO - root - epochs: 2/2, steps: 2250/3570, func: 0.048915, 62%: 0h 28m 2s
05/15/2026 18:42:54 - INFO - root - epochs: 2/2, steps: 2300/3570, func: 0.048932, 64%: 0h 26m 58s
05/15/2026 18:43:57 - INFO - root - epochs: 2/2, steps: 2350/3570, func: 0.050188, 65%: 0h 25m 55s
05/15/2026 18:45:01 - INFO - root - epochs: 2/2, steps: 2400/3570, func: 0.049453, 67%: 0h 24m 51s
05/15/2026 18:46:05 - INFO - root - epochs: 2/2, steps: 2450/3570, func: 0.050009, 68%: 0h 23m 48s
05/15/2026 18:47:09 - INFO - root - epochs: 2/2, steps: 2500/3570, func: 0.049283, 70%: 0h 22m 44s
05/15/2026 18:48:14 - INFO - root - epochs: 2/2, steps: 2550/3570, func: 0.049506, 71%: 0h 21m 41s
05/15/2026 18:49:18 - INFO - root - epochs: 2/2, steps: 2600/3570, func: 0.048537, 72%: 0h 20m 37s
05/15/2026 18:50:21 - INFO - root - epochs: 2/2, steps: 2650/3570, func: 0.050053, 74%: 0h 19m 33s
05/15/2026 18:51:25 - INFO - root - epochs: 2/2, steps: 2700/3570, func: 0.050696, 75%: 0h 18m 30s
05/15/2026 18:52:28 - INFO - root - epochs: 2/2, steps: 2750/3570, func: 0.049054, 77%: 0h 17m 26s
05/15/2026 18:53:32 - INFO - root - epochs: 2/2, steps: 2800/3570, func: 0.048454, 78%: 0h 16m 22s
05/15/2026 18:54:36 - INFO - root - epochs: 2/2, steps: 2850/3570, func: 0.048336, 79%: 0h 15m 18s
05/15/2026 18:55:40 - INFO - root - epochs: 2/2, steps: 2900/3570, func: 0.048973, 81%: 0h 14m 15s
05/15/2026 18:56:43 - INFO - root - epochs: 2/2, steps: 2950/3570, func: 0.048978, 82%: 0h 13m 11s
05/15/2026 18:57:47 - INFO - root - epochs: 2/2, steps: 3000/3570, func: 0.049476, 84%: 0h 12m 7s
05/15/2026 18:58:51 - INFO - root - epochs: 2/2, steps: 3050/3570, func: 0.049558, 85%: 0h 11m 3s
05/15/2026 18:59:54 - INFO - root - epochs: 2/2, steps: 3100/3570, func: 0.048599, 86%: 0h 10m 0s
05/15/2026 19:00:58 - INFO - root - epochs: 2/2, steps: 3150/3570, func: 0.048974, 88%: 0h 8m 56s
05/15/2026 19:02:01 - INFO - root - epochs: 2/2, steps: 3200/3570, func: 0.04963, 89%: 0h 7m 52s
05/15/2026 19:03:05 - INFO - root - epochs: 2/2, steps: 3250/3570, func: 0.04918, 91%: 0h 6m 49s
05/15/2026 19:04:08 - INFO - root - epochs: 2/2, steps: 3300/3570, func: 0.049538, 92%: 0h 5m 45s
05/15/2026 19:05:12 - INFO - root - epochs: 2/2, steps: 3350/3570, func: 0.049933, 93%: 0h 4m 41s
05/15/2026 19:06:15 - INFO - root - epochs: 2/2, steps: 3400/3570, func: 0.049365, 95%: 0h 3m 37s
05/15/2026 19:07:19 - INFO - root - epochs: 2/2, steps: 3450/3570, func: 0.048782, 96%: 0h 2m 34s
05/15/2026 19:08:22 - INFO - root - epochs: 2/2, steps: 3500/3570, func: 0.050896, 98%: 0h 1m 30s
05/15/2026 19:09:26 - INFO - root - epochs: 2/2, steps: 3550/3570, func: 0.049513, 99%: 0h 0m 26s
05/15/2026 19:11:04 - INFO - root - final eval loss: func: 0.050652
05/15/2026 19:11:04 - INFO - root - Saving model checkpoint to /mnt/scratch/QRM/experiments/SCoDE/trained/deepseek-coder-1b_std/checkpoint-last