05/15/2026 17:53:01 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
05/15/2026 17:53:40 - INFO - root - Training args Namespace(output_name='phi-2_std', datasets=['evol'], pretrain_name='phi-2', loss_weight=1.0, sven=False, num_train_epochs=2, learning_rate=2e-05, max_num_tokens=1024, batch_size=1, grad_acc_steps=16, weight_decay=0.01, adam_epsilon=1e-08, warmup_steps=0, max_grad_norm=1.0, dropout=0.1, kl_loss_weight=0, exclude_neg=False, no_weights=False, lora=False, r=16, lora_alpha=32, lora_dropout=0.1, sampling_size=20, sampling_method='minority', cwes=['all'], langs=['all'], logging_steps=50, save_epochs=10, seed=2, data_dir='/mnt/scratch/QRM/experiments/SCoDE/data_train_val', model_dir='/mnt/scratch/QRM/experiments/SCoDE/trained', output_dir='/mnt/scratch/QRM/experiments/SCoDE/trained/phi-2_std', logger=<RootLogger root (INFO)>)
05/15/2026 17:53:40 - INFO - root - ***** Running training *****
05/15/2026 17:53:40 - INFO - root - Num samples = 27880
05/15/2026 17:53:40 - INFO - root - Num epoch = 2
05/15/2026 17:53:40 - INFO - root - Batch size= 1
05/15/2026 17:53:40 - INFO - root - Total batch size (w. accumulation) = 16
05/15/2026 17:53:40 - INFO - root - Gradient Accumulation steps = 16
05/15/2026 17:53:40 - INFO - root - Total optimization steps = 3484
05/15/2026 17:53:40 - INFO - root - Num val samples = 3084
05/15/2026 17:53:40 - INFO - root - Num parameters = 2775049335
05/15/2026 17:53:40 - INFO - root - Num trainable parameters = 2775049335
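The header values above are mutually consistent, which can be checked with a little arithmetic. The sketch below is an assumption about how the step count might be derived, not the training script's actual code: the function name `total_optimization_steps` is hypothetical, and the floor division is chosen only because it reproduces the logged total of 3484.

```python
def total_optimization_steps(num_samples, batch_size, grad_acc_steps, num_epochs):
    """Estimate optimizer steps, assuming incomplete accumulation groups are dropped."""
    # Number of micro-batches per epoch (assumed: last partial batch is dropped).
    batches_per_epoch = num_samples // batch_size
    # One optimizer step per full group of `grad_acc_steps` micro-batches.
    steps_per_epoch = batches_per_epoch // grad_acc_steps
    return steps_per_epoch * num_epochs


# Values taken from the log header: 27880 samples, batch size 1,
# 16 gradient accumulation steps, 2 epochs.
steps_per_epoch = total_optimization_steps(27880, 1, 16, 1)
total_steps = total_optimization_steps(27880, 1, 16, 2)
print(steps_per_epoch, total_steps)
```

Under these assumptions each epoch has 1742 optimizer steps, which agrees with the progress lines switching from `epochs: 1/2` at step 1700 to `epochs: 2/2` at step 1750.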
05/15/2026 17:55:22 - INFO - root - epochs: 1/2, steps: 50/3484, func: 0.056975, 1%: 1h 56m 5s
05/15/2026 17:57:01 - INFO - root - epochs: 1/2, steps: 100/3484, func: 0.05472, 2%: 1h 53m 2s
05/15/2026 17:58:38 - INFO - root - epochs: 1/2, steps: 150/3484, func: 0.05553, 4%: 1h 50m 8s
05/15/2026 18:00:16 - INFO - root - epochs: 1/2, steps: 200/3484, func: 0.054368, 5%: 1h 48m 16s
05/15/2026 18:01:55 - INFO - root - epochs: 1/2, steps: 250/3484, func: 0.053811, 7%: 1h 46m 39s
05/15/2026 18:03:33 - INFO - root - epochs: 1/2, steps: 300/3484, func: 0.053165, 8%: 1h 44m 48s
05/15/2026 18:05:09 - INFO - root - epochs: 1/2, steps: 350/3484, func: 0.054273, 10%: 1h 42m 51s
05/15/2026 18:06:47 - INFO - root - epochs: 1/2, steps: 400/3484, func: 0.052934, 11%: 1h 41m 8s
05/15/2026 18:08:26 - INFO - root - epochs: 1/2, steps: 450/3484, func: 0.052309, 12%: 1h 39m 30s
05/15/2026 18:10:04 - INFO - root - epochs: 1/2, steps: 500/3484, func: 0.053638, 14%: 1h 37m 51s
05/15/2026 18:11:42 - INFO - root - epochs: 1/2, steps: 550/3484, func: 0.053208, 15%: 1h 36m 14s
05/15/2026 18:13:21 - INFO - root - epochs: 1/2, steps: 600/3484, func: 0.054514, 17%: 1h 34m 38s
05/15/2026 18:15:00 - INFO - root - epochs: 1/2, steps: 650/3484, func: 0.053663, 18%: 1h 32m 58s
05/15/2026 18:16:38 - INFO - root - epochs: 1/2, steps: 700/3484, func: 0.053966, 20%: 1h 31m 19s
05/15/2026 18:18:17 - INFO - root - epochs: 1/2, steps: 750/3484, func: 0.053939, 21%: 1h 29m 44s
05/15/2026 18:19:55 - INFO - root - epochs: 1/2, steps: 800/3484, func: 0.051763, 22%: 1h 28m 4s
05/15/2026 18:21:34 - INFO - root - epochs: 1/2, steps: 850/3484, func: 0.053375, 24%: 1h 26m 27s
05/15/2026 18:23:13 - INFO - root - epochs: 1/2, steps: 900/3484, func: 0.052522, 25%: 1h 24m 51s
05/15/2026 18:24:52 - INFO - root - epochs: 1/2, steps: 950/3484, func: 0.051894, 27%: 1h 23m 15s
05/15/2026 18:26:31 - INFO - root - epochs: 1/2, steps: 1000/3484, func: 0.053875, 28%: 1h 21m 36s
05/15/2026 18:28:08 - INFO - root - epochs: 1/2, steps: 1050/3484, func: 0.052494, 30%: 1h 19m 55s
05/15/2026 18:29:47 - INFO - root - epochs: 1/2, steps: 1100/3484, func: 0.053013, 31%: 1h 18m 18s
05/15/2026 18:31:26 - INFO - root - epochs: 1/2, steps: 1150/3484, func: 0.053172, 32%: 1h 16m 39s
05/15/2026 18:33:05 - INFO - root - epochs: 1/2, steps: 1200/3484, func: 0.053076, 34%: 1h 15m 1s
05/15/2026 18:34:43 - INFO - root - epochs: 1/2, steps: 1250/3484, func: 0.052335, 35%: 1h 13m 22s
05/15/2026 18:36:21 - INFO - root - epochs: 1/2, steps: 1300/3484, func: 0.053168, 37%: 1h 11m 43s
05/15/2026 18:38:00 - INFO - root - epochs: 1/2, steps: 1350/3484, func: 0.053384, 38%: 1h 10m 5s
05/15/2026 18:39:38 - INFO - root - epochs: 1/2, steps: 1400/3484, func: 0.052082, 40%: 1h 8m 26s
05/15/2026 18:41:16 - INFO - root - epochs: 1/2, steps: 1450/3484, func: 0.053835, 41%: 1h 6m 47s
05/15/2026 18:42:54 - INFO - root - epochs: 1/2, steps: 1500/3484, func: 0.052597, 43%: 1h 5m 8s
05/15/2026 18:44:31 - INFO - root - epochs: 1/2, steps: 1550/3484, func: 0.052964, 44%: 1h 3m 28s
05/15/2026 18:46:09 - INFO - root - epochs: 1/2, steps: 1600/3484, func: 0.053002, 45%: 1h 1m 49s
05/15/2026 18:47:46 - INFO - root - epochs: 1/2, steps: 1650/3484, func: 0.053394, 47%: 1h 0m 9s
05/15/2026 18:49:24 - INFO - root - epochs: 1/2, steps: 1700/3484, func: 0.052143, 48%: 0h 58m 30s
05/15/2026 18:51:02 - INFO - root - epochs: 2/2, steps: 1750/3484, func: 0.05297, 50%: 0h 56m 52s
05/15/2026 18:52:41 - INFO - root - epochs: 2/2, steps: 1800/3484, func: 0.051984, 51%: 0h 55m 13s
05/15/2026 18:54:18 - INFO - root - epochs: 2/2, steps: 1850/3484, func: 0.053168, 53%: 0h 53m 35s
05/15/2026 18:55:56 - INFO - root - epochs: 2/2, steps: 1900/3484, func: 0.052632, 54%: 0h 51m 56s
05/15/2026 18:57:33 - INFO - root - epochs: 2/2, steps: 1950/3484, func: 0.052213, 55%: 0h 50m 17s
05/15/2026 18:59:12 - INFO - root - epochs: 2/2, steps: 2000/3484, func: 0.052386, 57%: 0h 48m 39s
05/15/2026 19:00:50 - INFO - root - epochs: 2/2, steps: 2050/3484, func: 0.05191, 58%: 0h 47m 0s
05/15/2026 19:02:29 - INFO - root - epochs: 2/2, steps: 2100/3484, func: 0.052442, 60%: 0h 45m 22s
05/15/2026 19:04:06 - INFO - root - epochs: 2/2, steps: 2150/3484, func: 0.052136, 61%: 0h 43m 43s
05/15/2026 19:05:45 - INFO - root - epochs: 2/2, steps: 2200/3484, func: 0.052966, 63%: 0h 42m 5s
05/15/2026 19:07:22 - INFO - root - epochs: 2/2, steps: 2250/3484, func: 0.051064, 64%: 0h 40m 26s
05/15/2026 19:09:00 - INFO - root - epochs: 2/2, steps: 2300/3484, func: 0.052131, 65%: 0h 38m 48s
05/15/2026 19:10:38 - INFO - root - epochs: 2/2, steps: 2350/3484, func: 0.052373, 67%: 0h 37m 9s
05/15/2026 19:12:19 - INFO - root - epochs: 2/2, steps: 2400/3484, func: 0.052812, 68%: 0h 35m 33s
05/15/2026 19:14:04 - INFO - root - epochs: 2/2, steps: 2450/3484, func: 0.053216, 70%: 0h 33m 57s
05/15/2026 19:15:48 - INFO - root - epochs: 2/2, steps: 2500/3484, func: 0.052285, 71%: 0h 32m 21s
05/15/2026 19:17:33 - INFO - root - epochs: 2/2, steps: 2550/3484, func: 0.053235, 73%: 0h 30m 45s
05/15/2026 19:19:17 - INFO - root - epochs: 2/2, steps: 2600/3484, func: 0.051261, 74%: 0h 29m 8s
05/15/2026 19:21:01 - INFO - root - epochs: 2/2, steps: 2650/3484, func: 0.05255, 76%: 0h 27m 31s
05/15/2026 19:22:46 - INFO - root - epochs: 2/2, steps: 2700/3484, func: 0.05234, 77%: 0h 25m 54s
05/15/2026 19:24:27 - INFO - root - epochs: 2/2, steps: 2750/3484, func: 0.05121, 78%: 0h 24m 15s
05/15/2026 19:26:11 - INFO - root - epochs: 2/2, steps: 2800/3484, func: 0.052381, 80%: 0h 22m 37s
05/15/2026 19:27:55 - INFO - root - epochs: 2/2, steps: 2850/3484, func: 0.052768, 81%: 0h 20m 59s
05/15/2026 19:29:40 - INFO - root - epochs: 2/2, steps: 2900/3484, func: 0.052935, 83%: 0h 19m 21s
05/15/2026 19:31:25 - INFO - root - epochs: 2/2, steps: 2950/3484, func: 0.052797, 84%: 0h 17m 43s
05/15/2026 19:33:09 - INFO - root - epochs: 2/2, steps: 3000/3484, func: 0.052488, 86%: 0h 16m 4s
05/15/2026 19:34:54 - INFO - root - epochs: 2/2, steps: 3050/3484, func: 0.0531, 87%: 0h 14m 26s
05/15/2026 19:36:40 - INFO - root - epochs: 2/2, steps: 3100/3484, func: 0.051766, 88%: 0h 12m 47s
05/15/2026 19:38:24 - INFO - root - epochs: 2/2, steps: 3150/3484, func: 0.050824, 90%: 0h 11m 8s
05/15/2026 19:40:08 - INFO - root - epochs: 2/2, steps: 3200/3484, func: 0.052316, 91%: 0h 9m 28s
05/15/2026 19:41:54 - INFO - root - epochs: 2/2, steps: 3250/3484, func: 0.052342, 93%: 0h 7m 49s
05/15/2026 19:43:38 - INFO - root - epochs: 2/2, steps: 3300/3484, func: 0.052376, 94%: 0h 6m 9s
05/15/2026 19:45:23 - INFO - root - epochs: 2/2, steps: 3350/3484, func: 0.052187, 96%: 0h 4m 30s
05/15/2026 19:47:08 - INFO - root - epochs: 2/2, steps: 3400/3484, func: 0.053145, 97%: 0h 2m 50s
05/15/2026 19:48:53 - INFO - root - epochs: 2/2, steps: 3450/3484, func: 0.052315, 98%: 0h 1m 10s
05/15/2026 19:51:54 - INFO - root - final eval loss: func: 0.050976
05/15/2026 19:51:54 - INFO - root - Saving model checkpoint to /mnt/scratch/QRM/experiments/SCoDE/trained/phi-2_std/checkpoint-last
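The progress lines follow a fixed pattern, so they can be pulled into records for plotting or throughput checks. The following is a minimal sketch using only the standard library; the names `PROGRESS_RE` and `parse_progress` are hypothetical, the timestamp is assumed to be month/day/year, and `func` is treated as a loss value only because the final line labels it `final eval loss: func`.

```python
import re
from datetime import datetime

# Matches lines such as:
# 05/15/2026 17:55:22 - INFO - root - epochs: 1/2, steps: 50/3484, func: 0.056975, 1%: 1h 56m 5s
PROGRESS_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - INFO - root - "
    r"epochs: (?P<epoch>\d+)/\d+, "
    r"steps: (?P<step>\d+)/\d+, "
    r"func: (?P<func>[0-9.]+),"
)


def parse_progress(lines):
    """Return one dict per progress line; other log lines are skipped."""
    records = []
    for line in lines:
        match = PROGRESS_RE.match(line)
        if match is None:
            continue
        records.append({
            "time": datetime.strptime(match["ts"], "%m/%d/%Y %H:%M:%S"),
            "epoch": int(match["epoch"]),
            "step": int(match["step"]),
            "func": float(match["func"]),
        })
    return records


# Three lines copied from the log; the last one is not a progress line.
sample = [
    "05/15/2026 17:55:22 - INFO - root - epochs: 1/2, steps: 50/3484, func: 0.056975, 1%: 1h 56m 5s",
    "05/15/2026 17:57:01 - INFO - root - epochs: 1/2, steps: 100/3484, func: 0.05472, 2%: 1h 53m 2s",
    "05/15/2026 19:51:54 - INFO - root - final eval loss: func: 0.050976",
]
records = parse_progress(sample)

# Wall-clock seconds per optimizer step between the two parsed entries.
elapsed = (records[1]["time"] - records[0]["time"]).total_seconds()
seconds_per_step = elapsed / (records[1]["step"] - records[0]["step"])
print(len(records), seconds_per_step)
```

To process the full file, pass an open file handle (for example `open("train.log")`) to `parse_progress` instead of the `sample` list.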