reiprasetya-study committed on
Commit 8394733 · verified · 1 Parent(s): 40b2f25

Upload logs/train_codegen_20251129_215852.log with huggingface_hub

logs/train_codegen_20251129_215852.log ADDED
@@ -0,0 +1,84 @@
+ 2025-11-29 21:58:52 - train_codegen - INFO - Logging to: logs/codegen/train_codegen_20251129_215852.log
+ 2025-11-29 21:58:52 - train_codegen - INFO - Monitor progress: tail -f logs/codegen/train_codegen_20251129_215852.log
+ 2025-11-29 21:58:52 - train_codegen - INFO - ============================================================
+ 2025-11-29 21:58:52 - train_codegen - INFO - CodeGen Training
+ 2025-11-29 21:58:52 - train_codegen - INFO - ============================================================
+ 2025-11-29 21:58:52 - train_codegen - INFO - Using CUDA device: 0
+ 2025-11-29 21:58:52 - train_codegen - INFO - GPU: NVIDIA GeForce RTX 5090
+ 2025-11-29 21:58:52 - train_codegen - INFO - Configuration:
+ 2025-11-29 21:58:52 - train_codegen - INFO - model: Salesforce/codegen-350M-mono
+ 2025-11-29 21:58:52 - train_codegen - INFO - data: datasets/python
+ 2025-11-29 21:58:52 - train_codegen - INFO - output: model/checkpoints/run1-python-codegen
+ 2025-11-29 21:58:52 - train_codegen - INFO - batch_size: 10
+ 2025-11-29 21:58:52 - train_codegen - INFO - gradient_accumulation_steps: 4
+ 2025-11-29 21:58:52 - train_codegen - INFO - effective_batch_size: 40
+ 2025-11-29 21:58:52 - train_codegen - INFO - learning_rate: 5e-05
+ 2025-11-29 21:58:52 - train_codegen - INFO - epochs: 5
+ 2025-11-29 21:58:52 - train_codegen - INFO - max_length: 1024
+ 2025-11-29 21:58:52 - train_codegen - INFO - max_steps: -1
+ 2025-11-29 21:58:52 - train_codegen - INFO - fp16: True
+ 2025-11-29 21:58:52 - train_codegen - INFO - gradient_checkpointing: True
+ 2025-11-29 21:58:52 - train_codegen - INFO - seed: 42
+ 2025-11-29 21:58:52 - train_codegen - INFO - Loading tokenizer and model: Salesforce/codegen-350M-mono
+ 2025-11-29 21:59:04 - train_codegen - INFO - Loading model with gradient checkpointing enabled
+ 2025-11-29 21:59:04 - train_codegen - INFO - Loading dataset...
+ 2025-11-29 21:59:04 - train_codegen - INFO - Loading dataset from datasets/python
+ 2025-11-29 21:59:05 - train_codegen - INFO - Train samples: 155411
+ 2025-11-29 21:59:05 - train_codegen - INFO - Validation samples: 19426
+ 2025-11-29 21:59:05 - train_codegen - INFO - ============================================================
+ 2025-11-29 21:59:05 - train_codegen - INFO - Dataset Preprocessing
+ 2025-11-29 21:59:05 - train_codegen - INFO - ============================================================
+ 2025-11-29 21:59:05 - train_codegen - INFO - Preprocessing 155411 samples (optimized eager loading)...
+ 2025-11-29 21:59:09 - train_codegen - INFO - Preprocessed 10000/155411 samples
+ 2025-11-29 21:59:14 - train_codegen - INFO - Preprocessed 20000/155411 samples
+ 2025-11-29 21:59:19 - train_codegen - INFO - Preprocessed 30000/155411 samples
+ 2025-11-29 21:59:24 - train_codegen - INFO - Preprocessed 40000/155411 samples
+ 2025-11-29 21:59:29 - train_codegen - INFO - Preprocessed 50000/155411 samples
+ 2025-11-29 21:59:33 - train_codegen - INFO - Preprocessed 60000/155411 samples
+ 2025-11-29 21:59:39 - train_codegen - INFO - Preprocessed 70000/155411 samples
+ 2025-11-29 21:59:43 - train_codegen - INFO - Preprocessed 80000/155411 samples
+ 2025-11-29 21:59:48 - train_codegen - INFO - Preprocessed 90000/155411 samples
+ 2025-11-29 21:59:53 - train_codegen - INFO - Preprocessed 100000/155411 samples
+ 2025-11-29 21:59:57 - train_codegen - INFO - Preprocessed 110000/155411 samples
+ 2025-11-29 22:00:02 - train_codegen - INFO - Preprocessed 120000/155411 samples
+ 2025-11-29 22:00:06 - train_codegen - INFO - Preprocessed 130000/155411 samples
+ 2025-11-29 22:00:12 - train_codegen - INFO - Preprocessed 140000/155411 samples
+ 2025-11-29 22:00:16 - train_codegen - INFO - Preprocessed 150000/155411 samples
+ 2025-11-29 22:00:19 - train_codegen - INFO - Preprocessed 155411/155411 samples
+ 2025-11-29 22:00:19 - train_codegen - INFO - Preprocessing complete: 155411 samples ready
+ 2025-11-29 22:00:19 - train_codegen - INFO - Preprocessing 19426 samples (optimized eager loading)...
+ 2025-11-29 22:00:23 - train_codegen - INFO - Preprocessed 10000/19426 samples
+ 2025-11-29 22:00:28 - train_codegen - INFO - Preprocessed 19426/19426 samples
+ 2025-11-29 22:00:28 - train_codegen - INFO - Preprocessing complete: 19426 samples ready
+ 2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+ 2025-11-29 22:00:28 - train_codegen - INFO - Training Arguments
+ 2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+ 2025-11-29 22:00:28 - train_codegen - INFO - Training log will be saved to: model/checkpoints/run1-python-codegen/training_log.csv
+ 2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+ 2025-11-29 22:00:28 - train_codegen - INFO - Training Strategy
+ 2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+ 2025-11-29 22:00:28 - train_codegen - INFO - Evaluation every 1000 steps (optimized for speed)
+ 2025-11-29 22:00:28 - train_codegen - INFO - Eval batch size: 20 (2x train batch)
+ 2025-11-29 22:00:28 - train_codegen - INFO - Eval accumulation steps: 4
+ 2025-11-29 22:00:28 - train_codegen - INFO - Save checkpoint every 2000 steps
+ 2025-11-29 22:00:28 - train_codegen - INFO - Gradient checkpointing: ENABLED (saves VRAM, slower training)
+ 2025-11-29 22:00:28 - train_codegen - INFO - FP16 mixed precision enabled
+ 2025-11-29 22:00:28 - train_codegen - INFO - Dynamic padding per batch (10-20x faster than max_length padding)
+ 2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+ 2025-11-29 22:00:28 - train_codegen - INFO - Starting Training
+ 2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+ 2025-11-29 22:00:28 - train_codegen - INFO - Total training samples: 155411
+ 2025-11-29 22:00:28 - train_codegen - INFO - Total validation samples: 19426
+ 2025-11-29 22:00:28 - train_codegen - INFO - Starting training from scratch
+ 2025-11-30 15:11:50 - train_codegen - INFO - Training completed successfully
+ 2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+ 2025-11-30 15:11:50 - train_codegen - INFO - Saving Final Model
+ 2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+ 2025-11-30 15:11:50 - train_codegen - INFO - Model and tokenizer saved to model/checkpoints/run1-python-codegen
+ 2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+ 2025-11-30 15:11:50 - train_codegen - INFO - Training Summary
+ 2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+ 2025-11-30 15:11:50 - train_codegen - INFO - Total steps: 19425
+ 2025-11-30 15:11:50 - train_codegen - INFO - Best model checkpoint: model/checkpoints/run1-python-codegen/checkpoint-10000
+ 2025-11-30 15:11:50 - train_codegen - INFO - Best eval loss: 0.7813047170639038
+ 2025-11-30 15:11:50 - train_codegen - INFO - Done.
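
The summary's step count is consistent with the hyperparameters printed at the top of the log. A minimal sketch of the arithmetic (the drop-incomplete-batch assumption is ours, not from the training script; it is chosen because it reproduces the logged total):

```python
# Recompute this run's optimizer-step count from values copied out of the log.
train_samples = 155_411  # "Train samples: 155411"
batch_size = 10          # "batch_size: 10"
grad_accum = 4           # "gradient_accumulation_steps: 4"
epochs = 5               # "epochs: 5"

# One optimizer step consumes batch_size * grad_accum samples.
effective_batch = batch_size * grad_accum  # logged "effective_batch_size: 40"

# Assumption: the incomplete final accumulation window of each epoch is
# dropped (floor division); this matches the logged "Total steps: 19425".
steps_per_epoch = train_samples // effective_batch
total_steps = steps_per_epoch * epochs

print(effective_batch)  # 40
print(total_steps)      # 19425
```

Under this reading, eval ran every 1000 of those 19425 steps, and the best checkpoint at step 10000 (eval loss ≈ 0.78) fell early in epoch 3.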