Upload logs/train_codegen_20251129_215852.log with huggingface_hub
logs/train_codegen_20251129_215852.log
ADDED
@@ -0,0 +1,84 @@
+2025-11-29 21:58:52 - train_codegen - INFO - Logging to: logs/codegen/train_codegen_20251129_215852.log
+2025-11-29 21:58:52 - train_codegen - INFO - Monitor progress: tail -f logs/codegen/train_codegen_20251129_215852.log
+2025-11-29 21:58:52 - train_codegen - INFO - ============================================================
+2025-11-29 21:58:52 - train_codegen - INFO - CodeGen Training
+2025-11-29 21:58:52 - train_codegen - INFO - ============================================================
+2025-11-29 21:58:52 - train_codegen - INFO - Using CUDA device: 0
+2025-11-29 21:58:52 - train_codegen - INFO - GPU: NVIDIA GeForce RTX 5090
+2025-11-29 21:58:52 - train_codegen - INFO - Configuration:
+2025-11-29 21:58:52 - train_codegen - INFO - model: Salesforce/codegen-350M-mono
+2025-11-29 21:58:52 - train_codegen - INFO - data: datasets/python
+2025-11-29 21:58:52 - train_codegen - INFO - output: model/checkpoints/run1-python-codegen
+2025-11-29 21:58:52 - train_codegen - INFO - batch_size: 10
+2025-11-29 21:58:52 - train_codegen - INFO - gradient_accumulation_steps: 4
+2025-11-29 21:58:52 - train_codegen - INFO - effective_batch_size: 40
+2025-11-29 21:58:52 - train_codegen - INFO - learning_rate: 5e-05
+2025-11-29 21:58:52 - train_codegen - INFO - epochs: 5
+2025-11-29 21:58:52 - train_codegen - INFO - max_length: 1024
+2025-11-29 21:58:52 - train_codegen - INFO - max_steps: -1
+2025-11-29 21:58:52 - train_codegen - INFO - fp16: True
+2025-11-29 21:58:52 - train_codegen - INFO - gradient_checkpointing: True
+2025-11-29 21:58:52 - train_codegen - INFO - seed: 42
+2025-11-29 21:58:52 - train_codegen - INFO - Loading tokenizer and model: Salesforce/codegen-350M-mono
+2025-11-29 21:59:04 - train_codegen - INFO - Loading model with gradient checkpointing enabled
+2025-11-29 21:59:04 - train_codegen - INFO - Loading dataset...
+2025-11-29 21:59:04 - train_codegen - INFO - Loading dataset from datasets/python
+2025-11-29 21:59:05 - train_codegen - INFO - Train samples: 155411
+2025-11-29 21:59:05 - train_codegen - INFO - Validation samples: 19426
+2025-11-29 21:59:05 - train_codegen - INFO - ============================================================
+2025-11-29 21:59:05 - train_codegen - INFO - Dataset Preprocessing
+2025-11-29 21:59:05 - train_codegen - INFO - ============================================================
+2025-11-29 21:59:05 - train_codegen - INFO - Preprocessing 155411 samples (optimized eager loading)...
+2025-11-29 21:59:09 - train_codegen - INFO - Preprocessed 10000/155411 samples
+2025-11-29 21:59:14 - train_codegen - INFO - Preprocessed 20000/155411 samples
+2025-11-29 21:59:19 - train_codegen - INFO - Preprocessed 30000/155411 samples
+2025-11-29 21:59:24 - train_codegen - INFO - Preprocessed 40000/155411 samples
+2025-11-29 21:59:29 - train_codegen - INFO - Preprocessed 50000/155411 samples
+2025-11-29 21:59:33 - train_codegen - INFO - Preprocessed 60000/155411 samples
+2025-11-29 21:59:39 - train_codegen - INFO - Preprocessed 70000/155411 samples
+2025-11-29 21:59:43 - train_codegen - INFO - Preprocessed 80000/155411 samples
+2025-11-29 21:59:48 - train_codegen - INFO - Preprocessed 90000/155411 samples
+2025-11-29 21:59:53 - train_codegen - INFO - Preprocessed 100000/155411 samples
+2025-11-29 21:59:57 - train_codegen - INFO - Preprocessed 110000/155411 samples
+2025-11-29 22:00:02 - train_codegen - INFO - Preprocessed 120000/155411 samples
+2025-11-29 22:00:06 - train_codegen - INFO - Preprocessed 130000/155411 samples
+2025-11-29 22:00:12 - train_codegen - INFO - Preprocessed 140000/155411 samples
+2025-11-29 22:00:16 - train_codegen - INFO - Preprocessed 150000/155411 samples
+2025-11-29 22:00:19 - train_codegen - INFO - Preprocessed 155411/155411 samples
+2025-11-29 22:00:19 - train_codegen - INFO - Preprocessing complete: 155411 samples ready
+2025-11-29 22:00:19 - train_codegen - INFO - Preprocessing 19426 samples (optimized eager loading)...
+2025-11-29 22:00:23 - train_codegen - INFO - Preprocessed 10000/19426 samples
+2025-11-29 22:00:28 - train_codegen - INFO - Preprocessed 19426/19426 samples
+2025-11-29 22:00:28 - train_codegen - INFO - Preprocessing complete: 19426 samples ready
+2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+2025-11-29 22:00:28 - train_codegen - INFO - Training Arguments
+2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+2025-11-29 22:00:28 - train_codegen - INFO - Training log will be saved to: model/checkpoints/run1-python-codegen/training_log.csv
+2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+2025-11-29 22:00:28 - train_codegen - INFO - Training Strategy
+2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+2025-11-29 22:00:28 - train_codegen - INFO - Evaluation every 1000 steps (optimized for speed)
+2025-11-29 22:00:28 - train_codegen - INFO - Eval batch size: 20 (2x train batch)
+2025-11-29 22:00:28 - train_codegen - INFO - Eval accumulation steps: 4
+2025-11-29 22:00:28 - train_codegen - INFO - Save checkpoint every 2000 steps
+2025-11-29 22:00:28 - train_codegen - INFO - Gradient checkpointing: ENABLED (saves VRAM, slower training)
+2025-11-29 22:00:28 - train_codegen - INFO - FP16 mixed precision enabled
+2025-11-29 22:00:28 - train_codegen - INFO - Dynamic padding per batch (10-20x faster than max_length padding)
+2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+2025-11-29 22:00:28 - train_codegen - INFO - Starting Training
+2025-11-29 22:00:28 - train_codegen - INFO - ============================================================
+2025-11-29 22:00:28 - train_codegen - INFO - Total training samples: 155411
+2025-11-29 22:00:28 - train_codegen - INFO - Total validation samples: 19426
+2025-11-29 22:00:28 - train_codegen - INFO - Starting training from scratch
+2025-11-30 15:11:50 - train_codegen - INFO - Training completed successfully
+2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+2025-11-30 15:11:50 - train_codegen - INFO - Saving Final Model
+2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+2025-11-30 15:11:50 - train_codegen - INFO - Model and tokenizer saved to model/checkpoints/run1-python-codegen
+2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+2025-11-30 15:11:50 - train_codegen - INFO - Training Summary
+2025-11-30 15:11:50 - train_codegen - INFO - ============================================================
+2025-11-30 15:11:50 - train_codegen - INFO - Total steps: 19425
+2025-11-30 15:11:50 - train_codegen - INFO - Best model checkpoint: model/checkpoints/run1-python-codegen/checkpoint-10000
+2025-11-30 15:11:50 - train_codegen - INFO - Best eval loss: 0.7813047170639038
+2025-11-30 15:11:50 - train_codegen - INFO - Done.
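The logged configuration (155411 training samples, batch_size 10, gradient_accumulation_steps 4, epochs 5) can be cross-checked against the reported "Total steps: 19425". The sketch below is only an illustration: it assumes the training script counts optimizer steps by rounding the per-epoch batch count up (keeping the last partial batch) and then flooring by the accumulation factor, which the log does not confirm. The helper name `optimizer_steps` is hypothetical. It also derives the wall-clock duration from the "Starting training from scratch" and "Training completed successfully" timestamps.

```python
import math
from datetime import datetime


def optimizer_steps(num_samples: int, batch_size: int,
                    grad_accum: int, epochs: int) -> int:
    """Hypothetical helper: estimate optimizer steps for a whole run.

    Assumption (not stated in the log): the last partial batch of each
    epoch is kept, and batches per epoch are floor-divided by the
    gradient accumulation factor.
    """
    batches_per_epoch = math.ceil(num_samples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


# Values copied from the "Configuration" and dataset lines of the log.
total_steps = optimizer_steps(num_samples=155411, batch_size=10,
                              grad_accum=4, epochs=5)

# Timestamps copied from the first and last training lines of the log.
fmt = "%Y-%m-%d %H:%M:%S"
started = datetime.strptime("2025-11-29 22:00:28", fmt)
finished = datetime.strptime("2025-11-30 15:11:50", fmt)
elapsed = finished - started

# Average wall-clock time per optimizer step; this figure also absorbs
# the periodic evaluation passes and checkpoint saves.
seconds_per_step = elapsed.total_seconds() / total_steps

print(f"estimated optimizer steps: {total_steps}")
print(f"elapsed: {elapsed} ({elapsed.total_seconds():.0f} s)")
print(f"average seconds per step: {seconds_per_step:.2f}")
```

Under these assumptions the estimate agrees with the 19425 steps in the training summary, and the run spans 17 h 11 min 22 s, or roughly 3.2 seconds per optimizer step including evaluation and checkpointing overhead.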