| 2025-11-29 21:59:00 - train_codegen - INFO - Logging to: logs/codegen/train_codegen_20251129_215900.log | |
| 2025-11-29 21:59:00 - train_codegen - INFO - Monitor progress: tail -f logs/codegen/train_codegen_20251129_215900.log | |
| 2025-11-29 21:59:00 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 21:59:00 - train_codegen - INFO - CodeGen Training | |
| 2025-11-29 21:59:00 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 21:59:00 - train_codegen - INFO - Using CUDA device: 0 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - GPU: NVIDIA GeForce RTX 5090 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - Configuration: | |
| 2025-11-29 21:59:00 - train_codegen - INFO - model: Salesforce/codegen-350M-mono | |
| 2025-11-29 21:59:00 - train_codegen - INFO - data: datasets/java | |
| 2025-11-29 21:59:00 - train_codegen - INFO - output: model/checkpoints/run1-java-codegen | |
| 2025-11-29 21:59:00 - train_codegen - INFO - batch_size: 10 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - gradient_accumulation_steps: 4 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - effective_batch_size: 40 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - learning_rate: 5e-05 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - epochs: 5 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - max_length: 1024 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - max_steps: -1 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - fp16: True | |
| 2025-11-29 21:59:00 - train_codegen - INFO - gradient_checkpointing: True | |
| 2025-11-29 21:59:00 - train_codegen - INFO - seed: 42 | |
| 2025-11-29 21:59:00 - train_codegen - INFO - Loading tokenizer and model: Salesforce/codegen-350M-mono | |
| 2025-11-29 21:59:13 - train_codegen - INFO - Loading model with gradient checkpointing enabled | |
| 2025-11-29 21:59:13 - train_codegen - INFO - Loading dataset... | |
| 2025-11-29 21:59:13 - train_codegen - INFO - Loading dataset from datasets/java | |
| 2025-11-29 21:59:16 - train_codegen - INFO - Train samples: 275962 | |
| 2025-11-29 21:59:16 - train_codegen - INFO - Validation samples: 34495 | |
| 2025-11-29 21:59:16 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 21:59:16 - train_codegen - INFO - Dataset Preprocessing | |
| 2025-11-29 21:59:16 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 21:59:16 - train_codegen - INFO - Preprocessing 275962 samples (optimized eager loading)... | |
| 2025-11-29 21:59:22 - train_codegen - INFO - Preprocessed 10000/275962 samples | |
| 2025-11-29 21:59:27 - train_codegen - INFO - Preprocessed 20000/275962 samples | |
| 2025-11-29 21:59:33 - train_codegen - INFO - Preprocessed 30000/275962 samples | |
| 2025-11-29 21:59:38 - train_codegen - INFO - Preprocessed 40000/275962 samples | |
| 2025-11-29 21:59:43 - train_codegen - INFO - Preprocessed 50000/275962 samples | |
| 2025-11-29 21:59:49 - train_codegen - INFO - Preprocessed 60000/275962 samples | |
| 2025-11-29 21:59:54 - train_codegen - INFO - Preprocessed 70000/275962 samples | |
| 2025-11-29 22:00:00 - train_codegen - INFO - Preprocessed 80000/275962 samples | |
| 2025-11-29 22:00:05 - train_codegen - INFO - Preprocessed 90000/275962 samples | |
| 2025-11-29 22:00:10 - train_codegen - INFO - Preprocessed 100000/275962 samples | |
| 2025-11-29 22:00:16 - train_codegen - INFO - Preprocessed 110000/275962 samples | |
| 2025-11-29 22:00:22 - train_codegen - INFO - Preprocessed 120000/275962 samples | |
| 2025-11-29 22:00:27 - train_codegen - INFO - Preprocessed 130000/275962 samples | |
| 2025-11-29 22:00:33 - train_codegen - INFO - Preprocessed 140000/275962 samples | |
| 2025-11-29 22:00:38 - train_codegen - INFO - Preprocessed 150000/275962 samples | |
| 2025-11-29 22:00:45 - train_codegen - INFO - Preprocessed 160000/275962 samples | |
| 2025-11-29 22:00:52 - train_codegen - INFO - Preprocessed 170000/275962 samples | |
| 2025-11-29 22:00:57 - train_codegen - INFO - Preprocessed 180000/275962 samples | |
| 2025-11-29 22:01:03 - train_codegen - INFO - Preprocessed 190000/275962 samples | |
| 2025-11-29 22:01:09 - train_codegen - INFO - Preprocessed 200000/275962 samples | |
| 2025-11-29 22:01:13 - train_codegen - INFO - Preprocessed 210000/275962 samples | |
| 2025-11-29 22:01:20 - train_codegen - INFO - Preprocessed 220000/275962 samples | |
| 2025-11-29 22:01:25 - train_codegen - INFO - Preprocessed 230000/275962 samples | |
| 2025-11-29 22:01:30 - train_codegen - INFO - Preprocessed 240000/275962 samples | |
| 2025-11-29 22:01:36 - train_codegen - INFO - Preprocessed 250000/275962 samples | |
| 2025-11-29 22:01:42 - train_codegen - INFO - Preprocessed 260000/275962 samples | |
| 2025-11-29 22:01:47 - train_codegen - INFO - Preprocessed 270000/275962 samples | |
| 2025-11-29 22:01:51 - train_codegen - INFO - Preprocessed 275962/275962 samples | |
| 2025-11-29 22:01:51 - train_codegen - INFO - Preprocessing complete: 275962 samples ready | |
| 2025-11-29 22:01:52 - train_codegen - INFO - Preprocessing 34495 samples (optimized eager loading)... | |
| 2025-11-29 22:01:56 - train_codegen - INFO - Preprocessed 10000/34495 samples | |
| 2025-11-29 22:02:01 - train_codegen - INFO - Preprocessed 20000/34495 samples | |
| 2025-11-29 22:02:08 - train_codegen - INFO - Preprocessed 30000/34495 samples | |
| 2025-11-29 22:02:10 - train_codegen - INFO - Preprocessed 34495/34495 samples | |
| 2025-11-29 22:02:10 - train_codegen - INFO - Preprocessing complete: 34495 samples ready | |
| 2025-11-29 22:02:11 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Training Arguments | |
| 2025-11-29 22:02:11 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Training log will be saved to: model/checkpoints/run1-java-codegen/training_log.csv | |
| 2025-11-29 22:02:11 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Training Strategy | |
| 2025-11-29 22:02:11 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Evaluation every 1000 steps (optimized for speed) | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Eval batch size: 20 (2x train batch) | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Eval accumulation steps: 4 | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Save checkpoint every 2000 steps | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Gradient checkpointing: ENABLED (saves VRAM, slower training) | |
| 2025-11-29 22:02:11 - train_codegen - INFO - FP16 mixed precision enabled | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Dynamic padding per batch (10-20x faster than max_length padding) | |
| 2025-11-29 22:02:11 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Starting Training | |
| 2025-11-29 22:02:11 - train_codegen - INFO - ============================================================ | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Total training samples: 275962 | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Total validation samples: 34495 | |
| 2025-11-29 22:02:11 - train_codegen - INFO - Starting training from scratch | |
| 2025-12-01 02:38:40 - train_codegen - INFO - Training completed successfully | |
| 2025-12-01 02:38:40 - train_codegen - INFO - ============================================================ | |
| 2025-12-01 02:38:40 - train_codegen - INFO - Saving Final Model | |
| 2025-12-01 02:38:40 - train_codegen - INFO - ============================================================ | |
| 2025-12-01 02:38:42 - train_codegen - INFO - Model and tokenizer saved to model/checkpoints/run1-java-codegen | |
| 2025-12-01 02:38:42 - train_codegen - INFO - ============================================================ | |
| 2025-12-01 02:38:42 - train_codegen - INFO - Training Summary | |
| 2025-12-01 02:38:42 - train_codegen - INFO - ============================================================ | |
| 2025-12-01 02:38:42 - train_codegen - INFO - Total steps: 34495 | |
| 2025-12-01 02:38:42 - train_codegen - INFO - Best model checkpoint: model/checkpoints/run1-java-codegen/checkpoint-20000 | |
| 2025-12-01 02:38:42 - train_codegen - INFO - Best eval loss: 0.7098406553268433 | |
| 2025-12-01 02:38:42 - train_codegen - INFO - Done. | |
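
The logged total of 34495 steps can be cross-checked against the logged configuration (275962 training samples, batch_size 10, gradient_accumulation_steps 4, epochs 5). The sketch below is a minimal check, not the training script's own code. It assumes the final partial micro-batch of each epoch is kept and that micro-batches which do not fill a complete accumulation window are not counted as an optimizer step; the log does not confirm this rounding rule. The helper name `optimizer_steps` is hypothetical. The wall-clock duration is taken from the "Starting training from scratch" and "Training completed successfully" timestamps.

```python
import math
from datetime import datetime


def optimizer_steps(train_samples, batch_size, grad_accum, epochs):
    """Estimate optimizer steps for a run (hypothetical helper).

    Assumption: the last partial micro-batch of an epoch is kept (ceil),
    and leftover micro-batches that do not fill an accumulation window
    are not counted as a step (floor division).
    """
    micro_batches_per_epoch = math.ceil(train_samples / batch_size)
    updates_per_epoch = max(micro_batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


# Values copied from the log above.
steps = optimizer_steps(train_samples=275962, batch_size=10, grad_accum=4, epochs=5)

# Timestamps of the training start and completion log lines.
fmt = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2025-11-29 22:02:11", fmt)
end = datetime.strptime("2025-12-01 02:38:40", fmt)
elapsed_seconds = (end - start).total_seconds()

print("estimated optimizer steps:", steps)
print("elapsed hours:", round(elapsed_seconds / 3600, 2))
print("seconds per step:", round(elapsed_seconds / steps, 2))
```

Under these assumptions the estimate equals the logged 34495 steps, and the run took 102989 seconds (about 28.6 hours), or roughly 3 seconds per optimizer step. That the step count equals the validation sample count (34495) appears to be a coincidence of the dataset sizes, not a logging error.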