Training Log for Trelis/whisper-tiny-llm-lingo
==================================================
Base Model: openai/whisper-tiny
Train Dataset: Trelis/llm-lingo
==================================================
[16:12:47] Starting pipeline...
[16:12:47] [Modal] Submitting training job to Modal (H100 GPU)...
[16:12:47] [Modal] Base model: openai/whisper-tiny
[16:12:47] [Modal] Train dataset: Trelis/llm-lingo
[16:13:08] [Modal] Function call started (ID: fc-01KEYMFXG...)
[16:13:08] [Modal] Starting training pipeline on H100 GPU
[16:13:08] [Modal] CUDA available: True
[16:13:08] [Modal] GPU: NVIDIA H100 80GB HBM3
[16:13:08] [Modal] GPU memory: 79.2GB total
[16:13:08] [Modal] ============================================================
[16:13:08] [Modal] PHASE 1: Baseline Evaluation
[16:13:08] [Modal] ============================================================
[16:13:08] [Modal] Loading validation dataset: Trelis/llm-lingo (split=validation)
[16:13:09] [Modal] Loaded 6 validation samples
[16:13:16] [Modal] [Baseline] Loading model: openai/whisper-tiny
[16:13:16] [Modal] [Baseline] Evaluating on 6 samples
[16:13:16] [Modal] [Baseline] WER: 107.85% (6 samples)
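Note on the baseline figure: WER above 100% is expected behavior, since insertions count as errors and the hypothesis can be longer than the reference. The pipeline presumably uses a standard library such as jiwer; a minimal self-contained sketch of the metric itself:

```python
# Word error rate (WER) sketch: word-level Levenshtein distance divided by
# reference length. Insertions can push WER above 100%, which is how an
# untuned model can report 107.85% as in the baseline above.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("cat", "the cat sat"))  # two insertions against one word -> 2.0 (200%)
```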
[16:13:16] [Modal] ============================================================
[16:13:16] [Modal] PHASE 2: Training with Unsloth
[16:13:16] [Modal] ============================================================
[16:13:16] [Modal] Loading model: openai/whisper-tiny
[16:13:24] [Modal] Applying LoRA (rank=32, alpha=16)
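The logged hyperparameters (rank=32, alpha=16) describe a low-rank update to the frozen weights, which is also what the "Merging LoRA weights" step later in the log folds back in. A numeric sketch of that math, with illustrative dimensions (whisper-tiny's hidden size is 384; the exact Unsloth/PEFT wiring is an assumption):

```python
import numpy as np

# LoRA sketch: the frozen weight W receives a low-rank update (alpha/r) * B @ A.
# rank=32 and alpha=16 match the values logged above; dims are illustrative.
d, r, alpha = 384, 32, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))         # frozen base weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized
W_merged = W + (alpha / r) * (B @ A)    # the "merge LoRA weights" step
# Zero-initializing B means the merged weight equals W before any training.
```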
[16:13:26] [Modal] GPU memory after model+LoRA: 0.1GB / 0.3GB peak
[16:13:26] [Modal] Loading dataset: Trelis/llm-lingo
[16:13:28] [Modal] Using 16 CPU workers for preprocessing...
[16:13:32] [Modal] Preprocessing complete in 3.1s
[16:13:32] [Modal] Dataset: 7 train, 6 validation
[16:13:32] [Modal] Total steps: 7, warmup: 1, eval every 1
[16:13:35] [Modal] WandB logging enabled: https://wandb.ai/trelis/trelis-whisper/runs/7r39d3u3
[16:13:35] [Modal] Using bf16=True, fp16=False
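The run settings logged above (1 epoch over 7 steps, 1 warmup step, eval every step, lr reaching 5.00e-05, bf16 on) correspond roughly to a Transformers training config like the following; the exact Unsloth/Trainer wiring and the output path are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the logged run configuration; output_dir is a placeholder.
# Older transformers versions spell eval_strategy as evaluation_strategy.
training_args = Seq2SeqTrainingArguments(
    output_dir="/tmp/whisper-tiny-llm-lingo",
    num_train_epochs=1,
    learning_rate=5e-5,
    warmup_steps=1,
    eval_strategy="steps",
    eval_steps=1,
    logging_steps=1,
    bf16=True,
    fp16=False,
    report_to="wandb",
)
```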
[16:13:35] [Modal] Starting training...
[16:13:37] [Modal] Starting: 7 steps, 1 epoch(s)
[16:13:48] [Modal] Step 1/7 (14%) | 0.2GB / 0.3GB peak | Elapsed: 0.2m | Remaining (est.): 1.1m
[16:13:48] [Modal] β†’ loss: 2.5725 | grad_norm: 50.6474 | lr: 0.00e+00
[16:13:49] [Modal] β†’ eval WER: 50.23%
[16:13:49] [Modal] β†’ eval loss: 2.4652
[16:13:50] [Modal] Step 2/7 (29%) | 0.2GB / 0.4GB peak | Elapsed: 0.2m | Remaining (est.): 0.5m
[16:13:50] [Modal] β†’ loss: 4.4833 | grad_norm: 32.5273 | lr: 5.00e-05
[16:13:50] [Modal] β†’ eval WER: 47.03%
[16:13:50] [Modal] β†’ eval loss: 2.3221
[16:13:50] [Modal] Step 3/7 (43%) | 0.2GB / 0.4GB peak | Elapsed: 0.2m | Remaining (est.): 0.3m
[16:13:51] [Modal] β†’ loss: 3.1520 | grad_norm: 61.8564 | lr: 5.00e-05
[16:13:51] [Modal] β†’ eval WER: 46.58%
[16:13:51] [Modal] β†’ eval loss: 2.1380
[16:13:51] [Modal] Step 4/7 (57%) | 0.2GB / 0.4GB peak | Elapsed: 0.2m | Remaining (est.): 0.2m
[16:13:51] [Modal] β†’ loss: 1.9816 | grad_norm: 24.9086 | lr: 5.00e-05
[16:13:52] [Modal] β†’ eval WER: 44.29%
[16:13:52] [Modal] β†’ eval loss: 2.0296
[16:13:52] [Modal] Step 5/7 (71%) | 0.2GB / 0.4GB peak | Elapsed: 0.3m | Remaining (est.): 0.1m
[16:13:52] [Modal] β†’ loss: 2.0131 | grad_norm: 36.7509 | lr: 5.00e-05
[16:13:53] [Modal] β†’ eval WER: 44.29%
[16:13:53] [Modal] β†’ eval loss: 1.9403
[16:13:53] [Modal] Step 6/7 (86%) | 0.2GB / 0.4GB peak | Elapsed: 0.3m | Remaining (est.): 0.0m
[16:13:53] [Modal] β†’ loss: 1.9703 | grad_norm: 12.7650 | lr: 5.00e-05
[16:13:54] [Modal] β†’ eval WER: 44.75%
[16:13:54] [Modal] β†’ eval loss: 1.8283
[16:13:54] [Modal] Step 7/7 (100%) | 0.2GB / 0.4GB peak | Elapsed: 0.3m | Remaining (est.): 0.0m
[16:13:54] [Modal] β†’ loss: 1.8809 | grad_norm: 11.2453 | lr: 5.00e-05
[16:13:55] [Modal] β†’ eval WER: 44.75%
[16:13:55] [Modal] β†’ eval loss: 1.7527
[16:13:55] [Modal] Training complete in 0.3 minutes
[16:13:55] [Modal] Training metrics: 0.3 min, loss=2.5791
[16:13:55] [Modal] Peak GPU memory during training: 0.2GB / 0.4GB peak
[16:13:55] [Modal] Merging LoRA weights...
[16:13:57] [Modal] Saved generation config from base model
[16:13:57] [Modal] Saved merged model to /tmp/merged_model
[16:13:57] [Modal] Pushing model to HuggingFace Hub: Trelis/whisper-tiny-llm-lingo...
[16:14:04] [Modal] Model pushed: https://huggingface.co/Trelis/whisper-tiny-llm-lingo
[16:14:04] [Modal] Converting to CTranslate2 format (bfloat16)...
[16:14:05] [Modal] CTranslate2 conversion complete
[16:14:08] [Modal] CTranslate2 pushed: https://huggingface.co/Trelis/whisper-tiny-llm-lingo/tree/ctranslate2
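The CTranslate2 conversion step above is typically driven by the converter CLI that ships with the ctranslate2 package; a sketch matching the logged bfloat16 quantization (the output directory name is a placeholder):

```shell
# Convert the pushed Transformers checkpoint to CTranslate2 format.
ct2-transformers-converter \
  --model Trelis/whisper-tiny-llm-lingo \
  --output_dir whisper-tiny-llm-lingo-ct2 \
  --quantization bfloat16
```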
[16:14:11] [Modal] ============================================================
[16:14:11] [Modal] PHASE 3: Final Evaluation
[16:14:11] [Modal] ============================================================
[16:14:17] [Modal] [Final] Loading model: /tmp/merged_model
[16:14:17] [Modal] [Final] Evaluating on 6 samples
[16:14:17] [Modal] [Final] WER: 34.30% (6 samples)
[16:14:17] [Modal] ============================================================
[16:14:17] [Modal] TRAINING COMPLETE - SUMMARY
[16:14:17] [Modal] ============================================================
[16:14:17] [Modal] Baseline WER: 107.85%
[16:14:17] [Modal] Final WER: 34.30%
[16:14:17] [Modal] Improvement: 73.55 percentage points
[16:14:17] [Modal] ============================================================
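The improvement in the summary is an absolute difference in percentage points; a quick check with the logged values, including the relative WER reduction for context:

```python
# Values from the summary above.
baseline_wer = 107.85
final_wer = 34.30
improvement_pp = baseline_wer - final_wer           # 73.55 percentage points
relative_reduction = improvement_pp / baseline_wer  # ~68% relative reduction
```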
[16:14:18] [Modal] Training complete: loss=2.5791, runtime=0.3min
[16:14:18] [Modal] Baseline WER: 107.85%
[16:14:18] [Modal] Final WER: 34.30%
[16:14:18] [Upload] Model pushed by Modal, uploading additional files...
[16:14:19] [Upload] Uploading model card...
[16:14:20] [Upload] Uploading training logs...