| Training Log for Trelis/whisper-tiny-llm-lingo |
| ================================================== |
| Base Model: openai/whisper-tiny |
| Train Dataset: Trelis/llm-lingo |
| ================================================== |
|
|
| [16:12:47] Starting pipeline... |
| [16:12:47] [Modal] Submitting training job to Modal (H100 GPU)... |
| [16:12:47] [Modal] Base model: openai/whisper-tiny |
| [16:12:47] [Modal] Train dataset: Trelis/llm-lingo |
| [16:13:08] [Modal] Function call started (ID: fc-01KEYMFXG...) |
| [16:13:08] [Modal] Starting training pipeline on H100 GPU |
| [16:13:08] [Modal] CUDA available: True |
| [16:13:08] [Modal] GPU: NVIDIA H100 80GB HBM3 |
| [16:13:08] [Modal] GPU memory: 79.2GB total |
| [16:13:08] [Modal] ============================================================ |
| [16:13:08] [Modal] PHASE 1: Baseline Evaluation |
| [16:13:08] [Modal] ============================================================ |
| [16:13:08] [Modal] Loading validation dataset: Trelis/llm-lingo (split=validation) |
| [16:13:09] [Modal] Loaded 6 validation samples |
| [16:13:16] [Modal] [Baseline] Loading model: openai/whisper-tiny |
| [16:13:16] [Modal] [Baseline] Evaluating on 6 samples |
| [16:13:16] [Modal] [Baseline] WER: 107.85% (6 samples) |
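A baseline WER above 100% (as logged here) is possible because WER is the word-level edit distance divided by the reference length, so long hallucinated transcripts can push it past 1. A minimal sketch of the metric in pure Python (the pipeline itself likely uses a library such as `jiwer` or `evaluate`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("cat", "the big cat"))          # 2.0 -- insertions alone can exceed 100%
```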
| [16:13:16] [Modal] ============================================================ |
| [16:13:16] [Modal] PHASE 2: Training with Unsloth |
| [16:13:16] [Modal] ============================================================ |
| [16:13:16] [Modal] Loading model: openai/whisper-tiny |
| [16:13:24] [Modal] Applying LoRA (rank=32, alpha=16) |
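The adapter settings logged here (rank=32, alpha=16) correspond to a PEFT-style LoRA config. A minimal sketch using the `peft` library directly; the `target_modules` list and dropout are assumptions, and the actual Unsloth call may differ:

```python
from peft import LoraConfig

# r and lora_alpha match the values in the log; everything else is assumed
lora_config = LoraConfig(
    r=32,           # LoRA rank
    lora_alpha=16,  # scaling factor (effective scale = alpha / r = 0.5)
    lora_dropout=0.0,
    # assumed: attention projection layers of the Whisper encoder/decoder
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
```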
| [16:13:26] [Modal] GPU memory after model+LoRA: 0.1GB / 0.3GB peak |
| [16:13:26] [Modal] Loading dataset: Trelis/llm-lingo |
| [16:13:28] [Modal] Using 16 CPU workers for preprocessing... |
| [16:13:32] [Modal] Preprocessing complete in 3.1s |
| [16:13:32] [Modal] Dataset: 7 train, 6 validation |
| [16:13:32] [Modal] Total steps: 7, warmup: 1, eval every 1 |
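The step count follows directly from the dataset size: 7 training samples over 1 epoch with an effective batch size of 1 (an assumption) gives 7 optimizer steps, and warmup here is roughly 10% of that, rounded up to at least 1. A quick check of the arithmetic:

```python
import math

train_samples = 7          # from the log
effective_batch_size = 1   # assumption: per-device batch x grad accumulation
epochs = 1

steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = max(1, round(0.1 * total_steps))  # assumed 10% warmup ratio

print(total_steps, warmup_steps)  # 7 1
```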
| [16:13:35] [Modal] WandB logging enabled: https://wandb.ai/trelis/trelis-whisper/runs/7r39d3u3 |
| [16:13:35] [Modal] Using bf16=True, fp16=False |
| [16:13:35] [Modal] Starting training... |
| [16:13:37] [Modal] Starting: 7 steps, 1 epoch(s) |
| [16:13:48] [Modal] Step 1/7 (14%) | 0.2GB / 0.3GB peak | Elapsed: 0.2m | Remaining (est.): 1.1m |
| [16:13:48] [Modal] └ loss: 2.5725 | grad_norm: 50.6474 | lr: 0.00e+00 |
| [16:13:49] [Modal] └ eval WER: 50.23% |
| [16:13:49] [Modal] └ eval loss: 2.4652 |
| [16:13:50] [Modal] Step 2/7 (29%) | 0.2GB / 0.4GB peak | Elapsed: 0.2m | Remaining (est.): 0.5m |
| [16:13:50] [Modal] └ loss: 4.4833 | grad_norm: 32.5273 | lr: 5.00e-05 |
| [16:13:50] [Modal] └ eval WER: 47.03% |
| [16:13:50] [Modal] └ eval loss: 2.3221 |
| [16:13:50] [Modal] Step 3/7 (43%) | 0.2GB / 0.4GB peak | Elapsed: 0.2m | Remaining (est.): 0.3m |
| [16:13:51] [Modal] └ loss: 3.1520 | grad_norm: 61.8564 | lr: 5.00e-05 |
| [16:13:51] [Modal] └ eval WER: 46.58% |
| [16:13:51] [Modal] └ eval loss: 2.1380 |
| [16:13:51] [Modal] Step 4/7 (57%) | 0.2GB / 0.4GB peak | Elapsed: 0.2m | Remaining (est.): 0.2m |
| [16:13:51] [Modal] └ loss: 1.9816 | grad_norm: 24.9086 | lr: 5.00e-05 |
| [16:13:52] [Modal] └ eval WER: 44.29% |
| [16:13:52] [Modal] └ eval loss: 2.0296 |
| [16:13:52] [Modal] Step 5/7 (71%) | 0.2GB / 0.4GB peak | Elapsed: 0.3m | Remaining (est.): 0.1m |
| [16:13:52] [Modal] └ loss: 2.0131 | grad_norm: 36.7509 | lr: 5.00e-05 |
| [16:13:53] [Modal] └ eval WER: 44.29% |
| [16:13:53] [Modal] └ eval loss: 1.9403 |
| [16:13:53] [Modal] Step 6/7 (86%) | 0.2GB / 0.4GB peak | Elapsed: 0.3m | Remaining (est.): 0.0m |
| [16:13:53] [Modal] └ loss: 1.9703 | grad_norm: 12.7650 | lr: 5.00e-05 |
| [16:13:54] [Modal] └ eval WER: 44.75% |
| [16:13:54] [Modal] └ eval loss: 1.8283 |
| [16:13:54] [Modal] Step 7/7 (100%) | 0.2GB / 0.4GB peak | Elapsed: 0.3m | Remaining (est.): 0.0m |
| [16:13:54] [Modal] └ loss: 1.8809 | grad_norm: 11.2453 | lr: 5.00e-05 |
| [16:13:55] [Modal] └ eval WER: 44.75% |
| [16:13:55] [Modal] └ eval loss: 1.7527 |
| [16:13:55] [Modal] Training complete in 0.3 minutes |
| [16:13:55] [Modal] Training metrics: 0.3 min, loss=2.5791 |
| [16:13:55] [Modal] Peak GPU memory during training: 0.2GB / 0.4GB peak |
| [16:13:55] [Modal] Merging LoRA weights... |
| [16:13:57] [Modal] Saved generation config from base model |
| [16:13:57] [Modal] Saved merged model to /tmp/merged_model |
| [16:13:57] [Modal] Pushing model to HuggingFace Hub: Trelis/whisper-tiny-llm-lingo... |
| [16:14:04] [Modal] Model pushed: https://huggingface.co/Trelis/whisper-tiny-llm-lingo |
| [16:14:04] [Modal] Converting to CTranslate2 format (bfloat16)... |
| [16:14:05] [Modal] CTranslate2 conversion complete |
| [16:14:08] [Modal] CTranslate2 pushed: https://huggingface.co/Trelis/whisper-tiny-llm-lingo/tree/ctranslate2 |
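The conversion step above can be reproduced with CTranslate2's Transformers converter CLI. A sketch under assumptions: the output directory name is illustrative, and the pipeline may invoke the converter programmatically instead:

```shell
# Convert the merged Hub model to CTranslate2 format with bfloat16 weights
ct2-transformers-converter \
  --model Trelis/whisper-tiny-llm-lingo \
  --output_dir whisper-tiny-llm-lingo-ct2 \
  --quantization bfloat16
```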
| [16:14:11] [Modal] ============================================================ |
| [16:14:11] [Modal] PHASE 3: Final Evaluation |
| [16:14:11] [Modal] ============================================================ |
| [16:14:17] [Modal] [Final] Loading model: /tmp/merged_model |
| [16:14:17] [Modal] [Final] Evaluating on 6 samples |
| [16:14:17] [Modal] [Final] WER: 34.30% (6 samples) |
| [16:14:17] [Modal] ============================================================ |
| [16:14:17] [Modal] TRAINING COMPLETE - SUMMARY |
| [16:14:17] [Modal] ============================================================ |
| [16:14:17] [Modal] Baseline WER: 107.85% |
| [16:14:17] [Modal] Final WER: 34.30% |
| [16:14:17] [Modal] Improvement: 73.55 percentage points |
| [16:14:17] [Modal] ============================================================ |
| [16:14:18] [Modal] Training complete: loss=2.5791, runtime=0.3min |
| [16:14:18] [Modal] Baseline WER: 107.85% |
| [16:14:18] [Modal] Final WER: 34.30% |
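The improvement figure is the plain difference of the two WERs, reported in percentage points rather than as a relative reduction (relatively, the error drops by about 68%):

```python
baseline_wer = 107.85  # from the log
final_wer = 34.30      # from the log

improvement_pp = baseline_wer - final_wer            # percentage points
relative_drop = improvement_pp / baseline_wer * 100  # relative improvement

print(f"{improvement_pp:.2f} pp, {relative_drop:.1f}% relative")  # 73.55 pp, 68.2% relative
```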
| [16:14:18] [Upload] Model pushed by Modal, uploading additional files... |
| [16:14:19] [Upload] Uploading model card... |
| [16:14:20] [Upload] Uploading training logs... |