GPUburnout-3B-75K-Chat-step3000
A 3.12 billion parameter chat model produced by supervised fine-tuning GPUburnout/GPUburnout-3B-75K on SlimOrca-50K. This is the step-3000 checkpoint, which achieved the lowest validation loss during the SFT run.
This is NOT the shipped chat model. The shipped version is GPUburnout/GPUburnout-3B-75K-Chat (step-1500), which was chosen via manual review of generated outputs despite having higher val loss than this checkpoint. This repo exists so the val-loss-vs-output-quality comparison can be reproduced.
SFT Configuration
- Base model: GPUburnout-3B-75K
- Data: SlimOrca 50K (cleaned subset)
- Method: LoRA, then merged
- LoRA rank: 16
- Learning rate: 5e-5
- Epochs: 1
- Step: 3000 (final step of epoch)
- Precision: float16
- Hardware: A100 SXM 80GB on RunPod
Benchmarks (0-shot, float16)
| Benchmark | Step 3000 (this) | Step 1500 (shipped) |
|---|---|---|
| TruthfulQA MC2 | 42.10% | 42.43% |
| HellaSwag (acc_norm) | 46.66% | 46.60% |
| ARC-Easy (acc_norm) | 50.63% | 49.83% |
| ARC-Challenge (acc_norm) | 33.11% | 32.34% |
| MMLU (5-shot acc) | 25.05% | 24.93% |
Step 3000 wins on 4 of 5 academic benchmarks. Despite this, step 1500 was selected as the production model because side-by-side output review showed step 1500's responses were noticeably stronger on open-ended generation tasks (the "use the knowledge" axis, which standard benchmarks underweight).
This is a concrete demonstration of the "val loss is roughly 75% reliable as a model-quality proxy" lesson from the GPUburnout SFT pipeline.
Intended Use
Research and reproduction of the val-loss vs output-quality comparison. For practical chat use, prefer GPUburnout-3B-75K-Chat (step 1500).
Blog
Discussed in Season 5 of the GPUburnout blog series.
License
Apache 2.0.
- Downloads last month
- 19
Model tree for GPUburnout/GPUburnout-3B-75K-Chat-step3000
Base model
GPUburnout/GPUburnout-3B-75K