GPUburnout-3B-75K-Chat-step3000

A 3.12-billion-parameter chat model produced by supervised fine-tuning (SFT) of GPUburnout/GPUburnout-3B-75K on SlimOrca-50K. This is the step-3000 checkpoint, which achieved the lowest validation loss during the SFT run.

This is NOT the shipped chat model. The shipped version is GPUburnout/GPUburnout-3B-75K-Chat (step-1500), which was chosen via manual review of generated outputs despite having higher val loss than this checkpoint. This repo exists so the val-loss-vs-output-quality comparison can be reproduced.

SFT Configuration

  • Base model: GPUburnout-3B-75K
  • Data: SlimOrca 50K (cleaned subset)
  • Method: LoRA, then merged
  • LoRA rank: 16
  • Learning rate: 5e-5
  • Epochs: 1
  • Step: 3000 (final step of epoch)
  • Precision: float16
  • Hardware: A100 SXM 80GB on RunPod
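"LoRA, then merged" means the trained low-rank update was folded back into the base weights, so the released checkpoint is a plain dense model with no adapter to load and no extra inference cost. The sketch below shows that merge under the standard LoRA formulation, W_merged = W + (alpha / r) * (B @ A). Only the rank (16) comes from this card; lora_alpha = 32 and the toy 2x2 shapes are assumptions for illustration, and this is not the GPUburnout pipeline's actual code.

```python
# Hypothetical sketch of a LoRA merge: W_merged = W + (alpha / r) * (B @ A).
# Assumptions not stated in this card: lora_alpha = 32 and the toy 2x2 shapes below.
# Only the rank (16) comes from the SFT configuration.
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Naive product of an (n x k) matrix and a (k x m) matrix."""
    k, m = len(y), len(y[0])
    return [[sum(row[i] * y[i][j] for i in range(k)) for j in range(m)] for row in x]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, rank: int, alpha: float) -> Matrix:
    """Fold a LoRA update into the frozen base weight.

    w: d_out x d_in base weight
    a: rank x d_in down-projection (LoRA "A")
    b: d_out x rank up-projection (LoRA "B")
    """
    scale = alpha / rank
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    RANK = 16     # from the card
    ALPHA = 32.0  # assumed; the card does not report lora_alpha
    w = [[1.0, 0.0], [0.0, 1.0]]           # toy 2x2 base weight
    a = [[1.0, 0.0] for _ in range(RANK)]  # 16 x 2
    b = [[0.01] * RANK, [0.0] * RANK]      # 2 x 16
    print(merge_lora(w, a, b, RANK, ALPHA))  # approximately [[1.32, 0.0], [0.0, 1.0]]
```

After merging, A and B can be discarded; the update lives entirely inside W.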

Benchmarks (float16; 0-shot unless noted)

Benchmark                   Step 3000 (this)   Step 1500 (shipped)
TruthfulQA MC2              42.10%             42.43%
HellaSwag (acc_norm)        46.66%             46.60%
ARC-Easy (acc_norm)         50.63%             49.83%
ARC-Challenge (acc_norm)    33.11%             32.34%
MMLU (5-shot acc)           25.05%             24.93%

Step 3000 wins on 4 of 5 academic benchmarks. Despite this, step 1500 was selected as the production model because side-by-side output review showed step 1500's responses were noticeably stronger on open-ended generation tasks (the "use the knowledge" axis, which standard benchmarks underweight).
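The 4-of-5 tally can be checked directly from the benchmark table. The short Python sketch below re-enters the table's numbers and counts wins per checkpoint, treating higher as better on every metric; the dictionary and function names are illustrative only.

```python
from typing import Dict, Tuple

# Scores (%) copied from the benchmark table; each tuple is (step 3000, step 1500).
SCORES: Dict[str, Tuple[float, float]] = {
    "TruthfulQA MC2": (42.10, 42.43),
    "HellaSwag (acc_norm)": (46.66, 46.60),
    "ARC-Easy (acc_norm)": (50.63, 49.83),
    "ARC-Challenge (acc_norm)": (33.11, 32.34),
    "MMLU (5-shot acc)": (25.05, 24.93),
}


def tally(scores: Dict[str, Tuple[float, float]]) -> Tuple[int, int]:
    """Count benchmarks won by (step 3000, step 1500); higher is better on all of them."""
    wins_3000 = sum(1 for s3000, s1500 in scores.values() if s3000 > s1500)
    wins_1500 = sum(1 for s3000, s1500 in scores.values() if s1500 > s3000)
    return wins_3000, wins_1500


def deltas(scores: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Per-benchmark gap in percentage points (step 3000 minus step 1500)."""
    return {name: round(s3000 - s1500, 2) for name, (s3000, s1500) in scores.items()}


if __name__ == "__main__":
    print(tally(SCORES))  # (4, 1)
    for name, d in deltas(SCORES).items():
        print(f"{name:26s} {d:+.2f}")
```

Running it also shows that every gap in the table is under one percentage point.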

This is a concrete demonstration of the "val loss is roughly 75% reliable as a model-quality proxy" lesson from the GPUburnout SFT pipeline.

Intended Use

Research and reproduction of the val-loss vs output-quality comparison. For practical chat use, prefer GPUburnout-3B-75K-Chat (step 1500).

Blog

Discussed in Season 5 of the GPUburnout blog series.

https://gpuburnout.com

License

Apache 2.0.
