GPUburnout-3B-75K-Chat-step3000

A 3.12-billion-parameter chat model produced by supervised fine-tuning (SFT) of GPUburnout/GPUburnout-3B-75K on SlimOrca-50K. This is the step-3000 checkpoint, which reached the lowest validation loss of the SFT run.

This is NOT the shipped chat model. The shipped version is GPUburnout/GPUburnout-3B-75K-Chat (step-1500), which was chosen via manual review of generated outputs despite having higher val loss than this checkpoint. This repo exists so the val-loss-vs-output-quality comparison can be reproduced.

SFT Configuration

Base model: GPUburnout-3B-75K
Data: SlimOrca 50K (cleaned subset)
Method: LoRA, then merged
LoRA rank: 16
Learning rate: 5e-5
Epochs: 1
Step: 3000 (final step of epoch)
Precision: float16
Hardware: A100 SXM 80GB on RunPod

Benchmarks (0-shot unless noted, float16)

Benchmark	Step 3000 (this)	Step 1500 (shipped)
TruthfulQA MC2	42.10%	42.43%
HellaSwag (acc_norm)	46.66%	46.60%
ARC-Easy (acc_norm)	50.63%	49.83%
ARC-Challenge (acc_norm)	33.11%	32.34%
MMLU (5-shot acc)	25.05%	24.93%

Step 3000 scores higher on 4 of 5 academic benchmarks, though every gap in the table is under one percentage point. Despite this, step 1500 was selected as the production model because side-by-side output review showed its responses were noticeably stronger on open-ended generation tasks (the "use the knowledge" axis, which standard benchmarks underweight).
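The comparison can be checked with a few lines of Python. This is a minimal sketch, not part of the GPUburnout pipeline: the scores are copied by hand from the table above, and the names `SCORES` and `compare` are made up for this illustration.

```python
# Scores in percent, copied from the benchmark table above:
# (step 3000, step 1500). Names here are illustrative only.
SCORES = {
    "TruthfulQA MC2": (42.10, 42.43),
    "HellaSwag (acc_norm)": (46.66, 46.60),
    "ARC-Easy (acc_norm)": (50.63, 49.83),
    "ARC-Challenge (acc_norm)": (33.11, 32.34),
    "MMLU (5-shot acc)": (25.05, 24.93),
}


def compare(scores):
    """Return (benchmark, delta) pairs; delta is step 3000 minus step 1500, in percentage points."""
    return [
        (name, round(s3000 - s1500, 2))
        for name, (s3000, s1500) in scores.items()
    ]


deltas = compare(SCORES)
# Count the benchmarks where step 3000 is strictly ahead.
wins_3000 = sum(1 for _, delta in deltas if delta > 0)

for name, delta in deltas:
    print(f"{name:<26} {delta:+.2f} pp")
print(f"step 3000 ahead on {wins_3000} of {len(deltas)} benchmarks")
```

TruthfulQA MC2 is the only row where step 1500 leads. The table reports no standard errors, so this sketch says nothing about whether any of the gaps exceed evaluation noise.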

This is a concrete demonstration of the "val loss is roughly 75% reliable as a model-quality proxy" lesson from the GPUburnout SFT pipeline.

Intended Use

Research and reproduction of the val-loss vs output-quality comparison. For practical chat use, prefer GPUburnout-3B-75K-Chat (step 1500).

Blog

Discussed in Season 5 of the GPUburnout blog series.

https://gpuburnout.com

License

Apache 2.0.
