# GPUburnout-2B-75K-Chat-DPO
A 1.92 billion parameter Llama-style chat model with DPO alignment. Trained from scratch, expanded from 1B, SFT'd on SlimOrca 50K, then DPO-aligned with 1,078 preference pairs.
## Model Details
- Architecture: Llama-style decoder-only transformer
- Parameters: 1.92B
- Hidden dim: 2304
- Layers: 24
- Attention: GQA (36 query heads, 9 KV heads)
- FFN: SwiGLU (intermediate 9216)
- Position encoding: RoPE (theta=500000)
- Context length: 2048 tokens
- Vocabulary: 32,005 tokens (BPE + 5 special tokens)
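The spec above can be cross-checked with a quick parameter estimate. A minimal sketch, assuming tied input/output embeddings, RMSNorm, and no biases (typical Llama-style choices; not confirmed for this model):

```python
# Parameter-count sanity check for the architecture listed above.
# Assumes tied embeddings, RMSNorm, and no biases (typical Llama-style choices).

def count_params(vocab=32005, hidden=2304, layers=24,
                 q_heads=36, kv_heads=9, inter=9216):
    head_dim = hidden // q_heads   # 2304 / 36 = 64
    kv_dim = kv_heads * head_dim   # 9 * 64 = 576 (GQA: fewer KV heads)

    embed = vocab * hidden         # token embeddings (tied with lm_head)
    attn = hidden * hidden * 2     # q_proj + o_proj
    attn += hidden * kv_dim * 2    # k_proj + v_proj (smaller under GQA)
    ffn = 3 * hidden * inter       # SwiGLU: gate, up, and down projections
    norms = 2 * hidden             # two RMSNorms per layer

    return embed + layers * (attn + ffn + norms) + hidden  # + final norm

print(f"{count_params() / 1e9:.2f}B parameters")  # -> 1.92B
```

With these assumptions the count lands on 1.92B exactly, which suggests the embeddings are tied (an untied head would push the total to roughly 1.99B).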
## Training Pipeline
- Pretraining: 1.04B model trained to Chinchilla-optimal (160K steps, 20.97B tokens)
- Growth: Expanded 1B -> 1.92B via weight copying + new layer insertion
- Continued pretraining: 75K steps on clean data (contaminated Python-Edu + FineMath replaced)
- SFT: SlimOrca 50K, LoRA r=16/alpha=32, 1 epoch
- DPO: 1,078 preference pairs, beta=0.1, lr=5e-7, LoRA r=16/alpha=32, 1 epoch
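The DPO objective in the final stage can be sketched in a few lines. A framework-free illustration with the beta=0.1 listed above; the log-probabilities here are toy scalars standing in for summed sequence log-likelihoods, not values from this model:

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model.
    """
    # Implicit reward = beta * log-ratio between policy and reference
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    # Loss = -log sigmoid(reward margin); shrinks as chosen pulls ahead
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy prefers the chosen response a bit more than the reference does
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0)
print(f"{loss:.4f}")
```

A small beta keeps the policy close to the reference; the low learning rate (5e-7) and single epoch serve the same purpose.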
## DPO Details
- Preference data: 1,200 prompts across 10 categories, 5 responses per prompt at graduated temperatures (0.5-1.3)
- Judge: Claude (via Claude.ai Max subscription); evaluation only, no distillation
- Result: 7/8 clean on the garbage token check (vs. 4/8 for the 1B DPO model)
- Key insight: clean pretraining data was the prerequisite; the 1B DPO run failed because garbage tokens were baked in by contaminated pretraining data
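The graduated-temperature sampling described above could look roughly like this. A hypothetical sketch; the actual generation script is not published, and `generate_response` is a stand-in for a call into the model's generation loop:

```python
# Sketch of preference-data collection at graduated temperatures.
# Hypothetical: the real script is not published; generate_response is a stand-in.

def graduated_temperatures(n=5, low=0.5, high=1.3):
    """Evenly spaced temperatures from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [round(low + i * step, 2) for i in range(n)]

def collect_candidates(prompt, generate_response):
    # One response per temperature: cooler samples are conservative,
    # hotter ones more diverse, giving the judge a real spread to rank.
    return [(t, generate_response(prompt, temperature=t))
            for t in graduated_temperatures()]

print(graduated_temperatures())  # [0.5, 0.7, 0.9, 1.1, 1.3]
```

Ranking the five candidates then yields chosen/rejected pairs for the 1,078-pair DPO set.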
## Garbage Token Check (8 standard prompts)
| Prompt | Status |
|---|---|
| Explain how photosynthesis works | CLEAN |
| What is the theory of relativity? | CLEAN |
| Write a Python function to reverse a string | GARBAGE |
| Tell me a creative story about a robot learning to paint | CLEAN |
| Solve: If a train travels 60 mph for 2.5 hours, how far does it go? | CLEAN |
| What are the ethical implications of AI in healthcare? | CLEAN |
| Explain the water cycle to a 10-year-old | CLEAN |
| What is the difference between a virus and a bacterium? | CLEAN |
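A simple version of this check can be automated. A heuristic sketch under the assumption that "garbage" means mojibake runs, replacement characters, and stray control bytes in the decoded output; the exact criteria behind the table are not specified:

```python
# Heuristic garbage-token detector; a guess at what the check above looks for:
# replacement characters, classic UTF-8 mojibake runs, and stray control bytes.
GARBAGE_MARKERS = ("\ufffd", "â€", "ï¿½")

def looks_clean(text: str) -> bool:
    if any(marker in text for marker in GARBAGE_MARKERS):
        return False
    # Control characters other than ordinary whitespace are another red flag
    return not any(ord(c) < 32 and c not in "\n\r\t" for c in text)

# Toy outputs, not from this model
outputs = {
    "Explain how photosynthesis works": "Photosynthesis converts light energy...",
    "Write a Python function to reverse a string": "def reverse(s):\n    return sâ€™...",
}
score = sum(looks_clean(v) for v in outputs.values())
print(f"{score}/{len(outputs)} clean")
```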
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "GPUburnout/GPUburnout-2B-75K-Chat-DPO", torch_dtype="float16"
)
tokenizer = AutoTokenizer.from_pretrained("GPUburnout/GPUburnout-2B-75K-Chat-DPO")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how photosynthesis works."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True,
                         temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Related Models

- GPUburnout-2B-75K: base pretrained model
- GPUburnout-2B-75K-Chat: SFT only
- GPUburnout-1B-160K: 1B base (Chinchilla-optimal)
## Blog

Full training journey documented at gpuburnout.com
## Author

Jun Park (@GPUburnout)