# GPUburnout-2B-75K-Chat-DPO
A 1.92 billion parameter Llama-style chat model with DPO alignment. Trained from scratch, expanded from 1B, SFT'd on SlimOrca 50K, then DPO-aligned with 1,078 preference pairs.
## Model Details
- Architecture: Llama-style decoder-only transformer
- Parameters: 1.92B
- Hidden dim: 2304
- Layers: 24
- Attention: GQA (36 query heads, 9 KV heads)
- FFN: SwiGLU (intermediate 9216)
- Position encoding: RoPE (theta=500000)
- Context length: 2048 tokens
- Vocabulary: 32,005 tokens (BPE + 5 special tokens)
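The spec above can be cross-checked with a quick parameter estimate. A minimal sketch, assuming tied input/output embeddings, RMSNorm, and no biases (typical Llama-style choices; not confirmed for this model):

```python
# Parameter-count sanity check for the architecture listed above.
# Assumes tied embeddings, RMSNorm, and no biases (typical Llama-style choices).

def count_params(vocab=32005, hidden=2304, layers=24,
                 q_heads=36, kv_heads=9, inter=9216):
    head_dim = hidden // q_heads   # 2304 / 36 = 64
    kv_dim = kv_heads * head_dim   # 9 * 64 = 576 (GQA: fewer KV heads)

    embed = vocab * hidden         # token embeddings (tied with lm_head)
    attn = hidden * hidden * 2     # q_proj + o_proj
    attn += hidden * kv_dim * 2    # k_proj + v_proj (smaller under GQA)
    ffn = 3 * hidden * inter       # SwiGLU: gate, up, and down projections
    norms = 2 * hidden             # two RMSNorms per layer

    return embed + layers * (attn + ffn + norms) + hidden  # + final norm

print(f"{count_params() / 1e9:.2f}B parameters")  # -> 1.92B
```

With these assumptions the count lands on 1.92B exactly, which suggests the embeddings are tied (an untied head would push the total to roughly 1.99B).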
## Training Pipeline
- Pretraining: 1.04B model trained to Chinchilla-optimal (160K steps, 20.97B tokens)
- Growth: Expanded 1B -> 1.92B via weight copying + new layer insertion
- Continued pretraining: 75K steps on clean data (contaminated Python-Edu + FineMath replaced)
- SFT: SlimOrca 50K, LoRA r=16/alpha=32, 1 epoch
- DPO: 1,078 preference pairs, beta=0.1, lr=5e-7, LoRA r=16/alpha=32, 1 epoch
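The DPO objective in the final stage can be sketched in a few lines. A framework-free illustration with the beta=0.1 listed above; the log-probabilities here are toy scalars standing in for summed sequence log-likelihoods, not values from this model:

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model.
    """
    # Implicit reward = beta * log-ratio between policy and reference
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    # Loss = -log sigmoid(reward margin); shrinks as chosen pulls ahead
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy prefers the chosen response a bit more than the reference does
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0)
print(f"{loss:.4f}")
```

A small beta keeps the policy close to the reference; the low learning rate (5e-7) and single epoch serve the same purpose.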
## DPO Details
- Preference data: 1,200 prompts across 10 categories, 5 responses per prompt at graduated temperatures (0.5-1.3)
- Judge: Claude (via Claude.ai Max subscription); evaluation only, no distillation
- Result: 7/8 clean on the garbage token check (vs. 4/8 for the 1B DPO model)
- Key insight: clean pretraining data was the prerequisite; the 1B DPO run failed because garbage tokens were baked in by contaminated pretraining data
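The graduated-temperature sampling described above could look roughly like this. A hypothetical sketch; the actual generation script is not published, and `generate_response` is a stand-in for a call into the model's generation loop:

```python
# Sketch of preference-data collection at graduated temperatures.
# Hypothetical: the real script is not published; generate_response is a stand-in.

def graduated_temperatures(n=5, low=0.5, high=1.3):
    """Evenly spaced temperatures from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [round(low + i * step, 2) for i in range(n)]

def collect_candidates(prompt, generate_response):
    # One response per temperature: cooler samples are conservative,
    # hotter ones more diverse, giving the judge a real spread to rank.
    return [(t, generate_response(prompt, temperature=t))
            for t in graduated_temperatures()]

print(graduated_temperatures())  # [0.5, 0.7, 0.9, 1.1, 1.3]
```

Ranking the five candidates then yields chosen/rejected pairs for the 1,078-pair DPO set.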
## Garbage Token Check (8 standard prompts)
| Prompt | Status |
|---|---|
| Explain how photosynthesis works | CLEAN |
| What is the theory of relativity? | CLEAN |
| Write a Python function to reverse a string | GARBAGE |
| Tell me a creative story about a robot learning to paint | CLEAN |
| Solve: If a train travels 60 mph for 2.5 hours, how far does it go? | CLEAN |
| What are the ethical implications of AI in healthcare? | CLEAN |
| Explain the water cycle to a 10-year-old | CLEAN |
| What is the difference between a virus and a bacterium? | CLEAN |
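A simple version of this check can be automated. A heuristic sketch under the assumption that "garbage" means mojibake runs, replacement characters, and stray control bytes in the decoded output; the exact criteria behind the table are not specified:

```python
# Heuristic garbage-token detector; a guess at what the check above looks for:
# replacement characters, classic UTF-8 mojibake runs, and stray control bytes.
GARBAGE_MARKERS = ("\ufffd", "â€", "ï¿½")

def looks_clean(text: str) -> bool:
    if any(marker in text for marker in GARBAGE_MARKERS):
        return False
    # Control characters other than ordinary whitespace are another red flag
    return not any(ord(c) < 32 and c not in "\n\r\t" for c in text)

# Toy outputs, not from this model
outputs = {
    "Explain how photosynthesis works": "Photosynthesis converts light energy...",
    "Write a Python function to reverse a string": "def reverse(s):\n    return sâ€™...",
}
score = sum(looks_clean(v) for v in outputs.values())
print(f"{score}/{len(outputs)} clean")
```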
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "GPUburnout/GPUburnout-2B-75K-Chat-DPO", torch_dtype="float16"
)
tokenizer = AutoTokenizer.from_pretrained("GPUburnout/GPUburnout-2B-75K-Chat-DPO")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how photosynthesis works."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True,
                         temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Related Models

- GPUburnout-2B-75K: base pretrained model
- GPUburnout-2B-75K-Chat: SFT only
- GPUburnout-1B-160K: 1B base (Chinchilla-optimal)
## Blog

Full training journey documented at gpuburnout.com
## Author

Jun Park (@GPUburnout)