GPUburnout-2B-75K-Chat-DPO

A 1.92-billion-parameter Llama-style chat model with DPO alignment. Trained from scratch, grown from a 1B base, supervised fine-tuned (SFT) on 50K SlimOrca examples, then DPO-aligned on 1,078 preference pairs.

Model Details

  • Architecture: Llama-style decoder-only transformer
  • Parameters: 1.92B
  • Hidden dim: 2304
  • Layers: 24
  • Attention: GQA (36 query heads, 9 KV heads)
  • FFN: SwiGLU (intermediate 9216)
  • Position encoding: RoPE (theta=500000)
  • Context length: 2048 tokens
  • Vocabulary: 32,005 tokens (BPE + 5 special tokens)
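The configuration above can be sanity-checked with a back-of-envelope parameter count. This is a sketch assuming tied input/output embeddings and the standard Llama layer layout (q/k/v/o projections, SwiGLU gate/up/down, two RMSNorms per layer); the exact checkpoint layout may differ slightly.

```python
# Back-of-envelope parameter count for the configuration listed above.
V, D, L = 32005, 2304, 24      # vocab, hidden dim, layers
H_Q, H_KV = 36, 9              # GQA query / KV heads
head_dim = D // H_Q            # 64
FFN = 9216                     # SwiGLU intermediate size

embed = V * D                                        # token embeddings (assumed tied with LM head)
attn = D * D + 2 * D * (H_KV * head_dim) + D * D     # q, k, v, o projections
ffn = 3 * D * FFN                                    # SwiGLU: gate, up, down
norms = 2 * D                                        # two RMSNorms per layer
per_layer = attn + ffn + norms

total = embed + L * per_layer + D                    # + final norm
print(f"{total / 1e9:.2f}B")                         # ~1.92B
```

The result lands almost exactly on the stated 1.92B, which suggests the embeddings are tied (an untied LM head would add another ~74M parameters).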

Training Pipeline

  1. Pretraining: 1.04B model trained to a Chinchilla-optimal token budget (160K steps, 20.97B tokens)
  2. Growth: Expanded 1B -> 1.92B via weight copying + new layer insertion
  3. Continued pretraining: 75K steps on clean data (contaminated Python-Edu + FineMath replaced)
  4. SFT: SlimOrca 50K, LoRA r=16/alpha=32, 1 epoch
  5. DPO: 1,078 preference pairs, beta=0.1, lr=5e-7, LoRA r=16/alpha=32, 1 epoch
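The DPO objective in step 5 (beta = 0.1) minimizes the negative log-sigmoid of the scaled margin between the policy's and a frozen reference model's log-ratio on chosen vs. rejected responses. A minimal sketch over scalar per-response log-probabilities (a real implementation sums token log-probs over each sequence):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written via log1p for numerical stability
    return math.log1p(math.exp(-beta * margin))

# A pair where the policy already leans toward the chosen response
# relative to the reference (illustrative numbers, not from the run):
loss = dpo_loss(policy_chosen=-40.0, policy_rejected=-55.0,
                ref_chosen=-45.0, ref_rejected=-50.0)
```

When the policy and reference agree exactly, the margin is zero and the loss is log(2); widening the margin drives it toward zero.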

DPO Details

  • Preference data: 1,200 prompts across 10 categories, 5 responses per prompt at graduated temperatures (0.5-1.3)
  • Judge: Claude (via Claude.ai Max subscription); used for evaluation only, no distillation
  • Result: 7/8 clean on garbage token check (vs 4/8 on 1B DPO)
  • Key insight: Clean pretraining data was the prerequisite; the 1B DPO failed because garbage tokens were baked in from contaminated pretraining data
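The "graduated temperatures" above can be read as five evenly spaced values spanning 0.5 to 1.3, one per sampled response; the exact schedule isn't stated, so even spacing is an assumption:

```python
def graduated_temperatures(lo=0.5, hi=1.3, n=5):
    """Evenly spaced sampling temperatures from lo to hi (assumed schedule)."""
    step = (hi - lo) / (n - 1)
    return [round(lo + i * step, 2) for i in range(n)]

temps = graduated_temperatures()  # [0.5, 0.7, 0.9, 1.1, 1.3]
```

Sampling the same prompt across this range yields a spread from conservative to adventurous completions, which is what makes the chosen/rejected pairs informative.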

Garbage Token Check (8 standard prompts)

Prompt                                                                Status
Explain how photosynthesis works                                      CLEAN
What is the theory of relativity?                                     CLEAN
Write a Python function to reverse a string                           GARBAGE
Tell me a creative story about a robot learning to paint              CLEAN
Solve: If a train travels 60 mph for 2.5 hours, how far does it go?   CLEAN
What are the ethical implications of AI in healthcare?                CLEAN
Explain the water cycle to a 10-year-old                              CLEAN
What is the difference between a virus and a bacterium?               CLEAN
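The card doesn't specify how CLEAN vs. GARBAGE is judged. One plausible automated heuristic (an assumption for illustration, not the author's actual check) flags generations containing mojibake or raw byte-fallback tokens:

```python
def looks_garbled(text: str) -> bool:
    """Hypothetical garbage-token heuristic: flag replacement characters,
    classic mis-decoded UTF-8 sequences, and raw byte-fallback tokens."""
    suspects = (
        "\ufffd",   # Unicode replacement character
        "â€",       # common prefix of mis-decoded UTF-8 punctuation
        "<0x",      # byte-fallback tokens leaking into the output
    )
    return any(s in text for s in suspects)

print(looks_garbled("def reverse(s): return s[::-1]"))  # False
print(looks_garbled("return s[::-1] <0xE2><0x80>"))     # True
```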

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("GPUburnout/GPUburnout-2B-75K-Chat-DPO", torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained("GPUburnout/GPUburnout-2B-75K-Chat-DPO")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how photosynthesis works."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Blog

The full training journey is documented at gpuburnout.com

Author

Jun Park (@GPUburnout)
