Qwen2.5-Math-7B Abstract-CoT minimal warmup checkpoint

Post-warmup Phase B (iter1) checkpoint used as the starting point for the GRPO compression run at https://huggingface.co/LauraGG/qwen25math-7b-abstract-cot-grpo

Trained with one policy-iteration round (Phase A + Phase B) on 3k examples from Dolci-Think-SFT-7B, 1 epoch each, with full fine-tuning on 1× H100. Total training time was about 70 minutes.

Post-warmup MATH-500 cold-start probe: 15.6% (n=32, T=0.7).
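
To give a sense of how noisy a probe of this size is, here is a minimal sketch of a 95% Wilson score interval around the reported accuracy. It assumes that n=32 is the total number of scored samples, so that 15.6% corresponds to 5 correct out of 32. The choice of interval and the function name `wilson_interval` are illustrative and not part of the original evaluation.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumption: 15.6% at n=32 means 5 of 32 scored samples were correct.
low, high = wilson_interval(correct=5, n=32)
print(f"accuracy = {5 / 32:.1%}, 95% Wilson interval = [{low:.1%}, {high:.1%}]")
```

Under that assumption the interval spans roughly 7% to 32%, so the probe is best read as a coarse sanity check rather than a precise accuracy estimate.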

Notes

  • Format-correct: emits ... followed by an answer.
  • Empty-z̃ basin already present: z̃ length pegged at ~9 tokens (the minimum).
  • Use as the input to GRPO; not useful standalone.
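
As a starting point for using the checkpoint as GRPO input, the following is a minimal loading sketch. It assumes the repository loads with the standard `transformers` Auto classes, which this card does not confirm, and it uses BF16 to match the listed tensor type. The helper name `load_warmup_checkpoint` is hypothetical.

```python
REPO_ID = "LauraGG/qwen25math-7b-abstract-cot-warmup"


def load_warmup_checkpoint(repo_id: str = REPO_ID):
    """Load the tokenizer and model weights in BF16.

    Assumption: the checkpoint is compatible with AutoTokenizer and
    AutoModelForCausalLM; this is not stated on the card.
    """
    # Imported lazily so this module can be imported without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    return tokenizer, model
```

Calling `load_warmup_checkpoint()` downloads the full weights, so it needs enough disk space and memory for a model of this size. The returned pair can then be handed to whichever GRPO trainer is used for the compression run.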

Model size: 8B params (Safetensors). Tensor type: BF16.

Model tree for LauraGG/qwen25math-7b-abstract-cot-warmup

Base model: Qwen/Qwen2.5-7B (this model is a fine-tune of it).
