Qwen2.5-Math-7B Abstract-CoT minimal warmup checkpoint

Post-warmup Phase B (iter1) checkpoint used as the starting point for the GRPO compression run at https://huggingface.co/LauraGG/qwen25math-7b-abstract-cot-grpo

Trained for 1 policy-iteration round (Phase A + Phase B) on 3k examples from Dolci-Think-SFT-7B, 1 epoch per phase, with full fine-tuning on 1× H100 (~70 minutes).

Post-warmup MATH-500 cold-start probe: 15.6% (n=32, T=0.7).
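For a rough sense of how much uncertainty the cold-start probe carries, the sketch below computes a Wilson score interval. It assumes that n=32 counts problems, which the card does not state (n could also mean samples per problem). Under that assumption, 15.6% corresponds to 5/32 correct. This is a generic helper and not part of the evaluation code used for this checkpoint.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, center - margin), min(1.0, center + margin)


# Assumption: n=32 means 32 problems, so 15.6% corresponds to 5/32 correct.
lo, hi = wilson_interval(5, 32)
print(f"accuracy={5 / 32:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Under this assumption, the interval runs from roughly 7% to 32%. At this sample size the probe works as a sanity check, not a precise measurement.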

Notes

Format-correct: emits ... followed by an answer.
Empty-z̃ basin already present: z̃ length pegged at ~9 tokens (the minimum).
Use as the input to GRPO; not useful standalone.
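The two diagnostics in these notes (format correctness and z̃ length) can be tracked on sampled completions with a small helper such as the one below. This is a hypothetical sketch. The card does not show the exact z̃ delimiters, so `open_tag`/`close_tag` (and the `<z>`/`</z>` used in the example) are placeholders for the real markers. Whitespace splitting also stands in for the model's tokenizer, and the length counts only the text between the markers. Its numbers are therefore not directly comparable to the ~9-token figure above, but they can show the same collapse trend across checkpoints.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class ParsedCompletion:
    z_tilde: str  # text between the (hypothetical) delimiters
    answer: str   # text after the closing delimiter


def parse_completion(text: str, open_tag: str, close_tag: str) -> Optional[ParsedCompletion]:
    """Return the z̃ span and trailing answer, or None if the format is wrong."""
    start = text.find(open_tag)
    if start == -1:
        return None
    end = text.find(close_tag, start + len(open_tag))
    if end == -1:
        return None
    answer = text[end + len(close_tag):].strip()
    if not answer:  # a format-correct completion must end with an answer
        return None
    return ParsedCompletion(text[start + len(open_tag):end].strip(), answer)


def format_diagnostics(
    completions: list[str],
    open_tag: str,
    close_tag: str,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),  # tokenizer stand-in
) -> dict[str, float]:
    parsed = [parse_completion(c, open_tag, close_tag) for c in completions]
    ok = [p for p in parsed if p is not None]
    return {
        "format_rate": len(ok) / len(completions) if completions else 0.0,
        "mean_z_len": mean(count_tokens(p.z_tilde) for p in ok) if ok else 0.0,
    }


# Example with placeholder delimiters: two well-formed samples, one malformed.
samples = [
    "<z> a b c </z> 42",   # 3-token z̃
    "<z></z> 7",           # empty z̃
    "no delimiters here",  # format error
]
print(format_diagnostics(samples, "<z>", "</z>"))
```

Passing the real tokenizer's token-counting function as `count_tokens` makes the lengths token-accurate.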

Safetensors. Model size: 8B params. Tensor type: BF16.


Model tree for LauraGG/qwen25math-7b-abstract-cot-warmup: Qwen/Qwen2.5-7B (base model) → Qwen/Qwen2.5-Math-7B (fine-tuned) → Qwen/Qwen2.5-Math-7B-Instruct (fine-tuned) → this model (fine-tuned)
