Gemma-2-2B SFT Reasoning Model

A supervised fine-tuned version of google/gemma-2-2b, trained to produce structured chain-of-thought reasoning on mathematical and logical problems.

Model Lineage: This SFT checkpoint serves as the foundation for the downstream GRPO-trained model Phonsiri/gemma-2-2b-GRPO-Reasoning-full.


Model Highlights

This model was trained to explicitly separate its reasoning process from its final answer, using a structured output format. It learns the syntax and structure of chain-of-thought reasoning before any reinforcement signal is applied.

Output Format:

| Section | Tag | Description |
|---|---|---|
| Chain-of-Thought | `<reasoning> ... </reasoning>` | Step-by-step internal reasoning |
| Final Answer | `<answer> ... </answer>` | Concise final answer |

Training Details

Base Model

Fine-tuned from google/gemma-2-2b-it using full parameter fine-tuning (no LoRA/PEFT).

Datasets

Training data was combined from the following sources:

| Dataset | Type |
|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | HuggingFace – Reasoning |
| math_combined_2566_2567.json | Local – Thai math problems |
| problems_1_5.json | Local – Math problems |
| problems_6_10.json | Local – Math problems |
| problems_101_125.json | Local – Math problems |
| combined.json | Local – Combined problems |
| all_solutions.json | Local – Solutions |
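The local JSON files and the HuggingFace set were combined into a single SFT corpus. A minimal sketch of how raw problem/solution records can be rendered into the model's tagged target format is shown below; the helper name `to_sft_example`, the `merge_json_files` utility, and the record fields are illustrative assumptions, not the actual preprocessing pipeline (which is not published):

```python
import json

def to_sft_example(problem: str, reasoning: str, answer: str) -> dict:
    # Wrap a raw problem/solution pair in the <reasoning>/<answer> target
    # format used by this model. Field names here are illustrative.
    target = (
        f"<reasoning>\n{reasoning}\n</reasoning>\n\n"
        f"<answer>\n{answer}\n</answer>"
    )
    return {"prompt": problem, "completion": target}

def merge_json_files(paths):
    # Concatenate several local JSON files (each assumed to hold a list
    # of records) into one training list.
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            merged.extend(json.load(f))
    return merged

example = to_sft_example(
    "Solve for x: 3x + 5 = 20",
    "3x = 15, so x = 5.",
    "x = 5",
)
```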

Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 2e-5 |
| LR Scheduler | Cosine |
| Max Seq Length | 8192 |
| Batch Size (per device) | 4 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 16 |
| Warmup Ratio | 0.1 |
| Weight Decay | 0.01 |
| Precision | bfloat16 |
| Gradient Checkpointing | Yes |
| Attention | SDPA |
| Optimizer | AdamW |
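These hyperparameters map onto a transformers `TrainingArguments` configuration roughly as follows. This is a sketch under the assumption of a standard `Trainer`/SFT setup; `output_dir` is a placeholder, and the max sequence length (8192) is configured on the trainer or tokenizer side rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b-sft-reasoning",  # placeholder path
    num_train_epochs=3,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 4 x 4 = 16 on one GPU
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    optim="adamw_torch",
)
```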

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

model_id = "Phonsiri/gemma-2-2b-SFT-Reasoning-full-Model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

system_prompt = (
    "You are a helpful assistant. Please reason step by step, "
    "and put your thoughts within <reasoning> and </reasoning> tags, "
    "and your final answer within <answer> and </answer> tags."
)
prompt = "Solve for x: 3x + 5 = 20"

# Gemma 2's chat template has no system role, so the instructions are
# prepended to the user turn.
messages = [{"role": "user", "content": f"{system_prompt}\n\n{prompt}"}]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=False)

with torch.no_grad():
    model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=4096,
        do_sample=True,  # required for temperature/top_p to take effect
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.1,
    )
```

Example Output

```
<reasoning>
We need to isolate x on one side of the equation.

Step 1: Subtract 5 from both sides.
  3x + 5 - 5 = 20 - 5
  3x = 15

Step 2: Divide both sides by 3.
  x = 15 / 3
  x = 5
</reasoning>

<answer>
x = 5
</answer>
```
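Because the output format is fixed, the two sections can be pulled out of a generation with a simple regex. A minimal sketch (the tag names come from the output format above; `extract_sections` is an illustrative helper, not part of the model's tooling):

```python
import re

def extract_sections(text: str):
    # Return (reasoning, answer) extracted from the tagged model output.
    # Either element is None if its tag pair is missing.
    def grab(tag: str):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return m.group(1).strip() if m else None
    return grab("reasoning"), grab("answer")

output = """<reasoning>
3x = 15, so x = 5.
</reasoning>

<answer>
x = 5
</answer>"""

reasoning, answer = extract_sections(output)
# answer -> "x = 5"
```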

Acknowledgements

Authors:

Project Advisor:

  • Supaporn Bunrit, Ph.D. – Suranaree University of Technology

Institutions & Credits:

  • Suranaree University of Technology (SUT) – Research support and computing resources
  • Google DeepMind – Open-weights Gemma 2 model