---
base_model: HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
  - lora
  - sft
  - trl
  - unsloth
  - reasoning
  - format-primer
---

# SkeptiSTEM-4B-v2 Stage R2 (Format LoRA)

This is the Stage R2 format-priming LoRA adapter for SkeptiSTEM-4B-v2.

## Purpose

Teaches the model to output structured reasoning in this format:

```text
<start_working_out>
... working out the problem step by step ...
<end_working_out>

<SOLUTION>
final answer
</SOLUTION>
```
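Downstream code needs to recover the final answer from this wrapper. A minimal parser sketch (the marker strings come from the format above; the function name is illustrative):

```python
import re

# Markers used by the Stage R2 reasoning format.
WORKING_RE = re.compile(r"<start_working_out>(.*?)<end_working_out>", re.DOTALL)
SOLUTION_RE = re.compile(r"<SOLUTION>(.*?)</SOLUTION>", re.DOTALL)

def parse_reasoning(text: str) -> dict:
    """Split a completion into its working-out and final-answer parts.

    Returns empty strings for any section the model failed to emit,
    so callers can detect format violations instead of crashing.
    """
    working = WORKING_RE.search(text)
    solution = SOLUTION_RE.search(text)
    return {
        "working": working.group(1).strip() if working else "",
        "solution": solution.group(1).strip() if solution else "",
    }
```

Returning empty strings on a missing section makes format-compliance checks trivial, which matters for a format-priming stage.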

## Training Details

- Base model: HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit
- Dataset: OpenMathReasoning-mini (CoT subset)
- Examples: ~2,403
- Epochs: 1 (format priming only)
- LoRA rank: 64
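For format priming, each CoT example has to be rendered into the marker template before SFT. A sketch of such a formatting function (the field names `problem`, `generated_solution`, and `expected_answer` are assumptions about the dataset schema, not confirmed by this card):

```python
def format_example(example: dict) -> str:
    """Render one CoT example into the Stage R2 reasoning template.

    Assumed input fields (illustrative; check the actual dataset schema):
      problem            -- the question text
      generated_solution -- the chain-of-thought working
      expected_answer    -- the final answer
    """
    return (
        f"{example['problem']}\n"
        "<start_working_out>\n"
        f"{example['generated_solution']}\n"
        "<end_working_out>\n"
        "<SOLUTION>\n"
        f"{example['expected_answer']}\n"
        "</SOLUTION>"
    )
```

A function like this would typically be mapped over the dataset to build the SFT text column.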

## Usage

```python
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the Stage R1 merged base model
base, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Apply the R2 format adapter
model = PeftModel.from_pretrained(base, "HallD/SkeptiSTEM-4B-v2-stageR2-format-lora")

# Or merge the adapter into the base weights
# (note: merging into a 4-bit quantized base is lossy; load with
# load_in_4bit=False if you plan to merge and re-save the weights)
model = model.merge_and_unload()

FastLanguageModel.for_inference(model)
```
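With the model loaded, a small generation helper ties the pieces together. This is a sketch: `apply_chat_template` and `generate` follow the standard Hugging Face `transformers` API, it assumes the tokenizer has a chat template configured, and the sampling settings are illustrative:

```python
def build_messages(question: str) -> list:
    """Wrap a question as a single-turn chat conversation."""
    return [{"role": "user", "content": question}]

def generate_reasoning(model, tokenizer, question: str, max_new_tokens: int = 1024) -> str:
    """Run one reasoning-formatted completion on an already-loaded model."""
    inputs = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,  # illustrative sampling settings
    )
    # Drop the prompt tokens, keep only the completion.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The returned string can then be split into working-out and answer sections by whatever parsing the downstream pipeline uses.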

## Next Stage

This adapter is used as a foundation for:

- Stage R3: GRPO training with the DOUBT framework
- Stage CD: chat restoration + DPO

Trained with Unsloth.