HallD's picture
Upload final merged 16-bit checkpoint
67f3c64 verified
metadata
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
  - merged
  - sft
  - lora
  - grpo
  - trl
  - unsloth
base_model: HallD/SkeptiSTEM-4B-stageR1-merged-16bit

SkeptiSTEM-4B Final Merged (16-bit)

Merged checkpoint of:

  • Base: HallD/SkeptiSTEM-4B-stageR1-merged-16bit
  • Stage R2 (format): HallD/SkeptiSTEM-4B-stageR2-format-lora
  • Stage R3 (GRPO): HallD/SkeptiSTEM-4B-stageR3-grpo-lora

This checkpoint bakes both adapters into the weights for one-shot inference.