metadata
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- merged
- sft
- lora
- grpo
- trl
- unsloth
base_model: HallD/SkeptiSTEM-4B-stageR1-merged-16bit
SkeptiSTEM-4B Final Merged (16-bit)
Merged checkpoint of:
- Base:
HallD/SkeptiSTEM-4B-stageR1-merged-16bit - Stage R2 (format):
HallD/SkeptiSTEM-4B-stageR2-format-lora - Stage R3 (GRPO):
HallD/SkeptiSTEM-4B-stageR3-grpo-lora
This checkpoint bakes both adapters into the weights for one-shot inference.