SkeptiSTEM-4B-v2 Final (Merged 16-bit)

Complete merged model combining all training stages:

  • R1: STEM SFT (math, science, code)
  • R2: Format primer (reasoning tags)
  • R3: GRPO verification (DOUBT framework)
  • C: Chat restoration SFT
  • D: DPO preference alignment

Capabilities

✅ STEM problem solving (math, science, coding)
✅ Verification of suggested answers (see the example under Usage)
✅ Structured reasoning when appropriate
✅ Natural conversational ability
✅ Preference-aligned responses

Usage

from unsloth import FastLanguageModel

# Load the merged checkpoint; load_in_4bit=True quantizes on the fly to save
# VRAM -- set it to False to keep the full 16-bit weights.
model, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-final-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to optimized inference mode

messages = [
    {"role": "user", "content": "What is the derivative of x^3 + 2x?"}
]

# Render the conversation with the model's chat template, then tokenize
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
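
The R3 stage trained the model to verify suggested answers, but the exact prompt format the DOUBT framework expects is not documented in this card. The sketch below, which reuses the model and tokenizer loaded above, shows one plausible way to exercise that capability: the suggested answer is deliberately wrong (the correct derivative is 3x^2 + 2), so a verification-tuned model should flag it.

# Verification prompt (hypothetical format -- the exact template the DOUBT
# stage was trained on is not documented in this card).
messages = [
    {"role": "user", "content": (
        "Verify the following suggested answer.\n"
        "Problem: What is the derivative of x^3 + 2x?\n"
        "Suggested answer: 3x^2 + 2x"
    )}
]

text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))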
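
Because this is a fully merged 16-bit checkpoint rather than a LoRA adapter, it should also load with plain transformers in environments without Unsloth. This is an untested sketch, assuming the repository ships standard safetensors and tokenizer files:

# Untested sketch: loading the merged checkpoint without Unsloth.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HallD/SkeptiSTEM-4B-v2-final-merged-16bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the card's BF16 tensor type
    device_map="auto",
)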

License

Apache 2.0

Trained with Unsloth.

Weights: Safetensors · 4B params · BF16

Base model

Qwen/Qwen3-4B-Base