# SkeptiSTEM-4B-v2 Final (Merged 16-bit)
Complete merged model combining all training stages:
- R1: STEM SFT (math, science, code)
- R2: Format primer (reasoning tags)
- R3: GRPO verification (DOUBT framework)
- C: Chat restoration SFT
- D: DPO preference alignment
## Capabilities
- ✅ STEM problem solving (math, science, coding)
- ✅ Verification of suggested answers (see the prompt sketch below)
- ✅ Structured reasoning when appropriate
- ✅ Natural conversational ability
- ✅ Preference-aligned responses
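The verification capability can be exercised by placing a candidate answer in the prompt and asking the model to check it. The prompt wording below is an illustrative assumption, not a documented format; the resulting `messages` list drops straight into the Usage snippet that follows.

```python
# Hypothetical verification-style prompt (the exact wording is an
# assumption, not a prescribed interface); pass it through the
# chat-template / generate flow shown in the Usage section below.
messages = [
    {
        "role": "user",
        "content": (
            "A student claims the derivative of x^3 + 2x is 3x^2 + 2x. "
            "Is this correct? Verify the answer and point out any error."
        ),
    }
]
```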
## Usage
```python
from unsloth import FastLanguageModel

# Load the merged model; load_in_4bit=True quantizes weights at load
# time to reduce VRAM usage
model, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-final-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

messages = [
    {"role": "user", "content": "What is the derivative of x^3 + 2x?"}
]

# Render the chat template, tokenize, and generate a response
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
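For interactive use, tokens can be printed as they are generated with Hugging Face's `TextStreamer`. This is a minimal sketch reusing `model`, `tokenizer`, and `inputs` from the snippet above.

```python
from transformers import TextStreamer

# Stream tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=512)
```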
## License
Apache 2.0
Trained with [Unsloth](https://github.com/unslothai/unsloth).