slm-125m-instruct

A 125M-parameter decoder-only language model, instruction-tuned via chat SFT and code SFT. Part of the SLM model family: built entirely from scratch, from raw web data through to a production-ready aligned model.

This is the instruct variant: the base model supervised fine-tuned on chat and code instruction datasets. It follows instructions reliably and can generate Python code. Use `tohio/slm-125m-chat` for the DPO-aligned version preferred for open-ended conversation. Use `tohio/slm-125m` for the raw base model.

Model Family

| Variant | Hub | Description |
|---|---|---|
| Base | tohio/slm-125m | Pretrained only |
| Instruct | tohio/slm-125m-instruct | Chat + code SFT |
| Chat | tohio/slm-125m-chat | SFT + DPO aligned |

Architecture

| Component | Choice | Rationale |
|---|---|---|
| Positional encoding | RoPE | Better length generalisation, relative position awareness |
| Normalization | RMSNorm | Faster than LayerNorm, modern standard |
| Activation | SwiGLU | Better gradient flow, used by LLaMA and Mistral |
| Attention | GQA | Reduces KV cache memory at inference |
| Bias | None | Simpler, modern standard |
| Embeddings | Tied | Reduces parameters, effective at small scale |
| Vocab size | 32,000 | Custom BPE tokenizer trained on the pretraining corpus |
| Parameters | 125.3M (125,264,640) | |
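
As a point of reference, the sketch below shows how these choices typically fit together in code. It is illustrative only: the class and field names, hidden size, and head counts are assumptions, not the actual tohio/slm implementation.

```python
# Illustrative sketch of the architectural choices above; names and sizes are
# assumptions, not the actual SLM source code.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class SLMConfig:
    vocab_size: int = 32_000       # custom BPE tokenizer
    hidden_size: int = 768         # assumed; not stated in this card
    num_attention_heads: int = 12  # assumed
    num_kv_heads: int = 4          # GQA: fewer KV heads than query heads
    tie_word_embeddings: bool = True
    use_bias: bool = False         # no bias terms anywhere


class RMSNorm(nn.Module):
    """RMSNorm: rescale by the RMS of the activations; no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: silu(x W_gate) * (x W_up), projected back down, no bias."""

    def __init__(self, dim: int, hidden_dim: int, bias: bool = False):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=bias)
        self.up = nn.Linear(dim, hidden_dim, bias=bias)
        self.down = nn.Linear(hidden_dim, dim, bias=bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(nn.functional.silu(self.gate(x)) * self.up(x))
```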

Training

Pretraining corpus: 5B tokens blended across the following sources:

| Source | Target Share | Link |
|---|---|---|
| common_crawl | 10.0% | Common Crawl |
| fineweb | 47.5% | FineWeb |
| wikipedia | 10.0% | Wikipedia (EN) |
| pg19 | 2.5% | PG-19 (Project Gutenberg) |
| pes2o | 5.0% | peS2o (academic papers) |
| open_web_math | 10.0% | OpenWebMath |
| stackexchange | 5.0% | StackExchange |
| code | 10.0% | Code (multi-source) |

Realized mix may differ from target: supply-bound sources (pes2o and jupyter at this scale) route their deficit to FineWeb.
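
A minimal sketch of that rerouting rule, assuming made-up token supplies and a hypothetical `realized_mix` helper; the actual data pipeline lives in the tohio/slm repo.

```python
# Illustrative only: supply numbers are invented to show the deficit-routing rule.
TOTAL_TOKENS = 5_000_000_000

target_share = {
    "common_crawl": 0.100, "fineweb": 0.475, "wikipedia": 0.100,
    "pg19": 0.025, "pes2o": 0.050, "open_web_math": 0.100,
    "stackexchange": 0.050, "code": 0.100,
}

# Hypothetical per-source supply; pes2o is supply-bound at this scale.
available = {src: TOTAL_TOKENS for src in target_share}
available["pes2o"] = 150_000_000


def realized_mix(targets, supply, total, fallback="fineweb"):
    """Cap each source at its available supply and route the deficit to the fallback."""
    tokens = {src: min(share * total, supply[src]) for src, share in targets.items()}
    tokens[fallback] += total - sum(tokens.values())
    return tokens


for src, n in realized_mix(target_share, available, TOTAL_TOKENS).items():
    print(f"{src:15s} {n / TOTAL_TOKENS:6.1%}")
```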

Fine-tuning

| Stage | Dataset | Size |
|---|---|---|
| Chat SFT | OpenHermes-2.5 | ~1M examples |
| Code SFT | Magicoder-OSS-Instruct-75K | ~75K examples |

Hardware: NVIDIA H200 (pretraining on 1× H200, fine-tuning on 1× H200)
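
For context, a hedged sketch of how a single chat-SFT example might be rendered into training text with the tokenizer's chat template. The ShareGPT-style `conversations`/`from`/`value` fields are an assumption about the OpenHermes-2.5 layout, and the actual SFT code may handle loss masking differently.

```python
# Sketch: format one chat example with the chat template. Field names are assumed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "tohio/slm-125m-instruct", trust_remote_code=True
)

example = {
    "conversations": [
        {"from": "human", "value": "Write a one-line Python hello world."},
        {"from": "gpt", "value": 'print("hello, world")'},
    ]
}

role_map = {"system": "system", "human": "user", "gpt": "assistant"}
messages = [
    {"role": role_map[turn["from"]], "content": turn["value"]}
    for turn in example["conversations"]
]

# Render to a single training string; SFT would typically compute loss only on
# the assistant tokens.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```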

Evaluation

Evaluated using lm-evaluation-harness.

| Benchmark | Few-shot | Metric | Score |
|---|---|---|---|
| HellaSwag | 10-shot | acc_norm | 0.3257 |
| ARC-Easy | 25-shot | acc_norm | 0.4739 |
| ARC-Challenge | 25-shot | acc_norm | 0.2585 |
| MMLU | 5-shot | acc | 0.2531 |
| TruthfulQA | 0-shot | acc | 0.4187 |
| HumanEval | 0-shot | pass@1 | 0.0000 |
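
The scores above should be roughly reproducible with the harness's Python API; the sketch below assumes a recent lm-eval release that exposes `simple_evaluate`, and shows only the HellaSwag row.

```python
# Re-run the HellaSwag row with lm-evaluation-harness (assumes a recent version).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tohio/slm-125m-instruct,trust_remote_code=True",
    tasks=["hellaswag"],
    num_fewshot=10,
    batch_size=16,
)
print(results["results"]["hellaswag"])  # includes acc_norm among other metrics
```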

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "tohio/slm-125m-instruct",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "tohio/slm-125m-instruct",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "Answer clearly and concisely."},
    {"role": "user", "content": "Explain what a transformer is."},
]

# Render the conversation with the model's chat template and append the
# assistant prompt so generation starts a new turn.
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
)

# The model emits <|endofturn|> at the end of each assistant turn.
endofturn_id = tokenizer.convert_tokens_to_ids("<|endofturn|>")

output = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
    eos_token_id=[tokenizer.eos_token_id, endofturn_id],
)

# Decode only the newly generated tokens, skipping the prompt.
input_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][input_len:], skip_special_tokens=True))
```

`trust_remote_code=True` loads the custom SLM architecture bundled alongside the model weights; no local install of the tohio/slm codebase is required.
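
Since the instruct variant also went through code SFT, the same objects can be prompted for Python. An illustrative follow-up, reusing `model`, `tokenizer`, and `endofturn_id` from the snippet above; the generation settings are only suggestions.

```python
# Ask the already-loaded model for a small piece of Python code.
code_messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
code_inputs = tokenizer.apply_chat_template(
    code_messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
)
code_output = model.generate(
    **code_inputs,
    max_new_tokens=160,
    do_sample=False,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
    eos_token_id=[tokenizer.eos_token_id, endofturn_id],
)
prompt_len = code_inputs["input_ids"].shape[1]
print(tokenizer.decode(code_output[0][prompt_len:], skip_special_tokens=True))
```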

Limitations

  • Scale: At 125M parameters this model is significantly smaller than frontier models. It will underperform on complex reasoning, long-context tasks, and domains not well-represented in the pretraining data.
  • Hallucination: Like all language models, this model can generate plausible-sounding but factually incorrect content. Outputs should not be used as a source of truth without independent verification.
  • Safety: This instruct variant has not undergone DPO or other preference-based alignment (that is the chat variant); supervised fine-tuning alone does not guarantee safe outputs in all contexts, and the model has not undergone red-teaming or adversarial safety evaluation.
  • Languages: Training data is predominantly English. Performance on other languages will be significantly degraded.
  • Code: Code generation is primarily Python-oriented, reflecting the code sub-mix distribution used in pretraining and SFT.

Related

  • slm: full training pipeline (data curation through serving)
  • ai-infra: production Kubernetes serving via vLLM