SmolLM-135M-SLERP-Merge

Overview

A SLERP (Spherical Linear Interpolation) merge of SmolLM-135M (base) and SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities with instruction-following abilities, creating a balanced model that inherits strengths from both parents.

Key Features

  • Dual heritage: Combines base + instruct capabilities
  • SLERP merge: Uses spherical interpolation for better weight preservation
  • Ultra-small: Only ~513 MB total
  • No training required: Pure weight-space interpolation

Merge Details

Why SLERP?

Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially degrading model quality. SLERP interpolates along the surface of a hypersphere, preserving vector magnitudes while smoothly transitioning between the two models. This typically produces higher quality merges.
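The idea can be sketched in a few lines of NumPy. This is an illustrative implementation of SLERP on flattened weight vectors, not the exact code mergekit uses; the near-parallel fallback to lerp is a common numerical-stability choice.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    # Compute the angle between the two weight vectors via normalized copies.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    omega = np.arccos(dot)           # angle between the vectors
    if np.abs(np.sin(omega)) < eps:  # nearly parallel: fall back to lerp
        return (1 - t) * v0 + t * v1
    so = np.sin(omega)
    # Interpolate along the great-circle arc between the two vectors,
    # which preserves magnitude better than straight-line (lerp) blending.
    return (np.sin((1 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

For two unit vectors, lerp at t=0.5 shrinks the result below unit length, while SLERP keeps it on the sphere.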

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Merge Recipe (for reproduction with mergekit)

slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
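Assuming the recipe above is saved as a YAML file (the filename here is illustrative), it can be executed with mergekit's command-line tool:

```shell
pip install mergekit
mergekit-yaml slerp-config.yml ./SmolLM-135M-SLERP-Merge
```

This downloads both parent models, interpolates the weights layer by layer, and writes the merged checkpoint to the output directory.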

Parent Models

| Model | Role | Description |
|---|---|---|
| HuggingFaceTB/SmolLM-135M | Base | Raw language modeling |
| HuggingFaceTB/SmolLM-135M-Instruct | Instruct | Instruction following |