SmolLM-135M-SLERP-Merge

Overview

A SLERP (Spherical Linear Interpolation) merge of SmolLM-135M (base) and SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities with instruction-following abilities, creating a balanced model that inherits strengths from both parents.

Key Features

  • Dual heritage: Combines base + instruct capabilities
  • SLERP merge: Uses spherical interpolation for better weight preservation
  • Ultra-small: Only ~513 MB total
  • No training required: Pure weight-space interpolation

Merge Details

Why SLERP?

Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially degrading model quality. SLERP interpolates along the surface of a hypersphere, preserving vector magnitudes while smoothly transitioning between the two models. This typically produces higher quality merges.
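The idea can be sketched in a few lines of NumPy. This is an illustrative implementation of SLERP on flattened weight vectors, not the exact code mergekit uses; the near-parallel fallback to lerp is a common numerical-stability choice.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    # Compute the angle between the two weight vectors via normalized copies.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    omega = np.arccos(dot)           # angle between the vectors
    if np.abs(np.sin(omega)) < eps:  # nearly parallel: fall back to lerp
        return (1 - t) * v0 + t * v1
    so = np.sin(omega)
    # Interpolate along the great-circle arc between the two vectors,
    # which preserves magnitude better than straight-line (lerp) blending.
    return (np.sin((1 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

For two unit vectors, lerp at t=0.5 shrinks the result below unit length, while SLERP keeps it on the sphere.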

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Merge Recipe (for reproduction with mergekit)

slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
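Assuming the recipe above is saved as a YAML file (the filename here is illustrative), it can be executed with mergekit's command-line tool:

```shell
pip install mergekit
mergekit-yaml slerp-config.yml ./SmolLM-135M-SLERP-Merge
```

This downloads both parent models, interpolates the weights layer by layer, and writes the merged checkpoint to the output directory.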

Parent Models

| Model | Role | Description |
|---|---|---|
| HuggingFaceTB/SmolLM-135M | Base | Raw language modeling |
| HuggingFaceTB/SmolLM-135M-Instruct | Instruct | Instruction following |