# SmolLM-135M-SLERP-Merge

## Overview
A SLERP (Spherical Linear Interpolation) merge of SmolLM-135M (base) and SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities with instruction-following abilities, creating a balanced model that inherits strengths from both parents.
## Key Features
- Dual heritage: Combines base + instruct capabilities
- SLERP merge: Uses spherical interpolation for better weight preservation
- Ultra-small: Only ~513 MB total
- No training required: Pure weight-space interpolation
## Merge Details
- Model A (base): HuggingFaceTB/SmolLM-135M
- Model B (instruct): HuggingFaceTB/SmolLM-135M-Instruct
- Method: SLERP (Spherical Linear Interpolation)
- Interpolation factor: t=0.6 (60% instruct, 40% base)
- Weight matrices: SLERP interpolation
- Biases/norms: Linear interpolation
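The weight-vs-bias rule above can be sketched in a few lines of numpy. This is an illustrative sketch, not the actual merge script; `slerp_flat` and `merge_state_dicts` are hypothetical helper names:

```python
import numpy as np

def slerp_flat(t, a, b, eps=1e-8):
    """SLERP between two tensors of the same shape, treated as flat vectors."""
    af, bf = a.ravel(), b.ravel()
    na, nb = np.linalg.norm(af), np.linalg.norm(bf)
    dot = np.clip(np.dot(af / na, bf / nb), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two weight directions
    if np.sin(omega) < eps:  # nearly colinear: fall back to plain lerp
        return (1 - t) * a + t * b
    # Interpolate the direction on the unit hypersphere...
    unit = (np.sin((1 - t) * omega) * af / na + np.sin(t * omega) * bf / nb) / np.sin(omega)
    # ...and the magnitude linearly, so the result's norm is preserved.
    return (((1 - t) * na + t * nb) * unit).reshape(a.shape)

def merge_state_dicts(sd_base, sd_instruct, t=0.6):
    """SLERP for weight matrices (ndim >= 2), lerp for 1-D biases/norms."""
    merged = {}
    for name, wa in sd_base.items():
        wb = sd_instruct[name]
        if wa.ndim >= 2:
            merged[name] = slerp_flat(t, wa, wb)      # weight matrix -> SLERP
        else:
            merged[name] = (1 - t) * wa + t * wb      # bias / norm -> lerp
    return merged
```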
## Why SLERP?
Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially degrading model quality. SLERP interpolates along the surface of a hypersphere, preserving vector magnitudes while smoothly transitioning between the two models. This typically produces higher quality merges.
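The magnitude argument can be checked numerically. The sketch below (illustrative only, not the merge script) interpolates two orthogonal unit vectors, standing in for two weight vectors that point in different directions: lerp at t=0.5 shrinks the norm to about 0.71, while SLERP keeps it at 1.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical interpolation along the great-circle arc between v0 and v1."""
    dot = np.clip(np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1)), -1.0, 1.0)
    omega = np.arccos(dot)            # angle between the two vectors
    if np.sin(omega) < eps:           # nearly colinear: fall back to lerp
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

v0 = np.array([1.0, 0.0])
v1 = np.array([0.0, 1.0])

lerped = 0.5 * v0 + 0.5 * v1
slerped = slerp(0.5, v0, v1)

print(np.linalg.norm(lerped))   # ~0.707: lerp shrinks the magnitude
print(np.linalg.norm(slerped))  # ~1.0: SLERP stays on the unit sphere
```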
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
# Enable sampling so that `temperature` takes effect.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Merge Recipe (for reproduction with mergekit)

```yaml
slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
```
## Parent Models
| Model | Role | Description |
|---|---|---|
| HuggingFaceTB/SmolLM-135M | Base | Raw language modeling |
| HuggingFaceTB/SmolLM-135M-Instruct | Instruct | Instruction following |