Paper: [Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch](https://arxiv.org/abs/2311.03099) (arXiv:2311.03099)
This is a merge of pre-trained language models created using mergekit.
This model was merged using the DARE TIES merge method, with Novaciano/Think.NPC-1B as the base.
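For intuition, DARE first sparsifies each fine-tuned model's "task vector" (its delta from the base) by randomly dropping entries and rescaling the survivors, so the expected delta is preserved. Below is a minimal PyTorch sketch of that drop-and-rescale step, assuming plain dense tensors; the function name is illustrative, not a mergekit internal:

```python
import torch

def dare_drop_and_rescale(base: torch.Tensor, finetuned: torch.Tensor,
                          density: float) -> torch.Tensor:
    """Keep roughly `density` of the task-vector entries at random and
    rescale survivors by 1/density so the expected delta is unchanged."""
    delta = finetuned - base                 # task vector
    keep = torch.rand_like(delta) < density  # Bernoulli keep-mask
    return delta * keep / density            # rescale surviving entries

# With density: 0.32, as in the config below, ~68% of each delta is dropped.
```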
The following models were included in the merge:
* distil-labs/Distil-NPC-gemma-3-1b-it
* wexyyyyyy/gemma-3-1b-it-heretic
The following YAML configuration was used to produce this model:
```yaml
merge_method: dare_ties
dtype: float16
out_dtype: float16
base_model: Novaciano/Think.NPC-1B
models:
  - model: distil-labs/Distil-NPC-gemma-3-1b-it
    parameters:
      weight: 0.45
      density: 0.32
  - model: wexyyyyyy/gemma-3-1b-it-heretic
    parameters:
      weight: 0.35
      density: 0.32
parameters:
  t: 0.25 # less interpolation → more base-model dominance
  lambda: -0.62 # more negative to kill off any residual alignment
  normalize: false
  rescale: true
  rescale_factor: 1.28 # bumped up a touch to amplify the trash and degeneration
  memory_efficient: true
  low_cpu_mem_usage: true
layer_range:
  - value: [5, 22] # protecting the embeddings and lm_head more
tie_word_embeddings: true
tie_output_embeddings: true
```
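As a rough sketch of how the per-model `weight` and `density` values combine under `dare_ties`: each drop-and-rescaled task vector is scaled by its weight, a TIES-style sign election keeps only the parameter entries whose sign agrees with the weighted majority, and the surviving sum is added back to the base. The code below is a simplification under those assumptions, not mergekit's actual per-tensor implementation:

```python
import torch

def dare_ties_merge(base, deltas, weights):
    """base: a base-model tensor; deltas: drop-and-rescaled task vectors
    (see the sketch above); weights: per-model weights (0.45 and 0.35 here)."""
    weighted = [w * d for w, d in zip(weights, deltas)]
    total = torch.stack(weighted).sum(dim=0)
    sign = total.sign()  # elected sign per parameter
    # zero out contributions that disagree with the elected sign
    agreed = [t.where(t.sign() == sign, torch.zeros_like(t)) for t in weighted]
    return base + torch.stack(agreed).sum(dim=0)
```

The configuration itself is applied with mergekit's command-line entry point, e.g. `mergekit-yaml config.yaml ./output-model-directory`.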