Paper: Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch (arXiv:2311.03099)
This merge aims to improve on the successful V2 model by replacing Lexi with a more uncensored Llama 3.1 model and by raising the density from 0.8 to 1.0.
Merges with higher densities have improved consistently, and an earlier Evolve Merge test found that a density of 1.0 was best for this model configuration.
This model was merged with the DARE TIES merge method, using unsloth/Meta-Llama-3.1-8B-Instruct as the base model.
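To make the density setting concrete, here is a minimal sketch of the DARE drop-and-rescale step as described in the paper: each entry of a task vector (fine-tuned weights minus base weights) is kept with probability `density` and the survivors are rescaled by `1 / density`. The function name and the toy values are hypothetical, and this is an illustration rather than the code the merge tool actually runs.

```python
import random


def dare_drop_and_rescale(delta, density, rng):
    """Keep each delta entry with probability `density`, rescale survivors by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    if density == 1.0:
        # Nothing is dropped, so nothing needs rescaling.
        return list(delta)
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Hypothetical toy weights, not taken from any real model.
base = [0.10, -0.20, 0.30, 0.05]
finetuned = [0.12, -0.25, 0.30, 0.01]
delta = [f - b for f, b in zip(finetuned, base)]

rng = random.Random(0)
full = dare_drop_and_rescale(delta, density=1.0, rng=rng)
sparse = dare_drop_and_rescale(delta, density=0.8, rng=rng)

print("density 1.0:", full)
print("density 0.8:", sparse)
```

At a density of 1.0 the task vector passes through unchanged, so the random pruning part of DARE is effectively switched off and only the sign-consensus step of TIES still changes the result.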
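The following models were included in the merge:
- akjindal53244/Llama-3.1-Storm-8B
- arcee-ai/Llama-3.1-SuperNova-Lite
- SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA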
The following YAML configuration was used to produce this model:
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
dtype: bfloat16
merge_method: dare_ties
slices:
- sources:
  - layer_range: [0, 32]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      density: 1.0
      weight: 0.25
  - layer_range: [0, 32]
    model: arcee-ai/Llama-3.1-SuperNova-Lite
    parameters:
      density: 1.0
      weight: 0.33
  - layer_range: [0, 32]
    model: SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA
    parameters:
      density: 1.0
      weight: 0.42
  - layer_range: [0, 32]
    model: unsloth/Meta-Llama-3.1-8B-Instruct
tokenizer_source: base
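As a rough illustration of how the three weights in this configuration interact, the sketch below merges a single scalar parameter with a simplified TIES-style sign election: each model's weighted delta from the base is computed, the dominant sign is elected from their sum, and only the deltas that agree with that sign are added back to the base. The function name and the parameter values are hypothetical, normalization options are left out, and the merge tool's real implementation may differ in its details.

```python
def merge_parameter(base, tuned, weights):
    """Merge one scalar parameter from several fine-tuned models (simplified)."""
    # Weighted task vectors: how far each model moved this parameter from the base.
    deltas = [w * (t - base) for t, w in zip(tuned, weights)]
    # Elect the dominant sign from the sum of the weighted deltas.
    sign = 1.0 if sum(deltas) >= 0 else -1.0
    # Keep only the deltas that agree with the elected sign.
    agreed = [d for d in deltas if d * sign > 0]
    return base + sum(agreed)


# Weights from the configuration: Storm, SuperNova-Lite, Unaligned_BETA.
weights = [0.25, 0.33, 0.42]

# Hypothetical values of one parameter in the base and the three fine-tunes.
base_value = 0.10
tuned_values = [0.14, 0.08, 0.20]

merged = merge_parameter(base_value, tuned_values, weights)
print("sum of weights:", sum(weights))
print("merged value:", merged)
```

In this toy case the second model pulls the parameter in the opposite direction from the other two, so its contribution is discarded. The base model entry in the configuration carries no weight of its own because its delta from itself is zero.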
Detailed results can be found here! Summarized results can be found here!
| Metric | Value |
|---|---|
| Avg. | 28.44 |
| IFEval (0-Shot) | 77.86 |
| BBH (3-Shot) | 29.56 |
| MATH Lvl 5 (4-Shot) | 14.65 |
| GPQA (0-Shot) | 6.26 |
| MuSR (0-Shot) | 11.09 |
| MMLU-PRO (5-Shot) | 31.25 |
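The Avg. row appears to be the unweighted mean of the six benchmark scores. That is an assumption based on the numbers in the table, and it can be checked quickly:

```python
# Scores copied from the table above.
scores = {
    "IFEval": 77.86,
    "BBH": 29.56,
    "MATH Lvl 5": 14.65,
    "GPQA": 6.26,
    "MuSR": 11.09,
    "MMLU-PRO": 31.25,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"mean of {len(scores)} benchmarks: {average:.3f}")
```

The result agrees with the reported 28.44 to within the rounding of the individual scores.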