Paper: DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling (arXiv:2406.11617)
NemoMix-12B-DellaV1a is an experimental merge of the following models using the DELLA merge method in mergekit:
EDIT: There seem to be tokenizer issues. I'm guessing I would have to merge with the base model instead of the instruct model. Don't bother.
```yaml
models:
  - model: BeaverAI/mistral-doryV2-12b
    parameters:
      weight: 0.20
      density: 0.42
  - model: NeverSleep/Lumimaid-v0.2-12B
    parameters:
      weight: 0.22
      density: 0.54
  - model: intervitens/mini-magnum-12b-v1.1
    parameters:
      weight: 0.24
      density: 0.66
  - model: grimjim/mistralai-Mistral-Nemo-Instruct-2407
    parameters:
      weight: 0.34
      density: 0.78
merge_method: della
base_model: grimjim/mistralai-Mistral-Nemo-Instruct-2407
parameters:
  int8_mask: true
  epsilon: 0.1
  lambda: 1.0
  density: 0.7
dtype: bfloat16
```
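For intuition on what `density` and `epsilon` control: DELLA keeps each delta parameter with a probability that grows with its magnitude, centered on `density` and spread over a window of width `epsilon`, then rescales the survivors to keep the expected update unchanged. Below is a rough NumPy sketch of that sampling step; it is an illustration of the idea, not mergekit's exact implementation, and the function name and rank-to-probability mapping are assumptions.

```python
import numpy as np

def della_drop(delta, density, epsilon):
    """Magnitude-based stochastic pruning of a delta tensor (illustrative sketch).

    Keep probabilities span [density - epsilon/2, density + epsilon/2],
    assigned by magnitude rank so larger deltas are more likely to survive.
    Kept entries are rescaled by 1/keep_p so the expected value is preserved.
    """
    n = delta.size
    # rank 0 = smallest magnitude, rank n-1 = largest magnitude
    ranks = np.argsort(np.argsort(np.abs(delta.ravel()))).reshape(delta.shape)
    # map ranks linearly onto the keep-probability window around `density`
    keep_p = density - epsilon / 2 + epsilon * ranks / (n - 1)
    mask = np.random.rand(*delta.shape) < keep_p
    # inverse-propensity rescale of surviving entries; dropped entries become 0
    return np.where(mask, delta / keep_p, 0.0)
```

With the config above (`density: 0.7`, `epsilon: 0.1`), each delta would be kept with probability between 0.65 and 0.75 depending on its magnitude rank.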