DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
Paper: arXiv 2406.11617
This is a merge of pre-trained language models created using mergekit.
It was merged with the Linear DELLA (della_linear) merge method, using BioMistral/BioMistral-7B as the base model.
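Per the paper title and the epsilon comment in the config below, DELLA samples which delta (task-vector) parameters to drop based on their magnitude, then rescales the survivors. A minimal NumPy sketch of that sampling step, assuming a linear rank-to-keep-probability mapping within density ± epsilon (the function name and the exact mapping are illustrative, not mergekit's implementation):

```python
import numpy as np

def della_prune(delta, density=0.7, epsilon=0.1, rng=None):
    """Illustrative magnitude-based sampling of a delta tensor (not mergekit's code)."""
    rng = np.random.default_rng(0) if rng is None else rng
    flat = delta.ravel()
    n = flat.size
    # Rank parameters by magnitude: higher magnitude -> higher keep probability.
    order = np.argsort(np.abs(flat))            # ascending magnitude
    ranks = np.empty(n)
    ranks[order] = np.arange(n)
    # Keep probability varies linearly over [density - epsilon, density + epsilon].
    keep_p = density - epsilon + 2.0 * epsilon * ranks / max(n - 1, 1)
    keep_p = np.clip(keep_p, 0.0, 1.0)
    mask = rng.random(n) < keep_p
    # Rescale survivors by 1/keep_p so the expected value of each entry is unchanged.
    pruned = np.where(mask, flat / keep_p, 0.0)
    return pruned.reshape(delta.shape)
```

With density: 1 and epsilon: 0.1 as in this card's config, keep probabilities fall in [0.9, 1.0] after clipping, so only a small fraction of the lowest-magnitude deltas can be dropped.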
The following models were included in the merge:
* OdiaGenAI/mistral_hindi_7b_base_v1
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: BioMistral/BioMistral-7B
    parameters:
      density: 1     # fraction of weights in differences from the base model to retain
      epsilon: 0.1   # maximum change in drop probability based on magnitude (range: density ± epsilon)
      weight: 1
      lambda: 0.9
merge_method: della_linear
base_model: BioMistral/BioMistral-7B
parameters:
  density: 1     # fraction of weights in differences from the base model to retain
  epsilon: 0.1   # maximum change in drop probability (range: density ± epsilon; ensure 0 <= value <= 1)
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
```
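The combine step of della_linear can be sketched per tensor as a weighted sum of the pruned task vectors added back to the base weights. Parameter names (weight, lambda, normalize) mirror the YAML above, but this is an illustrative simplification, not mergekit's actual implementation:

```python
import numpy as np

def della_linear_merge(base, deltas, weights, lam=0.9, normalize=True):
    """Illustrative per-tensor combine: base + lambda * weighted sum of deltas."""
    acc = np.zeros_like(base, dtype=float)
    for delta, w in zip(deltas, weights):
        acc = acc + w * delta          # each delta is an already-pruned task vector
    if normalize:
        acc = acc / sum(weights)       # corresponds to normalize: true in the config
    return base + lam * acc            # lambda scales the merged task vector
```

With a single model at weight 1 and lambda 0.9, as configured here, the merged tensor is base + 0.9 × delta.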