Paper: [Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time](https://arxiv.org/abs/2203.05482)
This is a merge of pre-trained language models created using mergekit.
This model was merged using the Linear merge method, with SousiOmine/Kuroiso-CR-7B-20250124 as the base.
The following models were included in the merge:

* bunnycore/Blabbertron-1.0
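For intuition, the Linear method is the weighted parameter averaging described in the model soups paper above. Below is a minimal illustrative PyTorch sketch of that averaging, not mergekit's actual implementation; the repo names come from this card's config, and the equal 0.5/0.5 weighting is an assumption for the example (the config assigns `weight: 1` to each model and normalizes).

```python
# Minimal sketch of linear weight merging (model-soups-style averaging).
# Illustrative only; this is not mergekit's implementation.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "SousiOmine/Kuroiso-CR-7B-20250124", torch_dtype=torch.float16
)
other = AutoModelForCausalLM.from_pretrained(
    "bunnycore/Blabbertron-1.0", torch_dtype=torch.float16
)

# Normalized linear coefficients; equal weights are assumed for illustration.
coeffs = [0.5, 0.5]
other_state = other.state_dict()

merged_state = {
    name: coeffs[0] * param + coeffs[1] * other_state[name]
    for name, param in base.state_dict().items()
}

base.load_state_dict(merged_state)      # reuse the base architecture for the merged weights
base.save_pretrained("./merged-model")  # hypothetical output path
```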
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: bunnycore/Blabbertron-1.0
    layer_range: [0, 28]
    parameters:
      weight: 1
merge_method: linear
base_model: SousiOmine/Kuroiso-CR-7B-20250124
dtype: float16
parameters:
  weight: 1
  density: 0.9  # density and gamma are TIES/DARE-style options; the linear method ignores them
  gamma: 0.01
  normalize: true
  int8_mask: true
  # The keys below are generation/serving settings, not standard mergekit merge options.
  random_seed: 0
  temperature: 0.5
  top_p: 0.65
  inference: true
  max_tokens: 999999999
  stream: true
  quantization:  # int8 and int4 entries as a list; duplicate mapping keys are invalid YAML
    - method: int8
      value: 100
    - method: int4
      value: 100
```
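To run a merge like this yourself, save the configuration to a file and hand it to mergekit. A minimal sketch, assuming mergekit is installed (`pip install mergekit`) and using the `MergeConfiguration`/`run_merge` entry points from its example notebook; `config.yaml` and the output directory are hypothetical paths:

```python
# Minimal sketch of driving mergekit from Python; paths are hypothetical.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    "./merged-model",         # hypothetical output directory
    options=MergeOptions(
        cuda=False,           # set True to merge on a GPU
        copy_tokenizer=True,  # copy the base model's tokenizer into the output
    ),
)
```

The same merge can be run from the command line with mergekit's `mergekit-yaml config.yaml ./merged-model`.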