---
base_model: []
library_name: transformers
tags:
- mergekit
- merge
---
|
|
|
|
|
Use the ChatML or Mistral Nemo instruct prompt format.
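As a sketch of what the ChatML side of that looks like: each turn is wrapped in `<|im_start|>` / `<|im_end|>` markers. In practice you would use the tokenizer's built-in chat template (`tokenizer.apply_chat_template`), but the raw layout is roughly:

```python
# Minimal sketch of a single-turn ChatML prompt.
# The helper name is illustrative, not part of any library.

def chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt ready for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("You are a helpful assistant.", "Hello!"))
```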
|
|
|
|
|
Conclusion: these merge methods tend to work better when at least one model has a much higher weight than the rest.
|
|
|
|
|
After further testing, this is the best Nemo model I have ever used.
|
|
|
|
|
### Configuration |
|
|
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
|
|
```yaml
models:
  - model: mistral-nemo-gutenberg-12B-v4
    parameters:
      weight: 0.2
  - model: Violet_Twilight-v0.2
    parameters:
      weight: 0.3
  - model: Lyra-Gutenberg-mistral-nemo-12B
    parameters:
      weight: 0.5
  - model: Grey-12b
    parameters:
      weight: 0.2
base_model: Mistral-Nemo-Base-2407
parameters:
  density: 0.5
  epsilon: 0.1
  lambda: 1.1
  normalize: false
  int8_mask: true
  rescale: true
merge_method: della_linear
tokenizer:
  source: union
dtype: bfloat16
```
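Assuming mergekit is installed (`pip install mergekit`) and the configuration above is saved locally (the `config.yaml` and output paths below are illustrative), the merge can be reproduced with the `mergekit-yaml` CLI:

```shell
# Run the merge described by the config; --cuda uses a GPU if one is available.
mergekit-yaml config.yaml ./merged-model --cuda
```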
|
|
|