Nohobby/MS3-test-Merge-1

Text Generation

text-generation-inference


Nohobby committed on Feb 3, 2025

Commit a4279a6 · verified · Parent: 8e8a39b

Update README.md

Files changed (1)

README.md +30 -15

README.md CHANGED

@@ -8,29 +8,44 @@ tags:
 ---
 # merge
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-## Merge Details
-### Merge Method
-This model was merged using the [Linear DELLA](https://arxiv.org/abs/2406.11617) merge method with [Nohobby/ignore_MS3-test-UNHOLY](https://huggingface.co/Nohobby/ignore_MS3-test-UNHOLY) as the base.
-### Models Merged
-The following models were included in the merge:
-* [unsloth/Mistral-Small-24B-Instruct-2501](https://huggingface.co/unsloth/Mistral-Small-24B-Instruct-2501)
-### Configuration
-The following YAML configuration was used to produce this model:
 ```yaml
 dtype: bfloat16
 tokenizer_source: base
 merge_method: della_linear
 parameters:
   density: 0.55
-base_model: Nohobby/ignore_MS3-test-UNHOLY
 models:
   - model: unsloth/Mistral-Small-24B-Instruct-2501
     parameters:
@@ -46,7 +61,7 @@ models:
         - filter: down_proj
           value: [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
         - value: 0
-  - model: Nohobby/ignore_MS3-test-UNHOLY
     parameters:
       weight:
         - filter: v_proj
@@ -60,4 +75,4 @@ models:
         - filter: down_proj
           value: [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
         - value: 1
-```
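Both the removed configuration and Step2 use `della_linear` with `density: 0.55`. As a rough mental model, here is a minimal pure-Python sketch of a DELLA-style linear merge on flat lists of numbers: each model's delta from the base is pruned stochastically, with larger-magnitude entries more likely to survive; survivors are rescaled by their keep probability; and the results are added to the base as a weighted sum. The `_linear` variant is assumed here to skip sign election. This is a simplified illustration, not mergekit's implementation. The linear rank-to-probability rule, the `epsilon` spread, the `normalize` switch and the names `magprune`/`della_linear` are all assumptions made for the example.

```python
import random


def magprune(delta, density, epsilon, rng):
    """Simplified DELLA-style magnitude-aware pruning of a flat list of deltas.

    Entries are ranked by |delta|; keep probabilities rise linearly with rank,
    centered on `density` with a total spread of `epsilon` (assumed rule).
    Survivors are divided by their keep probability so each entry is
    unbiased in expectation.
    """
    n = len(delta)
    order = sorted(range(n), key=lambda i: abs(delta[i]))  # smallest magnitude first
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    pruned = []
    for i, d in enumerate(delta):
        centered = (rank[i] / (n - 1) - 0.5) if n > 1 else 0.0  # in [-0.5, 0.5]
        p_keep = min(1.0, max(0.0, density + epsilon * centered))
        if p_keep > 0.0 and rng.random() < p_keep:
            pruned.append(d / p_keep)  # keep and rescale
        else:
            pruned.append(0.0)  # drop
    return pruned


def della_linear(base, models, weights, density, epsilon, normalize=True, seed=0):
    """Base plus a weighted sum of pruned task vectors (no sign election)."""
    rng = random.Random(seed)
    total = sum(weights)
    scale = 1.0 / total if normalize and total != 0 else 1.0
    merged = list(base)
    for params, w in zip(models, weights):
        delta = [p - b for p, b in zip(params, base)]
        for i, d in enumerate(magprune(delta, density, epsilon, rng)):
            merged[i] += w * scale * d
    return merged


if __name__ == "__main__":
    base = [0.0, 0.1, -0.2, 0.3]
    tuned = [0.2, 0.1, -0.5, 0.35]
    # epsilon=0.1 is a hypothetical value; the config above only sets density.
    print(della_linear(base, [tuned], [1.0], density=0.55, epsilon=0.1))
```

With `density=1.0` and `epsilon=0.0` every entry is kept unscaled, so the sketch reduces to plain task arithmetic, which makes a convenient sanity check.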

 ---
 # merge
+I hadn't tried the untuned MS3 before messing around with the merge, but I don't think this merge is all that different from it. It's not that the tuned adapters have no influence at all; it's just less than I expected. That might be for the better, though: the result is usable as is.
+I'll use this as part of upcoming merges when there is enough fuel.
+## Merge Details
+### Step1
+```yaml
+models:
+  - model: unsloth/Mistral-Small-24B-Base-2501
+  - model: unsloth/Mistral-Small-24B-Instruct-2501+ToastyPigeon/new-ms-rp-test-ws
+    parameters:
+        select_topk:
+          - value: [0.05, 0.03, 0.02, 0.02, 0.01]
+  - model: unsloth/Mistral-Small-24B-Instruct-2501+estrogen/MS2501-24b-Ink-ep2-adpt
+    parameters:
+        select_topk: 0.1
+  - model: trashpanda-org/MS-24B-Instruct-Mullein-v0
+    parameters:
+        select_topk: 0.4
+base_model: unsloth/Mistral-Small-24B-Base-2501
+merge_method: sce
+parameters:
+  int8_mask: true
+  rescale: true
+  normalize: true
+dtype: bfloat16
+tokenizer_source: base
+```
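Step1 uses `sce`. In mergekit's model syntax, an entry such as `unsloth/Mistral-Small-24B-Instruct-2501+ToastyPigeon/new-ms-rp-test-ws` denotes the Instruct model with that LoRA adapter applied. The sketch below illustrates the three SCE stages (Select, Calculate, Erase) on flat lists. It keeps the positions whose deltas vary most across models, weights each model by the mean squared value of its selected delta, and drops per-position contributions whose sign disagrees with the summed delta. It is an illustrative sketch under stated assumptions, not mergekit's code. It applies one `select_topk` fraction to every model, whereas the config sets it per model (and as a per-layer list for one model). The `normalize` handling and the name `sce_merge` are also assumptions.

```python
import statistics


def sce_merge(base, models, select_topk, normalize=True):
    """Sketch of SCE (Select, Calculate, Erase) on flat parameter lists."""
    n = len(base)
    deltas = [[p - b for p, b in zip(m, base)] for m in models]

    # Select: keep the `select_topk` fraction of positions whose deltas vary most across models.
    variances = [statistics.pvariance([d[i] for d in deltas]) for i in range(n)]
    k = int(n * select_topk)
    keep = set(sorted(range(n), key=lambda i: variances[i], reverse=True)[:k])
    deltas = [[d[i] if i in keep else 0.0 for i in range(n)] for d in deltas]

    # Calculate: per-model weight proportional to its mean squared selected delta.
    energy = [sum(x * x for x in d) / n for d in deltas]
    total = sum(energy)
    if total > 0:
        weights = [e / total for e in energy]
    else:
        weights = [1.0 / len(deltas)] * len(deltas)

    # Erase: per position, drop deltas whose sign disagrees with the summed delta.
    merged = list(base)
    for i in range(n):
        s = sum(d[i] for d in deltas)
        sign = (s > 0) - (s < 0)
        kept = [(w, d[i]) for w, d in zip(weights, deltas) if d[i] * sign > 0]
        if not kept:
            continue  # no agreeing contribution; position stays at the base value
        num = sum(w * x for w, x in kept)
        den = sum(w for w, _ in kept) if normalize else 1.0
        merged[i] += num / den
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    a = [1.0, 2.0, 0.0, 0.0]   # hypothetical fine-tune A
    b = [1.0, -2.0, 0.0, 0.0]  # hypothetical fine-tune B
    print(sce_merge(base, [a, b], select_topk=1.0))
```

In the toy example the two models agree on the first position, so it survives, while the conflicting second position is erased and stays at the base value.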
+### Step2
 ```yaml
 dtype: bfloat16
 tokenizer_source: base
 merge_method: della_linear
 parameters:
   density: 0.55
+base_model: Step1
 models:
   - model: unsloth/Mistral-Small-24B-Instruct-2501
     parameters:
         - filter: down_proj
           value: [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
         - value: 0
+  - model: Step1
     parameters:
       weight:
         - filter: v_proj
         - filter: down_proj
           value: [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
         - value: 1
+```
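The `weight` lists in Step2 act as gradients: a list of anchor values is spread over the depth of the network, and `filter` entries pick which list applies to a given tensor. The sketch below shows one plausible way to resolve a weight for a tensor. It assumes the first matching `filter` wins, an entry without `filter` is the default, and `t` runs linearly from 0 at the first layer to 1 at the last. The 40-layer count, the `model.layers.N.mlp.down_proj.weight` naming and the helper names are assumptions. The v_proj entries are left out because their values are not visible in this diff, and the Instruct lists are assumed to sit under `weight:` as Step1's visibly do.

```python
import fnmatch


def interpolate_gradient(anchors, t):
    """Linearly interpolate a list of anchor values at t in [0, 1]."""
    if len(anchors) == 1:
        return anchors[0]
    scaled = t * (len(anchors) - 1)
    i0 = min(int(scaled), len(anchors) - 2)
    frac = scaled - i0
    return (1 - frac) * anchors[i0] + frac * anchors[i0 + 1]


def resolve(entries, tensor_name, t):
    """Value of the first entry whose filter matches; no filter means catch-all."""
    for entry in entries:
        flt = entry.get("filter")
        if flt is None or fnmatch.fnmatchcase(tensor_name, f"*{flt}*"):
            value = entry["value"]
            return interpolate_gradient(value, t) if isinstance(value, list) else value
    raise KeyError(f"no entry matches {tensor_name!r}")


# down_proj and default entries copied from Step2 (v_proj entries omitted: not visible in the diff).
INSTRUCT_WEIGHT = [
    {"filter": "down_proj", "value": [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]},
    {"value": 0},
]
STEP1_WEIGHT = [
    {"filter": "down_proj", "value": [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]},
    {"value": 1},
]
NUM_LAYERS = 40  # assumed layer count


def layer_t(layer_idx, num_layers=NUM_LAYERS):
    """Assumed mapping from layer index to gradient position t."""
    return layer_idx / (num_layers - 1)


if __name__ == "__main__":
    for layer in (0, 4, 8, 12, 39):
        name = f"model.layers.{layer}.mlp.down_proj.weight"  # assumed tensor naming
        t = layer_t(layer)
        w_instruct = resolve(INSTRUCT_WEIGHT, name, t)
        w_step1 = resolve(STEP1_WEIGHT, name, t)
        print(layer, round(w_instruct, 3), round(w_step1, 3))
```

Because the two down_proj lists are complementary at every anchor (1 and 0, or 0 and 1), linear interpolation keeps their sum at 1 for every layer, and the Instruct model's down_proj weight peaks around the first and third anchors.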