mllm-dev committed on
Commit 4b53cc6 · verified · 1 Parent(s): 2fcf3bd

Upload folder using huggingface_hub

README.md CHANGED
````diff
@@ -1,10 +1,10 @@
 ---
 base_model:
 - mllm-dev/gpt2_f_experiment_4
+- mllm-dev/gpt2_f_experiment_2
 - mllm-dev/gpt2_f_experiment_1
 - mllm-dev/gpt2_f_experiment_0
 - mllm-dev/gpt2_f_experiment_3
-- mllm-dev/gpt2_f_experiment_2
 library_name: transformers
 tags:
 - mergekit
@@ -18,24 +18,29 @@ This is a merge of pre-trained language models created using [mergekit](https://
 ## Merge Details
 ### Merge Method
 
-This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method.
+This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [mllm-dev/gpt2_f_experiment_0](https://huggingface.co/mllm-dev/gpt2_f_experiment_0) as a base.
 
 ### Models Merged
 
 The following models were included in the merge:
 * [mllm-dev/gpt2_f_experiment_4](https://huggingface.co/mllm-dev/gpt2_f_experiment_4)
+* [mllm-dev/gpt2_f_experiment_2](https://huggingface.co/mllm-dev/gpt2_f_experiment_2)
 * [mllm-dev/gpt2_f_experiment_1](https://huggingface.co/mllm-dev/gpt2_f_experiment_1)
-* [mllm-dev/gpt2_f_experiment_0](https://huggingface.co/mllm-dev/gpt2_f_experiment_0)
 * [mllm-dev/gpt2_f_experiment_3](https://huggingface.co/mllm-dev/gpt2_f_experiment_3)
-* [mllm-dev/gpt2_f_experiment_2](https://huggingface.co/mllm-dev/gpt2_f_experiment_2)
 
 ### Configuration
 
 The following YAML configuration was used to produce this model:
 
 ```yaml
+base_model:
+  model:
+    path: mllm-dev/gpt2_f_experiment_0
 dtype: float16
-merge_method: linear
+merge_method: dare_ties
+parameters:
+  int8_mask: 1.0
+  normalize: 1.0
 slices:
 - sources:
   - layer_range: [0, 12]
@@ -43,29 +48,34 @@ slices:
       model:
         path: mllm-dev/gpt2_f_experiment_0
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_1
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_2
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_3
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_4
     parameters:
+      density: 1.0
       weight: 1.0
 ```
````
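For context on the README change above: `dare_ties` first applies DARE's random drop-and-rescale to each model's task vector (its delta from the base) before TIES-style sign election. A minimal Python sketch of just the drop-and-rescale step, under the `density: 1.0` setting used here; the function name is illustrative and is not mergekit's API:

```python
import random

def dare_drop_and_rescale(delta, density, rng):
    """DARE: drop each task-vector entry with probability (1 - density),
    then rescale survivors by 1/density so the expected delta is preserved."""
    if density <= 0.0:
        return [0.0 for _ in delta]
    return [d / density if rng.random() < density else 0.0 for d in delta]

# With density: 1.0, as in this config, every delta survives unchanged,
# so the DARE step is effectively a no-op before the TIES merge.
delta = [0.5, -0.25, 0.125]
assert dare_drop_and_rescale(delta, 1.0, random.Random(0)) == delta
```

At lower densities the same function sparsifies the deltas, which is where DARE's pruning benefit comes from.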
config.json CHANGED
```diff
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "mllm-dev/gpt2_f_experiment_4",
+  "_name_or_path": "mllm-dev/gpt2_f_experiment_0",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2ForSequenceClassification"
```
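The `config.json` change only repoints `_name_or_path` at the new base model. A trivial stdlib sketch of reading that field, using a stand-in snippet rather than the full file:

```python
import json

# Stand-in for the repo's config.json; only the changed field is shown.
config_text = '{"_name_or_path": "mllm-dev/gpt2_f_experiment_0", "activation_function": "gelu_new"}'
config = json.loads(config_text)
assert config["_name_or_path"] == "mllm-dev/gpt2_f_experiment_0"
```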
mergekit_config.yml CHANGED
```diff
@@ -1,5 +1,11 @@
+base_model:
+  model:
+    path: mllm-dev/gpt2_f_experiment_0
 dtype: float16
-merge_method: linear
+merge_method: dare_ties
+parameters:
+  int8_mask: 1.0
+  normalize: 1.0
 slices:
 - sources:
   - layer_range: [0, 12]
@@ -7,28 +13,33 @@ slices:
       model:
         path: mllm-dev/gpt2_f_experiment_0
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_1
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_2
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_3
     parameters:
+      density: 1.0
       weight: 1.0
   - layer_range: [0, 12]
     model:
       model:
         path: mllm-dev/gpt2_f_experiment_4
     parameters:
+      density: 1.0
       weight: 1.0
```
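The `normalize: 1.0` setting in the config above means per-model weights are divided by their sum before parameters are combined. A small sketch of that weight-normalization step in isolation; this shows only the linear-combination view, not the DARE/TIES sign handling, and the function name is illustrative:

```python
def normalized_linear_merge(tensors, weights):
    """Combine per-model parameter lists as a weighted average.
    With normalize: 1.0, weights are divided by their sum, so five
    models at weight 1.0 each contribute equally (0.2 apiece)."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * t[i] for w, t in zip(norm, tensors))
            for i in range(len(tensors[0]))]

# Two models at equal weight: a plain average of each parameter.
assert normalized_linear_merge([[1.0, 2.0], [3.0, 6.0]], [1.0, 1.0]) == [2.0, 4.0]
```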
model-00001-of-00001.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:109d22198c42220534f2b55ff9566334f14c2d3c6976f90d83b3d654b92dbc74
+oid sha256:11322e145e7b61665593903f460c972df3374b662f8ac11f087211938f7fd91c
 size 248902264
```
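The safetensors change is a Git LFS pointer update: the `oid` line records the SHA-256 of the new weight blob. A stdlib sketch of how such an oid would be recomputed from downloaded bytes (shown on small test vectors, not the real 248 MB file):

```python
import hashlib

def lfs_oid(data: bytes) -> str:
    """Git LFS pointer files record the blob's SHA-256 as 'oid sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# Known SHA-256 test vectors; a real check would hash the downloaded
# model-00001-of-00001.safetensors and compare against the pointer's oid.
assert lfs_oid(b"") == (
    "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
```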