FINGU-AI committed on
Commit d2a5483 · verified · 1 parent: 5b124b5

Update README.md

Files changed (1): README.md (+41 −52)
README.md CHANGED
@@ -2,73 +2,62 @@
  library_name: transformers
  license: mit
  ---
- ### Model Card for **Phi-4 Merge Model**

- #### **Model Description**
- This is a merged model combining pre-trained language models with specific configurations and parameters, created using MergeKit. The merge focuses on balancing reasoning, stock capabilities, and performance enhancements derived from both parent models.

  ---

- ### **Merge Details**

- **Merge Method**
- The model was merged using the **SLERP merge method**, preserving key features from each base model to optimize reasoning, general knowledge, and understanding.

  ---

- ### **Models Merged**
- The following models were included in this merge:

- 1. **bunnycore/Phi-4-RR-Shoup** – contributing **40.95%**
- 2. **bunnycore/Phi-4-Model-Stock-v4** – contributing **41.03%**

  ---

- ### **Configuration**
  The following YAML configuration was used to produce this merged model:

  ```yaml
- models:
  - model: bunnycore/Phi-4-RR-Shoup
-   parameters:
-     weight: 0.4095
  - model: bunnycore/Phi-4-Model-Stock-v4
-   parameters:
-     weight: 0.4103
- merge_method: linear
- normalize: false
- int8_mask: true
  dtype: bfloat16
- ```
-
- ---
-
- ### **Open LLM Leaderboard Evaluation Results**
- Below are the evaluation metrics achieved by this merged model:
-
- | **Metric**              | **Score** |
- |-------------------------|-----------|
- | **Avg.**                | **40.95** |
- | **IFEval (0-Shot)**     | **65.87** |
- | **BBH (3-Shot)**        | **56.11** |
- | **MATH Lvl 5 (4-Shot)** | **47.96** |
- | **GPQA (0-shot)**       | **11.63** |
- | **MuSR (0-shot)**       | **14.94** |
- | **MMLU-PRO (5-shot)**   | **49.21** |
-
- ---
-
- ### **Potential Use Cases**
- This merged model can be applied in various NLP tasks, including but not limited to:
-
- - **Zero-shot and Few-shot Reasoning**
- - **Mathematical Problem Solving**
- - **General Knowledge Question Answering**
- - **Multi-task Learning for Professional Knowledge Areas (MMLU)**
-
- ---
-
- ### **License and Usage**
- Ensure compliance with the licenses of the merged models. This merged model inherits licenses from all parent models, and the user is advised to review and adhere to individual model licenses.
-
- **Disclaimer:** This model is provided as-is without warranty. Performance may vary based on specific tasks or evaluation benchmarks.
  library_name: transformers
  license: mit
  ---
+ # Phi-4 SLERP Merge Model

+ ## Model Description
+ This is a merged language model created using the **Spherical Linear Interpolation (SLERP) merge method**, allowing for a smooth blend of features from both parent models across different layers. The merge optimizes reasoning, general knowledge, and task-specific performance by strategically interpolating attention and MLP components.

  ---

+ ## Merge Details

+ **Merge Method:**
+ The model was merged using **SLERP (Spherical Linear Interpolation)** rather than a traditional linear merge, ensuring a well-balanced combination of both source models while maintaining coherent weight transitions.
+
+ **Base Model:**
+ - **bunnycore/Phi-4-RR-Shoup** (used as the primary base)
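
SLERP interpolates between two weight tensors along an arc on the hypersphere rather than along a straight line, which preserves the geometric character of both parents better than a plain weighted average. A minimal standalone NumPy sketch of the formula (illustrative only, not MergeKit's actual implementation; the function name and demo vectors are made up here):

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between flattened weight tensors a and b."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    omega = np.arccos(dot)        # angle between the two weight directions
    if omega < eps:               # nearly parallel: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

# t = 0 returns the first model's weights, t = 1 the second's;
# intermediate t values travel along the arc between them.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(slerp(0.5, a, b))  # midpoint on the arc between a and b
```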

  ---

+ ## Models Merged
+ The following models were included in this merge:

+ 1. **bunnycore/Phi-4-RR-Shoup** (Primary base)
+ 2. **bunnycore/Phi-4-Model-Stock-v4**
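
Merges like this one are built with MergeKit. Assuming MergeKit is installed and the YAML from the Configuration section is saved locally, a merge can typically be reproduced with the `mergekit-yaml` CLI (the config filename and output directory below are illustrative, not from this repo):

```shell
# Install MergeKit, then run the merge described by the saved YAML config.
# "config.yaml" and "./phi-4-merged" are illustrative paths.
pip install mergekit
mergekit-yaml config.yaml ./phi-4-merged
```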

  ---

+ ## Configuration
  The following YAML configuration was used to produce this merged model:

  ```yaml
+ slices:
+   - sources:
  - model: bunnycore/Phi-4-RR-Shoup
+     layer_range:
+       - 0
+       - 32
  - model: bunnycore/Phi-4-Model-Stock-v4
+     layer_range:
+       - 0
+       - 32
+ merge_method: slerp
+ base_model: bunnycore/Phi-4-RR-Shoup
+ parameters:
+   t:
+     - filter: self_attn
+       value:
+         - 0
+         - 0.5
+         - 0.3
+         - 0.7
+         - 1
+     - filter: mlp
+       value:
+         - 1
+         - 0.5
+         - 0.7
+         - 0.3
+         - 0
+     - value: 0.5
  dtype: bfloat16