Update README.md
Browse files

README.md (CHANGED)

@@ -2,73 +2,62 @@
Removed:

This is a merged model

**Merge Method**
The model was merged using

The following models were included in this merge:

1. **bunnycore/Phi-4-RR-Shoup**
2. **bunnycore/Phi-4-Model-Stock-v4**

### **Open LLM Leaderboard Evaluation Results**

Below are the evaluation metrics achieved by this merged model:

| **Metric**              | **Score** |
|-------------------------|-----------|
| **Avg.**                | **40.95** |
| **IFEval (0-Shot)**     | **65.87** |
| **BBH (3-Shot)**        | **56.11** |
| **MATH Lvl 5 (4-Shot)** | **47.96** |
| **GPQA (0-shot)**       | **11.63** |
| **MuSR (0-shot)**       | **14.94** |
| **MMLU-PRO (5-shot)**   | **49.21** |

### **Potential Use Cases**

This merged model can be applied to various NLP tasks, including but not limited to:

- **Zero-shot and Few-shot Reasoning**
- **Mathematical Problem Solving**
- **General Knowledge Question Answering**
- **Multi-task Learning for Professional Knowledge Areas (MMLU)**

### **License and Usage**

Ensure compliance with the licenses of the merged models. This merged model inherits licenses from all parent models, and users are advised to review and adhere to each individual model's license.

**Disclaimer:** This model is provided as-is, without warranty. Performance may vary based on the specific task or evaluation benchmark.
Updated README:

---
library_name: transformers
license: mit
---

# Phi-4 SLERP Merge Model

## Model Description

This is a merged language model created using the **Spherical Linear Interpolation (SLERP) merge method**, allowing a smooth blend of features from both parent models across different layers. The merge optimizes reasoning, general knowledge, and task-specific performance by strategically interpolating the attention and MLP components.

---
## Merge Details

**Merge Method:**
The model was merged using **SLERP (Spherical Linear Interpolation)** rather than a traditional linear merge, ensuring a well-balanced combination of both source models while maintaining coherent weight transitions.

**Base Model:**

- **bunnycore/Phi-4-RR-Shoup** (used as the primary base)

---
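For intuition about the merge method named above: SLERP moves between two weight vectors along the great-circle arc joining their directions instead of along a straight line, which keeps the interpolated weights from collapsing in norm. Below is a minimal, illustrative sketch of the formula on toy vectors; it is not mergekit's actual implementation, and the `slerp` function here is just for demonstration.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two vectors.

    t=0 returns v0, t=1 returns v1; intermediate t follows the
    great-circle arc between the two directions.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two directions, clamped for safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    if theta < eps:  # nearly parallel: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Halfway between two orthogonal unit vectors stays on the unit circle.
w = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

At `t = 0` the result equals the first model's weights and at `t = 1` the second's; the `t` gradients in the configuration below choose a value per layer and per tensor group.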
## Models Merged

The following models were included in this merge:

1. **bunnycore/Phi-4-RR-Shoup** (primary base)
2. **bunnycore/Phi-4-Model-Stock-v4**

---
## Configuration

The following YAML configuration was used to produce this merged model:

```yaml
slices:
  - sources:
      - model: bunnycore/Phi-4-RR-Shoup
        layer_range:
          - 0
          - 32
      - model: bunnycore/Phi-4-Model-Stock-v4
        layer_range:
          - 0
          - 32
merge_method: slerp
base_model: bunnycore/Phi-4-RR-Shoup
parameters:
  t:
    - filter: self_attn
      value:
        - 0
        - 0.5
        - 0.3
        - 0.7
        - 1
    - filter: mlp
      value:
        - 1
        - 0.5
        - 0.7
        - 0.3
        - 0
    - value: 0.5
dtype: bfloat16
```
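Each five-element `value` list under `t` is a gradient: the anchor values are spread across the layer range and interpolated so every layer gets its own `t` (0 keeps the base model's tensor, 1 takes the other model's). The helper below is a rough illustration of that spreading, not mergekit's actual code; `t_for_layer` is a hypothetical name introduced here for demonstration.

```python
def t_for_layer(anchors, layer, num_layers):
    """Map a list of anchor values onto num_layers layers by
    piecewise-linear interpolation, returning t for one layer."""
    if num_layers == 1:
        return float(anchors[0])
    # Position of this layer in [0, 1] across the whole stack.
    pos = layer / (num_layers - 1)
    # Which segment of the anchor list the position falls into.
    seg = pos * (len(anchors) - 1)
    i = min(int(seg), len(anchors) - 2)
    frac = seg - i
    return anchors[i] * (1 - frac) + anchors[i + 1] * frac

attn_anchors = [0, 0.5, 0.3, 0.7, 1]  # self_attn gradient from the config
ts = [t_for_layer(attn_anchors, layer, 32) for layer in range(32)]
```

Under this reading, early attention layers stay close to the base model (`t` near 0) while the last layers lean toward bunnycore/Phi-4-Model-Stock-v4 (`t` near 1), and the MLP gradient runs in the opposite direction.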