---
library_name: transformers
license: mit
---
|
|
# Phi-4 SLERP Merge Model

## Model Description

This is a merged language model created using the **Spherical Linear Interpolation (SLERP) merge method**, allowing for a smooth blend of features from both parent models across different layers. The merge optimizes reasoning, general knowledge, and task-specific performance by strategically interpolating the attention and MLP components.
|
|
|
|
|
---

## Merge Details

**Merge Method:**

The model was merged using **SLERP (Spherical Linear Interpolation)** rather than a traditional linear merge, ensuring a well-balanced combination of both source models while maintaining coherent weight transitions.
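The intuition behind SLERP versus a linear merge can be sketched in a few lines of numpy: instead of averaging two weight tensors directly, SLERP interpolates along the arc between their directions, preserving magnitude structure. This is an illustrative sketch, not mergekit's actual implementation:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns v0, t=1 returns v1; in between, weights follow the arc
    between the two directions rather than the straight chord of a lerp.
    """
    # Normalized copies are used only to measure the angle between tensors.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = float(np.clip(np.dot(v0_n, v1_n), -1.0, 1.0))
    theta = np.arccos(dot)
    if np.sin(theta) < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1
```

For unit-norm inputs, the SLERP result also stays unit-norm, whereas a linear average of two orthogonal unit vectors would shrink to norm ≈ 0.707 — this norm preservation is the usual motivation for SLERP merges.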
|
|
|
|
|
**Base Model:**

- **bunnycore/Phi-4-RR-Shoup** (used as the primary base)

---
|
|
|
|
|
## Models Merged

The following models were included in this merge:

1. **bunnycore/Phi-4-RR-Shoup** (primary base)
2. **bunnycore/Phi-4-Model-Stock-v4**

---
|
|
|
|
|
## Configuration

The following YAML configuration was used to produce this merged model:
|
|
|
|
|
```yaml
slices:
  - sources:
      - model: bunnycore/Phi-4-RR-Shoup
        layer_range: [0, 32]
      - model: bunnycore/Phi-4-Model-Stock-v4
        layer_range: [0, 32]
merge_method: slerp
base_model: bunnycore/Phi-4-RR-Shoup
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
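In this config, `t` is the interpolation fraction toward the second model (`t: 0` keeps the base model's weights, `t: 1` takes the other model's), and the five-element `value` lists are anchor points spread across the layer stack; the final `value: 0.5` applies to all remaining tensors. Assuming the anchors are stretched over the layers by simple linear interpolation (an illustrative reading, not necessarily mergekit's exact scheme), the per-layer schedule can be sketched as:

```python
import numpy as np

def layer_t(anchors, num_layers):
    """Spread a list of anchor t-values evenly across num_layers layers."""
    anchor_pos = np.linspace(0.0, 1.0, len(anchors))
    layer_pos = np.linspace(0.0, 1.0, num_layers)
    return np.interp(layer_pos, anchor_pos, anchors)

# Schedules from the config above, for the 32 merged layers:
attn_t = layer_t([0, 0.5, 0.3, 0.7, 1], 32)  # self_attn: base-heavy early, blend late
mlp_t = layer_t([1, 0.5, 0.7, 0.3, 0], 32)   # mlp: the mirror-image schedule
```

Note that the attention and MLP schedules are mirror images, so early layers lean on the base model's attention but the second model's MLP, and vice versa in later layers. With mergekit installed, a config like this is typically run with `mergekit-yaml config.yaml ./output-dir`.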