---
library_name: transformers
tags:
- mergekit
- merge
---
# iceblink-v3e

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the [SLERP](https://en.wikipedia.org/wiki/Slerp) merge method.
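SLERP interpolates along the great-circle arc between two weight tensors rather than along the straight line between them, which preserves the magnitude of the blended weights better than plain linear interpolation. A minimal sketch of the formula on flattened vectors (illustrative only, not mergekit's actual implementation, which also handles per-tensor flattening and edge cases):

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation: t=0 returns a (base), t=1 returns b.

    Falls back to plain LERP when the two directions are nearly parallel,
    where the spherical formula becomes numerically unstable.
    """
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    omega = np.arccos(dot)            # angle between the two directions
    if np.sin(omega) < eps:           # nearly colinear: LERP is fine
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

v0 = np.array([1.0, 0.0])
v1 = np.array([0.0, 1.0])
mid = slerp(0.5, v0, v1)  # halfway along the arc; stays on the unit circle
```

In the configuration below, `t` is this interpolation factor, set per tensor: `t: 0.0` keeps the base model's weights untouched, `t: 1.0` would take the other model's weights outright.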

### Models Merged

The following models were included in the merge:
* [ApocalypseParty/derestricted-iceblink](https://huggingface.co/ApocalypseParty/derestricted-iceblink)
* [zai-org/GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air)

### Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  - model: ApocalypseParty/derestricted-iceblink
  - model: zai-org/GLM-4.5-Air
merge_method: slerp
base_model: zai-org/GLM-4.5-Air
parameters:
  t:
    # ───────────────────────────────────────────────
    # UNTRAINED - pure base
    # ───────────────────────────────────────────────
    - filter: embed_tokens
      value: 0.0
    - filter: layernorm
      value: 0.0
    - filter: e_score_correction
      value: 0.0
    - filter: layers.46.
      value: 0.0

    # ───────────────────────────────────────────────
    # ROUTER
    # ───────────────────────────────────────────────
    - filter: mlp.gate.weight
      value: 0.30

    # ───────────────────────────────────────────────
    # LAYER 0 (dense)
    # ───────────────────────────────────────────────
    - filter: layers.0.
      value: 0.35

    # ───────────────────────────────────────────────
    # SHARED EXPERTS
    # ───────────────────────────────────────────────
    - filter: mlp.shared
      value: 0.55

    # ───────────────────────────────────────────────
    # LAYERS ATTN
    # ───────────────────────────────────────────────
    - filter: layers.1.self_attn
      value: 0.35
    - filter: layers.2.self_attn
      value: 0.35
    - filter: layers.3.self_attn
      value: 0.34
    - filter: layers.4.self_attn
      value: 0.34
    - filter: layers.5.self_attn
      value: 0.33
    - filter: layers.6.self_attn
      value: 0.33
    - filter: layers.7.self_attn
      value: 0.32
    - filter: layers.8.self_attn
      value: 0.32
    - filter: layers.9.self_attn
      value: 0.30
    - filter: layers.10.self_attn
      value: 0.28
    - filter: layers.11.self_attn
      value: 0.28
    - filter: layers.12.self_attn
      value: 0.28
    - filter: layers.13.self_attn
      value: 0.28
    - filter: layers.14.self_attn
      value: 0.28
    - filter: layers.15.self_attn
      value: 0.28
    - filter: layers.16.self_attn
      value: 0.28
    - filter: layers.17.self_attn
      value: 0.28
    - filter: layers.18.self_attn
      value: 0.30
    - filter: layers.19.self_attn
      value: 0.32
    - filter: layers.20.self_attn
      value: 0.34
    - filter: layers.21.self_attn
      value: 0.36
    - filter: layers.22.self_attn
      value: 0.38
    - filter: layers.23.self_attn
      value: 0.42
    - filter: layers.24.self_attn
      value: 0.45
    - filter: layers.25.self_attn
      value: 0.48
    - filter: layers.26.self_attn
      value: 0.50
    - filter: layers.27.self_attn
      value: 0.52
    - filter: layers.28.self_attn
      value: 0.54
    - filter: layers.29.self_attn
      value: 0.55
    - filter: layers.30.self_attn
      value: 0.56
    - filter: layers.31.self_attn
      value: 0.57
    - filter: layers.32.self_attn
      value: 0.58
    - filter: layers.33.self_attn
      value: 0.58
    - filter: layers.34.self_attn
      value: 0.58
    - filter: layers.35.self_attn
      value: 0.58
    - filter: layers.36.self_attn
      value: 0.60
    - filter: layers.37.self_attn
      value: 0.62
    - filter: layers.38.self_attn
      value: 0.65
    - filter: layers.39.self_attn
      value: 0.65
    - filter: layers.40.self_attn
      value: 0.65
    - filter: layers.41.self_attn
      value: 0.62
    - filter: layers.42.self_attn
      value: 0.60
    - filter: layers.43.self_attn
      value: 0.58
    - filter: layers.44.self_attn
      value: 0.55
    - filter: layers.45.self_attn
      value: 0.52

    # ───────────────────────────────────────────────
    # ROUTED EXPERTS MLP
    # ───────────────────────────────────────────────
    - filter: layers.1.mlp.experts
      value: 0.55
    - filter: layers.2.mlp.experts
      value: 0.56
    - filter: layers.3.mlp.experts
      value: 0.58
    - filter: layers.4.mlp.experts
      value: 0.60
    - filter: layers.5.mlp.experts
      value: 0.62
    - filter: layers.6.mlp.experts
      value: 0.63
    - filter: layers.7.mlp.experts
      value: 0.65
    - filter: layers.8.mlp.experts
      value: 0.66
    - filter: layers.9.mlp.experts
      value: 0.67
    - filter: layers.10.mlp.experts
      value: 0.68
    - filter: layers.11.mlp.experts
      value: 0.68
    - filter: layers.12.mlp.experts
      value: 0.68
    - filter: layers.13.mlp.experts
      value: 0.68
    - filter: layers.14.mlp.experts
      value: 0.70
    - filter: layers.15.mlp.experts
      value: 0.70
    - filter: layers.16.mlp.experts
      value: 0.70
    - filter: layers.17.mlp.experts
      value: 0.70
    - filter: layers.18.mlp.experts
      value: 0.72
    - filter: layers.19.mlp.experts
      value: 0.72
    - filter: layers.20.mlp.experts
      value: 0.74
    - filter: layers.21.mlp.experts
      value: 0.75
    - filter: layers.22.mlp.experts
      value: 0.76
    - filter: layers.23.mlp.experts
      value: 0.77
    - filter: layers.24.mlp.experts
      value: 0.78
    - filter: layers.25.mlp.experts
      value: 0.78
    - filter: layers.26.mlp.experts
      value: 0.78
    - filter: layers.27.mlp.experts
      value: 0.78
    - filter: layers.28.mlp.experts
      value: 0.78
    - filter: layers.29.mlp.experts
      value: 0.80
    - filter: layers.30.mlp.experts
      value: 0.80
    - filter: layers.31.mlp.experts
      value: 0.80
    - filter: layers.32.mlp.experts
      value: 0.80
    - filter: layers.33.mlp.experts
      value: 0.80
    - filter: layers.34.mlp.experts
      value: 0.80
    - filter: layers.35.mlp.experts
      value: 0.80
    - filter: layers.36.mlp.experts
      value: 0.82
    - filter: layers.37.mlp.experts
      value: 0.82
    - filter: layers.38.mlp.experts
      value: 0.82
    - filter: layers.39.mlp.experts
      value: 0.82
    - filter: layers.40.mlp.experts
      value: 0.80
    - filter: layers.41.mlp.experts
      value: 0.78
    - filter: layers.42.mlp.experts
      value: 0.75
    - filter: layers.43.mlp.experts
      value: 0.72
    - filter: layers.44.mlp.experts
      value: 0.70
    - filter: layers.45.mlp.experts
      value: 0.68

    - value: 0.35
dtype: bfloat16
out_dtype: bfloat16
tokenizer:
  source: base
```
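The long `t` list works because mergekit resolves the interpolation factor per tensor: each filter string is matched against the parameter name, and an entry with no `filter` acts as the catch-all default. A hedged sketch of that lookup, assuming substring matching with first-match-wins priority (an approximation for illustration, not mergekit's actual code):

```python
def resolve_t(name, spec, default=None):
    """Pick the interpolation factor for one tensor name.

    spec is a list of {"filter": str, "value": float} dicts in priority
    order; an entry without "filter" matches everything (the catch-all).
    Assumes substring matching, first match wins -- an approximation of
    mergekit's behaviour, not its actual implementation.
    """
    for entry in spec:
        f = entry.get("filter")
        if f is None or f in name:
            return entry["value"]
    return default

# Illustrative subset of the config above
t_spec = [
    {"filter": "embed_tokens", "value": 0.0},
    {"filter": "mlp.gate.weight", "value": 0.30},
    {"filter": "layers.1.self_attn", "value": 0.35},
    {"filter": "layers.1.mlp.experts", "value": 0.55},
    {"value": 0.35},  # catch-all for everything else
]

resolve_t("model.embed_tokens.weight", t_spec)               # -> 0.0
resolve_t("model.layers.1.self_attn.q_proj.weight", t_spec)  # -> 0.35
resolve_t("model.layers.2.self_attn.k_proj.weight", t_spec)  # -> 0.35 (catch-all)
```

This also suggests why the config uses trailing dots in filters like `layers.0.` and `layers.46.`: under substring matching, `layers.4` alone would also match `layers.40` through `layers.45`, while `layers.4.` cannot.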