Mixcraftz-123b
A SLERP merge of GigaMag-Behemoth-123b with Tess-3-Mistral-Large-2-123B. Tess added some creativity, and the model seems a little less predictable than GigaMag.
Works well with Metharme and Mistral templates. Haven't tested any others.
Merge Details
Merge Method
This model was merged using the SLERP merge method.
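For readers unfamiliar with SLERP, here is a minimal sketch of spherical linear interpolation between two flat weight vectors. It is illustrative only and not the code that produced this model: the function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation for near-parallel or zero vectors are assumptions made for the example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between v0 (t=0) and v1 (t=1).

    Hypothetical helper for illustration; works on plain lists of floats.
    """
    linear = [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so fall back to linear interpolation.
        return linear

    # Angle between the two vectors, with the cosine clamped for safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly parallel (or anti-parallel): SLERP weights are ill-defined.
        return linear

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # Halfway between two orthogonal unit vectors stays on the unit circle.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

With `t` = 0 the result is the first vector and with `t` = 1 it is the second; values in between follow the arc between them rather than the straight line, which is the main difference from a plain weighted average.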
Models Merged
The following models were included in the merge:
- GigaMag-Behemoth-123b
- Tess-3-Mistral-Large-2-123B
Configuration
The following YAML configuration was used to produce this model:
base_model: /workspace/GigaMag-Behemoth
dtype: bfloat16
merge_method: slerp
parameters:
  t:
  - filter: self_attn
    value: [0.1, 0.3, 0.5, 0.55, 0.5, 0.3, 0.1]
  - filter: mlp
    value: [0.1, 0.3, 0.5, 0.55, 0.5, 0.3, 0.1]
  - value: 0.5
slices:
- sources:
  - layer_range: [0, 88]
    model: /workspace/GigaMag-Behemoth
  - layer_range: [0, 88]
    model: /workspace/cache/models--migtissera--Tess-3-Mistral-Large-2-123B/snapshots/c07f8a90214a71fa303394d3d52443d392dad771