Mixcraftz-123b

A SLERP merge of GigaMag-Behemoth-123b and Tess-3-Mistral-Large-2-123B. Tess adds some creativity, and the merged model seems a little less predictable than GigaMag on its own.

Works well with the Metharme and Mistral prompt templates; no others have been tested.

Merge Details

Merge Method

This model was merged using the SLERP merge method.
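
For intuition, here is a minimal Python sketch of the technique in general, not mergekit's exact code; the function name, the eps value, and the fallback to plain linear interpolation are assumptions:

import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    # Spherical linear interpolation between two weight tensors.
    v0f, v1f = v0.ravel(), v1.ravel()
    # Cosine of the angle between the tensors, treated as flat vectors.
    cos_theta = np.dot(v0f, v1f) / (np.linalg.norm(v0f) * np.linalg.norm(v1f) + eps)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if abs(np.sin(theta)) < eps:
        # Nearly parallel tensors: fall back to ordinary linear interpolation.
        return (1.0 - t) * v0 + t * v1
    # Walk along the great circle from v0 (t=0) to v1 (t=1).
    return (np.sin((1.0 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

At t = 0 the merged tensor equals the base model's; at t = 1 it equals the other model's. The configuration below varies t per layer.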

Models Merged

The following models were included in the merge:

  • GigaMag-Behemoth-123b
  • Tess-3-Mistral-Large-2-123B

Configuration

The following YAML configuration was used to produce this model:

base_model: /workspace/GigaMag-Behemoth
dtype: bfloat16
merge_method: slerp
parameters:
  t:
  - filter: self_attn
    value: [0.1, 0.3, 0.5, 0.55, 0.5, 0.3, 0.1]
  - filter: mlp
    value: [0.1, 0.3, 0.5, 0.55, 0.5, 0.3, 0.1]
  - value: 0.5
slices:
- sources:
  - layer_range: [0, 88]
    model: /workspace/GigaMag-Behemoth
  - layer_range: [0, 88]
    model: /workspace/cache/models--migtissera--Tess-3-Mistral-Large-2-123B/snapshots/c07f8a90214a71fa303394d3d52443d392dad771