Paper: DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling (arXiv:2406.11617)
NemoMix-12B-DellaV1a is an experimental merge of the following models using the DELLA merge method in mergekit:
EDIT: There seem to be tokenizer issues. I'm guessing I would have to merge with the base model instead of the instruct model. Don't bother.
```yaml
models:
  - model: BeaverAI/mistral-doryV2-12b
    parameters:
      weight: 0.20
      density: 0.42
  - model: NeverSleep/Lumimaid-v0.2-12B
    parameters:
      weight: 0.22
      density: 0.54
  - model: intervitens/mini-magnum-12b-v1.1
    parameters:
      weight: 0.24
      density: 0.66
  - model: grimjim/mistralai-Mistral-Nemo-Instruct-2407
    parameters:
      weight: 0.34
      density: 0.78
merge_method: della
base_model: grimjim/mistralai-Mistral-Nemo-Instruct-2407
parameters:
  int8_mask: true
  epsilon: 0.1
  lambda: 1.0
  density: 0.7
dtype: bfloat16
```
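For intuition on what `density` and `epsilon` control: DELLA keeps each delta parameter with a probability that grows with its magnitude, centered on `density` and spread over a window of width `epsilon`, then rescales the survivors to keep the expected update unchanged. Below is a rough NumPy sketch of that sampling step; it is an illustration of the idea, not mergekit's exact implementation, and the function name and rank-to-probability mapping are assumptions.

```python
import numpy as np

def della_drop(delta, density, epsilon):
    """Magnitude-based stochastic pruning of a delta tensor (illustrative sketch).

    Keep probabilities span [density - epsilon/2, density + epsilon/2],
    assigned by magnitude rank so larger deltas are more likely to survive.
    Kept entries are rescaled by 1/keep_p so the expected value is preserved.
    """
    n = delta.size
    # rank 0 = smallest magnitude, rank n-1 = largest magnitude
    ranks = np.argsort(np.argsort(np.abs(delta.ravel()))).reshape(delta.shape)
    # map ranks linearly onto the keep-probability window around `density`
    keep_p = density - epsilon / 2 + epsilon * ranks / (n - 1)
    mask = np.random.rand(*delta.shape) < keep_p
    # inverse-propensity rescale of surviving entries; dropped entries become 0
    return np.where(mask, delta / keep_p, 0.0)
```

With the config above (`density: 0.7`, `epsilon: 0.1`), each delta would be kept with probability between 0.65 and 0.75 depending on its magnitude rank.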