Nix2.5-plus

I still recommend the normal Nix2.5.

Model Description

This is a merged model, Nix2.5-plus, created using mergekit's slerp (Spherical Linear Interpolation) method. It combines the strengths of ray0rf1re/Nix2.5 and ray0rf1re/Nix1.5 to potentially offer improved performance or a different balance of capabilities.

Merge Details

Nix2.5-plus is a merge of ray0rf1re/Nix2.5 and ray0rf1re/Nix1.5, created with the slerp merge method from mergekit. ray0rf1re/Nix2.5 served as the base model.

The default interpolation parameter is t = 0.275, meaning ray0rf1re/Nix1.5 contributes approximately 27.5% and ray0rf1re/Nix2.5 approximately 72.5% to most merged tensors. Self-attention and MLP weights instead follow per-layer t gradients (see the configuration below), so their blend varies across the 32 layers.
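As a rough illustration of what slerp does to a pair of weight tensors (a minimal sketch, not mergekit's actual implementation, which also handles magnitudes and edge cases differently):

```python
import numpy as np

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two flattened weight vectors.

    t = 0 returns v0, t = 1 returns v1; t = 0.275 blends ~27.5% of v1 in,
    following the arc on the hypersphere rather than the straight line.
    """
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two directions
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return s0 * v0 + s1 * v1
```

Unlike plain linear interpolation, slerp preserves the geometry of the weight space: for unit vectors, the result stays on the unit sphere at every t.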

Configuration

slices:
  - sources:
      - model: ray0rf1re/Nix2.5
        layer_range: [0, 32]
      - model: ray0rf1re/Nix1.5
        layer_range: [0, 32]
merge_method: slerp
base_model: ray0rf1re/Nix2.5
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.275
dtype: bfloat16
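The gradient lists above assign a different t to self-attention and MLP weights at each layer, with 0.275 as the fallback for all other tensors. A minimal sketch of how such a schedule could be resolved per tensor (the exact anchoring of the gradient points across layers is an assumption here, not mergekit's verified behavior):

```python
import numpy as np

# Values taken from the configuration above.
SELF_ATTN_GRAD = [0, 0.5, 0.3, 0.7, 1]
MLP_GRAD = [1, 0.5, 0.7, 0.3, 0]
DEFAULT_T = 0.275
NUM_LAYERS = 32

def resolve_t(tensor_name, layer_idx):
    """Pick the interpolation parameter t for one tensor.

    The gradient anchors are assumed to be spread evenly from the first
    to the last layer, with linear interpolation in between.
    """
    frac = layer_idx / (NUM_LAYERS - 1)
    anchors = np.linspace(0.0, 1.0, len(SELF_ATTN_GRAD))
    if "self_attn" in tensor_name:
        return float(np.interp(frac, anchors, SELF_ATTN_GRAD))
    if "mlp" in tensor_name:
        return float(np.interp(frac, anchors, MLP_GRAD))
    return DEFAULT_T
```

To reproduce a merge from a config like this, mergekit provides the `mergekit-yaml` CLI (e.g. `mergekit-yaml config.yml ./output-dir`); see the mergekit documentation for the available options.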

Usage

To use this model, load it with the transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ray0rf1re/Nix2.5-plus"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The merge was produced in bfloat16; loading in the same dtype avoids an upcast.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16")

# Example usage (adjust the prompt and generation settings as needed)
input_text = "Hello, my name is"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Training Data

This merged model leverages the training data of its constituent models: ray0rf1re/Nix2.5 and ray0rf1re/Nix1.5. Please refer to the respective model cards for details on their training datasets.

Limitations

As a merged model, its performance and biases are inherited from its base models. Thorough evaluation is recommended for specific use cases. Merged models may sometimes exhibit unexpected behaviors or a degradation in certain tasks compared to their individual components.

Model size: 3B parameters (Safetensors, F16 tensors)
