# Granite-3.3-2B-Avg

Granite-3.3-2B-Avg is a merge of the following models using LazyMergekit:

- powermove72/granite-3.3-2b-Hermes3dataset
- ibm-granite/granite-3.3-2b-instruct

## 🧩 Configuration

```yaml
# ----------------------------------------------------------------------
# merge_weighted_average.yaml
#   Weighted-average merge of two 2B LLMs
# ----------------------------------------------------------------------
# Merge method
# ----------------------------------------------------------------------
merge_method: linear

# ----------------------------------------------------------------------
# Base models + per-model weights
# ----------------------------------------------------------------------
# The list can be extended (just add another entry with its weight).
# The `normalize: true` flag below automatically rescales the
# weights so that their sum equals 1.0.
models:
  - model: powermove72/granite-3.3-2b-Hermes3dataset
    parameters:
      weight: 0.6
  - model: ibm-granite/granite-3.3-2b-instruct
    parameters:
      weight: 0.4

# ----------------------------------------------------------------------
# Tokenizer
# ----------------------------------------------------------------------
# Both source models are Granite-3.3-2B fine-tunes and share the same
# Granite tokenizer. If you later decide to use a different tokenizer,
# just change this line.
tokenizer_source: powermove72/granite-3.3-2b-Hermes3dataset

# ----------------------------------------------------------------------
# Precision
# ----------------------------------------------------------------------
# bfloat16 is the preferred dtype on modern GPUs (A100, H100, RTX 4090,
# etc.) because it offers the dynamic range of float32 at 16-bit width,
# avoiding float16's overflow risk. If your runtime does not support
# bfloat16, change this to float16.
dtype: bfloat16

# ----------------------------------------------------------------------
# Merge-specific parameters
# ----------------------------------------------------------------------
parameters:
  # Normalize the per-model weights so their sum = 1.0 (keeps the
  # merged tensor magnitudes comparable to the originals).
  normalize: true

# Note: memory-saving load options (e.g. low CPU memory, lazy
# unpickling) are passed as mergekit CLI flags, not set in this file.
# A linear average is deterministic, so no seed is required.
```
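Conceptually, the `linear` method computes a per-tensor weighted average: each merged parameter is `0.6 * A + 0.4 * B`, with the weights rescaled to sum to 1.0 when `normalize` is on. A minimal pure-Python sketch of that averaging on toy numbers (illustration only, not the actual mergekit implementation):

```python
def linear_merge(tensors, weights, normalize=True):
    """Weighted-average merge of equal-length parameter lists.

    tensors: one list of floats per model (same length for all models)
    weights: one weight per model; rescaled to sum to 1.0 if normalize
    """
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = []
    for params in zip(*tensors):  # walk the models parameter-by-parameter
        merged.append(sum(w * p for w, p in zip(weights, params)))
    return merged

# Toy example with the 0.6 / 0.4 split from the config above:
a = [1.0, 2.0, 3.0]   # "model A" parameters
b = [3.0, 2.0, 1.0]   # "model B" parameters
print(linear_merge([a, b], [0.6, 0.4]))  # -> [1.8, 2.0, 2.2]
```

Because of normalization, passing weights `[3, 2]` gives the same result as `[0.6, 0.4]`.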

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "powermove72/Granite-3.3-2B-Avg"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
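The `top_k=50, top_p=0.95` arguments restrict sampling first to the 50 most probable tokens, then to the smallest prefix of those whose cumulative probability reaches 0.95 (nucleus sampling). A toy pure-Python sketch of that filtering step (illustration only, not the transformers implementation):

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Return the token indices kept by top-k then top-p filtering."""
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    order = order[:top_k]            # top-k cut
    kept, cumulative = [], 0.0
    for i in order:                  # top-p (nucleus) cut
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:      # stop once the nucleus is covered
            break
    return kept

# Toy distribution over 5 "tokens":
probs = [0.5, 0.25, 0.15, 0.07, 0.03]
print(filter_top_k_top_p(probs, top_k=50, top_p=0.95))  # -> [0, 1, 2, 3]
```

The model then samples only from the kept tokens, with their probabilities renormalized; `temperature=0.7` sharpens the distribution before this filtering.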