# Granite-3.3-2B-Avg

Granite-3.3-2B-Avg is a merge of the following models using LazyMergekit:
* [powermove72/granite-3.3-2b-Hermes3dataset](https://huggingface.co/powermove72/granite-3.3-2b-Hermes3dataset)
* [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
## 🧩 Configuration
```yaml
# ----------------------------------------------------------------------
# merge_weighted_average.yaml
# Weighted-average merge of two 2B LLMs
# ----------------------------------------------------------------------
merge_method: linear

# ----------------------------------------------------------------------
# Source models + per-model weights
# ----------------------------------------------------------------------
# The list can be extended (just add another entry with its weight).
# The `normalize: true` flag below will automatically scale the
# weights so that their sum equals 1.0.
models:
  - model: powermove72/granite-3.3-2b-Hermes3dataset
    parameters:
      weight: 0.6
  - model: ibm-granite/granite-3.3-2b-instruct
    parameters:
      weight: 0.4

# ----------------------------------------------------------------------
# Tokenizer
# ----------------------------------------------------------------------
# Both source models share the same Granite tokenizer, so either can
# serve as the source. If you later use models with different
# tokenizers, change this line accordingly.
tokenizer_source: powermove72/granite-3.3-2b-Hermes3dataset

# ----------------------------------------------------------------------
# Precision
# ----------------------------------------------------------------------
# bfloat16 is the preferred dtype on modern GPUs (A100, H100, RTX 4090,
# etc.) because it offers the dynamic range of float32 with the memory
# footprint of float16. If your runtime cannot handle bfloat16, change
# this to float16.
dtype: bfloat16

# ----------------------------------------------------------------------
# Merge-specific parameters
# ----------------------------------------------------------------------
parameters:
  # Normalize the per-model weights so they sum to 1.0 (keeps the
  # merged tensor magnitude comparable to the originals).
  normalize: true
  # Load the checkpoints with a low-CPU-memory strategy, streaming
  # only the tensors needed for the current layer.
  low_cpu_mem_usage: true
  # Optional: add a small amount of Gaussian jitter to break exact
  # symmetry when two models have identical weights in a layer.
  # jitter_std: 0.001   # ← uncomment if you ever need it

# ----------------------------------------------------------------------
# Reproducibility & deterministic execution
# ----------------------------------------------------------------------
seed: 2025          # any integer you like
deterministic: true # forces torch/cuDNN deterministic mode
```
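Conceptually, the `linear` merge method is just a weighted average of corresponding parameter tensors across the source models, with the weights rescaled to sum to 1.0 when `normalize: true`. A minimal sketch in plain Python (the `merge_linear` helper is hypothetical and operates on flat lists of floats rather than real checkpoints):

```python
def merge_linear(tensors, weights, normalize=True):
    """Weighted average of corresponding parameter values.

    tensors: one equal-length list of floats per source model.
    weights: per-model weights, e.g. [0.6, 0.4].
    """
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]  # rescale so weights sum to 1.0
    # Element-wise weighted sum across models
    return [
        sum(w * t[i] for w, t in zip(weights, tensors))
        for i in range(len(tensors[0]))
    ]

# Two toy "models", each a single 3-value parameter, merged 0.6 / 0.4:
merged = merge_linear([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], [0.6, 0.4])
print(merged)  # [1.8, 2.0, 2.2]
```

With normalization on, weights like `[3, 2]` behave identically to `[0.6, 0.4]`, which is why the merged tensors keep the same scale as the originals.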
## 💻 Usage
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "powermove72/Granite-3.3-2B-Avg"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```