
These merges did not have a random seed assigned, so re-running the merge produces different safetensors.

Future versions that use dare_ties will be merged with --random-seed 420

Update: The same logic also applies to della; you must pass --random-seed if you want the merge to be fully deterministic


The della merge method employs probabilistic dropout, so it requires a random seed for reproducibility: it uses torch.bernoulli() to randomly sample which weights to keep, with magnitude-dependent probabilities.

Specifically, at the core of DELLA's della_magprune function, each parameter is assigned a probability based on its magnitude rank, and torch.bernoulli(probs) then randomly samples the keep/drop mask from those probabilities.
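A minimal sketch of why this matters, using Python's standard random module in place of torch.bernoulli (the function name bernoulli_mask is hypothetical, not part of mergekit):

```python
import random

def bernoulli_mask(probs, seed=None):
    """Sample a keep/drop mask: entry i is 1 with probability probs[i]."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in probs]

probs = [0.2, 0.5, 0.8, 0.9]

# Without a seed, each run draws a (generally) different mask.
# With the same seed, the draw is reproducible:
mask_a = bernoulli_mask(probs, seed=420)
mask_b = bernoulli_mask(probs, seed=420)
assert mask_a == mask_b
```

The same principle applies to the merge as a whole: unless every Bernoulli draw starts from the same seed, two merges of identical inputs will keep different subsets of weights.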

Why You Get Consistent Results on Your PC

Each time you run the merge on your PC without setting a seed, PyTorch initializes its random number generator to a consistent but machine-specific state. This explains why:

  • ✅ Multiple runs on your PC produce identical results
  • ❌ Your results never match the original creator's results
  • ❌ Results differ between PCs

The Solution: Use --random-seed

MergeKit provides a random_seed option specifically for this purpose.

When you set a random seed, it calls transformers.trainer_utils.set_seed() to ensure reproducible behavior across PyTorch, NumPy, and Python's random module.
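A simplified analog of that seeding step (the name set_seed_like is hypothetical; the real transformers set_seed additionally seeds torch's CPU and CUDA generators via torch.manual_seed and torch.cuda.manual_seed_all):

```python
import random
import numpy as np

def set_seed_like(seed: int) -> None:
    """Seed every RNG the merge touches, so all downstream
    random draws become reproducible."""
    random.seed(seed)
    np.random.seed(seed)

set_seed_like(420)
first = np.random.rand(3)
set_seed_like(420)
second = np.random.rand(3)
assert (first == second).all()  # identical draws after re-seeding
```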

To get reproducible merges, run:

mergekit-yaml config.yaml ./output --random-seed 420

Both you and the original creator must use the same seed value to get identical results.
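For illustration, a della config might look like the following sketch (model names are placeholders, and parameter values are examples, not recommendations); run with the same --random-seed on any machine, it should produce identical output:

```yaml
# config.yaml -- hypothetical della merge; model names are placeholders
merge_method: della
base_model: base-org/base-model
models:
  - model: some-org/finetune-a
    parameters:
      density: 0.5
      epsilon: 0.1
      weight: 0.5
  - model: some-org/finetune-b
    parameters:
      density: 0.5
      epsilon: 0.1
      weight: 0.5
dtype: bfloat16
```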

While both methods use randomness, DELLA's documentation may not have emphasized this as clearly as DARE's. The key difference is that DARE methods use SparsificationMethod.random (pure Bernoulli sampling with a uniform keep probability), while DELLA uses SparsificationMethod.della_magprune (magnitude-weighted Bernoulli sampling). Both are random and require seeds for reproducibility.
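The contrast between the two sampling schemes can be sketched in NumPy (a simplified illustration of the idea, not the exact mergekit implementation, which operates on torch tensors):

```python
import numpy as np

rng = np.random.default_rng(420)
weights = rng.standard_normal(8)
density, epsilon = 0.5, 0.1

# DARE-style: uniform keep probability for every parameter.
dare_probs = np.full(weights.shape, density)

# DELLA-style: keep probability grows with magnitude rank,
# so larger weights survive more often.
ranks = np.abs(weights).argsort().argsort()   # 0 = smallest magnitude
rank_norm = ranks / ranks.max()
della_probs = (density - epsilon) + rank_norm * 2 * epsilon

# Either way the mask is a Bernoulli draw -- hence the need for a seed.
keep_mask = rng.random(weights.shape) < della_probs
pruned = np.where(keep_mask, weights, 0.0)
```

Note that the DELLA-style probabilities stay within [density - epsilon, density + epsilon], so the expected fraction of kept weights remains close to density while still favoring large-magnitude weights.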

mergekit/sparsify.py (excerpt; RescaleNorm and rescaled_masked_tensor are defined elsewhere in the same file)

from typing import Optional

import torch


def della_magprune(
    tensor: torch.Tensor,
    density: float,
    epsilon: float,
    rescale_norm: Optional[RescaleNorm] = None,
) -> torch.Tensor:
    if density >= 1:
        return tensor
    if density <= 0:
        return torch.zeros_like(tensor)
    orig_shape = tensor.shape

    if density + epsilon >= 1 or density - epsilon <= 0:
        raise ValueError(
            "Epsilon must be chosen such that density +/- epsilon is in (0, 1)"
        )

    work_dtype = (
        tensor.dtype
        if tensor.device.type != "cpu" or tensor.dtype == torch.bfloat16
        else torch.float32
    )

    if len(tensor.shape) < 2:
        tensor = tensor.unsqueeze(0)
    magnitudes = tensor.abs()

    sorted_indices = torch.argsort(magnitudes, dim=1, descending=False)
    ranks = sorted_indices.argsort(dim=1).to(work_dtype) + 1

    min_ranks = ranks.min(dim=1, keepdim=True).values
    max_ranks = ranks.max(dim=1, keepdim=True).values
    rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks)).clamp(0, 1)
    probs = (density - epsilon) + rank_norm * 2 * epsilon
    mask = torch.bernoulli(probs).to(work_dtype)

    res = rescaled_masked_tensor(tensor.to(work_dtype), mask, rescale_norm)
    return res.to(tensor.dtype).reshape(orig_shape)