Too early for 1.3?

#3
by Naphula - opened

I already have ideas for Goetia version 1.3. Latest experiments indicate Della > Karcher.

If merged correctly, ablations might not be needed anymore. I'm working on a Della Audit tool to ensure the merge isn't dominated by a single model's weights.
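One way such an audit could work (a sketch, not the actual tool: the function name and the cosine-similarity heuristic are my assumptions) is to compare each component model's task vector against the merged delta and flag lopsided contributions:

```python
import torch

def merge_dominance(deltas: dict, merged: torch.Tensor) -> dict:
    """Rough per-model contribution score: cosine similarity between each
    model's task vector (delta from the base) and the merged delta.
    One score near 1.0 with the rest near 0 suggests a dominated merge."""
    flat_merged = merged.flatten().float()
    scores = {}
    for name, delta in deltas.items():
        flat = delta.flatten().float()
        scores[name] = torch.nn.functional.cosine_similarity(
            flat, flat_merged, dim=0
        ).item()
    return scores

# Toy example with synthetic "task vectors": model_a dominates the merge.
torch.manual_seed(0)
a, b = torch.randn(64), torch.randn(64)
merged = 0.9 * a + 0.1 * b
scores = merge_dominance({"model_a": a, "model_b": b}, merged)
print(scores)  # model_a's score should be much higher than model_b's
```

A real audit would aggregate this per-tensor across the whole checkpoint rather than on one toy vector.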

Slimaki is uncensored despite using censored components. Now a 16-model merge test for Asmodeus also looks quite promising: no refusals, despite including components that would individually refuse prompts. Perhaps DELLA is superior to Karcher and SLERP for uncensored merges. Thanks to @Casual-Autopsy for helping lead to this discovery.
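For anyone wanting to try this, a minimal DELLA merge config looks roughly like the following (parameter names are from mergekit's DELLA method; the model names and values here are placeholders, not the actual Asmodeus recipe):

```yaml
merge_method: della
base_model: example/base-model        # placeholder
models:
  - model: example/finetune-a         # placeholder
    parameters:
      density: 0.6    # fraction of delta parameters kept
      epsilon: 0.15   # width of the magnitude-based drop-probability window
      weight: 0.5
  - model: example/finetune-b         # placeholder
    parameters:
      density: 0.6
      epsilon: 0.15
      weight: 0.5
parameters:
  lambda: 1.0         # rescale factor applied after pruning
dtype: bfloat16
```

Note the constraint the safety guard below enforces: epsilon must stay below both `density` and `1 - density`, or the drop probabilities leave the valid range.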

Also, I have modified mergekit-evolve to use graph_v18 and support Della, but my GPU is too weak to test it on 24B models.

Note: this isn't confirmed operational yet and requires changing:

  • config.py
  • graph.py
  • sparsify.py
  • /evo/actors.py
  • /evo/genome.py
  • /evo/monkeypatch.py
  • /evo/strategy.py
  • /scripts/evolve.py

I added this safety guard to `def della_magprune`:

    # --- SAFETY GUARD START ---
    # Ensure density isn't exactly 0 or 1
    density = max(1e-4, min(1.0 - 1e-4, density))
    
    # Epsilon must be < density AND < (1 - density)
    # If the optimizer guessed a bad epsilon, we shrink it to the max allowed value
    max_epsilon = min(density, 1.0 - density) - 1e-4
    if abs(epsilon) > max_epsilon:
        epsilon = max_epsilon if epsilon > 0 else -max_epsilon
    # --- SAFETY GUARD END ---

    orig_shape = tensor.shape
    work_dtype = (
        tensor.dtype
        if tensor.device.type != "cpu" or tensor.dtype == torch.bfloat16
        else torch.float32
    )
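The guard can be exercised in isolation to see how it handles the degenerate values an evolutionary optimizer might propose (the standalone function below is my extraction of the snippet above, not code from mergekit itself):

```python
def clamp_della_params(density: float, epsilon: float):
    """Standalone copy of the guard: keeps density strictly inside (0, 1)
    and |epsilon| strictly below min(density, 1 - density)."""
    density = max(1e-4, min(1.0 - 1e-4, density))
    max_epsilon = min(density, 1.0 - density) - 1e-4
    if abs(epsilon) > max_epsilon:
        epsilon = max_epsilon if epsilon > 0 else -max_epsilon
    return density, epsilon

# Cases a genome with unconstrained parameters could produce:
print(clamp_della_params(0.0, 0.5))   # density clamped up from 0
print(clamp_della_params(0.9, 0.5))   # epsilon shrunk below 1 - density
print(clamp_della_params(0.5, -0.8))  # negative epsilon clamped symmetrically
```

Without this, a bad genome crashes the whole evolve run instead of just scoring poorly.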
