Too early for 1.3?
I already have ideas for Goetia version 1.3. The latest experiments indicate that DELLA > Karcher.
If merged correctly, ablations might not be needed anymore. I'm working on a Della Audit tool to check that the result isn't just one model's weights dominating the entire merge.
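The audit idea can be sketched roughly like this. This is a hypothetical illustration, not mergekit code: the function name and its flat-list inputs are mine. It compares the merged task vector against each component's task vector; one cosine near 1.0 while the rest sit near 0 would be a dominance red flag.

```python
def dominance_report(merged_delta, component_deltas, names):
    """Cosine similarity between the merged task vector and each
    component's task vector. Inputs are flat lists of parameter
    deltas (hypothetical sketch, not mergekit's internal objects)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def norm(a):
        return dot(a, a) ** 0.5

    report = {}
    for name, delta in zip(names, component_deltas):
        denom = norm(merged_delta) * norm(delta)
        report[name] = dot(merged_delta, delta) / denom if denom else 0.0
    return report
```

If one entry reads ~1.0 and the others ~0.0, the merge is effectively a copy of that one model.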
Slimaki is uncensored despite being built from censored components. A 16-model merge test for Asmodeus is also looking quite promising: no refusals, even though several of its components would individually refuse prompts. Perhaps DELLA is superior to Karcher and SLERP for uncensored merges. Thanks to @Casual-Autopsy for helping lead to this discovery.
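For context on what is being compared: DELLA's core trick is magnitude-aware drop-and-rescale. Each delta gets a keep probability drawn from the window [density − epsilon, density + epsilon], with larger-magnitude deltas getting higher probabilities, and survivors are rescaled by 1/p so the merge stays unbiased in expectation. A rough sketch under those assumptions, not mergekit's actual implementation:

```python
import random

def della_keep_probs(magnitudes, density=0.5, epsilon=0.2):
    """Assign per-parameter keep probabilities in
    [density - epsilon, density + epsilon], scaled by magnitude
    rank: larger deltas are more likely to survive. (Sketch of the
    DELLA magnitude-prune idea, not mergekit's exact code.)"""
    n = len(magnitudes)
    order = sorted(range(n), key=lambda i: magnitudes[i])
    probs = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        probs[i] = (density - epsilon) + 2 * epsilon * frac
    return probs

def della_prune(deltas, density=0.5, epsilon=0.2, rng=None):
    """Sample a Bernoulli mask from the keep probabilities and
    rescale survivors by 1/p."""
    rng = rng or random.Random(0)
    probs = della_keep_probs([abs(d) for d in deltas], density, epsilon)
    return [d / p if rng.random() < p else 0.0
            for d, p in zip(deltas, probs)]
```

Note that the probabilities average out to `density`, which is why epsilon must stay strictly below both `density` and `1 - density` (the constraint the safety guard below enforces).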
Also, I have modified mergekit-evolve to use graph_v18 and to support DELLA, but my GPU is too weak to test it on 24B models.
Note: this isn't confirmed operational yet, and it requires changing:
- config.py
- graph.py
- sparsify.py
- /evo/actors.py
- /evo/genome.py
- /evo/monkeypatch.py
- /evo/strategy.py
- /scripts/evolve.py
I added this to `della_magprune`:

```python
# --- SAFETY GUARD START ---
# Ensure density isn't exactly 0 or 1
density = max(1e-4, min(1.0 - 1e-4, density))
# Epsilon must be < density AND < (1 - density).
# If the optimizer guessed a bad epsilon, shrink it to the max allowed value.
max_epsilon = min(density, 1.0 - density) - 1e-4
if abs(epsilon) > max_epsilon:
    epsilon = max_epsilon if epsilon > 0 else -max_epsilon
# --- SAFETY GUARD END ---

orig_shape = tensor.shape
work_dtype = (
    tensor.dtype
    if tensor.device.type != "cpu" or tensor.dtype == torch.bfloat16
    else torch.float32
)
```