Vibe check 🫧
Something unexpected here, can't quite put my finger on it... but it's different, isn't it? Did you put a secret ingredient in here? ✨
Yeah, I was surprised at how good Maginum Cydoms is. It surpassed my expectations, but it wasn't reproducible on my end because of --random-seed differences.
On a whim I tested putting all the components into Della, skipping TIES and SLERP. So this merge uses the same components as Maginum, but turned out differently and seems to be less censored overall.
I tried setting up mergekit-evolve to run with Della, but in the end my PC was too slow, so I just ended up assigning equal weights to everything.
It appears that DELLA creates more of an "emergent personality" than Karcher. Asmodeus v1 is even more distinct (16 models): it doesn't resort to putting everything into bullet-point lists and actually writes in paragraph form. And I just merged another one with 32 models to test next.
I think the secret ingredient is that DELLA lets you bridge 2501 and 2503 models, which previously wasn't possible with SLERP/Karcher. The guy who made Maginum discovered this. I wasn't expecting it to be functional, yet it is. The other trick seems to be @OddTheGreat's method of setting normalize: false and then letting the total weight sum exceed 1 (the six 0.5 weights in the config below sum to 3.0).
All my future DARE/DELLA merges are now set to use seed #420 by default:
timeout /t 3 /nobreak && mergekit-yaml C:\mergekit-main\config.yaml C:\mergekit-main\merged_model_output --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code --lazy-unpickle --random-seed 420 --cuda
architecture: MistralForCausalLM
models:
  - model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v4.3
    parameters:
      density: 0.75
      weight: 0.5
      epsilon: 0.25
  - model: B:\24B\!models--ReadyArt--4.2.0-Broken-Tutu-24b
    parameters:
      density: 0.75
      weight: 0.5
      epsilon: 0.25
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v2-24B
    parameters:
      density: 0.75
      weight: 0.5
      epsilon: 0.25
  - model: B:\24B\!models--TheDrummer--Magidonia-24B-v4.3
    parameters:
      density: 0.75
      weight: 0.5
      epsilon: 0.25
  - model: B:\24B\!models--TheDrummer--Precog-24B-v1
    parameters:
      density: 0.75
      weight: 0.5
      epsilon: 0.25
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v3-24B
    parameters:
      density: 0.75
      weight: 0.5
      epsilon: 0.25
# Seed: 420
merge_method: della
base_model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
parameters:
  lambda: 1.0
  normalize: false
  int8_mask: false
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: Ślimaki-24B-v1
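Since the whole point of pinning --random-seed is making the DELLA sampling reproducible, here is a quick sketch (not from the original post; the second output path is hypothetical) that compares two merge runs tensor-by-tensor to confirm they came out identical:

# Compare two merge outputs shard-by-shard; identical tensors mean the
# fixed --random-seed 420 reproduced the DELLA dropout masks exactly.
from pathlib import Path

import torch
from safetensors.torch import load_file

run_a = Path(r"C:\mergekit-main\merged_model_output")        # first run
run_b = Path(r"C:\mergekit-main\merged_model_output_rerun")  # hypothetical second run, same seed

mismatches = 0
for shard in sorted(run_a.glob("*.safetensors")):
    a = load_file(str(shard))
    b = load_file(str(run_b / shard.name))
    mismatches += sum(int(not torch.equal(t, b[name])) for name, t in a.items())

print("identical" if mismatches == 0 else f"{mismatches} tensors differ")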
Lastly, I added a 'safety net' around the epsilon handling so a bad value can't break the merge.
def della_magprune(
    tensor: torch.Tensor,
    density: float,
    epsilon: float,
    rescale_norm: Optional[RescaleNorm] = None,
) -> torch.Tensor:
    if density >= 1:
        return tensor
    if density <= 0:
        return torch.zeros_like(tensor)

    # --- SAFETY GUARD START ---
    # Ensure density isn't exactly 0 or 1
    density = max(1e-4, min(1.0 - 1e-4, density))
    # Epsilon must be < density AND < (1 - density).
    # If the optimizer guessed a bad epsilon, we shrink it to the max allowed value
    max_epsilon = min(density, 1.0 - density) - 1e-4
    if abs(epsilon) > max_epsilon:
        epsilon = max_epsilon if epsilon > 0 else -max_epsilon
    # --- SAFETY GUARD END ---

    orig_shape = tensor.shape
    work_dtype = (
        tensor.dtype
        if tensor.device.type != "cpu" or tensor.dtype == torch.bfloat16
        else torch.float32
    )
    if len(tensor.shape) < 2:
        tensor = tensor.unsqueeze(0)

    magnitudes = tensor.abs()
    sorted_indices = torch.argsort(magnitudes, dim=1, descending=False)
    ranks = sorted_indices.argsort(dim=1).to(work_dtype) + 1
    min_ranks = ranks.min(dim=1, keepdim=True).values
    max_ranks = ranks.max(dim=1, keepdim=True).values
    rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks)).clamp(0, 1)

    # Now this line is guaranteed not to produce values < 0 or > 1
    probs = (density - epsilon) + rank_norm * 2 * epsilon
    mask = torch.bernoulli(probs.clamp(0, 1)).to(work_dtype)
    res = rescaled_masked_tensor(tensor.to(work_dtype), mask, rescale_norm)
    return res.to(tensor.dtype).reshape(orig_shape)
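To see why the guard is needed: the probs line spans [density - epsilon, density + epsilon], so epsilon has to stay below min(density, 1 - density) for every keep-probability to land in [0, 1]. A tiny standalone sketch of just that clamp (for illustration only, not part of mergekit):

def clamp_epsilon(density: float, epsilon: float) -> float:
    # Same logic as the safety guard above, isolated so it can be tested by hand.
    density = max(1e-4, min(1.0 - 1e-4, density))
    max_epsilon = min(density, 1.0 - density) - 1e-4
    if abs(epsilon) > max_epsilon:
        epsilon = max_epsilon if epsilon > 0 else -max_epsilon
    return epsilon

# With density=0.75 the largest allowed epsilon is min(0.75, 0.25) - 1e-4 = 0.2499,
# so the epsilon: 0.25 used in the config above gets nudged down slightly instead
# of letting probs hit exactly 1.0 at the top-ranked weights.
print(clamp_epsilon(0.75, 0.25))  # -> ~0.2499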
Nice, I'll try those settings. Have you tested with top-n-sigma 1.25 or DynaTemp? I was noticing big variations even with temp 0.
I don't think LM Studio exposes those settings on MLX. But I'm always very interested in what the creator (in this case, you!) intended as the default settings... so, what do you use to bring out the character the most?
I use either benchmark mode or creative mode. The creative mode settings I'm currently using (for Mistral) are posted on the Goetia v1.2 page. Benchmark mode is just temp 0, top p 0.95, and rep pen 1.12.
Standard benchmark mode lets me see how much variation there is at low temperature and is more reliable for testing Q0/compliance.
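For reference, a minimal sketch of those benchmark-mode settings with plain transformers (paths and prompt are placeholders; "temp 0" is expressed as greedy decoding, and LM Studio / MLX users would set the equivalents in the UI instead):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = r"C:\mergekit-main\merged_model_output"  # or the uploaded repo id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Write a short scene in paragraph form."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=False,          # "temp 0" -> deterministic greedy decoding
    top_p=0.95,               # no effect under greedy decoding, kept to mirror the listed settings
    repetition_penalty=1.12,  # "rep pen 1.12"
)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))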

