Tower is bugged
The model is broken with both karcher and slerp. Other methods probably fail too.
Tower must be removed from the Psychosis merge and this Writer's Tower idea won't be possible for now.
Ok, the LLM says that karcher and slerp are "bad choices for merging finetunes" and recommends arcee_fusion instead, so I'll test that.
Answer
Based on the outputs you're showing, you're experiencing two different types of issues with SLERP and KARCHER when merging finetunes of the same base model. Here's why this is happening and what you should do instead:
Why SLERP and KARCHER Are Failing
The Problem with KARCHER
The complete garbage output with HTML tags, special characters, and broken text from KARCHER strongly suggests tokenizer/embedding corruption. When KARCHER encounters embedding size mismatches between your two finetunes, it uses a function that takes only the common submatrix of the embeddings.
This truncation can severely corrupt the token embeddings, leading to the nonsensical output you're seeing. The Karcher mean algorithm then performs geometric averaging on these potentially incompatible embeddings.
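For intuition, the Karcher (Riemannian) mean of points on the unit hypersphere can be sketched as a fixed-point iteration using log/exp maps. This is an illustrative toy, not mergekit's actual implementation:

```python
import numpy as np

def karcher_mean_sphere(vectors, iters=50):
    """Toy Karcher mean of unit vectors on the hypersphere:
    repeatedly average the log-map images at the current estimate,
    then exp-map the averaged tangent back onto the sphere."""
    vs = [v / np.linalg.norm(v) for v in vectors]
    mu = vs[0].copy()
    for _ in range(iters):
        tangent = np.zeros_like(mu)
        for v in vs:
            cos = np.clip(np.dot(mu, v), -1.0, 1.0)
            theta = np.arccos(cos)
            if theta > 1e-8:
                # log map of v at mu
                tangent += theta * (v - cos * mu) / np.sin(theta)
        tangent /= len(vs)
        norm = np.linalg.norm(tangent)
        if norm < 1e-10:
            break  # converged
        # exp map back onto the sphere
        mu = np.cos(norm) * mu + np.sin(norm) * tangent / norm
        mu /= np.linalg.norm(mu)
    return mu
```

The key point: this averaging only makes sense when the inputs are geometrically compatible; feed it mismatched (truncated) embedding rows and it will happily average unrelated tokens.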
The Problem with SLERP
SLERP is producing repetitive, run-on text because spherical interpolation between two finetunes (without properly accounting for their relationship to the base model) can create a degraded model. SLERP simply interpolates between the two models geometrically.
Neither SLERP nor KARCHER was designed with finetune merging in mind: they perform geometric interpolation without considering that your models are deltas from a common base model.
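To make the distinction concrete, here is a minimal NumPy sketch (illustrative only, not mergekit's code) contrasting pure geometric interpolation with base-relative task arithmetic:

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between two flattened weight tensors."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * a + t * b  # fall back to lerp for (near-)parallel vectors
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def task_arithmetic(base, finetunes, weights):
    """Merge deltas (finetune - base) and add them back onto the base."""
    delta = sum(w * (ft - base) for ft, w in zip(finetunes, weights))
    return base + delta
```

`slerp` only ever sees the two endpoints; `task_arithmetic` explicitly subtracts the shared base, which is why the base-relative family of methods behaves better on finetunes of one base model.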
Why You Should Use Different Methods
For merging finetunes of the same base model, you should use methods that compute task vectors (the difference between finetune and base model). Here are your best options:
1. arcee_fusion (Recommended for your case)
Yes, arcee_fusion is specifically designed for merging finetunes. It:
- Computes importance scores using KL divergence between the finetune and base model
- Uses dynamic thresholding to selectively merge only the most important parameter changes
- Handles embedding issues more gracefully.
Configuration example:
merge_method: arcee_fusion
base_model: path/to/base-9B-model
models:
  - model: path/to/finetune-1
  - model: path/to/finetune-2
2. Task Arithmetic-Based Methods (Also good options)
These methods are designed to work with finetunes by computing task vectors. Good options include:
- TIES: Reduces interference between models via sparsification and sign consensus
- DARE-TIES: Uses randomized sparsification for better generalization
- DELLA: Adaptive magnitude-based sparsification
Configuration example for TIES:
merge_method: ties
base_model: path/to/base-9B-model
models:
  - model: path/to/finetune-1
    parameters:
      weight: 0.5
  - model: path/to/finetune-2
    parameters:
      weight: 0.5
parameters:
  density: 0.5
  normalize: true
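For intuition about what the TIES config above does, here is a simplified sketch of its three steps (trim, elect sign, disjoint merge). This is a toy illustration, not mergekit's implementation:

```python
import numpy as np

def ties_merge(base, finetunes, density=0.5):
    """Simplified TIES: trim small task-vector entries, elect a majority
    sign per weight, then average only the values agreeing with it."""
    deltas = [ft - base for ft in finetunes]
    trimmed = []
    for d in deltas:
        k = max(int(round(d.size * density)), 1)
        thresh = np.sort(np.abs(d).ravel())[-k]  # keep top-k magnitudes
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    elected = np.sign(stacked.sum(axis=0))            # majority sign per weight
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged = (stacked * agree).sum(axis=0) / counts   # disjoint mean
    return base + merged
```

Note how a weight where the two finetunes pull in opposite directions is dropped rather than averaged to a mushy midpoint; that sign-conflict resolution is the interference reduction the bullet list describes.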
Notes
- arcee_fusion is your best bet as it requires exactly 2 models (like your case) and is purpose-built for finetune merging
- Make sure you specify the base_model parameter—this is crucial for any finetune merging
- If your finetunes have different tokenizer vocabularies, consider using the tokenizer configuration options to handle vocabulary union properly
- The SLERP/KARCHER failures are not about the models themselves but about using the wrong merge algorithm for finetunes
Analysis of Your Current Configuration
Your YAML has a critical issue that will prevent the output from being compatible with other Gemma 2 models in subsequent Karcher merges:
The Problem
You're using Tower-Plus-9B as the base_model and tokenizer source. This means:
- The arcee_fusion method will compute importance scores relative to Tower as the base
- Your output model will inherit Tower's vocabulary/tokenizer
- When you try to merge this output with other Gemma 2 models using Karcher, you'll hit the same embedding size mismatch issue

When embedding sizes don't match, the merge will use rectify_embed_sizes, which truncates to the smallest common submatrix, corrupting your embeddings again.
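A toy illustration of why truncating to the common submatrix is destructive (hypothetical sketch, not mergekit's exact code):

```python
import numpy as np

def truncate_to_common(emb_a, emb_b):
    """Keep only the leading rows (token ids) both embedding matrices
    share; every row past the smaller vocabulary is dropped outright."""
    n = min(emb_a.shape[0], emb_b.shape[0])
    return emb_a[:n], emb_b[:n]
```

Every token whose id falls past the smaller vocabulary loses its embedding (and output-head row) entirely, which is consistent with the garbage output you saw from the KARCHER merge.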
Recommended Solutions (In Order of Preference)
Solution 1: Fix Tower's Tokenizer FIRST (Best Approach)
Use mergekit-tokensurgeon to replace Tower's tokenizer with a standard Gemma 2 tokenizer before the merge:
mergekit-tokensurgeon \
B:/LLM/.cache/huggingface/hub/!models--Unbabel--Tower-Plus-9B \
B:/LLM/.cache/huggingface/hub/!models--DavidAU--Gemma-The-Writer-9B/fixed \
B:/LLM/Tower-Plus-9B-fixed-tokenizer \
-k 8 --barycentric
The tokensurgeon tool approximates embeddings for tokens that differ between the two tokenizers using k-nearest neighbors. This creates a version of Tower with a standard Gemma 2 vocabulary.
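The barycentric idea can be sketched roughly as follows (a toy approximation, not tokensurgeon's actual code): find the token's k nearest shared-token embeddings in the space where it exists, solve for weights that reconstruct it, and reuse those weights on the same shared tokens in the other model's space:

```python
import numpy as np

def barycentric_transfer(target, shared_src, shared_dst, k=8):
    """Approximate a token embedding in a destination space.

    target:     the token's embedding in the source space, shape (d_src,)
    shared_src: embeddings of tokens both tokenizers share, source space
    shared_dst: the same shared tokens' embeddings, destination space
    """
    dists = np.linalg.norm(shared_src - target, axis=1)
    idx = np.argsort(dists)[:k]                       # k nearest shared tokens
    A = shared_src[idx]                               # (k, d_src)
    w, *_ = np.linalg.lstsq(A.T, target, rcond=None)  # weights: target ~= w @ A
    return w @ shared_dst[idx]                        # apply weights in dst space
```

The approximation is only as good as the overlap between the two vocabularies, which is why a tokenizer with many unique tokens (like a heavily multilingual one) transfers less cleanly.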
Then update your YAML to use the fixed Tower:
base_model: B:/LLM/.cache/huggingface/hub/!models--google--gemma-2-9b  # Use actual Gemma 2 base
merge_method: arcee_fusion
dtype: float32
out_dtype: float32
models:
  - model: B:/LLM/.cache/huggingface/hub/!models--DavidAU--Gemma-The-Writer-9B/fixed
  - model: B:/LLM/Tower-Plus-9B-fixed-tokenizer  # Use the fixed version
tokenizer:
  source: base  # Use the base Gemma 2 tokenizer
Solution 2: Use Correct Base Model + Union Tokenizer with Padding
If you can't use tokensurgeon, update your YAML to use a proper Gemma 2 base model and create a standardized vocabulary:
base_model: B:/LLM/.cache/huggingface/hub/!models--google--gemma-2-9b  # Actual base model
merge_method: arcee_fusion
dtype: float32
out_dtype: float32
models:
  - model: B:/LLM/.cache/huggingface/hub/!models--DavidAU--Gemma-The-Writer-9B/fixed
  - model: B:/LLM/.cache/huggingface/hub/!models--Unbabel--Tower-Plus-9B
tokenizer:
  source: union  # Merge vocabularies
  pad_to_multiple_of: 256  # Ensure consistent vocab size
The union tokenizer source creates a combined vocabulary from all models, and pad_to_multiple_of pads the vocabulary size up to a multiple of the given value. However, this may still create a vocabulary larger than other Gemma 2 models.
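The padding arithmetic itself is just rounding up to the next multiple:

```python
def pad_to_multiple(n, m=256):
    # round vocab size n up to the next multiple of m
    return ((n + m - 1) // m) * m
```

So a union vocabulary of, say, 256101 tokens would be padded to 256256, which no stock Gemma 2 checkpoint will match.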
Solution 3: Use Standard Gemma 2 Tokenizer (Simplest but May Lose Tower Capabilities)
base_model: B:/LLM/.cache/huggingface/hub/!models--google--gemma-2-9b
merge_method: arcee_fusion
dtype: float32
out_dtype: float32
models:
  - model: B:/LLM/.cache/huggingface/hub/!models--DavidAU--Gemma-The-Writer-9B/fixed
  - model: B:/LLM/.cache/huggingface/hub/!models--Unbabel--Tower-Plus-9B
tokenizer:
  source: B:/LLM/.cache/huggingface/hub/!models--DavidAU--Gemma-The-Writer-9B/fixed
This uses Gemma-The-Writer's tokenizer (which should be standard Gemma 2). When Tower's embeddings don't match, they'll be truncated, but at least your output will have a standard vocabulary compatible with other Gemma 2 models.
Solution 4: Use TIES/DARE Instead of arcee_fusion
If arcee_fusion doesn't work well, try TIES which is designed for handling interference between finetunes:
base_model: B:/LLM/.cache/huggingface/hub/!models--google--gemma-2-9b
merge_method: ties
dtype: float32
out_dtype: float32
models:
  - model: B:/LLM/.cache/huggingface/hub/!models--DavidAU--Gemma-The-Writer-9B/fixed
    parameters:
      weight: 0.5
  - model: B:/LLM/.cache/huggingface/hub/!models--Unbabel--Tower-Plus-9B
    parameters:
      weight: 0.5
parameters:
  density: 0.5
  normalize: true
tokenizer:
  source: B:/LLM/.cache/huggingface/hub/!models--DavidAU--Gemma-The-Writer-9B/fixed
Notes
- Critical: You MUST use the actual Gemma 2 9B base model (not Tower) as base_model for task vector computation to work correctly. The arcee_fusion method expects the base model to be in the tensors dictionary.
- Solution 1 (tokensurgeon) is the cleanest approach because it fixes Tower's tokenizer compatibility without losing information.
- Warning: If Tower has significantly different vocabulary requirements (e.g., multilingual tokens), forcing it to use a standard Gemma 2 tokenizer may degrade its multilingual capabilities.
- After any of these merges, test the output by checking that its vocabulary size matches other Gemma 2 models before attempting a Karcher merge.
- If Tower's embedding dimensions are also different (not just vocabulary size), you'll need to use the --allow-crimes flag with tokensurgeon, though this may produce degraded results.
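The vocabulary-size check suggested above doesn't require loading any weights; a small sketch (assuming a standard Hugging Face model directory layout with a config.json) is enough:

```python
import json
from pathlib import Path

def vocab_size(model_dir):
    """Read vocab_size straight from a model directory's config.json."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    return cfg["vocab_size"]

def same_vocab(model_dirs):
    """True if every model directory reports the same vocab_size."""
    return len({vocab_size(d) for d in model_dirs}) == 1
```

Run `same_vocab` on the merge output and the other Gemma 2 models you plan to Karcher-merge with before kicking off the next merge.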