It likes to repeat what the user says, but here is the result:

#1
by djuna - opened

Summarize this comment "Hey, I've got a few tips for merging that might help you with your 4.0 rendition of Stroganoff. Some researched, some I've found on my own.

Set 'normalize=false' so you have more control over the weights. By default, mergekit has 'normalize=true', which rescales the weights relative to each other so that they sum to one. So with what you have it set to now, the weights are computed closer to '0.9, 0.3, 0.9' than '0.3, 0.1, 0.3'.

DARE TIES is great, but I've found that anything TIES-based loses a lot of the added models' characteristics to the base model. It has its uses and technically produces a better model. However, it's likely why Spellbound Instruct is breaking your merge, since it's retaining a lot of the instruct's story-heavy capabilities. Try setting one of the other models as base with DARE Linear, then place Spellbound at the bottom of the stack. That should produce a more vibrant model.

If you want to retain as much of a model as possible into a merge, passthrough/frankenmerging is the best route, though it takes the longest to find the right recipe. Manually taking a slice out of one model and putting it into another sometimes works better than most merge methods. Just keep in mind layer theory where the first layers primarily control formatting and instruct following, the mids are where most of the detailed information sits, and the ends have the most influence on the style of the output.

I'm by no means an expert, but hopefully some of that helps and I'm open to any questions if you have any."

Output:
Here's a summary of the comment:

  1. Set 'normalize=false' to control more control over the weights.

  2. By default, mergekit has 'normalize=true' and this makes the weights closer to '0.9, 0.3, 0.9' than '0.3, 0.1, 0.3'.

  3. If your Stroganoff merge formula is set to '0.9, 0.3, 0.9', try setting one of the other models as the base and place Spellbound at the bottom.

  4. DARE TIES is a good method, but it might not be as effective as it should be, losing some of the model's characteristics when combined with other models.

  5. Pass-through/Frankenmerging is generally faster and more reliable but it can take longer to find the right recipe.

  6. Layer theory is important in designing models, where the first layers are responsible for formatting and instructions, the middles are where the detailed information is, and the ends have the most influence on the style of the output.

  7. The author does not consider themselves an expert, but they hope these suggestions may help you and is open to any questions you might have.
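The first tip in the quoted comment, about mergekit's `normalize` option, comes down to simple arithmetic. The sketch below assumes a plain linear weighted average in which normalization divides each weight by the sum of all weights; it ignores the DARE dropout and TIES sign-election steps, and the function names are hypothetical, not mergekit's API.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to one (simplified model of normalize=true)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [w / total for w in weights]


def weighted_merge(params, weights, normalize=True):
    """Linearly combine one parameter vector per model (plain lists of floats).

    Hypothetical helper: real merges operate on full tensors, and DARE/TIES
    methods work on task vectors relative to a base model.
    """
    if normalize:
        weights = normalize_weights(weights)
    length = len(params[0])
    return [sum(w * p[i] for w, p in zip(weights, params)) for i in range(length)]


weights = [0.3, 0.1, 0.3]
print([round(w, 3) for w in normalize_weights(weights)])  # [0.429, 0.143, 0.429]

# Three identical "models": with normalization the result is unchanged,
# without it every value is scaled by the raw weight sum (about 0.7).
models = [[1.0, 2.0]] * 3
print(weighted_merge(models, weights, normalize=True))
print(weighted_merge(models, weights, normalize=False))
```

Under this assumption the 0.3/0.1/0.3 weights keep their 3:1:3 ratio either way; what `normalize=false` changes is that the weights are no longer forced to sum to one, so their absolute size matters.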
