---
tags:
- model-merge
- hermite-interpolation
- deepseek
base_model:
- deepseek-ai/deepseek-math-7b-instruct
- deepseek-ai/deepseek-coder-7b-instruct-v1.5
---
# deepseek-7b-math-code-lagrange-optimal
A model merge using mixing weights λ optimized via Hermite interpolation.
## Merge Configuration
| Parameter | Value |
|---|---|
| Method | Hermite interpolation (Phase 2 optimized) |
| λ | [0.499256, 0.500744] |
| dtype | torch.float16 |
- Model 0 (`deepseek-ai/deepseek-math-7b-instruct`): λ = 0.499256
- Model 1 (`deepseek-ai/deepseek-coder-7b-instruct-v1.5`): λ = 0.500744
## Tokenizer
Union tokenizer (mergekit-style), combining both base vocabularies: vocab size = 100016.
## Formula
θ* = Σ_k λ_k θ_k
The mixing weights λ were optimized by minimizing the Hermite polynomial approximation of the loss function (see Phase 2).
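The merge formula above can be sketched as a weighted average over the two models' state dicts. This is a minimal illustration, not the card's actual merge script: the function name and the toy tensors are hypothetical, and the λ values are the ones from the configuration table.

```python
import torch

def merge_state_dicts(state_dicts, lambdas):
    """Compute theta* = sum_k lambda_k * theta_k over matching parameter names.

    Accumulates in float32 for numerical stability, then casts the result
    to float16 to match the card's stated dtype.
    """
    assert abs(sum(lambdas) - 1.0) < 1e-6, "mixing weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(
            lam * sd[name].to(torch.float32)
            for lam, sd in zip(lambdas, state_dicts)
        ).to(torch.float16)
    return merged

# Toy example: small tensors standing in for the two 7B models' weights.
sd_math = {"w": torch.ones(2, 2)}    # stand-in for deepseek-math-7b-instruct
sd_code = {"w": torch.zeros(2, 2)}   # stand-in for deepseek-coder-7b-instruct-v1.5
merged = merge_state_dicts([sd_math, sd_code], [0.499256, 0.500744])
```

In practice each state dict would be loaded with `transformers` or `safetensors` and the merge applied parameter by parameter; the sketch only shows the arithmetic of the formula.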