---
tags:
- model-merge
- hermite-interpolation
- deepseek
base_model:
- deepseek-ai/deepseek-math-7b-instruct
- deepseek-ai/deepseek-coder-7b-instruct-v1.5
---
# deepseek-7b-math-code-lagrange-optimal
Model merge with mixing weights λ optimized via Hermite interpolation.
## Merge Configuration

| Parameter | Value |
|-----------|-------|
| Method | Hermite interpolation (Phase 2 optimized) |
| λ | [0.499256, 0.500744] |
| dtype | torch.float16 |

- **Model 0** (`deepseek-ai/deepseek-math-7b-instruct`): λ = 0.499256
- **Model 1** (`deepseek-ai/deepseek-coder-7b-instruct-v1.5`): λ = 0.500744
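A minimal sketch of the weighted merge with the λ values above, assuming both checkpoints have already been aligned to the union tokenizer so all parameter shapes match. This is illustrative, not the exact script used to produce this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM

LAMBDAS = [0.499256, 0.500744]
MODELS = [
    "deepseek-ai/deepseek-math-7b-instruct",
    "deepseek-ai/deepseek-coder-7b-instruct-v1.5",
]

# Load both parameter sets. In practice the embedding matrices must first be
# resized to the union-tokenizer vocabulary (100016) so that shapes match.
state_dicts = [
    AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32).state_dict()
    for name in MODELS
]

# theta* = sum_k lambda_k * theta_k, accumulated in float32, stored in float16.
merged = {
    key: sum(lam * sd[key].float() for lam, sd in zip(LAMBDAS, state_dicts)).to(torch.float16)
    for key in state_dicts[0]
}
```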
## Tokenizer

Union tokenizer (mergekit-style): vocab size = 100016
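An illustrative check of the union-vocabulary size, assuming the mergekit-style union is the set union of both base tokenizers' vocabularies; this does not rebuild the tokenizer shipped with this repo.

```python
from transformers import AutoTokenizer

tok_math = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-math-7b-instruct")
tok_coder = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-7b-instruct-v1.5")

# mergekit-style union: the merged vocabulary is the set union of both vocabs.
union_vocab = set(tok_math.get_vocab()) | set(tok_coder.get_vocab())
print(len(union_vocab))  # expected to match the card's figure of 100016
```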
## Formula

θ* = Σ_k λ_k θ_k

The mixing weights λ were optimized by minimizing a Hermite-polynomial approximation of the loss function (see Phase 2).
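A minimal sketch of the idea behind Phase 2, assuming the loss is treated as a function of the single free weight (λ₁ = 1 − λ₀) and the approximation is realized as a cubic Hermite spline (one common form of Hermite interpolation, via scipy's `CubicHermiteSpline`): fit the interpolant to measured loss values and gradients, then minimize the interpolant instead of the loss itself. All numbers below are placeholders, not the actual Phase 2 measurements.

```python
import numpy as np
from scipy.interpolate import CubicHermiteSpline

# Loss and d(loss)/d(lambda) measured at a few trial mixing weights
# (placeholder data; the real Phase 2 measurements are not reproduced here).
lam = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
loss = np.array([1.20, 1.05, 0.98, 1.01, 1.15])
dloss = np.array([-0.9, -0.4, 0.1, 0.5, 1.0])

spline = CubicHermiteSpline(lam, loss, dloss)

# Candidate minimizers: interior critical points of the spline plus endpoints.
critical = spline.derivative().roots()
critical = critical[(critical >= 0.0) & (critical <= 1.0)]
candidates = np.concatenate([critical, [0.0, 1.0]])
best = candidates[np.argmin(spline(candidates))]
print(best, 1.0 - best)  # optimized (lambda_1, lambda_0) under the placeholder data
```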