---
tags:
- model-merge
- hermite-interpolation
- deepseek
base_model:
- deepseek-ai/deepseek-math-7b-instruct
- deepseek-ai/deepseek-coder-7b-instruct-v1.5
---

# deepseek-7b-math-code-lagrange-optimal

A model merge whose mixing weights λ were optimized via Hermite interpolation.

## Merge Configuration

| Parameter | Value |
|-----------|-------|
| Method | Hermite interpolation (Phase 2 optimized) |
| λ | [0.499256, 0.500744] |
| dtype | torch.float16 |

- **Model 0** (`deepseek-ai/deepseek-math-7b-instruct`): λ=0.499256
- **Model 1** (`deepseek-ai/deepseek-coder-7b-instruct-v1.5`): λ=0.500744

## Tokenizer

Union tokenizer (mergekit-style): vocab size = 100016

## Formula

θ* = Σ_k λ_k θ_k

The mixing weights λ were optimized by minimizing the Hermite polynomial
approximation of the loss function (see Phase 2).
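The linear combination θ* = Σ_k λ_k θ_k can be sketched as a weighted sum over the two models' state dicts. This is an illustrative sketch, not the card's actual merge script; it assumes both checkpoints share identical architectures and parameter key names, and casts to `torch.float16` to match the dtype listed above.

```python
import torch

def merge_state_dicts(state_dicts, lambdas):
    """Compute theta* = sum_k lambda_k * theta_k over matching parameter keys.

    Accumulates in float32 for numerical stability, then casts the result
    to float16 to match the merge configuration's dtype.
    """
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(
            lam * sd[key].to(torch.float32)
            for lam, sd in zip(lambdas, state_dicts)
        ).to(torch.float16)
    return merged

# Toy usage with the card's optimized weights (real use would load the
# two DeepSeek checkpoints' state dicts instead of these dummy tensors):
sd_math = {"w": torch.ones(3)}
sd_code = {"w": torch.full((3,), 3.0)}
merged = merge_state_dicts([sd_math, sd_code], [0.499256, 0.500744])
```

With λ ≈ [0.5, 0.5], the merged tensor lands almost exactly halfway between the two inputs.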