---
tags:
- model-merge
- hermite-interpolation
- deepseek
base_model:
- deepseek-ai/deepseek-math-7b-instruct
- jahyungu/deepseek-math-7b-instruct_hendrycks_math
---
# deepseek-7b-math-hendrycksmath-lambda06
A merged model produced by linear interpolation of two models.
## Merge Configuration
| Parameter | Value |
|-----------|-------|
| Model A | `deepseek-ai/deepseek-math-7b-instruct` |
| Model B | `jahyungu/deepseek-math-7b-instruct_hendrycks_math` |
| λ_a | 0.60 |
| λ_b | 0.40 |
| Formula | θ* = 0.60 × θ_a + 0.40 × θ_b |
| dtype | torch.float16 |
## Tokenizer
Union tokenizer (mergekit-style): vocabularies of both models are merged.
- Union vocab size: 100002
- Tokens added from Model B: 0
- Tokens only in Model A: 0
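For a token missing from one model, the other model's embedding is used as a fallback
before linear interpolation. Both counts above are 0, so the two vocabularies are
identical and no fallback was needed for this merge.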
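
The union-and-fallback rule described above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the script used to build this model: the helper names `union_vocab` and `merge_embeddings` are hypothetical, and embeddings are plain Python lists keyed by token rather than tensor rows.

```python
def union_vocab(vocab_a, vocab_b):
    """Keep Model A's tokens in order, then append tokens found only in Model B."""
    merged = list(vocab_a)
    seen = set(vocab_a)
    for tok in vocab_b:
        if tok not in seen:
            merged.append(tok)
            seen.add(tok)
    return merged


def merge_embeddings(vocab, emb_a, emb_b, lam_a=0.60, lam_b=0.40):
    """Interpolate embedding rows; emb_a / emb_b map token -> list of floats.

    Assumes every token in `vocab` has a row in at least one of the two models.
    """
    merged = {}
    for tok in vocab:
        # Fallback: if a model lacks the token, borrow the other model's row.
        row_a = emb_a.get(tok, emb_b.get(tok))
        row_b = emb_b.get(tok, emb_a.get(tok))
        merged[tok] = [lam_a * x + lam_b * y for x, y in zip(row_a, row_b)]
    return merged


# Toy data (hypothetical): "x" exists only in A, "y" only in B.
emb_a = {"<s>": [1.0, 0.0], "x": [2.0, 2.0]}
emb_b = {"<s>": [0.0, 1.0], "y": [4.0, 4.0]}
vocab = union_vocab(list(emb_a), list(emb_b))
merged_emb = merge_embeddings(vocab, emb_a, emb_b)
```

Because λ_a + λ_b = 1, a token that exists in only one model ends up with that model's embedding essentially unchanged, while shared tokens get the weighted average.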
## Description
This model was created by linearly interpolating the parameters of two models:
- **Model A** (`deepseek-ai/deepseek-math-7b-instruct`): weight = 0.60
- **Model B** (`jahyungu/deepseek-math-7b-instruct_hendrycks_math`): weight = 0.40
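
As a minimal sketch of the interpolation θ* = 0.60 × θ_a + 0.40 × θ_b, the function below combines two parameter dictionaries key by key. The name `interpolate_state_dicts` is hypothetical and this is not necessarily how the published weights were produced. It relies only on `*` and `+`, so it should work on PyTorch tensors from a `state_dict()` as well as on the scalar toy values shown here.

```python
def interpolate_state_dicts(state_a, state_b, lam_a=0.60, lam_b=0.40):
    """Return lam_a * state_a[k] + lam_b * state_b[k] for every parameter k."""
    if state_a.keys() != state_b.keys():
        raise ValueError("parameter names differ between the two models")
    return {
        name: lam_a * state_a[name] + lam_b * state_b[name]
        for name in state_a
    }


# Toy example with scalar "parameters"; real checkpoints hold tensors.
toy_a = {"layer.weight": 1.0, "layer.bias": 0.0}
toy_b = {"layer.weight": 2.0, "layer.bias": 1.0}
merged = interpolate_state_dicts(toy_a, toy_b)
```

With real float16 checkpoints, one common choice (an assumption here, not something this card states) is to upcast to float32 for the weighted sum and cast back to float16 before saving.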