	---
	tags:
	- model-merge
	- hermite-interpolation
	- deepseek
	base_model:
	- deepseek-ai/deepseek-math-7b-instruct
	- jahyungu/deepseek-math-7b-instruct_hendrycks_math
	---

	# deepseek-7b-math-hendrycksmath-lambda06

	A merged model produced by linear interpolation of the parameters of two models.

	## Merge Configuration

	| Parameter | Value |
	|-----------|-------|
	| Model A | `deepseek-ai/deepseek-math-7b-instruct` |
	| Model B | `jahyungu/deepseek-math-7b-instruct_hendrycks_math` |
	| λ_a | 0.60 |
	| λ_b | 0.40 |
	| Formula | θ* = 0.60 × θ_a + 0.40 × θ_b |
	| dtype | torch.float16 |
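
The formula in the table can be illustrated with a minimal sketch. This is not the script used to build this model: the function name `lerp_state_dict` and the toy dictionaries are hypothetical, and plain floats stand in for weight tensors.

```python
def lerp_state_dict(state_a, state_b, lambda_a=0.60, lambda_b=0.40):
    """Return lambda_a * state_a[k] + lambda_b * state_b[k] for every key k.

    Values may be any objects that support scalar multiplication and
    addition (floats here; tensors would behave the same way).
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("both models must expose the same parameter names")
    return {
        name: lambda_a * state_a[name] + lambda_b * state_b[name]
        for name in state_a
    }


# Toy stand-ins for the two checkpoints (hypothetical values).
theta_a = {"layer.weight": 1.0, "layer.bias": 0.0}
theta_b = {"layer.weight": 2.0, "layer.bias": 1.0}
theta_merged = lerp_state_dict(theta_a, theta_b)
```

Because the function relies only on `*` and `+`, the same sketch should apply to PyTorch state dicts obtained from `model.state_dict()`. Casting the result to `torch.float16` afterwards is an assumption based on the dtype row above.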

	## Tokenizer

	Union tokenizer (mergekit-style): vocabularies of both models are merged.
	- Union vocab size: 100002
	- Tokens added from Model B: 0
	- Tokens only in Model A: 0

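	For tokens missing from one model, the other model's embedding is used as a fallback
	before linear interpolation. Both counts above are 0, so the two vocabularies
	coincide and no fallback was needed for this merge.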
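
The fallback rule described above might look like the following sketch. It is an illustration under assumptions, not the code used for this merge: `merge_embeddings` is a hypothetical helper, embeddings are plain lists of floats, and the union ordering (Model A's tokens first, then tokens found only in Model B) is assumed.

```python
def merge_embeddings(vocab_a, emb_a, vocab_b, emb_b, lambda_a=0.60, lambda_b=0.40):
    """Build a union vocabulary and interpolate embeddings row by row.

    vocab_*: dict mapping token -> row index; emb_*: list of float vectors.
    """
    # Assumed ordering: Model A's tokens by index, then tokens only in Model B.
    tokens_a = sorted(vocab_a, key=vocab_a.get)
    tokens_b_only = [t for t in sorted(vocab_b, key=vocab_b.get) if t not in vocab_a]
    union_tokens = tokens_a + tokens_b_only
    union_vocab = {tok: i for i, tok in enumerate(union_tokens)}

    merged = []
    for tok in union_tokens:
        row_a = emb_a[vocab_a[tok]] if tok in vocab_a else None
        row_b = emb_b[vocab_b[tok]] if tok in vocab_b else None
        # Fallback: a token missing from one model borrows the other's row.
        if row_a is None:
            row_a = row_b
        if row_b is None:
            row_b = row_a
        merged.append([lambda_a * x + lambda_b * y for x, y in zip(row_a, row_b)])
    return union_vocab, merged


# Toy vocabularies (hypothetical): "y" exists only in A, "z" only in B.
vocab_a = {"x": 0, "y": 1}
emb_a = [[1.0, 0.0], [0.0, 1.0]]
vocab_b = {"x": 0, "z": 1}
emb_b = [[2.0, 0.0], [3.0, 3.0]]
union_vocab, merged_emb = merge_embeddings(vocab_a, emb_a, vocab_b, emb_b)
```

When a token exists in only one model, both rows are the same vector, so the interpolation returns that vector unchanged as long as the two weights sum to 1.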

	## Description

	This model was created by linearly interpolating the parameters of two models:
	- Model A (`deepseek-ai/deepseek-math-7b-instruct`): weight = 0.60
	- Model B (`jahyungu/deepseek-math-7b-instruct_hendrycks_math`): weight = 0.40