Following the methodology of the paper linked below, we implemented cross-model-size merging and applied it to the latest Qwen3.5 architecture.

Model Highlights:

  • Merge method: Optimal Transport Merge (see the sketch after this list)

  • Precision: bfloat16

  • Model size: 10B params

  • Context length: 262,144
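
For readers unfamiliar with the merge method, here is a minimal sketch of one common form of optimal-transport weight merging: neurons of one model are matched one-to-one to neurons of the other by solving an assignment problem over a similarity cost, and the aligned weights are then interpolated. This is an illustration under assumptions (hard assignment, cosine cost, equal-width layers), not the exact pipeline used for this model.

```python
# Illustrative optimal-transport-style neuron alignment before averaging two
# weight matrices. Assumptions: hard one-to-one transport, cosine-similarity
# cost, and layers of equal width. Not the exact method used for this model.
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_merge_layer(w_a: np.ndarray, w_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Align the rows (output neurons) of w_b to w_a, then interpolate."""
    a = w_a / (np.linalg.norm(w_a, axis=1, keepdims=True) + 1e-8)
    b = w_b / (np.linalg.norm(w_b, axis=1, keepdims=True) + 1e-8)
    cost = -a @ b.T  # negative cosine similarity between neuron pairs
    _, col_idx = linear_sum_assignment(cost)  # Hungarian: hard OT plan
    return alpha * w_a + (1.0 - alpha) * w_b[col_idx]

# Example: merge two random 8x4 layers.
rng = np.random.default_rng(0)
print(ot_merge_layer(rng.normal(size=(8, 4)), rng.normal(size=(8, 4))).shape)  # (8, 4)
```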

Parameter Settings:

  • General tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

  • Reasoning tasks: temperature=1.0, top_p=1.0, top_k=40, min_p=0.0, presence_penalty=2.0, repetition_penalty=1.0
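
The settings above map directly onto vLLM's SamplingParams. The snippet below is a minimal usage sketch, not an officially tested serving configuration; the model ID is taken from this card.

```python
# Minimal sketch: apply the recommended sampling settings with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="YOYO-AI/Qwen3.5-9B-YOYO-Instruct", dtype="bfloat16")

general = SamplingParams(    # general tasks
    temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
    presence_penalty=1.5, repetition_penalty=1.0,
)
reasoning = SamplingParams(  # reasoning tasks
    temperature=1.0, top_p=1.0, top_k=40, min_p=0.0,
    presence_penalty=2.0, repetition_penalty=1.0,
)

outputs = llm.generate(["Explain optimal transport in one paragraph."], general)
print(outputs[0].outputs[0].text)
```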

Paper & GitHub Repository:

  • Paper

  • GitHub Repository

Details:

Data: CodeAlpaca_20K

Activations were extracted using 2,000 prompts from the test set; a sketch of this step follows.
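
The following is a minimal sketch of activation extraction with a forward hook. The Hugging Face dataset ID, split, prompt field, and layer choice are assumptions for illustration; the card only states that 2,000 CodeAlpaca_20K prompts were used.

```python
# Sketch: collect mean hidden activations from one transformer block.
# Assumed: dataset ID "sahil2801/CodeAlpaca-20k", its "train" split, the
# "instruction" field as prompt text, and a mid-stack layer as the hook site.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/Qwen3.5-9B-YOYO-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

ds = load_dataset("sahil2801/CodeAlpaca-20k", split="train")  # assumed ID/split
prompts = [row["instruction"] for row in ds.select(range(2000))]

activations = []

def hook(_module, _inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    activations.append(hidden.mean(dim=1).float().cpu())  # mean over tokens

layer = model.model.layers[len(model.model.layers) // 2]  # arbitrary mid layer
handle = layer.register_forward_hook(hook)

with torch.no_grad():
    for prompt in prompts:
        model(**tok(prompt, return_tensors="pt"))

handle.remove()
print(torch.cat(activations).shape)  # [num_prompts, hidden_size]
```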

Per-Tensor Details:

  • merge_stats.csv

  • global_state.json
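
These artifacts can be inspected locally; the snippet below is a generic sketch and assumes nothing about their schema beyond the file formats.

```python
# Sketch: peek at the released per-tensor merge artifacts.
import json
import pandas as pd

stats = pd.read_csv("merge_stats.csv")  # per-tensor merge statistics
print(stats.columns.tolist())
print(stats.head())

with open("global_state.json") as f:
    state = json.load(f)                # global merge state
print(list(state) if isinstance(state, dict) else state)
```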
