Following the methodology of the paper linked below, we implemented cross-model-size merging and applied it to the latest Qwen3.5 architecture.

Model Highlights:

  • Merge method: Optimal Transport Merge (see the sketch after this list)

  • Precision: bfloat16

  • Model size: 10B params

  • Context length: 262,144
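
For readers unfamiliar with the merge method, here is a minimal sketch of one common form of optimal-transport weight merging: neurons of one model are matched one-to-one to neurons of the other by solving an assignment problem over a similarity cost, and the aligned weights are then interpolated. This is an illustration under assumptions (hard assignment, cosine cost, equal-width layers), not the exact pipeline used for this model.

```python
# Illustrative optimal-transport-style neuron alignment before averaging two
# weight matrices. Assumptions: hard one-to-one transport, cosine-similarity
# cost, and layers of equal width. Not the exact method used for this model.
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_merge_layer(w_a: np.ndarray, w_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Align the rows (output neurons) of w_b to w_a, then interpolate."""
    a = w_a / (np.linalg.norm(w_a, axis=1, keepdims=True) + 1e-8)
    b = w_b / (np.linalg.norm(w_b, axis=1, keepdims=True) + 1e-8)
    cost = -a @ b.T  # negative cosine similarity between neuron pairs
    _, col_idx = linear_sum_assignment(cost)  # Hungarian: hard OT plan
    return alpha * w_a + (1.0 - alpha) * w_b[col_idx]

# Example: merge two random 8x4 layers.
rng = np.random.default_rng(0)
print(ot_merge_layer(rng.normal(size=(8, 4)), rng.normal(size=(8, 4))).shape)  # (8, 4)
```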

Parameter Settings:

  • General tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

  • Reasoning tasks: temperature=1.0, top_p=1.0, top_k=40, min_p=0.0, presence_penalty=2.0, repetition_penalty=1.0
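
The settings above map directly onto vLLM's SamplingParams. The snippet below is a minimal usage sketch, not an officially tested serving configuration; the model ID is taken from this card.

```python
# Minimal sketch: apply the recommended sampling settings with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="YOYO-AI/Qwen3.5-9B-YOYO-Instruct", dtype="bfloat16")

general = SamplingParams(    # general tasks
    temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
    presence_penalty=1.5, repetition_penalty=1.0,
)
reasoning = SamplingParams(  # reasoning tasks
    temperature=1.0, top_p=1.0, top_k=40, min_p=0.0,
    presence_penalty=2.0, repetition_penalty=1.0,
)

outputs = llm.generate(["Explain optimal transport in one paragraph."], general)
print(outputs[0].outputs[0].text)
```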

Paper & GitHub Repository:

  • Paper

  • GitHub Repository

Details:

Data: CodeAlpaca_20K

Activations were extracted using 2,000 prompts from the test set; a sketch of this step follows.
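
The following is a minimal sketch of activation extraction with a forward hook. The Hugging Face dataset ID, split, prompt field, and layer choice are assumptions for illustration; the card only states that 2,000 CodeAlpaca_20K prompts were used.

```python
# Sketch: collect mean hidden activations from one transformer block.
# Assumed: dataset ID "sahil2801/CodeAlpaca-20k", its "train" split, the
# "instruction" field as prompt text, and a mid-stack layer as the hook site.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/Qwen3.5-9B-YOYO-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

ds = load_dataset("sahil2801/CodeAlpaca-20k", split="train")  # assumed ID/split
prompts = [row["instruction"] for row in ds.select(range(2000))]

activations = []

def hook(_module, _inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    activations.append(hidden.mean(dim=1).float().cpu())  # mean over tokens

layer = model.model.layers[len(model.model.layers) // 2]  # arbitrary mid layer
handle = layer.register_forward_hook(hook)

with torch.no_grad():
    for prompt in prompts:
        model(**tok(prompt, return_tensors="pt"))

handle.remove()
print(torch.cat(activations).shape)  # [num_prompts, hidden_size]
```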

Per-Tensor Details:

  • merge_stats.csv

  • global_state.json
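
These artifacts can be inspected locally; the snippet below is a generic sketch and assumes nothing about their schema beyond the file formats.

```python
# Sketch: peek at the released per-tensor merge artifacts.
import json
import pandas as pd

stats = pd.read_csv("merge_stats.csv")  # per-tensor merge statistics
print(stats.columns.tolist())
print(stats.head())

with open("global_state.json") as f:
    state = json.load(f)                # global merge state
print(list(state) if isinstance(state, dict) else state)
```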
