Qwen3.5-YOYO
Building on recent research, we implemented cross-size model merging (merging models with different parameter counts) and applied the method to the latest Qwen3.5 architecture.
Merge method: Optimal Transport Merge
Precision (dtype): bfloat16
Context length: 262,144 tokens
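The sketch below illustrates, under stated assumptions, how an optimal-transport merge can combine layers of different widths. It soft-matches the output neurons of one linear layer from model A to those of model B using an entropic OT plan computed with Sinkhorn iterations. It then averages A's aligned weights with B's. Several choices are assumptions rather than the method used for these models:
- squared distance between weight rows as the cost,
- uniform mass per neuron,
- `reg=0.05`,
- the function names `sinkhorn` and `ot_merge_layer`.

```python
import numpy as np


def sinkhorn(cost, a, b, reg=0.05, n_iters=500):
    """Entropic-regularized OT plan between marginals a (n,) and b (m,)."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale to match row marginals
        v = b / (K.T @ u)            # rescale to match column marginals
    return u[:, None] * K * v[None, :]


def ot_merge_layer(w_a, w_b, alpha=0.5, reg=0.05):
    """Merge layer weights w_a (n_a, d) and w_b (n_b, d); rows are output neurons.

    The result has model B's width n_b, so n_a may differ from n_b.
    Assumes the input dimension d already matches (in a full network the
    previous layer's plan would first be applied to the input side).
    """
    n_a, n_b = w_a.shape[0], w_b.shape[0]
    # Cost: squared Euclidean distance between incoming weight vectors.
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)   # rescale to [0, 1] for numerical stability
    a = np.full(n_a, 1.0 / n_a)            # uniform mass per neuron (assumption)
    b = np.full(n_b, 1.0 / n_b)
    plan = sinkhorn(cost, a, b, reg)
    # Barycentric projection: each B neuron receives a weighted mix of A neurons.
    w_a_aligned = (plan.T / plan.sum(axis=0)[:, None]) @ w_a
    return alpha * w_a_aligned + (1.0 - alpha) * w_b, plan


rng = np.random.default_rng(0)
merged, plan = ot_merge_layer(rng.normal(size=(6, 4)), rng.normal(size=(4, 4)))
print(merged.shape)  # (4, 4)
```

The merged layer keeps model B's width. This is what lets a 6-neuron layer be folded into a 4-neuron one. A smaller `reg` makes the plan closer to a hard one-to-one matching.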
Sampling parameters (two alternative settings):
Setting 1: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
Setting 2: temperature=1.0, top_p=1.0, top_k=40, min_p=0.0, presence_penalty=2.0, repetition_penalty=1.0
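The following is a minimal sketch of how the sampling parameters listed above typically act on a single decoding step. It assumes two penalty conventions:
- an OpenAI-style presence penalty: a flat subtraction from the logit of every token already generated;
- a CTRL-style repetition penalty: positive logits are divided by the penalty and negative ones multiplied.

Inference engines differ in the order of these steps and in whether prompt tokens count as "seen," so the order shown is an assumption. `sample_next_token`, `SETTING_1`, and `SETTING_2` are hypothetical names.

```python
import numpy as np

# Hypothetical names for the two parameter settings listed above.
SETTING_1 = dict(temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
                 presence_penalty=1.5, repetition_penalty=1.0)
SETTING_2 = dict(temperature=1.0, top_p=1.0, top_k=40, min_p=0.0,
                 presence_penalty=2.0, repetition_penalty=1.0)


def sample_next_token(logits, generated_ids, temperature=1.0, top_p=1.0, top_k=0,
                      min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0,
                      rng=None):
    """Pick the next token id from raw logits (illustrative order of operations)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.array(logits, dtype=np.float64)  # copy; caller's data is untouched
    seen = np.unique(np.asarray(generated_ids, dtype=np.int64))

    # Repetition penalty (CTRL-style); a value of 1.0 disables it.
    vals = logits[seen]
    logits[seen] = np.where(vals > 0, vals / repetition_penalty, vals * repetition_penalty)
    # Presence penalty: same flat subtraction for every token already generated.
    logits[seen] -= presence_penalty

    if temperature <= 0:
        return int(np.argmax(logits))  # treat zero temperature as greedy decoding

    # Temperature scaling, then a numerically stable softmax.
    logits /= temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-k: keep the k most likely tokens (0 disables; ties at the cutoff survive).
    if 0 < top_k < probs.size:
        cutoff = np.sort(probs)[-top_k]
        probs[probs < cutoff] = 0.0
        probs /= probs.sum()

    # Top-p: keep the smallest high-probability prefix whose mass reaches top_p.
    if top_p < 1.0:
        order = np.argsort(-probs)
        mass_before = np.cumsum(probs[order]) - probs[order]
        drop = mass_before >= top_p
        drop[0] = False  # always keep the single most likely token
        probs[order[drop]] = 0.0

    # Min-p: drop tokens below min_p times the top probability (0.0 disables it).
    probs[probs < min_p * probs.max()] = 0.0

    probs /= probs.sum()
    return int(rng.choice(probs.size, p=probs))


rng = np.random.default_rng(0)
print(sample_next_token([2.0, 1.9, 0.0], generated_ids=[0], rng=rng, **SETTING_1))
```

Both settings use repetition_penalty=1.0, so under these assumptions only the presence penalty discourages repeated tokens. Setting 2 leaves top-p disabled (top_p=1.0) and relies on top_k=40 alone.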
We extract activations using 2,000 prompts from the test dataset.
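An optimal-transport solver needs a neuron-to-neuron cost matrix before it can match neurons across models. One common way to build it from extracted activations: for each pair of neurons (one per model), take the mean squared difference of their activations over the same prompts. The following is a minimal sketch under several assumptions:
- each prompt is reduced to one activation vector per layer, for example by averaging over tokens;
- both models were run on identical prompts;
- `activation_cost` is a hypothetical helper;
- the random toy data stands in for real activations.

The cost actually used for these models is not stated.

```python
import numpy as np


def activation_cost(acts_a, acts_b):
    """Pairwise neuron cost between two models' activations at one layer.

    acts_a: (num_prompts, n_a) activations of model A
    acts_b: (num_prompts, n_b) activations of model B on the same prompts
    Returns an (n_a, n_b) matrix of mean squared differences.
    """
    acts_a = np.asarray(acts_a, dtype=np.float64)
    acts_b = np.asarray(acts_b, dtype=np.float64)
    if acts_a.shape[0] != acts_b.shape[0]:
        raise ValueError("both models must be run on the same prompts")
    num_prompts = acts_a.shape[0]
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, evaluated for all neuron pairs at once
    sq_a = (acts_a ** 2).sum(axis=0)[:, None]   # (n_a, 1)
    sq_b = (acts_b ** 2).sum(axis=0)[None, :]   # (1, n_b)
    cross = acts_a.T @ acts_b                   # (n_a, n_b)
    cost = (sq_a + sq_b - 2.0 * cross) / num_prompts
    return np.maximum(cost, 0.0)  # clip tiny negatives caused by rounding


# Toy stand-in for 2,000 prompts: model B has one extra neuron and a shuffled order.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(2000, 4))
acts_b = np.hstack([acts_a[:, [2, 0, 3, 1]], rng.normal(size=(2000, 1))])
cost = activation_cost(acts_a, acts_b)
print(cost.argmin(axis=1))  # best-matching B neuron for each A neuron → [1 3 0 2]
```

The expansion via dot products avoids building a (prompts × n_a × n_b) tensor, which matters when layers have thousands of neurons. Because the cost depends only on activations, it works even when the two models have different widths.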