Qwen3.5-16B-Dense (Structural Expansion)

⚠️ Status: Architectural Expansion / Untrained Weights

This model is a dense structural expansion of Qwen3.5-9B-Instruct, scaled to 16.1B parameters. It was created using structural mitosis to increase the model's depth and breadth, providing a larger "cognitive canvas" for downstream training.

Note: As this is a structural expansion, the new parameters have not yet been calibrated. The model will require a "Repair" SFT or Continued Pre-Training (CPT) phase to utilize its expanded capacity. Vision capability is preserved; however, the model will need additional training to use it effectively.


🛠 Architecture & Expansion Strategy

The expansion targets the inherent limitations of sub-10B models (specifically knowledge density and reasoning stability) by providing additional parameter headroom.

  • Base Model: Qwen/Qwen3.5-9B-Instruct
  • Expanded Parameters: ~16.1B
  • Methodology: Structural Mitosis
    • Layer Duplication: High-importance layers were identified and duplicated to extend transformer depth.
    • SVD Noise Injection: Singular Value Decomposition (SVD) based noise was injected into the duplicated weights to break symmetry and induce divergence, preventing "identity-mapping" stalls during early training (a minimal sketch follows this list).
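
The duplicate-and-perturb step is straightforward to express in PyTorch. Below is a minimal sketch, assuming a Hugging Face-style decoder whose blocks live at `model.model.layers`; the function names, the 1% noise scale, and the choice to perturb only 2-D projection matrices are illustrative assumptions, not the exact recipe used for this checkpoint.

```python
import copy

import torch


@torch.no_grad()
def svd_noise(weight: torch.Tensor, scale: float = 0.01) -> torch.Tensor:
    """Perturb a 2-D weight along its singular directions to break symmetry."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    # Small relative jitter on the singular values keeps the clone
    # functionally close to its parent while ensuring its gradients
    # diverge from the original layer's during training.
    S = S * (1.0 + scale * torch.randn_like(S))
    return (U @ torch.diag(S) @ Vh).to(weight.dtype)


@torch.no_grad()
def mitose_layer(model, idx: int, scale: float = 0.01) -> None:
    """Duplicate decoder layer `idx` and insert the noised copy after it."""
    layers = model.model.layers          # nn.ModuleList of decoder blocks
    clone = copy.deepcopy(layers[idx])
    for _, param in clone.named_parameters():
        if param.ndim == 2:              # attention / MLP projection matrices
            param.copy_(svd_noise(param, scale))
    layers.insert(idx + 1, clone)
    model.config.num_hidden_layers = len(layers)
```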

Why 16B?

The 16B parameter count represents a strategic "sweet spot" for modern hardware. It offers a significant increase in total neurons and associative memory over the 9B base, while remaining highly performant on consumer-grade GPUs (e.g., RTX 3090/4090/5080) when quantized to 4-bit or 8-bit.
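
For a rough sense of fit: at 4-bit, ~16.1B weights occupy about 16.1e9 × 0.5 bytes ≈ 8 GB, leaving headroom for the KV cache on a 24 GB card. Below is a minimal loading sketch using the standard transformers + bitsandbytes stack; it assumes the text-only causal-LM loading path (the preserved vision tower may require a different Auto class).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the BF16 checkpoint
)

model = AutoModelForCausalLM.from_pretrained(
    "blascotobasco/Qwen3.5-16B-Test",
    quantization_config=bnb_config,
    device_map="auto",  # spreads weights across available GPU memory
)
tokenizer = AutoTokenizer.from_pretrained("blascotobasco/Qwen3.5-16B-Test")
```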


🚀 Call to Action: Training & Calibration

This model is released as a base for researchers and hobbyists interested in high-density dense models. The additional ~7.1B parameters are currently "blank" capacity ready to be filled with specialized knowledge.

Recommended Training Path:

  1. Symmetry Breaking (Calibration): A short run on ~2-5B tokens of high-diversity data using a very low learning rate (1e-6) to allow the SVD-diverged layers to settle into functional roles (a configuration sketch follows this list).
  2. Knowledge Distillation: Fine-tuning on high-reasoning datasets (such as Opus-distilled sets) to take advantage of the expanded FFN capacity.
  3. DPO/PPO: Final alignment to stabilize the increased depth and prevent coherence drift during long-context generation.
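
As a concrete starting point for step 1, here is a minimal calibration sketch using the stock transformers Trainer. It assumes an already-loaded `model` and a pre-tokenized high-diversity `dataset`; only the 1e-6 learning rate comes from the recommendation above, while the batch size, warmup, and step count are illustrative.

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="qwen3.5-16b-calibration",
    learning_rate=1e-6,                  # very low: let new layers settle
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=100,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    bf16=True,
    max_steps=15_000,                    # ~2B tokens at 4k context with this batch
    save_steps=1_000,
    logging_steps=50,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```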

📜 Credits
