Qwen2.5-Coder-1.5B-RYS-4-9

Qwen2.5-Coder-1.5B-Instruct with layers 4-8 duplicated. Early-stack reasoning circuitry โ€” suppressed by code-specialization training โ€” runs twice on every forward pass.

28 base layers โ†’ 33 after duplication. No training, no merging, no weight changes.

Reasoning 23.53% โ†’ 52.94% (+29.41). EQ 70.90 โ†’ 77.66 (+6.76). Math 0.506 โ†’ 0.4687 (โˆ’3.73). Peak reasoning ฮ” across the full 54-config sweep is +35.29% at (6,11) block-5; (4,9) is the best-combined pick.

Results

Metric Baseline RYS (4,9) Delta
Math 0.506 0.4687 โˆ’3.73
EQ 70.90 77.66 +6.76
Reasoning 23.53% 52.94% +29.41

The code-specialist reasoning unlock. Qwen2.5-Coder-1.5B was trained heavily on code, leaving general reasoning under-developed (baseline 23.53%, vs the non-coder Qwen2.5-1.5B-Instruct's 76.47% at the same parameter count). RYS unlocks the dormant general-reasoning circuit. Specialization training trade-off mechanism: where the non-coder sibling has the circuit reliable and RYS adds only +11.76%, the code-specialist's circuit is suppressed enough that the same duplication operation unlocks +29.41% lift (+35.29% at peak).

50 of 54 swept configs boost reasoning >5%. Pick this when you want a tiny coder that can also reason. The published v1 sibling Qwen2.5-1.5B-RYS-4-7-GGUF is the general "balanced daily driver"; this is the code-specialized counterpart.

Usage

llama-server -m Qwen2.5-Coder-1.5B-RYS-4-9-Q4_K_M.gguf -ngl 99

Full sweep data

54 configurations tested. (4,9) block-5 is the best-combined pick. Full per-config sweep + cross-architecture analysis: v2 dataset.

Part of the RYS Sovereign Collection v2.


Where this sits in the Sovereign Collection

v1 โ€” Qwen2.5 cross-scale + Qwen3-32B headline crossover. 5 model repos: 0.5B EQ specialist / 1.5B daily driver / 7B math specialist (+ AWQ) / Qwen3-32B "Big Boy." This Coder variant is the code-specialized sibling of the v1 1.5B daily driver.

v2 โ€” cross-architecture corpus. 21 model variants across 10 architecture families. Inverse correlation (r = โˆ’0.726): weak baselines lift more, in their weakest dimension. Three mechanisms: under-training scale (Llama-3.2-1B), MoE routing inefficiency (Granite-3.1-1B-A400M), specialization training trade-off (this model). Plus EQ-amplifier extreme (TinyLlama-1.1B) and a first published negative result (SmolLM2-1.7B). 13 deployable RYS-applied weight repos covering every non-zero-lift variant.

Credit

John Broadway, with collaboration from Claude (Opus 4.6 in April 2026 sweep generation and build pipeline; Opus 4.7 in May 2026 cross-architecture analysis and publication). Original RYS method by David Ng on Qwen2-72B; sweep + probe toolkit by alainnothere.

Downloads last month
190
GGUF
Model size
2B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for john-broadway/Qwen2.5-Coder-1.5B-RYS-4-9-GGUF

Quantized
(93)
this model

Collection including john-broadway/Qwen2.5-Coder-1.5B-RYS-4-9-GGUF