Qwopus3.6-27B-v2-RYS-Balanced

This is a RYS/relayer export of Jackrong/Qwopus3.6-27B-v2.

The checkpoint physically duplicates selected decoder layers in the Hugging Face safetensors weights. It does not require the RYS runtime wrapper.

Variant

  • Target repo: hampsonw/Qwopus3.6-27B-v2-RYS-Balanced
  • Objective: Balanced Math+EQ
  • Repeated block: 15,30 (--blocks "15,30")
  • Repeated source layers: 15–29 inclusive, zero-indexed
  • Source text layers: 64
  • Target text layers: 79
  • Extra repeated layers: 15

Probe result

From the BF16 Transformers scan over math_16 + eq_16:

Metric Score Delta vs baseline
Math 0.818086 +0.104831
EQ 0.741500 +0.010000

Rank note: balanced rank 1.

Baseline scores from the scan:

  • Math: 0.713255
  • EQ: 0.731500

Local diagnostic evaluation results

These are local diagnostic subsample results, not full benchmark claims. They were run through an OpenAI-compatible vLLM endpoint with Qwen3.6 thinking-mode sampling:

  • temperature=1.0, top_p=0.95, top_k=20, min_p=0.0
  • thinking_token_budget=32768
  • max_tokens=81920
  • MATH prompt: Please reason step by step, and put your final answer within \boxed{}.

Comparisons below use:

  • Qwen baseline: cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4
  • Qwopus baseline: Jackrong/Qwopus3.6-27B-v2
  • This model: hampsonw/Qwopus3.6-27B-v2-RYS-Balanced

MATH-500 hardest-50 diagnostic

All three models solved the 50-item slice after manual normalization of mathematically equivalent answer formats. Raw scorer misses were formatting-equivalence issues such as 5.5 vs \frac{11}{2} and 2\sqrt{3}+1 vs 1+2\sqrt{3}.

Model Raw scorer Audited Completion tokens Reasoning tokens Total tokens
Qwen3.6-27B baseline 47/50 50/50 633,524 599,909 639,736
Qwopus3.6-27B-v2 46/50 50/50 304,733 270,520 310,945
Qwopus3.6-27B-v2-RYS-Balanced 45/50 50/50 301,726 267,582 307,938

On this MATH diagnostic slice, RYS-Balanced matched Qwopus v2 audited accuracy while using roughly the same number of tokens, and both Qwopus variants used about half the completion tokens of the Qwen baseline.

LiveCodeBench release_v6 hardest-49 diagnostic

This diagnostic uses the deterministic hardest-50 slice from public livecodebench/code_generation_lite release_v6, with item 3344 dropped as a whole question because it contains a malformed private testcase outside the stated List[List[int]] function contract.

Model Correct Accuracy Completion tokens Reasoning tokens Total tokens
Qwen3.6-27B baseline 43/49 87.8% 1,640,947 1,440,065 1,672,212
Qwopus3.6-27B-v2 45/49 91.8% 1,400,939 1,347,042 1,432,204
Qwopus3.6-27B-v2-RYS-Balanced 32/49 65.3% 983,974 806,168 1,015,803

RYS-Balanced was substantially cheaper in tokens on this LCB diagnostic, but accuracy was much worse than both baselines. It had two terminal failed items (2952, 3233) and many more non-passing solutions.

Interpretation

This checkpoint is currently best viewed as a math-focused experimental RYS export. The small MATH diagnostic looked strong and token-efficient, but the LiveCodeBench diagnostic regressed significantly. Use with caution for coding-heavy workloads.

Provenance

  • Source model: Jackrong/Qwopus3.6-27B-v2
  • Export method: RYS physical layer duplication
  • Export manifest: rys_export_manifest.json
  • Probe result bundle: qwopus36-bf16-results_20260523T213305Z.tar.zst

Notes

This model has not yet been validated on the larger math_120 + eq_140 probe set or full public benchmark suite. The local diagnostics above suggest the RYS-Balanced export may preserve math performance while reducing tokens, but it regressed on the coding diagnostic. Treat it as experimental.

Downloads last month
9
Safetensors
Model size
33B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hampsonw/Qwopus3.6-27B-v2-RYS-Balanced

Finetuned
(7)
this model