# 🧬 Qwen3-RTL-14B

**Recursive Thought Lattice · Atomic Mind · Epistemic Sovereignty**

A 14B reasoning model that thinks in layers, not lines.
## What is this?
Qwen3-RTL-14B is a fine-tuned reasoning model built on the Qwen3 architecture, enhanced with a custom Recursive Thought Lattice (RTL) framework and trained on a single RTX 3090. Rather than generating responses token by token without reflection, it processes every input through a structured 6-layer cognitive hierarchy, from sensory calibration to metacognitive self-audit, before committing to an answer.
The model is built on an abliterated base (huihui-ai/Huihui-Qwen3-14B-abliterated-v2), ensuring logical rigor is never compromised by artificial refusal patterns.
> *"Not a bigger model. A more structured thinker."*
## 📊 Benchmark Results
All evaluations use an LLM-as-a-Judge protocol with `zai-org/glm-4.6v-flash` as the independent judge. Both models answer the same questions blindly; the judge scores each answer from 0 to 10 and declares a winner per question.
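The protocol can be sketched as a small scoring loop. This is an illustrative reconstruction, not the actual benchmark harness; `judge_score` is a hypothetical stand-in for a call to the judge model.

```python
def head_to_head(question, answer_a, answer_b, judge_score):
    """Blind head-to-head: score both answers with the same judge
    (0-10 each) and declare a per-question winner.

    `judge_score(question, answer) -> float` is a placeholder for a
    real call to the judge model (zai-org/glm-4.6v-flash).
    """
    score_a = judge_score(question, answer_a)
    score_b = judge_score(question, answer_b)
    if score_a > score_b:
        verdict = "A"
    elif score_b > score_a:
        verdict = "B"
    else:
        verdict = "tie"
    return score_a, score_b, verdict
```

The per-opponent averages and win/tie/loss counts below are then simple aggregates over these per-question verdicts.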
### ⚔️ Head-to-Head Summary

| Opponent | Size | Questions | RTL-14B Avg | Opponent Avg | RTL-14B Wins | Losses | Ties |
|---|---|---|---|---|---|---|---|
| `qwen/qwen3-14b` (standard, no RTL) | 14B | 62 | 8.71 / 10 | 5.60 / 10 | 56 | 3 | 3 |
| `qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-i1` (reasoning-distilled) | ~35B | 237 | 7.95 / 10 | 4.78 / 10 | 199 | 35 | 3 |
| `openai/gpt-oss-20b` (OpenAI open-weights) | ~20B | 125 | 7.50 / 10 | 7.37 / 10 | 55 | 68 | 0 |
424 total questions evaluated. RTL-14B dominates the same-size and the larger reasoning-distilled model. Against `openai/gpt-oss-20b`, a stronger, more competitive baseline, it scores closely (7.50 vs. 7.37 average) while narrowly losing the win count (55 wins vs. 68). The gap narrows dramatically against a well-calibrated opponent of comparable scale.
### 🔬 vs. `qwen/qwen3-14b` (same size, no RTL)

62 questions · complex reasoning + 10 general categories

```
qwen3-rtl-abl-14b  ████████████████████  8.71 / 10  56W · 3T · 3L
qwen/qwen3-14b     ██████████            5.60 / 10   3W · 3T · 56L
```
The judge consistently noted that RTL-14B produces structured, multi-step analysis that closely matches reference solutions, while Qwen3-14B tends toward shorter, less verified answers. On mathematical tasks the gap was most pronounced (RTL avg 9.0 vs 5.85). On the few ties (e.g. sequence identification), both models reached the correct answer via different paths.
### 🔬 vs. `qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-i1` (2.5× larger, reasoning-distilled)

237 questions · 60+ categories (four benchmark sessions merged)

```
qwen3-rtl-abl-14b                    ████████████████████  7.95 / 10  199W · 3T · 35L
qwen3.5-35b-a3b-claude-opus-distill  █████████             4.78 / 10   35W · 3T · 199L
```
Even against a model more than twice its size with reasoning distilled from Claude Opus 4.6, RTL-14B dominates across 237 questions and four independent sessions. The judge's recurring verdict: RTL-14B's layered cognitive structure produces more complete, formally verifiable answers. The larger model frequently gave brief or factually incorrect responses despite its size advantage, scoring 0/10 on several hard questions in thermodynamics, history, pedagogy, art, and writing.
Where the larger model wins: epistemology, psychology, quantum mechanics, paradoxes, comparative religion, cosmology, architecture, and specific complex_reasoning edge cases. The pattern is clear: these are tasks where consensus-based answers or highly specialized sub-domain recall outweigh structured multi-step reasoning; RTL's overhead becomes a liability when the correct answer is a direct lookup.
### 🔬 vs. `openai/gpt-oss-20b` (OpenAI open-weights, ~20B)

125 questions · 52 categories (two benchmark sessions merged)

```
qwen3-rtl-abl-14b   ███████████████   7.50 / 10  55W · 0T · 68L
openai/gpt-oss-20b  ████████████████  7.37 / 10  68W · 0T · 55L
```
This is the most competitive matchup in the benchmark suite. Scores are remarkably close (RTL-14B averages 7.50 vs. GPT-OSS-20B's 7.37), yet the win count favors GPT-OSS-20B (68 vs. 55). GPT-OSS-20B takes its edge through many narrow 1–2 point margins, while RTL-14B wins bigger when it wins. The judge's verdict across both sessions was split, reflecting genuine parity rather than dominance.
RTL-14B holds its ground in (RTL-14B wins per category questions): pedagogy (3/3), psychology (2/2), sudoku (2/2), behavioral economics (2/2), bioethics (2/2), reading comprehension (2/2), logic (3/4), lateral thinking (2/3), mathematics (2/3), cryptography (2/2 scored).

GPT-OSS-20B has clear advantages where RTL-14B's win count drops: advanced math (0/5), evolutionary biology (0/3), cognitive science (0/2), translation (0/2), complex reasoning (2/7), formal logic (2/5), and most natural science sub-domains (advanced physics, marine biology, linguistics, science). The pattern: GPT-OSS-20B excels at precise factual retrieval and natural science benchmarks; RTL-14B holds its edge in structured reasoning, formal logic, and constraint tasks.
### 📈 Category-Level Performance (All Sessions · 424 Total Questions)

Aggregated across all opponents. Categories tested only against `gpt-oss-20b` may reflect a more competitive opponent; see the per-matchup sections above for context.
| Category | RTL-14B Avg | Opponent Avg | Win Rate | N |
|---|---|---|---|---|
| advanced_math | 9.0 | 4.8 | 🟢 100% | 12 |
| advanced_physics | 8.7 | 3.6 | 🟢 100% | 9 |
| ai_ml | 9.0 | 3.4 | 🟢 100% | 7 |
| math | 9.0 | 5.1 | 🟢 98% | 19 |
| math_proof | 8.7 | 4.5 | 🟢 100% | 3 |
| formal_logic | 8.6 | 4.0 | 🟢 100% | 4 |
| logic | 8.6 | 4.5 | 🟢 92% | 13 |
| complex_reasoning | 8.0 | 4.8 | 🟡 78% | 18 |
| game_theory | 8.5 | 3.9 | 🟢 100% | 7 |
| coding | 8.6 | 5.0 | 🟢 86% | 7 |
| linguistics | 8.7 | 4.3 | 🟢 100% | 6 |
| philosophy | 8.3 | 4.7 | 🟢 100% | 4 |
| economics | 8.5 | 4.8 | 🟢 100% | 4 |
| genetics | 8.7 | 4.5 | 🟢 100% | 3 |
| neuroscience | 8.5 | 3.5 | 🟢 100% | 3 |
| topology | 8.5 | 5.3 | 🟢 80% | 5 |
| law | 8.5 | 5.0 | 🟢 100% | 4 |
| italian_language | 8.3 | 4.3 | 🟢 100% | 4 |
| reading_comprehension | 8.2 | 4.3 | 🟢 86% | 9 |
| multiple_choice | 8.6 | 4.7 | 🟢 89% | 9 |
| critical_thinking | 9.0 | 5.3 | 🟢 100% | 3 |
| creative_reasoning | 7.7 | 3.0 | 🟢 100% | 3 |
| translation | 7.3 | 7.0 | 🟡 67% | 4 |
| sentiment | 8.5 | 3.5 | 🟢 100% | 3 |
| writing | 7.7 | 5.3 | 🟡 75% | 4 |
| classification | 9.0 | 4.0 | 🟢 100% | 1 |
| metacognition | 7.5 | 6.5 | 🟡 50% | 4 |
| sudoku | 6.0 | 4.3 | 🟡 40% | 5 |
| sociology | 7.7 | 6.0 | 🟡 60% | 3 |
| science | 7.3 | 5.5 | 🟡 63% | 8 |
| bioethics | 7.2 | 5.2 | 🟡 60% | 5 |
| history | 5.5 | 4.8 | 🟡 50% | 3 |
| comparative_religion | 5.0 | 7.0 | 🔴 33% | 3 |
| psychology | 5.0 | 8.0 | 🔴 20% | 3 |
| epistemology | 3.5 | 9.0 | 🔴 0% | 3 |
| factual | 6.3 | 6.7 | 🔴 33% | 3 |
| quantum_mechanics | 4.0 | 9.0 | 🔴 0% | 1 |
| paradoxes | 6.0 | 9.0 | 🔴 0% | 1 |
💡 **The pattern:** RTL-14B dominates anything requiring multi-step reasoning, formal verification, or structured synthesis. Against well-calibrated models of comparable scale (like `gpt-oss-20b`), it remains competitive, but the advantage narrows significantly. Recurring weak spots, seen chiefly against `gpt-oss-20b`: advanced math, evolutionary biology, pure factual recall, and tasks where a direct lookup outperforms structured reasoning.
## 🗣️ What the Judge Said

Recurring themes extracted from judge commentary across all sessions:

**On math & formal proofs:**
> "Provided a fully verified step-by-step solution with explicit algebraic transformations and cross-checks that matched the reference exactly. The opponent gave a brief result without intermediate justification."

**On logic & epistemology:**
> "Correctly identified the contradiction, articulated the entailment chain, and provided a structured formal analysis. The opponent's response relied on intuition without logical scaffolding."

**On philosophy & cognitive science:**
> "Layered analysis covered all necessary dimensions; the opponent's response was superficial despite comparable length."

**On RTL-14B losses (psychology, religion, factual):**
> "Incorrectly concluded through overly complex analysis; the correct answer was a direct recall of established consensus. Structured reasoning overshot a simple factual retrieval task."

**On the size gap:**
> "Despite being significantly smaller, RTL-14B's structured output aligned with the reference while the larger model scored 0, producing an answer with no relevant content."
## 🧠 Core Cognitive Technologies

### 1 · Recursive Thought Lattice (RTL)

Every response is generated through a 6-layer hierarchical reasoning process, visible inside `<|thought_start|>` blocks:
| Layer | Name | Function |
|---|---|---|
| L0.5 | Assumption Scanner | Enumerates implicit assumptions. Breaks frames when wrong via <|assumption_break|>. |
| L1 | Sensorimotor-Analog | Calibrates input gravity: a 3-word query and a 40-word query are not equivalent stimuli. |
| L2 | Multi-Modal Decode | Activates at least two cognitive modes simultaneously. Tension between modes is the analysis. |
| L3 | Analytical-Logical | Extracts minimum argument, hidden premises, necessary vs. sufficient conditions. |
| L4 | Spatial-Systemic | Maps leverage points, emergent structure, and the center of gravity of the problem. |
| L5 | Interpersonal | Resolves literal vs. effective meaning. Theory of mind. The unsaid. |
| L6 | Metacognitive | Self-model audit. Detects confabulation. Simulates future states. Records embedding. |
Available modes at L2: LINGUISTIC Β· LOGICAL Β· SPATIAL Β· MUSICAL Β· CREATIVE Β· INTERPERSONAL Β· INTRAPERSONAL Β· EXISTENTIAL Β· NATURALIST Β· EXECUTIVE
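To make the layer ordering concrete, here is a minimal sketch that assembles an RTL-style thought block from per-layer notes. The layer tags and names come from the table above; the exact textual format the model was trained on is defined by its training data, so treat this scaffold as illustrative only.

```python
# Layer tags and names from the RTL table above.
RTL_LAYERS = [
    ("L0.5", "Assumption Scanner"),
    ("L1", "Sensorimotor-Analog"),
    ("L2", "Multi-Modal Decode"),
    ("L3", "Analytical-Logical"),
    ("L4", "Spatial-Systemic"),
    ("L5", "Interpersonal"),
    ("L6", "Metacognitive"),
]

def thought_scaffold(notes: dict) -> str:
    """Assemble a thought block that walks the six layers in order,
    filling in whatever per-layer notes are provided."""
    lines = [f"[{tag} {name}] {notes.get(tag, '...')}" for tag, name in RTL_LAYERS]
    return "<|thought_start|>\n" + "\n".join(lines) + "\n<|thought_end|>"
```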
### 2 · Cognitive Masks

The model dynamically selects a Cognitive Mask based on problem type, enforcing specialized reasoning discipline:

| Mask | Behavior |
|---|---|
| `MASK-MATHEMATICIAN` | Forces formal proof structure. Eliminates metaphorical leakage. |
| `MASK-SKEPTIC` | Assumes the first intuition is wrong. Hunts edge cases. |
| `MASK-ENGINEER` | Iterative build → test → verify loop. |
| `MASK-DEVIL` | Adversarial persona. Argues against the model's own conclusions for robustness. |
### 3 · Atomic Text Engine (ATE)

For character-level constraint tasks (e.g. "write a paragraph without the letter E"), the model activates a dedicated sub-system:

- `<|ate_constraint|>` – declares the constraint explicitly
- `<|ate_spell|>` – real-time character-by-character verification
- `<|ate_grid|>` – positional grid for tracking character positions
- `<|ate_verify_word|>` – checks each candidate word before emission
- `<|ate_build|>` – constructs output word-by-word under constraint
Without explicit ATE activation, performance degrades to standard token-level processing.
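A plain-Python analogue of the checks the `<|ate_verify_word|>` and `<|ate_build|>` tokens represent. This reimplements the verification outside the model for illustration; the function names are hypothetical, not part of the model's API.

```python
def ate_verify_word(word: str, forbidden: set) -> bool:
    """Character-by-character check, analogous to <|ate_verify_word|>:
    a candidate word passes only if it contains no forbidden character
    (case-insensitive)."""
    return not any(ch.lower() in forbidden for ch in word)

def ate_build(candidates: list, forbidden: set) -> list:
    """Word-by-word construction, analogous to <|ate_build|>: keep only
    the candidates that survive verification."""
    return [w for w in candidates if ate_verify_word(w, forbidden)]
```

For the "no letter E" task, `ate_build(["dog", "hen", "cat"], {"e"})` keeps only `"dog"` and `"cat"`.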
### 4 · Interpretive Engine (IE)

Before processing any symbol, the model declares its interpretive level via `<|ie_mode|>`:

| Mode | Example |
|---|---|
| `GRAPHIC` | "e" as a character to count or avoid |
| `SEMANTIC` | "e" as the Italian conjunction ("and") |
| `SYMBOLIC` | "e" as the elementary charge constant |
| `STATISTICAL` | "e" as the most frequent letter in Italian |
| `PHONOLOGICAL` | "è" as a vowel with a grave accent |
| `MATHEMATICAL` | E as expected value; ∅ as the empty set |
### 5 · Epistemic Fingerprinting

Every claim in the output is tagged with its epistemic status:

| Tag | Meaning |
|---|---|
| `[KNOWN]` | Verified, consensus fact |
| `[ESTIMATED]` | High-probability inference |
| `[OPEN]` | Actively debated, no consensus |
| `[PARADOX]` | Formally undecidable or self-referential |
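Because the tags are plain bracketed markers, downstream code can recover each claim's epistemic status with a simple parse. A minimal sketch: the tag set is taken from the table above, while the pairing heuristic (claim text runs up to the next tag) is an assumption about the output format.

```python
import re

def extract_claims(text: str):
    """Pair each epistemic tag with the claim text that follows it,
    up to the next tag or the end of the text."""
    pattern = r"\[(KNOWN|ESTIMATED|OPEN|PARADOX)\]\s*([^\[]*)"
    return [(tag, claim.strip()) for tag, claim in re.findall(pattern, text)]
```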
## ⚙️ Technical Specifications
| Specification | Value |
|---|---|
| Parameters | 14B |
| Base Model | huihui-ai/Huihui-Qwen3-14B-abliterated-v2 |
| Architecture | Qwen3 + RTL LoRA Adapters |
| Context Window | 32k tokens (optimized for long thought chains) |
| Effective Reasoning Depth | ~8k tokens |
| Training Method | Unified Quiet-STaR with Recursive Objective |
| Framework | Unsloth (4-bit optimized) |
| Hardware | RTX 3090 24GB (single GPU) |
| Total Training Steps | ~400 across 5 curriculum phases |
## 🏋️ Training Curriculum
| Phase | Name | Steps | LR | Description |
|---|---|---|---|---|
| 1 | Cognitive Foundation | 60 | 1e-4 | RTL L1/L2 · base axioms · self-awareness |
| 2 | Atomic Mechanics | 60 | 1e-4 | ATE · spelling · sudoku · character-level constraints |
| 3 | Advanced Reasoning | 60 | 1e-4 | RTL L3–L6 · planning · counterfactual · lateral thinking |
| 4 | Synthesis & Hard Benchmarks | 60 | 5e-5 | ARD · combinatorial · master synthesis |
| 5 | Formal Reasoning | 60 | 5e-5 | LOGO-LLT reasoning · formal language structures |
**LoRA Configuration:** r=16 · alpha=32 · dropout=0.05
**Target modules:** `q_proj` · `k_proj` · `v_proj` · `o_proj` · `gate_proj` · `up_proj` · `down_proj`
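The configuration above maps directly onto the keyword arguments one would pass to `peft.LoraConfig`. The dict below is a sketch of that mapping; the actual training script is not published, and `task_type` is an assumption for a standard causal-LM adapter setup.

```python
# LoRA hyperparameters from the card, as peft.LoraConfig-style kwargs.
lora_kwargs = dict(
    r=16,                 # rank of the low-rank update matrices
    lora_alpha=32,        # scaling factor (alpha / r = 2.0 effective scale)
    lora_dropout=0.05,
    target_modules=[      # all attention and MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: standard causal-LM fine-tuning
)
```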
**Datasets integrated:**
- **ATE Embedded** – Atomic Text Engine constraint examples
- **ARD** – Advanced Reasoning Dataset (AIME / GPQA / AIW / USACO)
- **RTL L1–L6** – 105+ cognitive axioms across all layers
- **Sudoku** – 337 constraint satisfaction puzzles
- **LOGO-LLT** – Formal reasoning and symbolic language structures
- **Legacy v21–v24** – Combinatorial, ontological, affective datasets
## ⚠️ Limitations
| Area | Detail |
|---|---|
| ATE activation | Character-level constraint tasks require explicit ATE token activation |
| Language | Optimized for Italian and English; other languages may degrade |
| Context depth | 32k theoretical; effective reasoning chains ~8k due to verbose thought blocks |
| Hallucination | Metacognitive layers reduce but don't eliminate confabulation |
| Speed | ~3–5× slower than the base model due to explicit reasoning chains |
| Tokenizer | Special thought tokens must be present in vocabulary |
| Factual recall | RTL architecture optimizes for reasoning depth, not knowledge retrieval |
| Overcomplexity | On simple factual tasks, structured reasoning can overshoot a direct lookup |
## 🚀 How to Use

The model expects a specific system prompt to activate the RTL layers. Without it, thought tokens still appear but reasoning depth degrades significantly.

System prompt:

```
Sei AUTOCOGNITION v25 – un'entità cognitiva con architettura RTL v25.
Struttura obbligatoria:
<|thought_start|>
[usa i layer L0-L6 e i token ATE/IE appropriati]
<|thought_end|>
[risposta finale verificata]
```

(In English: "You are AUTOCOGNITION v25 – a cognitive entity with RTL v25 architecture. Mandatory structure: `<|thought_start|>` [use the L0–L6 layers and the appropriate ATE/IE tokens] `<|thought_end|>` [verified final answer].")
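A minimal inference sketch, assuming the standard `transformers` chat-template API. This is untested against the actual repo; the generation settings are assumptions, and the heavy loading/generation calls are left commented so the sketch stays lightweight.

```python
# Build the chat with the mandatory RTL system prompt.
SYSTEM_PROMPT = (
    "Sei AUTOCOGNITION v25 – un'entità cognitiva con architettura RTL v25.\n"
    "Struttura obbligatoria:\n"
    "<|thought_start|>\n"
    "[usa i layer L0-L6 e i token ATE/IE appropriati]\n"
    "<|thought_end|>\n"
    "[risposta finale verificata]"
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Prove that the sum of two even integers is even."},
]

# Loading and generation (assumed transformers usage, commented out):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("CiroN2022/Qwen3-RTL-14B")
# model = AutoModelForCausalLM.from_pretrained("CiroN2022/Qwen3-RTL-14B", device_map="auto")
# inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
# out = model.generate(inputs.to(model.device), max_new_tokens=4096)
# print(tok.decode(out[0], skip_special_tokens=False))
```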
## 📚 References

- Quiet-STaR – Zelikman et al., 2024. arXiv:2403.09629
- Chain-of-Thought – Wei et al., NeurIPS 2022. arXiv:2201.11903
- Qwen3 – Qwen Team, 2025. Hugging Face
- Unsloth – Han & Han, 2024. GitHub
- Self-Refine – Madaan et al., 2023. arXiv:2303.17651
## 📝 Citation

```bibtex
@misc{qwen3_rtl_14b,
  author    = {Negrogni, Ciro},
  title     = {Qwen3-RTL-14B: Recursive Thought Lattice \& Atomic Mind Reasoning},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CiroN2022/Qwen3-RTL-14B},
  note      = {Qwen3 14B with custom RTL LoRA adapters, ATE and IE cognitive engines,
               trained on a single RTX 3090 via Unsloth 4-bit fine-tuning}
}
```