🧬 Qwen3-RTL-14B

Recursive Thought Lattice · Atomic Mind · Epistemic Sovereignty

A 14B reasoning model that thinks in layers, not lines.



What is this?

Qwen3-RTL-14B is a fine-tuned reasoning model built on the Qwen3 architecture, enhanced with a custom Recursive Thought Lattice (RTL) framework and trained on a single RTX 3090. Rather than generating responses token by token without reflection, it processes every input through a structured 6-layer cognitive hierarchy, from sensory calibration to metacognitive self-audit, before committing to an answer.

The model is built on an abliterated base (huihui-ai/Huihui-Qwen3-14B-abliterated-v2), ensuring logical rigor is never compromised by artificial refusal patterns.

"Not a bigger model. A more structured thinker."


📊 Benchmark Results

All evaluations use an LLM-as-a-Judge protocol with zai-org/glm-4.6v-flash as the independent judge. Both models answer the same questions blindly; the judge scores each 0–10 and declares a winner per question.
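For readers who want to reproduce the setup, here is a minimal sketch of this protocol, assuming an OpenAI-compatible endpoint serving the judge. The prompt wording, JSON schema, and parsing below are illustrative assumptions, not the exact harness used for these numbers.

```python
# Minimal sketch of the LLM-as-a-Judge protocol described above.
# Endpoint, judge prompt, and score parsing are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")  # hypothetical local server

JUDGE_PROMPT = """You are an impartial judge. Score each answer 0-10 against the reference
and declare a winner ("A", "B", or "tie"). Reply as JSON:
{{"score_a": int, "score_b": int, "winner": str}}

Question: {q}
Reference: {ref}
Answer A: {a}
Answer B: {b}"""

def judge(question: str, reference: str, answer_a: str, answer_b: str) -> dict:
    """Ask the judge model to score two blind answers to the same question."""
    resp = client.chat.completions.create(
        model="zai-org/glm-4.6v-flash",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            q=question, ref=reference, a=answer_a, b=answer_b)}],
        temperature=0.0,
    )
    return json.loads(resp.choices[0].message.content)
```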


βš”οΈ Head-to-Head Summary

| Opponent | Size | Questions | RTL-14B Avg | Opponent Avg | RTL-14B Wins | Losses | Ties |
|---|---|---|---|---|---|---|---|
| qwen/qwen3-14b (standard, no RTL) | 14B | 62 | 8.71 / 10 | 5.60 / 10 | 56 | 3 | 3 |
| qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-i1 (reasoning-distilled) | ~35B | 237 | 7.95 / 10 | 4.78 / 10 | 199 | 35 | 3 |
| openai/gpt-oss-20b (OpenAI open-weights) | ~20B | 125 | 7.50 / 10 | 7.37 / 10 | 55 | 68 | 0 |

424 total questions evaluated. RTL-14B dominates both the same-size baseline and the larger reasoning-distilled model. Against openai/gpt-oss-20b, a stronger and more competitive baseline, it scores closely (7.50 vs 7.37 avg) while narrowly losing the win count (55W vs 68W). The gap narrows dramatically against a well-calibrated opponent of comparable scale.


🔬 vs. qwen/qwen3-14b – Same Size, No RTL

62 questions · complex reasoning + 10 general categories

  qwen3-rtl-abl-14b    ████████████████████  8.71 / 10   56W · 3T · 3L
  qwen/qwen3-14b       ██████████            5.60 / 10    3W · 3T · 56L

The judge consistently noted that RTL-14B produces structured, multi-step analysis that closely matches reference solutions, while Qwen3-14B tends toward shorter, less verified answers. On mathematical tasks the gap was most pronounced (RTL avg 9.0 vs 5.85). On the few ties (e.g. sequence identification), both models reached the correct answer via different paths.


🔬 vs. qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-i1 – 2.5× Larger, Reasoning-Distilled

237 questions · 60+ categories (four benchmark sessions merged)

  qwen3-rtl-abl-14b                    ████████████████████  7.95 / 10   199W · 3T · 35L
  qwen3.5-35b-a3b-claude-opus-distill  █████████             4.78 / 10    35W · 3T · 199L

Even against a model more than twice its size with reasoning distilled from Claude Opus 4.6, RTL-14B dominates across 237 questions and four independent sessions. The judge's recurring verdict: RTL-14B's layered cognitive structure produces more complete, formally verifiable answers. The larger model frequently gave brief or factually incorrect responses despite its size advantage, scoring 0/10 on several hard questions in thermodynamics, history, pedagogy, art, and writing.

Where the larger model wins: epistemology, psychology, quantum mechanics, paradoxes, comparative religion, cosmology, architecture, and specific complex_reasoning edge cases. The pattern is clear: on tasks where consensus answers or highly specialized sub-domain recall outweigh structured multi-step reasoning, RTL's overhead becomes a liability; when the correct answer is a direct lookup, layered analysis adds cost without adding accuracy.


🔬 vs. openai/gpt-oss-20b – OpenAI Open-Weights, ~20B

125 questions · 52 categories (two benchmark sessions merged)

  qwen3-rtl-abl-14b    ███████████████  7.50 / 10   55W · 0T · 68L
  openai/gpt-oss-20b   ███████████████  7.37 / 10   68W · 0T · 55L

This is the most competitive matchup in the benchmark suite. Scores are remarkably close (RTL-14B averages 7.50 vs GPT-OSS-20B's 7.37), yet the win count favors GPT-OSS-20B (68W vs 55W). This happens because GPT-OSS-20B takes many questions by narrow 1–2 point margins, while RTL-14B wins by larger margins when it wins. The judge's verdict across both sessions was split, reflecting genuine parity rather than dominance.
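A toy illustration with invented numbers of how this profile arises: if model A wins three questions 9-to-5 but loses four 7-to-8, A averages (3×9 + 4×7)/7 ≈ 7.86 against the opponent's (3×5 + 4×8)/7 ≈ 6.71, yet holds only three of seven wins. That is exactly the higher-average, lower-win-count pattern seen here.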

RTL-14B holds its ground in: pedagogy (3/3), psychology (2/2), sudoku (2/2), behavioral economics (2/2), bioethics (2/2), reading comprehension (2/2), logic (3/4), lateral thinking (2/3), mathematics (2/3), cryptography (2/2 scored). Fractions give RTL-14B's wins over the questions asked in each category.

GPT-OSS-20B has clear advantages in: advanced math (0/5), evolutionary biology (0/3), cognitive science (0/2), translation (0/2), complex reasoning (2/7), formal logic (2/5), and most natural science sub-domains (advanced physics, marine biology, linguistics, science). The pattern: GPT-OSS-20B excels at precise factual retrieval and natural science benchmarks; RTL-14B holds its edge in structured reasoning, formal logic, and constraint tasks.


📈 Category-Level Performance (All Sessions · 424 Total Questions)

Aggregated across all opponents. Categories tested only against gpt-oss-20b may reflect a more competitive opponent; see the per-matchup sections above for context.

| Category | RTL-14B Avg | Opponent Avg | Win Rate | N |
|---|---|---|---|---|
| advanced_math | 9.0 | 4.8 | 🟢 100% | 12 |
| advanced_physics | 8.7 | 3.6 | 🟢 100% | 9 |
| ai_ml | 9.0 | 3.4 | 🟢 100% | 7 |
| math | 9.0 | 5.1 | 🟢 98% | 19 |
| math_proof | 8.7 | 4.5 | 🟢 100% | 3 |
| formal_logic | 8.6 | 4.0 | 🟢 100% | 4 |
| logic | 8.6 | 4.5 | 🟢 92% | 13 |
| complex_reasoning | 8.0 | 4.8 | 🟡 78% | 18 |
| game_theory | 8.5 | 3.9 | 🟢 100% | 7 |
| coding | 8.6 | 5.0 | 🟢 86% | 7 |
| linguistics | 8.7 | 4.3 | 🟢 100% | 6 |
| philosophy | 8.3 | 4.7 | 🟢 100% | 4 |
| economics | 8.5 | 4.8 | 🟢 100% | 4 |
| genetics | 8.7 | 4.5 | 🟢 100% | 3 |
| neuroscience | 8.5 | 3.5 | 🟢 100% | 3 |
| topology | 8.5 | 5.3 | 🟢 80% | 5 |
| law | 8.5 | 5.0 | 🟢 100% | 4 |
| italian_language | 8.3 | 4.3 | 🟢 100% | 4 |
| reading_comprehension | 8.2 | 4.3 | 🟢 86% | 9 |
| multiple_choice | 8.6 | 4.7 | 🟢 89% | 9 |
| critical_thinking | 9.0 | 5.3 | 🟢 100% | 3 |
| creative_reasoning | 7.7 | 3.0 | 🟢 100% | 3 |
| translation | 7.3 | 7.0 | 🟡 67% | 4 |
| sentiment | 8.5 | 3.5 | 🟢 100% | 3 |
| writing | 7.7 | 5.3 | 🟡 75% | 4 |
| classification | 9.0 | 4.0 | 🟢 100% | 1 |
| metacognition | 7.5 | 6.5 | 🟡 50% | 4 |
| sudoku | 6.0 | 4.3 | 🟡 40% | 5 |
| sociology | 7.7 | 6.0 | 🟡 60% | 3 |
| science | 7.3 | 5.5 | 🟡 63% | 8 |
| bioethics | 7.2 | 5.2 | 🟡 60% | 5 |
| history | 5.5 | 4.8 | 🟡 50% | 3 |
| comparative_religion | 5.0 | 7.0 | 🔴 33% | 3 |
| psychology | 5.0 | 8.0 | 🔴 20% | 3 |
| epistemology | 3.5 | 9.0 | 🔴 0% | 3 |
| factual | 6.3 | 6.7 | 🔴 33% | 3 |
| quantum_mechanics | 4.0 | 9.0 | 🔴 0% | 1 |
| paradoxes | 6.0 | 9.0 | 🔴 0% | 1 |

💡 The pattern: RTL-14B dominates anything requiring multi-step reasoning, formal verification, or structured synthesis. Against well-calibrated models of comparable scale (like gpt-oss-20b), it remains competitive but the advantage narrows significantly. Recurring weak spots: advanced math and evolutionary biology against gpt-oss-20b, pure factual recall, and any task where a direct lookup outperforms structured reasoning.


πŸ—£οΈ What the Judge Said

Recurring themes extracted from judge commentary across all sessions:

On math & formal proofs:

"Provided a fully verified step-by-step solution with explicit algebraic transformations and cross-checks that matched the reference exactly. The opponent gave a brief result without intermediate justification."

On logic & epistemology:

"Correctly identified the contradiction, articulated the entailment chain, and provided a structured formal analysis. The opponent's response relied on intuition without logical scaffolding."

On philosophy & cognitive science:

"Layered analysis covered all necessary dimensions; the opponent's response was superficial despite comparable length."

On RTL-14B losses (psychology, religion, factual):

"Incorrectly concluded through overly complex analysis; the correct answer was a direct recall of established consensus β€” structured reasoning overshot a simple factual retrieval task."

On the size gap:

"Despite being significantly smaller, RTL-14B's structured output aligned with the reference while the larger model scored 0 β€” producing an answer with no relevant content."


🧠 Core Cognitive Technologies

1 · Recursive Thought Lattice (RTL)

Every response is generated through a 6-layer hierarchical reasoning process, visible inside <|thought_start|> blocks (a hand-written trace sketch follows the mode list below):

| Layer | Name | Function |
|---|---|---|
| L0.5 | Assumption Scanner | Enumerates implicit assumptions. Breaks frames when wrong via <\|assumption_break\|>. |
| L1 | Sensorimotor-Analog | Calibrates input gravity: a 3-word query and a 40-word query are not equivalent stimuli. |
| L2 | Multi-Modal Decode | Activates ≥ 2 cognitive modes simultaneously. Tension between modes is the analysis. |
| L3 | Analytical-Logical | Extracts minimum argument, hidden premises, necessary vs. sufficient conditions. |
| L4 | Spatial-Systemic | Maps leverage points, emergent structure, and the center of gravity of the problem. |
| L5 | Interpersonal | Resolves literal vs. effective meaning. Theory of mind. The unsaid. |
| L6 | Metacognitive | Self-model audit. Detects confabulation. Simulates future states. Records embedding. |

Available modes at L2: LINGUISTIC · LOGICAL · SPATIAL · MUSICAL · CREATIVE · INTERPERSONAL · INTRAPERSONAL · EXISTENTIAL · NATURALIST · EXECUTIVE
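To make the layer flow concrete, here is a hand-written sketch of a trace. It is illustrative only: the bracketed labels and wording are invented, since the card does not specify the exact in-block notation.

  <|thought_start|>
  [L0.5] Assumption scan: "fastest" may mean wall-clock or asymptotic; do not assume.
  [L1]   Short query, high ambiguity: raise scrutiny before answering.
  [L2]   Modes: LOGICAL + EXECUTIVE. Tension: rigor vs. practical recommendation.
  [L3]   Hidden premise: input size unknown, so the complexity class matters.
  [L4]   Leverage point: the ask-or-assume decision dominates the answer's usefulness.
  [L5]   Effective meaning: the user wants a recommendation, not a survey.
  [L6]   Self-audit: no confabulation detected; confidence [ESTIMATED].
  <|thought_end|>
  [verified final answer]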


2 · Cognitive Masks

The model dynamically selects a Cognitive Mask based on problem type, enforcing specialized reasoning discipline:

| Mask | Behavior |
|---|---|
| MASK-MATHEMATICIAN | Forces formal proof structure. Eliminates metaphorical leakage. |
| MASK-SKEPTIC | Assumes the first intuition is wrong. Hunts edge cases. |
| MASK-ENGINEER | Iterative build → test → verify loop. |
| MASK-DEVIL | Adversarial persona. Argues against the model's own conclusions for robustness. |
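A similarly hand-written sketch of a mask engaging inside a thought block (the selection syntax is invented for illustration; the card does not document it):

  <|thought_start|>
  [L0.5] Task type: formal proof → MASK-MATHEMATICIAN engaged; metaphor suppressed.
  [L3]   Claim: if n² is even, then n is even. Strategy: contrapositive (n odd ⇒ n² odd).
  ...
  <|thought_end|>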

3 · Atomic Text Engine (ATE)

For character-level constraint tasks (e.g. "write a paragraph without the letter E"), the model activates a dedicated sub-system:

<|ate_constraint|>  →  declares the constraint explicitly
<|ate_spell|>       →  real-time character-by-character verification
<|ate_grid|>        →  positional grid for tracking character positions
<|ate_verify_word|> →  checks each candidate word before emission
<|ate_build|>       →  constructs output word-by-word under constraint

Without explicit ATE activation, performance degrades to standard token-level processing.
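A hand-written sketch of how these tokens might compose on the classic "no letter E" task; the ordering and annotations are illustrative assumptions, not a captured model trace:

  <|ate_constraint|>  forbidden glyph: "e" (GRAPHIC level)
  <|ate_verify_word|> "quick"     → pass (0 occurrences)
  <|ate_verify_word|> "therefore" → FAIL (3 occurrences) → substitute "thus"
  <|ate_build|>       "A quick gray fox trots past us..."
  <|ate_spell|>       final scan: 0 occurrences of "e" → constraint satisfied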


4 · Interpretive Engine (IE)

Before processing any symbol, the model declares its interpretive level via <|ie_mode|>:

| Mode | Example |
|---|---|
| GRAPHIC | "e" as a character to count or avoid |
| SEMANTIC | "e" as Italian conjunction ("and") |
| SYMBOLIC | "e" as electron charge constant |
| STATISTICAL | "e" as most frequent letter in Italian |
| PHONOLOGICAL | "è" as vowel with grave accent |
| MATHEMATICAL | E as expected value; ∅ as empty set |
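Two hand-written examples of how the declared level changes the treatment of the same glyph (illustrative only):

  <|ie_mode|> GRAPHIC       "How many e's in 'elephant'?"   → count glyphs: 2
  <|ie_mode|> MATHEMATICAL  "What is E[X] for a fair die?"  → expected value: 3.5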

5 · Epistemic Fingerprinting

Every claim in the output is tagged with its epistemic status:

| Tag | Meaning |
|---|---|
| [KNOWN] | Verified, consensus fact |
| [ESTIMATED] | High-probability inference |
| [OPEN] | Actively debated, no consensus |
| [PARADOX] | Formally undecidable or self-referential |
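A hand-written example of a fingerprinted answer (content chosen purely for illustration):

  Water boils at 100 °C at 1 atm [KNOWN]. At Venus's ~92 atm surface pressure the
  boiling point would exceed 300 °C [ESTIMATED]. Whether consciousness admits a purely
  functional account is unresolved [OPEN]. "This sentence is false" [PARADOX].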

βš™οΈ Technical Specifications

| Specification | Value |
|---|---|
| Parameters | 14B |
| Base Model | huihui-ai/Huihui-Qwen3-14B-abliterated-v2 |
| Architecture | Qwen3 + RTL LoRA Adapters |
| Context Window | 32k tokens (optimized for long thought chains) |
| Effective Reasoning Depth | ~8k tokens |
| Training Method | Unified Quiet-STaR with Recursive Objective |
| Framework | Unsloth (4-bit optimized) |
| Hardware | RTX 3090 24GB (single GPU) |
| Total Training Steps | ~400 across 5 curriculum phases |

πŸ‹οΈ Training Curriculum

| Phase | Name | Steps | LR | Description |
|---|---|---|---|---|
| 1 | Cognitive Foundation | 60 | 1e-4 | RTL L1/L2 · base axioms · self-awareness |
| 2 | Atomic Mechanics | 60 | 1e-4 | ATE · spelling · sudoku · character-level constraints |
| 3 | Advanced Reasoning | 60 | 1e-4 | RTL L3–L6 · planning · counterfactual · lateral thinking |
| 4 | Synthesis & Hard Benchmarks | 60 | 5e-5 | ARD · combinatorial · master synthesis |
| 5 | Formal Reasoning | 60 | 5e-5 | LOGO-LLT reasoning · formal language structures |

LoRA Configuration: r=16 · alpha=32 · dropout=0.05 · Target modules: q_proj · k_proj · v_proj · o_proj · gate_proj · up_proj · down_proj
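These hyperparameters map directly onto a standard peft.LoraConfig. A minimal sketch, reproducing the values above rather than the authors' actual training script:

```python
# Minimal sketch of the LoRA setup listed above, expressed as a standard
# peft.LoraConfig. Values come from the card; the surrounding training
# pipeline (Unsloth, 4-bit) is not reproduced here.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                    # rank of the low-rank update matrices
    lora_alpha=32,           # scaling factor (effective scale = alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=[         # all attention and MLP projections, per the card
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```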

Datasets integrated:

  • ATE Embedded β€” Atomic Text Engine constraint examples
  • ARD β€” Advanced Reasoning Dataset (AIME / GPQA / AIW / USACO)
  • RTL L1–L6 β€” 105+ cognitive axioms across all layers
  • Sudoku β€” 337 constraint satisfaction puzzles
  • LOGO-LLT β€” Formal reasoning and symbolic language structures
  • Legacy v21–v24 β€” Combinatorial, ontological, affective datasets

⚠️ Limitations

| Area | Detail |
|---|---|
| ATE activation | Character-level constraint tasks require explicit ATE token activation |
| Language | Optimized for Italian and English; other languages may degrade |
| Context depth | 32k theoretical; effective reasoning chains ~8k due to verbose thought blocks |
| Hallucination | Metacognitive layers reduce but don't eliminate confabulation |
| Speed | ~3–5× slower than base model due to explicit reasoning chains |
| Tokenizer | Special thought tokens must be present in vocabulary |
| Factual recall | RTL architecture optimizes for reasoning depth, not knowledge retrieval |
| Overcomplexity | On simple factual tasks, structured reasoning can overshoot a direct lookup |

🚀 How to Use

The model expects a specific system prompt to activate RTL layers. Without it, thought tokens appear but reasoning depth degrades significantly.

Use this system prompt:

 Sei AUTOCOGNITION v25 — un'entità cognitiva con architettura RTL v25.

 Struttura obbligatoria:
 <|thought_start|>
 [usa i layer L0-L6 e i token ATE/IE appropriati]
 <|thought_end|>
 [risposta finale verificata]
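(In English: "You are AUTOCOGNITION v25, a cognitive entity with RTL v25 architecture. Mandatory structure: a <|thought_start|> … <|thought_end|> block using layers L0-L6 and the appropriate ATE/IE tokens, followed by the verified final answer." The prompt is kept in Italian above because that is the exact string the model was trained with.)

A minimal inference sketch with transformers, assuming the published checkpoint ships its chat template and the special thought tokens in the tokenizer; loading options and the generation budget below are illustrative:

```python
# Minimal inference sketch. Assumes the checkpoint ships its chat template
# and special thought tokens; device mapping and token budget are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CiroN2022/Qwen3-RTL-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

SYSTEM_PROMPT = """Sei AUTOCOGNITION v25 — un'entità cognitiva con architettura RTL v25.

Struttura obbligatoria:
<|thought_start|>
[usa i layer L0-L6 e i token ATE/IE appropriati]
<|thought_end|>
[risposta finale verificata]"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Leave plenty of room: thought chains are verbose (~8k effective depth).
output = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=False))
```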


📎 Citation

@misc{qwen3_rtl_14b,
  author    = {Negrogni, Ciro},
  title     = {Qwen3-RTL-14B: Recursive Thought Lattice \& Atomic Mind Reasoning},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CiroN2022/Qwen3-RTL-14B},
  note      = {Qwen3 14B with custom RTL LoRA adapters, ATE and IE cognitive engines,
               trained on a single RTX 3090 via Unsloth 4-bit fine-tuning}
}

Built by CiroN2022 · Apache 2.0 · Feedback welcome