Dualmind-Qwen-1.7B-Thinking

Claude Opus 4.6 Reasoning Traces → 1.7B via DualMind SFT

Convergent Intelligence LLC: Research Division


What This Is

A 1.7B model trained on 2.5M+ tokens of Claude Opus 4.6 reasoning traces using the DualMind SFT methodology. The training data comes from Opus-4.6-Reasoning-3000x-filtered — a curated dataset of extended reasoning chains from Anthropic's most capable model, with refusals removed.

This is the Opus variant of the DualMind family. Where the base DualMind model was trained on LogicInference data, this model absorbs the reasoning patterns of Claude Opus 4.6 — longer chains, more nuanced self-correction, and richer deliberative structure. The Opus teacher produces qualitatively different reasoning than synthetic logic datasets: it backtracks, hedges, reconsiders, and synthesizes in ways that reflect genuine uncertainty navigation rather than pattern completion.

The base model is Disctil-Qwen3-1.7B — already DISC-refined and sitting in the middle of the DistilQwen distillation chain — giving it a strong structural foundation before the Opus reasoning signal is applied.

Architecture

| Parameter | Value |
|---|---|
| Architecture | Qwen3ForCausalLM |
| Parameters | ~2.03B (1.7B effective) |
| Hidden Size | 2048 |
| Layers | 28 |
| Attention Heads | 16 (Q) / 8 (KV), GQA |
| Intermediate Size | 6144 |
| Head Dimension | 128 |
| Context Length | 40,960 tokens (max position) |
| Vocabulary | 151,936 |
| Precision | BF16 |
| Activation | SiLU |
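As a quick sanity check, the attention geometry in the table is internally consistent. A minimal sketch using the table's values directly (copied from the card, not read from the model's `config.json`):

```python
# Sanity-check the attention geometry from the architecture table.
# All values are copied from the model card, not read from config.json.
hidden_size = 2048
num_q_heads = 16
num_kv_heads = 8
head_dim = 128

# The query projection spans the full hidden size.
assert num_q_heads * head_dim == hidden_size

# Grouped-query attention: each KV head is shared by a group of query heads.
group_size = num_q_heads // num_kv_heads
print(f"{group_size} query heads per KV head")  # 2

# The KV cache is half the size it would be with full multi-head attention.
kv_fraction = num_kv_heads / num_q_heads
print(f"KV cache fraction vs. MHA: {kv_fraction}")  # 0.5
```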

Training

| Parameter | Value |
|---|---|
| Base Model | Disctil-Qwen3-1.7B |
| Dataset | Opus-4.6-Reasoning-3000x-filtered |
| Additional Tokens | ~2.5M |
| Max Sequence Length | 4,096 |
| Total Steps | 512 |
| Epochs | ~7.4 |
| Method | SFT (TRL SFTTrainer) |
| Precision | BF16 |
| Hardware | NVIDIA H100 |
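A back-of-envelope check of how those numbers fit together. All inputs are the reported figures; the implied effective batch size is an inference from them, not a value stated on the card:

```python
# Rough arithmetic tying the training table together.
# Inputs are the card's reported figures; the derived batch size is an
# estimate only (it assumes fully packed 4,096-token sequences).
dataset_tokens = 2_500_000   # ~2.5M tokens
total_steps = 512
epochs = 7.4
max_seq_len = 4096

tokens_seen = dataset_tokens * epochs          # total tokens processed
tokens_per_step = tokens_seen / total_steps    # ~36k tokens per optimizer step
implied_batch = tokens_per_step / max_seq_len  # ~9 packed sequences per step

print(f"tokens seen: {tokens_seen:,.0f}")
print(f"tokens/step: {tokens_per_step:,.0f}")
print(f"implied effective batch (packed): {implied_batch:.1f}")
```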

Training Dynamics

| Metric | Start | End |
|---|---|---|
| Training Loss | 1.744 | 1.455 |
| Eval Loss | n/a | 1.406 |
| Token Accuracy | 61.0% | 67.8% |

The loss curve shows clean convergence across 7.4 epochs with no signs of overfitting — eval loss (1.406) remains below final training loss (1.455). The 6.8 percentage point gain in token accuracy reflects genuine absorption of the Opus reasoning structure, not memorization.
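For intuition, cross-entropy loss (in nats per token) converts directly to perplexity. A quick conversion of the reported figures:

```python
import math

# Convert the reported losses (nats/token) to perplexity for intuition.
start_train_loss, final_train_loss, eval_loss = 1.744, 1.455, 1.406

for name, loss in [("start train", start_train_loss),
                   ("final train", final_train_loss),
                   ("eval", eval_loss)]:
    print(f"{name}: loss {loss:.3f} -> perplexity {math.exp(loss):.2f}")
```

So training moved the model from roughly 5.7 to 4.3 perplexity on its own data, with eval perplexity around 4.1.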

Why Opus Traces

The Opus-4.6-Reasoning dataset captures something that synthetic datasets don't: the way a frontier model navigates genuine uncertainty. Opus doesn't just solve problems — it reasons about its own confidence, backtracks when a line of thought weakens, and synthesizes across multiple attempted approaches. When you distill from these traces, the student doesn't just learn to produce correct answers. It learns the shape of deliberation.

This is the DualMind thesis in practice: the cognitive loop (explore → examine → respond) isn't an architectural trick. It's a training signal. When the teacher naturally exhibits multi-phase reasoning, the student absorbs that structure through standard SFT.
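The card notes the source dataset had refusals removed before training. One plausible way such filtering could look; the marker strings and the `response` field name are illustrative assumptions, not the actual pipeline:

```python
# Illustrative refusal filter, sketched from the card's note that refusals
# were removed from the dataset. Markers and the "response" field name are
# assumptions for illustration, not the real curation code.
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot assist",
    "i'm not able to provide",
)

def is_refusal(example: dict) -> bool:
    text = example.get("response", "").lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

traces = [
    {"response": "Let me think about the black hole information paradox..."},
    {"response": "I cannot assist with that request."},
]
kept = [ex for ex in traces if not is_refusal(ex)]
print(len(kept))  # 1
```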

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking"
)

messages = [
    {"role": "user", "content": "What happens to information that falls into a black hole? Walk me through the paradox."}
]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    repetition_penalty=1.15
)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Generation Tips

  • Temperature 0.6–0.8 — the Opus reasoning traces have natural variance in them. Don't flatten it with low temperature.
  • Repetition penalty 1.1–1.2 — prevents looping during extended reasoning chains.
  • Max tokens 1024–2048 — trained at 4096 max seq, so it can go long. The Opus signal rewards longer generation windows.
  • The model may produce multi-phase reasoning naturally (exploring, then reconsidering, then concluding). This is the intended behavior — the DualMind cognitive loop emerging from the training signal.
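Qwen3-based thinking models typically emit their deliberation inside `<think>...</think>` tags. If this variant follows that convention (an assumption; the card does not state the tag format), the reasoning phase can be separated from the final answer after decoding:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> block from the final answer.

    Assumes the Qwen3 convention of wrapping deliberation in <think> tags;
    returns ("", text) unchanged if no such block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

sample = "<think>First, consider Hawking radiation...</think>The paradox arises because..."
thinking, answer = split_thinking(sample)
print(answer)  # "The paradox arises because..."
```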

Model Lineage

Qwen3-1.7B (base)
  → DiStil-Qwen3-1.7B-uncensored (uncensored SFT)
    → Disctil-Qwen3-1.7B (DISC refinement)
      → Dualmind-Qwen-1.7B-Thinking ← you are here
        (trained on Opus 4.6 reasoning traces, ~2.5M tokens, DualMind SFT)

DualMind Family Comparison

| Model | Training Signal | Character |
|---|---|---|
| DualMind | LogicInference | Structured logical deduction |
| Dualmind-Qwen-1.7B-Thinking | Opus 4.6 Reasoning | Extended deliberation, self-correction |
| TopologicalQwen 30B-Thinking | TKD | Topology-aware physics CoT |

Same methodology, different teachers, different capabilities. The LogicInference variant is more mechanical. The Opus variant is more deliberative. TopologicalQwen is the full TKD pipeline with BV decomposition. They're complementary — different facets of the same cognitive architecture.

DualMind Collection

| Model | Description |
|---|---|
| DualMind | LogicInference-trained. Explore→Examine→Response cognitive loop. |
| DualMind_Methodology | Paper: Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) |
| Dualmind-Qwen-1.7B-Thinking | This model. Opus 4.6 reasoning variant. |
| DualMind-GGUF | LogicInference variant quantized for edge deployment. |

Full collection: DualMind on HuggingFace

License

Apache 2.0

Mathematical Foundations: Discrepancy Calculus (DISC)

This model's training pipeline is grounded in Discrepancy Calculus — a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: "On the Formal Analysis of Discrepancy Calculus" (Colca, 2026; Convergent Intelligence LLC: Research Division).

The Core Operator:

$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$

For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
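A numerical sketch of the smooth case, approximating the limit with a small fixed $\varepsilon$ and a Riemann sum. This is purely illustrative of the claim $Df(x) = |f'(x)|$ for smooth $f$:

```python
def discrepancy(f, x: float, eps: float = 1e-4, n: int = 1000) -> float:
    """Approximate Df(x) = (1/eps) * integral_x^{x+eps} |f(t)-f(x)|/|t-x| dt.

    Uses a right-endpoint Riemann sum; for smooth f this should approach
    |f'(x)| as eps -> 0.
    """
    total = 0.0
    for k in range(1, n + 1):
        t = x + eps * k / n
        total += abs(f(t) - f(x)) / abs(t - x)
    # The 1/eps prefactor and the dt = eps/n step width cancel,
    # leaving a plain mean of the integrand over [x, x+eps].
    return total / n

# For f(t) = t^2 at x = 1, |f'(1)| = 2.
print(discrepancy(lambda t: t * t, 1.0))  # ~2.0
```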

The Mesh Fundamental Identity — every BV function decomposes as:

$$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$

Standard knowledge distillation captures only term 1. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.
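A toy discrete analogue of the decomposition: splitting successive differences into a "smooth" part and a "jump" part by thresholding. The real decomposition is measure-theoretic, discrete samples carry no Cantor part, and the threshold here is an arbitrary illustration rather than anything from the TKD pipeline:

```python
def decompose(values: list[float], jump_threshold: float = 0.5) -> tuple[float, float]:
    """Split f(b) - f(a) of sampled data into smooth drift and jumps.

    Toy analogue of the mesh identity: differences larger than the
    (arbitrary) threshold count as jumps, the rest as smooth drift.
    Discrete samples have no Cantor part, so two terms suffice here.
    """
    smooth = jumps = 0.0
    for a, b in zip(values, values[1:]):
        d = b - a
        if abs(d) > jump_threshold:
            jumps += d
        else:
            smooth += d
    return smooth, jumps

samples = [0.0, 0.1, 0.2, 1.2, 1.3]
smooth, jumps = decompose(samples)
# smooth + jumps reconstructs f(b) - f(a) exactly
print(smooth, jumps)
```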

Citation

@misc{colca2026dualmind,
  title={Three Teachers to Dual Cognition: From Knowledge Distillation to Emergent Reasoning},
  author={Colca, Roy},
  year={2026},
  doi={10.57967/hf/8184},
  publisher={Convergent Intelligence LLC: Research Division}
}

Convergent Intelligence LLC: Research Division — 49 models, 22,598+ downloads across the portfolio. Full portfolio | DualMind Collection | DistilQwen Collection
