---
license: apache-2.0
language:
- en
tags:
- dualmind
- knowledge-distillation
- topology-aware
- self-critique
- opus
- convergent-intelligence
- qwen3
- convergentintel
- edge
- distillation
base_model:
- reaperdoesntknow/DualMind
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- zai-org/LongWriter-6k
model_name: DualMinded-Qwen3-1.7B
pipeline_tag: text-generation
---
# DualMinded-Qwen3-1.7B
A 1.7B-parameter dual-cognition model trained on Opus 4.6 reasoning traces. The model implements a three-phase cognitive loop (explore, examine, respond): it reasons freely, critiques its own reasoning, then synthesizes a clean answer.
Convergent Intelligence LLC: Research Division
## Architecture
```
<explore>  - unconstrained reasoning, derivation, speculation
</explore>
<examine>  - adversarial self-critique, error detection, refinement
</examine>
<response> - clean synthesis from the internal dialogue
</response>
```
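The three phases can be recovered from a finished generation with a few lines of parsing; a minimal sketch (the `split_phases` helper is hypothetical, not part of the release):

```python
import re

def split_phases(text: str) -> dict:
    """Extract the explore/examine/response spans from a DualMind-style output.

    Returns an empty string for any phase the model did not emit.
    """
    phases = {}
    for tag in ("explore", "examine", "response"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        phases[tag] = m.group(1).strip() if m else ""
    return phases

sample = (
    "<explore>try induction</explore>"
    "<examine>base case missing</examine>"
    "<response>Proof: ...</response>"
)
print(split_phases(sample)["examine"])  # base case missing
```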
This is the multi-model collision array collapsed into a single architecture. The dialectical structure that produces novel insights from architectural diversity is recreated through role-conditioned generation on shared weights. No extra parameters, no routing: the same weights, different cognitive modes.
## Training Pipeline
DualMinded-Qwen3-1.7B is the product of a four-stage pipeline:
**Stage 1 (Multi-Teacher Distillation):** Qwen3-30B-A3B in three variants (Instruct, Thinking, Coder) distilled into Qwen3-1.7B via proof-weighted KD with 2.25× loss amplification on reasoning tokens.
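The proof-weighted objective can be illustrated in miniature; this is a plain-Python sketch of per-token KL divergence with a 2.25× weight on reasoning tokens (the function and its signature are illustrative assumptions, not the released training code):

```python
import math

def weighted_kd_loss(teacher_probs, student_probs, reasoning_mask, amp=2.25):
    """Weighted-average KL(teacher || student) over tokens.

    teacher_probs / student_probs: per-token probability distributions.
    reasoning_mask: 1 for reasoning tokens (amplified), 0 otherwise.
    """
    total, weight_sum = 0.0, 0.0
    for p, q, m in zip(teacher_probs, student_probs, reasoning_mask):
        kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
        w = amp if m else 1.0
        total += w * kl
        weight_sum += w
    return total / weight_sum
```

With identical teacher and student distributions the loss is zero; amplifying a mismatched token raises the weighted average relative to amplifying a matched one.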
**Stage 2 (DISC Refinement, Disctil-Qwen3-1.7B):** the student refined through Discrepancy Calculus, detecting and preserving structural boundaries in the teacher's distribution.
**Stage 3 (Topological Knowledge Distillation, TKD):** continuous-stream distillation with topology-guided windowing from Qwen3-30B-A3B-Thinking. Bounded variation decomposition of the teacher's output: smooth + jumps + drift. Jump positions amplified at 3σ, windows cut at low-discrepancy boundaries, 4-phase curriculum ordering (easy → hard).
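A toy sketch of the windowing mechanics, assuming a per-position discrepancy stream is already available: flag positions above mean + 3σ as jumps, then cut windows so boundaries land on low-discrepancy positions (the function name, window size, and search radius are illustrative; only the 3σ threshold comes from the description above):

```python
import statistics

def find_jumps_and_windows(discrepancy, max_window=8):
    """Return (jump positions above mean + 3*sigma,
    contiguous windows cut at low-discrepancy boundaries)."""
    mu = statistics.mean(discrepancy)
    sigma = statistics.pstdev(discrepancy)
    jumps = [i for i, d in enumerate(discrepancy) if d > mu + 3 * sigma]
    windows, start = [], 0
    while start < len(discrepancy):
        end = min(start + max_window, len(discrepancy))
        if end < len(discrepancy):
            # slide the cut to the lowest-discrepancy position near the end
            lo = max(start + 1, end - 3)
            end = min(range(lo, end + 1), key=lambda i: discrepancy[i])
        windows.append((start, end))
        start = end
    return jumps, windows
```

On a stream with one spike, the spike is flagged as a jump and no window boundary lands on it.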
**Stage 4 (DualMind SFT on Opus 4.6):** SFT using Opus-4.6-Reasoning-3000x-filtered. The thinking column maps directly to `<explore>`; no heuristic sentence splitting needed. The solution column is split into `<examine>` + `<response>`.
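The column mapping can be sketched as follows; the `thinking`/`solution` keys come from the description above, while splitting `solution` at its first blank line is purely an illustrative assumption (the actual examine/response split heuristic is not specified here):

```python
def build_sft_example(row: dict) -> str:
    """Map an Opus reasoning row into the three-phase training format.

    Assumes `row` has `thinking` and `solution` keys; the blank-line
    split of `solution` into examine/response is illustrative only.
    """
    explore = row["thinking"].strip()
    parts = row["solution"].strip().split("\n\n", 1)
    examine = parts[0]
    response = parts[1] if len(parts) > 1 else parts[0]
    return (
        f"<explore>\n{explore}\n</explore>\n"
        f"<examine>\n{examine}\n</examine>\n"
        f"<response>\n{response}\n</response>"
    )
```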
## Training Configuration
| Parameter | Value |
|---|---|
| Base checkpoint | TKD checkpoint-512 |
| Dataset | Opus-4.6-Reasoning-3000x-filtered (50%) |
| Max seq length | 2048 |
| Batch size | 2 × 8 accumulation = 16 effective |
| Learning rate | 5e-6 (cosine) |
| Warmup | 32 steps |
| Max steps | 1024 |
| Precision | BF16 |
| Hardware | NVIDIA H100 |
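As a sketch, the table maps onto conventional fine-tuning arguments (key names follow common trainer conventions and are not taken from a published script):

```python
# Illustrative config mirroring the table above; key names are assumptions.
sft_config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 5e-6,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 32,
    "max_steps": 1024,
    "max_seq_length": 2048,
    "bf16": True,
}

effective_batch = (
    sft_config["per_device_train_batch_size"]
    * sft_config["gradient_accumulation_steps"]
)
print(effective_batch)  # 16
```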
## DualMind vs DualMinded
| Aspect | DualMind | DualMinded |
|---|---|---|
| SFT Data | LogicInference_OA | Opus-4.6-Reasoning |
| Explore Source | Heuristic CoT split | Direct Opus thinking column |
| Strength | Formal logic, structured proofs | Extended reasoning, creative derivation |
| Base Checkpoint | TKD final | TKD checkpoint-512 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/DualMinded-Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DualMinded-Qwen3-1.7B")

prompt = "##USER:\nProve the mean value theorem.\n\n<explore>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```
## Ghost Imprinting
Sequential distillation from multiple teachers (Instruct → Thinking → Coder → Opus) leaves residual fields in weight space. These residuals produce capabilities absent from any individual teacher: the singular-continuous component of the bounded variation decomposition applied to the parameter tensor. Models in the DualMind family exhibit emergent behaviors (e.g., literary content from physics-only training data) attributable to these ghost imprints.
## GGUF
Quantized versions available at DualMinded-Qwen3-1.7B-GGUF: F16, Q8_0, Q5_K_M, Q4_K_M.
Ollama: `ollama run reaperdoesntrun/DualMinded-1.7B`
## Related
- DualMind: LogicInference-trained variant
- DualMind_Methodolgy: paper, DOI 10.57967/hf/8184
- Structure Over Scale: Paper 1, CPU training methodology
- DualMind Collection
- DistilQwen Collection
## Mathematical Foundations: Discrepancy Calculus (DISC)
This model's training pipeline is grounded in Discrepancy Calculus β a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: "On the Formal Analysis of Discrepancy Calculus" (Colca, 2026; Convergent Intelligence LLC: Research Division).
The Core Operator:
For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
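As a concrete instance of the localization claim (a standard example from BV theory, not taken from the paper), the Heaviside step has all of its variation concentrated at a single point:

```latex
% Heaviside step: f(x) = 0 for x < 0, 1 for x >= 0.  f is BV on R.
% Its distributional derivative is a Dirac mass at the origin,
%     Df = \delta_0 ,
% so D localizes the irregularity to the null set {0}, while the
% integral structure is preserved:
%     \int_{\mathbb{R}} Df = f(+\infty) - f(-\infty) = 1 .
```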
The Mesh Fundamental Identity: every BV function decomposes as

$$f = f_{\mathrm{smooth}} + f_{\mathrm{jump}} + f_{\mathrm{drift}},$$

the absolutely continuous part, the jump part, and the singular-continuous drift, respectively.
Standard knowledge distillation captures only the smooth term. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.
## Citation
```bibtex
@misc{colca2026dualmind,
  title={From Three Teachers to Dual Cognition: Topology-Aware Multi-Teacher Distillation and Role-Conditioned Self-Critique at 1.7B Scale},
  author={Colca, Roy S.},
  year={2026},
  publisher={HuggingFace},
  url={https://doi.org/10.57967/hf/8184}
}
```
Convergent Intelligence LLC: Research Division · Apache 2.0
## Convergent Intelligence Portfolio
Part of the DualMind Series by Convergent Intelligence LLC: Research Division
### DualMind Family
| Model | Format | Description |
|---|---|---|
| DualMind | BF16 | LogicInference-trained. ExploreβExamineβResponse loop. |
| DualMinded-Qwen3-1.7B | BF16 | Opus 4.6 reasoning traces. Higher quality splits. |
| Dualmind-Qwen-1.7B-Thinking | BF16 | Thinking-teacher variant with extended deliberation. |
| DualMind-GGUF | GGUF | Quantized LogicInference variant. CPU/6GB GPU. |
| DualMinded-Qwen3-1.7B-GGUF | GGUF | Quantized Opus variant. Ollama ready. |
### Papers
| Paper | DOI |
|---|---|
| Structure Over Scale | 10.57967/hf/8165 |
| Three Teachers to Dual Cognition | 10.57967/hf/8184 |
| Discrepancy Calculus | 10.57967/hf/8194 |
Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division