---
license: apache-2.0
language:
  - en
tags:
  - dualmind
  - knowledge-distillation
  - topology-aware
  - self-critique
  - opus
  - convergent-intelligence
  - qwen3
  - convergentintel
  - edge
  - distillation
base_model:
  - reaperdoesntknow/DualMind
datasets:
  - nohurry/Opus-4.6-Reasoning-3000x-filtered
  - zai-org/LongWriter-6k
model_name: DualMinded-Qwen3-1.7B
pipeline_tag: text-generation
---

DualMinded-Qwen3-1.7B

A 1.7B-parameter dual-cognition model trained on Opus 4.6 reasoning traces. The model implements a three-phase cognitive loop — explore, examine, respond — where it reasons freely, critiques its own reasoning, then synthesizes a clean answer.

Convergent Intelligence LLC: Research Division

Architecture

```
<explore>  — unconstrained reasoning, derivation, speculation
</explore>

<examine>  — adversarial self-critique, error detection, refinement
</examine>

<response> — clean synthesis from the internal dialogue
</response>
```

This is the multi-model collision array collapsed into a single architecture. The dialectical structure that produces novel insights from architectural diversity is recreated through role-conditioned generation on shared weights. No extra parameters, no routing — same weights, different cognitive modes.
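
The three-phase transcripts the model is trained to emit look like this (an illustrative skeleton with placeholder content, not model output; only the tag names come from this card):

```python
# Illustrative skeleton of a DualMind three-phase transcript.
# Tag names are from the model card; the content is placeholder text.
TRANSCRIPT = """<explore>
Free-form derivation: try an approach, speculate, follow dead ends.
</explore>

<examine>
Adversarial pass: check each step above, flag errors, refine.
</examine>

<response>
Clean final answer synthesized from the internal dialogue.
</response>"""

# All three phases run on the same weights; only the opening tag
# conditions which cognitive mode the model is in.
for tag in ("explore", "examine", "response"):
    assert f"<{tag}>" in TRANSCRIPT and f"</{tag}>" in TRANSCRIPT
```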

Training Pipeline

DualMinded-Qwen3-1.7B is the product of a four-stage pipeline:

Stage 1 — Multi-Teacher Distillation: Qwen3-30B-A3B in three variants (Instruct, Thinking, Coder) distilled into Qwen3-1.7B via proof-weighted KD with 2.25× loss amplification on reasoning tokens.
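
Proof-weighted KD can be sketched as a per-token KL divergence between teacher and student distributions, with the loss upweighted on reasoning tokens. The 2.25× factor is from this card; the function name, shapes, and reduction are illustrative assumptions, not the training code:

```python
import numpy as np

def proof_weighted_kd_loss(student_logits, teacher_logits, reasoning_mask,
                           amplification=2.25):
    """Per-token KL(teacher || student), reasoning tokens upweighted.

    Shapes: logits (seq, vocab); reasoning_mask (seq,) bool, True on
    reasoning-trace tokens. The 2.25x amplification is from the card;
    everything else here is an illustrative choice.
    """
    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    log_p_t = log_softmax(teacher_logits)
    log_p_s = log_softmax(student_logits)
    # Token-level KL divergence, summed over the vocabulary axis.
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    w = np.where(reasoning_mask, amplification, 1.0)
    return (w * kl).sum() / w.sum()
```

When student and teacher logits match, the loss is zero; amplified tokens simply contribute 2.25× more gradient signal where they disagree.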

Stage 2 — DISC Refinement (Disctil-Qwen3-1.7B): the student refined through Discrepancy Calculus, detecting and preserving structural boundaries in the teacher's distribution.

Stage 3 — Topological Knowledge Distillation (TKD): Continuous-stream distillation with topology-guided windowing from Qwen3-30B-A3B-Thinking. Bounded variation decomposition of the teacher's output: smooth + jumps + drift. Jump positions amplified at 3σ, windows cut at low-discrepancy boundaries, 4-phase curriculum ordering (easy → hard).
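
The stage-3 mechanics can be sketched on a 1-D stream: flag increments beyond 3σ as jumps, then cut training windows so no window crosses a jump. The 3σ threshold and boundary-respecting windowing are from this card; the function names and window length are illustrative:

```python
import numpy as np

def detect_jumps(stream, k=3.0):
    """Flag positions whose first difference exceeds k sigma (card: 3 sigma)."""
    diffs = np.abs(np.diff(stream))
    sigma = diffs.std()
    return np.flatnonzero(diffs > k * sigma) + 1

def cut_windows(stream, jumps, max_len=64):
    """Cut windows at low-discrepancy boundaries, i.e. never across a jump."""
    boundaries = [0, *jumps.tolist(), len(stream)]
    windows = []
    for a, b in zip(boundaries, boundaries[1:]):
        for start in range(a, b, max_len):
            windows.append((start, min(start + max_len, b)))
    return windows

# Toy stream: a smooth ramp with one jump at index 100.
x = np.linspace(0.0, 1.0, 200)
x[100:] += 5.0
jumps = detect_jumps(x)
windows = cut_windows(x, jumps)
```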

Stage 4 — DualMind SFT on Opus 4.6: SFT using Opus-4.6-Reasoning-3000x-filtered. The thinking column maps directly to `<explore>` — no heuristic sentence splitting needed. The solution column is split into `<examine>` + `<response>`.
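
The stage-4 mapping can be sketched as follows. Per the card, the thinking column goes straight into `<explore>`; how the solution column is divided between `<examine>` and `<response>` is not specified here, so the final-paragraph split below is an illustrative assumption:

```python
def to_dualmind_sample(thinking: str, solution: str) -> str:
    """Map one Opus row to the three-tag training format.

    thinking -> <explore> directly (per the card). Splitting solution at
    its final paragraph is an illustrative assumption, not the card's
    documented method.
    """
    parts = solution.rsplit("\n\n", 1)
    examine, response = (parts[0], parts[1]) if len(parts) == 2 else ("", solution)
    return (
        f"<explore>\n{thinking}\n</explore>\n\n"
        f"<examine>\n{examine}\n</examine>\n\n"
        f"<response>\n{response}\n</response>"
    )

sample = to_dualmind_sample(
    "Try induction on n.",
    "Check base case.\n\nTherefore P(n) holds.",
)
```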

Training Configuration

| Parameter | Value |
|---|---|
| Base checkpoint | TKD checkpoint-512 |
| Dataset | Opus-4.6-Reasoning-3000x-filtered (50%) |
| Max seq length | 2048 |
| Batch size | 2 × 8 accum = 16 effective |
| Learning rate | 5e-6 (cosine) |
| Warmup | 32 steps |
| Max steps | 1024 |
| Precision | BF16 |
| Hardware | NVIDIA H100 |

DualMind vs DualMinded

| | DualMind | DualMinded |
|---|---|---|
| SFT Data | LogicInference_OA | Opus-4.6-Reasoning |
| Explore Source | Heuristic CoT split | Direct Opus thinking column |
| Strength | Formal logic, structured proofs | Extended reasoning, creative derivation |
| Base Checkpoint | TKD final | TKD checkpoint-512 |

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/DualMinded-Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DualMinded-Qwen3-1.7B")

# Opening the <explore> tag puts the model into its free-reasoning phase.
prompt = "##USER:\nProve the mean value theorem.\n\n<explore>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
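
Since the model wraps its output in the three phase tags, the clean answer can be pulled out with a small parser. The tag names are from this card; the helper itself is an illustrative sketch:

```python
import re

def extract_phase(text: str, tag: str) -> str:
    """Return the content of <tag>...</tag>, or '' if the tag is absent or unclosed."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return m.group(1).strip() if m else ""

# Toy transcript standing in for a decoded generation.
demo = (
    "<explore>rough work</explore>\n"
    "<examine>check it</examine>\n"
    "<response>final</response>"
)
final_answer = extract_phase(demo, "response")
```

In practice you would call `extract_phase(decoded_text, "response")` on the decoded generation to discard the internal dialogue.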

Ghost Imprinting

Sequential distillation from multiple teachers (Instruct → Thinking → Coder → Opus) leaves residual fields in weight space. These residuals produce capabilities absent from any individual teacher — the singular-continuous component of the bounded variation decomposition applied to the parameter tensor. Models in the DualMind family exhibit emergent behaviors (e.g., literary content from physics-only training data) attributable to these ghost imprints.

GGUF

Quantized versions available at DualMinded-Qwen3-1.7B-GGUF: F16, Q8_0, Q5_K_M, Q4_K_M.

Ollama: `ollama run reaperdoesntknow/DualMinded-1.7B`


Mathematical Foundations: Discrepancy Calculus (DISC)

This model's training pipeline is grounded in Discrepancy Calculus — a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: "On the Formal Analysis of Discrepancy Calculus" (Colca, 2026; Convergent Intelligence LLC: Research Division).

The Core Operator:

$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$

For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
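
The smooth case can be checked numerically: approximate the integral with a midpoint rule at a small fixed $\varepsilon$ standing in for the limit (an illustrative sketch, not part of the DISC framework's tooling):

```python
def D(f, x, eps=1e-4, n=1000):
    """Approximate Df(x) = (1/eps) * integral_x^{x+eps} |f(t)-f(x)|/|t-x| dt
    by a midpoint rule; a small fixed eps stands in for the eps -> 0 limit."""
    h = eps / n
    total = 0.0
    for i in range(n):
        t = x + (i + 0.5) * h
        total += abs(f(t) - f(x)) / abs(t - x) * h
    return total / eps

# For f(x) = x^2 at x = 1, Df(1) should approach |f'(1)| = 2.
approx = D(lambda v: v * v, 1.0)
```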

The Mesh Fundamental Identity — every BV function decomposes as:

$$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$

Standard knowledge distillation captures only the first, smooth term. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap-energy density before training begins.
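
On a sampled stream the first two terms of the identity can be computed directly (the Cantor-drift term vanishes for piecewise-smooth data). This is an illustrative sketch with a 3σ jump threshold, not the TKD implementation:

```python
import numpy as np

def bv_split(stream, k=3.0):
    """Split f(b) - f(a) into smooth + jump parts for a sampled stream.

    Increments beyond k sigma are treated as jumps; the rest form the
    absolutely continuous (smooth) part. The Cantor-drift term is zero
    for piecewise-smooth data like this toy example.
    """
    d = np.diff(stream)
    sigma = np.abs(d).std()
    jump_mask = np.abs(d) > k * sigma
    smooth = d[~jump_mask].sum()
    jumps = d[jump_mask].sum()
    return smooth, jumps

x = np.linspace(0.0, 1.0, 200)
x[100:] += 5.0                      # one jump of height 5
smooth, jumps = bv_split(x)
# smooth + jumps reconstructs f(b) - f(a) = x[-1] - x[0].
```

A distillation loss that matched only `smooth` would miss the jump mass entirely, which is the failure mode TKD is designed to avoid.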

Citation

```bibtex
@misc{colca2026dualmind,
  title={From Three Teachers to Dual Cognition: Topology-Aware Multi-Teacher Distillation and Role-Conditioned Self-Critique at 1.7B Scale},
  author={Colca, Roy S.},
  year={2026},
  publisher={HuggingFace},
  url={https://doi.org/10.57967/hf/8184}
}
```

Convergent Intelligence LLC: Research Division — Apache 2.0


Convergent Intelligence Portfolio

Part of the DualMind Series by Convergent Intelligence LLC: Research Division

DualMind Family

| Model | Format | Description |
|---|---|---|
| DualMind | BF16 | LogicInference-trained. Explore → Examine → Response loop. |
| DualMinded-Qwen3-1.7B | BF16 | Opus 4.6 reasoning traces. Higher-quality splits. |
| Dualmind-Qwen-1.7B-Thinking | BF16 | Thinking-teacher variant with extended deliberation. |
| DualMind-GGUF | GGUF | Quantized LogicInference variant. CPU/6 GB GPU. |
| DualMinded-Qwen3-1.7B-GGUF | GGUF | Quantized Opus variant. Ollama-ready. |

Papers

| Paper | DOI |
|---|---|
| Structure Over Scale | 10.57967/hf/8165 |
| Three Teachers to Dual Cognition | 10.57967/hf/8184 |
| Discrepancy Calculus | 10.57967/hf/8194 |

Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division