Add DISC Mathematical Foundations section, update stats to 22.5K+ portfolio downloads
README.md
CHANGED
|
@@ -150,7 +150,7 @@ Qwen3-1.7B (base)
|
|
| 150 |
|
| 151 |
## What Makes This Different
|
| 152 |
|
| 153 |
-
The broader Convergent Intelligence portfolio ([
|
| 154 |
|
| 155 |
**This model is the exception.** TopologicalQwen was trained on Colab H100 at BF16 precision with a 30B-parameter teacher. Same TKD methodology, premium compute. This is the DistilQwen collection's answer to "what happens when you give this pipeline real hardware?"
|
| 156 |
|
|
@@ -162,13 +162,41 @@ TKD uses the DISC (Discrepancy Calculus) framework to detect these structural fe
|
|
| 162 |
|
| 163 |
The empirical evidence: this model at 1.7B consistently produces responses with structural reasoning quality that standard distillation at the same parameter count does not achieve.
|
| 164 |
|
| 165 |
## Related Models
|
| 166 |
|
| 167 |
| Model | Description | Downloads |
|
| 168 |
|-------|-------------|-----------|
|
| 169 |
-
| [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | TKD with Thinking teacher |
|
| 170 |
-
| [
|
| 171 |
-
| [Qwen3-1.7B-
|
| 172 |
|
| 173 |
**[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Full proof-weighted distillation series (9 models)
|
| 174 |
|
|
@@ -194,11 +222,13 @@ The empirical evidence: this model at 1.7B consistently produces responses with
|
|
| 194 |
|
| 195 |
## From the Convergent Intelligence Portfolio
|
| 196 |
|
| 197 |
-
**[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models
|
| 198 |
|
| 199 |
-
Top model: [Qwen3-1.7B-
|
| 200 |
|
| 201 |
-
|
| 202 |
|
| 203 |
*Convergent Intelligence LLC: Research Division*
|
| 204 |
|
|
@@ -208,6 +238,6 @@ Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org
|
|
| 208 |
*"Where classical analysis fails to see, we begin."*
|
| 209 |
|
| 210 |
---
|
| 211 |
-
<sub>Part of the [reaperdoesntknow research portfolio](https://huggingface.co/reaperdoesntknow) β
|
| 212 |
-
<!-- cix-keeper-ts:2026-03-
|
| 213 |
<!-- card-refresh: 2026-03-30 -->
|
|
|
|
| 150 |
|
| 151 |
## What Makes This Different
|
| 152 |
|
| 153 |
+
The broader Convergent Intelligence portfolio ([49 models, 22,500+ downloads](https://huggingface.co/reaperdoesntknow)) was trained on CPU at FP32 for a total compute cost of $24. That proves the methodology: structure beats scale.
|
| 154 |
|
| 155 |
**This model is the exception.** TopologicalQwen was trained on Colab H100 at BF16 precision with a 30B-parameter teacher. Same TKD methodology, premium compute. This is the DistilQwen collection's answer to "what happens when you give this pipeline real hardware?"
|
| 156 |
|
|
|
|
| 162 |
|
| 163 |
The empirical evidence: this model at 1.7B consistently produces responses with structural reasoning quality that standard distillation at the same parameter count does not achieve.
|
| 164 |
|
| 165 |
+
## Mathematical Foundations: Discrepancy Calculus (DISC)
|
| 166 |
+
|
| 167 |
+
TKD is grounded in Discrepancy Calculus, a measure-theoretic framework that treats singularities as primary structure rather than pathology. The full theory is developed in *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division).
|
| 168 |
+
|
| 169 |
+
**The Core Operator.** The discrepancy operator quantifies local mismatch between integration and differentiation:
|
| 170 |
+
|
| 171 |
+
$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$
|
| 172 |
+
|
| 173 |
+
For smooth $f$: $Df(x) = |f'(x)|$ (classical recovery). For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
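
The classical-recovery claim can be checked numerically. The following is a minimal sketch, not the implementation used for TKD: the function name `discrepancy`, the midpoint quadrature, and the values of `eps` and `n` are all illustrative assumptions. It approximates the averaged difference quotient at a fixed small $\varepsilon$ instead of taking the limit.

```python
import math

def discrepancy(f, x, eps=1e-4, n=2000):
    """Midpoint-rule estimate of (1/eps) * integral over [x, x+eps] of
    |f(t) - f(x)| / |t - x| dt, at a fixed small eps (no limit is taken)."""
    h = eps / n
    fx = f(x)
    total = 0.0
    for k in range(n):
        t = x + (k + 0.5) * h  # midpoints never hit the singular point t = x
        total += abs(f(t) - fx) / (t - x)
    return total * h / eps

def step(t):
    """Unit step with a jump at t = 0 (hypothetical rough test function)."""
    return 0.0 if t <= 0.0 else 1.0

# Smooth case: the estimate should sit close to |f'(x)| = |cos(1.0)|.
smooth = discrepancy(math.sin, 1.0)

# Jump case: the estimate grows without bound as eps shrinks.
jump_small = discrepancy(step, 0.0, eps=1e-2)
jump_smaller = discrepancy(step, 0.0, eps=1e-4)
```

For the sine function the estimate agrees with $|\cos(1)|$ to within the quadrature error, while at the step the value scales like $1/\varepsilon$, which is the numerical signature of a jump point.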
|
| 174 |
+
|
| 175 |
+
**The Mesh Fundamental Identity.** Every function $f$ of bounded variation on an interval $I = [a, b]$ decomposes as:
|
| 176 |
+
|
| 177 |
+
$$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$
|
| 178 |
+
|
| 179 |
+
This is the theoretical backbone of TKD. Standard knowledge distillation captures only the first term. TKD preserves all three.
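
As an illustrative check of the identity (a textbook-style example chosen here, not one taken from the monograph), let $H$ be the unit step and $c$ the Cantor function on $[0, 1]$, and consider:

$$f(x) = x + H\!\left(x - \tfrac{1}{2}\right) + c(x), \qquad x \in [0, 1]$$

Since $H' = 0$ and $c' = 0$ almost everywhere, $f'(x) = 1$ almost everywhere, the only jump is at $x = \tfrac{1}{2}$, and the Cantor part is carried entirely by $c$:

$$f(1) - f(0) = 3 = \underbrace{\int_0^1 1\,dx}_{=\,1\ \text{(AC)}} + \underbrace{\Delta f\!\left(\tfrac{1}{2}\right)}_{=\,1\ \text{(jump)}} + \underbrace{c(1) - c(0)}_{=\,1\ \text{(Cantor)}}$$

Integrating $f'$ alone recovers only one third of the total increment in this example.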
|
| 180 |
+
|
| 181 |
+
**TKD Application.** The teacher's output distribution $p_T(x)$ over a concatenated token stream is treated as a BV function. The DISC topology pass computes:
|
| 182 |
+
|
| 183 |
+
1. **Discrepancy energy** $E_{\text{disc}}[p_T] = \frac{1}{2}\int w(x)(Dp_T(x))^2 dx$: identifies regions of high structural information density
|
| 184 |
+
2. **Jump set** $J_{p_T} = \{x : Dp_T(x) > 3\sigma\}$: locates conceptual boundaries (topic shifts, reasoning transitions)
|
| 185 |
+
3. **Gap energy density** over 64-token windows: measures Cantor-type drift invisible to both smooth and jump analysis
|
| 186 |
+
|
| 187 |
+
Windows are cut at low-discrepancy positions rather than fixed stride. Loss weight is amplified at jump positions (1.25×). The topology tells you where the knowledge has architecture.
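
The jump detection, window cutting, and loss weighting described above can be sketched on a toy stream. This is a hedged illustration only: the helper names (`jump_positions`, `cut_points`, `loss_weights`), the reading of $\sigma$ as the population standard deviation of the per-token discrepancy values, and the `slack` search radius around each 64-token boundary are assumptions, not details confirmed by the model card.

```python
import statistics

def jump_positions(disc, k=3.0):
    """Indices whose discrepancy exceeds k * sigma.
    Assumption: sigma is the population std of the discrepancy values."""
    sigma = statistics.pstdev(disc)
    return [i for i, d in enumerate(disc) if d > k * sigma]

def cut_points(disc, target=64, slack=8):
    """Choose window boundaries roughly every `target` tokens, moving each
    one to the lowest-discrepancy index within +/- `slack` (hypothetical rule)."""
    cuts = []
    pos = target
    while pos < len(disc):
        lo = max(pos - slack, 1)
        hi = min(pos + slack, len(disc) - 1)
        best = min(range(lo, hi + 1), key=lambda i: disc[i])
        cuts.append(best)
        pos = best + target
    return cuts

def loss_weights(n, jumps, boost=1.25):
    """Per-token loss weights: `boost` at jump positions, 1.0 elsewhere."""
    weights = [1.0] * n
    for i in jumps:
        weights[i] = boost
    return weights

# Toy discrepancy profile: flat background, one spike, two quiet spots.
disc = [0.1] * 200
disc[70] = 5.0    # a "conceptual boundary"
disc[60] = 0.01   # low-discrepancy position near the first 64-token mark
disc[130] = 0.02  # low-discrepancy position near the second mark

jumps = jump_positions(disc)
cuts = cut_points(disc)
weights = loss_weights(len(disc), jumps)
```

On this toy input the spike at index 70 is the only detected jump, and the first two cuts move from the fixed-stride positions to the quiet indices 60 and 130.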
|
| 188 |
+
|
| 189 |
+
**Why This Matters (Meta-Discrepancy Theorem).** Theorem 11.15 of the DISC monograph proves: when the gap measure $\mu_{\text{gap}} > 0$ and discrepancy energy $E_{\text{disc}} > 0$, the classical FTC/MVT/chain-rule package is *impossible* on a set of positive measure. Standard KD, which implicitly assumes smooth teacher distributions, provably cannot capture the structural information that TKD preserves. This is not a heuristic argument. It is a mathematical impossibility result.
|
| 190 |
+
|
| 191 |
## Related Models
|
| 192 |
|
| 193 |
| Model | Description | Downloads |
|
| 194 |
|-------|-------------|-----------|
|
| 195 |
+
| [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | TKD with Thinking teacher | 1,188 |
|
| 196 |
+
| [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) | TKD with Coder teacher | 966 |
|
| 197 |
+
| [DiStil-Qwen3-1.7B-uncensored](https://huggingface.co/reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored) | Uncensored base for DISC chain | 1,030 |
|
| 198 |
+
| [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | Dual cognition on shared weights | 260 |
|
| 199 |
+
| [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) | Opus 4.6 reasoning traces → 1.7B | New |
|
| 200 |
|
| 201 |
**[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Full proof-weighted distillation series (9 models)
|
| 202 |
|
|
|
|
| 222 |
|
| 223 |
## From the Convergent Intelligence Portfolio
|
| 224 |
|
| 225 |
+
**[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
|
| 226 |
|
| 227 |
+
Top model: [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) (1,188 downloads)
|
| 228 |
|
| 229 |
+
**[DualMind Collection](https://huggingface.co/collections/reaperdoesntknow/dualmind-67e6e07f4de0f45b0dca0dc4)**: Dual cognition architecture. Single model, two internal voices, three cognitive phases. Five models including [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) (Opus 4.6 reasoning variant).
|
| 230 |
+
|
| 231 |
+
Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165) | [From Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184)](https://doi.org/10.57967/hf/8184)
|
| 232 |
|
| 233 |
*Convergent Intelligence LLC: Research Division*
|
| 234 |
|
|
|
|
| 238 |
*"Where classical analysis fails to see, we begin."*
|
| 239 |
|
| 240 |
---
|
| 241 |
+
<sub>Part of the [reaperdoesntknow research portfolio](https://huggingface.co/reaperdoesntknow): 49 models, 22,598 total downloads | Last refreshed: 2026-03-30 12:02 UTC</sub>
|
| 242 |
+
<!-- cix-keeper-ts:2026-03-30T12:02:00Z -->
|
| 243 |
<!-- card-refresh: 2026-03-30 -->
|