reaperdoesntknow committed on
Commit fa720f8 · verified · 1 Parent(s): d3e6197

Add DISC Mathematical Foundations section, update stats to 22.5K+ portfolio downloads

Files changed (1)
  1. README.md +39 -9
README.md CHANGED
@@ -150,7 +150,7 @@ Qwen3-1.7B (base)
150
 
151
  ## What Makes This Different
152
 
153
- The broader Convergent Intelligence portfolio ([43 models, 12,000+ downloads](https://huggingface.co/reaperdoesntknow)) was trained on CPU at FP32 for a total compute cost of $24. That proves the methodology: structure beats scale.
154
 
155
  **This model is the exception.** TopologicalQwen was trained on Colab H100 at BF16 precision with a 30B-parameter teacher. Same TKD methodology, premium compute. This is the DistilQwen collection's answer to "what happens when you give this pipeline real hardware?"
156
 
@@ -162,13 +162,41 @@ TKD uses the DISC (Discrepancy Calculus) framework to detect these structural fe
162
 
163
  The empirical evidence: this model at 1.7B consistently produces responses with structural reasoning quality that standard distillation at the same parameter count does not achieve.
164
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
165
  ## Related Models
166
 
167
  | Model | Description | Downloads |
168
  |-------|-------------|-----------|
169
- | [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | TKD with Thinking teacher | 687 |
170
- | [LFM2.5-1.2B-Distilled-SFT](https://huggingface.co/reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT) | Cross-architecture TKD (LFM → Qwen) | 544 |
171
- | [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) | TKD with Coder teacher | 508 |
 
 
172
 
173
  **[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Full proof-weighted distillation series (9 models)
174
 
@@ -194,11 +222,13 @@ The empirical evidence: this model at 1.7B consistently produces responses with
194
 
195
  ## From the Convergent Intelligence Portfolio
196
 
197
- **[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
198
 
199
- Top model: [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT), 508 downloads
200
 
201
- Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165)
 
 
202
 
203
  *Convergent Intelligence LLC: Research Division*
204
 
@@ -208,6 +238,6 @@ Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org
208
  *"Where classical analysis fails to see, we begin."*
209
 
210
  ---
211
- <sub>Part of the [reaperdoesntknow research portfolio](https://huggingface.co/reaperdoesntknow): 48 models, 12,094 total downloads | Last refreshed: 2026-03-29 21:04 UTC</sub>
212
- <!-- cix-keeper-ts:2026-03-30T02:43:04Z -->
213
  <!-- card-refresh: 2026-03-30 -->
 
150
 
151
  ## What Makes This Different
152
 
153
+ The broader Convergent Intelligence portfolio ([49 models, 22,500+ downloads](https://huggingface.co/reaperdoesntknow)) was trained on CPU at FP32 for a total compute cost of $24. That proves the methodology: structure beats scale.
154
 
155
  **This model is the exception.** TopologicalQwen was trained on Colab H100 at BF16 precision with a 30B-parameter teacher. Same TKD methodology, premium compute. This is the DistilQwen collection's answer to "what happens when you give this pipeline real hardware?"
156
 
 
162
 
163
  The empirical evidence: this model at 1.7B consistently produces responses with structural reasoning quality that standard distillation at the same parameter count does not achieve.
164
 
165
+ ## Mathematical Foundations: Discrepancy Calculus (DISC)
166
+
167
+ TKD is grounded in Discrepancy Calculus, a measure-theoretic framework that treats singularities as primary structure rather than pathology. The full theory is developed in *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division).
168
+
169
+ **The Core Operator.** The discrepancy operator quantifies local mismatch between integration and differentiation:
170
+
171
+ $$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$
172
+
173
+ For smooth $f$: $Df(x) = |f'(x)|$ (classical recovery). For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
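A minimal numerical sketch of the operator above, assuming a midpoint-rule quadrature and a small fixed $\varepsilon$ in place of the limit. The name `discrepancy` and the parameters `eps` and `n` are illustrative; they are not taken from the DISC monograph or the TKD code. The sketch checks the classical-recovery claim on $\sin$ and shows that, because the integral runs over $[x, x+\varepsilon]$, the operator sees the right-hand slope at a kink.

```python
import math


def discrepancy(f, x, eps=1e-3, n=2000):
    """Estimate (1/eps) * integral over [x, x+eps] of |f(t) - f(x)| / |t - x| dt.

    Uses the midpoint rule, so the integrand is never evaluated at t == x.
    `eps` stands in for the limit eps -> 0 (an assumption of this sketch).
    """
    h = eps / n
    fx = f(x)
    total = 0.0
    for k in range(n):
        t = x + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += abs(f(t) - fx) / (t - x)
    return total * h / eps


# Smooth case: the estimate should be close to |f'(x)| = |cos(1)|.
approx = discrepancy(math.sin, 1.0)
exact = abs(math.cos(1.0))

# Kink at 0: the one-sided difference quotient of |x| is identically 1.
kink = discrepancy(abs, 0.0)

print(approx, exact, kink)
```

Shrinking `eps` tightens the agreement in the smooth case; the residual error in this sketch comes from the finite `eps`, not from the quadrature.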
174
+
175
+ **The Mesh Fundamental Identity.** Every function $f$ of bounded variation on $I = [a, b]$ decomposes as:
176
+
177
+ $$f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{smooth (AC)}} + \underbrace{\sum_{x \in J_f} \Delta f(x)}_{\text{jumps}} + \underbrace{D^c f(I)}_{\text{Cantor drift}}$$
178
+
179
+ This is the theoretical backbone of TKD. Standard knowledge distillation captures only the first term. TKD preserves all three.
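As a worked instance of the identity (an illustrative example, not taken from the monograph), take $f(x) = x^2 + \mathbf{1}_{[1/2,\,1]}(x)$ on the interval $[0, 1]$, which has a smooth part and a single unit jump at $x = \tfrac{1}{2}$:

$$f(1) - f(0) = \underbrace{\int_0^1 2x\,dx}_{=\,1} + \underbrace{\Delta f\!\left(\tfrac{1}{2}\right)}_{=\,1} + \underbrace{D^c f([0,1])}_{=\,0} = 2$$

By contrast, the Cantor function $c$ on $[0, 1]$ has derivative zero almost everywhere and no jumps, so its entire increment $c(1) - c(0) = 1$ sits in the third term.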
180
+
181
+ **TKD Application.** The teacher's output distribution $p_T(x)$ over a concatenated token stream is treated as a BV function. The DISC topology pass computes:
182
+
183
+ 1. **Discrepancy energy** $E_{\text{disc}}[p_T] = \frac{1}{2}\int w(x)(Dp_T(x))^2 dx$: identifies regions of high structural information density
184
+ 2. **Jump set** $J_{p_T} = \{x : Dp_T(x) > 3\sigma\}$: locates conceptual boundaries (topic shifts, reasoning transitions)
185
+ 3. **Gap energy density** over 64-token windows: measures Cantor-type drift invisible to both smooth and jump analysis
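The first two quantities in the list above can be sketched on a discrete per-token signal. This is a hedged illustration, not the TKD implementation: it assumes the teacher distribution has already been reduced to one scalar per token, uses the absolute forward difference as a discrete stand-in for $Dp_T$, takes $w \equiv 1$ by default, and reads $\sigma$ as the standard deviation of the discrepancy signal. All function names are hypothetical. The gap energy density is not covered here.

```python
import statistics


def discrete_discrepancy(signal):
    """Absolute forward difference; d[i] compares signal[i] with signal[i + 1].

    Assumption: this replaces the continuous operator D on a token grid.
    """
    return [abs(b - a) for a, b in zip(signal, signal[1:])]


def discrepancy_energy(d, weights=None):
    """Discrete analogue of 0.5 * integral of w(x) * (D p_T(x))^2 dx."""
    if weights is None:
        weights = [1.0] * len(d)  # assumption: uniform weight w(x) = 1
    return 0.5 * sum(w * v * v for w, v in zip(weights, d))


def jump_set(d, k=3.0):
    """Indices where the discrepancy exceeds k * sigma.

    Assumption: sigma is the population standard deviation of d.
    """
    sigma = statistics.pstdev(d)
    return [i for i, v in enumerate(d) if v > k * sigma]


# Toy signal: flat, one step between positions 49 and 50, then flat again.
signal = [0.0] * 50 + [1.0] * 50
d = discrete_discrepancy(signal)
energy = discrepancy_energy(d)
jumps = jump_set(d)
print(energy, jumps)
```

On this toy signal the only nonzero discrepancy is the step itself, so it is the single member of the jump set and carries all of the energy.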
186
+
187
+ Windows are cut at low-discrepancy positions rather than fixed stride. Loss weight is amplified at jump positions (1.25×). The topology tells you where the knowledge has architecture.
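One way the windowing and weighting described above could look in code. This is a sketch under stated assumptions: the nominal window length `target=64` is borrowed from the gap-energy window size, the search `radius` is invented for illustration, and the function names `cut_positions` and `loss_weights` are hypothetical. Only the 1.25 amplification factor comes from the text.

```python
def cut_positions(d, target=64, radius=16):
    """Choose window boundaries near multiples of `target` tokens.

    Each boundary is moved, within +/- `radius`, to the index where the
    discrepancy signal `d` is smallest. Both defaults are assumptions.
    """
    if radius >= target:
        raise ValueError("radius must be smaller than target")
    cuts = []
    pos = target
    while pos < len(d):
        lo = max(pos - radius, 1)
        hi = min(pos + radius, len(d) - 1)
        best = min(range(lo, hi + 1), key=lambda i: d[i])
        cuts.append(best)
        pos = best + target  # centre the next search one window further on
    return cuts


def loss_weights(n, jumps, boost=1.25):
    """Per-position loss weights: 1.0 everywhere, `boost` at jump positions."""
    weights = [1.0] * n
    for j in jumps:
        weights[j] = boost
    return weights


# Toy discrepancy signal with two quiet positions near the nominal boundaries.
d = [1.0] * 180
d[60] = 0.0
d[130] = 0.0
cuts = cut_positions(d)
weights = loss_weights(len(d), jumps=[49])
print(cuts, weights[48:51])
```

Re-centring each search on the previous cut keeps window lengths within `target` plus or minus `radius`; a fixed grid of search centres would be an equally plausible design.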
188
+
189
+ **Why This Matters (Meta-Discrepancy Theorem).** Theorem 11.15 of the DISC monograph proves: when the gap measure $\mu_{\text{gap}} > 0$ and discrepancy energy $E_{\text{disc}} > 0$, the classical FTC/MVT/chain-rule package is *impossible* on positive measure. Standard KD, which implicitly assumes smooth teacher distributions, provably cannot capture the structural information that TKD preserves. This is not a heuristic argument. It is a mathematical impossibility result.
190
+
191
  ## Related Models
192
 
193
  | Model | Description | Downloads |
194
  |-------|-------------|-----------|
195
+ | [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | TKD with Thinking teacher | 1,188 |
196
+ | [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) | TKD with Coder teacher | 966 |
197
+ | [DiStil-Qwen3-1.7B-uncensored](https://huggingface.co/reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored) | Uncensored base for DISC chain | 1,030 |
198
+ | [DualMind](https://huggingface.co/reaperdoesntknow/DualMind) | Dual cognition on shared weights | 260 |
199
+ | [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) | Opus 4.6 reasoning traces → 1.7B | New |
200
 
201
  **[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Full proof-weighted distillation series (9 models)
202
 
 
222
 
223
  ## From the Convergent Intelligence Portfolio
224
 
225
+ **[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)**: Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
226
 
227
+ Top model: [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil), 1,188 downloads
228
 
229
+ **[DualMind Collection](https://huggingface.co/collections/reaperdoesntknow/dualmind-67e6e07f4de0f45b0dca0dc4)**: Dual cognition architecture. Single model, two internal voices, three cognitive phases. Five models including [Dualmind-Qwen-1.7B-Thinking](https://huggingface.co/reaperdoesntknow/Dualmind-Qwen-1.7B-Thinking) (Opus 4.6 reasoning variant).
230
+
231
+ Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165) | [From Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184)](https://doi.org/10.57967/hf/8184)
232
 
233
  *Convergent Intelligence LLC: Research Division*
234
 
 
238
  *"Where classical analysis fails to see, we begin."*
239
 
240
  ---
241
+ <sub>Part of the [reaperdoesntknow research portfolio](https://huggingface.co/reaperdoesntknow): 49 models, 22,598 total downloads | Last refreshed: 2026-03-30 12:02 UTC</sub>
242
+ <!-- cix-keeper-ts:2026-03-30T12:02:00Z -->
243
  <!-- card-refresh: 2026-03-30 -->