---
license: apache-2.0
---
# I've tasked Claude with an impossible task

With this information I will connect the best machines at my disposal to build the fastest representation of this system I can.

With rapid prototypes I can begin debugging.

# Routing has some real optimization problems

I'm going to work out a kernel to handle routing properly for further testing. Currently it's just too damn slow.

Because it's so slow, I can't test optimization tweaks, settings for tasks, and so on. The system is robust enough to handle them; it's just too slow right now.

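Since the immediate blocker is measurement, here is a minimal timing-harness sketch for comparing routing implementations once the kernel exists. `naive_route` is a hypothetical stand-in (a toy bucketing pass), not the actual routing kernel:

```python
import time
import statistics

def bench(fn, *args, warmup=3, reps=20):
    """Median wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):          # warm caches before timing
        fn(*args)
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        times.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times)

# Hypothetical stand-in for the routing step: bucket each token id by a toy hash.
def naive_route(tokens, n_buckets=16):
    buckets = [[] for _ in range(n_buckets)]
    for t in tokens:
        buckets[t % n_buckets].append(t)
    return buckets

tokens = list(range(32768))
ms = bench(naive_route, tokens)
print(f"naive_route: {ms:.2f} ms")
```

Swapping `naive_route` for the real kernel would make tweak-vs-tweak comparisons a one-liner.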
# Trained cross-token task functional

The V2 structure handles the cross-token task functionally and usefully, which makes it a viable option along the same measures as standard nth-token prediction. Higher sequence capacity brings higher speed and better performance returns.

```
────────────────────────────────────────────────────────────────────────────────
TEST 1: Throughput - v2 relay vs v1 sorting hat vs attention
────────────────────────────────────────────────────────────────────────────────

     S   v2_relay    v1_hat      attn   v2/v1    v2/a   v2_MB
    64     2.90ms    2.99ms    0.11ms   0.97×  26.52×      25
   256     2.83ms    3.01ms    0.11ms   0.94×  25.28×      22
  1024     2.89ms    3.06ms    0.11ms   0.95×  26.12×      29
  4096     3.02ms    3.22ms    0.21ms   0.94×  14.30×      67
 16384     3.34ms    3.57ms    1.07ms   0.93×   3.12×     217
 32768     4.01ms    4.28ms    3.54ms   0.94×   1.13×     419
 65536     5.48ms    5.80ms   11.99ms   0.95×   0.46×     821
131072     8.80ms    9.36ms   49.09ms   0.94×   0.18×    1627


────────────────────────────────────────────────────────────────────────────────
TEST 4: Trained Cross-Token Task
Label = (token_0_class + token_1_class) % 10
4 layers, 500 steps, S=8
────────────────────────────────────────────────────────────────────────────────

arch           acc     loss    cross_Δ       params
pure_relay    52.5%   1.6210    0.0000    6,856,462
v2_relay      98.0%   0.0673    0.9906    8,490,790
v1_hat        95.7%   0.1510    1.0328   12,124,970
attention     96.6%   0.1179   14.1474    1,581,834
```
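TEST 4's task is simple to reproduce. Below is a minimal sketch of the synthetic dataset, assuming only the label rule and sequence length stated in the table above; the model architectures themselves are not shown:

```python
import random

S = 8             # sequence length used in TEST 4
NUM_CLASSES = 10  # labels are taken mod 10

def make_example(rng):
    """One synthetic sequence; the label depends on the first two tokens."""
    tokens = [rng.randrange(NUM_CLASSES) for _ in range(S)]
    label = (tokens[0] + tokens[1]) % 10   # cross-token target from TEST 4
    return tokens, label

rng = random.Random(0)
batch = [make_example(rng) for _ in range(4)]
for tokens, label in batch:
    print(tokens, "->", label)
```

Because the label mixes information from two positions, a model that can only attend within a single token stream (like `pure_relay` at 52.5%) cannot solve it, which is exactly what the `cross_Δ` column is probing.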

# The benchmarks are promising

The lack of limitations is intensely potent.

The cantor routing constellation had the beatrix staircase extracted, and the constellation was implanted in its place, inheriting all of the staircase's utilities.

By replacing the staircase outright, the constellation system inherited O(S) token sequencing with guaranteed geometric preservation.

This is a massive boost to sequence and token control.
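As a back-of-the-envelope check on the O(S) claim, the sketch below just counts operations. It assumes full self-attention costs O(S²) pairwise comparisons while a relay-style pass touches each token a constant number of times; these are textbook complexity figures, not measurements, but they match the crossover seen in TEST 1 (attention fastest at small S, relay winning at large S):

```python
def attention_ops(S):
    # full self-attention compares every token with every other token: O(S^2)
    return S * S

def relay_ops(S):
    # a linear relay pass visits each token a constant number of times: O(S)
    return S

for S in (64, 4096, 131072):
    ratio = attention_ops(S) / relay_ops(S)
    print(f"S={S:>6}  attention={attention_ops(S):>12}  relay={relay_ops(S):>6}  ratio={ratio:.0f}x")
```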
# Need to prototype a cross-token relay sequence

As it stands, the one core weakness is single-token attending over sequence streams; however, that's about to change.

This was never a hard limit, despite what Claude keeps appending to the analysis, or what GPT implied.

We solved this quite a while ago with the cantor routing; it's just a matter of tapping into the necessary pieces.

# I'm aware of the high cost potential

Expanding current systems to use geometric structures carries an extensive cost potential.

I'll be working out reduced-cost solutions to ensure current models can be expanded without destruction or heavy refitting.

Distillation training is one of those elements, and one of the most important is directly targeting an array of medical models.

My primary targets are diffusion, text processing, LLMs, medical classification, genetic structure, atomic structure, astronomy, physics, and code, plus a series of utilities meant to prepare models rapidly as needed, rather than waiting days or months for training runs to cook.

The idea here is to enhance every wing of science if possible, and to leave behind a breadcrumb trail for people or AI to dig through later: experiment results, utilization capacity, test conceptualizations, weights to utilize, and more.