---
license: apache-2.0
---

# I've tasked Claude with an impossible task

With this information I will do my best to connect the most optimal machines at my disposal to build the fastest representation of this system I can. With rapid prototypes I can begin debugging.

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/uxauW8jYtxI4_lkOZcBMu.png)

# Routing has some real optimization problems

I'm going to work out a kernel to handle routing properly for further testing. Currently it's just too damn slow. Because it's so slow, I can't test optimization tweaks, settings for tasks, and so on. It's robust enough to handle them; it's just too slow right now.

# Trained cross-token task is functional

The V2 structure houses the cross-token task functionally and usefully, which makes it a viable option for utility on the same measures as standard nth-token prediction. Higher sequence capacity brings higher speed and better performance returns.

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TEST 1: Throughput — v2 relay vs v1 sorting hat vs attention
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     S   v2_relay    v1_hat      attn   v2/v1    v2/a   v2_MB
    64     2.90ms    2.99ms    0.11ms   0.97×  26.52×      25
   256     2.83ms    3.01ms    0.11ms   0.94×  25.28×      22
  1024     2.89ms    3.06ms    0.11ms   0.95×  26.12×      29
  4096     3.02ms    3.22ms    0.21ms   0.94×  14.30×      67
 16384     3.34ms    3.57ms    1.07ms   0.93×   3.12×     217
 32768     4.01ms    4.28ms    3.54ms   0.94×   1.13×     419
 65536     5.48ms    5.80ms   11.99ms   0.95×   0.46×     821
131072     8.80ms    9.36ms   49.09ms   0.94×   0.18×    1627

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TEST 4: Trained Cross-Token Task
Label = (token_0_class + token_1_class) % 10
4 layers, 500 steps, S=8
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
arch          acc     loss    cross_Δ      params
pure_relay   52.5%   1.6210    0.0000   6,856,462
v2_relay     98.0%   0.0673    0.9906   8,490,790
v1_hat       95.7%   0.1510    1.0328  12,124,970
attention    96.6%   0.1179   14.1474   1,581,834
```

# The benchmarks are promising

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/UkSK-b2ZrmOzL-U0EX-0l.png)

The lack of limitations is intensely potent. The cantor routing constellation had the beatrix staircase extracted, and the constellation was implanted in its place, inheriting all of the staircase's utilities. By flat-out replacing the stairs, the constellation system inherited O(S) token sequencing with guaranteed geometric preservation. This is a massive boost to sequence and token control.

# Need to prototype a cross-token relay sequence

As it stands, the one core weakness is single-token attending sequence streams; however, that's about to change. This was never a real limit, despite what Claude keeps appending to the analysis, or what GPT implied. We solved this quite a while ago with the cantor routing; it's just a matter of tapping into the necessary pieces.

# I'm aware of the high cost potential

Expanding current systems to use geometric structures carries an extensive potential cost. I'll be working out reduced-cost solutions to ensure current models can be expanded without destruction or heavy refitting. Distillation training is one of those elements, and one of the most important is directly targeting an array of medical models. My primary targets are diffusion, text processing, LLMs, medical classification, genetic structure, atomic structure, astronomy, physics, and code, plus a primary series of utilities meant to prepare models rapidly as needed, rather than having to wait days or months for cooking.

The idea here is to enhance every wing of science if possible, and to leave behind a breadcrumb trail for either people or AI to dig through later: experiment results, utilization capacity, test conceptualizations, weights to utilize, and more.
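For anyone wanting to poke at the cross-token task from TEST 4 above, it's easy to synthesize. Here's a minimal sketch of the dataset side only; the function name and NumPy usage are my own, and only the label rule `(token_0_class + token_1_class) % 10`, the sequence length S=8, and the 10 classes come from the table.

```python
import numpy as np

def make_cross_token_batch(batch_size, seq_len=8, num_classes=10, seed=0):
    """Synthesize the TEST 4 cross-token task (hypothetical helper).

    Each sequence is `seq_len` random class ids. The target depends on the
    first TWO tokens jointly: label = (token_0_class + token_1_class) % 10,
    so predicting it requires routing information across tokens rather than
    reading any single position.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, num_classes, size=(batch_size, seq_len))
    y = (x[:, 0] + x[:, 1]) % num_classes
    return x, y

# Example: one small batch of sequences and their cross-token labels.
x, y = make_cross_token_batch(batch_size=4, seq_len=8)
```

Feeding batches like this to each architecture in the table is what separates the relay variants (95–98%) from a purely per-token baseline, since the label is unrecoverable from either token alone.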