I expect the sheer geometric alignment alone to yield a new form of Adam tuning specific to introspective analytical alignment, and with it a new class of optimizer dedicated to geometric preservation in conjunction with informational data accumulation. I also expect a new methodology for larger-buffer data movement at the kernel level, a structural boundary for SVD's limitations across the full spectrum, a measured collapse state of SVD substructure under projection, and multiple other models that will have hiccups and growing pains.
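To pin down what I mean by a measured collapse state of SVD under projection, here's a minimal sketch in plain NumPy. The name `spectrum_retention` and the energy-ratio framing are my own illustrative choices, not a finished tool: it just measures how much of a matrix's singular-value spectrum survives a rank-k projection.

```python
import numpy as np

def spectrum_retention(W: np.ndarray, k: int) -> float:
    """Fraction of spectral energy that survives projecting W onto its top-k subspace."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W_k = (U[:, :k] * s[:k]) @ Vt[:k, :]           # rank-k projection of W
    s_k = np.linalg.svd(W_k, compute_uv=False)     # spectrum after the projection
    return float(np.sum(s_k**2) / np.sum(s**2))    # energy ratio in [0, 1]

# Example: a 256x256 weight-like matrix with a rank-64 core
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 256))
for k in (8, 32, 64):
    print(k, round(spectrum_retention(W, k), 4))   # approaches 1.0 as k reaches the true rank
```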
These tools are all building toward the end-state format, which will express everything simultaneously in order to combine the necessary data from many forms of models, without requiring direct tooling hooked into each model at once.
Such finalized tools will include a reusable pretrained geometric patchwork that exhibits all the necessary traits of a geometric structure in its frozen state, capable of being fine-tuned quickly into any other state, or simply used as a lookup beacon with the correct geometric transformer alignment.
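The frozen/fine-tunable duality itself is ordinary PyTorch mechanics today. A rough sketch, with `GeometricPatchwork` as a hypothetical stand-in for the eventual pretrained component:

```python
import torch
import torch.nn as nn

class GeometricPatchwork(nn.Module):
    """Hypothetical stand-in for the pretrained geometric component."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.core = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.core(x)

patchwork = GeometricPatchwork()

# Frozen "lookup beacon" mode: no gradients flow into the pretrained core.
patchwork.requires_grad_(False)
with torch.no_grad():
    beacon = patchwork(torch.randn(4, 128))

# Quick fine-tune mode: unfreeze and train just this component (loop omitted).
patchwork.requires_grad_(True)
opt = torch.optim.Adam(patchwork.parameters(), lr=1e-4)
```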
The geometric transformer is a revamped form of the transformer, intentionally designed with preservation of the overarching structure in mind, rather than falling directly into the naturalistic entropy of immediate solutions over larger-scale contribution. This system will not replace RoPE; it will contribute to the concept of long-concept preservation and work hand-in-hand with systems like RoPE, attention, and the original transformer simultaneously. RoPE-based models will benefit most from this structure, as they are already trained intrinsically with alignment and rotation at their cores.
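For context on why RoPE models fit so naturally: rotary embeddings literally rotate pairs of query/key features by position-dependent angles, so the rotation machinery is already baked in. A standard minimal RoPE sketch (the common half-split formulation, not my geometric transformer):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding: rotate feature pairs by a position-dependent
    angle. x: (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)        # per-pair frequencies
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs     # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation of each (x1, x2) pair -- pure rotation, norm-preserving.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(16, 64)
q_rot = rope(q)
assert torch.allclose(q.norm(dim=-1), q_rot.norm(dim=-1), atol=1e-5)  # rotation preserves norms
```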
The geometric transformer by design takes n inputs as variant states, and those are transformed internally. Using it in its default state will yield results by design, but it will require tuning and curation for specific use cases, whatever the case. This is conceptually familiar to anyone who uses transformers, and simultaneously intimidating, I'd think, to anyone who understands what I'm describing. I myself am a little intimidated that I'm this close as it is.
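To sketch the interface only, and I stress this is a hypothetical stand-in rather than the real design: a block that accepts n variant states, pushes each through a shared internal transform, and merges them.

```python
import torch
import torch.nn as nn

class VariantInputBlock(nn.Module):
    """Hypothetical interface sketch: n variant states in, one merged state out.
    The shared map and naive mean-merge are placeholders, not the actual design."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.shared = nn.Linear(dim, dim)

    def forward(self, variants: list[torch.Tensor]) -> torch.Tensor:
        transformed = [torch.tanh(self.shared(v)) for v in variants]  # per-variant transform
        return torch.stack(transformed).mean(dim=0)                   # naive merge

block = VariantInputBlock()
out = block([torch.randn(8, 64) for _ in range(3)])  # n = 3 variant states
```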
There are multiple other prototypes at work, all leading to the geometric transformer, which will be both an empirically superior utility to any of the utilities I currently use and an embodiment of the very essence of the geometric structure I'm working with: a fully trainable data mutation operation, meant to directly attenuate the structure of the observation to the expectations of autograd and its gradients.
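One existing mechanism that gestures at this is PyTorch's orthogonal parametrization, which keeps a trainable transform geometry-preserving while autograd handles the constraint. I offer it as an analogy for the trainable-mutation idea, not as my actual operation:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

# A linear layer whose weight is constrained to stay orthogonal:
# every gradient step mutates the data while preserving norms and angles.
layer = orthogonal(nn.Linear(64, 64, bias=False))

x = torch.randn(32, 64)
y = layer(x)
assert torch.allclose(x.norm(dim=-1), y.norm(dim=-1), atol=1e-4)  # geometry preserved

# The constraint lives inside autograd: optimize as usual.
loss = (y - torch.randn_like(y)).pow(2).mean()
loss.backward()  # gradients flow through the orthogonal parametrization
```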
Getting pretty close to a few pieces, but not there yet.