AbstractPhil posted an update 5 days ago
geolip-bertenstein-v1 - five experts chosen. A collective of shared, transformer-aligned experts, not a mixture of experts. Similar to an MoE, but not quite one. This first prototype won't have the full mailing-projection relay system afforded by the geofractal router, but it will definitely be a solid prototype.

It is not production-ready yet; a few upstream and downstream tools are still needed to consume and process the outputs into useful representations.

This model will be able to respond to text, hear with Whisper, see with dinolip, code with CodeBERT, and process proteins using esm2_t33_650M_UR50D.

Our experts for the prototype (a loading sketch follows the list):
google-bert/bert-large-uncased
facebook/dinov2-large
microsoft/codebert-base
openai/whisper-large-v3
facebook/esm2_t33_650M_UR50D
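
For reference, a minimal sketch of pulling these five published checkpoints with Hugging Face transformers. The collective wiring around them is the actual model and isn't reproduced here; the dict keys are just illustrative labels.

```python
from transformers import AutoModel

# The five published expert checkpoints; geolip's alignment layers sit on top.
EXPERT_REPOS = {
    "text":    "google-bert/bert-large-uncased",
    "vision":  "facebook/dinov2-large",
    "code":    "microsoft/codebert-base",
    "audio":   "openai/whisper-large-v3",
    "protein": "facebook/esm2_t33_650M_UR50D",
}

experts = {name: AutoModel.from_pretrained(repo) for name, repo in EXPERT_REPOS.items()}
```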

Not the smartest text model, but more than enough for this preliminary use-case test setup. Text is predominantly meant to align and orient downstream function; the entire machine is meant to be operated either unilaterally as a collective, or independently through individual pair requests via special-token access.

Even as a prototype, this model will be capable of substantial feats. It will be able to see and process differential equations using dinov2 and esm2 data simultaneously, which can feed downstream analysis - and I WILL use that data to create a more powerful connection between dinov2 tokens, protein tokens, video tokens, code tokens, and audio tokens.

This is the FIRST prototype of its kind; later versions will introduce video, genetics, shape analysis, pattern-recognition processing, and a much more powerful and reusable text model.

The tests show the models can communicate differentially through the geolip transformers after pairwise Procrustes analysis and pentachoron CV protective measures.
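
For anyone unfamiliar: pairwise Procrustes analysis finds the orthogonal rotation that best maps one embedding space onto another. A minimal SciPy sketch; the names align_pair, X_bert, and X_code are placeholders, not the geolip API.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def align_pair(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Rotate X into Y's space via the orthogonal R minimizing ||X @ R - Y||_F."""
    R, _ = orthogonal_procrustes(X, Y)
    return X @ R

# Paired pooled embeddings for the same inputs from two experts,
# already projected to a shared width (toy data here).
X_bert = np.random.randn(1024, 768)
X_code = np.random.randn(1024, 768)
X_bert_aligned = align_pair(X_bert, X_code)
```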

Whitening the Procrustes precalculation and center-aligning the spaces allows for faster convergence, so that should help too.
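
A sketch of that center-and-whiten precalculation, assuming standard ZCA whitening; this is illustrative, not the actual geolip code. Putting both spaces on an isotropic, zero-mean footing before solving the Procrustes problem is what typically speeds things up.

```python
import numpy as np

def center_whiten(X: np.ndarray, eps: float = 1e-5):
    """Center X and apply ZCA whitening; returns (whitened X, mean, transform)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = (Xc.T @ Xc) / (len(Xc) - 1)
    vals, vecs = np.linalg.eigh(cov)                        # cov = V diag(vals) V^T
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # cov^(-1/2)
    return Xc @ W, mu, W
```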

This experiment has exposed a series of potential uses for this Procrustes-geometry hybrid, and the largest, most useful one I can think of is directly encoding huge amounts of information into a compacted multishot memory space: collapsing huge numbers of tokens into small spaces for high-fidelity relational understanding and use.
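
One standard way to realize that kind of collapse is cross-attention against a small set of learned query slots (Perceiver-style pooling). Whether geolip's memory space works this way is my assumption; the sketch below just shows the shape of the idea, with placeholder dimensions.

```python
import torch
import torch.nn as nn

class MemoryCompressor(nn.Module):
    """Collapse an arbitrary token sequence into a fixed number of memory slots."""

    def __init__(self, dim: int = 1024, n_slots: int = 32, n_heads: int = 8):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(n_slots, dim) * 0.02)  # learned queries
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) -> memory: (batch, n_slots, dim)
        queries = self.slots.unsqueeze(0).expand(tokens.size(0), -1, -1)
        memory, _ = self.attn(queries, tokens, tokens)
        return memory

compressor = MemoryCompressor()
memory = compressor(torch.randn(2, 4096, 1024))  # 4096 tokens -> 32 slots each
```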

So with that thought, I'll be creating a long-term and short-term memory composite for context-window expansion, and then give Bert-Large... a larger context window. Much larger. How much context I can give Bert isn't something I can simply decide, as I've tried larger Berts in the past and they quickly collapse into near-uselessness.
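
As a rough illustration of what such a composite could look like (entirely hypothetical structure, reusing the compressor above): recent states stay verbatim in a short-term buffer, overflow gets compressed into long-term slots, and both are read back as extra context on the next pass.

```python
from typing import List, Optional
import torch

class CompositeMemory:
    """Short-term buffer kept verbatim; overflow compressed into long-term slots."""

    def __init__(self, compressor, short_cap: int = 512):
        self.compressor = compressor               # e.g. the MemoryCompressor above
        self.short_cap = short_cap
        self.short: Optional[torch.Tensor] = None  # (batch, <=short_cap, dim)
        self.long: List[torch.Tensor] = []         # compressed overflow slabs

    def write(self, states: torch.Tensor) -> None:
        buf = states if self.short is None else torch.cat([self.short, states], dim=1)
        if buf.size(1) > self.short_cap:
            overflow = buf[:, : -self.short_cap]
            buf = buf[:, -self.short_cap :]
            self.long.append(self.compressor(overflow))
        self.short = buf

    def read(self) -> torch.Tensor:
        """Concatenate long-term slots and the short-term buffer as extra context."""
        parts = self.long + ([self.short] if self.short is not None else [])
        return torch.cat(parts, dim=1)
```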

This, however, will hold. It does not collapse; there is no room to collapse. The real question now is how to design it: which layers to use for expanding that structure, the most useful multi-shot spectrum for pooling Bert's encodings, and the most useful methodology for extracting the expected outcomes in a reasonable way... without needing an arm and a leg to train Bert.

So the real problem is now cost, rather than simply tests or experimental potential. How much will it cost to train Bert, how large can the context window be within that cost, and how many days will it take to train this expanded Bert?

A brief analysis of what I plan to do: essentially, memory is an accumulation of tokens creating a series of points on a geometric manifold, allowing guaranteed, anchored differential-accumulation responses. This is akin to allowing high-dimensional representational boundaries within a dimensional spectral boundary that exists outside the current system and is not currently observed in standard short-term or long-term AI paradigms.

Each token is represented as potentially one, a thousand, or 500,000 representative systemic accumulations within Bert - this value depends on the resolution I want to impose. This is the geometric vocabulary's manifold control access, and it is where the system will live. This isn't additive; this is accumulative geometric differentiation - a far different beast, one that requires a large series of formulas even to state as a theorem.

If this works, the results will be immediate.

This is starting to get tedious. I'm going to need to start making geofractal routers to save time and form reusable components, which will enable a more reusable, easier-to-load structure that compiles. It'll be a little more annoying to run on other systems until I get things worked out overall, but it's going to be required soon.
