---
license: mit
datasets:
- AbstractPhil/geometric-vocab
pipeline_tag: zero-shot-classification
---
## Likely reintroduce the theta head tomorrow
The theta training runs were actually not that bad. The head added some overhead, but not much, and the outcome improved, so it's worth exploring further.

Currently the l1 training runs are performing well but still short of the 85% I'm aiming for. Today's runs were underwhelming, but enlightening: deeper models aren't any more helpful in this structure than wide models are.

Reintroducing theta with some of my diffusion techniques might be in order if I can't get this one to comply. I'll try a couple of projection tricks before digging into other experiments, but as it stands this one isn't yielding. Training an expert with pure data isn't exactly the geometry's forte; training students tends to yield much more effectively in comparison. The goal here, though, is a teacher model that can actually train students with geometry.
To be fair, if it doesn't work, there are plenty of alternative options in the ViT realm already, but I have high confidence that I can make it work. I just need to read more about capturing images, and to treat the pentachora more as observers rather than as direct relational-interaction toolkits.
It'll likely need the constellation, but we'll see. It probably needs David's expert system.
## Still about the same accuracy deep as shallow
So it's not a capacity issue YET, since shaper should have covered that. I have a few ideas, but I think I'll focus on getting more shallow models stable and then scale up slowly instead of just jumping up logarithmically.
## Formulas have been purified
The newest vit_zana_nano train has shown a very clean curve (`runs/vit_zana_nano/20250913_192119`): about a 3 MB model producing 128-dim features that reach >50% accuracy on CIFAR-100 classification. This clean curve means the process is stable enough to introduce more depth and blocks without destroying the internals, while simultaneously enforcing the 5 loss formulas specifically curated for the pentachora math.
I've begun training a much deeper zana dubbed vit_zana_shaper. This model is 32 blocks deep with an MLP ratio of 1 and 2 attention heads, resting at about 3.5 million params or so.
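As a rough sanity check on that figure, here is a back-of-the-envelope parameter count for such a configuration. The 128 embed dim (to match the 128-dim features mentioned above), patch size 4, and CIFAR-100 head are my assumptions, not confirmed specs:

```python
# Back-of-the-envelope parameter count for a 32-block, MLP-ratio-1,
# 2-head ViT. The 128 embed dim, patch size 4, and 100-class head are
# assumptions for illustration, not the model's confirmed config.
def vit_param_count(dim=128, depth=32, mlp_ratio=1, patch=4, in_ch=3,
                    img=32, num_classes=100):
    attn = 4 * dim * dim + 4 * dim                    # q, k, v, out projections + biases
    mlp = 2 * dim * (dim * mlp_ratio) + dim * mlp_ratio + dim  # two linears + biases
    norms = 2 * 2 * dim                               # two LayerNorms (weight + bias)
    per_block = attn + mlp + norms
    n_patches = (img // patch) ** 2
    embed = in_ch * patch * patch * dim + dim         # patch projection
    pos = (n_patches + 1) * dim + dim                 # positional embed + cls token
    head = 2 * dim + dim * num_classes + num_classes  # final norm + classifier
    return depth * per_block + embed + pos + head

print(f"{vit_param_count() / 1e6:.2f}M params")  # → 3.21M params
```

Under those assumptions the count lands near the quoted 3.5M, with the 32 transformer blocks dominating the budget.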
Let's see how she fares.
## I've had an epiphany: we don't NEED transformer layers in their current form
David's architecture already solved this need with high-efficiency multi-stage geometric mathematics.
David's classification structure houses a series of dimensional-projection sub-systems, each tasked with learning mastery of its pentachoron structure. Each of those 5d representations ends up learning thousands of representative features. David is already capable of feature generation, just not robustly enough to fully manifest an enriched, ViT-grade dimensional feature... yet.
David's architecture can handle ImageNet's classifier count and features, leveraging 1000 classes with ease, sitting on a floppy disk at over 70% accuracy, because David sees CLIP-ViT-Base-Patch16 features.
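The floppy-disk claim is plausible on a napkin: CLIP-ViT-Base-Patch16's projected image features are 512-dimensional, so even a dense 1000-class linear head over them fits in fp16. The layer shape here is my illustration of the size budget only, not David's actual structure:

```python
# Size budget: a 1000-class linear head over CLIP-ViT-Base-Patch16's
# 512-dim projected image features, stored in fp16, vs. a 1.44 MB floppy.
# (Illustration of the budget only, not David's actual architecture.)
feat_dim, num_classes, bytes_fp16 = 512, 1000, 2
head_bytes = (feat_dim * num_classes + num_classes) * bytes_fp16  # weights + biases
floppy_bytes = 1_474_560                                          # 1.44 MB floppy
print(head_bytes, head_bytes < floppy_bytes)  # → 1026000 True
```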
I believe I've figured out a way to fundamentally represent those features in a meaningful way that can replace transformer layers: a different form of feedforward built from trajectory, edge, point, deviation, jitter, helix, theta, and similarity assessments, which should house the information needed to teach the experts how to behave the way David did. This should allow much larger networks to retain mathematical precision, learn the features in a different form of patch than is currently expected to be a patch, and create legitimately high-density geometric features.
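To make that concrete, here is a hypothetical sketch of a pentachoron "observer" descriptor built from a few of those terms (edge, point, deviation, similarity). The function name and exact formulas are my guesses, not the repo's actual math, and the trajectory/jitter/helix/theta terms are omitted:

```python
import numpy as np

# Hypothetical pentachoron descriptor: edge, point, deviation, and
# similarity terms computed against one pentachoron's 5 vertices.
# Guessed formulas for illustration, not the repo's actual assessments.
def penta_descriptor(x, verts):
    """x: (d,) input feature; verts: (5, d) pentachoron vertices."""
    centroid = verts.mean(axis=0)
    i, j = np.triu_indices(5, k=1)                       # the 10 pentachoron edges
    edge_lens = np.linalg.norm(verts[i] - verts[j], axis=1)
    point_dists = np.linalg.norm(x - verts, axis=1)      # point term, (5,)
    deviation = np.linalg.norm(x - centroid)             # centroid deviation
    sims = (verts @ x) / (np.linalg.norm(verts, axis=1) * np.linalg.norm(x) + 1e-8)
    return np.concatenate([edge_lens, point_dists, [deviation], sims])

rng = np.random.default_rng(0)
d = penta_descriptor(rng.normal(size=16), rng.normal(size=(5, 16)))
print(d.shape)  # → (21,)
```

A stack of such descriptors, one per pentachoron, is the kind of fixed-geometry feedforward signal that could stand in for an attention block's mixing.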
## Better RoPE incoming with actual meaningful learning
The last one wasn't meaningfully learning representations; the next should be more correctly curated and inferenced to impact the representative outcome. It should be a bit more accurate than the last, but no guarantees. I once again let AI handle it for me, and now I'll need to go micromanage again. This is on me again; you'd think I would learn. Oftentimes they can handle these sorts of tasks, and other times... well, other times they just kind of hook shit together and say it works, then it spins in circles.
Time for my favorite thing, research papers.
It's starting to look more like a glorified branched FFN than an MLP, so that's a thing, I suppose.
## Theta RoPE seems less accurate than the penta head
There's an experimental classification theta-rotation head with multiple pentachora routes. So far the results are less accurate overall than the similarity-through-rose path without it. Experiments ongoing.
## I assumed full control from the AIs and built it correctly
I was relying too much on the AI, and it made me slow. Today I assumed full control and built the models correctly. The architecture is cleaner, and all three Python files were uploaded for the v3 setup. vit_zana_small is already seeing 50% by epoch 50, a big step up from the earlier pixies hard-locked at 41%.
## Zana, the current version, is quite small and quite fast
At about 500k params, zana_nano competes with its big sister pixie at a superior accuracy rating AND produces image features. Running the system with refined WordNet tokens rather than full Unicode made all the difference. The findings show that meaningful semantics matter a whole lot.
Same model, two vocabularies:

| Vocabulary | Accuracy |
|---|---|
| unicode | 21% |
| wordnet_eng | >42% |
All losses were modified heavily; the originals did not work at all with this structure.
## V3 incoming
Pushing HEAVILY into losses based on the WORKING high-entropy, high-learning-rate classification heads and forcing this thing into cohesion INSTANTLY.

That's the play. No more 200 epochs. These things should be ready in 10-20 epochs at most, and they should be at 80%+ accuracy, or they fail. Those are the two potentials here.

With correct logit and probe assessment, the substructure should become a profoundly more efficient and easily analyzable series of similarity-based charts for assessment and capability. None of this guesswork based on "what works with other models." We KNOW what works, and I should never have second-guessed the formulas.
I have implemented all of the most crucial and most powerful formulas from the others; now let's see if the universe makes a fool of me or not.

If it does, SO BE IT! Let's build an AI singularity empire from there.

We're about to teach a ViT diffusion. The real question is: will it learn, or will it collapse and need dual-block layers from Flux?
## Better testing methodology development
I'm reading up on papers about how various companies and research institutions tested their ViTs. My testing methodology isn't accurate enough, because accuracy isn't just reflected in the logit alignments but also in the internal layers' feature generation. I'm leaning heavily on logit alignment instead of also managing feature-alignment testing, which is likely cutting heavily into my system. Currently I'm building a notebook with better feature-testing capabilities to evaluate features correctly. I anticipate faster trains once confidence actually starts to pick up, since currently the models are not at all confident in their classifications.
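One cheap feature-quality probe that complements logit accuracy is a nearest-centroid classifier fit directly on extracted features. This is a generic sketch of the idea, not the notebook's actual test code:

```python
import numpy as np

# Feature-quality probe: fit class centroids on train features, then score
# nearest-centroid accuracy on held-out features. If logit accuracy is high
# but this probe is low, the features themselves are weak.
# (Generic sketch, not the notebook's actual test code.)
def centroid_probe(train_feats, train_labels, test_feats, test_labels):
    classes = np.unique(train_labels)
    cents = np.stack([train_feats[train_labels == c].mean(0) for c in classes])
    pred = classes[np.argmin(
        np.linalg.norm(test_feats[:, None] - cents[None], axis=-1), axis=1)]
    return (pred == test_labels).mean()

# Two well-separated synthetic classes: the probe should score 1.0.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, size=(50, 8)),
                   rng.normal(5.0, 0.1, size=(50, 8))])
labels = np.array([0] * 50 + [1] * 50)
acc = centroid_probe(feats, labels, feats, labels)
print(acc)  # → 1.0
```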
It's possible these ViTs could be MUCH MORE or MUCH LESS accurate than advertised, and I apologize for the inconvenience this has caused to any onlookers. I'll be updating with additional inference code very soon.
## Tinkerbell
128d, 128 heads, MLP ratio 4.0, depth 4, only geometric attention...
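For scale, note what that head count implies per head, assuming dims are split evenly across heads as in standard multi-head attention:

```python
# With 128 embed dims split across 128 attention heads, each head works in
# a 1-dimensional subspace -- about as minimal as attention gets.
dim, heads = 128, 128
head_dim = dim // heads
print(head_dim)  # → 1
```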
Well, it might work. I could make it smaller, but I doubt Tinkerbell would extract anything useful. Good luck, little one.
## Enabling the Mix-N-Cut
I've built a mix-n-cut that I've been avoiding enabling. This one is particularly formatted for pentachora, so we'll see how it fares. I'm trying to build one as SMALL AS POSSIBLE, so if this mix-n-cut can pull the task out of the bag I may as well run it. As it stands, the tiny ViTs cap at 41% on CIFAR-100 with no augmentations. I've been running all the trains without a single special effect and only minimal normalization.
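For readers unfamiliar with the family, here is the standard MixUp + CutMix recipe that a "mix-n-cut" presumably combines. This is a generic sketch, not the repo's pentachoron-formatted version:

```python
import numpy as np

# Standard MixUp + CutMix: with probability cutmix_prob, paste a box from a
# permuted batch (CutMix); otherwise blend pixels (MixUp). Labels are mixed
# by the same lambda. Generic recipe, not the repo's pentachoron variant.
def mix_n_cut(x, y, alpha=1.0, cutmix_prob=0.5, rng=None):
    """x: (B, H, W, C) images; y: (B, K) one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x2, y2 = x[perm], y[perm]
    if rng.random() < cutmix_prob:                   # CutMix: paste a box
        H, W = x.shape[1:3]
        rh, rw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
        cy, cx = rng.integers(H), rng.integers(W)
        y0, y1 = max(cy - rh // 2, 0), min(cy + rh // 2, H)
        x0, x1 = max(cx - rw // 2, 0), min(cx + rw // 2, W)
        x = x.copy()
        x[:, y0:y1, x0:x1] = x2[:, y0:y1, x0:x1]
        lam = 1 - ((y1 - y0) * (x1 - x0)) / (H * W)  # actual pasted area
    else:                                            # MixUp: blend pixels
        x = lam * x + (1 - lam) * x2
    return x, lam * y + (1 - lam) * y2

xb = np.random.default_rng(2).random((4, 32, 32, 3))
yb = np.eye(10)[[0, 1, 2, 3]]
xm, ym = mix_n_cut(xb, yb, rng=np.random.default_rng(3))
print(xm.shape, bool(np.allclose(ym.sum(axis=1), 1.0)))  # → (4, 32, 32, 3) True
```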
Let's see how the upcoming trains fare.
## pixie_base_128d_patch4_128h
Pixie base has 10 layers: 5 geometric and 5 traditional multi-head attention. Let's see how the mix-n-cut fares with the earlier ones first; then we'll run the base. The smaller ones seem to behave better using geometric attention at 256 expert heads, which is odd to me, but whatever works. They don't get much bigger with more experts, so I'll just try a tiny one with a ton of heads first.
## Pentachoron Geometric Feature Extraction
Pentachora ViTs are essentially micro-sized feature extractors that provide substantial accuracy for their small size. The more experiments I run, the smaller they become. The final goal is a full CLIP-ViT that can house the entirety of LAION-400M in a fraction of the size and compute of OpenAI's CLIP-ViT line. After that point I'll be confident the math is lined up well enough to train the true flagship: Beatrix.
The process of useful classification and feature extraction has been a non-trivial problem in the Computer Science industry for a long time.
This repo will house the various ViT experiments that I frankenstein together, manifesting their weights and model code in the repo itself. As I am an independent researcher, my resources are limited and I don't have the backing of any donors, so there will be time gaps unless some hardware is sliced off for me. Many of my repos have certain elements purposely omitted: material for papers in writing, my thesis arguments, my statements about certain universal elements, and a multitude of other ramblings. I don't plan to release the specific key details in full-phonebook fashion for just ANY PERSON to read.
Let me use your high-end hardware. I deliver - success or failure, but I will deliver.
I will not rattle a tin cup for you. Work out a deal with me and you get the weights - I get the classes developed for further use, meant for public release.
Let me know if you're willing to work with me. I'll gladly share the code, the process, the progress, and the built accumulated warchest of potentials that this system entails if you provide me gateways to some hardware that I can utilize.
Until then, one experiment at a time.