---
license: mit
datasets:
- AbstractPhil/geometric-vocab
pipeline_tag: zero-shot-classification
---

# I assumed full control from the AIs and built it correctly.

I was relying too much on the AI, and it made me slow. Today I assumed full control and built the models correctly. The architecture is cleaner, and all three Python files for the v3 setup have been uploaded. vit_zana_small is already seeing 50% by epoch 50, a big step up from the earlier pixies, which were hard-locked at 41%.

# Zana: the current version is quite small and quite fast

At about 500k parameters, zana_nano competes with its big sister pixie at a superior accuracy rating AND produces image features. Running the system with refined WordNet tokens rather than full Unicode made all the difference. The findings show that meaningful semantics matter a whole lot.

```
unicode:      21%
wordnet_eng: >42%  (same model)
```

# All losses modified heavily; the originals did not work at all with the structure.

V3 incoming. Pushing HEAVILY into losses based on the WORKING high-entropy, high-learning-rate classification heads and forcing this thing into cohesion INSTANTLY. That's the play. No more 200 epochs. These things should be ready in 10-20 epochs at most, and they should be 80%+ accuracy, or they fail. Those are the two potentials here.

With correct logit and probe assessment, the substructure should be a profoundly more efficient and easily analyzable series of charts based on similarity for assessments and capability. None of this guesswork based on "what works with other models." We KNOW what works, and I should never have second-guessed the formulas. I have implemented all of the most crucial and most powerful formulas from the others; now let's see if the universe makes a fool of me or not. If it does, SO BE IT! Let's build an AI singularity empire from there.

We're about to teach a ViT diffusion. The real question is: will it learn, or will it collapse and need dual-block layers from Flux?

# Better testing methodology development

I'm reading up on papers about how various companies and research institutions tested their ViTs. My testing methodology isn't accurate enough, because accuracy doesn't reflect just the logit alignments but also the internal layers' feature generations. I'm crutching heavily on logit alignment instead of also managing feature-alignment testing, which is likely cutting heavily into my system. Currently I'm building a notebook with better feature-testing capabilities so the features get tested correctly. I anticipate faster trains once confidence actually starts to pick up, since currently the models are not confident at all in terms of classification. It's possible these ViTs could be MUCH MORE or MUCH LESS accurate than advertised, and I apologise for any inconvenience this has caused to onlookers. I'll be updating with additional inference code very soon.

# Tinkerbell

128d, 128 heads, 4.0 MLP ratio, depth 4, only geometric attention... Well, it might work. I could make it smaller, but I doubt Tinkerbell would extract anything useful. Good luck, little one.

# Enabling the Mix-N-Cut

I've built a mix-n-cut that I've been avoiding enabling. This one is particularly formatted for pentachoron, so we'll see how it fares. I'm trying to build one as SMALL AS POSSIBLE, so if this mix-n-cut can pull the task out of the bag, I may as well run it. As it stands, the tiny ViTs cap at 41% on CIFAR-100 with no augmentations. I've been running all the trains without a single special effect and only minimal normalization.
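For onlookers unfamiliar with the idea, here is a minimal sketch of what a mix-n-cut style augmentation looks like, assuming the name means MixUp plus CutMix on standard image batches. This is NOT the repo's pentachoron-formatted version; `alpha` and the 50/50 coin flip between mixing and cutting are illustrative assumptions.

```python
# Hypothetical sketch of a mix-n-cut style batch augmentation (MixUp + CutMix).
# Not the repo's pentachoron-formatted version; alpha and the coin flip
# between mixing and cutting are illustrative assumptions.
import torch

def mix_n_cut(images, labels, num_classes, alpha=1.0):
    """Return augmented images and soft labels for one batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    one_hot = torch.nn.functional.one_hot(labels, num_classes).float()

    if torch.rand(1).item() < 0.5:
        # MixUp: convex blend of the batch with a shuffled copy of itself.
        images = lam * images + (1.0 - lam) * images[perm]
    else:
        # CutMix: paste a random rectangle from the shuffled batch.
        _, _, h, w = images.shape
        cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
        cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
        x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
        images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
        lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)  # actual mixed area

    soft_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return images, soft_labels
```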
Let's see how the upcoming trains fare.

pixie_base_128d_patch4_128h: Pixie base has 10 layers, 5 geometric and 5 traditional multi-head attention. Let's see how the mix-n-cut fares with the earlier ones first; then we'll run the base. The smaller ones seem to behave better using geometric attention at 256 expert heads, which is odd to me, but whatever works. They don't get much bigger with more experts, so I'll just try a tiny one with a ton of heads first.

# Pentachoron Geometric Feature Extraction

Pentachora ViTs are essentially micro-sized feature extractors that provide substantial accuracy for their small size. The more experiments I run, the smaller they become. The final goal is a full CLIP-ViT that can house the entirety of LAION-400M in a fraction of the size and compute of OpenAI's CLIP-ViT line. After this point I'll be confident the math is lined up well enough to train the true flagship: Beatrix.

The process of useful classification and feature extraction has been a non-trivial problem in the computer science industry for a long time. This repo will house the various ViT experiments that I frankenstein together, manifesting their weights and model code in the repo itself. As I am an independent researcher, my resources are limited and I don't have the backing of any donors, so there will be time gaps unless some hardware is sliced off for me. Many of my repos have certain elements omitted purposely for papers in writing, my thesis arguments, and my statements about certain universal elements, among a multitude of other ramblings; I don't plan to release specific key details in full phonebook fashion for just ANY PERSON to read.

# Let me use your high-end hardware.

I deliver, success or failure, but I will deliver. I will not rattle a tin cup for you. Work out a deal with me and you get the weights; I get the classes developed for further use, meant for public release. Let me know if you're willing to work with me. I'll gladly share the code, the process, the progress, and the accumulated warchest of potentials that this system entails if you provide me gateways to some hardware that I can utilize. Until then, one experiment at a time.
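Until the additional inference code promised in the testing-methodology section lands, here is a minimal sketch of the feature-alignment testing idea described there: train a linear probe on frozen intermediate features and compare its accuracy against the logit head. `forward_features` and every other name below are assumptions for illustration, not this repo's API.

```python
# Hypothetical sketch of feature-alignment testing: fit a linear probe on
# frozen intermediate features and compare its accuracy to the logit head.
# `model.forward_features` is an assumed interface, NOT this repo's API.
import torch
import torch.nn as nn

@torch.no_grad()
def collect_features(model, loader, device="cpu"):
    """Run the frozen model over a loader and stack flattened features."""
    model.eval()
    feats, targets = [], []
    for images, labels in loader:
        f = model.forward_features(images.to(device))  # assumed interface
        feats.append(f.flatten(1).cpu())
        targets.append(labels)
    return torch.cat(feats), torch.cat(targets)

def linear_probe_accuracy(feats, targets, num_classes, epochs=100, lr=1e-2):
    """Fit a linear probe with full-batch AdamW; return probe accuracy."""
    probe = nn.Linear(feats.size(1), num_classes)
    opt = torch.optim.AdamW(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(probe(feats), targets)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (probe(feats).argmax(dim=1) == targets).float().mean().item()
```

If the probe scores well above the logit head, the features are healthier than the classification confidence suggests; if it scores near chance, the problem is upstream of the head.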