After tomorrow I'm taking a few days to recover from overwork and relax, so over the weekend I'll likely set up a few experiment sweeps, and if I don't start them then, I'll kick them off on Monday or Tuesday.
Burning both the daytime and the midnight oil has been taxing; I need to relax for a bit.
I believe I've discovered the first needs-based component from this structure.
Centrifuged dissonance rupture sampling.
Magnitude testing with ripter shows that multiple reduced magnitudes coalesce fewer infinites while still converging, whereas the squared-magnitude standard introduces a type of unilaterally conjunctive series of ripter-identified infinites.
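As a toy illustration of the magnitude knob (shorthand for the idea only, not ripter's actual code; the function name and exponent parameter are mine):

import torch

def magnitude_filtration(x: torch.Tensor, p: float = 1.0) -> torch.Tensor:
    # p = 2 is the squared-magnitude standard, which amplifies large
    # activations and pushes more features toward the infinite regime;
    # p < 2 gives the "reduced" magnitudes that coalesce fewer infinites.
    return x.abs().pow(p)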
For the baseline H2 trigram training:
The adjudication principality of compartmentalized information shows those infinites nearly fully encompassing the state of the model: there are 22 identifiable omegas, with only one finite and two persistent features, and on analysis the structure is predominantly, nearly fully, utilized. The only way this is possible is if the model is directly using the infinites to make decisions.
I have a working hypothesis based on omegas created by the H2 battery.
So far I've identified four types of potential Omega states using ripter, which have been codified. Using the default config magnitude of 2 on the baseline trigram battery, we generated "persistent_infinity_field" as the codified response, while a magnitude of pi sings a different tune entirely, yielding "rupture_coalescence_field". The codified states and their descriptions (container name added for readability):

OMEGA_STATES = {
    "persistent_infinity_field": (
        "Most H0 classes remain infinite under the threshold; "
        "axis residents are mostly separated and persistent."
    ),
    "rupture_coalescence_field": (
        "Many axes persist, but several finite H0 deaths indicate "
        "controlled local coalescence/rupture."
    ),
    "infinity_pair_field": (
        "Mostly persistent infinity axes with small pair components."
    ),
    "finite_carrier_field": (
        "Finite merge activity is high enough to indicate a carrier "
        "rather than pure infinity field."
    ),
}
Each is indicated to be a potential omega; heavy ablation will be required to determine whether that's true or whether I'm grasping at noise.
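For reference, a minimal sketch of how these labels could be assigned from H0 persistence statistics. This is not ripter's actual logic; the function name, arguments, and thresholds are all illustrative guesses based on the descriptions above:

def classify_omega_field(n_infinite: int, n_finite_deaths: int, n_pair_components: int) -> str:
    # Illustrative thresholds only; ripter's real criteria are not shown here.
    total = max(n_infinite + n_finite_deaths, 1)
    if n_finite_deaths / total > 0.5:
        # Finite merge activity dominates: a carrier, not a pure infinity field.
        return "finite_carrier_field"
    if n_finite_deaths > 0:
        # Mostly persistent axes, but several finite H0 deaths.
        return "rupture_coalescence_field"
    if n_pair_components > 0:
        # Persistent infinity axes with small pair components.
        return "infinity_pair_field"
    # Most H0 classes remain infinite under the threshold.
    return "persistent_infinity_field"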
Transformers' default SentencePiece tokenizer for t5-base can't simply be swapped to the fast version; I had to manually load the Rust-backed fast SentencePiece tokenizer to actually make use of it.
This was an inconvenience that cost quite a bit of debugging time, but now that it's resolved I can start the full training runs.
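A minimal sketch of the workaround, assuming the standard transformers API (this reproduces the idea; it's not necessarily the exact code in the repo):

from transformers import T5TokenizerFast

# Load the Rust-backed fast tokenizer directly instead of the default
# (slow) SentencePiece implementation.
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
assert tokenizer.is_fast  # confirm the Rust implementation is in use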
A preliminary 50-epoch run shows full recon is possible.
The most-confused pairs may have a mathematical rounding fault to be addressed; I'll figure that out when the 100-epoch run finishes.
Organizing the branch system:
I didn't like my first design, so I'll be keeping the prototypes in a separate package in the same repo. This is cleaner.
geolip_svae
svae_proto
Basically separate readmes, separate requirements, and so on. This will be cleaner and will scale naturally.
Musings:
In a vain attempt to keep things organized, I'll be segmenting experiments into another branch, but I need to be strict about my own behavior. Forming an experimental branch won't matter at this point if I can't keep the repos organized: much of the repo WILL BE experimental prototypes within the geolip_svae.prototypes package, and without explicit guards on my own behavior, I need to clean things up a bit and prevent my regular habits from causing chaos.
I've done this before: I go off on huge experiment lines and create massive tangents of 20-200 experiments. So I'll be forming an experimental branch which uses only the "main" geolip_svae core model packages and code, while existing in the same experimental repo.
So the core code for the model, the behaviors, the optimization, and the expectations will live in the primary "main" branch, and the "experimental" branch will house the incomplete, untested, or unrefined prototypes.
Merging into main will be a modularization process, likely done by hand or with Claude assisting, after which the core code is updated in the experimental branch while the prototypes and their separation are preserved. This is a stopgap for me, because I notoriously climb into experimental rabbit holes, only to find something interesting, and in the stupor I find myself in the proverbial corkboard room full of yarn. It never starts out as a corkboard room; it just evolves into one naturally.
Trigram 128 converged cleanly by epoch 10 with 4096 patches. 100% byte recon, >99.9% recon for trigrams.
I set up a vocabulary/token experiment to probe the capacity for full tokenization; that's running next.
One thing to note: the trigram variants are behaving more like miniature decoder LLMs than like encoders.
The energy concentration is internal and decoder-centric, so this may be a symptom of an emergent internal energy-weighting spectrum that may require additional alpha. More tests will be required; from a glance I can only guess. The curve is clear: S0 decreasing and SD increasing throughout the train.
The symptom is that this model is decoder-weighted, which could mean many things. They are self-encapsulated adjudication units, so they will work, but the converged pattern gives me a series of ideas for better orienting the internals using a few continuum topology methods I experimented with in the past.
These were experimental continuum rotators: they rotate a feature away from latent space, process it there, and then rotate the result back. Early experimentation was yielding, but slow, which is why I moved in a different direction. I believe the math can help explain some of the symptoms here.
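A minimal sketch of the rotate-process-rotate-back idea as I'd reconstruct it; the class name and the skew-symmetric parametrization are my assumptions, not the original implementation:

import torch

class ContinuumRotator(torch.nn.Module):
    # Hypothetical reconstruction: rotate a feature out of latent space,
    # process it in the rotated frame, then rotate the result back.
    def __init__(self, dim: int):
        super().__init__()
        self.skew_seed = torch.nn.Parameter(torch.zeros(dim, dim))
        self.process = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # exp(A) of a skew-symmetric A is an orthogonal rotation matrix.
        a = self.skew_seed - self.skew_seed.T
        rot = torch.matrix_exp(a)
        y = x @ rot.T          # rotate away from latent space
        y = self.process(y)    # process in the rotated frame
        return y @ rot         # rotate the result back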
The "omega" naming schema is not to be confused with this variation. That was an experimental attempt to curate an omega pathway that could yield, but it more often produced NaN, so it had limited scope in the gradients.
Upon review I'm thinking it might not contribute much, so I'll need to consult the continuum research more closely.
Anyway, this isn't important right now; tokenization is. I'll report the findings later today.
Wikitext trigram 128 is cooking on patch_size 2. The results are pretty solid so far, and the model won't be done for a day or two. 4096 patches is a bit excessive, but it's a fair experiment for both speed testing and conditioning testing on text.
The h2 trigram 64 was pretrained earlier with 50 epochs of wikitext-103, so it's ready for tinkering if you feel like playing with it.
I have also prepared:
https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/h2_linear_tiny_imagenet_256
https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/h2_linear_tiny_imagenet_128
https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/h2_linear_tiny_imagenet_64
Each was trained for 100 epochs of imagenet: tinyimagenet for the 64, real imagenet 128 for the 128 variant, and cropped imagenet 128 for the 256, which should allow cleaner patchwork association downstream with masking controllers. The 256 is meant to target models that can highlight selective portions of the patchwork space for targeted geometric alignment anchoring, which can then be masked and adjudicated downstream for gradient updates. So, say, a model that finds where a bird could be, then selects which bird could be there. This model would be better suited for training a communications tool between different experts that speak a similar required geometry. Images are images, so they speak the same language.
SAM models and the like would greatly benefit from this if utilized correctly.
Oh the 256 didn't finish. It needs more cooking.
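If you want to pull one of these down, here's a minimal sketch using standard huggingface_hub tooling (the repo id comes from the links above; loading the weights afterward depends on the geolip_svae code):

from huggingface_hub import snapshot_download

# Fetch only the 64-variant folder from the repo linked above.
local_dir = snapshot_download(
    repo_id="AbstractPhil/geolip-SVAE",
    allow_patterns=["h2_linear_tiny_imagenet_64/*"],
)
print(local_dir)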
The 128 was issued a gaussian codebook. I'll be formatting multiple finetuning-presentable codebook formats, specifically meant to highlight utility for the actual images targeted by downstream tasks. The process should take minutes, and could even potentially run at runtime, depending on how much it's required down the chain of command.
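For clarity, a minimal sketch of what I mean by a gaussian codebook; the function name, defaults, and sizes are illustrative, not the exact repo code:

import torch

def make_gaussian_codebook(num_codes: int, dim: int, std: float = 1.0, seed: int = 0) -> torch.Tensor:
    # Sample each code vector i.i.d. from N(0, std^2).
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(num_codes, dim, generator=gen) * std

# e.g. a 512-entry codebook for a 128-dim latent (sizes are placeholders)
codebook = make_gaussian_codebook(num_codes=512, dim=128)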