By trying to disprove the Omega H2 battery, I have discovered the following:
* Each topology formed by the H2 battery is deviant; none share a uniform substrate of behavior. Each is uniquely independent per training set, all with perfect recon.
* Image recon can be tracked and mapped, yielding a consistently mapped and responsive 16.77M vocabulary potential. Current spectrum testing sits at around 5 million Unicode bytes.
* The model scale shows that patch size is related to how much data you want the model to represent within itself, and this has yet to hit a capacity limit. The MSE recons and yields, and the more data fed, the more they yield.
* The scaling principle shows that the model scales upward indefinitely, and each level of the model can be iteratively captured upward to form deviant and uniformly consistent, repeatable pathways of implicit codewise response, not just arbitrary bitwise recall. Meaningful, implicitly learned utility.
* Image recon patch size should match the slice of the image you want to represent, as it uses per-patch smoothing internally from identity.
* Byte trigrams are channel-agnostic. They do not require a channel count, just a formula for recall, with n-gram recall at 99.6% for byte-by-byte representations. With those comes an adjacently capable codebook.
* SentencePiece preliminary tests show validity and reconstruction just like the byte trigrams. Using the new byte trigram, it would be arbitrarily convenient to recon a codebook for the structure.
* Binary trees learn a uniformly potent and powerful gating mechanism that requires further exploration. Each of them produces directly responsive, independent capacity, and the responses are controllable.
* Ternary experiments show the models are directly responsive to -1, 0, +1 behavior, so quantization is very much a valid potential.
* Preliminary tests with the H2O1 series of batteries show the models responding similarly to natural universal elements in the universe itself.
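The byte-trigram point above can be made concrete with a small sketch. It assumes the 16.77M figure refers to the 256^3 = 16,777,216 possible three-byte combinations, and that a trigram is indexed by packing its bytes into one integer. Both are assumptions for illustration, and the function names are hypothetical, not taken from the actual codebase.

```python
# Hypothetical sketch: index every byte trigram with a single integer ID.
# ASSUMPTION: the "16.77M vocabulary" is the 256**3 space of byte trigrams.
VOCAB_SIZE = 256 ** 3  # 16,777,216 possible trigrams


def trigram_to_id(b0: int, b1: int, b2: int) -> int:
    """Pack three bytes into one ID in [0, VOCAB_SIZE)."""
    return (b0 << 16) | (b1 << 8) | b2


def id_to_trigram(idx: int) -> tuple[int, int, int]:
    """Unpack an ID back into its three bytes."""
    return (idx >> 16) & 0xFF, (idx >> 8) & 0xFF, idx & 0xFF


def encode(data: bytes) -> list[int]:
    """Sliding-window trigram IDs with stride 1 (empty if len(data) < 3)."""
    return [trigram_to_id(*data[i:i + 3]) for i in range(len(data) - 2)]


def decode(ids: list[int]) -> bytes:
    """Rebuild the byte string: first trigram whole, then one new byte per ID."""
    if not ids:
        return b""
    out = bytearray(id_to_trigram(ids[0]))
    for idx in ids[1:]:
        out.append(idx & 0xFF)  # only the last byte of each window is new
    return bytes(out)


if __name__ == "__main__":
    text = "crowfeather".encode("utf-8")
    ids = encode(text)
    print(len(ids), decode(ids) == text)
```

No channel count appears anywhere in this mapping, which is consistent with the channel-agnostic claim. This lookup round-trips exactly by construction, so it says nothing about the 99.6% recall figure, which describes the learned model.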
[DAY TWO] PROJECT CROWFEATHER - 5/1/2026 Que sera, what will he be?
Step 47,500 of 100,000. Loss hovering around 2.76 on 6.2B tokens. Throughput steady at 87k tokens per second on the A100. Not a GH200, but she gets it done.
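For a rough sense of what is left, here is a back-of-the-envelope sketch. It assumes the 6.2B tokens are the tokens consumed so far (not the full dataset size), that the 87k figure is tokens per second, and that batch size and throughput stay constant. None of that is confirmed above, so treat the result as an estimate only.

```python
# Back-of-the-envelope ETA from the numbers in the log entry.
# ASSUMPTION: 6.2B is tokens seen so far, and throughput stays flat.
steps_done = 47_500
total_steps = 100_000
tokens_seen = 6.2e9
tokens_per_sec = 87_000

tokens_per_step = tokens_seen / steps_done
remaining_tokens = tokens_per_step * (total_steps - steps_done)
elapsed_hours = tokens_seen / tokens_per_sec / 3600
eta_hours = remaining_tokens / tokens_per_sec / 3600

print(f"tokens per step: {tokens_per_step:,.0f}")
print(f"elapsed so far:  {elapsed_hours:.1f} h")
print(f"remaining:       {eta_hours:.1f} h")
```

Under those assumptions the run has been going a little under a day and has roughly another day left.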
Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.
Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works.
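Three of those ingredients are small enough to sketch in plain Python: the logit soft cap, the z-loss, and a warmup-stable-decay (WSD) learning rate schedule. This is a minimal illustration, not the training code. The cap of 30.0 and the z-loss coefficient of 1e-4 are the values reported for Gemma 2's final logits and for PaLM, and may not be what this run uses. The linear decay shape and the warmup and decay lengths are assumptions, and the function names are hypothetical.

```python
import math


def soft_cap(logits, cap=30.0):
    """Squash logits smoothly into [-cap, cap] via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


def log_sum_exp(logits):
    """Numerically stable log of the softmax normalizer Z."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_coef=1e-4):
    """Cross-entropy plus the auxiliary term z_coef * log(Z)**2.

    The extra term nudges log(Z) toward zero, which keeps logits from drifting.
    """
    log_z = log_sum_exp(logits)
    ce = log_z - logits[target]
    z_loss = z_coef * log_z ** 2
    return ce + z_loss, ce, z_loss


def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay: linear warmup, flat plateau, then decay.

    ASSUMPTION: the decay is linear; other WSD variants use other shapes.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr
    frac = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * frac


if __name__ == "__main__":
    capped = soft_cap([100.0, 2.0, -100.0])
    total, ce, z = cross_entropy_with_z_loss(capped, target=0)
    print(capped, total, ce, z)
    # Hypothetical schedule: 1k warmup steps, last 10k steps decaying.
    print(wsd_lr(47_500, 100_000, 3e-4, 1_000, 10_000))
```

The soft cap leaves small logits almost untouched and only bends the extremes, so it pairs naturally with the z-loss: one bounds individual logits, the other keeps the normalizer in check.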
Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in. One thing's certain though. This model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers.
Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.
The bank's after my credit card. Until then, full steam.