# BERT-Thetis: Geometric BERT Models
This repository contains BERT-Thetis models with deterministic crystal embeddings.
I don't like what raw geo-simplex did to BERT without full cantor-stairs control.
I'm currently working out a way to negate the need for backprop by integrating elements of David, but the process isn't immediate. David *works* because of feature compatibility, so bringing that compatibility into other systems is paramount for rapid learning.
Eliminating full backprop will be a time-consuming and systems-rigorous refactoring of each mathematical element into flow-geometric diffusion.
David makes this possible, but getting there requires many steps between here and a fully realized restructuring.
This will enable a new realm of experimentation and present its own optimization issues, while simultaneously eliminating the large experimental overhead that backprop requires due to its hierarchical climb-and-return structure.

This version used the older geometric vocabulary system with standard backprop as a preliminary test, and it didn't do very well.
The next will feature a fully robust cantor-stairing system with the ViT-Beatrix cohesion and fully learnable k-simplex cantor-stairway embeddings. That variation will still use backprop, but it will add an extra head and a complexity-analysis tool for geometric stability testing with divergent pathways.
The follow-up variation will likely use a full David-inspired shunt network that coalesces multiple BERT variants while simultaneously acting as tiny experts in a form of MoE, which should let at least five alternative BERT variants intercommunicate opinions.
Even without my own version of BERT, this can already happen with David; I just haven't set it up.
Backprop is both a glue and a burden for independent research, so I'll do my best to both mitigate it and keep my BERT variants' responses solid and cohesive.
Some will work, some will not.
This one didn't work very well. However, it was not a completely useless experiment.
## Repository Structure
```
AbstractPhil/bert-thetis-tiny-wikitext103/
└── bert-thetis-tiny-wikitext103/
    └── YYYY-MM-DD_HH-MM-SS/    (training run timestamp)
        ├── best/               (best validation checkpoint)
        ├── final/              (final checkpoint)
        └── step-N/             (intermediate checkpoints)
```
## What is BERT-Thetis?
BERT-Thetis replaces traditional learned embeddings with **deterministic crystal structures**:
- **Beatrix Staircase Encodings**: Zero-parameter positional structure
- **Character Composition**: Learnable semantic bridge
- **Crystal Inflation**: Deterministic 5-vertex simplex generation

This reduces vocabulary parameters by ~95% while maintaining performance.
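The core idea can be sketched in a few lines. The snippet below is a hypothetical illustration of deterministic crystal inflation, not the repository's actual implementation: it hashes a token to seed a generator and emits a fixed 5-vertex simplex, so the same token always maps to the same crystal with zero learned embedding parameters. The function name, dimensions, and normalization are assumptions for illustration only.

```python
import hashlib
import numpy as np

def inflate_crystal(token: str, dim: int = 128) -> np.ndarray:
    # Seed deterministically from the token text: identical tokens always
    # produce identical crystals, so nothing here is learned.
    seed = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    # Draw 5 vertices, center them so the simplex has zero mean, then
    # normalize each vertex onto the unit sphere.
    vertices = rng.standard_normal((5, dim))
    vertices -= vertices.mean(axis=0, keepdims=True)
    vertices /= np.linalg.norm(vertices, axis=1, keepdims=True)
    return vertices  # shape (5, dim)

print(inflate_crystal("thetis").shape)  # (5, 128)
```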
## Quick Start
```python
from geovocab2.train.model.core.bert_thetis import ThetisConfig, ThetisForMaskedLM

# Build the model from the published config; this constructs the
# architecture but does not by itself load trained weights.
config = ThetisConfig.from_pretrained("AbstractPhil/bert-thetis-tiny-wikitext103")
model = ThetisForMaskedLM(config)
```
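To pull trained weights from a specific run, here is a minimal sketch assuming the run directories shown above hold a standard PyTorch state dict; the filename `pytorch_model.bin` is an assumption, not a confirmed artifact name.

```python
import torch
from huggingface_hub import hf_hub_download

# Assumed checkpoint path based on the repository layout above; the actual
# filename inside best/ may differ.
weights_path = hf_hub_download(
    repo_id="AbstractPhil/bert-thetis-tiny-wikitext103",
    filename="bert-thetis-tiny-wikitext103/2025-10-13_20-09-33/best/pytorch_model.bin",
)
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()
```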
## Resources
- **Repository:** [github.com/AbstractEyes/lattice_vocabulary](https://github.com/AbstractEyes/lattice_vocabulary)
- **Author:** AbstractPhil

---
**Latest Run:** 2025-10-13_20-09-33
**Model Variant:** bert-thetis-tiny-wikitext103