Update README.md
README.md
CHANGED
|
@@ -1,6 +1,34 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
# geolip-svd-transformer API
|
| 5 |
|
| 6 |
```python
|
|
@@ -69,31 +97,44 @@ former = svd_transformer(
|
|
| 69 |
|
| 70 |
)
|
| 71 |
```
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
+
# First off, progress report
|
| 5 |
+
|
| 6 |
+
As disappointing as this is, **I have not yet been able to fully converge the geolip-svd-transformer**.
|
| 7 |
+
|
| 8 |
+
I deeply apologize for my inability to handle this task, and I will do my very best to implement the structure in a universally useful
|
| 9 |
+
scaling methodology using synthetic pretrained information as guideposts.
|
| 10 |
+
|
| 11 |
+
I have NOT given up on this structure. I am expanding the entire differentiation underlying the system.
|
| 12 |
+
|
| 13 |
+
I have begun a heavy series of sweeps to test large numbers of synthetic shapes, structural variances, color variations, and structural variants
|
| 14 |
+
in a series of intended pretrain convergences that will manifest into the synthetic pixel solver structure.
|
| 15 |
+
|
| 16 |
+
These weight sets will begin in notebook form and evolve into structural SVD weight infusions that will intentionally
|
| 17 |
+
amplify learning speed to introduce a large number of potential autosolving encoder structures intentionally targeting
|
| 18 |
+
very, very small sizes.
|
| 19 |
+
|
| 20 |
+
INTENTIONALLY small. These are going to be imperfect, but there will be MANY OPTIONS.
|
| 21 |
+
|
| 22 |
+
The "auto" spectrum will have a series of prefabricated "init" spectrums, intentionally meant to allow
|
| 23 |
+
skipping large amounts of early pretraining using organized, spectrally attuned SVD attenuation mechanisms.
|
| 24 |
+
|
| 25 |
+
There will be multiple capable patchworks, multiple capable potentials, and multiple capable substructure options,
|
| 26 |
+
each with its own benefits, drawbacks, and convergence speed.
|
| 27 |
+
|
| 28 |
+
The goal here is to use synthetic shapes to expand the structural invariance of systems like this, and to introduce
|
| 29 |
+
prefabricated utility-driven patchworks using SVD as a catalyst.
|
| 30 |
+
|
| 31 |
+
|
| 32 |
# geolip-svd-transformer API
|
| 33 |
|
| 34 |
```python
|
|
|
|
| 97 |
|
| 98 |
)
|
| 99 |
```
|
| 100 |
+
# What Works
|
| 101 |
|
| 102 |
+
**Hugging Face Transformers**
|
| 103 |
+
If you snap in transformers to process the tokens, it will work. Transformers are a beast, with years of accumulated engineering behind them.
|
| 104 |
+
Using Hugging Face transformers will definitely work as a setting; they just add substantial overhead and eliminate a piece of the experiment.
|
| 105 |
|
| 106 |
+
**Conv2d, Conv3d**
|
| 107 |
+
Using CONV will definitely work as a setting. It converges to high accuracy when correctly aligned with CIFAR-100, TinyImageNet, ImageNet128, and several other datasets.
|
| 108 |
|
| 109 |
+
**Kymatio Scattering2D**
|
| 110 |
+
This requires some conv, but not much, and it produces powerhouse behavior stronger than Conv alone when adjudicating large amounts of
|
| 111 |
+
SVD information with the attention alignment spectrum.
|
| 112 |
|
| 113 |
+
# What Needs To Work
|
| 114 |
+
**Using an MLP should reach fair accuracy without using CONV or TRANSFORMERS.**
|
| 115 |
+
I have seen **around 60% on CIFAR-100** with no traditional encoders, but the system was leaning on the M_path as a crutch to fill the gaps after enough epochs of the SVD path.
|
| 116 |
+
This structure is under the microscope now.
|
| 117 |
|
| 118 |
+
Instability allows SGD optimization to heavily benefit some image tasks while it fails completely on text tasks.
|
|
|
|
| 119 |
|
| 120 |
+
**Out Projection SUVt tokens are iffy**
|
| 121 |
+
The out projection is an MLP multiscale projection that took a while to set up, and it produces approximate transformer QKV with useful SUVt tokens downstream.
|
|
|
|
| 122 |
|
| 123 |
+
**Many activations corrupt geometry**
|
| 124 |
+
They are in there for experimentation. Feel free to experiment.
|
| 125 |
|
| 126 |
+
**Without the expanded triton core spectrum, larger systems suffer with triton**
|
| 127 |
+
Claude Code is having trouble with this one as a full task, so I'll need to build it in pieces. I've had OpenClaw working on it, but the outcome
|
| 128 |
+
isn't looking good. The 4x4 and 5x4 won't converge, while the 6x6 crashes the system entirely instead of building it.
|
| 129 |
|
| 130 |
+
I'll need to wait for a fix for Claude Code; this is apparently a known issue.
|
|
|
|
| 131 |
|
| 132 |
+
|
| 133 |
+
## Additionally
|
| 134 |
+
|
| 135 |
+
There are multiple torch-access components meant to be utilized with this structure, so be aware there will be many ways to use this transformer in line with
|
| 136 |
+
standard torch use. There is no rigid backing structure to it; just install geolip-core and you're set, once I actually get the experimental branch live.
|
| 137 |
+
|
| 138 |
+
Claude loves to inline an invalid eigh-based Gram SVD instead of actually using the imports, so I need to make sure Claude respects the structure every single time.
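
The "eigh gram svd" mentioned here is read as recovering singular values from an eigendecomposition of the Gram matrix `A^T A` (in torch terms, `torch.linalg.eigh` on `A.T @ A` rather than `torch.linalg.svd`). That reading is an assumption, and this README does not say why the inlined version is invalid. One plausible cause is precision loss: forming the Gram matrix squares the singular values, so small ones can fall below floating-point resolution. A minimal pure-Python sketch for the 2x2 case, with a hypothetical helper name that is not part of geolip-core:

```python
import math


def singular_values_via_gram(a):
    """Singular values of a 2x2 matrix from the eigenvalues of A^T A.

    Hypothetical illustration only; not the geolip-core implementation.
    """
    (p, q), (r, s) = a
    # Gram matrix G = A^T A = [[g11, g12], [g12, g22]]
    g11 = p * p + r * r
    g12 = p * q + r * s
    g22 = q * q + s * s
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    trace = g11 + g22
    det = g11 * g22 - g12 * g12
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam_max = (trace + disc) / 2.0
    lam_min = (trace - disc) / 2.0
    # Singular values are the square roots of the Gram eigenvalues
    return math.sqrt(max(lam_max, 0.0)), math.sqrt(max(lam_min, 0.0))


# Well-conditioned input: the Gram route is exact.
print(singular_values_via_gram([[3.0, 0.0], [0.0, 4.0]]))   # (4.0, 3.0)

# Ill-conditioned input: the true singular values are 1.0 and 1e-9,
# but 1e-18 is lost next to 1.0 in float64, so the small one collapses.
print(singular_values_via_gram([[1.0, 0.0], [0.0, 1e-9]]))  # (1.0, 0.0)
```

If this is the failure mode in play, it would favor routing every decomposition through the library's imported SVD path rather than a locally inlined Gram shortcut.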
|
| 139 |
+
|
| 140 |
+
Experiments are slow going; I need more hardware.
|