Update README.md
README.md
CHANGED
|
@@ -1,6 +1,42 @@
|
|
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
---
| 4 |
# V2 is blobby!
|
| 5 |
|
| 6 |
Time to go direct: train the whole model with SVD-related paradigms internally rather than trying to feed the model SVD.
|
|
|
|
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
---
|
| 4 |
+
# V2 Redux - full decoder overhaul
|
| 5 |
+
|
| 6 |
+
Cascade bottlenecking didn't cut it; the decoder still bypassed the specifications.
|
| 7 |
+
|
| 8 |
+
This next variation is going to be a bit excessive in terms of conduit adjudication.
|
| 9 |
+
|
| 10 |
+
Every single layer of the encoder is going to be a full encoder/decoder overhaul.
|
| 11 |
+
|
| 12 |
+
```
ENCODER (bottom → up):
Level 0: 256 patches → MLP(384) → M(48×4) → SVD+conduit₀ → 256 tokens
Level 1: group 2×2 → 64 cells → attend(4) → MLP(128) → M(16×4) → SVD+conduit₁ → 64 tokens
Level 2: group 2×2 → 16 blocks → attend(4) → MLP(128) → M(16×4) → SVD+conduit₂ → 16 tokens
Level 3: group 2×2 → 4 groups → attend(4) → MLP(128) → M(16×4) → SVD+conduit₃ → 4 tokens
Top: cross-attention over 4 final tokens

SPECTRAL TOKEN (propagates between levels):
[S(4), log_friction(4), settle(4), char_coeffs(4)] = 16 values
S carries gradients. Conduit is detached. Difficulty trickles UP.

DECODER (top → down, with conduit skips):
Level 3': 4 tokens → expand × 4 → inject conduit₃ → attend → 16 tokens
Level 2': 16 tokens → expand × 4 → inject conduit₂ → attend → 64 tokens
Level 1': 64 tokens → expand × 4 → inject conduit₁ → attend → 256 tokens
Level 0': 256 tokens + stored (U₀, S₀, Vt₀, friction₀, settle₀, char_c₀) → MLP → pixels

CONDUIT AT EACH SCALE:
Level 0: friction from pixel-level Gram decomposition (how hard were patches?)
Level 1: friction from cell-level Gram decomposition (how hard were 2×2 interactions?)
Level 2: friction from block-level decomposition (how hard were meso-structures?)
Level 3: friction from global decomposition (how hard was the overall composition?)
```
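The token counts in the plan above can be sanity-checked with a little shape bookkeeping. This is only a rough sketch, not the model code: it assumes the 256 patches sit on a 16×16 grid (the plan only says "256 patches" and "group 2×2"), and the helper names `group_2x2`, `encoder_token_counts` and `pack_spectral_token` are made up for illustration. There are no MLPs, attention, SVD or conduits here, just the grouping and the 16-value token layout.

```python
# Shape bookkeeping only: no learned layers, no attention, no SVD.
# Assumption: the 256 patches form a 16x16 grid (not stated in the plan).

def group_2x2(grid):
    """Merge each non-overlapping 2x2 neighbourhood of a square grid into one cell.

    `grid` is a list of rows; every entry is a list of the patch ids it covers.
    """
    side = len(grid)
    assert side % 2 == 0 and all(len(row) == side for row in grid)
    return [
        [
            grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]
            for c in range(0, side, 2)
        ]
        for r in range(0, side, 2)
    ]


def encoder_token_counts(side=16, levels=4):
    """Return the token count per encoder level and the final (top) grid."""
    grid = [[[r * side + c] for c in range(side)] for r in range(side)]
    counts = [side * side]
    for _ in range(levels - 1):
        grid = group_2x2(grid)
        counts.append(len(grid) * len(grid))
    return counts, grid


def pack_spectral_token(S, log_friction, settle, char_coeffs):
    """Concatenate the four 4-value parts into one flat 16-value token."""
    parts = (S, log_friction, settle, char_coeffs)
    assert all(len(p) == 4 for p in parts)
    return [v for p in parts for v in p]


counts, top = encoder_token_counts()
print(counts)        # encoder, bottom -> up: [256, 64, 16, 4]
print(counts[::-1])  # decoder, top -> down: [4, 16, 64, 256]
print(len(top[0][0]))  # patches covered by one top-level token: 64
print(len(pack_spectral_token([1.0] * 4, [0.0] * 4, [0.0] * 4, [0.0] * 4)))  # 16
```

Each 2×2 grouping divides the token count by 4, which is why the decoder's "expand × 4" steps land back on the same 16, 64 and 256 counts where the matching conduit gets injected.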
|
| 36 |
+
|
| 37 |
+
It's a bit excessive, but it may be required. Everything has to have a little impurity; otherwise it will not deviate.
|
| 38 |
+
|
| 39 |
+
|
| 40 |
# V2 is blobby!
|
| 41 |
|
| 42 |
Time to go direct: train the whole model with SVD-related paradigms internally rather than trying to feed the model SVD.
|