metadata
license: mit
After establishing the weakness of attention, I ran a series of constellation experiments to check that the relayed constellation behavior is preserved from cycle 1 to cycle n.
Standard attention degrades steadily with depth: by cycle 4 its cosine similarity to the original has fallen to about 0.30, and by cycle 8 to about 0.13.
The constellation relay holds cosine similarity to the original at roughly 99.5%, diminishing only to 99.4% at cycle 16.
Geometry is reliably preserved and passed downstream through the constellation relay.
Standard attention collapses.
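The notes do not spell out how `cos_orig` is computed. A minimal sketch of one plausible reading, assuming it is the mean per-token cosine similarity between the original embeddings and the representation after n cycles, might look like the following. The two toy layers (`toy_gentle_layer`, `toy_mixing_layer`) are hypothetical stand-ins chosen only to show the measurement; they are not the relay or attention modules used in the experiments.

```python
import numpy as np

def cos_to_original(x0, x):
    """Mean per-row cosine similarity between x0 and x, both shaped [tokens, dim]."""
    num = np.sum(x0 * x, axis=1)
    den = np.linalg.norm(x0, axis=1) * np.linalg.norm(x, axis=1) + 1e-12
    return float(np.mean(num / den))

def track_drift(x0, layer, cycles):
    """Apply `layer` repeatedly and record similarity to x0 after each cycle."""
    x, history = x0, []
    for _ in range(cycles):
        x = layer(x)
        history.append(cos_to_original(x0, x))
    return history

def toy_gentle_layer(x):
    # Hypothetical stand-in: a small residual perturbation of each token.
    return x + 0.01 * np.roll(x, 1, axis=1)

def toy_mixing_layer(x):
    # Hypothetical stand-in: weightless softmax self-attention, which
    # averages tokens together and so pulls them away from their originals.
    logits = x @ x.T / np.sqrt(x.shape[1])
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 128))
x0 /= np.linalg.norm(x0, axis=1, keepdims=True)

gentle_hist = track_drift(x0, toy_gentle_layer, cycles=8)
mixing_hist = track_drift(x0, toy_mixing_layer, cycles=8)
print("gentle:", [round(v, 4) for v in gentle_hist])
print("mixing:", [round(v, 4) for v in mixing_hist])
```

The point of the sketch is only the shape of the measurement: the same `x0` is compared against the output of every cycle, so any drift accumulates in the reported number.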
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TEST 2: Depth stability - relay vs attention vs interleaved
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
A) Relay-only:
cycle  CV      CV_n    eff_d  cos_orig  gate
1      0.0587  0.0590  119.3  0.995716  0.047
2      0.0580  0.0579  119.2  0.995005  0.047
4      0.0582  0.0591  119.2  0.994683  0.047
8      0.0595  0.0573  119.1  0.994514  0.047
12     0.0603  0.0590  119.1  0.994425  0.047
16     0.0583  0.0584  119.1  0.994436  0.047
B) Attention-only:
cycle  CV      CV_n    eff_d  cos_orig
1      0.0607  0.0597  119.3  0.990663
2      0.0724  0.0712  118.6  0.955266
4      0.0802  0.0799  117.8  0.301391
8      0.0713  0.0705  117.8  0.132908
12     0.0711  0.0671  117.8  0.093587
16     0.0699  0.0683  117.8  0.073860
C) Interleaved (attn → relay → ...):
step  type   CV_n    eff_d  cos_orig
1     attn   0.0598  119.2  0.990823
2     relay  0.0609  118.4  0.986670
4     relay  0.0605  118.4  0.985883
8     relay  0.0598  118.3  0.985115
12    relay  0.0603  118.3  0.984901
16    relay  0.0626  118.3  0.984822
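The logs above do not define `eff_d`. One common way to get an "effective dimension" of a set of embeddings is the participation ratio of the covariance spectrum, and the sketch below assumes that reading purely for illustration. The function name `effective_dim` and the choice of participation ratio are assumptions, not something the experiments confirm.

```python
import numpy as np

def effective_dim(x):
    """Participation ratio (sum lam)^2 / sum(lam^2) of the covariance
    eigenvalues of x, shaped [tokens, dim]. Assumed definition, not
    necessarily the one behind the eff_d column."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to the
    # covariance eigenvalues; the proportionality constant cancels in the ratio.
    lam = np.linalg.svd(centered, compute_uv=False) ** 2
    return float(lam.sum() ** 2 / (lam ** 2).sum())

rng = np.random.default_rng(0)
isotropic = rng.standard_normal((4096, 16))               # spread over all 16 axes
rank_one = np.outer(rng.standard_normal(4096), np.ones(16))  # a single direction

print("isotropic:", effective_dim(isotropic))
print("rank one: ", effective_dim(rank_one))
```

Under this reading, a value that stays close to the embedding width means the representation keeps using nearly all of its directions, while a collapse toward a few directions would pull the number down.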
The constellation relay holds steadfast as a conduit that transfers information from A to B, but it still needs speedups.
I also ran it against the RoPE variants (rope_std and rope_ntk), with a similar result.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TEST 2: Depth sweep - 16 layers, all architectures
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
vanilla:
depth  CV      CV_n    eff_d  cos_orig
1      0.0590  0.0613  119.2  0.988720
2      0.0812  0.0835  118.6  0.906975
4      0.0782  0.0772  118.0  0.258551
8      0.0681  0.0678  118.0  0.124103
12     0.0672  0.0670  118.0  0.084707
16     0.0656  0.0673  118.0  0.070113
rope_std:
depth  CV      CV_n    eff_d  cos_orig
1      0.0599  0.0604  119.8  0.994533
2      0.0772  0.0784  119.6  0.937518
4      0.0715  0.0726  119.5  0.270651
8      0.0627  0.0633  119.5  0.124870
12     0.0637  0.0644  119.5  0.086343
16     0.0641  0.0635  119.5  0.072248
rope_ntk:
depth  CV      CV_n    eff_d  cos_orig
1      0.0607  0.0605  119.6  0.991296
2      0.0852  0.0831  119.3  0.909574
4      0.0745  0.0759  119.1  0.255422
8      0.0662  0.0664  119.1  0.123437
12     0.0633  0.0614  119.1  0.092321
16     0.0623  0.0634  119.1  0.076500
relay:
depth  CV      CV_n    eff_d  cos_orig
1      0.0581  0.0578  119.2  0.995456
2      0.0586  0.0577  119.2  0.994940
4      0.0592  0.0575  119.1  0.994511
8      0.0605  0.0578  119.1  0.994254
12     0.0587  0.0580  119.1  0.994219
16     0.0583  0.0591  119.1  0.994187
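To put the depth sweep on one scale, the short script below computes how much of its depth-1 `cos_orig` each architecture still has at depth 16. The numbers are copied from the log above; the `retention` helper and the choice of a simple ratio are mine, added only as a convenient summary.

```python
# cos_orig at depth 1 and depth 16, copied from the depth-sweep log.
cos_orig = {
    "vanilla":  (0.988720, 0.070113),
    "rope_std": (0.994533, 0.072248),
    "rope_ntk": (0.991296, 0.076500),
    "relay":    (0.995456, 0.994187),
}

def retention(first, last):
    """Fraction of the depth-1 similarity that survives to the last depth."""
    return last / first

retained = {name: retention(*pair) for name, pair in cos_orig.items()}

# Print architectures from best to worst retention.
for name, frac in sorted(retained.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:9s} retains {frac:6.1%} of its depth-1 cos_orig at depth 16")
```

By this measure the relay keeps about 99.9% of its depth-1 similarity, while the three attention variants keep roughly 7 to 8%.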