After establishing the weakness of attention, I ran a series of constellation experiments to test whether the relayed constellation behavior is preserved from cycle 1 through cycle n.

Standard attention degrades steadily through the layers: cosine similarity to the original representation falls to roughly 30% by cycle 4 and to about 13% by cycle 8.

Constellation cosine similarity holds at roughly 99.5% at cycle 1, diminishing only to 99.4% by cycle 16.

Geometric preservation is reliably maintained and carried downstream through the constellation relay.

Standard attention collapses.
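For reference, this is how the per-cycle metrics in the tables below could be computed. A sketch only: the token-averaged cosine, the coefficient of variation of token norms, and the participation-ratio definition of eff_d are my assumptions about what CV, cos_orig, and eff_d mean, not confirmed definitions from the experiment code.

```python
import numpy as np

def repr_metrics(X, X0):
    """Per-cycle representation metrics (assumed definitions).

    X  : (n_tokens, d) hidden states after the current cycle
    X0 : (n_tokens, d) original (cycle-0) hidden states
    """
    norms = np.linalg.norm(X, axis=1)
    # CV: coefficient of variation of token norms
    cv = norms.std() / norms.mean()
    # cos_orig: cosine to the original representation, averaged over tokens
    cos = np.mean(np.sum(X * X0, axis=1) /
                  (norms * np.linalg.norm(X0, axis=1)))
    # eff_d: participation ratio of the covariance eigenvalues
    eig = np.linalg.eigvalsh(np.cov(X.T))
    eff_d = eig.sum() ** 2 / np.square(eig).sum()
    return cv, cos, eff_d
```

With these definitions, an undegraded representation gives cos_orig of exactly 1.0 and an eff_d bounded above by the embedding width d.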

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TEST 2: Depth stability — relay vs attention vs interleaved
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  A) Relay-only:
   cycle       CV     CV_n    eff_d   cos_orig   gate
       1   0.0587   0.0590    119.3   0.995716  0.047
       2   0.0580   0.0579    119.2   0.995005  0.047
       4   0.0582   0.0591    119.2   0.994683  0.047
       8   0.0595   0.0573    119.1   0.994514  0.047
      12   0.0603   0.0590    119.1   0.994425  0.047
      16   0.0583   0.0584    119.1   0.994436  0.047

  B) Attention-only:
   cycle       CV     CV_n    eff_d   cos_orig
       1   0.0607   0.0597    119.3   0.990663
       2   0.0724   0.0712    118.6   0.955266
       4   0.0802   0.0799    117.8   0.301391
       8   0.0713   0.0705    117.8   0.132908
      12   0.0711   0.0671    117.8   0.093587
      16   0.0699   0.0683    117.8   0.073860

  C) Interleaved (attn → relay → ...):
    step   type     CV_n    eff_d   cos_orig
       1   attn   0.0598    119.2   0.990823
       2  relay   0.0609    118.4   0.986670
       4  relay   0.0605    118.4   0.985883
       8  relay   0.0598    118.3   0.985115
      12  relay   0.0603    118.3   0.984901
      16  relay   0.0626    118.3   0.984822
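The qualitative pattern in Test 2 can be reproduced with a toy model. Everything here is a hypothetical stand-in: attn_like and relay_like are not the actual architectures; the only number taken from the tables above is the gate value of roughly 0.047, which makes the relay a small gated update on top of an identity path.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
X0 = rng.standard_normal((128, d))

def attn_like(X):
    # Stand-in for a standard attention block: a large random
    # mixing added on top of the residual stream.
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    return X + 0.5 * (X @ W)

def relay_like(X, gate=0.047):
    # Stand-in for the gated relay: mostly identity, with a small
    # gated update (gate value taken from the table above).
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    return (1.0 - gate) * X + gate * (X @ W)

def cos_orig(X, X0):
    num = np.sum(X * X0, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(X0, axis=1)
    return float(np.mean(num / den))

Xa, Xr = X0.copy(), X0.copy()
for _ in range(16):
    Xa, Xr = attn_like(Xa), relay_like(Xr)
# After 16 cycles the gated update stays close to X0,
# while the large ungated update drifts far away.
```

The point of the toy is structural: a small gate bounds the per-cycle rotation away from the original geometry, so alignment decays roughly linearly in the gate rather than collapsing multiplicatively.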

The constellation relay holds steadfast as a conduit, transferring information from A to B with minimal geometric drift, but it still requires optimization for speed.

I ran it against RoPE variants as well, with similar results.
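For context, here is a minimal sketch of the standard rotary formulation I assume rope_std in the sweep refers to (rope_ntk would be the same rotation with the base rescaled for longer contexts); it is not the exact code used in the experiment.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Standard rotary position embedding (assumed rope_std).

    Rotates each channel pair (x1_i, x2_i) of x by an angle
    position * base**(-i/half).
    """
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # (half,)
    ang = positions[:, None] * freqs[None, :]   # (n, half)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Worth noting: RoPE is a pure per-token rotation, so it preserves token norms exactly. Any cos_orig decay in the rope_std and rope_ntk rows below therefore comes from the attention mixing itself, not from the positional encoding.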

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TEST 3: Depth sweep — 16 layers, all architectures
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  vanilla:
   depth       CV     CV_n    eff_d   cos_orig
       1   0.0590   0.0613    119.2   0.988720
       2   0.0812   0.0835    118.6   0.906975
       4   0.0782   0.0772    118.0   0.258551
       8   0.0681   0.0678    118.0   0.124103
      12   0.0672   0.0670    118.0   0.084707
      16   0.0656   0.0673    118.0   0.070113

  rope_std:
   depth       CV     CV_n    eff_d   cos_orig
       1   0.0599   0.0604    119.8   0.994533
       2   0.0772   0.0784    119.6   0.937518
       4   0.0715   0.0726    119.5   0.270651
       8   0.0627   0.0633    119.5   0.124870
      12   0.0637   0.0644    119.5   0.086343
      16   0.0641   0.0635    119.5   0.072248

  rope_ntk:
   depth       CV     CV_n    eff_d   cos_orig
       1   0.0607   0.0605    119.6   0.991296
       2   0.0852   0.0831    119.3   0.909574
       4   0.0745   0.0759    119.1   0.255422
       8   0.0662   0.0664    119.1   0.123437
      12   0.0633   0.0614    119.1   0.092321
      16   0.0623   0.0634    119.1   0.076500

  relay:
   depth       CV     CV_n    eff_d   cos_orig
       1   0.0581   0.0578    119.2   0.995456
       2   0.0586   0.0577    119.2   0.994940
       4   0.0592   0.0575    119.1   0.994511
       8   0.0605   0.0578    119.1   0.994254
      12   0.0587   0.0580    119.1   0.994219
      16   0.0583   0.0591    119.1   0.994187
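Summarizing the sweep: dividing each architecture's depth-16 cos_orig by its depth-1 value (numbers copied directly from the tables above) shows the relay retains essentially all of its alignment, while every attention variant keeps under 8%.

```python
# cos_orig values copied from the depth-sweep tables above
depth1 = {"vanilla": 0.988720, "rope_std": 0.994533,
          "rope_ntk": 0.991296, "relay": 0.995456}
depth16 = {"vanilla": 0.070113, "rope_std": 0.072248,
           "rope_ntk": 0.076500, "relay": 0.994187}

# fraction of depth-1 alignment retained at depth 16
retention = {k: depth16[k] / depth1[k] for k in depth1}
# relay ~ 0.999; vanilla and both RoPE variants ~ 0.07-0.08
```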