---
license: apache-2.0
---
# Large Experiment 2: Fixing AdamW


# Hypothesis:
AdamW is killing performance in these geometric systems, causing cascade failure with weight decay.

Preliminary tests show this is not only a possibility, it is most likely what is happening overall.

## Reason:
The rounding pressure applied by weight decay is helpful for many weights and helps align rounded structures,
while simultaneously destroying optimization for rigid structures.

Until now this has proven valuable, but now that it's become a hindrance, a new formula must be created to replace the AdamW limitations.

# Experiment 1: Retune AdamW Directly
I'll attempt to tweak AdamW specifically so it doesn't destroy the geometric shape, disabling weight_decay from this point onward.

The outcomes show there isn't much to tweak on this particular classifier beyond tuning specifics. Introducing more anchors helps, and with that the dims can be reduced.
Essentially the anchors act as capacity tuning forks in this variation rather than utilities, which is fine. We can allocate them to a student.
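For context, AdamW's decoupled weight decay shrinks every weight toward the origin on every step, independent of the gradient, which is exactly the pressure suspected of eroding fixed geometric anchors. A minimal numpy sketch of that update (standard AdamW math, not this project's actual training code):

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    """One AdamW-style update on a numpy array (decoupled weight decay)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Decoupled decay: the wd * w term shrinks weights toward the origin
    # every step, even when the gradient is exactly zero.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

# A fixed "anchor" with zero gradient still drifts toward the origin:
anchor, m, v = np.ones(4), np.zeros(4), np.zeros(4)
for t in range(1, 1001):
    anchor, m, v = adamw_step(anchor, np.zeros(4), m, v, t)
print(anchor[0] < 1.0)  # True: the anchor shrank with no gradient at all
```

With `wd=0.0` the same loop leaves the anchor untouched, which is why disabling weight_decay is the first lever pulled here.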

```
=================================================================
SWEEP RESULTS
=================================================================

  Config           v_acc  t_acc    gap      cv     Ξ”cv  eq_std  poly curve  star struct
  ------------------------------------------------------------------------------------------
  raw_adam         0.617  0.681 +0.064  1.3917 +1.1917  0.4075  0.39  0.75  0.86  0.61
  proven           0.722  0.706 -0.016  1.3629 +1.1629  0.4157  0.45  0.99  0.93  0.71
  +spread          0.669  0.686 +0.017  1.4491 +1.2491  0.4212  0.41  0.98  0.71  0.72
  +entropy         0.674  0.711 +0.037  1.4945 +1.2945  0.4237  0.42  0.97  0.70  0.74
  +ortho           0.695  0.690 -0.005  1.3454 +1.1454  0.4171  0.40  0.99  0.85  0.72
  +cluster         0.701  0.717 +0.016  1.3034 +1.1034  0.4131  0.44  0.93  0.91  0.70
  +drift           0.709  0.698 -0.012  1.3480 +1.1480  0.4134  0.41  1.00  0.91  0.72
  +spr+ort         0.723  0.698 -0.025  1.3881 +1.1881  0.4224  0.46  0.97  0.94  0.71
  +all_micro       0.694  0.700 +0.007  1.5181 +1.3181  0.4077  0.40  0.97  0.85  0.72

  Best accuracy: +spr+ort (val_acc=0.723)
  Best structure: +entropy (struct=0.737)
  Closest to CV=0.2: +cluster (cv=1.3034, Ξ”=+1.1034)
  Most equidistant: raw_adam (equi_std=0.4075)
  Most stable CV: raw_adam (cv_std=0.1734)

=================================================================
DONE
=================================================================
```
The results show that we can definitely influence the outcome, and that the system's deviation will conform to an entirely new, currently unoccupied spectrum of CV.

There's a lot to unpack here, and I think the most critical piece is a hyperparameter controlling where on the latent spectrum
you want your model's continuum to exist.
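Throughout these sweeps, CV presumably denotes the coefficient of variation (std / mean) of the pairwise anchor distances, which is zero for a perfectly equidistant constellation. A minimal sketch under that reading:

```python
import numpy as np

def constellation_cv(anchors):
    """Coefficient of variation (std / mean) of all unique pairwise
    Euclidean distances between anchors. 0.0 means perfectly equidistant;
    larger values mean a more irregular constellation."""
    d = np.linalg.norm(anchors[:, None, :] - anchors[None, :, :], axis=-1)
    pair = d[np.triu_indices(len(anchors), k=1)]
    return pair.std() / pair.mean()

# Orthonormal basis vectors are mutually equidistant, so CV is 0.
print(constellation_cv(np.eye(4)))  # 0.0

# A random gaussian constellation has irregular spacing, so CV > 0.
rng = np.random.default_rng(0)
print(constellation_cv(rng.normal(size=(30, 768))) > 0)  # True
```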

# Experiment 2: Teacher/Student hierarchy
I've run plenty of genetic experiments; a single student anchored from a teacher should provide a more robust sweep.

```
=================================================================
COMPARISON
=================================================================

  Config                v_acc  t_acc    gap      cv  poly curve  star struct
  ---------------------------------------------------------------------------
  Raw Adam              0.626  0.679 +0.053  1.3669  0.30  1.00  0.77  0.65
  Teacher               0.645  0.629 -0.017  1.5645  0.38  0.91  0.72  0.71
  Student+entropy       0.698  0.677 -0.021  2.1624  0.39  0.98  0.88  0.73
  Student+same          0.672  0.681 +0.010  1.5182  0.39  1.00  0.75  0.72

  Val accuracy trajectory:
  Epoch    Raw Adam        Teacher         Student+entropy Student+same   
  E1      0.198           0.167           0.164           0.189          
  E5      0.592           0.426           0.509           0.526          
  E10     0.623           0.576           0.614           0.652          
  E15     0.633           0.647           0.670           0.618          
  E20     0.658           0.638           0.630           0.603          
  E25     0.685           0.662           0.665           0.700          
  E30     0.626           0.645           0.698           0.672          

  Teacher→Student anchor drift:
    Mean drift: 0.3036
    Max drift:  0.4506
    Min drift:  0.1311

```
Genetic inheritance works quite well in this spectrum either way; the anchors can be passed down the chain to the student.
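The anchor-drift numbers above can be reproduced in spirit with a simple per-anchor comparison between the teacher constellation and the student's inherited copy; using 1 - cosine similarity as the drift measure is an assumption, since the logs don't state the metric:

```python
import numpy as np

def anchor_drift(teacher_anchors, student_anchors):
    """Per-anchor drift between a teacher constellation and the student
    copy it was initialized from, measured as 1 - cosine similarity.
    (The exact metric behind the logged numbers is an assumption.)"""
    t = teacher_anchors / np.linalg.norm(teacher_anchors, axis=1, keepdims=True)
    s = student_anchors / np.linalg.norm(student_anchors, axis=1, keepdims=True)
    drift = 1.0 - np.sum(t * s, axis=1)
    return drift.mean(), drift.max(), drift.min()

rng = np.random.default_rng(0)
teacher = rng.normal(size=(30, 768))
student = teacher + 0.3 * rng.normal(size=(30, 768))  # inherited, then drifted
mean_d, max_d, min_d = anchor_drift(teacher, student)
print(mean_d, max_d, min_d)  # small positive drift per anchor
```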

# Experiment 3: Dual Teacher/Student Procrustes Duality
A good analysis for my particular architectures needs Procrustes whitening, centering, and alignment curation. With this the system conforms more cleanly.

This cannot exist on the teachers, but the student must see it.
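A minimal sketch of the whitening, centering, and alignment steps: plain centering plus PCA whitening, then a single pairwise orthogonal Procrustes solve. The actual pipeline runs iterative GPA across teachers; the shapes and noise level below are illustrative only.

```python
import numpy as np

def center_whiten(X, eps=1e-8):
    """Center, then PCA-whiten so each principal direction has unit variance."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T @ np.diag(np.sqrt(len(X) - 1) / (S + eps))

def procrustes_align(A, B):
    """Orthogonal Procrustes: rotate B by the R minimizing ||B @ R - A||_F."""
    U, _, Vt = np.linalg.svd(B.T @ A)
    return B @ (U @ Vt)

rng = np.random.default_rng(1)
A = center_whiten(rng.normal(size=(200, 16)))
# B is A under a random rotation plus small noise, as two teachers might be.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
B = A @ Q + 0.01 * rng.normal(size=(200, 16))
aligned = procrustes_align(A, B)
# After alignment, the two views agree almost perfectly row-by-row.
cos = np.mean(np.sum(aligned * A, axis=1) /
              (np.linalg.norm(aligned, axis=1) * np.linalg.norm(A, axis=1)))
print(cos > 0.99)  # True
```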

As expected, the student outperformed the teacher. That's the same result as the captionbert process, and we now have an
autograd representation that can produce a reusable state of this.

This demonstrates a practical use of the dual-teacher consensus distillation.

```
=================================================================
DUAL-TEACHER PROCRUSTES CONSENSUS DISTILLATION
=================================================================
  Device: cuda

  Generating data...
  Train: 15,000  Val: 3,000

=================================================================
STAGE 1A: TEACHER A β€” Raw Adam
=================================================================
  [A] E 1: t=0.073 v=0.200 cv=1.3069
  [A] E10: t=0.612 v=0.613 cv=1.4364
  [A] E20: t=0.655 v=0.590 cv=1.4770
  [A] E30: t=0.690 v=0.699 cv=1.3797

=================================================================
STAGE 1B: TEACHER B β€” Geometric (+spr+ort)
=================================================================
  [B] E 1: t=0.072 v=0.184 cv=1.4589
  [B] E10: t=0.578 v=0.606 cv=1.5603
  [B] E20: t=0.614 v=0.667 cv=1.5950
  [B] E30: t=0.658 v=0.649 cv=1.8004

=================================================================
STAGE 2: EXTRACT + PROCRUSTES ALIGN
=================================================================
  Teacher A embeddings: torch.Size([15000, 768])
  Teacher B embeddings: torch.Size([15000, 768])
  Raw cos(A, B): 0.4360
  GPA iter 1: delta=0.12673541
  GPA iter 5: delta=0.01321763
  GPA iter 10: delta=0.00224325
  cos(consensus, a): 0.8251
  cos(consensus, b): 0.8226
  Consensus CV: 0.1774
  Consensus anchors: torch.Size([30, 768])
  Teacher A anchors cos: 0.0008
  Teacher B anchors cos: -0.0160

=================================================================
STAGE 3: STUDENT β€” Consensus distillation + classification
=================================================================
  E 1: t=0.081 v=0.203 cos=0.230 cv=1.1871 rig=4.8/34.4 [polygon=0.04 curve=0.00 star=0.36 structure=0.35]
  E 5: t=0.610 v=0.618 cos=0.451 cv=0.6686 rig=12.9/98.8 [polygon=0.38 curve=0.83 star=0.67 structure=0.70]
  E10: t=0.660 v=0.659 cos=0.550 cv=0.5453 rig=15.5/99.6 [polygon=0.41 curve=0.94 star=0.71 structure=0.72]
  E15: t=0.711 v=0.702 cos=0.625 cv=0.4492 rig=18.7/97.8 [polygon=0.39 curve=0.88 star=0.93 structure=0.76]
  E20: t=0.735 v=0.703 cos=0.671 cv=0.4598 rig=18.8/96.4 [polygon=0.45 curve=1.00 star=0.84 structure=0.70]
  E25: t=0.745 v=0.736 cos=0.693 cv=0.4261 rig=18.3/92.9 [polygon=0.48 curve=1.00 star=0.92 structure=0.73]
  E30: t=0.763 v=0.761 cos=0.704 cv=0.3359 rig=17.9/90.4 [polygon=0.50 curve=0.98 star=0.97 structure=0.76]

=================================================================
FINAL COMPARISON
=================================================================

  Model            v_acc      cv  poly curve  star struct
  -------------------------------------------------------
  Teacher_A        0.699  1.4312  0.42  0.99  0.83  0.72
  Teacher_B        0.649  1.5969  0.38  0.95  0.79  0.66
  Student          0.761  0.3329  0.50  0.98  0.97  0.76

  Student anchor drift from consensus: mean=0.4458 max=0.6453

=================================================================
DONE
=================================================================
```

# Experiment 4: Genetic Hierarchy

Let's make some inbred mutants and see how they behave.


```
=================================================================
EVOLUTION SUMMARY
=================================================================

  Model        Gen  v_acc      cv  poly curve  star struct
  -------------------------------------------------------
  F0_geo         0  0.663  1.8428  0.36  1.00  0.75  0.72
  F0_raw         0  0.586  1.2137  0.48  0.29  0.91  0.64
  F1_new         1  0.690  0.0000  0.00  0.00  0.00  0.00
  G1_0           1  0.746  0.5866  0.50  0.99  0.93  0.74
  G1_1           1  0.749  0.4433  0.49  1.00  0.95  0.74
  G1_2           1  0.761  0.2822  0.49  1.00  0.99  0.75
  F2_new         2  0.699  0.0000  0.00  0.00  0.00  0.00
  G2_0           2  0.750  0.4110  0.49  0.99  0.94  0.75
  G2_1           2  0.751  0.4170  0.49  0.95  0.97  0.76
  G2_2           2  0.766  0.3079  0.54  0.98  0.96  0.75
  G2_3           2  0.764  0.3613  0.55  0.98  0.97  0.73
  FINAL          3  0.764  0.2954  0.55  1.00  0.96  0.73

  Per-generation averages:
    Gen 0: mean_acc=0.625 best=0.663 n=2
    Gen 1: mean_acc=0.737 best=0.761 n=4
    Gen 2: mean_acc=0.746 best=0.766 n=5
    Gen 3: mean_acc=0.764 best=0.764 n=1

  Consensus CV progression: G1=0.1258 β†’ G2=0.1031 β†’ G3=0.1456
```

Turns out this is a bit more selective and a bit less jittery than something like plain genetic inheritance.

Not only that, but it's actually not bad.

The polygon gain tells the real story: the inheritance is the geometric structure that didn't collapse in the anchors.

So over multiple generations, the geometric complexity improves as the losses and autograd naturally enhance the output.


---------------------------------------------------------------------------------------------------
# Large Experimental Conclusion 1:

Experiment 1's patchmaker classifier was invalidly aligned to the anchors; the final must be rerun.

The current adamw based modifier is killing geometric results.

AdamW is a limiter, not a helper.
Adam with trajectory and separation control is more reliable, but still not enough.


# Discoveries:
Common-case cross_entropy loses on every margin when using hypersphere coordinates as embeddings. It's completely defeated, by +12% or more, with just the geo losses.

This formula is missing a core component and cannot yet represent the implications necessary for full encoding cohesion.

Without the geodesic controllers applied by the more advanced novel controlling agents and losses, the system cannot differentiate useful measures on larger plane structures.

Though I knew that last one.

## Reason:
Training the anchor itself along with the BERT structure caused a large amount of drift, which decoupled many internal learned structures.

This in itself caused the BERT model's internal CV to deform, and I will need to roll back the last 2 unfrozen epochs because of it, but I have a backup, so it's fine.

The assessment shows that the rigidity was destroyed and smoothed into a similar state as the hypersphere, which meant the pressure
from the hypersphere was predominantly being applied internally within the model through the averaging mechanisms rather than
the structure fully preserving the manifold.

This wasn't catastrophic; captionbert is predominantly fine, but the damage is internally extensive and will require a rollback, costing -1mil samples on the tally total.

Externally you would never know. captionbert looks predominantly fine; the measures are even better than before. Internally, the system's collapse was extensive.

Many functional systems collapsed into more generic functional systems, destroying the preserved geometry when things "bloated" too much from the anchor drifting.
Natural attenuation seeks equilibrium, and there are 5 experts, so there will never be true equilibrium.

Thus the anchor must be nearly completely frozen while training the core weights, but not completely frozen. True Euclidean space requires some drift to compensate
for capacity differentiation and growth, but this system is unique to the emulation of superposition differentiation, and thus many of the quirks will be...

Unpredictable.
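One simple way to realize "nearly but not completely frozen" is to let anchor parameters receive only a small fraction of their gradient; in PyTorch this would typically be a per-param-group learning rate or a gradient hook. A numpy sketch with a hypothetical `anchor_gate` knob:

```python
import numpy as np

def gated_sgd_step(w, g, lr, anchor_mask, anchor_gate=0.01):
    """SGD step where parameters flagged as anchors receive only a small
    fraction of their gradient, so they drift slowly instead of freezing.
    anchor_gate is a hypothetical knob, not a value from the experiments."""
    scale = np.where(anchor_mask, anchor_gate, 1.0)
    return w - lr * scale * g

w = np.array([1.0, 1.0])
mask = np.array([True, False])  # first param is an anchor, second is free
w2 = gated_sgd_step(w, np.array([1.0, 1.0]), lr=0.1, anchor_mask=mask)
print(w2)  # anchor moved by 0.001, free weight by 0.1
```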

## Hypothesis for why:
The structural integrity must remain rigid while being prepared over a smooth surface. Some smoothing must occur to map multiplanar supported systems from multiple
adjacent rigid complex associations. However, because the multiplanar rigidity is misaligned by nature, the structure conformed to an invisible "MIDDLEGROUND"
differentiation element. This middleground average formed a pooled structure in complete defiance of the anchor and the system, because the anchor was not preserved solidly enough.

## Potential Solution:
Control the autograd to preserve the anchor as the predominant choice, potentially causing instability. This will require multiple tests.

# Experimental Hypothesis:
The Euclidean autograd for PyTorch is causing differential analysis to collapse in the final stages of the MLP, reducing the overall capacity and
destroying the attenuated geometric anchored structure in a way that is neither beneficial nor helpful toward the geometric goal.

# Experiment 1:
Gate-aware autograd interference with autonomous adaptation.

This will theoretically compensate for the autograd's tendency to overly smooth complex structures, while compensating for those complex structural gains
and its ignoring of important structural systems that exist within the established anchored CV geometric spectrum.

This will potentially preserve rigidity while allowing a multiplanar smoothing effect to occur, which is native to hypersphere-based architectures
that sample rigid positioning from rigid manifolds to map the rigidity to smooth layered surfaces.
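One plausible shape for such a gate is to split each gradient into the component along the anchor direction and the component tangent to it, then rescale the two independently. In PyTorch this would sit in a gradient hook; the decomposition below is a sketch, and the scale values are hypothetical, not the sweep's settings:

```python
import numpy as np

def gate_gradient(g, direction, radial_scale=0.1, tangential_scale=1.0):
    """Decompose a gradient into its component along an anchor direction
    (radial) and the remainder (tangential), then rescale each part.
    Damping the radial part resists the optimizer smoothing structure
    along the anchor axis; the scales are hypothetical knobs."""
    d = direction / np.linalg.norm(direction)
    radial = np.dot(g, d) * d
    tangential = g - radial
    return radial_scale * radial + tangential_scale * tangential

g = np.array([1.0, 1.0])
d = np.array([1.0, 0.0])
print(gate_gradient(g, d))  # radial x-component damped to 0.1, y kept at 1.0
```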

# Part 1:
Simple benchmark. Outcomes showed differentiation and potential utility, tested on a simple 3-shape synthetic classifier using actual geometric shapes.

```
=================================================================
COMPARISON
=================================================================

  Metric                      Baseline      Gated
  -----------------------------------------------
  Val accuracy                   0.999      0.998
  Train accuracy                 1.000      0.999
  Overfit gap                    0.001      0.001
  Val CV                        0.7888     0.8755
  Proto similarity              -0.242     -0.253
  CV tri                         0.854      0.721
  CV circle                      0.768      0.669
  CV pentagon                    1.238      1.009

  CV trajectory (std over epochs):
    Baseline: 0.1125
    Gated:    0.1127
    Baseline more stable

  Overfit gap trajectory (mean Β± std):
    Baseline: -0.008 Β± 0.041
    Gated:    -0.011 Β± 0.043

=================================================================
DONE
=================================================================
```

The differentiation is radical, but the task was trivial. The solution was found strongly enough to potentially triadically bypass the problem.

# Attempt 2: 10 shapes - mini shape MNIST
As the original geometric patchwork system showcased, the shapes themselves are in fact classifiable, and not very hard to classify.

```
=================================================================
COMPARISON
=================================================================

  Metric                      Baseline      Gated
  -----------------------------------------------
  Val accuracy                   0.991      0.989
  Train accuracy                 0.993      0.996
  Overfit gap                    0.002      0.007
  Val CV                        0.7236     0.7476
  Proto similarity              -0.081     -0.085
  CV triangle                    0.474      0.436
  CV circle                      0.644      0.592
  CV pentagon                    0.522      0.559
  CV square                      0.521      0.508
  CV hexagon                     0.778      0.623
  CV star5                       0.464      0.519
  CV star7                       0.721      0.645
  CV octagon                     0.683      0.500
  CV cross                       0.771      0.829
  CV spiral                      1.267      0.914

  CV trajectory (std over epochs):
    Baseline: 0.1144
    Gated:    0.1307
    Baseline more stable

  Overfit gap trajectory (mean Β± std):
    Baseline: -0.007 Β± 0.056
    Gated:    0.002 Β± 0.053

=================================================================
DONE
=================================================================
```
The outcome was still too trivial; the model managed to find nearly orthogonal solutions, answering 99.9% of the validation data.

The CV was meaningless due to the simplicity of the task. This would not yield the implications the result needs.

# Attempt 3: 30 shapes - captionbert anchors for embedding vectors
This is considerably more complex. It forces the model to learn the differences rather than simply bypass the losses and funnel.


# Attempt 4: experimental hypersphere coordinate embedding
The results are okay and the 30 shapes make it harder to solve, but the fundamental issue still exists. The hypersphere does not conform with or without the
autograd gate yet. The rigidity is smoothed instead of existing simultaneously.

```
  Final constellation:
    Mean cos: 0.0025
    CV:       0.3251
    Rigidity: mean=9.4 max=100.0

  Per-anchor rigidity:
    triangle       : 1.9 β–ˆ
    square         : 1.9 β–ˆ
    pentagon       : 1.8 β–ˆ
    hexagon        : 2.0 β–ˆ
    heptagon       : 2.0 β–ˆβ–ˆ
    octagon        : 2.0 β–ˆβ–ˆ
    nonagon        : 2.1 β–ˆβ–ˆ
    decagon        : 2.0 β–ˆ
    dodecagon      : 2.1 β–ˆβ–ˆ
    circle         : 2.3 β–ˆβ–ˆ
    ellipse        : 1.9 β–ˆ
    spiral         : 8.2 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
    wave           : 20.9 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
    crescent       : 100.0 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
    star3          : 2.3 β–ˆβ–ˆ
    star4          : 1.8 β–ˆ
    star5          : 2.1 β–ˆβ–ˆ
    star6          : 2.4 β–ˆβ–ˆ
    star7          : 2.7 β–ˆβ–ˆ
    star8          : 3.2 β–ˆβ–ˆβ–ˆ
    cross          : 2.6 β–ˆβ–ˆ
    diamond        : 1.9 β–ˆ
    arrow          : 1.8 β–ˆ
    heart          : 1.3 β–ˆ
    ring           : 1.3 β–ˆ
    semicircle     : 100.0 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
    trapezoid      : 1.8 β–ˆ
    parallelogram  : 1.8 β–ˆ
    rhombus        : 1.9 β–ˆ
    chevron        : 1.6 β–ˆ
```
This causes cascade bias rather than helpful behavior.

SOMEHOW the rigidity of the circle-derived shapes (crescent and semicircle) is recorded as MAXIMUM rigidity, which is likely true due to the nature of circles having such a dense complexion.

In that right, you can probably say circles are represented by a potentially indefinite number of representation points, which is why
we're measuring around one - not literally measuring with one.

IT DID IN FACT classify the shape of its own most-supported embedding. That is expected; however, I did not expect the rigidity to conform as well.

Maximally rigid, minimally curved, entirely... wrong.


# Attempt 4: new understanding - AI liar paradox interferes with baseline geometric research

The AI systems I'm working through, namely Claude, GPT, and Gemini, are all running along the same banks of dirt as we attempt to debug this.

After analyzing the code I noticed that it was literally turned into a hypersphere analyzer, instead of using representation points to project a utility onto another surface.

The circle constraints cause the model internals to grind around the possibility of actuality, which is the result of the experimentation, in favor
of some sort of internalized geometric bias established at a rudimentary level that simply does not conform to the actual results.

Simply put, they each know something wrong, and they each keep defining that incorrectness, over and over, as normality.

I'm attempting to compensate so I can get this next experiment done and then move onward to the next point.

Getting to the bottom of a broken taught theorem isn't on my list here; I need the experiment ready.

# Attempt 5: returning to baseline data and running a sweep using the refined autograd.

Since none of the AI can guess their way out of it and I'm starting to see a pattern, we're going to run a full sweep here using a simple autoregression MLP.

The resulting geometric alignment through the defined autograd will determine the direction of adjustment, and the formation of our constant intrinsic barrier for CV,
assuming such a CV barrier CAN exist. I'm starting to think the very relational nature of CV is dynamic, and may not actually be controllable without weights.
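The sep and equi toggles in this sweep plausibly correspond to a pairwise separation hinge and an equidistance penalty on the anchor constellation; the exact forms and the margin value are assumptions, sketched here:

```python
import numpy as np

def separation_loss(anchors, margin=1.0):
    """Hinge penalty on anchor pairs closer than a margin (pushes anchors
    apart). The margin value is a hypothetical choice."""
    d = np.linalg.norm(anchors[:, None, :] - anchors[None, :, :], axis=-1)
    pair = d[np.triu_indices(len(anchors), k=1)]
    return np.mean(np.maximum(0.0, margin - pair) ** 2)

def equidistance_loss(anchors):
    """Variance of pairwise distances: zero iff the constellation is
    perfectly equidistant."""
    d = np.linalg.norm(anchors[:, None, :] - anchors[None, :, :], axis=-1)
    pair = d[np.triu_indices(len(anchors), k=1)]
    return np.var(pair)

eye = np.eye(4) * 2.0  # equidistant and well separated
print(separation_loss(eye), equidistance_loss(eye))  # both 0.0
```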

```
=================================================================
SWEEP RESULTS
=================================================================

  Config           v_acc  t_acc    gap      cv     Ξ”cv  eq_std  poly curve  star struct
  ------------------------------------------------------------------------------------------
  baseline         0.691  0.714 +0.022  1.0222 +0.8222  0.4366  0.41  0.98  0.82  0.72
  tang_50          0.692  0.713 +0.021  1.1747 +0.9747  0.4328  0.41  0.99  0.83  0.72
  tang_100         0.705  0.717 +0.012  1.0296 +0.8296  0.4534  0.42  0.98  0.85  0.73
  equi_low         0.033  0.034 +0.001  0.0000 -0.2000  0.0427  0.00  0.00  0.00  0.10
  equi_med         0.033  0.033 -0.000  0.0000 -0.2000  0.0424  0.00  0.00  0.00  0.10
  equi_high        0.033  0.034 +0.001  0.0000 -0.2000  0.0422  0.00  0.00  0.00  0.10
  sep_low          0.657  0.718 +0.060  1.2410 +1.0410  0.4440  0.42  0.95  0.69  0.71
  sep_high         0.712  0.709 -0.003  1.7948 +1.5948  0.4317  0.45  0.96  0.91  0.71
  equi+sep         0.033  0.034 +0.001  0.0000 -0.2000  0.0429  0.00  0.00  0.00  0.10
  full_gentle      0.033  0.035 +0.001  0.0000 -0.2000  0.0421  0.00  0.00  0.00  0.10
  full_strong      0.033  0.034 +0.001  0.0000 -0.2000  0.0421  0.00  0.00  0.00  0.10
  max              0.033  0.035 +0.001  0.0000 -0.2000  0.0424  0.00  0.00  0.00  0.10

  Best accuracy: sep_high (val_acc=0.712)
  Best structure: tang_100 (struct=0.732)
  Closest to CV=0.2: full_strong (cv=0.0000, Ξ”=-0.2000)
  Most equidistant: full_gentle (equi_std=0.0421)
  Most stable CV: full_gentle (cv_std=0.0008)

=================================================================
DONE
=================================================================
```
As you can see, multiple toggles destroy the autograd procedure completely. I've devised a potential solution.

# Attempt 6: Updated autograd system with better controls

The last structure was using invalid and outdated losses, as well as incorrect spectral control of the gradients.

```
=================================================================
SWEEP RESULTS
=================================================================

  Config           v_acc  t_acc    gap      cv     Ξ”cv  eq_std  poly curve  star struct
  ------------------------------------------------------------------------------------------
  baseline         0.719  0.710 -0.009  1.5066 +1.3066  0.4413  0.53  0.98  0.92  0.64
  cv_only_01       0.548  0.480 -0.068  0.2297 +0.0297  0.6814  0.12  0.93  0.76  0.61
  cv_only_05       0.478  0.472 -0.006  0.1963 -0.0037  0.6698  0.13  0.94  0.50  0.55
  cv_only_10       0.428  0.401 -0.027  0.2638 +0.0638  0.6322  0.11  0.94  0.44  0.45
  tang_50          0.711  0.706 -0.004  1.5108 +1.3108  0.4530  0.52  0.96  0.91  0.64
  tang_100         0.720  0.690 -0.030  1.5335 +1.3335  0.4431  0.52  0.98  0.92  0.65
  tang+cv          0.572  0.542 -0.030  0.4158 +0.2158  0.6602  0.22  0.84  0.78  0.63
  sep_low          0.709  0.723 +0.014  1.6462 +1.4462  0.4423  0.53  0.93  0.90  0.65
  sep_high         0.730  0.716 -0.014  1.6925 +1.4925  0.4530  0.56  0.96  0.94  0.65
  tang+cv+sep      0.552  0.507 -0.045  0.5011 +0.3011  0.5835  0.15  0.95  0.77  0.58
  full_med         0.575  0.540 -0.035  0.3207 +0.1207  0.7342  0.19  0.96  0.79  0.60
  full_strong      0.476  0.410 -0.066  0.2337 +0.0337  0.6569  0.16  0.91  0.50  0.53

  Best accuracy: sep_high (val_acc=0.730)
  Best structure: tang_100 (struct=0.649)
  Closest to CV=0.2: cv_only_05 (cv=0.1963, Ξ”=-0.0037)
  Most equidistant: baseline (equi_std=0.4413)
  Most stable CV: cv_only_01 (cv_std=0.0700)

=================================================================
DONE
=================================================================
```
The run is cleaner, but the geometrics are all over the board. Getting closer.

# Attempt 7: Tighter constraints and more specific backward control.

With the hand on the CV constraint as a pulse control, the system must be much more lenient than the loss.

The loss was an echo; this is a shockwave controller. It's akin to applying frequency band control: too much CV pressure is akin to destroying the actual model's growth.

We don't want to ELIMINATE the CV; we want to curate it. We don't want to trim incorrect branches; we want the system to retain those incorrect branches that are most useful.

With that information, I formatted a more subtle, reduced-power CV sweep using the best tangential and separation settings from the last run.
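A reduced-power CV term of this kind can be written as a small quadratic pull toward a target CV rather than a hard constraint; the 0.2 target matches the sweep reports, while the quadratic form and weight values are assumptions:

```python
import numpy as np

def cv_penalty(anchors, target_cv=0.2, weight=0.001):
    """Soft quadratic pull of the constellation CV toward a target value,
    instead of a hard constraint that flattens the constellation.
    target_cv=0.2 matches the sweep's 'Closest to CV=0.2' line; the
    quadratic form and weight are hypothetical."""
    d = np.linalg.norm(anchors[:, None, :] - anchors[None, :, :], axis=-1)
    pair = d[np.triu_indices(len(anchors), k=1)]
    cv = pair.std() / pair.mean()
    return weight * (cv - target_cv) ** 2

rng = np.random.default_rng(0)
anchors = rng.normal(size=(30, 64))
# Small positive penalty that grows as CV strays from the 0.2 target:
print(cv_penalty(anchors) > 0)  # True
```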

```
=================================================================
SWEEP RESULTS
=================================================================

  Config           v_acc  t_acc    gap      cv     Ξ”cv  eq_std  poly curve  star struct
  ------------------------------------------------------------------------------------------
  no_cv            0.721  0.730 +0.009  1.7346 +1.5346  0.4400  0.54  0.99  0.89  0.65
  cv_0.001         0.712  0.706 -0.005  1.5553 +1.3553  0.4291  0.40  0.99  0.91  0.74
  cv_0.005         0.697  0.690 -0.006  1.5407 +1.3407  0.4622  0.38  0.97  0.88  0.73
  cv_0.01          0.640  0.649 +0.010  0.3353 +0.1353  0.6070  0.29  0.97  0.85  0.67
  cv_0.03          0.648  0.632 -0.016  0.2985 +0.0985  0.5909  0.30  0.98  0.85  0.67
  cv_0.06          0.586  0.568 -0.018  0.2331 +0.0331  0.5957  0.24  0.96  0.76  0.61

  Best accuracy: no_cv (val_acc=0.721)
  Best structure: cv_0.001 (struct=0.735)
  Closest to CV=0.2: cv_0.06 (cv=0.2331, Ξ”=+0.0331)
  Most equidistant: cv_0.001 (equi_std=0.4291)
  Most stable CV: cv_0.06 (cv_std=0.1087)

```
Less... is more. Next I'll be running the same cv with a 0.01 tangent and an increased sep.

I've got a good notion that this could work.