| --- |
| license: apache-2.0 |
| --- |
| # Large Experiment 2: Fixing AdamW |
|
|
|
|
| # Hypothesis: |
| AdamW is killing performance in these geometric systems; its weight decay is causing a cascade failure. |
|
|
| Preliminary tests show this is not only possible, but most likely exactly what is happening. |
|
|
| ## Reason: |
| The rounding pressure applied by weight decay is helpful for many weights and helps align rounded structures, |
| while simultaneously destroying optimization for rigid structures. |
|
|
| Until now this has proven valuable, but now that it's become a hindrance, a new formula must be created to replace AdamW's limitations. |
|
|
| # Experiment 1: Retune AdamW Directly |
| I'll attempt to tweak AdamW specifically to not destroy the geometric shape, disabling weight_decay from this point onward. |
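The decay split can be done with AdamW param groups. A minimal sketch of the idea (the `backbone`/`anchors` names are toy placeholders, not the real classifier):

```python
import torch

# Toy stand-ins for the real model: a backbone layer plus geometric anchors.
backbone = torch.nn.Linear(16, 16)
anchors = torch.nn.Parameter(torch.randn(30, 16))

# Keep decoupled weight decay on the backbone but zero it on the anchors,
# so the decay step can't shrink the anchor constellation toward the origin.
opt = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "weight_decay": 0.01},
        {"params": [anchors], "weight_decay": 0.0},
    ],
    lr=1e-3,
)

loss = backbone(anchors).pow(2).mean()  # stand-in loss
loss.backward()
opt.step()
```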
| |
| The outcomes have shown there isn't much beyond tuning specifics to tweak this particular classifier. Introducing more anchors helps, and with that the dims can be reduced. |
| Essentially the anchors are capacity tuning forks in this variation rather than utilities, which is fine. We can allocate them to a student. |
| |
| ``` |
| ================================================================= |
| SWEEP RESULTS |
| ================================================================= |
| |
| Config v_acc t_acc gap cv Δcv eq_std poly curve star struct |
| ------------------------------------------------------------------------------------------ |
| raw_adam 0.617 0.681 +0.064 1.3917 +1.1917 0.4075 0.39 0.75 0.86 0.61 |
| proven 0.722 0.706 -0.016 1.3629 +1.1629 0.4157 0.45 0.99 0.93 0.71 |
| +spread 0.669 0.686 +0.017 1.4491 +1.2491 0.4212 0.41 0.98 0.71 0.72 |
| +entropy 0.674 0.711 +0.037 1.4945 +1.2945 0.4237 0.42 0.97 0.70 0.74 |
| +ortho 0.695 0.690 -0.005 1.3454 +1.1454 0.4171 0.40 0.99 0.85 0.72 |
| +cluster 0.701 0.717 +0.016 1.3034 +1.1034 0.4131 0.44 0.93 0.91 0.70 |
| +drift 0.709 0.698 -0.012 1.3480 +1.1480 0.4134 0.41 1.00 0.91 0.72 |
| +spr+ort 0.723 0.698 -0.025 1.3881 +1.1881 0.4224 0.46 0.97 0.94 0.71 |
| +all_micro 0.694 0.700 +0.007 1.5181 +1.3181 0.4077 0.40 0.97 0.85 0.72 |
|
|
| Best accuracy: +spr+ort (val_acc=0.723) |
| Best structure: +entropy (struct=0.737) |
| Closest to CV=0.2: +cluster (cv=1.3034, Δ=+1.1034) |
| Most equidistant: raw_adam (equi_std=0.4075) |
| Most stable CV: raw_adam (cv_std=0.1734) |
| |
| ================================================================= |
| DONE |
| ================================================================= |
| ``` |
| The outcomes show that we can definitely steer the system, and its deviation will conform to an entirely new, currently unoccupied spectrum of CV. |
| |
| There's a lot to unpack here, and I think the biggest, most critical piece is a hyperparameter controlling where on the latent spectrum |
| you want your model's continuum to exist. |
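Assuming CV here is the coefficient of variation (std/mean) of embedding radii about the centroid, that hyperparameter could be a target-CV penalty; a sketch, with `cv_target` as the hypothetical knob:

```python
import torch

def cv_loss(emb: torch.Tensor, cv_target: float = 0.2) -> torch.Tensor:
    """Penalty steering the batch CV (std/mean of radii about the centroid)
    toward a chosen point on the latent spectrum."""
    radii = (emb - emb.mean(dim=0)).norm(dim=1)
    cv = radii.std() / (radii.mean() + 1e-8)
    return (cv - cv_target) ** 2

emb = torch.randn(128, 64, requires_grad=True)
loss = cv_loss(emb, cv_target=0.2)
loss.backward()  # gradients nudge the spread toward CV = 0.2
```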
| |
| # Experiment 2: Teacher/Student hierarchy |
| I've run plenty of genetic experiments; a single student anchored from a teacher should provide a more robust sweep. |
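A minimal sketch of what "anchored from a teacher" means here, under my assumptions (cross-entropy plus an MSE pull of the student's anchors toward the detached teacher anchors; `lam` is a hypothetical weight):

```python
import torch
import torch.nn.functional as F

def student_loss(head, student_anchors, teacher_anchors, x, y, lam=0.1):
    """Cross-entropy plus an inheritance term that anchors the student's
    constellation to the (frozen) teacher's."""
    ce = F.cross_entropy(head(x), y)
    inherit = F.mse_loss(student_anchors, teacher_anchors.detach())
    return ce + lam * inherit

teacher_anchors = torch.randn(30, 64)
student_anchors = torch.nn.Parameter(teacher_anchors + 0.05 * torch.randn(30, 64))
head = torch.nn.Linear(64, 30)
x, y = torch.randn(8, 64), torch.randint(0, 30, (8,))
loss = student_loss(head, student_anchors, teacher_anchors, x, y)
loss.backward()
```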
| |
| ``` |
| ================================================================= |
| COMPARISON |
| ================================================================= |
| |
| Config v_acc t_acc gap cv poly curve star struct |
| --------------------------------------------------------------------------- |
| Raw Adam 0.626 0.679 +0.053 1.3669 0.30 1.00 0.77 0.65 |
| Teacher 0.645 0.629 -0.017 1.5645 0.38 0.91 0.72 0.71 |
| Student+entropy 0.698 0.677 -0.021 2.1624 0.39 0.98 0.88 0.73 |
| Student+same 0.672 0.681 +0.010 1.5182 0.39 1.00 0.75 0.72 |
| |
| Val accuracy trajectory: |
| Epoch Raw Adam Teacher Student+entropy Student+same |
| E1 0.198 0.167 0.164 0.189 |
| E5 0.592 0.426 0.509 0.526 |
| E10 0.623 0.576 0.614 0.652 |
| E15 0.633 0.647 0.670 0.618 |
| E20 0.658 0.638 0.630 0.603 |
| E25 0.685 0.662 0.665 0.700 |
| E30 0.626 0.645 0.698 0.672 |
| |
| Teacher→Student anchor drift: |
| Mean drift: 0.3036 |
| Max drift: 0.4506 |
| Min drift: 0.1311 |
| |
| ``` |
| Genetic inheritance works quite well in this spectrum either way; the anchors can be passed down the chain to the student. |
| |
| # Experiment 3: Dual Teacher/Student Procrustes Duality |
| A good analysis for my particular architectures needs Procrustes whitening, centering, and alignment curation. With this the system conforms more cleanly. |
| |
| This cannot exist on the teachers, but the student must see it. |
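The whitening/centering/alignment step the student sees can be sketched with classic orthogonal Procrustes; this is my reconstruction of the idea, not the exact curation script:

```python
import torch

def center_whiten(X: torch.Tensor) -> torch.Tensor:
    """Center the embeddings and scale to unit Frobenius norm."""
    X = X - X.mean(dim=0)
    return X / (X.norm() + 1e-8)

def procrustes_rotation(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Orthogonal R minimizing ||A @ R - B||_F, via SVD of A.T @ B."""
    U, _, Vt = torch.linalg.svd(A.T @ B)
    return U @ Vt

A = center_whiten(torch.randn(100, 8))
B = center_whiten(torch.randn(100, 8))
R = procrustes_rotation(A, B)
err_before = (A - B).norm().item()
err_after = (A @ R - B).norm().item()  # never worse than the unrotated fit
```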
| |
| As expected, the student outperformed the teacher. That's the same result as the captionbert process, and we now have an |
| autograd representation that can produce a reusable state of this. |
| |
| This demonstrates a practical use of the dual-teacher setup. |
| |
| ``` |
| ================================================================= |
| DUAL-TEACHER PROCRUSTES CONSENSUS DISTILLATION |
| ================================================================= |
| Device: cuda |
| |
| Generating data... |
| Train: 15,000 Val: 3,000 |
| |
| ================================================================= |
| STAGE 1A: TEACHER A — Raw Adam |
| ================================================================= |
| [A] E 1: t=0.073 v=0.200 cv=1.3069 |
| [A] E10: t=0.612 v=0.613 cv=1.4364 |
| [A] E20: t=0.655 v=0.590 cv=1.4770 |
| [A] E30: t=0.690 v=0.699 cv=1.3797 |
| |
| ================================================================= |
| STAGE 1B: TEACHER B — Geometric (+spr+ort) |
| ================================================================= |
| [B] E 1: t=0.072 v=0.184 cv=1.4589 |
| [B] E10: t=0.578 v=0.606 cv=1.5603 |
| [B] E20: t=0.614 v=0.667 cv=1.5950 |
| [B] E30: t=0.658 v=0.649 cv=1.8004 |
| |
| ================================================================= |
| STAGE 2: EXTRACT + PROCRUSTES ALIGN |
| ================================================================= |
| Teacher A embeddings: torch.Size([15000, 768]) |
| Teacher B embeddings: torch.Size([15000, 768]) |
| Raw cos(A, B): 0.4360 |
| GPA iter 1: delta=0.12673541 |
| GPA iter 5: delta=0.01321763 |
| GPA iter 10: delta=0.00224325 |
| cos(consensus, a): 0.8251 |
| cos(consensus, b): 0.8226 |
| Consensus CV: 0.1774 |
| Consensus anchors: torch.Size([30, 768]) |
| Teacher A anchors cos: 0.0008 |
| Teacher B anchors cos: -0.0160 |
| |
| ================================================================= |
| STAGE 3: STUDENT — Consensus distillation + classification |
| ================================================================= |
| E 1: t=0.081 v=0.203 cos=0.230 cv=1.1871 rig=4.8/34.4 [polygon=0.04 curve=0.00 star=0.36 structure=0.35] |
| E 5: t=0.610 v=0.618 cos=0.451 cv=0.6686 rig=12.9/98.8 [polygon=0.38 curve=0.83 star=0.67 structure=0.70] |
| E10: t=0.660 v=0.659 cos=0.550 cv=0.5453 rig=15.5/99.6 [polygon=0.41 curve=0.94 star=0.71 structure=0.72] |
| E15: t=0.711 v=0.702 cos=0.625 cv=0.4492 rig=18.7/97.8 [polygon=0.39 curve=0.88 star=0.93 structure=0.76] |
| E20: t=0.735 v=0.703 cos=0.671 cv=0.4598 rig=18.8/96.4 [polygon=0.45 curve=1.00 star=0.84 structure=0.70] |
| E25: t=0.745 v=0.736 cos=0.693 cv=0.4261 rig=18.3/92.9 [polygon=0.48 curve=1.00 star=0.92 structure=0.73] |
| E30: t=0.763 v=0.761 cos=0.704 cv=0.3359 rig=17.9/90.4 [polygon=0.50 curve=0.98 star=0.97 structure=0.76] |
| |
| ================================================================= |
| FINAL COMPARISON |
| ================================================================= |
| |
| Model v_acc cv poly curve star struct |
| ------------------------------------------------------- |
| Teacher_A 0.699 1.4312 0.42 0.99 0.83 0.72 |
| Teacher_B 0.649 1.5969 0.38 0.95 0.79 0.66 |
| Student 0.761 0.3329 0.50 0.98 0.97 0.76 |
|
|
| Student anchor drift from consensus: mean=0.4458 max=0.6453 |
|
|
| ================================================================= |
| DONE |
| ================================================================= |
| ``` |
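The STAGE 2 GPA loop in the log (rotate each teacher's embeddings onto the running consensus, re-average, watch the delta shrink) can be sketched as:

```python
import torch

def gpa_consensus(views, iters=10):
    """Generalized Procrustes analysis: rotate each view onto the current
    consensus (mean of aligned views), re-average, and report the last delta."""
    def rotation(A, B):  # orthogonal Procrustes fit of A onto B
        U, _, Vt = torch.linalg.svd(A.T @ B)
        return U @ Vt
    aligned = [v - v.mean(dim=0) for v in views]
    consensus = torch.stack(aligned).mean(dim=0)
    delta = float("inf")
    for _ in range(iters):
        aligned = [a @ rotation(a, consensus) for a in aligned]
        new_consensus = torch.stack(aligned).mean(dim=0)
        delta = (new_consensus - consensus).norm().item()
        consensus = new_consensus
    return consensus, delta

teacher_a = torch.randn(200, 16)
teacher_b = torch.randn(200, 16)
consensus, delta = gpa_consensus([teacher_a, teacher_b])
```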
| |
| # Experiment 4: Genetic Hierarchy |
| |
| Let's make some inbred mutants and see how they behave. |
| |
| |
| ``` |
| ================================================================= |
| EVOLUTION SUMMARY |
| ================================================================= |
|
|
| Model Gen v_acc cv poly curve star struct |
| ------------------------------------------------------- |
| F0_geo 0 0.663 1.8428 0.36 1.00 0.75 0.72 |
| F0_raw 0 0.586 1.2137 0.48 0.29 0.91 0.64 |
| F1_new 1 0.690 0.0000 0.00 0.00 0.00 0.00 |
| G1_0 1 0.746 0.5866 0.50 0.99 0.93 0.74 |
| G1_1 1 0.749 0.4433 0.49 1.00 0.95 0.74 |
| G1_2 1 0.761 0.2822 0.49 1.00 0.99 0.75 |
| F2_new 2 0.699 0.0000 0.00 0.00 0.00 0.00 |
| G2_0 2 0.750 0.4110 0.49 0.99 0.94 0.75 |
| G2_1 2 0.751 0.4170 0.49 0.95 0.97 0.76 |
| G2_2 2 0.766 0.3079 0.54 0.98 0.96 0.75 |
| G2_3 2 0.764 0.3613 0.55 0.98 0.97 0.73 |
| FINAL 3 0.764 0.2954 0.55 1.00 0.96 0.73 |
|
|
| Per-generation averages: |
| Gen 0: mean_acc=0.625 best=0.663 n=2 |
| Gen 1: mean_acc=0.737 best=0.761 n=4 |
| Gen 2: mean_acc=0.746 best=0.766 n=5 |
| Gen 3: mean_acc=0.764 best=0.764 n=1 |
| |
| Consensus CV progression: G1=0.1258 → G2=0.1031 → G3=0.1456 |
| ``` |
| |
| Turns out this is a bit more selective and a bit less jittery than something like plain genetic inheritance. |
| |
| Not only that, but it's actually not bad. |
| |
| The polygon gain tells the real story: the inheritance is the geometric structure that didn't collapse in the anchors. |
| |
| So over multiple generations, the geometric complexity improves, with the losses and autograd naturally enhancing the output. |
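That generational flow, with training and selection stubbed out, reduces to a loop like this (all names and the mutation scale are toy assumptions, not the real pipeline):

```python
import torch

def evolve(n_gens=3, pop=3, n_anchors=30, dim=16, mutation=0.05):
    """Toy generational loop: each generation's students inherit the previous
    consensus anchors plus a small mutation; 'training' is stubbed out and the
    surviving consensus is just the population mean."""
    consensus = torch.randn(n_anchors, dim)
    history = []
    for _ in range(n_gens):
        population = [consensus + mutation * torch.randn(n_anchors, dim)
                      for _ in range(pop)]
        consensus = torch.stack(population).mean(dim=0)
        history.append(consensus.clone())
    return history

history = evolve()
```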
| |
| |
| --------------------------------------------------------------------------------------------------- |
| # Large Experimental Conclusion 1: |
| |
| Experiment 1's patchmaker classifier was invalidly aligned to the anchors; the final must be rerun. |
| |
| The current adamw based modifier is killing geometric results. |
| |
| AdamW is a limiter, not a helper. |
| Adam with trajectory and separation control is more reliable, but still not enough. |
| |
| |
| # Discoveries: |
| Common-case cross_entropy loses on every margin when using hypersphere coordinates as embeddings. It is completely defeated, by +12% or more, with just the geo losses. |
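A sketch of what cross_entropy plus geo losses can look like on hypersphere coordinates; the separation and tangential terms and their weights are my assumptions, not the exact losses used here:

```python
import torch
import torch.nn.functional as F

def geo_objective(emb, logits, y, anchors, w_sep=0.1, w_tan=0.05):
    """Cross-entropy plus two geometric terms: anchor separation (push unit
    anchors toward mutual orthogonality) and a tangential pull of each
    embedding toward its class anchor on the hypersphere."""
    ce = F.cross_entropy(logits, y)
    a = F.normalize(anchors, dim=1)
    sep = (a @ a.T - torch.eye(len(a))).pow(2).mean()
    e = F.normalize(emb, dim=1)
    tan = (1.0 - (e * a[y]).sum(dim=1)).mean()
    return ce + w_sep * sep + w_tan * tan

emb = torch.randn(8, 64, requires_grad=True)
anchors = torch.nn.Parameter(torch.randn(30, 64))
logits = emb @ anchors.T
y = torch.randint(0, 30, (8,))
loss = geo_objective(emb, logits, y, anchors)
loss.backward()
```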
| |
| This formula is missing a core component and cannot yet represent the necessary implications for full encoding cohesion. |
| |
| Without the geodesic controllers applied by the more advanced novel controlling agents and losses, the system cannot differentiate useful measures on larger plane structures. |
| |
| Though I knew that last one. |
| |
| ## Reason: |
| Training the anchor itself with the bert structure caused a large state of drift, which decoupled many internal learned structures. |
| |
| This in itself caused the Bert model's internal CV to deform, and I will need to roll back the last 2 unfrozen epochs because of it, but I have a backup so it's fine. |
| |
| The assessment shows that the rigidity was destroyed and smoothed into a similar state as the hypersphere, which meant the pressure |
| from the hypersphere was predominantly being applied internally within the model through the averaging mechanisms rather than |
| the structure fully preserving the manifold. |
| |
| This wasn't catastrophic, captionbert is predominantly fine, but the damage is internally extensive and will require a rollback causing -1mil samples on the tally total. |
| |
| Externally you would never know. captionbert looks predominantly fine; the measures are even better than before. Internally, the system's collapse was extensive. |
| |
| Many functional systems collapsed into more generic functional systems, destroying the preserved geometry when things "bloated" too much from the anchor drifting. |
| Natural attenuation will desire equilibrium, and there are 5 experts, so there will never be true equilibrium. |
| |
| Thus the anchor must be nearly completely frozen while training the core weights, but not completely frozen. True euclidean space requires some drift to compensate |
| for capacity differentiation and growth, but this system is unique to the emulation of superposition differentiation, and thus many of the quirks will be... |
| |
| Unpredictable. |
| |
| ## Hypothesis for why: |
| The structural integrity must remain rigid while being prepared over a smooth surface. Some smoothing must occur to map multiplanar supported systems from multiple |
| adjacent rigid complex associations. However, due to the nature of the multiplanar rigidity being misaligned by nature, the structure conformed to an invisible "MIDDLEGROUND" |
| differentiation element. This middleground average formed a pooled structure in complete defiance of the anchor and the system, due to the anchor not being solidly enough preserved. |
| |
| ## Potential Solution: |
| Control the autograd to preserve the anchor as the predominant choice, potentially causing instability. This will require multiple tests. |
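One concrete way to control the autograd this way is a backward hook that scales the anchor gradient down to near zero, leaving a small drift budget; the scale factor is a hypothetical knob:

```python
import torch

anchors = torch.nn.Parameter(torch.randn(30, 64))

# Nearly-but-not-completely frozen: shrink anchor gradients by 100x so the
# core weights do the learning while the anchors keep a tiny drift budget.
ANCHOR_GRAD_SCALE = 0.01
anchors.register_hook(lambda g: g * ANCHOR_GRAD_SCALE)

loss = anchors.pow(2).sum()  # stand-in loss; d(loss)/d(anchors) = 2 * anchors
loss.backward()
```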
| |
| # Experimental Hypothesis; |
| The euclidean autograd for pytorch is causing differential analysis to collapse in the final stages of the MLP, reducing the overall capacity and |
| destroying the attenuated, geometrically anchored structure in a way that is neither beneficial nor helpful towards the geometric goal. |
| |
| # Experiment 1; |
| gate-aware autograd interference autonomous adaptation |
| |
| This will theoretically compensate for the autograd's tendency to over-smooth complex structures, while compensating for those complex structural gains |
| without ignoring important structural systems that exist within the established anchored CV geometric spectrum. |
| |
| This will potentially preserve rigidity while allowing a multiplanar smoothing effect to occur, which is native to hypersphere-based architectures |
| that sample rigid positioning from rigid manifolds to map the rigidity to smooth layered surfaces. |
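A sketch of the gate idea: attenuate the gradient component lying in the anchored subspace so autograd's smoothing mostly acts on the complement. The QR projection gate and the `keep` fraction are my assumptions:

```python
import torch

def make_gate_hook(anchor_basis: torch.Tensor, keep: float = 0.1):
    """Backward hook that keeps only `keep` of the gradient component lying
    in span(anchor_basis rows), shielding the rigid structure from smoothing."""
    Q, _ = torch.linalg.qr(anchor_basis.T)  # orthonormal basis, shape (dim, k)
    def hook(grad):
        in_span = grad @ Q @ Q.T            # gradient component in the anchored subspace
        return grad - (1.0 - keep) * in_span
    return hook

emb = torch.randn(8, 64, requires_grad=True)
anchor_basis = torch.randn(4, 64)           # 4 anchored directions (toy)
emb.register_hook(make_gate_hook(anchor_basis, keep=0.1))

loss = emb.pow(2).sum()
loss.backward()                             # emb.grad arrives pre-gated
```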
| |
| # Part 1; |
| Simple benchmark. Outcomes showed differentiation and potential utility, tested on a simple 3-class synthetic shape classifier using actual geometric shapes. |
| |
| ``` |
| ================================================================= |
| COMPARISON |
| ================================================================= |
|
|
| Metric Baseline Gated |
| ----------------------------------------------- |
| Val accuracy 0.999 0.998 |
| Train accuracy 1.000 0.999 |
| Overfit gap 0.001 0.001 |
| Val CV 0.7888 0.8755 |
| Proto similarity -0.242 -0.253 |
| CV tri 0.854 0.721 |
| CV circle 0.768 0.669 |
| CV pentagon 1.238 1.009 |
|
|
| CV trajectory (std over epochs): |
| Baseline: 0.1125 |
| Gated: 0.1127 |
| Baseline more stable |
| |
| Overfit gap trajectory (mean ± std): |
| Baseline: -0.008 ± 0.041 |
| Gated: -0.011 ± 0.043 |
| |
| ================================================================= |
| DONE |
| ================================================================= |
| ``` |
| |
| The differentiation is radical, but the task was trivial. The solution was found strongly enough to potentially triadically bypass the problem. |
| |
| # Attempt 2; 10 shapes < mini shape mnist |
| As the original geometric patchwork system would showcase, the shapes themselves are in fact classifiable and not very hard to classify. |
| |
| ``` |
| ================================================================= |
| COMPARISON |
| ================================================================= |
|
|
| Metric Baseline Gated |
| ----------------------------------------------- |
| Val accuracy 0.991 0.989 |
| Train accuracy 0.993 0.996 |
| Overfit gap 0.002 0.007 |
| Val CV 0.7236 0.7476 |
| Proto similarity -0.081 -0.085 |
| CV triangle 0.474 0.436 |
| CV circle 0.644 0.592 |
| CV pentagon 0.522 0.559 |
| CV square 0.521 0.508 |
| CV hexagon 0.778 0.623 |
| CV star5 0.464 0.519 |
| CV star7 0.721 0.645 |
| CV octagon 0.683 0.500 |
| CV cross 0.771 0.829 |
| CV spiral 1.267 0.914 |
|
|
| CV trajectory (std over epochs): |
| Baseline: 0.1144 |
| Gated: 0.1307 |
| Baseline more stable |
| |
| Overfit gap trajectory (mean ± std): |
| Baseline: -0.007 ± 0.056 |
| Gated: 0.002 ± 0.053 |
| |
| ================================================================= |
| DONE |
| ================================================================= |
| ``` |
| The outcome was still too trivial; the model managed to find nearly orthogonal solutions with 99.9% correct answers on the validation data. |
| |
| The CV was meaningless due to the simplicity of the task. This would not yield the implications the result needs. |
| |
| # Attempt 3; 30 shapes - captionbert anchors for embedding vectors |
| This is considerably more complex. It forces the model to learn the differences rather than simply bypass the losses and funnel. |
| |
| |
| # Attempt 4; experimental hypersphere coordinate embedding |
| The results are okay and the 30 shapes make it harder to solve, but the fundamental issue still exists. The hypersphere does not conform with or without the |
| autograd gate yet. The rigidity is smoothed instead of existing simultaneously. |
| |
| ``` |
| Final constellation: |
| Mean cos: 0.0025 |
| CV: 0.3251 |
| Rigidity: mean=9.4 max=100.0 |
| |
| Per-anchor rigidity: |
| triangle : 1.9 █ |
| square : 1.9 █ |
| pentagon : 1.8 █ |
| hexagon : 2.0 █ |
| heptagon : 2.0 ██ |
| octagon : 2.0 ██ |
| nonagon : 2.1 ██ |
| decagon : 2.0 █ |
| dodecagon : 2.1 ██ |
| circle : 2.3 ██ |
| ellipse : 1.9 █ |
| spiral : 8.2 ████████ |
| wave : 20.9 ████████████████████ |
| crescent : 100.0 ████████████████████████████████████████████████████████████████████████████████████████████████████ |
| star3 : 2.3 ██ |
| star4 : 1.8 █ |
| star5 : 2.1 ██ |
| star6 : 2.4 ██ |
| star7 : 2.7 ██ |
| star8 : 3.2 ███ |
| cross : 2.6 ██ |
| diamond : 1.9 █ |
| arrow : 1.8 █ |
| heart : 1.3 █ |
| ring : 1.3 █ |
| semicircle : 100.0 ████████████████████████████████████████████████████████████████████████████████████████████████████ |
| trapezoid : 1.8 █ |
| parallelogram : 1.8 █ |
| rhombus : 1.9 █ |
| chevron : 1.6 █ |
| ``` |
| This causes cascade bias rather than helpful behavior. |
| |
| SOMEHOW the rigidity of the semicircle and crescent is recorded as MAXIMUM rigidity, which is likely true due to the nature of circular arcs having such a dense complexion. |
|
|
| In that light you can probably say yes, circles are represented by a potentially indefinite number of representation points, which is why |
| we're measuring around one, not supposed to be literally measuring with one. |
|
|
| IT DID IN FACT classify the shape of its own most supported embedding. Which is expected; however, I did not expect the rigidity to conform as well. |
|
|
| Maximally rigid, minimally curved, entirely... wrong. |
|
|
|
|
| # Attempt 4; new understanding - AI liar paradox interferes with baseline geometric research |
|
|
| The AI systems I'm working through namely Claude, GPT, and Gemini are all running along the same banks of dirt as we attempt to debug this. |
|
|
| After analyzing the code I noticed that it was literally turned into a hypersphere analyzer, instead of using representation points to project a utility onto another surface. |
|
|
| The circle constraints cause the model internals to grind around the possibility of actuality, which is the result of the experimentation, in favor |
| of some sort of internalized geometric bias established at a rudimentary level that simply does not conform to the actual results. |
|
|
| Simply put, they each know something wrong, and they are each redefining that incorrectness over and over as normality. |
|
|
| I'm attempting to compensate so I can get this next experiment done and then move onward to the next point. |
|
|
| Getting to the bottom of a broken taught theorem isn't on my list here; I need the experiment ready. |
|
|
| # Attempt 5; returning to baseline data and running a sweep using the refined autograd. |
|
|
| Since none of the AIs can guess their way out of it and I'm starting to see a pattern, we're going to run a full sweep here using a simple autoregression MLP. |
|
|
| The resulting geometric alignment through the defined autograd will determine the direction of adjustment, and the formation of our constant intrinsic CV barrier, |
| assuming such a CV barrier CAN exist. I'm starting to think the relational nature of CV is a dynamic one and may not actually be controllable without weights. |
|
|
| ``` |
| ================================================================= |
| SWEEP RESULTS |
| ================================================================= |
|
|
| Config v_acc t_acc gap cv Δcv eq_std poly curve star struct |
| ------------------------------------------------------------------------------------------ |
| baseline 0.691 0.714 +0.022 1.0222 +0.8222 0.4366 0.41 0.98 0.82 0.72 |
| tang_50 0.692 0.713 +0.021 1.1747 +0.9747 0.4328 0.41 0.99 0.83 0.72 |
| tang_100 0.705 0.717 +0.012 1.0296 +0.8296 0.4534 0.42 0.98 0.85 0.73 |
| equi_low 0.033 0.034 +0.001 0.0000 -0.2000 0.0427 0.00 0.00 0.00 0.10 |
| equi_med 0.033 0.033 -0.000 0.0000 -0.2000 0.0424 0.00 0.00 0.00 0.10 |
| equi_high 0.033 0.034 +0.001 0.0000 -0.2000 0.0422 0.00 0.00 0.00 0.10 |
| sep_low 0.657 0.718 +0.060 1.2410 +1.0410 0.4440 0.42 0.95 0.69 0.71 |
| sep_high 0.712 0.709 -0.003 1.7948 +1.5948 0.4317 0.45 0.96 0.91 0.71 |
| equi+sep 0.033 0.034 +0.001 0.0000 -0.2000 0.0429 0.00 0.00 0.00 0.10 |
| full_gentle 0.033 0.035 +0.001 0.0000 -0.2000 0.0421 0.00 0.00 0.00 0.10 |
| full_strong 0.033 0.034 +0.001 0.0000 -0.2000 0.0421 0.00 0.00 0.00 0.10 |
| max 0.033 0.035 +0.001 0.0000 -0.2000 0.0424 0.00 0.00 0.00 0.10 |
|
|
| Best accuracy: sep_high (val_acc=0.712) |
| Best structure: tang_100 (struct=0.732) |
| Closest to CV=0.2: full_strong (cv=0.0000, Δ=-0.2000) |
| Most equidistant: full_gentle (equi_std=0.0421) |
| Most stable CV: full_gentle (cv_std=0.0008) |
|
|
| ================================================================= |
| DONE |
| ================================================================= |
| ``` |
| As you can see, multiple toggles destroy the autograd procedure completely. I've devised a potential solution. |
| |
| # Attempt 6: Updated autograd system with better controls |
| |
| The structure for the last run was using both invalid and outdated losses, as well as incorrect spectral control of the gradients. |
| |
| ``` |
| ================================================================= |
| SWEEP RESULTS |
| ================================================================= |
|
|
| Config v_acc t_acc gap cv Δcv eq_std poly curve star struct |
| ------------------------------------------------------------------------------------------ |
| baseline 0.719 0.710 -0.009 1.5066 +1.3066 0.4413 0.53 0.98 0.92 0.64 |
| cv_only_01 0.548 0.480 -0.068 0.2297 +0.0297 0.6814 0.12 0.93 0.76 0.61 |
| cv_only_05 0.478 0.472 -0.006 0.1963 -0.0037 0.6698 0.13 0.94 0.50 0.55 |
| cv_only_10 0.428 0.401 -0.027 0.2638 +0.0638 0.6322 0.11 0.94 0.44 0.45 |
| tang_50 0.711 0.706 -0.004 1.5108 +1.3108 0.4530 0.52 0.96 0.91 0.64 |
| tang_100 0.720 0.690 -0.030 1.5335 +1.3335 0.4431 0.52 0.98 0.92 0.65 |
| tang+cv 0.572 0.542 -0.030 0.4158 +0.2158 0.6602 0.22 0.84 0.78 0.63 |
| sep_low 0.709 0.723 +0.014 1.6462 +1.4462 0.4423 0.53 0.93 0.90 0.65 |
| sep_high 0.730 0.716 -0.014 1.6925 +1.4925 0.4530 0.56 0.96 0.94 0.65 |
| tang+cv+sep 0.552 0.507 -0.045 0.5011 +0.3011 0.5835 0.15 0.95 0.77 0.58 |
| full_med 0.575 0.540 -0.035 0.3207 +0.1207 0.7342 0.19 0.96 0.79 0.60 |
| full_strong 0.476 0.410 -0.066 0.2337 +0.0337 0.6569 0.16 0.91 0.50 0.53 |
| |
| Best accuracy: sep_high (val_acc=0.730) |
| Best structure: tang_100 (struct=0.649) |
| Closest to CV=0.2: cv_only_05 (cv=0.1963, Δ=-0.0037) |
| Most equidistant: baseline (equi_std=0.4413) |
| Most stable CV: cv_only_01 (cv_std=0.0700) |
|
|
| ================================================================= |
| DONE |
| ================================================================= |
| ``` |
| The run is cleaner, but the geometrics are all over the board. Getting closer. |
| |
| # Attempt 7: Tighter constraints and more specific backward control. |
| |
| With the hand on the CV constraint as a pulse control, the system must be much more lenient than the loss. |
| |
| The loss was an echo; this is a shockwave controller. It's akin to applying frequency band control, so too much CV pressure is akin to destroying the actual model's growth. |
| |
| We don't want to ELIMINATE the CV, we want to curate it. We don't want to trim incorrect branches, we want the system to retain those incorrect branches that are most useful. |
| |
| With that information, I formatted a more subtle, reduced-power CV sweep with the best tangential and separation settings from the last run. |
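A reduced-power CV term of that kind can be written as a band penalty: silent inside a CV corridor, applying only gentle quadratic pressure outside it. The band edges and weight below are illustrative assumptions:

```python
import torch

def cv_band_penalty(emb, low=0.15, high=0.45, weight=0.001):
    """Zero penalty while the batch CV sits inside [low, high]; gentle
    quadratic pressure only when the spread escapes the band."""
    radii = (emb - emb.mean(dim=0)).norm(dim=1)
    cv = radii.std() / (radii.mean() + 1e-8)
    excess = torch.clamp(cv - high, min=0) + torch.clamp(low - cv, min=0)
    return weight * excess ** 2

emb = torch.randn(64, 32, requires_grad=True)
penalty = cv_band_penalty(emb)
penalty.backward()
```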
| |
| ``` |
| ================================================================= |
| SWEEP RESULTS |
| ================================================================= |
|
|
| Config v_acc t_acc gap cv Δcv eq_std poly curve star struct |
| ------------------------------------------------------------------------------------------ |
| no_cv 0.721 0.730 +0.009 1.7346 +1.5346 0.4400 0.54 0.99 0.89 0.65 |
| cv_0.001 0.712 0.706 -0.005 1.5553 +1.3553 0.4291 0.40 0.99 0.91 0.74 |
| cv_0.005 0.697 0.690 -0.006 1.5407 +1.3407 0.4622 0.38 0.97 0.88 0.73 |
| cv_0.01 0.640 0.649 +0.010 0.3353 +0.1353 0.6070 0.29 0.97 0.85 0.67 |
| cv_0.03 0.648 0.632 -0.016 0.2985 +0.0985 0.5909 0.30 0.98 0.85 0.67 |
| cv_0.06 0.586 0.568 -0.018 0.2331 +0.0331 0.5957 0.24 0.96 0.76 0.61 |
| |
| Best accuracy: no_cv (val_acc=0.721) |
| Best structure: cv_0.001 (struct=0.735) |
| Closest to CV=0.2: cv_0.06 (cv=0.2331, Δ=+0.0331) |
| Most equidistant: cv_0.001 (equi_std=0.4291) |
| Most stable CV: cv_0.06 (cv_std=0.1087) |
| |
| ``` |
| Less... is more. Next I'll be running the same cv with a 0.01 tangent and an increased sep. |
| |
| I've got a good notion that this could work. |